Perse Documentation#

Perse is an experimental Python package that combines commonly used functionality from Pandas, Polars, and DuckDB in a single DataFrame object. That one object supports data manipulation, SQL queries, and visualization.

Introduction#

Perse exposes some of the most-used functions from Pandas, Polars, and DuckDB through one interface. The Polars version of the data is available as dl, a Pandas version as df, and SQL queries run through DuckDB. The package is currently experimental, and its functionality is expected to grow.

Installation#

To install Perse, use pip:

pip install perse

Getting Started#

Here’s a quick example to get you started with Perse:

from perse import DataFrame
import numpy as np
import polars as pl

# Sample data generation
np.random.seed(42)
data = {
    "A": np.random.randint(0, 100, 10),
    "B": np.random.random(10),
    "C": np.random.choice(["X", "Y", "Z"], 10),
}


df = DataFrame(data)
print(df)

Output:

shape: (10, 3)
┌─────┬──────────┬─────┐
│ A   ┆ B        ┆ C   │
│ --- ┆ ---      ┆ --- │
│ i64 ┆ f64      ┆ str │
╞═════╪══════════╪═════╡
│ 51  ┆ 0.866176 ┆ Y   │
│ 92  ┆ 0.601115 ┆ X   │
│ 14  ┆ 0.708073 ┆ X   │
│ 71  ┆ 0.020584 ┆ X   │
│ 60  ┆ 0.96991  ┆ Z   │
│ 20  ┆ 0.832443 ┆ Z   │
│ 82  ┆ 0.212339 ┆ Z   │
│ 86  ┆ 0.181825 ┆ Y   │
│ 74  ┆ 0.183405 ┆ Z   │
│ 74  ┆ 0.304242 ┆ Y   │
└─────┴──────────┴─────┘

Functionality Overview#

Data Manipulation#

These methods cover common data manipulations: adding columns, filtering rows, and generating summary statistics.

Examples:

# Add a new column to the DataFrame
df.add_column("D", np.random.random(10))

# Filter rows where column "A" is greater than 50
# (the condition is built from the Polars frame, df.dl)
df2 = df.filter_rows(df.dl["A"] > 50)

# Get a summary of the data using Pandas' describe method
print(df2.describe())

SQL Querying#

Run SQL queries directly on the DataFrame through DuckDB. Inside the query, refer to the DataFrame as this. SQL syntax lets you filter, aggregate, and join data.

Example:

# Use DuckDB SQL to filter rows
result = df.query("SELECT * FROM this WHERE A > 50")
print(result)

Indexing and Selection#

Perse provides Pandas-like .loc and .iloc properties for accessing specific rows or columns. Both boolean conditions and positional indexing are supported.

Examples:

# Select rows where A > 50 using .loc
df2 = df.loc[df["A"] > 50, :]
print(df2)

# Display the first three rows of the result
print(df2.head(3))
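
The description mentions .iloc, but the examples only use .loc. The following sketch uses plain Pandas to show the semantics that Perse's properties are described as mirroring. Whether Perse supports every Pandas indexing form is not confirmed here, and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data
frame = pd.DataFrame(
    {
        "A": [51, 92, 14, 71],
        "B": [0.87, 0.60, 0.71, 0.02],
        "C": ["Y", "X", "X", "X"],
    }
)

# Label-based: boolean row mask plus a list of column names
by_label = frame.loc[frame["A"] > 50, ["A", "C"]]

# Position-based: first two rows, first two columns
by_position = frame.iloc[0:2, 0:2]

print(by_label)
print(by_position)
```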

Visualization#

Create Matplotlib visualizations, such as scatter plots and bar charts, directly from a Perse DataFrame.

Examples:

# Scatter plot for columns "A" and "B"
df.plot(
    x="A",
    y="B",
    kind="scatter",
    title="Scatter Plot of A vs B",
    xlabel="A values",
    ylabel="B values",
)

# Bar plot for category "C" by values in column "A"
df.plot(kind="bar", x="C", y="A", title="Bar Plot by Category C")
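
For readers who want to see what the scatter call maps to, here is a sketch of roughly equivalent plain Matplotlib calls. It assumes Perse forwards title, xlabel, and ylabel to the corresponding axes settings, which is not confirmed by this page. The sample values are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical sample values for columns "A" and "B"
a_values = [51, 92, 14, 71]
b_values = [0.87, 0.60, 0.71, 0.02]

fig, ax = plt.subplots()
ax.scatter(a_values, b_values)          # kind="scatter", x="A", y="B"
ax.set_title("Scatter Plot of A vs B")  # title=...
ax.set_xlabel("A values")               # xlabel=...
ax.set_ylabel("B values")               # ylabel=...
```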

API Reference#

DataFrame#

The core class in Perse that combines Polars, Pandas, and DuckDB functionality.

Attributes#

  • df: Returns the Pandas version of the DataFrame, converting from Polars as needed.

  • dl: The Polars version of the DataFrame.

  • locked: Whether the DataFrame is locked. While locked, modifications are prevented until unlock() is called.

Methods#

  • __init__(data): Initializes the DataFrame with data from a dictionary, file path, or existing DataFrame.

  • query(sql): Runs SQL on the DataFrame using DuckDB. Use “this” in the query to refer to the table.

  • add_column(name, values): Adds a new column to the DataFrame.

  • filter_rows(condition): Filters rows based on a given condition.

  • lock(): Locks the DataFrame to prevent modifications.

  • unlock(): Unlocks the DataFrame to allow modifications.

  • plot(kind, x, y): Plots data using Matplotlib. The Visualization examples also pass title, xlabel, and ylabel.
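
To illustrate the lock() and unlock() semantics listed above, here is a pure-Python sketch. LockableTable is a hypothetical stand-in, not Perse's DataFrame, and raising RuntimeError on a locked modification is an assumption; this page does not say how Perse reports a rejected change.

```python
class LockableTable:
    """Hypothetical stand-in for a DataFrame with lock()/unlock()."""

    def __init__(self, columns):
        self.columns = dict(columns)
        self.locked = False

    def lock(self):
        self.locked = True

    def unlock(self):
        self.locked = False

    def add_column(self, name, values):
        # Assumption: a locked table rejects modifications with an error
        if self.locked:
            raise RuntimeError("table is locked; call unlock() first")
        self.columns[name] = list(values)


table = LockableTable({"A": [1, 2, 3]})

# While locked, adding a column is rejected
table.lock()
try:
    table.add_column("D", [0.1, 0.2, 0.3])
    rejected = False
except RuntimeError:
    rejected = True

# After unlocking, the same call succeeds
table.unlock()
table.add_column("D", [0.1, 0.2, 0.3])
print(rejected, list(table.columns))
```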

Future Plans#

Perse is in early development, with plans to include:

  • Advanced SQL querying features.

  • More data manipulation functions inspired by Pandas and Polars.

  • Enhanced visualization options.

Contributing#

Contributions are welcome! If you have ideas or suggestions for improving Perse, please open an issue or submit a pull request.

License#

This project is licensed under the MIT License.