Guides:
Nice comparison between Pandas and Polars: https://kevinheavey.github.io/modern-polars/
(Notes written as someone familiar with NumPy who has previously experimented with pandas.)
import polars as pl
DataFrames are the same as in Pandas: a 2D array of tabular data, each column associated with a name.
Create one using pl.DataFrame, with any dictionary mapping or 2D array.
Preview it using .glimpse() (dense preview) or .describe() (statistics).
Get a single row using .row(). To iterate through rows, use .iter_rows().
Get a single value using .item().

>>> df.glimpse()
Rows: 1510
Columns: 3
$ datetime <datetime[μs]> 2025-12-05 04:56:28, 2025-12-05 04:56:28, 2025-12-05 04:56:29, ...
$ wl_nm    <f64> 1309.5, 1309.51, 1309.52, 1309.53, 1309.54, 1309.55, 1309.56, ...
$ voltage  <f64> 0.017, 0.017, 0.017, 0.017, 0.017, 0.018, 0.018, 0.019, 0.02, 0.02
>>> df.describe()
shape: (9, 4)
┌────────────┬────────────────────────────┬──────────┬──────────┐
│ statistic  ┆ datetime                   ┆ wl_nm    ┆ voltage  │
│ ---        ┆ ---                        ┆ ---      ┆ ---      │
│ str        ┆ str                        ┆ f64      ┆ f64      │
╞════════════╪════════════════════════════╪══════════╪══════════╡
│ count      ┆ 1510                       ┆ 1510.0   ┆ 1510.0   │
│ null_count ┆ 0                          ┆ 0.0      ┆ 0.0      │
│ mean       ┆ 2025-12-05 05:03:36.948344 ┆ 1310.25  ┆ 0.608575 │
│ std        ┆ null                       ┆ 0.436034 ┆ 0.64654  │
│ min        ┆ 2025-12-05 04:56:28        ┆ 1309.5   ┆ 0.015    │
│ 25%        ┆ 2025-12-05 05:00:04        ┆ 1309.87  ┆ 0.016    │
│ 50%        ┆ 2025-12-05 05:03:37        ┆ 1310.25  ┆ 0.162    │
│ 75%        ┆ 2025-12-05 05:07:11        ┆ 1310.63  ┆ 1.392    │
│ max        ┆ 2025-12-05 05:10:45        ┆ 1311.0   ┆ 1.53     │
└────────────┴────────────────────────────┴──────────┴──────────┘
DataFrames are operated on by columns: use select() to keep only the resulting columns, or with_columns() to preserve the original columns as well. Use alias() or keyword arguments to change the column name:
>>> df.select(
...     (pl.col("wl_nm") * 1e-9).alias("wl"),  # name the result after the arithmetic
...     volts=pl.col("voltage") - dark_current,  # dark_current: assumed to be a float defined earlier
... )
Filtering rows with .filter():
>>> df.filter(
...     pl.col("voltage") > 0.02,
... )
Doing aggregations (with optional grouping). Since the group_by() result may be out of order, calling .sort() afterwards is generally useful. If custom expressions are not required, a plain string column name is generally sufficient.
>>> result = (
...     df.group_by("wl_nm")
...     .agg(
...         pl.col("voltage").mean().name.suffix("_mean"),
...         pl.col("voltage").std().name.suffix("_stdev"),
...     )
...     .sort("wl_nm")  # group order is not guaranteed, so sort last
... )
Note the extensive use of the fluent interface (method chaining). Most of the power comes from the column expression object in Polars; see the docs for pl.col().
The lazy API is recommended! See the Polars docs on the lazy API.