Polars

Changelog

  • 2025-12-07: Init

Guides:

Nice comparison between Pandas and Polars: https://kevinheavey.github.io/modern-polars/

Context

Introduction

(written as someone familiar with numpy who has previously experimented with pandas)

import polars as pl

DataFrames are conceptually the same as in Pandas: 2D tabular data, with each column associated with a name.

  • Create with pl.DataFrame, from a dictionary mapping column names to values or from a 2D array.
  • Get an overview of a dataframe with .glimpse() (dense preview) or .describe() (summary statistics).
  • Note that dataframes do not have an index like in Pandas. To get a specific row, use .row(). To iterate through rows, use .iter_rows().
  • To get the single scalar value out of a 1×1 dataframe, use .item(). For a larger dataframe, pass a row and column, as in .item(row, column).
>>> df.glimpse()
Rows: 1510
Columns: 3
$ datetime <datetime[μs]> 2025-12-05 04:56:28, 2025-12-05 04:56:28, 2025-12-05 04:56:29, ...
$ wl_nm             <f64> 1309.5, 1309.51, 1309.52, 1309.53, 1309.54, 1309.55, 1309.56, ...
$ voltage           <f64> 0.017, 0.017, 0.017, 0.017, 0.017, 0.018, 0.018, 0.019, 0.02, 0.02

>>> df.describe()
shape: (9, 4)
┌────────────┬────────────────────────────┬──────────┬──────────┐
│ statistic  ┆ datetime                   ┆ wl_nm    ┆ voltage  │
│ ---        ┆ ---                        ┆ ---      ┆ ---      │
│ str        ┆ str                        ┆ f64      ┆ f64      │
╞════════════╪════════════════════════════╪══════════╪══════════╡
│ count      ┆ 1510                       ┆ 1510.0   ┆ 1510.0   │
│ null_count ┆ 0                          ┆ 0.0      ┆ 0.0      │
│ mean       ┆ 2025-12-05 05:03:36.948344 ┆ 1310.25  ┆ 0.608575 │
│ std        ┆ null                       ┆ 0.436034 ┆ 0.64654  │
│ min        ┆ 2025-12-05 04:56:28        ┆ 1309.5   ┆ 0.015    │
│ 25%        ┆ 2025-12-05 05:00:04        ┆ 1309.87  ┆ 0.016    │
│ 50%        ┆ 2025-12-05 05:03:37        ┆ 1310.25  ┆ 0.162    │
│ 75%        ┆ 2025-12-05 05:07:11        ┆ 1310.63  ┆ 1.392    │
│ max        ┆ 2025-12-05 05:10:45        ┆ 1311.0   ┆ 1.53     │
└────────────┴────────────────────────────┴──────────┴──────────┘

Dataframes are operated on by column: select() returns only the computed columns, while with_columns() keeps the original columns as well. Use alias() or keyword arguments to set the output column name:

>>> df.select(
...     (pl.col("wl_nm") * 1e-9).alias("wl"),  # alias the finished expression
...     volts=pl.col("voltage") - dark_current,  # dark_current assumed defined earlier
... )

Filtering rows with .filter():

>>> df.filter(
...     pl.col("voltage") > 0.02,
... )

Aggregations are done with aggregate expressions such as .mean(), optionally per group via .group_by().agg(). Since group_by() does not preserve row order by default, calling .sort() on the result afterwards is generally useful. Where no custom expression is needed, a plain string column name is sufficient in place of pl.col().

>>> result = (
...     df.group_by("wl_nm")
...     .agg(
...         pl.col("voltage").mean().name.suffix("_mean"),
...         pl.col("voltage").std().name.suffix("_stdev"),
...     )
...     .sort("wl_nm")  # sort after grouping; group order is not guaranteed
... )

Note the extensive use of the fluent interface (method chaining). Most of the power in Polars comes from expressions, which pl.col() creates; see the docs for pl.col().

The lazy API is recommended! See the Polars docs on the lazy API.

kb/lang/scripting/python/polars/start.txt · Last modified: 34 hours ago ( 7 December 2025) by justin