refactor: migrate all notebooks from pandas to polars #44

raulk · 2026-01-14T00:03:13Z

Summary

Migrates all data transformation logic from pandas to polars for improved performance and more expressive data manipulation.

Base branch: feat/network-overview (PR #43)

Changes

Add polars>=1.0 to dependencies
Migrate all 9 notebooks to use polars for data transformations
Keep pandas only where needed for plotly compatibility (.to_pandas())

Migration pattern

# Before (pandas)
df = load_parquet("dataset", target_date)
df_grouped = df.groupby("col").agg({"value": "sum"})

# After (polars)
import polars as pl
import plotly.express as px

df = pl.from_pandas(load_parquet("dataset", target_date))  # load_parquet still returns a pandas DataFrame
df_grouped = df.group_by("col").agg(pl.col("value").sum())
fig = px.bar(df_grouped.to_pandas(), ...)  # Convert for plotly

Key polars patterns used

pandas	polars
`df.groupby().agg()`	`df.group_by().agg()`
`df[df["col"] > 0]`	`df.filter(pl.col("col") > 0)`
`df["col"] = ...`	`df.with_columns(...)`
`df.sort_values()`	`df.sort()`
`df.drop_duplicates()`	`df.unique()`
`df.fillna()`	`df.fill_null()`
`df["col"].map({...})`	`pl.when().then().otherwise()`
`df.merge()`	`df.join()`
`df.pivot()`	`df.pivot()`
`df.melt()`	`df.unpivot()`

Benefits

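Performance: Polars is typically much faster than pandas on large datasets, because its engine is multithreaded and operates on a columnar (Arrow) memory format
Memory efficiency: Columnar storage, plus lazy evaluation where the lazy API is used, which lets polars skip columns and rows a query does not need
Expressiveness: Chainable expression API that states the intent of each transformation
Type safety: Strict column dtypes and an explicit null that is distinct from NaN, so fewer silent type conversions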

Test plan

Run `just fetch <date>` to fetch data
Run `just render <date>` to verify all notebooks render correctly
Verify visualizations display correctly in rendered HTML

