Polars ShapeError: Length Mismatch - How to Fix It

Answer

This Polars error means you're trying to combine or assign data with different row counts. Fix it by ensuring all columns or expressions in an operation have the same length, or use appropriate join/concat operations instead of direct assignment.

Why This Happens

Polars requires all columns in a DataFrame to have identical lengths. When you try to add a column with fewer or more rows than the existing DataFrame, or combine expressions that produce different-length results, Polars raises ShapeError. Unlike pandas, which aligns on the index when you assign a Series and can fill the gaps with NaN, Polars has no index and is strict about shape consistency.

Solution

The rule: all columns in a DataFrame must have the same length. Use with_columns() for same-length additions, concat() for stacking, and join() for combining different-length data by key.

import polars as pl

df = pl.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [10, 20, 30, 40, 50]
})

# ❌ Problematic: assigning column with wrong length
new_col = pl.Series('c', [100, 200, 300])  # 3 elements, df has 5
df.with_columns(new_col)
# Raises ShapeError: a 3-element Series cannot be added to a 5-row DataFrame

# ✅ Fixed: ensure same length
new_col = pl.Series('c', [100, 200, 300, 400, 500])
df.with_columns(new_col)

# ❌ Problematic: filtered result assigned back
filtered = df.filter(pl.col('a') > 2)  # 3 rows
# get_column() returns a Series; select() would return a DataFrame, which has no alias()
df.with_columns(filtered.get_column('a').alias('filtered_a'))
# Raises ShapeError: the filtered Series has 3 rows, df has 5

# ✅ Fixed: keep the filtered rows in their own DataFrame
filtered = df.filter(pl.col('a') > 2).with_columns(
    pl.col('a').alias('filtered_a')
)
# Work with the filtered DataFrame separately

# ✅ Fixed: use when/then/otherwise for conditional column
df.with_columns(
    pl.when(pl.col('a') > 2)
    .then(pl.col('a'))
    .otherwise(None)
    .alias('filtered_a')
)

# ❌ Problematic: expressions that return different numbers of rows
df.select([
    pl.col('a'),
    pl.col('b').head(3)  # 3 values, not 5
])
# Raises ShapeError: the two expressions return 5 and 3 rows

# ✅ Fine: a single-value aggregation such as sum() is broadcast to every row
# (use .over(<group column>) when you need one total per group instead)
df.with_columns(
    pl.col('b').sum().alias('total_b')  # same total repeated on all 5 rows
)

# ✅ Debug: check lengths before combining
print(f"DataFrame rows: {df.height}")
print(f"New column length: {len(new_col)}")

# ✅ Concatenate DataFrames of different lengths
df1 = pl.DataFrame({'x': [1, 2]})
df2 = pl.DataFrame({'x': [3, 4, 5]})
pl.concat([df1, df2])  # vertical concat works fine

Better Workflow

In Zerve, each block displays its output shape. Instantly spot where row counts change without adding print statements everywhere. Isolate each filter, join, or groupby in its own block. Re-run individual steps without re-executing the entire pipeline. The visual DAG reveals exactly where row counts diverge. Trace data lineage from source to error in seconds. What takes 30 minutes of debugging in a notebook takes 2 minutes in Zerve.