Polars ShapeError: Length Mismatch - How to Fix It
Answer
This Polars error means you're trying to combine or assign data with different row counts. Fix it by ensuring all columns or expressions in an operation have the same length, or use appropriate join/concat operations instead of direct assignment.
Why This Happens
Polars requires all columns in a DataFrame to have identical lengths. When you try to add a column with fewer or more rows than the existing DataFrame, or combine expressions that produce different-length results, Polars raises ShapeError. Unlike pandas, which can silently align a Series by index and fill the gaps with NaN, Polars has no index and is strict about shape consistency.
Solution
The rule: all columns in a DataFrame must have the same length. Use with_columns() for same-length additions, concat() for stacking, and join() for combining different-length data by key.
import polars as pl
df = pl.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [10, 20, 30, 40, 50]
})
# ❌ Problematic: assigning a column with the wrong length
new_col = pl.Series('c', [100, 200, 300])  # 3 elements, df has 5
df.with_columns(new_col)
# ShapeError: length 3 does not match DataFrame height 5 (exact wording varies by Polars version)
# ✅ Fixed: ensure same length
new_col = pl.Series('c', [100, 200, 300, 400, 500])
df.with_columns(new_col)
# ❌ Problematic: filtered result assigned back
filtered = df.filter(pl.col('a') > 2)  # 3 rows
df.with_columns(filtered['a'].alias('filtered_a'))  # 3-row Series into a 5-row DataFrame
# ShapeError: can't add column with different length
# ✅ Fixed: keep the filtered rows in their own DataFrame
filtered = df.filter(pl.col('a') > 2).with_columns(
    pl.col('a').alias('filtered_a')
)
# Work with filtered DataFrame separately
# ✅ Fixed: use when/then/otherwise for conditional column
df.with_columns(
    pl.when(pl.col('a') > 2)
    .then(pl.col('a'))
    .otherwise(None)
    .alias('filtered_a')
)
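# ❌ Problematic: expressions in one select() that return different lengths
df.select([
    pl.col('a'),                          # 5 values
    pl.col('b').filter(pl.col('b') > 20)  # 3 values, not 5
])
# Raises a length-mismatch error (ShapeError in recent Polars versions)
# ✅ Fine: a single-value aggregation such as sum() is broadcast to every row
# (add .over(...) for per-group totals, or run a separate query)
df.with_columns(
    pl.col('b').sum().alias('total_b')  # broadcasts to all rows
)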
# ✅ Debug: check lengths before combining
print(f"DataFrame rows: {df.height}")
print(f"New column length: {len(new_col)}")
# ✅ Concatenate DataFrames of different lengths
df1 = pl.DataFrame({'x': [1, 2]})
df2 = pl.DataFrame({'x': [3, 4, 5]})
pl.concat([df1, df2])  # vertical concat works fine
Better Workflow
In Zerve, each block displays its output shape. Instantly spot where row counts change without adding print statements everywhere. Isolate each filter, join, or groupby in its own block. Re-run individual steps without re-executing the entire pipeline. The visual DAG reveals exactly where row counts diverge. Trace data lineage from source to error in seconds. What takes 30 minutes of debugging in a notebook takes 2 minutes in Zerve.