
How to Concatenate Dataframes Vertically and Horizontally in Pandas

Answer

Use pd.concat([df1, df2]) to stack DataFrames vertically (adding rows) and pd.concat([df1, df2], axis=1) to place them side by side (adding columns). Pass ignore_index=True to replace the original row labels with a fresh 0, 1, 2, ... index after a vertical concat.

Why This Happens

You often need to combine data from multiple sources: stacking monthly files into one dataset (vertical), or placing feature sets side by side (horizontal). Getting the axis wrong or mishandling indexes leads to duplicated index labels, unexpected NaN values, or errors.
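To make the index problems concrete, here is a minimal sketch. The frames jan, feb, left, and right are hypothetical and exist only for illustration. It shows duplicated row labels after a vertical concat, and NaN values appearing when a horizontal concat matches rows by index label rather than by position.

```python
import pandas as pd

# Hypothetical monthly frames; both carry the default index 0, 1.
jan = pd.DataFrame({'sales': [10, 20]})
feb = pd.DataFrame({'sales': [30, 40]})

# Vertical concat keeps the original labels, so 0 and 1 each appear twice.
stacked = pd.concat([jan, feb])
print(stacked.index.tolist())   # [0, 1, 0, 1]
print(len(stacked.loc[0]))      # 2 (label 0 now matches two rows)

# Horizontal concat matches rows by index label, not by position.
left = pd.DataFrame({'x': [1, 2]}, index=[0, 1])
right = pd.DataFrame({'y': [3, 4]}, index=[1, 2])
side_by_side = pd.concat([left, right], axis=1)
print(side_by_side.shape)       # (3, 2), with NaN where a label is missing

# If the rows are meant to line up by position, reset the index first.
aligned = pd.concat([left, right.reset_index(drop=True)], axis=1)
print(aligned.shape)            # (2, 2)
```

Resetting the index is only appropriate when the rows really do correspond by position; if the labels carry meaning, the NaN values are pandas correctly reporting that the two frames do not cover the same rows.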

Solution

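The rule: axis=0 (the default) stacks vertically (more rows); axis=1 stacks horizontally (more columns). For vertical concatenation, pass ignore_index=True unless you need to preserve the original indexes. With axis=1, ignore_index=True renumbers the columns instead of the rows, so leave it off if you want to keep the column names.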
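One detail worth checking before reaching for ignore_index everywhere: it resets the labels along the concatenation axis. The short sketch below uses two hypothetical frames, features_a and features_b, to show that with axis=1 this discards the column names rather than the row labels.

```python
import pandas as pd

# Hypothetical feature sets sharing the same row index.
features_a = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
features_b = pd.DataFrame({'c': [5, 6], 'd': [7, 8]})

# axis=0: ignore_index renumbers the rows.
tall = pd.concat([features_a, features_a], ignore_index=True)
print(tall.index.tolist())     # [0, 1, 2, 3]

# axis=1: ignore_index renumbers the columns, so the names are lost.
wide = pd.concat([features_a, features_b], axis=1, ignore_index=True)
print(wide.columns.tolist())   # [0, 1, 2, 3]

# Without ignore_index the column names survive.
named = pd.concat([features_a, features_b], axis=1)
print(named.columns.tolist())  # ['a', 'b', 'c', 'd']
```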

import pandas as pd

df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df2 = pd.DataFrame({'a': [5, 6], 'b': [7, 8]})

# ✅ Vertical concatenation (stack rows)
pd.concat([df1, df2])
# Result: 4 rows, 2 columns

# ✅ Reset index after vertical concat
pd.concat([df1, df2], ignore_index=True)

# ✅ Horizontal concatenation (place columns side by side)
pd.concat([df1, df2], axis=1)
# Result: 2 rows, 4 columns (rows matched on index; column names a, b appear twice)

# ✅ Handle mismatched columns (vertical)
df3 = pd.DataFrame({'a': [9, 10], 'c': [11, 12]})
pd.concat([df1, df3])  # missing columns become NaN

# ✅ Only keep matching columns
pd.concat([df1, df3], join='inner')  # Result: 4 rows, only column 'a'

# ✅ Concatenate multiple dataframes at once
pd.concat([df1, df2, df3], ignore_index=True)

# ✅ Add identifier for source dataframe
pd.concat([df1, df2], keys=['first', 'second'])  # adds an outer index level: 'first' / 'second'
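The calls above return new DataFrames without showing what they contain. As a rough check, the sketch below re-creates df1, df2, and df3 and inspects the results. The variable names and the level names 'source' and 'row' are illustrative choices, not part of the original example.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df2 = pd.DataFrame({'a': [5, 6], 'b': [7, 8]})
df3 = pd.DataFrame({'a': [9, 10], 'c': [11, 12]})

# Vertical concat with a fresh index.
vertical = pd.concat([df1, df2], ignore_index=True)
print(vertical.shape)                # (4, 2)

# Mismatched columns: the default outer join keeps every column, filling gaps with NaN.
outer = pd.concat([df1, df3])
print(outer.columns.tolist())        # ['a', 'b', 'c']
print(int(outer['b'].isna().sum()))  # 2 (the two rows that came from df3)

# join='inner' keeps only the columns present in every frame.
inner = pd.concat([df1, df3], join='inner')
print(inner.columns.tolist())        # ['a']

# keys= adds an outer index level; names= labels the levels (names chosen here).
keyed = pd.concat([df1, df2], keys=['first', 'second'], names=['source', 'row'])
print(keyed.loc['second', 'a'].tolist())  # [5, 6]

# Turn the key into an ordinary column when a flat table is easier to work with.
flat = keyed.reset_index(level='source')
print(flat['source'].tolist())       # ['first', 'first', 'second', 'second']
```

Moving the key into a regular column is a design choice: the MultiIndex form is convenient for selecting one source with .loc, while the flat form tends to be easier to group, filter, or export.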

Better Workflow

In Zerve, your source dataframes run as parallel blocks that feed into a single concat block, and the canvas shows exactly which sources combine into your final dataset. The three data blocks execute simultaneously (not sequentially, as in traditional notebooks), and changes to any source automatically invalidate only the downstream blocks, so you do not need to re-run your entire workflow to see the effect of a change.