Notebooks made sense when only humans wrote the code. Today, code writes back.
The Synopsis
Zerve CPO Greg Michaelson sat down with Joel Grus, author of Data Science from Scratch and a well-known critic of notebooks, to explore why classic notebooks frustrate teams, what continues to make them valuable, and how modern approaches are solving problems that have persisted for years. The discussion focused on what notebooks get right, where they fail, and how new ways of thinking about execution and collaboration can make them reliable again.
Why Notebooks Won
Notebooks transformed data science by combining code, narrative, and results in one interactive space. They let data scientists experiment, visualize, and explain in a single flow. By mixing code with Markdown, notebooks made work readable, flexible, and transparent, revealing both the process and the outcome. Their simplicity and immediacy made them a core tool for teaching, research, and analysis.
Where Notebooks Fail
Despite their popularity, notebooks are fragile. The flexibility that makes them easy to use also makes them easy to break. Running cells out of order causes inconsistent state and drifting variables, making results hard to reproduce and collaboration messy. Restarting and rerunning helps, but dependency and environment issues still create mismatched outputs. When shared, notebooks often fail because colleagues lack the right setup, data, or context. Dependency management and versioning continue to be major pain points.
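To make the hidden state problem concrete, here is a toy example (hypothetical cell contents, assuming a standard Jupyter kernel):

```python
# Cell 1: define a rate that downstream cells silently depend on
rate = 0.10

# Cell 2: compute a total using whatever value is currently in the kernel
total = 100 * (1 + rate)   # 110.0

# Later, Cell 1 is edited to `rate = 0.25` but Cell 2 is never rerun.
# The kernel still holds total == 110.0, so what's on screen
# (rate = 0.25 next to total = 110.0) matches no top-to-bottom run.
```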
What Still Works With Notebooks
Despite their flaws, notebooks remain one of the most intuitive environments for exploration. They are excellent for prototyping, teaching, and presenting analyses. The problem is not that notebooks are bad tools, but that they are often pushed beyond their natural limits.
Using notebooks as artifacts rather than live production systems keeps their strengths intact. They are best treated as records of discovery, combining code and insight for review or communication.
A Couple of Potential Workarounds for Notebook Limitations
Reactive Execution
Some solutions track dependencies so that a code change forces downstream code to re-execute. While this does keep results consistent, it can be irritating in practice: every edit kicks off potentially expensive computation, which adds up quickly when you are making repeated changes.
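As a rough sketch of the idea (a toy dependency graph of our own, not any particular tool's implementation), reactive execution walks the graph and re-runs everything downstream of an edit:

```python
# Toy reactive-execution sketch: when a cell changes, every cell that
# (transitively) reads from it is re-run automatically.
deps = {            # cell -> the cells it reads from, listed in topological order
    "load": [],
    "clean": ["load"],
    "model": ["clean"],
    "plot": ["model"],
}

def downstream(changed: str) -> list[str]:
    """Return the cells that must re-run after `changed`, in run order."""
    dirty = {changed}
    to_run = []
    for cell, reads in deps.items():
        if dirty & set(reads):
            to_run.append(cell)
            dirty.add(cell)
    return to_run

print(downstream("clean"))   # ['model', 'plot']
print(downstream("load"))    # ['clean', 'model', 'plot'] — an edit at the
                             # top re-runs everything, hence the compute cost
```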
Append-Only Discipline
Another interesting open source project has experimented with “append-only” coding, which effectively deletes code after your edit point: if you go back and change a cell, everything that follows is cleared. This is a novel solution to the problem, but in our opinion it isn’t practical for most coders.
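As a toy model of the idea (our illustration, not the project’s actual code), editing any earlier cell simply truncates everything after it:

```python
# Append-only editing as a toy model: a notebook is an ordered list of
# cells, and editing cell i discards every cell that came after it.
cells = ["load data", "clean data", "fit model", "plot results"]

def edit(cells: list[str], i: int, new_code: str) -> list[str]:
    """Replace cell i and drop all later cells, so kernel state can
    never disagree with a straight top-to-bottom run."""
    return cells[:i] + [new_code]

cells = edit(cells, 1, "clean data differently")
print(cells)   # ['load data', 'clean data differently'] — later work is gone
```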
Some Remaining Objections
Determinism and Reproducibility
While setting random seeds, locking dependency versions, and storing configs all reduce variance, many applications (such as multi-GPU training or LLM outputs) still produce non-deterministic (i.e., non-reproducible) results. This isn’t so much a failure of notebooks as something to be aware of and plan for in your workflows.
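A typical seeding setup looks something like the sketch below (standard-library and NumPy seeding only; frameworks like PyTorch add their own switches, and multi-GPU or LLM workloads can stay non-deterministic regardless):

```python
import os
import random

import numpy as np

SEED = 42

random.seed(SEED)                   # Python's built-in RNG
np.random.seed(SEED)                # NumPy's legacy global RNG
rng = np.random.default_rng(SEED)   # preferred: a dedicated generator per experiment

# Hash randomization must be fixed *before* the interpreter starts
# (e.g. in the launch environment) for this to actually take effect.
os.environ["PYTHONHASHSEED"] = str(SEED)

print(rng.normal(size=3))   # identical output on every seeded run
```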
Real Collaboration
Collaboration means shared context without chaos. Teams need clear data lineage, safe edits, and transparent dependencies to work together effectively. Very few tools in the stack allow real, synchronous collaboration within teams.
Multi-Language Workflows
Teams often include experts in a variety of languages, with a mix of Python, SQL, and R coders. Very few environments support more than the barest minimum of interoperability between languages. The result is disjointed project files and disconnected team members.
From Artifact to Workflow Integration
The next step in notebook evolution is bridging the gap between exploration and deployment. Teams need to carry insights from an exploratory notebook into scheduled jobs, APIs, or production models without rewriting code or gluing scripts together. Tracking data lineage and execution history ensures that each run can be traced, validated, and reused.
How Zerve’s Notebook View Fits In
Zerve’s notebook view builds on the original idea of a notebook but adds structure where older tools fall short. Each cell clearly shows its dependencies and outputs, and when code runs, the outputs are stored and shared across collaborators. This keeps results consistent and eliminates the hidden state problem that classic notebooks face.
Key Takeaways from the Livestream
- Very little has changed in the notebook space since 2018; innovation in data science coding has been surprisingly scarce.
- Hidden state is the core notebook problem, and reactive execution and dependency tracking address it, but introduce significant constraints on the user that many will find unacceptable.
- Developers have a variety of preferences, many of which are idiosyncratic and unrelated to productivity. They tend to prefer how they’ve worked in the past, and there’s significant inertia to overcome when it comes to adopting new (and frequently better) technologies.
- There is still skepticism in the development community about the feasibility of productionizing notebooks.
- LLMs may reduce collaboration between developers, but will undoubtedly result in increased collaboration between agent and coder.
- LLMs are also changing the landscape of individual work. Nothing is likely to stop the impact of coding agents, but there is still plenty of room for UI/UX improvements to the developer experience; this problem isn’t solved yet.
Watch the Replay Now
Watch the full conversation between Greg Michaelson and Joel Grus to see the discussion unfold, with examples of reactive execution, reproducible environments, and what the future of data science notebooks looks like.
FAQs (Frequently Asked Questions)
What are the main limitations of traditional notebooks in AI-first teams?
Traditional notebooks struggle with fragile and inconsistent states, making it difficult to maintain reproducibility and collaboration. Their flexible nature often leads to hidden state problems where code execution order affects results unpredictably.
How does reactive execution improve notebook reliability?
Reactive execution automatically recalculates dependent cells after any upstream change, ensuring consistent runs without manual resets. This approach addresses the hidden state problem by maintaining a clear and accurate execution flow.
What practices help overcome notebook limitations for better team collaboration?
Teams adopt strategies like append-only discipline to maintain a clear timeline, use config-driven experiments to keep notebooks clean, ensure determinism by setting random seeds and locking dependencies, and foster real collaboration through shared context, transparent dependencies, and safe edits.
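As one concrete illustration of the config-driven piece (a hypothetical params.yaml, inlined here so the sketch runs as-is):

```python
import yaml  # PyYAML

# Hypothetical params.yaml contents, inlined so the example is self-contained;
# in practice this would be a small versioned file next to the notebook.
config_text = """
seed: 42
learning_rate: 0.001
epochs: 10
"""
params = yaml.safe_load(config_text)

# The notebook reads every tunable from the config, so experiments are
# reproducible and diffs stay readable.
print(f"training for {params['epochs']} epochs at lr={params['learning_rate']}")
```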
How do multi-language workflows enhance notebook usability?
By leveraging shared data formats such as Parquet, teams can seamlessly move results between languages like Python, SQL, and R. This supports diverse tasks such as analysis in one language and visualization in another without friction, enabling more flexible workflows.
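For example, handing a result from Python to R might look like the following (pandas with the pyarrow engine on the Python side; the file name is illustrative):

```python
import pandas as pd

# Python side: write an analysis result to a language-neutral Parquet file.
df = pd.DataFrame({"region": ["north", "south"], "revenue": [1200.5, 980.0]})
df.to_parquet("revenue.parquet", engine="pyarrow")

# An R colleague can then load the same file for visualization, e.g.:
#   library(arrow)
#   df <- read_parquet("revenue.parquet")
```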
What is the next evolution step from notebooks as artifacts to integrated workflows?
The evolution involves bridging exploration with deployment by carrying insights from exploratory notebooks into scheduled jobs, APIs, or production models without rewriting code. Tracking data lineage and execution history ensures traceability, validation, and reusability of each run.
How does Zerve's Notebook View address classic notebook challenges?
Zerve’s Notebook View adds structure by clearly showing each cell's dependencies and outputs. It stores outputs upon code execution and shares them across collaborators, eliminating hidden state issues and keeping results consistent for AI-first teams.