Zerve x Joel Grus: The Notebook Reimagined for AI-First Teams
October 20, 2025
Notebooks were made for people, not for AI. Zerve rebuilt them for both. In this livestream, Greg Michaelson, Zerve Co-Founder and Chief Product Officer, chats with data science guru Joel Grus (of “I don’t like notebooks” fame), and unveils the latest Zerve release: a new kind of notebook where agents and humans collaborate in real time to go from question to deployed solution 10x faster. Learn how Zerve carries ideas from exploration to live apps and APIs — no rewrites, no orchestration tools, no context switching.
4:45
Good morning everybody. This is an exciting day for me. I'm Greg Michaelson, one of the co-founders
4:50
here at Zerve, and today we're talking about notebooks. The first time I was
4:56
exposed to notebooks was in graduate school. I don't know if people know, but I think notebooks were invented
5:02
back in the 80s by a guy at Berkeley who I actually met at JupyterCon a
5:08
while back. But anyway, my first exposure to notebooks was in class
5:14
learning data science back at the University of Alabama, and we used tools like Jupyter, though of course it was
5:21
called IPython back then. RStudio is another notebook environment; I learned
5:26
R in graduate school too. And throughout the years that I've been
5:31
doing data science, notebooks have played a pretty significant role, seeing
5:37
as they're just incredibly popular. I think we did the math when we started Zerve that there
5:43
were about five notebooks created per second
5:50
from about 2020 onward. It's an outrageous number of notebooks being created.
5:56
There's a company, Datalore I think, that did a study showing there were millions and millions of notebooks made
6:03
public on GitHub over the course of a two- to three-year period. So kind
6:10
of a remarkable explosion of usage in recent years, and they're truly useful. But I have here
6:16
with me today Joel Grus, who is one of my heroes, an idol to many, who's
6:24
famous for a talk he gave back in 2018 called "I don't like notebooks" at
6:30
JupyterCon, ironically a conference all about Jupyter notebooks. So,
6:35
welcome, Joel. Would you give us a quick intro and talk a bit about when you first encountered notebooks and
6:41
what your experience is? Yeah. So, I've had a long, pretty
6:47
windy career. I studied math originally, started out doing quantitative finance, pivoted to data
6:56
analysis, pivoted to data science, wrote a book on data science, then pivoted to
7:02
software engineering, to AI, to machine learning
7:08
engineering, and now I'm back, I guess, doing AI and engineering once again.
7:14
What was my first encounter? Boy, so I've been using Python for a really long time. I started using it around 2005, when I was also
7:22
in grad school, and I took a class in probability modeling that was based in
7:28
MATLAB. The way things worked is that the campus had a MATLAB site
7:34
license that only worked on campus, and I lived off campus, so I couldn't do my
7:39
assignments at home. I didn't have access to MATLAB, and so I discovered that there was a language called Python
7:46
and a tool (at the time it was called Numeric, today it's NumPy) that basically
7:52
gave me a MATLAB-like experience. That was when I started using Python, and I've used it kind of off and on
7:58
ever since. When I got into data science, I started using notebooks because that was what people
8:04
used for data science. I think even the first edition of Data Science from Scratch had a section at the end that
8:10
said, oh, you should check out notebooks, that's what people use for data science. So I used them a little bit. I never
8:17
thought about them that much. In 2016, I joined the Allen Institute for
8:22
Artificial Intelligence, which is an AI research nonprofit in Seattle. This was before the transformer
8:29
revolution, and so a lot of NLP stuff back then was done on the Java stack
8:35
using Stanford CoreNLP. So when I joined, it was really a Scala shop. Everything was in Scala, which was
8:41
not great for me, one because I don't do Scala that much and am more of a Python person. But anyway, as you know,
8:47
Transformers caught on and BERT caught on and GPT caught on, and all of NLP
8:52
started switching to Python. And so a lot of my co-workers who'd been doing Scala and Java forever needed to learn
8:59
Python. So they came to me because I was, you know, the Python guy. And
9:04
one of my co-workers came to me and said, "I don't understand Python." And she's a very smart engineer, like
9:10
very good. And I was like, how can you not understand Python? It's not that complicated. And so I was like, bring
9:17
your computer, show me. And she started showing me. And it turned out that she
9:22
understood Python just fine, but she'd been working in a Jupyter notebook because someone told her to use a Jupyter notebook. And she had gotten the
9:29
state all out of whack. And so none of the variables equaled what she thought they equaled. And that was why she
9:36
was confused. She wasted a few hours this way. And so I got mad. I went on Twitter and tweeted, "If anyone
9:42
wants me to give a talk about why Jupyter notebooks are bad, let me know. I'd love to have an excuse to
9:47
write it." Someone at JupyterCon saw the tweet and responded, "We
9:53
can't promise we'll accept it, but you should submit it and let's
9:58
see what happens." So I submitted it. They accepted it, and then I was like, "Uh-oh. This is going to be one
10:06
of the most hostile audiences I've ever faced, so I better get my ducks in a row and know what I'm talking about,
10:12
right?" And so I spent more time
10:18
working on that talk than any talk I've ever worked on. I spent a lot of time really digging into what is it
10:25
that I do like about notebooks and what is it that I don't like about them, beyond this primary
10:31
issue of hidden state, which was my starting point. And it turned
10:36
into a talk about a number of other things that touched on some of my other big interests at the time,
10:42
which were engineering best practices for researchers and for data scientists.
10:50
And now my joke is that instead of giving that talk, I should have
10:56
started a company about better notebooks, because a lot of people have done that and been pretty successful
11:02
with it. My talk, people like it, but it's just a talk. So
11:09
that's awesome. It's funny that you would say that, because at Zerve we had exactly the same experience. When
11:15
we started Zerve, we were sitting in a room, me and the two other co-founders, and we had Mike
11:21
McClintock there, who's our head of engineering, and he'd never been exposed to notebooks at all. And here we are
11:27
building what at the time was a development environment for data science, and we're explaining to him
11:34
how notebooks work. Let me actually just share my screen and show a version of what I showed him,
11:42
anyway. It was basically this, only we did it in a notebook. Actually, I'll pull it up in a notebook. So I
11:49
said a equals 1, and then I said a equals a + 1 and then print a, right?
11:56
So, it would be two. But then I did this and this and this and this. And he was
12:03
stunned. Mike was absolutely stunned. He was like, "No, it can't possibly work that way. There's no way
12:09
that somebody built it to do that." And so that's why I pulled up the screen and showed it to him. Maybe you
12:16
can explain what's going on there for folks that maybe haven't been in a notebook, or are maybe more on the
12:21
engineering side and have been scripting and things like that. Yeah. So, you know, when it comes to
12:28
having a notebook like this, what's going on is not just what you see on the screen. So, you run the first cell and
12:35
it sets a to 1. You run the second cell, it increases a by 1 and then
12:41
prints it, so a becomes 2. And then you can run that second cell as many times as you want and it will
12:48
keep changing the value of a. It also keeps incrementing the number next to the cell. So
12:55
obviously there's some kind of funny business going on here: the first thing we ran was a equals 1,
13:02
and the seventh thing we ran was a equals a + 1, print a.
13:08
And clearly we did some other things between 1 and 7. But looking at this
13:13
notebook, there's no way to actually know what those things were. And so
13:19
that's where you get confused. Yeah. And once you start going back and
13:24
forth, it gets even trickier. Right. I've seen really long, complex notebooks where
13:30
there are markdown blocks telling you what order to run the cells in. And there are some really
13:37
pernicious bugs out there if you happen to run the cells in the wrong order, or run a cell too many times, or
13:42
something like that, where nothing equals what it's supposed to equal, but the code is actually written
13:48
properly. And really, the only way to fix it is to push this button here, which restarts your kernel. It basically
13:54
clears all the memory so that none of the variables have any value, and then you rerun everything from scratch in the right order.
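The hidden-state confusion described here can be replayed outside any notebook with a small sketch. The `run_cell` helper and the cell sources are illustrative, not Jupyter internals; a real kernel does essentially this with a shared namespace:

```python
# Hypothetical replay of notebook cell executions, showing how
# out-of-order or repeated runs leave hidden state. Each run_cell call
# mimics pressing "Run" on a cell; variables live in a shared namespace.
namespace = {}

def run_cell(source, namespace):
    """Execute one cell's source in the shared kernel namespace."""
    exec(source, namespace)

cell_1 = "a = 1"
cell_2 = "a = a + 1"

run_cell(cell_1, namespace)   # In [1]: a is now 1
run_cell(cell_2, namespace)   # In [2]: a is now 2
run_cell(cell_2, namespace)   # In [3]: re-running the same cell...
run_cell(cell_2, namespace)   # In [4]: ...keeps mutating hidden state
print(namespace["a"])         # prints 4, though the visible code suggests 2
```

Restart-and-run-all is exactly `namespace = {}` followed by replaying every cell once, top to bottom.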
14:00
For sure. And one thing that I found: there are a lot of people who really like notebooks, and
14:06
Jeremy Howard gave an entire talk at a conference on why I'm wrong about notebooks. And so
14:13
when you push people on this issue of hidden things have happened and you don't know what they are, they'll say,
14:20
yes, of course, you're using the notebook wrong. You should know to restart and run all anytime you make a
14:27
change like that. And I said, well, that's fine, but most people either don't know that, or know it but
14:33
don't do it. So it still causes a lot of problems.
14:38
Yeah, to me, it seems like putting the turn-it-off-and-on-again button as a top-level navigation button in the app
14:44
is a sign of a design flaw. But really, notebooks were designed from the beginning just to be scratch pads in
14:51
an academic setting. I'm not even sure it always was a
14:57
button. I think it might have been a drop-down menu at some point, or maybe not even present.
15:03
No, I mean there was always a restart-and-run-all option, I think, because that was a common
15:10
thing to do. But yeah, since you gave your talk, there have been lots and lots of startups that
15:16
have popped up doing notebook-type environments, but nobody really addressed that hidden state issue. Have
15:21
you seen any movement on it apart from Zerve? We'll talk about Zerve in a bit, but have you seen much
15:27
change? Yeah, I mean, there's another notebook called Marimo that also
15:34
does reactive notebooks. It's interesting, actually: at that JupyterCon that I went to, they had a poster
15:40
session the night before the conference where people were presenting sort of experimental work, and there was a guy
15:45
there in 2018 who had made an experimental reactive kernel for
15:52
Jupyter. A kernel is basically the computational engine that
15:58
sits behind the notebook, and a reactive kernel just means that whenever you change a cell,
16:04
all the cells that depend on it update. That makes it so you can't get into this weird hidden state, because
16:10
if I change a cell that something else depends on, the thing that depends on it recalculates, and the state stays
16:18
in sync. I argued with this guy for a while. I tried to get him to bite the bullet that every Jupyter notebook
16:25
should work in this reactive way, and he wouldn't bite that bullet. But it is a little bit
16:31
surprising to me that the core notebook product hasn't
16:37
gone through that many changes that would help with this. It's been, what, seven years.
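The reactive idea can be sketched in a few lines: when a cell changes, every cell downstream of it re-runs, so state can never drift out of sync. The cell names and the dependency map below are invented for illustration; real reactive notebooks like Marimo infer dependencies from the code itself:

```python
# Minimal sketch of a reactive kernel. Editing a cell re-executes it
# and, transitively, everything that depends on it.
cells = {
    "load": "a = 1",
    "incr": "b = a + 1",
    "show": "result = b * 10",
}
depends_on = {"incr": ["load"], "show": ["incr"]}

namespace = {}

def downstream(name):
    """All cells that transitively depend on `name`, in run order."""
    out = []
    for cell, deps in depends_on.items():
        if name in deps:
            out.append(cell)
            out.extend(downstream(cell))
    return out

def set_cell(name, source):
    """Edit a cell, then reactively re-run it and all its dependents."""
    cells[name] = source
    for cell in [name] + downstream(name):
        exec(cells[cell], namespace)

set_cell("load", "a = 5")      # editing the root cell...
print(namespace["result"])     # ...recomputes b and result: prints 60
```

Because a cell's output can never be stale relative to its inputs, the "which a is this?" problem disappears by construction.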
16:43
What is it that you think people like about notebooks so much? Because there's no question they've been deeply adopted
16:49
by the data science and data analysis communities. So there are a few things that
16:55
people like. One is this notion of literate programming, right? In addition to having these code blocks, I
17:01
can put in a markdown block that's going to explain, here's what
17:07
I'm about to do. And so it makes for a very readable document. Whereas if you had a script in a Python file, it would
17:13
be much less readable as a document. Along similar lines, one thing that they do really nicely is that they allow you
17:20
to intermingle graphical output and text output. Right? So you say, here I would like to run a
17:26
regression, and now I'm going to put the graph and a table of the
17:33
regression coefficients right underneath it. And so as I read this thing top to bottom, it's got code, it's got
17:39
equations, it's got explanations, it's got charts, it's got tables, and it
17:44
makes a really nice artifact, right? Here are the
17:50
exact details of what I did and what the results were. And so to the extent that
17:57
you use it as an artifact, I think it's actually pretty nice and
18:02
gives you a good way to look back at here's what I did. Where you start running into trouble is when
18:09
this is not just an artifact, it's more of a living document: I want to go back and change this part of it, and then what happens to the rest of it?
18:15
People sort of tie themselves into knots, and maybe
18:22
they haven't specified here are the exact dependencies I need. And so
18:30
notebooks get sold not just as artifacts but also as the key to doing reproducible research, and
18:37
it's like half the key to doing reproducible research. The other half is having the right environment, and you
18:45
can put some pip commands in the top cell and do things like that, but
18:50
again, it requires a lot of discipline to use them in a way that's really reproducible.
18:56
Yeah, dependency management is a problem. I think probably 90%, maybe more, of people that work in notebooks
19:02
run them locally, so that's another potential issue. Collaboration
19:08
is pretty horrible. Trying to manage notebooks in GitHub, or sending files back and
19:15
forth, that's a headache, trying to get someone else's notebook running. Have you ever done any
19:22
collaborative work using maybe Google Colab or some of these more modern notebook environments,
19:28
where they've essentially cloned Jupyter and put it on the web?
19:33
I haven't. I mostly work in code, and so for me collaboration is
19:40
really around writing scripts. When
19:47
I was at AI2, we did a lot of experiments, and the way that we aimed to
19:52
have them be reproducible was by having them driven by command
20:00
line tools with these hyperparameter configuration files. So, here's a script that's going
20:07
to read in from a config file that says, use this many hidden layers, use this many input dimensions, use this
20:14
embedding model, and so on and so forth. That way, if you have
20:20
a variety of these config blobs describing experiments, you can just say, let's run all these and collect the results.
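That config-driven pattern can be sketched as a small script: hyperparameters live in a JSON file, and the runner only sees the parsed config. The field names (`hidden_layers`, `input_dim`, `embedding_model`) are invented for illustration, not AI2's actual schema:

```python
# Sketch of a config-driven experiment runner: every experiment is
# described by a versionable JSON blob rather than by notebook state.
import json

def run_experiment(config):
    """Stand-in for a real training run; echoes the settings it received."""
    return (f"training with {config['hidden_layers']} hidden layers, "
            f"input dim {config['input_dim']}, "
            f"embeddings from {config['embedding_model']}")

def main(config_path):
    with open(config_path) as f:
        config = json.load(f)
    return run_experiment(config)
```

A batch of experiments is then just a loop over config files, and reproducing a result means re-running the same script against the same checked-in blob.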
20:26
And so, that's the way that I've typically worked. When
20:31
Streamlit was big, I used it for, I would say, more
20:37
interactive demos than notebook replacement per se. More, here's
20:42
a model, I want to give you an easy way to play with it, you being someone who's maybe not necessarily a coder. And
20:49
so I've had a lot of good luck with that. I haven't done much of that recently, but yeah.
20:55
Yeah. Got it. Well, maybe we'll segue a little bit into Zerve and what we've been doing
21:00
in the space, because on Wednesday we are releasing what we're calling notebook view. On Friday, I think we gave you
21:08
access to it to look at and play around with a bit. So you by no means have gotten to get your
21:14
fingers as dirty as I'm sure you want to, but I did want to pop up a few things. Let me share my screen here and
21:20
just talk a bit about what we've been thinking about in this space, and
21:26
maybe just go back to our original example: a equals 1, and then add 1
21:31
to a and then print it. I've gone ahead and run these cells, and you'll notice that a here is 2. It's
21:38
always going to be 2 because of the way Zerve is architected. We call this notebook view, but it's not really
21:45
a notebook in the sense of any traditional notebook. It's more that it looks like a notebook in terms of
21:51
the way you interact with the code. So here's what each of these cells in Zerve is doing. I know you know this, but I'm
21:57
just explaining for the folks that are listening. When the code executes here, you notice you've got these two arrows. These are
22:04
your upstream dependencies; in this case, this block doesn't have any upstream dependencies. And in
22:10
this case, it has one downstream dependency, which is this next cell. So we're tracking which cells
22:16
are related to which. When this block executes, it's actually storing its output. One of the big
22:24
issues for me with running notebooks is that if I send you a notebook, then you can see all of the charts and graphs
22:31
that I might have printed out, but none of the variables exist in your version of it, because notebooks are
22:38
typically in-memory tools, right? You'd have to execute the code in order to actually see the values of any
22:43
variables. In Zerve, it's super different. When I run
22:48
this code, we're caching and storing all of your variable values, and then we're
22:55
passing them downstream to any cells that might actually need to
23:02
use them as dependencies. So this variable a is getting stored. And
23:07
if I were to share this with you and you were to log in, we could all be logged in together, and we'd all
23:14
see the values of these variables as they are run, and what they're
23:21
stored as. That's the reason that every time I run this cell, it starts from its dependencies. It's
23:28
saying, "Oh, a? Which a? It means this a, and that a equals 1. So I'm going to add 1 to it and print
23:34
it." So no matter how many times I run it, 1 + 1 always equals
23:39
2, just because of the way that we've structured the notebook. So it's
23:44
actually significantly different from the way that notebooks today
23:50
typically function, because of the way we have the architecture set up.
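A rough sketch of the execution model being described: each block runs in a fresh namespace seeded only from its upstream blocks' serialized variables, so re-running a block can never accumulate hidden state. The block names, and pickle as the serialization format, are illustrative, not Zerve's actual mechanism:

```python
# Each block: (source, list of upstream block names). Illustrative only.
import pickle

blocks = {
    "init":   ("a = 1",     []),
    "update": ("a = a + 1", ["init"]),
}

saved = {}  # block name -> serialized variable space

def run_block(name):
    """Run one block in isolation, seeded from its upstreams' saved state."""
    source, upstreams = blocks[name]
    namespace = {}
    for up in upstreams:
        namespace.update(pickle.loads(saved[up]))
    exec(source, {}, namespace)
    saved[name] = pickle.dumps(namespace)
    return namespace

run_block("init")
for _ in range(5):              # rerun the downstream block repeatedly...
    result = run_block("update")
print(result["a"])              # ...a is always 2, never 3, 4, 5
```

Because the stored state outlives the process, another user opening the same project sees the same variable values without re-executing anything.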
23:57
Yeah, that's helpful. In fact, another way that
24:03
we've always looked at these projects is as a DAG. This is
24:08
another way of visualizing that exact same project, where we've got one cell (in this case it's a block), but
24:16
it is the same project, just viewed as a DAG instead of as a notebook. This cell executes and then
24:22
passes its variable space down to subsequent blocks. When those
24:28
blocks execute, they start from those starting points, and that way the
24:33
results are repeatable, reproducible, and the same every time, and you can't get into that bad hidden state situation
24:40
that you often do with notebook environments. So I'm actually really curious about the
24:47
way that you work when you're writing code. My guess is that you're typically in a .py file in like a VS Code
24:56
kind of very hardcore engineering-type environment. But when you do have, you know, visualizations you
25:04
want to look at, when you're doing data exploration and things like that, what type of environment do you find yourself
25:10
working in? So, I have forever been a big user of
25:18
the IPython console. I don't even know if it's called that anymore, but that's what I call it. That's what it's always been called. And so what that is is
25:28
a terminal-based console that has a lot of the IPython or Jupyter
25:36
magics, if you will. And so, in some sense, you could think of
25:42
it as an append-only notebook. And that's another thing
25:47
that I actually have talked with the Jupyter folks about before too: what if you could make notebooks so that they were
25:53
append-only? Tell me what you mean by that. What do you mean by append-only?
25:59
What I mean is that once I run a cell, I can't change it and I can't change any
26:04
cells above it. Ah, I see. And so the way I use
26:09
the IPython console is, once something has run, it's run and it's there in the terminal history, and I can
26:15
now overwrite it by saying, again, a equals 3 when before I said
26:22
a equals 2, but if I scroll back I'm going to still see the a equals 2. And so that was another
26:29
thing that I pushed the Jupyter folks on several times:
26:34
what if you had a notebook kernel that was append-only, so I can only add new cells at the end, or where if I go back
26:41
and change a cell, it gets rid of every cell after that? And so I'm basically rewinding, but then I'm starting from
26:47
where I rewound, and I can't keep the other things that I've done since then. Gotcha. Okay, I could see that as
26:54
another workaround for that hidden state problem, for sure. Yeah. I'm not sure I'd call it a
27:00
workaround. I just think it's a different paradigm that doesn't allow the hidden state.
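The append-only paradigm Joel is proposing can be sketched in a few lines: you can append new cells freely, but editing cell i discards every cell after it and replays from the top, so no stale state survives the edit. The class below is an invented illustration, not an existing Jupyter feature:

```python
# Sketch of an append-only notebook: edits rewind and replay, so the
# namespace always matches exactly the cells you can currently see.
class AppendOnlyNotebook:
    def __init__(self):
        self.cells = []
        self.namespace = {}

    def append(self, source):
        """Add a new cell at the end and run it."""
        self.cells.append(source)
        exec(source, self.namespace)

    def edit(self, index, source):
        """Rewind: drop cell `index` and everything after, then replay."""
        self.cells = self.cells[:index] + [source]
        self.namespace = {}
        for cell in self.cells:
            exec(cell, self.namespace)

nb = AppendOnlyNotebook()
nb.append("a = 1")
nb.append("a = a + 1")      # a is now 2
nb.edit(0, "a = 10")        # rewinds; the increment cell is gone
print(nb.namespace["a"])    # prints 10
```

This trades the convenience of editing one cell in place for the guarantee that what you see is what actually ran, in order.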
27:09
One of the things that I've seen quite a lot is wanting to schedule a
27:14
notebook to run on a recurring basis. Do you ever go down that road? Do you play in that space at all?
27:20
I don't. I'm not much of a data engineer, and I've seen that mostly in the data engineering space: I have a
27:26
pipeline that runs every night, and I want to take the output of some
27:33
job or table or whatever, pipe that through this notebook, and then
27:38
put the output there. So basically the notebook contains my ETL rules. I've
27:47
seen that happen, but I generally am not involved in that part of the stack.
27:53
Gotcha. And is that a good way to run code, like to run a
27:58
recurring job as a scheduled notebook? How would you do it? Would you have a script file on like
28:05
a crontab? What do you think is the most useful when it
28:12
comes to data pipelines and things like that? Again, I'm not a data engineer, so I'm
28:19
not going to... I don't have strong opinions on that. Here's what I'll tell
28:25
you: all else being equal, I would prefer having my data transformations written in code. That way they're easier
28:32
to unit test, and I can use my IDE to edit them, and so on
28:38
and so forth. I think for the teams that I've worked with that used notebooks
28:43
for these data engineering pipelines, the value to them was that they would
28:49
generate charts and graphs as part of the data pipeline, and then, as we were talking about earlier, they
28:56
would save this as an artifact. So now when I want to understand what my data pipelines did last night, I don't
29:03
just look at logs necessarily, but I have these notebook artifacts
29:09
that I can look at and understand better. I think there are some questions. Okay, here's one. George says,
29:14
"We've seen a lot of new tools promise reproducibility, but they end up locked into proprietary formats or hosted
29:21
runtimes. How does Zerve avoid becoming another walled garden?" Well, good
29:26
question. At Zerve we don't create any proprietary formats. All
29:32
of our code is just straight Python code, and when you connect it to GitHub or Bitbucket or any of the
29:38
hosting tools, it stores your code as plain text, and so on. So
29:43
we're using open source technologies to preserve all of that. So
29:49
you can always take your ball and go home if you want to jump out of Zerve into some other type of environment.
29:55
I should mention also, by the way, for the folks that are on: if you want to sign up for Zerve, it's free to use.
30:01
You can create a free account at zerve.ai and get in there and connect your data, upload
30:07
data, use our agent. In fact, maybe a good thing to do now is just demonstrate what our agent
30:14
actually looks like. Let me share my screen again. And I will just jump in here. I'm sharing my
30:20
screen, right? Yep, I'm sharing my screen. I'm going to say: I'm doing a
30:26
live stream and I need a cool demo. Use
30:32
made-up data. So we're connected; this is
30:37
all running in the cloud, in this case in
30:44
AWS, and we're integrated with all the various large language models, and we've
30:49
built an agent that can do some really cool stuff in terms of writing code and
30:54
executing code. In this case it started in DAG view. So, I'm going to go ahead and just jump over to... oh, so
31:02
here's its plan. I'll just read through this plan real quick before I switch to notebook view. It's going to
31:08
generate some data. In this case, it's chosen to create an e-commerce data
31:13
set. And it's going to build some parallel branches with sales metrics and customer segmentation and stuff like
31:20
that, and then make a dashboard. So I'll just click approve plan. But let me go here and jump into
31:27
open as notebook, and we can see that as it starts to write code, it'll start to create those blocks
31:35
inside of this notebook, and we'll be able to see what it's doing. So Joel, while it's thinking... oh, it started to
31:41
create some data already. We'll look at that in a second. While it's thinking, what have you seen in the collaboration space? In terms of
31:48
data scientists wanting to work on projects as a group and coding as
31:55
a team, how are people doing that in your experience today, and is it working?
32:01
You know, you see a lot of people
32:07
using JupyterLab, and so I've worked at companies where they set up big JupyterLab instances. That's
32:14
like a hosted Jupyter notebook server where people can share notebooks on the
32:20
server and work on them. Now, I don't know that that admits
32:25
multiplayer editing, if you will, but I think that's still a pretty common
32:32
way of doing it. Okay. Yeah. So I've tried using tools like
32:39
Colab and a few others, and I always find that it's a bit sketchy in
32:45
terms of getting collaborative coding working, especially when you want to run your code.
32:52
The way that Zerve actually executes is that each cell has its own kernel,
32:58
basically. Now the agent is doing all the
33:03
running here, and you can see it's creating a boatload of code. Here's, for example, the data summary
33:10
of the data set that it created. When each of these cells ran, they actually spun up serverless
33:17
compute to execute the code. The default is using Lambdas, but you could change that to GPUs, or
33:24
you could change it to Fargate containers; it all runs serverlessly. And so one of the upshots
33:29
of that is that you could run as many of these blocks or cells at the same time
33:36
as you want, and they don't interfere with each other, and the project is
33:42
smart enough to know where the dependencies are. So if I
33:47
made changes to this upstream block and then executed it, the downstream
33:52
blocks wouldn't automatically update, because we don't want to mess somebody up if they're working downstream and the
33:59
upstream data happens to change. But the blocks are smart enough to know that there's been an upstream change and
34:05
that everything needs to be re-executed in order to reflect the latest data. So,
34:13
for example, if we go back to our DAG view on this graph, you can see there's quite a nonlinear process
34:20
being created here. And if I kicked off all of this to run again, you'd notice that you could have
34:25
multiple blocks running simultaneously. Each one being independent and knowing what its dependencies are makes
34:32
it really useful when it comes to wanting to run stuff in parallel and wanting to collaborate with other
34:37
teammates. So do you see a need for something like that? We
34:44
certainly have talked with lots of teams where they're like, collaboration? We don't really do that. Is
34:49
collaborative real-time, like synchronous, collaborative coding a thing that you think is going to grow, or is
34:55
that going to be something where teams don't really do that? What are your thoughts on that?
35:01
You know, I think what's probably going to grow is real-time collaborative
35:08
coding where you're collaborating with the AI; that's more likely to grow than
35:14
collaborating with other people. I mean, if anything, the way of the world seems to be pushing a lot of this work
35:19
onto the AI, like you just did, right? Yeah, that's fair. So the collaboration would be more in
35:26
terms of handoffs from team to team, say. No, I mean, that's one aspect,
35:32
but another aspect, you know, you could imagine collaborating with the AI,
35:38
right? The AI is making changes, I'm making changes, we're trying not to step on each other. I don't know
35:44
that the current coding assistants are good at that right now, but yeah. Or multiple agents:
35:54
people will spin up multiple Claude Codes or multiple Codex CLIs at the same time and try to get
36:00
them not to step on each other. Right. Yeah. So, we do something similar there.
36:05
So, if we wanted to kick off a new chat uh and say, "Hey, could you explain
36:12
what this uh code is doing, please?" Uh
36:17
then, you know, you see how you can have multiple agents kind of doing all sorts of stuff at the same time and and
36:23
getting them to not step on each other's toes is, I think, sort of an unsolved problem at the moment. The way that
36:29
we've tried to tackle it is by controlling the scope of what each agent can do. So, if I jump back to
36:37
the notebook view here, I can pick one of these blocks. Oh, this is interesting. Remind me to come back here
36:44
to this block, because the agent is self-healing. It's created some code that's got some errors in
36:51
it. It's made a mistake, and so it's going to go back and try to fix it and self-heal, all that kind
36:57
of stuff. So, that'll be interesting to see if it'll work. But I can go in and...
37:02
There are a couple more questions. Oh, do we have more questions? Sorry, I'm sharing my screen and I
37:08
can't see the thing. All right. In practice, most data teams mix code from multiple languages and
37:14
environments. How does a canvas model handle that complexity better than just gluing together scripts and notebooks?
37:22
Have you experienced that? Do you find teams working in multiple languages and trying to knit all that together?
37:29
I've seen it before. It depends on the organization. I mean, some organizations leave it up to the
37:34
team, until you have a team that works in R and a team that works in Python, and maybe even a team that works in
37:41
TypeScript. Yeah, I mean SQL. Yeah, SQL, I guess, is
37:46
a good example. And then some teams are much more standardized. I think it goes both ways, but yeah,
37:53
it's definitely a real thing. So, the way that we handle it: when each
38:00
code block executes, it serializes the output, and that does something kind
38:06
of unexpected, I would guess. So let's say we were running some pandas code, and we execute that code and
38:16
we've got a data frame. That data frame is going to get serialized, in most
38:21
cases, as a Parquet file. And it turns out R knows how to deal with Parquet and SQL knows how to deal with
38:26
Parquet. And so in the deserialization step, when the next block runs, it might be an R block or a SQL block, and
38:33
they can use those Python objects directly in Zerve. So within a single Zerve notebook or a single Zerve canvas,
38:41
you can use R to visualize a Python data structure, or ggplot to plot
38:47
something from Python. You could SQL-query a pandas data frame,
38:53
and that's all by virtue of how the serialization and
38:58
deserialization steps impact the way the code runs. So we've seen a few teams
39:04
where, pardon me, most commonly the managers are using
39:10
R and the team is using Python. There seems to be an age gap between R
39:15
users and Python users. And so a lot of times we'll see a manager that
39:21
uses R and a team that runs Python, and the manager wants to get in there and get their hands dirty from time to time. And being able to
39:28
do that interoperably between languages can be really interesting.
39:33
Let's see. You're framing the canvas as a kind of bridge between exploration and production. Does that risk
39:39
satisfying neither side? Too rigid for discovery, too visual for engineering? I'm actually really curious about
39:45
your answer to this question. What would it take to get you using something like Zerve for something like a
39:50
production process? So here's where I think my
39:56
challenge would arise: the sort of experiments that I need to do
40:05
involve interacting with systems that are maybe running locally on my machine, right? So maybe I have some kind of
40:15
service that's running locally and my experiments need to be making calls to this service possibly in a decoupled way
40:23
using REST APIs, but possibly in more coupled ways. And so, because of
40:28
that, for the use cases that I have today, having sort of a cloud
40:35
environment, whether it be a JupyterLab or a Colab or a Zerve or
40:40
whatever, will be a little bit tricky, I think.
40:46
What are some examples of those local connections that you need to make, just so I can...
40:53
So, for example, let's say I have an AI-enabled SaaS product and
41:00
it has backend services that are powering it, and now I want to do experiments where I'm
41:08
changing some of the parameters that the system runs with, and now
41:15
I want to see: are the results better, are the results worse? And I need to do that by potentially
41:23
stopping and restarting the system, or running multiple copies of the system. But I have
41:30
services running either locally or in the cloud somewhere that will be
41:36
a little bit more work. It's not impossible to talk to them from, you
41:41
know, a hosted notebook or whatever, but it's a little more work than if I just want to run some kind of hermetically sealed experiment where the notebook
41:48
contains everything that it needs to know, right? No, that totally makes sense. How do you think self-hosting,
41:55
or... to me? So I do all of my data science and data analytics,
42:02
pardon me, in Zerve now, in the cloud, and I would never want to go back
42:08
to local development. Just because when I have something that I want somebody to look at, it's just so easy
42:13
to just send a link and share a project. I view it as kind of like moving from Microsoft Word to Google Docs. Like,
42:20
who in their right mind would ever use Microsoft Word now? Unless there was some forcing function that required you
42:25
to. Like, the only use case I've seen is legal, like law firms that need
42:32
some sort of a Word feature that's not available in Docs. Well, you know, my Microsoft Word is
42:38
a cloud program now. So, well, yeah, maybe
42:44
it is, I promise. I used it in my last job. So... Oh, okay. Right. Well, anyway, they
42:49
certainly caught up, I guess, then. But yeah, just from an analogy perspective, to me, moving from Jupyter to something
42:56
like Zerve or a cloud-hosted notebook, there are major advantages to doing that. Do you find that to be the
43:02
case? Or is it just me, in the way I operate?
43:07
No, I mean, I think there are... yeah, there are pros and cons to
43:13
everything, right? So, some of the things you're saying about making it easier to share, making it easier to
43:19
have multiple people working on stuff, certainly that's the case. And if those are the things that you're
43:24
optimizing for, having a way of hosting it, be it Zerve or JupyterLab or
43:30
Colab or whatever, helps with that. At the same time, you know, you're
43:37
giving yourself an extra dependency. You can't work on an airplane, if
43:42
you're the sort of person who works on an airplane. I can't work on an airplane because there's not enough room to open my laptop, but that's a different
43:47
issue. It might be a benefit, I guess. No, I'm kidding.
43:52
Right. In terms of the stack and how it's moving to the cloud
44:01
for data scientists and the way that they operate, have you seen any trends that are interesting in that space?
44:09
Here's part of what I'm thinking. So, during the pandemic I was at a company called Data
44:16
Robot, and we were doing some work with Health and Human Services on simulating the effect
44:26
of recruiting for the vaccine trials in different areas. And so, you know, if
44:32
you recruit here, this is how many people you'll get, and this is how long your vaccine trial will take
44:39
given certain assumptions, and so on. And at the time, there were thousands of people dying every day of
44:45
COVID. And so, whether it was or not, it felt to us like what we
44:51
were doing every minute mattered, in terms of being able to help. Now, it's a thorny and political thing
44:58
to talk about, but just in terms of like data science and infrastructure and stuff like that, we were working in
45:03
notebooks, and we were trying to email and Slack notebook files back and forth and, you know, do that whole thing,
45:11
and getting to the cloud and handling the orchestration and provisioning resources and stuff like
45:17
that was not something that we were able to do quickly. And so that
45:23
was a major stumbling block, and some of the folks that are using Zerve have made a similar observation.
45:30
Yeah, I think that's a fair point. And I think also one thing that I've seen over the past, you know, 5 to 10
45:36
years is that a lot of companies that had been reluctant to move to the cloud
45:43
more broadly, not just in data science, have really embraced that. And so
45:49
there is a lot of appetite for, you know, embracing cloud solutions to
45:55
things. All right, so maybe we'll wrap up a
46:00
little bit here. What's next? What do you see as the big... if you had to put another talk
46:05
out there, you know, "I don't like X," about the data science development space, the toolkit
46:13
that's available to data scientists, what's Joel Grus's part two at the next JupyterCon event?
46:20
Honestly, the talk would be: it's been seven years, why
46:25
haven't you fixed all these things? All right, that's fair. Oh, we got one
46:31
more question, then we'll wrap. It says: you both talked about determinism and hidden state, but isn't some amount of non-determinism
46:38
inevitable once models, randomness, and APIs enter the picture? How far can
46:43
tools really go to enforce reproducibility? So I think you know
46:50
when I was at AI2 and we were training our own models, if you have a
46:56
model that runs on a single GPU, not in parallel, and you're setting
47:02
the seed, then it is pretty reproducible; you'll get the same result every time. Now, if you're
47:08
running on multiple GPUs, or if it's a hosted model,
47:14
maybe it's a little bit less reproducible. But I think if you're calling, say, GPT-5,
47:21
and you set the temperature to zero and you give it
47:27
the exact same model date and whatever, then in theory, you should be getting
47:32
the same sorts of things back. So I'd say there are
47:39
ways to get things pretty reproducible in most realistic cases. I mean, if you're a data science
47:45
team that's training XGBoost models or whatever, you can get the exact same one back, I think. And the second thing is
47:52
that if you actually do have a model that is so non-deterministic that
47:58
when you run it with the same inputs you get vastly different answers, and you can't get the same
48:05
answer every time, I think that tells you something interesting too. But I do...
48:10
Well, no. I wouldn't say you've done something wrong, but what I would say is that you haven't, you know, maybe solved
48:18
the problem as well as you think you have if, you know, your solution only works half the time.
48:26
But I do cling to this idea that things, you
48:31
know, can be reproducible if we take the right steps. Awesome.
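The seed-setting point above can be demonstrated with a toy sketch, using NumPy's generator as a stand-in for the randomness in a real training run (the `train_run` function is invented for the example):

```python
import numpy as np

def train_run(seed):
    # Fixing the seed makes every "random" draw in the run replayable.
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=5)   # stand-in for model initialization
    noise = rng.random()           # stand-in for a stochastic training step
    return float(weights.sum() + noise)

# Same seed, same result, every time.
assert train_run(42) == train_run(42)
# Different seeds generally give different results.
assert train_run(42) != train_run(7)
```

The same idea carries over to real libraries, e.g. `random_state` in scikit-learn and XGBoost or `torch.manual_seed` in PyTorch, with the caveats Joel mentions about multi-GPU runs and hosted models.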
48:37
Well, hey Joel, thanks for taking the time on this beautiful Monday morning. It's a pleasure to get to talk to you,
48:42
and I'm really sort of optimistic about how the data science
48:50
coding landscape has changed. The way I code is totally different than
48:55
it was a year ago, with the large language models that are out there and with the environments that
49:01
are coming and all the tool changes. It seems like, I mean, you always hear this, that the pace of innovation or
49:08
change is fast, but it seems like it's even faster than it has been. Is that the sense that you get?
49:14
Yeah. Every month I have to cancel one $200 a month subscription and sign up for a different $200 a month
49:20
subscription because it's better. Awesome. All right. Cool. Well, Joel,
49:26
thanks again. We really appreciate it, and this was a really great conversation. I would remind everyone...
49:31
I was glad to chat. Excellent. Anyone that's listening, you can go to zerve.ai and sign up
49:37
for a free account. Get in there and give it a try, and send us your feedback. We'd love to hear it. Thanks
49:43
for joining, everybody.


