Data Day Podcasts

Zerve x Joel Grus: The Notebook Reimagined for AI-First Teams

October 20, 2025


Notebooks were made for people, not for AI. Zerve rebuilt them for both. In this livestream, Greg Michaelson, Zerve Co-Founder and Chief Product Officer, chats with data science guru Joel Grus (of “I don’t like notebooks” fame), and unveils the latest Zerve release: a new kind of notebook where agents and humans collaborate in real time to go from question to deployed solution 10x faster. Learn how Zerve carries ideas from exploration to live apps and APIs — no rewrites, no orchestration tools, no context switching.


Greg (4:45): Good morning, everybody. This is an exciting day for me. I'm Greg Michaelson, one of the co-founders here at Zerve, and today we're talking about notebooks. The first time I was exposed to notebooks was in graduate school. I don't know if people know this, but I think notebooks were invented back in the '80s by a guy at Berkeley, who I actually met at JupyterCon a while back. Anyway, my first exposure to notebooks was in class, learning data science at the University of Alabama. We used tools like Jupyter (of course, it was called IPython back then) and RStudio, another notebook environment; I learned R in graduate school too. Throughout the years I've been doing data science, notebooks have played a pretty significant role, seeing as they're just incredibly popular. I think we did the math when we started Zerve: something like five notebooks created per second from around 2020 onward, an outrageous number. And there's a company, I think it was Datalore, that did a study showing millions and millions of notebooks made public on GitHub over the course of a two-to-three-year period. So it's been a remarkable explosion of usage in recent years, and they are truly useful. But I have with me today Joel Grus, one of my heroes and an idol to many, famous for a talk he gave back in 2018 called "I don't like notebooks" at, ironically, JupyterCon, a conference all about Jupyter notebooks. So welcome, Joel. Would you give us a quick intro and tell us a bit about when you first encountered notebooks and what your experience has been?

Joel (6:41): Yeah. I've had a long, pretty windy career. I studied math originally, started out doing quantitative finance, pivoted to data analysis, pivoted to data science, wrote a book on data science, then pivoted to software engineering, to AI, to machine learning engineering, and now I'm back, I guess, doing AI and engineering once again.

Joel (7:14): When did I first encounter notebooks? Well, I've been using Python for a really long time. I started using it around 2005, when I was also in grad school. I took a class in probability modeling that was based in MATLAB, and the way things worked was that the campus had a MATLAB site license that only worked on campus. I lived off campus, so I couldn't do my assignments at home; I didn't have access to MATLAB. Then I discovered there was a language called Python and a tool that at the time was called Numeric (today it's NumPy) that basically gave me a MATLAB-like experience. That's when I started using Python, and I've used it on and off ever since. When I got into data science, I started using notebooks because that was what people used for data science. I think even the first edition of Data Science from Scratch had a section at the end that said, "You should check out notebooks; that's what people use for data science." So I used them a little bit, but I never thought about them that much.

In 2016, I joined the Allen Institute for Artificial Intelligence, an AI research nonprofit in Seattle. This was before the transformer revolution, and a lot of NLP back then was done on the Java stack using Stanford CoreNLP. When I joined, it was really a Scala shop; everything was in Scala, which was not great for me, partly because I don't know Scala that well and I'm more of a Python person. But as you know, Transformers caught on, BERT caught on, GPT caught on, and all of NLP started switching to Python. So a lot of my co-workers who'd been doing Scala and Java forever needed to learn Python, and they came to me because I was the Python guy.

One of my co-workers came to me and said, "I don't understand Python." And she's a very smart engineer, very good. I was like, how can you not understand Python? It's not that complicated. So I said, bring your computer, show me. She started showing me, and it turned out that she understood Python just fine, but she'd been working in a Jupyter notebook because someone told her to use one, and she had gotten the state all out of whack. None of the variables equaled what she thought they equaled, and that was why she was confused. She wasted a few hours this way.

So I got mad. I went on Twitter and tweeted, "If anyone wants me to give a talk about why Jupyter notebooks are bad, let me know. I'd love to have an excuse to write it." Someone at JupyterCon saw the tweet and responded, "We can't promise we'll accept it, but you should submit it and let's see what happens." So I submitted it, they accepted it, and then I was like, uh-oh, this is going to be one of the most hostile audiences I've ever faced, so I'd better get my ducks in a row and know what I'm talking about, right? I spent more time working on that talk than on any talk I've ever worked on. I spent a lot of time really digging into what it is that I do like about notebooks and what it is that I don't like about them, beyond this primary issue of hidden state, which was my starting point. It turned into a talk about a number of other things that touched on some of my other big interests at the time, namely engineering best practices for researchers and data scientists.

And now my joke is that instead of giving that talk, I should have started a company about better notebooks, because a lot of people have done that and been pretty successful with it, whereas my talk... people like it, but it's just a talk.

Greg (11:09): That's awesome. It's funny you would say that, because at Zerve we had exactly the same experience. When we started Zerve, we were sitting in a room, me and the two other co-founders, and we had Mike McClintock there, our head of engineering, who had never been exposed to notebooks at all. Here we are building what at the time was a development environment for data science, and we're explaining to him how notebooks work. Let me actually share my screen and show a version of what I showed him. It was basically this, only we did it in a notebook; actually, I'll pull it up in a notebook. I said a = 1, and then I said a = a + 1 and then print(a), right? So it would be two. But then I did this, and this, and this, and this. And Mike was absolutely stunned. He was like, "No, it can't possibly work that way. There's no way somebody built it to do that." That's why I pulled up the screen and showed it to him. Maybe you can explain what's going on there for folks who haven't been in a notebook, or who are more on the engineering side and have been scripting and things like that.

Joel (12:21): Yeah. So when it comes to a notebook like this, what's going on is not just what you see on the screen. You run the first cell and it sets a to one. You run the second cell, it increases a by one and then prints it, so a becomes two. And then you can run that second cell as many times as you want, and it will keep changing the value of a. It also keeps incrementing the number next to the cell. So obviously there's some kind of funny business going on here: the first thing we ran was a = 1, and the seventh thing we ran was a = a + 1; print(a). Clearly we did some other things between one and seven, but looking at this notebook, there's no way to actually know what those things were. And that's where you can get confused.
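The behavior Joel describes is easy to reproduce outside a notebook: a kernel is essentially a long-lived namespace, and cells are snippets executed against it in whatever order you click. A minimal sketch in plain Python (a toy model of the idea, not the real Jupyter kernel):

```python
# A notebook kernel is essentially a persistent namespace that cells mutate.
# Running "cells" out of order leaves state that the visible source no longer explains.
namespace = {}
execution_count = 0

def run_cell(source):
    """Exec a cell against the shared namespace, the way a kernel does."""
    global execution_count
    execution_count += 1
    exec(source, namespace)
    return execution_count

run_cell("a = 1")        # In [1]
run_cell("a = a + 1")    # In [2]: a is now 2
run_cell("a = a + 1")    # In [3]: re-running the same cell -> a is 3
run_cell("a = a * 10")   # In [4]: a cell you later delete -> a is 30

# The notebook on screen might only show "a = 1" and "a = a + 1",
# but the hidden state says otherwise:
print(namespace["a"])    # 30, not the 2 the visible code suggests
```

Restart-and-run-all "fixes" things precisely because it throws away `namespace` and replays the cells top to bottom.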

Greg (13:19): Yeah, and once you start going back and forth, it gets even trickier. If you have a really long, complex notebook... I've seen notebooks with markdown blocks telling you what order to run the cells in. And there are some really pernicious bugs out there if you happen to run the cells in the wrong order, or run a cell too many times, where nothing equals what it's supposed to equal even though the code is actually written properly. Really, the only way to fix it is to push this button here, which restarts your kernel. It basically clears all the memory so that none of the variables have any value, and then you rerun everything from scratch in the right order.

Joel (14:00): For sure. And one thing I found: there are a lot of people who really like notebooks. Jeremy Howard gave an entire talk at a conference on why I'm wrong about notebooks. When you push people on this issue, that hidden things have happened and you don't know what they are, they'll say, "Yes, of course, you're using the notebook wrong. You should know to restart and run all any time you make a change like that." And I said, well, that's good, but most people either don't know that, or know it but don't do it. So it still causes a lot of problems.

Greg (14:38): Yeah. To me, putting the "turn it off and on again" button as a top-level navigation button in the app is a sign of a design flaw. But really, notebooks were designed from the beginning to be scratch pads in an academic setting. I'm not even sure it was always a button; I think it might have been a drop-down menu at some point, or maybe not even present.

Joel (15:03): No, I think there was always a "restart and run all" option, because that was a common thing to do.

Greg (15:10): But yeah, since you gave your talk, lots and lots of startups have popped up doing notebook-type environments, but nobody has really addressed that hidden-state issue. Have you seen any movement on it apart from Zerve? We'll talk about Zerve in a bit, but have you seen much change?

Joel (15:27): Yeah. There's another notebook called Marimo that does reactive notebooks. It's interesting, actually: at that JupyterCon I went to, they had a poster session the night before the conference where people were presenting experimental work, and there was a guy there in 2018 who had made an experimental reactive kernel for Jupyter. A kernel is basically the computational engine that sits behind the notebook, and a reactive kernel just means that whenever you change a cell, all the cells that depend on it update. That makes it so you can't get into this weird hidden state, because if I change a cell that something else depends on, the thing that depends on it recalculates, and the state stays in sync. I argued with this guy for a while; I tried to get him to bite the bullet that every Jupyter notebook should work in this reactive kind of way, and he wouldn't bite it. But it is a little surprising to me that the core notebook product hasn't gone through many changes that would help with this. It's been, what, seven years.
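The reactive-kernel idea can be sketched in a few lines: track which names each cell defines and reads, and when a cell changes, re-run it plus everything downstream of it. A toy model (not Marimo's or any real kernel's implementation; the explicit `defines`/`reads` declarations are a simplification, since real reactive notebooks infer them from the code):

```python
# Toy reactive kernel: each cell declares the names it defines and reads.
# Editing a cell re-runs it plus every later cell that reads a changed name.
class ReactiveKernel:
    def __init__(self):
        self.cells = []       # (source, defines, reads)
        self.namespace = {}

    def add_cell(self, source, defines, reads=()):
        self.cells.append((source, set(defines), set(reads)))
        exec(source, self.namespace)
        return len(self.cells) - 1

    def edit_cell(self, index, source):
        _, defines, reads = self.cells[index]
        self.cells[index] = (source, defines, reads)
        dirty = set(defines)
        # Re-run the edited cell and, in order, every cell touched by a dirty name.
        for i in range(index, len(self.cells)):
            src, defs, rds = self.cells[i]
            if i == index or rds & dirty:
                exec(src, self.namespace)
                dirty |= defs

k = ReactiveKernel()
k.add_cell("a = 1", defines=["a"])
k.add_cell("b = a + 10", defines=["b"], reads=["a"])
k.edit_cell(0, "a = 5")    # b recomputes automatically
print(k.namespace["b"])    # 15 -- downstream state cannot silently go stale
```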

Greg (16:43): What is it that you think people like about notebooks so much? There's no question they've been deeply adopted by the data science and data analysis communities.

Joel (16:55): There are a few things people like. One is this notion of literate programming, right? In addition to having these code blocks, I can put in a markdown block that explains, "Here's what I'm about to do." It makes for a very readable document, whereas a script in a Python file would be much less readable as a document. Along similar lines, one thing notebooks do really nicely is let you intermingle graphical output and text output. You say, here I'd like to run a regression, and now I'm going to put the graph and a table of the regression coefficients right underneath it. As you read the thing top to bottom, it's got code, equations, explanations, charts, tables, and it makes a really nice artifact: here are the exact details of what I did and what the results were. To the extent that you use it as an artifact, I think it's actually pretty nice, and it gives you a good way to look back at what you did.

Where you start running into trouble is when it's not just an artifact but more of a living document. Now I want to go back and change this part of it, and what happens to the rest of it? People tie themselves into knots. And maybe they haven't specified the exact dependencies they need. Notebooks get sold not just as artifacts but as the key to doing reproducible research, and they're like half the key: the other half is having the right environment. You can put some pip commands in the top cell and do things like that, but again, it requires a lot of discipline to use them in a way that's really reproducible.

Greg (18:56): Yeah, dependency management is a problem. I think probably 90 percent, maybe more, of people who work in notebooks run them locally, so that's another potential issue. And collaboration is pretty horrible: trying to manage notebooks in GitHub, or sending files back and forth, is a headache, as is trying to get someone else's notebook running. Have you ever done collaborative work using something like Google Colab or one of these more modern notebook environments that have essentially cloned Jupyter and put it on the web?

Joel (19:33): I haven't. I mostly work in code, so for me, collaborating is really about writing scripts. When I was at AI2, we did a lot of experiments, and the way we aimed to make them reproducible was by driving them with command-line tools and hyperparameter configuration files. So here's a script that's going to read from a config file that says use this many hidden layers, this many input dimensions, this embedding model, and so on. That way, if you have a variety of these blobs describing experiments, you can just say, let's run all of these and collect the results. That's the way I've typically worked. When Streamlit was big, I used it for interactive demos more than as a notebook replacement per se: here's a model, and I want to give you, someone who's maybe not a coder, an easy way to play with it. I had a lot of good luck with that, though I haven't done much of it recently.
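The config-driven pattern Joel describes can be sketched roughly like this; the field names (`hidden_layers`, `input_dim`) are made up for illustration, not taken from any real experiment config:

```python
# Sketch of config-driven experiments: each run is a small config blob,
# and one driver function reads the blob and produces a result.
import json

def run_experiment(config):
    """Stand-in for a training run; a real one would build and fit a model."""
    hidden = config["hidden_layers"]
    dim = config["input_dim"]
    return {"name": config["name"], "params": hidden * dim * dim}

configs = [
    {"name": "small", "hidden_layers": 2, "input_dim": 64},
    {"name": "large", "hidden_layers": 8, "input_dim": 256},
]

# "Let's run all of these and collect the results."
results = [run_experiment(c) for c in configs]
for r in results:
    print(json.dumps(r))
```

In practice each blob would live in its own JSON or YAML file, so an experiment is fully described by a file you can check into version control.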

Greg (20:55): Got it. Well, maybe we'll segue a bit into Zerve and what we've been doing in this space, because on Wednesday we're releasing what we're calling notebook view. On Friday, I think we gave you access to look at it and play around a bit, so you by no means have gotten your fingers as dirty as I'm sure you want to. But I did want to pop up a few things. Let me share my screen, talk a bit about what we've been thinking about, and go back to our original example: a = 1, then add one to a and print it. I've gone ahead and run these cells, and you'll notice that a here is two. It's always going to be two, because of the way Zerve is architected. We call this notebook view, but it's not really a notebook in the sense of any traditional notebook; it's more that it looks like a notebook in terms of how you interact with the code.

Here's what each of these cells in Zerve is doing (I know you know this, Joel, but I'm explaining for the folks listening). When the code executes, you'll notice these two arrows. These are your upstream dependencies; in this case, this block doesn't have any upstream dependencies, and it has one downstream dependency, which is the next cell. So we're tracking which cells are related to which cells. And when this block executes, it's actually storing its output. One of the big issues for me with notebooks is that if I send you a notebook, you can see all the charts and graphs I might have printed out, but none of the variables exist in your version of it, because notebooks are typically in-memory tools; you'd have to execute the code to actually see the values of any variables. In Zerve it's super different: when I run this code, we're caching and storing all of your variable values and then passing them downstream to any cells that need them as dependencies. So this variable a is getting stored, and if I shared this with you and you logged in, we could all be logged in together and we'd all see the values of these variables as they're run and what they're stored as. That's the reason that every time I run this cell, it starts from its dependencies. It says, "a... which a? Oh, it means this a, and that a equals one, so I'm going to add one to it and print it." No matter how many times I run it, in this case 1 + 1 always equals two, just because of the way we've structured the notebook. It's actually significantly different from the way notebooks typically function today, because of the way we have the architecture set up.
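What's being described here (each block persisting its variables, with downstream blocks starting from those stored inputs rather than from a shared live kernel) can be modeled in a toy sketch. This is an illustration of the idea only, not Zerve's actual implementation:

```python
# Toy model of block execution with stored outputs: a block never sees a live
# shared kernel, only the serialized outputs of its upstream blocks.
# Re-running "a = a + 1" therefore always starts from the stored a == 1.
import pickle

store = {}  # block_id -> pickled namespace produced by that block

def run_block(block_id, source, upstream=()):
    ns = {}
    for up in upstream:                  # start from upstream stored outputs...
        ns.update(pickle.loads(store[up]))
    exec(source, ns)
    ns.pop("__builtins__", None)
    store[block_id] = pickle.dumps(ns)   # ...and persist this block's result
    return ns

run_block("b1", "a = 1")
out = run_block("b2", "a = a + 1\nprint(a)", upstream=["b1"])
out = run_block("b2", "a = a + 1\nprint(a)", upstream=["b1"])  # still 2
print(out["a"])  # 2, no matter how many times b2 is re-run
```

Because the stored outputs live outside any one process, a collaborator opening the same project sees the same variable values without re-executing anything, which is the sharing behavior Greg describes.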

Joel (23:57): Yeah, that's helpful.

Greg (24:03): In fact, another way we've always had to look at these is as a DAG. This is another way of visualizing that exact same project: in this case it's a block rather than a cell, but it's the same project, just viewed as a DAG instead of as a notebook. This cell executes and then passes its variable space down to subsequent blocks; when those blocks execute, they start from those starting points. That way the results are repeatable, reproducible, and the same every time, and you can't get into that bad hidden-state situation that you often do with notebook environments.

Greg (24:47): I'm actually really curious about the way you work when you're writing code. My guess is that you're typically in a .py file in VS Code, a pretty hardcore engineering type of environment. But when you do have visualizations you want to look at, when you're doing data exploration and things like that, what type of environment do you find yourself working in?

Joel (25:18): I have forever been a big user of the IPython console. I don't even know if it's called that anymore, but that's what I call it; that's what it's always been called. It's a terminal-based console that has a lot of the IPython or Jupyter magics, if you will. In some sense, you could think of it as an append-only notebook. And that's another thing I've actually talked with the Jupyter folks about before: what if you could make notebooks append-only?

Greg (25:53): Tell me what you mean by that. What do you mean by append-only?

Joel (25:59): What I mean is that once I run a cell, I can't change it, and I can't change any cells above it. The way I use the IPython console, once something has run, it's run, and it's there in the terminal history. I can now overwrite it by saying a = 3 when before I said a = 2, but if I scroll back, I'm still going to see the a = 2. So that's another thing I've pushed the Jupyter folks on several times: what if you had a notebook kernel that was append-only, where I can only add new cells at the end, or where, if I go back and change a cell, it gets rid of every cell after that? I'm basically rewinding, but then I'm starting from where I rewound, and I can't keep the other things I've done since then.

Greg (26:47): Gotcha. Okay, I could see that as another workaround for that hidden-state problem, for sure.

Joel (27:00): I'm not sure I'd call it a workaround. I just think it's a different paradigm that doesn't allow the hidden state.
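The append-only paradigm can be sketched as a kernel that only accepts new cells at the end and handles edits by rewinding: dropping everything after the edit point and replaying history from the top. A toy model, not a real kernel:

```python
# Toy append-only kernel: cells can only be added at the end. Editing an
# earlier cell discards everything after it and replays from the top, so the
# namespace always matches the visible history.
class AppendOnlyKernel:
    def __init__(self):
        self.history = []
        self.namespace = {}

    def append(self, source):
        self.history.append(source)
        exec(source, self.namespace)

    def rewind_and_edit(self, index, source):
        """Drop cells from `index` on and replay with the new cell as the last one."""
        self.history = self.history[:index] + [source]
        self.namespace = {}
        for src in self.history:
            exec(src, self.namespace)

k = AppendOnlyKernel()
k.append("a = 2")
k.append("b = a * 10")     # b == 20
k.rewind_and_edit(0, "a = 3")
# b is gone: the cell that defined it came after the edit point.
print("b" in k.namespace)  # False
print(k.namespace["a"])    # 3
```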

Greg (27:09): One of the things I've seen quite a lot is people wanting to schedule a notebook to run on a recurring basis. Do you ever go down that road? Do you play in that space at all?

Joel (27:20): I don't. I'm not much of a data engineer, and I've seen that mostly in the data engineering space: I have a pipeline that runs every night, and I want to take the output of some job or table or whatever, pipe it through this notebook, and put the output over there. So the notebook basically contains my ETL rules. I've seen that happen, but I'm generally not involved in that part of the stack.

Greg (27:53): Gotcha. And is that a good way to run code, a recurring job as a scheduled notebook? How would you do it if you were going to: a script file on a crontab, or something else? What do you think is most useful when it comes to data pipelines and things like that?

Joel (28:12): Again, I'm not a data engineer, so I don't have strong opinions on that. Here's what I'll tell you: all else being equal, I would prefer having my data transformations written in code. That way they're easier to unit test, and I can use my IDE to edit them, and so on. For the teams I've worked with that used notebooks for these data engineering pipelines, the value was that they would generate charts and graphs as part of the pipeline and then, as we were talking about earlier, save that as an artifact. So when I want to understand what my data pipelines did last night, I don't just look at logs; I have these notebook artifacts that I can look at and understand better.

Greg (29:09): I think there are some questions. Oh, okay, here's one.

George asks, "We've seen a lot of new tools promise reproducibility, but they end up locked into proprietary formats or hosted runtimes. How does Zerve avoid becoming another walled garden?"

Greg (29:21): Good question. Zerve doesn't create any proprietary formats. All of our code is just straight Python code, and when you connect it to GitHub or Bitbucket or any of the hosting tools, it stores your code as plain text. We're using open-source technologies to preserve all of that sort of thing, so you can always take your ball and go home if you want to jump out of Zerve into some other type of environment.

Greg (29:55): I should mention, by the way, for the folks watching: Zerve is free to use. You can create a free account at zerve.ai, connect your data, upload data, and use our agent. In fact, maybe now would be a good time to demonstrate what our agent actually looks like. Let me share my screen again (I'm sharing my screen, right? Yep) and I'll type: "I'm doing a live stream and I need a cool demo. Use made-up data."

Greg (30:32): So we're connected; this is all running in the cloud, in this case in AWS, and we're integrated with the various large language models. We've built an agent that can do some really cool stuff in terms of writing and executing code. In this case it started in DAG view, so I'm going to... oh, here's its plan. I'll read through the plan real quick before I switch to notebook view. It's going to generate some data; in this case it's chosen to create an e-commerce data set. It's going to build some parallel branches with sales metrics, customer segmentation, and so on, and then make a dashboard. So I'll click "approve plan." Now let me jump into "open as notebook," and we can see that as the agent starts to write code, it creates those blocks inside this notebook and we can watch what it's doing. So, Joel, while it's thinking (oh, it's started to create some data already; we'll look at that in a second), what have you seen in the collaboration space? In terms of data scientists wanting to work on projects as a group and code as a team, how are people doing that in your experience today, and is it working?

Joel (32:01): You see a lot of people using JupyterLab. I've worked at companies that set up big JupyterLab instances: a hosted Jupyter notebook server where people can share notebooks on the server and work on them. I don't know that it admits multiplayer editing, if you will, but I think that's still a pretty common way of doing it.

    Collab and and a few others and I always find that it's a bit uh a bit sketchy in

    32:45

    terms of getting getting kind of uh collaborative coding working especially when you want to run uh run your code.

    32:52

    Um the way that Zerb actually executes is that each cell has its own kernel

    32:58

    basically. So when you when you click it off when you click run on a particular block now the agent is doing all the

    33:03

    running here. Uh and so you can see it's creating a boatload of code. Uh and here's like for example the data summary

    33:10

    of the data set that it created. But when each of these cells ran, they actually create spun up um serverless

    33:17

    compute to actually execute the the code. The default is is using lambdas. Uh but you could change that to GPUs or

    33:24

    you could change it to like Fargate containers, but it all runs serverlessly. And so one of the upshots

    33:29

    of that is that you could run as many of these blocks or cells at the same time

    33:36

    as you wanted and they don't interfere with each other, and the project is

    33:42

    smart enough to know uh where the dependencies are. And so if I

    33:47

    made changes to this upstream block and then executed it, then the downstream

    33:52

    blocks wouldn't automatically update, because we don't want to mess somebody up if they're working downstream and the

    33:59

    upstream data happens to change. But the blocks are smart enough to know that there's been an upstream change and

    34:05

    that everything needs to be re-executed in order to reflect the latest data. Uh so, like

    34:13

    for example, if we go back to our DAG view on this graph, you can see there's quite a lot of like a nonlinear process

    34:20

    that's being created here. Um and if I kicked off all of this to run again, you'd notice that you could have

    34:25

    multiple blocks running simultaneously. And so each one being independent and knowing what the dependencies are makes

    34:32

    it really useful when it comes to uh wanting to run stuff in parallel and wanting to collaborate with other

    34:37

    teammates. So do you see a need for something like that? We

    34:44

    certainly have talked with lots of teams where they're like, collaboration, we don't really do that. Is

    34:49

    collaborative real time like synchronous collaborative coding is that a thing that you think is going to grow or is

    34:55

    that going to be something that's eh like teams don't really do that or what are your thoughts on that?
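
The block model Greg describes above can be sketched in a few lines of plain Python. This is a hypothetical illustration, not Zerve's actual implementation: each block records its upstream dependencies, running a block marks everything downstream as stale without re-executing it, and blocks with no dependency path between them can run in parallel. The block names (`load`, `clean`, `train`, `plot`) are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch of a dependency-aware notebook: block -> upstream deps.
DEPS = {"load": [], "clean": ["load"], "train": ["clean"], "plot": ["clean"]}

stale = {name: False for name in DEPS}

def downstream_of(block):
    """All blocks that (transitively) depend on `block`."""
    found = set()
    frontier = [block]
    while frontier:
        current = frontier.pop()
        for name, deps in DEPS.items():
            if current in deps and name not in found:
                found.add(name)
                frontier.append(name)
    return found

def run(block):
    """Run one block; downstream blocks become stale but are NOT auto-run."""
    stale[block] = False
    for name in downstream_of(block):
        stale[name] = True

# Editing and re-running "clean" flags "train" and "plot", not "load".
run("clean")
print(stale)  # {'load': False, 'clean': False, 'train': True, 'plot': True}

# "train" and "plot" share no dependency path, so they can run in parallel.
with ThreadPoolExecutor() as pool:
    list(pool.map(run, ["train", "plot"]))
print(stale)  # all False again
```

The key design choice the sketch mirrors is that staleness is propagated but execution is not: a teammate working downstream sees a flag, not a surprise re-run.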

    35:01

    Um, you know, I think what's probably going to grow is real time collaborative

    35:08

    coding where you're collaborating with the AI. Um, that's more likely to grow than

    35:14

    collaborating with other people. I mean, if anything, the way of the world seems to be pushing a lot of this work

    35:19

    onto the AI like you just did, right? Yeah, that's fair. That's fair. So, the collaboration would be more in

    35:26

    terms of like handoffs from team to team, say. No, I mean that's one aspect,

    35:32

    but another aspect, you know, you could imagine um collaborating with the AI,

    35:38

    right? Like the AI is making changes, I'm making changes. We're trying not to step on each other. Uh, I don't know

    35:44

    that the current coding assistants are good at that right now, but yeah, I feel like

    35:54

    um, you know, people will spin up multiple Claude Codes or multiple Codex CLIs at the same time and try to get

    36:00

    them not to step on each other. Right. Right. Yeah. So, we do something similar there.

    36:05

    So, if we wanted to kick off a new chat uh and say, "Hey, could you explain

    36:12

    what this uh code is doing, please?" Uh

    36:17

    then, you know, you see how you can have multiple agents kind of doing all sorts of stuff at the same time, and

    36:23

    getting them to not step on each other's toes is, I think, sort of an unsolved problem at the moment. Um the way that

    36:29

    we've tried to tackle it is by controlling the scope of what each agent can do. So, if I jump back to uh to the

    36:37

    the notebook view here, I can pick one of these blocks. Oh, this is interesting. Remind me to come back here

    36:44

    uh to this block here, because the agent is self-healing. It's created some code that uh has got some errors in

    36:51

    it, and uh it's made a mistake, and so it's going to go back and try and fix it and self-heal and all that kind

    36:57

    of stuff. So, that'll be interesting to see if it'll work. Um but I can go in and I can

    37:02

    there's a couple more questions. Oh, do we have more questions? Sorry, I'm sharing my screen and I

    37:08

    can't see the uh the thing. All right. In practice, most data teams mix code from multiple languages and

    37:14

    environments. How does a canvas model handle that complexity better than just gluing together scripts and notebooks?

    37:22

    Uh have you experienced that? Do you find teams working in multiple languages and uh trying to knit all that together?

    37:29

    Uh I've seen it before. Um it depends on the organization. I mean, some organizations leave it up to the

    37:34

    team, until you have a team that works in R and a team that works in Python. Um, and maybe even a team that works in

    37:41

    TypeScript. Yeah, I mean SQL. Yeah, SQL, I guess, is

    37:46

    a good example. Um, and then some teams are much more standardized. I think it goes both ways, but yeah,

    37:53

    it's definitely a real thing. Hm. So the way that we handle it: um when each

    38:00

    code block uh executes, it serializes the output, um and that does something kind

    38:06

    of unexpected, I would guess. So let's say we were uh doing some pandas code, and we execute that code and

    38:16

    we've got a data frame. That data frame is going to get serialized, uh in most

    38:21

    cases as a parquet file. And it turns out R knows how to deal with parquet, and SQL knows how to deal with

    38:26

    parquet. And so in the deserialization step when the next block runs it might be an R block or a SQL block. Uh and

    38:33

    they can use those Python objects directly in Zerve. Uh so within a single Zerve notebook or a single Zerve canvas,

    38:41

    you can use R to visualize a Python data structure, or ggplot to plot

    38:47

    something from uh from Python. You could SQL-query a pandas data frame, uh

    38:53

    and that's all sort of by virtue of how the uh the serialization and

    38:58

    deserialization steps impact the way the code runs. So we've seen a few teams

    39:04

    where, actually, most commonly, pardon me, the managers are using

    39:10

    R uh and the team is using Python. Uh there seems to be an age gap between R

    39:15

    users and Python users. Uh and so a lot of times we'll see a manager that

    39:21

    uses R and a team that runs Python and the manager wants to get in there and get his hands dirty or her hands dirty from time to time. Uh, and being able to

    39:28

    do that interoperably between languages can be really interesting.
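
The serialize/deserialize handoff Greg describes can be mimicked with the standard library. A minimal sketch, under stated assumptions: each block publishes its outputs to a shared store keyed by variable name, and the next block, which in Zerve could be an R or SQL block, deserializes from that store rather than sharing a kernel. Zerve's actual format is parquet (which R and SQL engines can read); JSON stands in here purely to keep the example dependency-free, and `publish`/`load` are invented names.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sketch of the block handoff: each block serializes its outputs
# to a shared store instead of sharing a kernel's in-memory state.
store = Path(tempfile.mkdtemp())

def publish(name, value):
    """'Python block' finishes: serialize the output for downstream blocks."""
    (store / f"{name}.json").write_text(json.dumps(value))

def load(name):
    """Downstream block (R or SQL in Zerve's case) deserializes it."""
    return json.loads((store / f"{name}.json").read_text())

# Block 1 produces a table-like object...
publish("sales", [{"region": "EU", "total": 120}, {"region": "US", "total": 95}])

# ...Block 2 picks it up with no shared kernel, just the serialized artifact.
rows = load("sales")
print(sum(r["total"] for r in rows))  # 215
```

Because the contract between blocks is a file format rather than a language runtime, any language that can read the format can sit downstream, which is the cross-language property being described.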

    39:33

    Uh, let's see. You're framing the canvas as a kind of bridge between exploration and production. Does that risk

    39:39

    satisfying neither side? Too rigid for discovery, too visual for engineering? Uh, I'm actually really curious about

    39:45

    your answer to this question. Uh, what would it take to get you using something like Zerve for something like a

    39:50

    production process? So here's where I think my

    39:56

    challenge would arise: the sort of experiments that I need to do

    40:05

    involve interacting with systems that are maybe running locally on my machine, right? So maybe I have um some kind of

    40:15

    service that's running locally and my experiments need to be making calls to this service possibly in a decoupled way

    40:23

    using like a REST API, but possibly in more coupled ways. And so because of

    40:28

    that, for my use cases that I have today, having sort of like a cloud

    40:35

    environment, whether it be a JupyterLab or a Colab or a Zerve or

    40:40

    whatever, will be a little bit tricky, I think.

    40:46

    uh what are some examples of those local kind of uh connections that you need to make, just so I can

    40:53

    So for example, you know, um let's say I have an AI-enabled SaaS product, and

    41:00

    it has backend services that are powering it, and now I want to do experiments where I'm, you know,

    41:08

    changing some of the parameters that the system runs with. And now, um,

    41:15

    I want to see are the results better, are the results worse. Um, and I need to do that by, you know, potentially

    41:23

    stopping and restarting the system, or running multiple copies of the system. Um, but I have

    41:30

    services running either locally or in the cloud somewhere that will be

    41:36

    um a little bit more work. It's not impossible to talk to them from, you

    41:41

    know, a hosted notebook or whatever, but it's a little more work than if I just want to run some kind of hermetically sealed experiment where the notebook

    41:48

    contains everything that it needs to know, right? No, that totally makes sense. What do you think about self-hosting, uh

    41:55

    or um, for me? So I do all of my data science and data analytics,

    42:02

    pardon me, in Zerve now, in the cloud, and I would never ever want to go back

    42:08

    to local development. Uh just because when I have something that I want somebody to look at, it's just so easy

    42:13

    to just send a link and share a project. Uh I view it as kind of like moving from Microsoft Word to Google Docs. Uh like

    42:20

    who in their right mind would ever use Microsoft Word now? Uh, unless there was some forcing function that required you

    42:25

    to. Like, the only use case I've seen is uh legal, uh like law firms that need

    42:32

    you know, some sort of a Word feature that's not available in Docs. Well, you know, my Microsoft Word

    42:38

    is a cloud program now. So, well, yeah, maybe

    42:44

    it is. I promise. I used it in my last job, so... Oh, okay. Right. Well, anyway, they

    42:49

    certainly caught up, I guess, then. But yeah, just from an analogy perspective, to me, moving from Jupyter to something

    42:56

    like Zerve or a cloud-hosted notebook, uh there are major advantages to doing that. Do you find that to be the

    43:02

    case? Um or is it just me in the way I operate?

    43:07

    No, I mean I think there's, um yeah, there's pros and cons to

    43:13

    everything, right? So, some of the things you're saying about making it easier to share, uh, making it easier to

    43:19

    have multiple people working on stuff, um, certainly that's the case. And if those are the things that you're

    43:24

    optimizing for, having a way of hosting it, be it Zerve or JupyterLab or

    43:30

    Colab or whatever, helps with that. Um, at the same time, you know, you're

    43:37

    giving yourself an extra dependency. You you can't work on an airplane. Um, if

    43:42

    you're the sort of person who works on an airplane, um, I can't work on an airplane because there's not enough room to open my laptop, but it's a different

    43:47

    issue. It might be a benefit, I guess. No, I'm kidding.

    43:52

    Right. Uh, in terms of um the stack and how it's moving to the cloud

    44:01

    for data scientists and the way that they operate, have you seen any trends that are interesting in that space?

    44:09

    Uh, here's part of what I'm thinking. So, like, during the pandemic I was at a company called

    44:16

    DataRobot, and we were doing some work with uh Health and Human Services on uh simulating the effect

    44:26

    of recruiting for the vaccine trials in different areas. And so, you know, if

    44:32

    you recruit here this is how many people that you'll get and this is how long your your uh vaccine trial will take

    44:39

    given certain assumptions and so on. And at the time there were, you know, thousands of people dying every day of

    44:45

    COVID. And so it felt to us, whether it was or not, like what we

    44:51

    were doing every minute mattered, uh in terms of like being able to help. Now, it's a thorny and political thing

    44:58

    to talk about, but just in terms of like data science and infrastructure and stuff like that, we were working in

    45:03

    notebooks, and we were like trying to email and Slack notebook files back and forth and, you know, do that whole thing

    45:11

    and getting to the cloud and handling like the orchestration and and provisioning resources and stuff like

    45:17

    that was not something that we were uh able to do quickly. Uh and so that

    45:23

    was a major stumbling block, and some of the folks that are using Zerve have made a similar uh observation.

    45:30

    Yeah, I think that's a fair point, and I think also one thing that I've seen over the past, you know, 5 to 10

    45:36

    years is that a lot of companies that had been reluctant to move to the cloud

    45:43

    more broadly, not just in data science, have really embraced that. Um and so

    45:49

    there is a lot of appetite for you know we want to embrace cloud solutions to

    45:55

    things. Um all right so maybe we'll wrap up a

    46:00

    little bit here. What uh what's next? What do you see as the big thing? Like, if you had to put another talk

    46:05

    out there, uh, you know, an "I don't like X" about the data science development uh space, the toolkit

    46:13

    that's available to data scientists. What's Joel Grus's part two at the next uh JupyterCon event?

    46:20

    Uh, honestly, the talk would be: it's been seven years, why

    46:25

    haven't you fixed all these things? All right, that's fair. Oh, we got one

    46:31

    more question, then we'll uh go and wrap. It says, uh, you both talked about determinism and hidden state, but isn't some amount of non-determinism

    46:38

    inevitable once models, randomness, and APIs enter the picture? How far can

    46:43

    tools really go to enforce reproducibility? So I think you know

    46:50

    when I was at AI2 and we were training our own models, um if you have a

    46:56

    model that runs on a you know single GPU and not in parallel and you're setting

    47:02

    the seed then it is like pretty reproducible like you'll get the same result every time. Now, if you're

    47:08

    running on multiple GPUs, um, you know, or if it's a hosted model, maybe, uh,

    47:14

    maybe it's a little bit less reproducible, but I think, um, you know, if you're calling, you know, GPT-5

    47:21

    and you set the temperature to zero and, um, you give it

    47:27

    the exact same model, date, and whatever, then in theory, you should be getting

    47:32

    the same sorts of things back. So I'd say I think there are

    47:39

    ways to get things pretty reproducible in most realistic cases. I mean, you know, if you're a data science

    47:45

    team that's training XGBoost models or whatever, you can get the exact same one back, I think. And the second thing is

    47:52

    that you know if you actually do have a model that is so non-deterministic that

    47:58

    when you run it with the same inputs you get like vastly different answers and you can't get the same

    48:05

    answer every time, I think that tells you something interesting too. Um, but I do

    48:10

    Well, no. Um, I wouldn't say you've done something wrong, but what I would say is that you haven't, you know, maybe solved

    48:18

    the problem as well as you think you have if, you know, your solution only works half the time. Um,

    48:26

    but I do, you know, cling to this idea that I think things, you

    48:31

    know, can be reproducible if we take the right steps. Awesome.
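
Joel's seeding point is easy to see with a toy sketch using only Python's standard library: a fixed seed makes a stand-in "training run" bit-for-bit repeatable, while an unseeded run generally is not. The function name is invented for illustration; a real pipeline would also seed NumPy and framework RNGs and pin any GPU determinism flags.

```python
import random

def toy_training_run(seed=None):
    """Stand-in for a stochastic training job: result depends only on the RNG."""
    rng = random.Random(seed)
    weights = [rng.gauss(0, 1) for _ in range(4)]
    return sum(weights)

# Same seed -> identical result, every time.
a = toy_training_run(seed=42)
b = toy_training_run(seed=42)
print(a == b)  # True

# No seed -> a fresh RNG state each call, so repeated runs generally differ.
print(toy_training_run() == toy_training_run())
```

This is the single-GPU, seeded case Joel describes; parallel execution and hosted models reintroduce non-determinism that seeding alone cannot remove.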

    48:37

    Well, hey Joel, thanks for taking the time on this beautiful Monday morning. Uh it's a pleasure to get to talk to you

    48:42

    and uh I'm really sort of optimistic about how the data science

    48:50

    coding landscape has changed. The way I code is totally different than

    48:55

    it was a year ago, with the large language models that are out there and with uh the environments that

    49:01

    are coming and all the tool changes. Seems like, I mean you always hear this, that the pace of uh innovation or

    49:08

    change is fast, but it seems like it's even faster than it has been. Is that the sense that you get?

    49:14

    Yeah. Every month I have to cancel one $200 a month subscription and sign up for a different $200 a month

    49:20

    subscription because it's better. Awesome. All right. Cool. Well, Joel,

    49:26

    thanks again. We really appreciate it and uh this was a really great conversation. I would remind everyone I

    49:31

    was glad to chat. Excellent. Uh anyone that's listening, you can go to zerve.ai and sign up

    49:37

    for a uh free account. Get in there and give it a try and send us your feedback. We'd love to hear it. Thanks

    49:43

    for joining everybody.
