The Cursor Moment for Data Science: Context at the Core
September 16, 2025
Data science is not software engineering. The difference is context. In this livestream, we'll unpack what "context" really means, why it's critical for data science, and how it can change the way teams work. You'll hear about data scientists at leading companies who've used context-aware workflows to cut wasted effort, streamline collaboration, and move faster from idea to impact. We'll also give you a live demo of Zerve, the first AI development environment for data scientists, with context built into its core.
Key takeaways:
Why context is the missing ingredient in traditional AI coding assistants for data science
Real-world examples of context-aware workflows driving productivity and collaboration
How context-aware AI agents keep hypotheses, data, and code aligned, so you avoid wasted cycles
A live look at Zerve's context-first agentic approach to data science
4:29
All right. Good morning, everybody. My name is Greg Michaelson. I'm one of the co-founders here at Zerve, chief
4:34
product officer. Got the music going in the background. All right. All right. Today's topic is
4:42
coding agents, particularly as it relates to data science. I was just listening to a podcast this morning,
4:48
from The Daily, and they were talking about all of the crazy rabbit holes that people are falling down as they're talking to
4:54
large language models. Seems like they're everywhere in the news these days. I just used ChatGPT to come up
5:00
with a recipe for hot honey mac and cheese. It was absolutely delicious. So,
5:06
I go back and forth between thinking that large language models are just kind of the smartest dumb chat bots that have
5:11
ever been invented or something really, really different. Um, I'm joined by
5:18
Jason Hillary, one of our other co-founders here at Zerve, chief technical officer. Uh, Jason, say hi.
5:24
You want to introduce yourself a bit? Yeah. So, hi everyone, and hi Greg. Looking forward to kind
5:31
of the next hour. It's been a long-standing thing at Zerve to try and get you on a podcast. So, this is
5:38
kind of the first step towards it. Well, it should be a good conversation
5:43
today. Anyway, like I say, every time you turn the news on, every time you're reading on LinkedIn,
5:48
everything's about AI, everything's about large language models. And maybe the biggest impact that these
5:55
large language models have had is in the space of coding, in the space of writing code, because it turns out
6:00
large language models are super good at writing code, and that's impacting
6:06
everybody in this space. Somebody who isn't... Oh, sorry, go ahead,
6:12
Jason. Oh yeah, just on that: it's kind of a killer application for
6:17
agents in general, the coding side of it, especially with the likes of Claude pretty much
6:24
focusing on coding in general. That's kind of their killer niche.
6:30
It certainly is changing the way that people write code today. It certainly changed the way that I write code.
6:35
Someone who isn't here with us today is Phily Hayes. He was supposed to be here. So, I know his fan club and
6:41
entourage are disappointed that he's not here. He's a bit under the weather. So, we're sorry that our other co-founder
6:47
and CEO isn't here. Everyone's very disappointed. And I'm sure we'll suffer in comparison, but we shall
6:53
bravely soldier on. It's 9:00 a.m. for me here on the west coast of the United States. And Jason, what is it
7:00
for you? Yeah, 5:00 p.m. here on the west coast of Ireland. Nice. Okay. Well, we truly are
7:06
multicultural here. Uh, well, before we dive in, uh, Jason, I want to make sure
7:12
we're all sort of starting on the same page. Can you just talk a little bit about what coding agents are? Like, what
7:18
do I mean when I say coding agents, so we can all kind of get on the same page? Yeah. So I think everybody
7:24
has been exposed to large language models and what they're able to do through ChatGPT, and I think
7:29
the core thing with an agent, and a coding agent in particular, is its agency or
7:37
its autonomy to do tasks kind of end to end. So it all would
7:44
have started with GitHub Copilot and the autocomplete, and then
7:49
people would have migrated to using ChatGPT to enter a prompt, give it extra context, get some code they
7:56
could copy and paste into their code editor, run the code, and iterate that way. Coding agents get much closer
8:03
to the code and often run inside of your IDE or your terminal, and they're
8:09
able to go end to end from planning to debugging to testing to
8:14
execution, and they're able to iterate: actually work on the code, have full context of your codebase,
8:21
access to tools like running the code, accessing the terminal, installing packages, everything like that. So being
8:27
able to autonomously act as a developer would inside of your IDE.
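A minimal sketch of that plan-act-observe loop, in Python, with hypothetical names throughout (`llm` stands in for any model client that returns either a tool call or a final answer):

```python
# Minimal sketch of the agent loop described above: the model plans,
# calls tools (read files, run commands), observes the results, and
# iterates. All names here are hypothetical, not a specific product's API.
import subprocess

def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

def run_command(cmd: str) -> str:
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

TOOLS = {"read_file": read_file, "run_command": run_command}

def agent_loop(llm, task: str, max_steps: int = 20):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = llm(history)  # returns a dict: a tool call or a final answer
        if action["type"] == "final":
            return action["content"]
        observation = TOOLS[action["tool"]](**action["args"])
        history.append({"role": "tool", "content": observation})
    return "Step limit reached."
```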
8:33
That sounds dangerous. Yeah. But fun. Yeah.
8:38
Dangerous but fun. That's like our slogan here. Yeah.
8:43
How do you... So, I'm not an engineer, you know. I'm a data scientist. I write
8:49
code from time to time, but it's always bad code, as with most data scientists. In fact, I remember one of
8:56
the first events we did at Zerve uh we were talking about how using Zerve to
9:01
develop data science could help people code better because it makes you think in a modular way and the code is
9:06
organized as a graph and stuff like that, and one of these data scientists in the room stood up. This was in Eindhoven, in Europe,
9:14
and he goes, well, why can't you just make data scientists be better coders? And I was like, good question, I don't
9:22
know the answer to that question, and I don't know if large language models actually make anybody better coders. But how have...
9:29
I think what they can do is standardize code quality anyway, because it's all
9:34
effectively the same models writing 90% of the code. There is a thing as well where, in a lot of data
9:42
science projects, optimizing too early can be a problem. So bad code that works isn't always the
9:49
worst thing to start with. You might be throwing it all out in a couple of hours' time anyway.
9:56
That reminds me of an argument I have all the time at home. When I load the dishwasher, I don't like to
10:02
pre-scrub the dishes, and I always get yelled at because some of them don't get clean, and
10:08
I'm like, "Hey, a lot of them did get clean." So, that's kind of like bad code that works. I'd be in the other camp.
10:15
I would have grown up pre-washing, doing the dishwasher's work for it. Yeah. Yeah, I don't like that. I don't
10:21
like that at all. How have engineers adopted these things? Do they like coding agents? Do
10:27
they view them as invaders or as a threat to their work, or what's
10:33
the sense? Yeah, I think people have adopted it. It's taken a
10:39
while. There were early adopters, like with most things, and then there were some people that were very skeptical. In how
10:45
people like to write code, there's some interesting patterns, where we've heard of developers
10:51
who use the autocomplete features in the likes of Cursor but never accept it, and
10:56
will rewrite the code themselves even if they like it. So I'm not 100% sure of the benefit of that, but
11:03
a lot of people have found some interesting ways to work with it. And then more
11:10
recently, I think the likes of Claude Code are more agentic than the autocomplete or the tab
11:16
functionality in Cursor, and it's gotten good adoption. I think there's a prevailing sense that
11:24
it's the future, it's here, and it's probably here to stay. Yeah, I've always found the autocomplete things to be very
11:31
distracting, and the user experience, the actual interaction with the models, is very clunky, and, you
11:38
know, it kind of interrupts your flow a bit. Yeah, or you just kind of sit there and wait,
11:44
watching the cursor blink for a few seconds just for it to fill in something. It can be good to unblock
11:50
you a little bit, but I would have found a lot of the time I would have still used ChatGPT or something
11:56
like that to create the code that I would copy back in, and then go back and forth to debug.
12:04
Yeah. Now, one of the other things I've found is that when you use these coding agents, they tend to be
12:10
good at starting projects, at jump-starting things, giving you some boilerplate and getting
12:17
some foundation built, but they struggle a bit when it comes to working on a project that's already partially done.
12:24
Why is that? A lot of it is probably to do with context, and just
12:30
being able to find the right context inside a codebase, or
12:36
maybe just not being able to deal with larger context in general. I think models are getting better:
12:43
if you have larger context windows, they have less of a tendency to forget the top of the file, we'll say,
12:52
than before. But it's still somewhat of an issue, and you do
12:58
still get hallucinations. I was using it on front-end code, not
13:03
data science code, for this particular project, and it hallucinates styles all the
13:10
time. It just puts in colors that it thinks should be there that aren't actually there, things like that. So it can get
13:17
infuriating sometimes, when you think something's easy, that it can kind of
13:24
struggle. That's funny. I was actually using ChatGPT to write a query to
13:33
use the OpenAI API to submit requests. And it invented an API, and when I
13:39
pointed out that that API didn't actually exist, it said that it should exist.
13:45
Well, it has a point. Yeah. So, I see we have a question in there. Greg, it's
13:52
maybe one that we were planning to get to at the end, but it's probably a hot
13:57
topic. So, I don't know if it's one that you want to tackle from the start.
14:03
All right, let's see. It says, "If context-aware agents succeed, what roles or tasks for data scientists do you
14:09
think will disappear, and which ones will become more important?"
14:14
Good question. What do you think, Jason? Uh, I think, yeah, if you think about what the world could
14:20
look like if there were agents that could go end to end on lots of tasks: for data scientists, I think
14:27
a lot of the value could be in experimenting much quicker, in
14:33
terms of being able to try a lot more experiments in parallel. So supervision of
14:40
multiple agents, and the ability to quickly switch
14:45
contexts, becomes important. I do see a world as well where agents
14:51
don't just do the development side, because in data science in particular
14:57
you're typically producing an output, like a model that you want to get into production, and being able to heal a pipeline or
15:04
monitor and retrain models are things as well that I think agents could be useful for. So I think ideas,
15:13
being creative, understanding data and its limitations, and evaluating
15:18
results are all things that the human still does. They probably write less code and hopefully do
15:26
less debugging, I would say. Yeah, I'm very optimistic about the
15:32
improvements that these models have been making. I think the difference between, like, ChatGPT 3 and
15:38
5 is pretty remarkable. And so I'm focused mainly on getting the
15:44
user experience right. Like, how can I make this thing as easy to use as possible? Because I know that the models are going to get better, and they're just
15:50
going to become easier and easier to integrate into your life and workflow and coding and all that jazz.
15:58
Um, yeah. The user experience is actually a really interesting one, in terms of, I
16:05
don't think the final form is there yet. It's kind of been an evolution of the traditional
16:13
coding experiences, be it the Jupyter notebook or the VS Codes that have
16:18
had AI put onto them, or agents embedded into them, which is a very good
16:24
starting point, because there's a familiarity for the users. But yeah, I can see coding
16:31
looking a good bit different, and people maybe using more natural language to do things, as
16:39
people get more comfortable with the tools and they're able to work for longer.
16:45
I was listening to NPR the other day and I heard a story about a woman who had fallen in love with ChatGPT,
16:52
and it was an awesome story, but her problem was that after she'd
16:58
been talking to him, to ChatGPT, for three or four days, it would forget who she was, because of this
17:05
context window. Can you define context window, just so people understand, in case anybody doesn't understand what
17:11
that is and how it relates to what we're talking about? Sure. So, large language models,
17:19
they're basically just predictors. They take a set of input tokens, which are like characters
17:26
of a language. So English, German, it could be any kind of
17:32
language. And then it goes through this model, the numbers all get crunched, and what it produces is a
17:38
prediction of the next set of characters. So it takes in one set of characters and it
17:44
produces another set. So the context window is that input set of characters.
17:50
So context is effectively what you input into it, and
18:02
you have two different types of inputs. You've got the system prompt, which is
18:07
the instructions for the model, so all of the background information that the user doesn't
18:13
typically see. And then you've got the message history. So this is what the user types in, what the model
18:19
responds with, and then when the user sends the next message, it's got all of the previous messages, so it can
18:25
effectively output the next set. So the context window is, yeah, the limit on
18:32
the characters that can go into the model. And there is a limit, an upper limit. They are getting
18:37
bigger. The larger Claude models are 200,000, and there's one
18:43
that has, like, a million now. Characters or tokens? Oh, tokens. Tokens. So a token can be multiple characters.
18:49
You typically get three to four characters in a token. Wow. So that's huge. A million tokens is a ginormous context.
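As a rough illustration of that characters-to-tokens ratio, here is a quick check using OpenAI's tiktoken tokenizer; other providers tokenize differently, so treat the numbers as a ballpark:

```python
# Rough check of the "three to four characters per token" rule of thumb,
# using OpenAI's tiktoken tokenizer. Other models tokenize differently.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "Context windows are measured in tokens, not characters."
tokens = enc.encode(text)
print(f"{len(text)} characters -> {len(tokens)} tokens "
      f"(~{len(text) / len(tokens):.1f} chars per token)")
```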
18:49
Yeah. Got it. So when I send in a prompt to ChatGPT, it's getting
19:01
more than just the question I asked it. Oh, definitely. It has all of the instructions for how it should respond.
19:07
Should it be playful or not? Should it tell jokes? All of those kinds of things are
19:13
hidden inputs into the model, and then when people use it in a business context, it'll have the information
19:20
around the task that it's doing. The coding agents will be told
19:26
how they should react, what kind of language they should use, and any pitfalls they might fall
19:34
into. So it's like: make sure that you're checking your syntax, all of those extra things that make the model behave well.
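Concretely, what the model receives each turn is roughly a list like this (OpenAI-style roles, with illustrative content; other providers use similar but not identical formats):

```python
# What the model actually receives each turn: a hidden system prompt plus
# the running message history. Content here is illustrative only.
messages = [
    {"role": "system", "content": (
        "You are a coding agent. Be concise, check your syntax, "
        "and ask before running anything destructive."
    )},
    {"role": "user", "content": "Load sales.csv and plot revenue by month."},
    {"role": "assistant", "content": "Done. Revenue peaks in December."},
    # The next message is sent along with everything above; the whole
    # list has to fit inside the model's context window.
    {"role": "user", "content": "Now break it down by region."},
]
```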
19:34
There was actually an interesting thing I saw earlier. So, just for
19:41
one of the demos earlier, I was playing around with a financial data set, and I wanted to do some
19:46
encoding using one of the large language models, and there was a
19:52
fine-tuned one called, I think it was, Fin-something, on
19:58
Hugging Face. And I was working with the Zerve agent, and I was like, can you do
20:03
the encoding? And it came back and said that actually, people have shown
20:09
that using GPT-4o with just
20:14
proper additional context
20:20
outperforms the fine-tuned model by up to 10%. So, not something that I would
20:26
have thought of a year ago, that a fine-tuned model on financial data sets would do worse; in terms of,
20:34
if you take a general-purpose model and just give it specific context related to the task, it ended up
20:41
outperforming it. Huh, that's wild. So the general all-purpose models have become so
20:47
sophisticated that they're better than some of the more narrow-purpose ones. Yeah, I wouldn't have
20:52
guessed that, even six months ago. Huh. I actually wouldn't have guessed it today, I think. But there you go.
21:00
Yeah. All right. Well, hey, let's jump in. Maybe we'll start with Cursor or one of
21:06
the other sort of coding agent tools that are out there for more
21:11
engineering and software development purposes. Do you think you could show us? Just pull it up and give us some examples of how the agent helps with
21:19
writing boilerplate or doing code completion or something like that? Yeah. So for this I just have
21:24
a sample project. Tell me when you can see the screen. Yeah.
21:29
So this is Cursor. It's a very cool software development
21:35
tool. It runs locally. You have access to the terminal. You've got the agent and chat
21:42
functionalities over here. And to start with, because it's a data science kind of a project, what I'll do is I'll
21:48
just ask it to create a Streamlit app. And I've taken a Kaggle data set that has global temperatures here.
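The app it scaffolds is roughly this shape; a minimal Streamlit sketch, with the CSV and column names assumed from the Kaggle global temperatures data rather than taken from the demo itself:

```python
# Minimal sketch of the kind of dashboard the agent generates here.
# File and column names are assumptions, not the demo's actual output.
import pandas as pd
import streamlit as st

st.title("Global Temperatures")
df = pd.read_csv("GlobalTemperatures.csv", parse_dates=["dt"])
start_year = st.slider("Start year", 1850, 2015, 1950)
subset = df[df["dt"].dt.year >= start_year]
st.line_chart(subset.set_index("dt")["LandAverageTemperature"])
```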
21:56
So what I expect Cursor to do here, and what it does really well, is that it's going to create a to-do
22:03
list. So it'll examine the files that I have, it'll create the Streamlit file, probably put it in my file system
22:09
here, and then write all of the code. And you can kind of see that it's
22:14
working through this to-do list. It's reading all of the files, and it'll do everything up to
22:21
running it. It's now going to create this temperature_dashboard.py. And this kind of stuff, I think,
22:28
Cursor is excellent at. And this is kind of scratching the surface in terms of what it's
22:34
able to do in terms of working with large file systems. Because what it does under the hood is
22:40
that when you give Cursor access to a codebase, it'll index it all. And then it's able to
22:47
intelligently search it, so that if you have a question about your codebase, or you have some related code somewhere
22:53
else to the file that you're looking at, it's able to go find it, get it, put it into its context, and then
23:01
take that into account when it's coding. So I'm sure here, if I
23:07
opened it up now... Let me ask you a question while I was thinking. I'm sure a lot of our
23:13
listeners know what RAG is. Retrieval-augmented generation. Is that
23:18
right? I always forget what the A stands for. Yes. Yeah. Augmented. Okay. Awesome. So that's when you
23:24
ask a question and then your agent performs some sort of a search to pull in information to add into the
23:30
context. I guess the alternative to doing that in this case would just be to include the entire codebase in the
23:36
context rather than searching it first. Yeah, and you'd have issues typically
23:42
with that; even a million tokens probably isn't enough for a large code-
23:48
base. So you do need to do some level of RAG or retrieval, and
23:54
that's something that I think Cursor spent a lot of time on: getting good search and
24:02
indexing for dealing with large codebases. Gotcha.
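A toy sketch of that retrieval idea, not Cursor's actual implementation: chunks of the codebase are embedded once, and only the closest matches are pulled into the prompt. `embed` stands in for any embedding model.

```python
# Toy retrieval over an indexed codebase: embed each chunk once, then pull
# only the most relevant chunks into the agent's context instead of the
# whole repo. `embed` is a placeholder for any embedding model.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_index(chunks, embed):
    return [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(query, index, embed, k=3):
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]  # these go into the prompt
```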
24:08
Now, one of the things that I've seen as I've played with some of these tools is that when I'm using a public data set that I downloaded
24:14
from the internet, like Titanic or the global temperature ones, the models almost seem to know
24:21
what's in the data to start. Yeah, actually I've got an interesting one when we get on to Cursor for
24:29
notebooks and data science, where it did some interesting things based, not necessarily, on
24:36
what was in the code or what it did, but more so on what it expected it to do,
24:42
in terms of some of the insights that it produced. So there is definitely a thing where, if you have
24:48
a popular data set or a well-known codebase, because the large
24:54
language models that are being used are trained on that data, effectively, it's got it stored in its memory, and
25:00
then it's able to just output it. So it doesn't need to really write code or, we'll say,
25:08
run the code to be able to answer some of the questions. So if you're benchmarking these models or the
25:14
coding agents, using a public data set is kind of like target leakage. It's like cheating. It definitely is;
25:21
yeah, you definitely have, like you said, target leakage, where it
25:28
just has it in its training data. So these are just models at the end of the day that have been
25:34
trained across most, if not all, of the internet.
25:39
Okay. Good. Okay. So, the coding agents that are actually doing this work: how is what something
25:46
like Cursor or Zerve's agent does different from just, you know, putting a prompt into
25:53
ChatGPT and then copy-pasting the code into a code editor and running it?
25:58
So I think there's kind of two things. One is the context, and then the other one is what's called
26:04
tool calls. This is the ability for it to do actions. So
26:10
instead of it just being chat, one of the things it's able to do is respond with an instruction and
26:16
then execute that instruction. And that could be everything from reading particular files to
26:22
writing code to running terminal commands, in this case.
26:30
So at this point, I think our Streamlit app is done. We can copy the command and we'll get an app up and running.
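As an aside on how tool calls are typically wired up: a tool is declared to the model as a structured schema like the one below (OpenAI-style function-calling format, shown as a general illustration rather than any one product's internals); the model replies with a structured call, and the harness executes it and feeds the output back.

```python
# How a tool is commonly declared to the model (OpenAI-style function
# schema). The model responds with a structured call such as
# {"name": "run_terminal", "arguments": {"command": "streamlit run app.py"}}
# and the harness runs it and returns the output as the next message.
run_terminal_tool = {
    "type": "function",
    "function": {
        "name": "run_terminal",
        "description": "Run a shell command and return stdout and stderr.",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "Command to run."},
            },
            "required": ["command"],
        },
    },
}
```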
26:37
So, this is kind of an example of where I think Cursor
26:42
shines. And even if you were to do something like change it to
26:50
light mode... I think, yeah, these kinds of applications it's super
26:55
good at. It's script-based. It's able to edit the code, the styles, and these
27:02
kinds of server-based applications actually have hot reloading baked into them, effectively. So when
27:09
you make a code change and you accept it, it'll automatically hot reload. Which isn't necessarily the same kind
27:16
of execution model that you have when you're working with data, I would say.
27:22
There's a question here in the comments. It says, what safeguards exist to prevent coding agents from introducing insecure or biased code?
27:30
It's actually down to the systems that implement them. It's around guardrails. So you can
27:38
kind of bake in guardrails. You can have multiple agents, would be
27:43
one thing, or you can have evaluations that are more deterministic. So you can imagine
27:51
you have a coding agent; it produces an output, and then you have an evaluator that has to make sure
27:58
it has certain accuracies, certain, we'll say, coding
28:04
practices, before it can accept the answer and then progress. So kind
28:10
of LLM-as-a-judge or guardrails would be the typical approaches that people would take to date.
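A minimal sketch of that evaluator pattern, assuming a generic `llm` callable and illustrative acceptance criteria:

```python
# Sketch of the guardrail/evaluator pattern: deterministic checks first,
# then an LLM-as-a-judge pass before the agent's output is accepted.
# `llm` and the criteria here are illustrative assumptions.
import ast

def deterministic_checks(code: str) -> bool:
    try:
        ast.parse(code)            # must at least be valid Python
    except SyntaxError:
        return False
    return "eval(" not in code     # toy example of a banned pattern

def llm_judge(llm, code: str) -> bool:
    verdict = llm("Does this code follow our security and style rules? "
                  "Answer PASS or FAIL.\n\n" + code)
    return verdict.strip().upper().startswith("PASS")

def accept(llm, code: str) -> bool:
    return deterministic_checks(code) and llm_judge(llm, code)
```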
28:16
So there's a bit more risk when the agents start to have a bit more
28:21
power. Oh, definitely. Yeah. And there are big issues if you give it
28:27
access to production databases or anything like that, because if they have access to things
28:35
like the terminal, they could delete files. Systems like Cursor do
28:42
kind of prevent that, and have allow lists of certain commands that they can run. But there's definitely
28:49
more risk the more autonomy they have. There's sometimes less risk if you can run them in a sandbox
28:55
environment, and you have things like, if they do have to work with a database, the credentials they have
29:02
are read-only. And if they're not able to access certain parts of your file system, that means that,
29:10
even if they do try to do something bad, they don't have the actual permissions to do so.
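The allow-list idea is simple to sketch; the permitted commands below are illustrative, not any real tool's configuration:

```python
# Sketch of a command allow list of the kind described above.
import shlex

ALLOWED_COMMANDS = {"ls", "cat", "python", "pip", "streamlit"}

def is_allowed(command: str) -> bool:
    parts = shlex.split(command)
    return bool(parts) and parts[0] in ALLOWED_COMMANDS

assert is_allowed("python train.py")
assert not is_allowed("rm -rf /")   # blocked: not on the allow list
```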
29:18
Gotcha. Gotcha. Okay. Let's shift gears a little bit and talk a bit about data science. So, what are
29:25
the issues when it comes to using something like Cursor for a data science project?
29:31
Yeah. So this is kind of my experience of it. In this
29:37
case, same data set, just to keep it simple, and I asked it to do EDA on
29:43
my files. What it's perfectly good at is code generation. But
29:50
in the case of a notebook, what it doesn't have is the ability to execute the code. So if I ask Cursor
29:58
to run the code, typically what it'll do is, if I say run my first
30:05
cell in my notebook, what it tries
30:10
to do is take the code here and run it as a Python script in the terminal. So it doesn't do the
30:18
execution. But when you ask it to do something, like, so it's going to activate
30:25
and... okay. So that's typically
30:31
what it does: it doesn't have access to the kernel itself to
30:36
run it, and it doesn't have access to the variables or the state,
30:42
because it's optimized for larger codebases, and, we'll say,
30:48
data science is a bit more iterative. So definitely, yeah,
30:53
you want to be able to run it, look at the output, and be able to then write your next piece of code based on
31:00
not only the code in the cell but what the outputs were. You should debug it before moving on. So when
31:08
I asked Cursor to do EDA on my files, just to keep it simple, it
31:15
did read the top of the CSVs, effectively, to
31:21
get some information, and then it wrote nine cells, but it didn't run any of them, effectively.
31:28
But what it did do is it produced some key findings. And I then
31:35
asked it how it found key findings without having executed the
31:42
code. So what it did then was it reproduced some of the
31:48
code cells again, because I was in agent mode, but when I asked it again, basically it touched on your
31:55
point earlier. So I just asked it, how did you get the insights without having run the code?
32:01
It said it was a good point. So it's kind of hallucinating a bit. Yeah. So it read the file structure, it
32:08
had domain knowledge applications, and then what it called
32:13
educated inferences. So just the fact that it knows that there's global warming, there are certain
32:20
things. So it produced some insights, without having run the code, that were effectively hallucinated.
32:29
Got it. Okay. So it's a bit like trying to use VS Code for doing data science manually, right? You
32:36
typically have an iterative process, and you want to be able to see your results and react to them and so on. That's just
32:42
difficult in this environment. Exactly. Now, it still has some usefulness. It can write the code. You
32:48
can still ask questions of the code. You can do code completions, things like that. But in
32:55
terms of an agent going end to end, to be able to do autonomous work, yeah, the ability to
33:05
execute the code and work iteratively are kind of the two things that currently, for me, are
33:13
downfalls for working on data science projects. And this is all running locally? Yeah,
33:20
this is also running locally. So, yeah, that's actually another good point: a lot of the
33:27
times you do want to be able to burst onto the cloud to use
33:33
GPUs or other kinds of larger compute, which you can do through
33:39
SSH, but you still have to set up a remote environment. Gotcha. Okay, let's switch to Zerve.
33:46
Can you talk to us a bit about what the inspiration behind Zerve's agent is, and why we don't
33:53
think that there's been sort of a Cursor moment for data science yet? Sure. So I think fundamentally
33:59
there are some key differences, that we've touched on a little bit, between software development and data
34:06
science. It's everything from how you execute the code, so that it's
34:12
more iterative and exploratory, which is why notebook environments
34:17
are popular in the first place: the ability to be able to write some
34:22
code, see the results, and write some more code, which is a super useful
34:29
workflow. It has some drawbacks around state management, stability,
34:36
things like that, but in general it's how most of the world does their
34:41
data science, and for good reason. Then there's, yeah, just
34:49
the data; the data and the code are also important. So when an agent is running,
34:54
the results, the types of data it's produced, are all super
35:01
important context. So if you have just a file name, knowing the data
35:07
types, knowing if there are mixed data types in a column, becomes very important, so you're
35:13
able to convert them or handle all of the different intricacies of your data sets, how
35:22
you should join them. Things like that all take loads of time when you're
35:27
starting projects. The types of code you write are different as well. Typically in
35:34
software it's kind of a more closed system, you're following patterns in the code, whereas in data science
35:41
you probably have less of an idea to start, and you're leveraging a lot more third-party packages. It's hard to imagine
35:47
doing a data science project without using pandas, NumPy, TensorFlow, some of those packages. So you
35:54
have to learn how to use those. You have these concepts around using vectors for
36:02
parallelizing your operations, more so than you would in your
36:07
typical kind of codebases. So there are probably 101 other different
36:14
differences as well, including even the deployment
36:20
cycles and how you monitor them. So model deployment is very different than your typical software
36:28
deployments. And CI/CD practices, monitoring, all seem to be very
36:35
different. So software engineering typically has very standard practices,
36:40
whereas it's a bit more fragmented in the data world. One other
36:46
thing actually that this brings to mind is that it's a lot more non-binary. So code compiling, or a test passing,
36:53
isn't necessarily how you assess a data project. You've
37:01
got to, yeah, just, when you're assessing the quality of a model,
37:07
look at things like data leakage, be skeptical of having 100% accuracy; things like that are all far
37:14
more subjective, probably. Sure, like missing values: how you handle missing values is going to depend on the
37:21
shape of the data, how much of the particular variable is missing, you know, its relative importance to the
37:27
analysis that you're doing. Lots of questions about context, subject matter questions, that
37:33
are going to impact the way you handle that particular column.
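As a quick illustration of that judgment call, a pandas sketch, with an assumed file name and illustrative thresholds:

```python
# Context-dependent handling of missing values: how much of a column is
# missing drives what you do with it. File name and thresholds assumed.
import pandas as pd

df = pd.read_csv("global_temperatures.csv")   # assumed file name
missing_share = df.isna().mean()              # fraction missing per column
print(missing_share.sort_values(ascending=False))

# A mostly-empty column is often dropped; a lightly missing numeric column
# is often imputed. The right call depends on the analysis.
df = df.drop(columns=missing_share[missing_share > 0.5].index)
df = df.fillna(df.median(numeric_only=True))
```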
37:39
What about... Sorry, guys. Sorry. No, I was just going to go to a question that somebody asked in the comments.
37:44
They said, how does Zerve handle scaling context across large data sets and long-running workflows without
37:51
overwhelming the model? So I guess, maybe to say it another way, what's in the Zerve context that's
37:57
different from, like, a Cursor and that sort of thing, and what happens when it gets big?
38:03
So maybe what I can do is bring up Zerve, I guess, at this point, to
38:10
just show it and introduce it to people, and then we can talk around some of the information that goes
38:16
into the context. I love this part.
38:22
So this is Zerve, and, just to keep it apples with apples
38:28
and kind of a comparison, what we have is the same data sets that we
38:34
had from Kaggle, that we had in our Cursor example, and we've given it the
38:39
same instruction. And in Zerve, what we have is, so this
38:46
effectively produced a four-step plan. When it was running, it actually ran for 15 minutes non-stop. So...
38:54
What did you ask it to do to start with? Do EDA. The same,
39:00
so it was a data quality analysis, exploratory statistics, correlations.
39:06
So in Zerve, what you have is a DAG. Each of these are code
39:12
blocks. You can combine them together, so you have Python and R that can work interoperably. You can connect the data
39:18
sources. You can mix in GenAI, and a whole host of other
39:24
block types. So what happens differently in Zerve, and we'll show it, maybe we'll
39:31
kick off the agent coding in a minute, but the
39:37
difference is, when a block is created, it'll also execute it. So
39:42
the context that it has available to it is anything that it puts in the
39:48
output, visualizations it creates, any of the data frames
39:55
that have been created. So each of these are available to the agent. So when the
40:02
agent is running, what it'll do is use multiple tool calls. While
40:07
it's working, it'll decide to read a certain output, summarize its results,
40:13
keep it in context or not, if it's relevant, access the outputs, so the metadata about the
40:21
different variable types, or read the variables directly from the
40:28
state. So Zerve actually dynamically sets its context based on what the
40:34
result looked like. Exactly. Yeah. And I actually didn't know that. Yeah. And it does it at each
40:42
of these steps. So it's able to reset its context and, based
40:47
on the different blocks, read variables, the charts, the data
40:53
frames, or the code, or the output. So, each of the models that
40:59
we use, both of these actually, we used the same model in Cursor and
41:04
in Zerve here. So Claude 4 Sonnet was used. But here you can kind of
41:11
see it does some interesting things where, as it
41:17
goes, it creates all of the analysis, and then this is information that's available for the
41:23
next block when it's continuing on. So before it ever
41:28
does the correlation, it has all of the statistical information, we'll say,
41:34
that's available. And then when you ask it to give something like
41:40
the insights here, it's able to give information based on
41:47
the code that's been executed. Got it. Instead of making it up based on what it thinks is going
41:53
on, it's actually running code at each step along the way, and then the results from each block pass to the next block
41:59
and so on, and the agent is reading that output and summarizing it into results here. Exactly. Yeah.
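In pseudocode, the pattern being described looks roughly like this; an illustrative sketch, not Zerve's actual internals, with `execute`, `preview`, and `llm` as assumed names:

```python
# Illustrative sketch (not Zerve's internals): each block in the DAG runs
# for real, and a summary of its actual outputs becomes context for the
# next block, instead of the model guessing at results.
def run_dag(blocks, llm):
    context = []
    for block in blocks:                     # blocks in topological order
        result = block.execute()             # real execution, real outputs
        summary = llm("Summarize for downstream steps:\n" + result.preview())
        context.append({"block": block.name, "summary": summary})
        block.next_prompt_context = list(context)  # grounded, not guessed
    return context
```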
42:07
So that's kind of the core difference for data science workflows here: the execution model. And there's kind
42:14
of a fundamental difference, I guess, in terms of where it executes. This is in the cloud versus it being
42:22
local, which means you can have these running in the background. You can be working on
42:30
another task, and it's able to just work for longer without
42:36
hogging any resources on your local machine as well.
42:41
Gotcha. How big is the Zerve agent's context window? So it depends. It's limited by
42:48
the model that you're using. So if you're using Sonnet, I think it's 200,000 tokens. Typically our
42:55
context windows end up in the 15 to 20,000 tokens, I believe.
43:02
Gotcha. Just depending on the size of the project and the complexity and so on. Yeah, how many tool calls it's done, how
43:08
much is relevant. So the longer it goes, typically,
43:13
you have longer context build up, because it has information about the tasks that it's already
43:20
performed while it's been working. There are techniques as well to
43:25
compact it. So you can just summarize: feed the
43:32
context window into a large language model and have it effectively
43:37
condense down the information, so that you can turn a larger context window into a smaller one for it to continue.
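A minimal sketch of that compaction step, assuming a generic `llm` callable and a `count_tokens` helper (for example, built on tiktoken):

```python
# Sketch of context compaction: when the history gets too long, summarize
# the older turns and carry the summary forward with the recent ones.
# `llm` and `count_tokens` are assumed helpers, not a specific API.
def compact(llm, count_tokens, messages, keep_recent=5, max_tokens=100_000):
    if count_tokens(messages) < max_tokens:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = llm("Condense this conversation, keeping decisions, "
                  "constraints, and open tasks:\n" + str(older))
    return [{"role": "system", "content": "Summary so far: " + summary}] + recent
```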
43:44
Gotcha. I actually saw a paper published not long ago about compression and
43:50
hallucinations, and being able to run a test to see if a hallucination was
43:56
actually happening, rather than having to read and sort of evaluate the actual answers. I haven't fully
44:03
internalized how that works, but that's got to be related. Yeah. Oh, definitely. And I've
44:09
seen it a few times actually where, when it does compact and
44:14
condense the context, it can often go off on a tangent immediately
44:19
afterwards. So it picks up something that it shouldn't from the
44:24
history; it's like, you told it not to do something previously, and then as soon as it gets condensed, it'll start
44:30
the thing that you asked it not to do. Awesome. Okay, challenges. I want to
44:37
talk about engineering challenges for building this thing. What do you think was the hardest thing, or
44:44
maybe one or two of the hard things, that you guys encountered when you were building the agent and getting
44:49
it to perform reliably? Oh, there's been a few.
44:54
The initial one actually was probably just structuring the codebase to be agentic. We had
45:03
written the code over a couple of years. It was a standard backend
45:11
that has only one way of communicating with it: somebody would press a button on the front end, that would make an API request,
45:18
and then it would update a database or take some action, like running a block. When you want
45:27
both the user and the agent to be able to do it, you've got to restructure and rethink your code, because
45:32
now you want the agent to be able to do it as well as the user, and ideally you want them to be able
45:39
to do it at the same time. So when a user does something while the agent is working, you want the context to be
45:46
updated. So that was definitely one.
45:53
That's like a plumbing and orchestration type issue. It's just, yeah, that's mainly
45:58
orchestration. There were small differences between all of the different model providers. So,
46:05
prompt engineering to get it to work from one model to another, because
46:10
basically it is, for the most part, the context that you're providing to it, and they've all been
46:18
trained slightly differently. So having it be able to do that,
46:25
as well as be able to easily switch across model providers, is somewhat of a challenge. UI/UX is
46:34
always a challenge, I think. There's always... I remember, sorry to interrupt, I remember
46:40
when I first started testing one of the early versions of the agent, it kept swallowing all the errors. So it
46:47
would write logic in to say, try this, and if it doesn't work, just assign these dummy variables and just
46:52
keep going as if nothing went wrong. Oh, it did it all the time. Try/except everywhere. It was fun. They
47:00
were just so eager to please. Oh, it's a big challenge. That's actually a super
47:06
good one, Greg, in terms of, they do want to just give you a
47:13
good, positive answer; they're kind of like yes-men in that sense. So getting it to actually stop when
47:21
it can't do something is a big challenge. Especially when you're working with data, it mightn't be
47:27
possible: you ask it a question and the data set isn't available.
47:32
That's an issue. And a related one is uncertainty when working on a
47:39
data project. So, you have a file, you've never looked at it before, you say,
47:44
I want to do a model training. If you ask an LLM what
47:51
the steps should be, or an agent what the steps should be, it could make a to-do list of eight
47:56
things, and in reality it could fail after the first one. It's just not
48:01
feasible, it should change the approach. But oftentimes what it'll do is it'll
48:06
just continue like nothing is wrong. So getting it to do, which was
48:12
one of the questions earlier around the safeguards and the guardrails, those kinds of checks, and the
48:18
adaptability then to update the plan and give
48:23
good, useful information to the user, is critical.
48:29
So, okay, the future. Let's think about the future, because I've been seeing code complete and stuff like that for a
48:35
long time, and my first action was always to turn it off, because it's just annoying and it never really worked very
48:42
well. That was of course before the agents and the LLMs and stuff like that. But now, like, as I was scrolling through
48:47
LinkedIn this morning as I was, you know, kind of getting ready for the day, I'm seeing guys come on and say the same
48:54
stuff, right? These crusty engineers going, "All right, show me a product with AI and, you know, my
49:00
first question will be, how do I turn this off so I can get back to my manual coding?" When I was the
49:07
chief customer officer at DataRobot, we invented automated machine learning back then, and that was a lot of the
49:13
response that we got. It was like, I don't want an automated machine learning system. I want to build my own machine learning, because, you know, I know best
49:20
and I know what's better and all that sort of thing. Do you think this large language model stuff is kind of like
49:26
that? Like, is it going to be a flash in the pan, and people are going to realize, hey, the emperor's got no clothes
49:31
and this thing is just a fancy autocomplete and it's never really going to work? Or is it just going to get better and better until it's
49:38
legitimately replacing people's workflows and stuff like that? So, yeah. I think even
49:45
if it didn't get any better, it's here to stay, is what I would say, even if it doesn't get much better than fancy
49:52
autocomplete, because it's already a big productivity gainer. There are
49:57
probably three different types of people that are
50:03
facilitated by coding agents. There's the experts, for their productivity.
50:09
There's non-domain experts, who get access to things; so think, like,
50:14
Lovable for front-end development. And then there's educational
50:20
purposes. So you can ask as stupid a question as you want, and it'll give you an answer and explain things to
50:27
you. It has unlimited patience and can show you new techniques for doing things, which is probably where
50:34
Stack Overflow didn't work, and people would ask as small and silly a question as they wanted to
50:41
ChatGPT. So, yeah. Stack Overflow wasn't the most friendly place for stupid questions, was
50:47
it? Yeah. Yeah, it was a terrible place for stupid questions. But they haven't taken that website down
50:52
yet, have they? No. No, it was back... last year, I think it was back to, like,
50:58
God, what was it... yeah, back to, like, late-2010s kinds of levels of
51:04
traffic, I think. So my prediction is, I think, they'll get better and better.
51:11
They haven't... I think the innovation hasn't slowed down, in
51:16
terms of just how good things like the context windows get. There
51:23
might be some cool developments, hopefully in the relatively short
51:29
term, around reasoning and how they do thinking time at inference, which could be
51:36
cool. But yeah, I think, will it replace everybody's workflows?
51:44
Probably not. Could it automate some of the tasks? I think it probably could.
51:51
We got a couple of questions coming in from LinkedIn. One around, how does this help you with data modeling? And the
51:57
other one around, how does it automate ETL data pipelines? It's kind of two questions in the same category. Can you
52:03
talk about those a bit? Yes. The ETL bit, I think, is actually where it's,
52:08
yeah, so EDA, ETL, all super interesting.
52:13
There's one thing I'd like to see probably more of, with the agents that we're looking at, and that is data discovery.
52:21
So, understanding database structures, being able to
52:26
potentially index them so that you don't have to do the same iterative steps to understand the
52:33
tables, is something that I think is potentially very valuable.
52:39
But ETL pipelines, I think, yeah, it's a very good application for
52:46
agents. Zerve is particularly good at it because it's set up as a DAG and has parallelism. So, yeah,
52:57
I'd say give it a try. You can go to app.zerve.ai, try it for
53:02
free, give it a prompt, and see if it can build an ETL pipeline. On the data
53:08
modeling side as well, yeah, I think there's some work to do there
53:14
for agents, potentially, to work on really large databases, just to
53:21
get the data discovery. Integrating with things like data catalogs definitely
53:26
helps to provide context. But in general, I'd say, working with
53:34
data, if you write code to work with data, it definitely should be a
53:40
productivity gainer if nothing else. Well, yeah, like Jason said, zerve.ai
53:46
will get you a free account so you can get on and play around with it, kick the tires, see
53:53
what the agent can do for you. We did mention the university programs that are teaching data science, and there's
53:58
definitely an application there as well. Zerve was designed to be self-hosted, so you can install it in
54:05
your own cloud environment, so your data doesn't ever have to leave your VPC. So, it's great for secure data
54:11
as well. We would love to get folks on there, and even more folks on there, experimenting, kicking the tires,
54:17
sending us feedback, and just benefiting from the way the agent works to deal with data.
54:24
Yeah. And on the ETL pipelines as well, there's full scheduling built in and a Git integration. So you can
54:30
version it and schedule it to run. Once your pipeline runs, you can have it run with a custom cron expression,
54:38
or every hour, or every day.
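For reference, cron expressions use five fields (minute, hour, day of month, month, day of week); a few standard examples:

```python
# Standard five-field cron expressions: minute, hour, day-of-month,
# month, day-of-week. Illustrative examples.
SCHEDULES = {
    "every hour":          "0 * * * *",
    "every day at 02:00":  "0 2 * * *",
    "weekdays at 09:00":   "0 9 * * 1-5",
}
```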
54:44
All right, last question. We're out of time here. Give us your biggest wish for agents in
54:50
the next year or two. I'll do mine, and then I'll give you a second to think about it while I talk. Here's the thing that I want the most,
54:56
and that is the ability to cleanly interrupt an agent's process and say, "Hey, I forgot this in my prompt," without
55:04
it sort of losing its place or having to start from scratch. I don't think any of the models have
55:09
really gotten that down, right? I know you can do some of that inside of Zerve, but I really want to be
55:16
able to interrupt, because that's sort of how I communicate, and I never remember everything that I need to include in the
55:21
prompt. Oh, 100%. That's a killer one, Greg. That's a really good
55:27
one. So, I won't top that one. Being able to interrupt is one of the
55:33
top things. I'd like agents to be able to ask me questions proactively, I think, is one of the things. So if it needs
55:40
something, that there's a really clean UI that tells me, and then it's able to
55:46
continue. So, it interrupting me instead of me interrupting it, I guess. And then custom outputs for
55:56
data projects, I think, is something that I'd really like. So I'd
56:02
like to be able to say, I want PowerPoint, I want it in this format, I want it in that format, and it's
56:07
able to do that in my own bespoke kinds of formats and things like that. So, if I say, this is your
56:13
template, go off and create this, I think that would potentially save a lot of time.
56:19
Well, judging by the way things are going, I don't think it'll be long before both of those things are completely within reach.
56:25
Yeah. Yeah. So, um, do you have a favorite agentic tool
56:31
at the minute? Oh, besides Zerve? Besides Zerve. Yeah. Ah man, my
56:37
ChatGPT knows so much about me. I have tried Operator. I think they actually turned off Operator when
56:44
they released the agents, I think. Yeah. So I thought Operator was cool. It was a little slow. But I asked ChatGPT
56:52
the other day what it knew about me, and it was alarming the things that it knew about me. Got a few details wrong. I
56:58
posted it on LinkedIn actually, but yeah, I have a deep personal relationship with ChatGPT.
57:03
Oh, very good. What about you? Um, ChatGPT is my go-to one at the minute, I would
57:10
still say. I do like, I've tried Comet, the Perplexity browser. It's...
57:18
Oh, Grok is hilarious though. What I like about Grok is that it's unfiltered.
57:24
Yeah. Yeah. Sorry, I interrupted though. Oh, no, that's a good one. Yeah, and actually,
57:31
they've done a really good job. Like, two years ago, would we have thought there'd be so many
57:38
good model providers? And even some of the open-source models are good. So, yeah, the variety in
57:46
it, and having choices for the different large models, I think is
57:51
a big win in general for everybody. Nice. All right. Well, I think our
57:57
time's about up, but thanks for all the insights, Jason. Jason is the true brains behind Zerve. He and the
58:05
engineering team have built something truly remarkable. And it's a privilege for me to get on there
58:10
and use it and play around and have it amplify my ability to write code. And
58:16
I'm just super excited for where we're going and what we're building, and I can't wait for everybody to get a chance to try it.
58:21
Yeah. So, Greg's underplayed his role massively there, but it's, yeah, no, been a
58:27
pleasure. And, yeah, like Greg said, getting people in, trying
58:32
it, giving feedback, seeing what people build. And, yeah, if people want to continue the conversation as
58:39
well, there's a community Slack channel that they can join as well.
58:44
Excellent. All right. Well, in that case, let's sign off, Jason. Yep. Thanks very much, Greg. And thanks
58:50
very much, everybody, for tuning in.


