
The Cursor Moment for Data Science: Context at the Core

September 16, 2025

Data science is not software engineering. The difference is context. In this livestream, we’ll unpack what “context” really means, why it’s critical for data science, and how it can change the way teams work. You’ll hear about data scientists at leading companies who’ve used context-aware workflows to cut wasted effort, streamline collaboration, and move faster from idea to impact. We’ll also give you a live demo of Zerve, the first AI development environment for data scientists, with context built into its core.

Key takeaways:

  • Why context is the missing ingredient in traditional AI coding assistants for data science

  • Real-world examples of context-aware workflows driving productivity and collaboration

  • How context-aware AI agents keep hypotheses, data, and code aligned, so you avoid wasted cycles

  • A live look at Zerve’s context-first agentic approach to data science

Greg: All right. Good morning, everybody. My name is Greg Michaelson. I'm one of the co-founders here at Zerve, chief product officer. Got the music going in the background. Today's topic is coding agents, particularly as they relate to data science. I was just listening to a podcast this morning from The Daily, and they were talking about all of the crazy holes people are falling down as they talk to large language models. Seems like they're everywhere in the news these days. I just used ChatGPT to come up with a recipe for hot honey mac and cheese. It was absolutely delicious. So I go back and forth between thinking that large language models are just the smartest dumb chatbots that have ever been invented, or something really, really different. I'm joined by Jason Hill, one of our other co-founders here at Zerve, chief technical officer. Jason, say hi. You want to introduce yourself a bit?

Jason: Hi everyone, and hi Greg. I'm looking forward to the next hour. It's been a long-standing thing at Zerve to try and get you on a podcast, so this is kind of the first step towards it.

Greg: Well, it should be a good conversation today. Anyway, like I say, every time you turn the news on, every time you're reading LinkedIn, everything's about AI, everything's about large language models. And maybe the biggest impact these large language models have had is in the space of writing code, because it turns out large language models are super good at writing code, and that's impacting everybody in this space. Somebody who isn't— oh, sorry, go ahead, Jason.

Jason: Just on that: coding is kind of the killer application for agents in general, especially with the likes of Claude pretty much focusing on coding. That's their killer niche.

    6:30

Greg: It certainly is changing the way that people write code today. It certainly changed the way that I write code. Someone who isn't here with us today is Phily Hayes. He was supposed to be here, so I know his fan club and entourage are disappointed. He's a bit under the weather, so we're sorry that our other co-founder and CEO isn't here. Everyone's very disappointed, and I'm sure we'll suffer in comparison, but we shall bravely soldier on. It's 9:00 a.m. for me here on the west coast of the United States. And Jason, what is it for you?

Jason: It's 5:00 p.m. here on the west coast of Ireland.

Greg: Nice. Okay, well, we truly are multicultural here. Before we dive in, Jason, I want to make sure we're all starting on the same page. Can you just talk a little bit about what coding agents are? What do I mean when I say coding agents, so we can all get on the same page?

Jason: Yeah. I think everybody has been exposed to large language models and what they're able to do through ChatGPT. The core thing with an agent, and a coding agent in particular, is its agency — its autonomy to do tasks end to end. It all started with GitHub Copilot and autocomplete, and then people migrated to using ChatGPT: enter a prompt, give it extra context, get some code they could copy and paste into their code editor, run it, and iterate that way. Coding agents get much closer to the code. They often run inside your IDE or your terminal, and they're able to go end to end, from planning to debugging to testing to execution, and they're able to iterate. They can actually work on the code, with full context of your code base and access to tools: running the code, accessing the terminal, installing packages, everything like that. So they're able to act autonomously, as a developer would, inside your IDE.
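Jason's description — an agent that plans, acts through tools, observes the result, and iterates until the task is done — boils down to a loop. A minimal sketch of that shape (everything here is a toy stand-in; the function and tool names are hypothetical, not any real product's API):

```python
# Minimal agent loop: the model proposes an action, the harness executes
# it and feeds the observation back, until the model declares it is done.

def fake_model(history):
    """Stand-in for an LLM call: returns the next (tool, argument) to try."""
    steps = [("write_file", "app.py"), ("run", "python app.py"), ("done", None)]
    actions_so_far = len([h for h in history if h[0] == "action"])
    return steps[actions_so_far]

def execute(tool, arg):
    """Stand-in for real tool execution (editor, terminal, test runner)."""
    return f"{tool}({arg}) -> ok"

def agent_loop(max_steps=10):
    history = []
    for _ in range(max_steps):
        tool, arg = fake_model(history)
        if tool == "done":                 # model decides the task is finished
            return history
        observation = execute(tool, arg)   # act in the environment
        history.append(("action", (tool, arg)))
        history.append(("observation", observation))  # result goes back in
    return history

trace = agent_loop()
```

The point is the feedback edge: unlike copy-pasting from a chat window, each observation re-enters the model's context before the next step.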

    8:33

Greg: That sounds dangerous.

Jason: Yeah. But fun.

Greg: Dangerous but fun. That's like our slogan here. So — I'm not an engineer, I'm a data scientist. I write code from time to time, but it's always bad code, as with most data scientists. In fact, I remember one of the first events we did at Zerve, we were talking about how using Zerve to develop data science could help people code better, because it makes you think in a modular way and the code is organized as a graph, and one of the data scientists in the room stood up — this was in Eindhoven — and he goes, "Well, why can't you just make data scientists be better coders?" And I was like, good question, I don't know the answer to that. And I don't know if large language models actually make anybody better coders.

Jason: I think what they can do is standardize code quality, because it's effectively the same models writing 90% of the code. There's a thing as well where, in a lot of data science projects, optimizing too early can be a problem. Bad code that works isn't always the worst thing to start with — you might be throwing it all out in a couple of hours anyway.

    9:56

Greg: That reminds me of an argument I have all the time at home. When I load the dishwasher, I don't like to pre-scrub the dishes, and I always get yelled at because some of them don't get clean, and I'm like, "Hey, a lot of them did get clean." So that's kind of like bad code that works.

Jason: I'd be in the other camp. I grew up pre-washing — doing the dishwasher's work for it.

Greg: Yeah, I don't like that at all. How have engineers adopted these things? Do they like coding agents? Do they view them as invaders, or as a threat to their work? What's the sense?

Jason: I think people have adopted them. It's taken a while. There were early adopters, like with most things, and then there were some people who were very skeptical. There are some interesting patterns in how people like to write code: we've heard of developers who use the autocomplete features in the likes of Cursor but never accept the suggestion — they'll rewrite the code themselves even if they like it. So I'm not 100% sure of the benefit there, but a lot of people have found interesting ways to work with it. More recently, the likes of Claude Code is more agentic than autocomplete or the tab functionality in Cursor, and it's gotten good adoption. I think there's a prevailing sense that it's the future — it's here, and it's probably here to stay.

Greg: Yeah, I've always found the autocomplete things very distracting, and the user experience — the actual interaction with the models — is very clunky. It kind of interrupts your flow a bit.

Jason: Yeah, or you just sit there and watch the cursor blink for a few seconds, waiting for it to fill something in. It can be good to unblock you a little bit, but a lot of the time I would still have used ChatGPT or something like it to create the code, copied it back in, and then gone back and forth to debug.

    12:04

Greg: Yeah. Now, one of the other things I've found is that these coding agents tend to be good at starting projects — jump-starting things, giving you some boilerplate, getting a foundation built — but they struggle a bit when it comes to working on a project that's already partially done. Why is that?

Jason: A lot of it is probably to do with context: being able to find the right context inside a code base, or maybe just not being able to deal with larger contexts in general. I think models are getting better — with larger context windows they have less of a tendency to forget the top of the file than before. But it's still somewhat of an issue, and you do still get hallucinations. I was using one on front-end code, not data science code, for a particular project, and it hallucinates styles all the time — it just puts in colors it thinks should be there that aren't actually there, things like that. So it can get infuriating sometimes, when you think it's easy and it struggles.

Greg: That's funny. I was actually using ChatGPT to write a query to use the OpenAI API to submit requests. It invented an API, and when I pointed out that that API didn't actually exist, it said that it should exist.

Jason: Well, it has a point. So, I see we have a question in there, Greg. It's maybe one we were planning to get to at the end, but it's probably a hot topic, so I don't know if it's one you want to tackle from the start.

    14:03

Greg: All right, let's see. It says, "If context-aware agents succeed, what roles or tasks for data scientists do you think will disappear, and which ones will become more important?" Good question. What do you think, Jason?

Jason: If you think about what the world could look like with agents that can go end to end on lots of tasks — for data scientists, I think a lot of the value could be in experimenting much quicker, being able to try a lot more experiments in parallel. So supervising multiple agents, and the ability to quickly switch contexts, become important. I also see a world where agents don't just do the development side, because in data science in particular you're typically producing an output — a model that you want to get into production — and being able to heal a pipeline, or monitor and retrain models, are things agents could be useful for too. So ideas, being creative, understanding data and its limitations, and evaluating results are all things the human still does. They'll probably write less code, and hopefully do less debugging.

Greg: I would say, yeah, I'm very optimistic about the improvements these models have been making. I think the difference between, say, GPT-3 and GPT-5 is pretty remarkable. So I'm focused mainly on getting the user experience right: how can I make this thing as easy to use as possible? Because I know the models are going to get better, and they're just going to become easier and easier to integrate into your life and workflow and coding and all that jazz.

    15:58

Jason: Yeah, the user experience is actually a really interesting one. I don't think the final form is there yet. It's been an evolution of the traditional coding experiences — be it the Jupyter notebook, or VS Code with agents embedded onto it — which is a very good starting point because there's a familiarity for users. But I can see coding looking a good bit different, with people maybe using more natural language to do things, as they get more comfortable with the tools and the tools are able to work for longer.

    16:45

Greg: I was listening to NPR the other day and heard a story about a woman who had fallen in love with ChatGPT. It was an awesome story, but her problem was that after she'd been talking to it for three or four days, it would forget who she was, because of the context window. Can you define "context window," just so people understand what that is and how it relates to what we're talking about?

Jason: Sure. Large language models are basically just predictors. They take a set of input tokens — which are like the characters of a language; English, German, it could be any language — the numbers all get crunched as they pass through the model, and what it produces is a prediction of the next set of characters. So it takes in one set of characters and produces another set. The context window is that input set.

    17:50

Jason: Context is effectively what you input into the model, and there are two different types of input. You've got the system prompt, which is the instructions for the model — all of the background information the user doesn't typically see. And then you've got the message history: what the user types in, what the model responds with. When the user sends the next message, the model gets all of the previous messages too, so it can output the next set. So the context window is the limit on what can go into the model, and there is an upper limit. They are getting bigger — the larger Claude models are 200,000, and there's one with a million now.

Greg: Characters or tokens?

Jason: Tokens. A token can be multiple characters — you typically get three to four characters in a token.

Greg: Wow, so that's huge — a million tokens is a ginormous context. Got it. So when I send a prompt to ChatGPT, it's getting more than just the question I asked it?

Jason: Oh, definitely. It's getting all of the instructions for how it should respond. Should it be playful or not? Should it tell jokes? All of those things are hidden inputs into the model. And when people use it in a business context, it'll have information about the task it's doing. Coding agents will be told how they should react, what kind of language they should use, and any pitfalls they might fall into — make sure you're checking your syntax, all of those extra things that make the model behave well. There was actually an interesting thing I saw earlier. For one of the demos, I was playing around with a financial data set and wanted to do some encoding using one of the large language models, and there was a fine-tuned one on Hugging Face — "Fin"-something, I think. I was working with the Zerve agent and asked, can you do the encoding? It came back and said that people have actually shown that GPT-4o with just proper additional context outperforms the fine-tuned model by up to 10%. Not something I would have thought of a year ago — that a fine-tuned model on financial data sets would do worse. If you take a general-purpose model and just give it specific context related to the task, it ends up outperforming it.

Greg: Huh, that's wild. So the general-purpose models have become so sophisticated that they're better than some of the more narrow-purpose ones. I wouldn't have guessed that even six months ago. Actually, I wouldn't have guessed it today, I think. But there you go.
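The two kinds of input Jason describes — a hidden system prompt plus the accumulated message history — are, in practice, just a list of messages that gets resent on every turn, which is also why the context window matters: the whole list has to fit. A sketch of that structure, using the OpenAI-style role/content message shape (the four-characters-per-token figure is the rough rule of thumb from the conversation, not a real tokenizer, and the window size is illustrative):

```python
# Each request carries the system prompt plus the full message history;
# the context window caps the total token count of everything combined.

def estimate_tokens(text):
    # Rough rule of thumb: ~4 characters per token (not a real tokenizer).
    return max(1, len(text) // 4)

def build_request(system_prompt, history, user_message, window=200_000):
    messages = [{"role": "system", "content": system_prompt}]
    messages += history                                      # everything said so far
    messages += [{"role": "user", "content": user_message}]  # the new turn
    total = sum(estimate_tokens(m["content"]) for m in messages)
    if total > window:
        raise ValueError("context window exceeded")
    return messages

history = [
    {"role": "user", "content": "Write a hot honey mac and cheese recipe."},
    {"role": "assistant", "content": "Sure - start with cheddar..."},
]
req = build_request("You are a playful cooking assistant.", history,
                    "Now make it spicier.")
```

This is also why the chatbot in Greg's NPR story "forgot" its user: once the history outgrows the window, older messages get dropped or summarized away.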

    21:00

Greg: All right. Well, hey, let's jump in. Maybe we'll start with Cursor, or one of the other coding agent tools out there for more engineering and software development purposes. Could you pull it up and give us some examples of how the agent helps with writing boilerplate or doing code completion, something like that?

Jason: Yeah. For this I just have a sample project — tell me when you can see the screen.

Greg: Yeah.

Jason: So this is Cursor. It's a very cool software development tool. It runs locally, you have access to the terminal, and you've got the agent and chat functionality over here. To start with, because it's a data science kind of project, I'll just ask it to create a Streamlit app. I've taken a Kaggle data set with global temperatures here. What I expect Cursor to do — and what it does really well — is create a to-do list. It'll examine the files I have, create the file, probably put it in my file system here, and then write all of the code. You can see it working through the to-do list: it's reading all of the files, and it'll do everything up to running it. It's now going to create this temperature_dashboard.py. This kind of stuff Cursor is excellent at, and it's just scratching the surface of what it can do with large file systems. Under the hood, when you give Cursor access to a codebase, it indexes it all, and then it can search it intelligently, so if you have a question about your codebase, or there's some related code somewhere else, it's able to go find it, get it, put it into its context, and take it into account when it's coding. So I'm sure here if I opened it up—

Greg: Let me ask you a question while it's thinking. I'm sure a lot of our listeners know what RAG is — retrieval-augmented generation, is that right? I always forget what the A stands for.

Jason: Yes, augmented.

Greg: Okay, awesome. So that's when you ask a question and your agent performs some sort of search to pull in information to add to the context. I guess the alternative in this case would be to include the entire codebase in the context rather than searching it first.

Jason: Yeah, and you'd typically have issues with that — even a million tokens probably isn't enough for a large code base. So you do need some level of RAG, or retrieval, and that's something I think Cursor has spent a lot of time on: getting good search and indexing for dealing with large code bases.

Greg: Gotcha. Now, one of the things I've seen as I've played with some of these tools is that when I'm using a public data set I downloaded from the internet, like Titanic or the global temperature one, the models almost seem to know what's in the data to start with.

Jason: Yeah — actually, I've got an interesting example when we get to Cursor for notebooks and data science, where it did some interesting things based not on what was in the code or what the code did, but on what it expected it to do, in terms of some of the insights it produced. There's definitely a thing where, if you have a popular data set or a well-known code base, the large language models in use were effectively trained on that data. It's stored in their memory, and they're able to just output it. So they don't really need to write code, or run the code, to answer some of the questions.

Greg: So if you're benchmarking these models, or the coding agents, using a public data set, that's kind of like target leakage. It's like cheating.

Jason: Definitely. Like you said, you have target leakage — it's in the training data. These are just models, at the end of the day, that have been trained across most, if not all, of the internet.
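The retrieval step described above — search the indexed codebase and pull only the matching snippets into the prompt, since even a million tokens won't hold a large repo — can be sketched with plain keyword scoring. A real tool like Cursor uses embeddings and a proper index; this toy (with made-up file names) just shows the shape of it:

```python
# Toy retrieval: score each file by word overlap with the query and keep
# only the top hits, instead of stuffing the whole codebase into context.

def retrieve(chunks, query, k=2):
    words = set(query.lower().split())
    scored = [(len(words & set(text.lower().split())), name)
              for name, text in chunks.items()]
    scored.sort(reverse=True)                 # highest overlap first
    return [name for score, name in scored[:k] if score > 0]

codebase = {
    "plots.py": "def plot_temperature(df): render global temperature chart",
    "io.py": "def load_csv(path): read a csv file into a dataframe",
    "auth.py": "def login(user): check credentials",
}
hits = retrieve(codebase, "plot the global temperature data")
```

Only the retrieved files' contents would then be pasted into the model's context alongside the question, keeping the prompt far under the window limit.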

    25:39

Greg: Okay, good. So, the coding agents that are actually doing this work — how is what something like Cursor or Zerve's agent does different from just putting a prompt into ChatGPT and then copy-pasting the code into a code editor and running it?

Jason: I think there are two things. One is the context, and the other is what's called tool calls. That's the ability for the agent to take actions: instead of just chatting, one of the things it can do is respond with an instruction and then execute that instruction. It could be anything from reading particular files, to writing code, to running terminal commands, as in this case. So at this point I think our Streamlit app is done. We can copy the command, and we'll get an app up and running. This is an example of where I think Cursor shines, even if you were to do something like change it to light mode. These kinds of applications it's super good at: it's script-based, it can edit the code and the styles, and these server-based applications effectively have hot reloading baked in, so when you make a code change and accept it, it automatically hot-reloads. Which isn't necessarily the same execution model you have when you're working with data, I would say.
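The tool-call mechanism Jason describes — the model replies with a structured instruction rather than prose, and the surrounding system executes it — can be sketched in a few lines. The tool names and the JSON shape here are illustrative stand-ins, not any real agent's protocol:

```python
# A "tool call" is structured output: instead of chat text, the model
# emits {"tool": ..., "args": ...} and the harness dispatches it.

import json

def read_file(path):
    return f"<contents of {path}>"        # toy stand-in for a real editor tool

def run_command(cmd):
    return f"ran: {cmd}"                  # toy stand-in for a real terminal tool

TOOLS = {"read_file": read_file, "run_command": run_command}

def dispatch(model_reply):
    call = json.loads(model_reply)        # the model's reply is JSON, not prose
    return TOOLS[call["tool"]](**call["args"])

# e.g. the model decides to launch the Streamlit app rather than answer in chat:
result = dispatch('{"tool": "run_command", "args": {"cmd": "streamlit run app.py"}}')
```

The result string would then be fed back into the model's context as the next observation, which is the difference from copy-pasting out of a chat window.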

    27:22

Greg: There's a question here in the comments. It says, "What safeguards exist to prevent coding agents from introducing insecure or biased code?"

    27:30

Jason: It's actually down to the systems that implement them — it's about guardrails. You can bake in guardrails: you can have multiple agents, for one thing, or you can have evaluations that are more deterministic. You can imagine you have a coding agent, it produces an output, and then you have an evaluator that has to verify certain accuracy levels, certain coding practices, before the answer can be accepted and things progress. So LLM-as-a-judge, or guardrails, would be the typical approaches people take to date.

Greg: So there's a bit more risk when the agents start to have more power.

Jason: Oh, definitely. There are big issues if you give an agent access to production databases or anything like that, because if it has access to things like the terminal, it could delete files. Systems like Cursor do prevent this and have allow lists of certain commands that can be run. But there's definitely more risk the more autonomy they have. There's less risk if you can run them in a sandboxed environment, and if, when they do have to work with a database, the credentials they hold are read-only. And if they're not able to access certain parts of your file system, then even if they do try to do something bad, they don't have the actual permissions to do so.
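The allow-list idea Jason mentions — let the agent run only commands you've pre-approved — is a small deterministic check that sits between the model and the terminal. A minimal sketch (the specific allowed commands are illustrative, not Cursor's actual list):

```python
# Deterministic guardrail: check an agent's proposed shell command
# against an allow list before it ever reaches the terminal.

ALLOWED = {"ls", "cat", "python", "pytest", "pip"}

def is_allowed(command):
    parts = command.strip().split()
    return bool(parts) and parts[0] in ALLOWED   # judge by the program name

def guarded_run(command):
    if not is_allowed(command):
        raise PermissionError(f"blocked: {command}")
    return f"executed: {command}"                # stand-in for real execution
```

Unlike an LLM-as-a-judge evaluator, this check is deterministic: a destructive command like `rm -rf` is refused every time, regardless of how the agent phrases its reasoning.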

    29:18

Greg: Gotcha. Okay, let's shift gears a little bit and talk about data science. What are the issues when it comes to using something like Cursor for a data science project?

Jason: This is my experience of it. In this case — same data set, to keep it simple — I asked it to do EDA on my files. What it's perfectly good at is code generation. But in the case of a notebook, what it doesn't have is the ability to execute the code. If I ask Cursor to run the code — say, run the first cell in my notebook — typically what it tries to do is take the code and run it as a Python script in the terminal. It doesn't do the execution in the notebook: it doesn't have access to the kernel itself to run it, and it doesn't have access to the variables or the state, because it's optimized for larger code bases. And data science, we'll say, is a bit more iterative.

Greg: Definitely. You want to be able to run it, look at the code, and then write your next piece of code based not only on the code in the cell but on what the outputs were. You should debug it before moving on.

Jason: So when I asked Cursor to do EDA on my files, it did read the top of the CSVs to get some information, and then it wrote nine cells — but it didn't run any of them. What it did do is produce some "key findings." I then asked it how it found key findings without having executed the code. At first it just reproduced some of the code cells, because I was in agent mode, but when I asked again — and this touches on your point earlier — I put it directly: how did you get the insights without having run the code? It said it was a good point.

Greg: So it's kind of hallucinating a bit.

Jason: Yeah. It had read the file structure, it applied domain knowledge, and then there was what it called "educated inferences" — the fact that it knows there's global warming, certain things like that. So it produced some insights, without having run the code, that were effectively hallucinated.
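The kernel and state point is easy to demonstrate: a notebook cell usually depends on variables that earlier cells left in the kernel, so running one cell as a standalone script — which is what Cursor falls back to — fails, while running it inside a persistent session works. A small sketch, using `exec` with a shared namespace as a stand-in for a real notebook kernel:

```python
# Cell 2 of a notebook depends on state that cell 1 left in the kernel.
cell_1 = "import statistics; temps = [14.1, 14.3, 14.9]"
cell_2 = "mean_temp = statistics.mean(temps)"

# Inside one persistent session (what a notebook kernel provides), it works:
kernel_state = {}
exec(cell_1, kernel_state)
exec(cell_2, kernel_state)          # sees `temps` from cell 1

# As an isolated script (what running the cell in a terminal amounts to),
# cell 2 has no `temps` and fails:
try:
    exec(cell_2, {})                # fresh, empty namespace
    standalone_ok = True
except NameError:
    standalone_ok = False
```

An agent that only sees files, not the live namespace, is in the second situation: it can generate cells, but it can't ground its "findings" in actual outputs.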

    32:29

    Got it. Okay. So it's a bit like trying to use VS Code for doing data science manually, right? You

    32:36

    typically have an iterative process, and you want to be able to see your results and react to them and so on. That's just

    32:42

    difficult in this environment. Exactly. Now, it still has some usefulness. It can write the code. You

    32:48

    can still ask questions of the code. You can do code completions, things like that. But in

    32:55

    terms of an agent going end to end, doing autonomous work, yeah, not being able to

    33:05

    execute the code and work iteratively are the two things that, currently for me, are the

    33:13

    downfalls for working on data science projects. And this is all running locally. Yeah,

    33:20

    this is also running locally. So that's actually another good point, in that a lot of the

    33:27

    time you do want to be able to burst onto the cloud to use

    33:33

    GPUs or other kinds of larger compute, which you can do through

    33:39

    SSH, but you still have to set up a remote environment. Gotcha. Okay, let's switch to Zerve.

    33:46

    Can you talk to us a bit about what the inspiration behind Zerve's agent is, and why we don't

    33:53

    think there's been a Cursor moment for data science yet? Sure. So I think fundamentally

    33:59

    there are some key differences, that we've touched on a little bit, between software development and data

    34:06

    science. It's everything from how you execute the code. So it's

    34:12

    more iterative, exploratory, which is why notebook environments

    34:17

    are popular in the first place: the ability to write some

    34:22

    code, see the results, and write some more code, which is a super useful

    34:29

    workflow. It has some drawbacks around state management, stability,

    34:36

    things like that, but in general it's how most of the world does

    34:41

    their data science, and for good reason. Then there's, yeah, just

    34:49

    the fact that the data and the code are both important. So when an agent is running,

    34:54

    the results and the types of data it's produced are all super

    35:01

    important context. So if you just have a file name, knowing the data

    35:07

    types, knowing whether there are mixed data types in a column, becomes very important, so you're

    35:13

    able to convert them or handle all of the different intricacies of your data sets, how

    35:22

    you should join them. Things like that all take loads of time when you're

    35:27

    starting projects. The types of code you write are different as well. So typically in

    35:34

    software it's a more closed system; you're following patterns in the code. Whereas in data science

    35:41

    you probably have less of an idea to start, and you're leveraging a lot more third-party packages. So it's hard to imagine

    35:47

    doing a data science project without using pandas, NumPy, TensorFlow, some of those packages. So you

    35:54

    have to learn how to use those. You have these concepts around using vectors for

    36:02

    parallelizing your operations, more so than you would in your

    36:07

    typical codebases. So there are probably 101 other

    36:14

    differences as well, including even the deployment

    36:20

    cycles and how you monitor them. So model deployment is very different from your typical software

    36:28

    deployments. And CI/CD practices, monitoring, all seem to be very

    36:35

    different. So software engineering typically has very standard practices,

    36:40

    where it's a bit more fragmented in the data world. One other

    36:46

    thing, actually, that this brings to mind is that it's a lot more non-binary. So code compiling, or a test passing,

    36:53

    isn't necessarily how you assess a data project. You've

    37:01

    got to, yeah, when you're assessing the quality of a model,

    37:07

    look at things like data leakage, be skeptical of 100% accuracy; things like that are all far

    37:14

    more subjective, probably. Sure, like missing values: how you handle missing values is going to depend on the

    37:21

    shape of the data, how much of the particular variable is missing,

    37:27

    its relative importance to the analysis that you're doing. Lots of questions about context, subject matter questions, that

    37:33

    are going to impact the way you handle that particular column. What about

    37:39

    Sorry, guys. Sorry. No, I was just going to go to a question that somebody asked in the comments.

    37:44

    They said: how does Zerve handle scaling context across large data sets and long-running workflows without

    37:51

    overwhelming the model? So I guess, maybe to say it another way, what's in the Zerve context that's

    37:57

    different from, say, a Cursor, and what happens when it gets big?

    38:03

    So maybe what I can do is bring up Zerve, I guess, at this point, to

    38:10

    show it and introduce it to people, and then we can talk around some of the information that goes

    38:16

    into the context. I love this part.

    38:22

    So this is Zerve. And, just to keep it apples to apples

    38:28

    as a comparison, what we have is the same data sets that we

    38:34

    had from Kaggle in our Cursor example, and we've given it the

    38:39

    same instruction. And in Zerve, what we have is, well, this

    38:46

    effectively produced a four-step plan. When it was running, it actually ran for 15 minutes non-stop. So

    38:54

    What did you ask it to do to start with? To do EDA. So, same,

    39:00

    same. So it was data quality analysis, exploratory statistics, correlations.

    39:06

    So in Zerve, what you have is a DAG. So each of these are code

    39:12

    blocks. You can combine them together, so you have Python and R that can work interoperably. You can connect data

    39:18

    sources, you can mix in GenAI, and a whole host of other

    39:24

    block types. So, what happens differently in Zerve, and maybe we'll

    39:31

    kick off the agent coding in a minute. But the

    39:37

    difference is, when a block is created, it'll also execute it. So

    39:42

    the context that it has available to it is anything that it puts in the

    39:48

    output, visualizations it creates, any of the data frames

    39:55

    that have been created. So each of these is available to the agent. So when the

    40:02

    agent is running, what it'll do is use multiple tool calls. So while

    40:07

    it's working, it'll decide to read a certain output, summarize its results,

    40:13

    keep it in context or not if it's relevant, access the outputs, so the metadata about the

    40:21

    different variable types, or read the variables directly from the

    40:28

    state. So Zerve actually dynamically sets its context based on what the

    40:34

    result looked like. Exactly. Yeah. And I actually didn't know that. Yeah. And it does it at each

    40:42

    of these steps. So it's able to reset its context and, based

    40:47

    on the different blocks, read the variables, the charts, the data

    40:53

    frames, or the code or the output. So, the models that

    40:59

    we used: both of these runs actually used the same model, in Cursor and

    41:04

    in Zerve here. So it was Claude Sonnet 4 that was used. But here you can

    41:11

    see it does some interesting things, where, as it

    41:17

    goes, it creates all of the analysis, and then this is information that's available for the

    41:23

    next block when it's continuing on. So before it ever

    41:28

    does the correlation, it has all of the statistical information, we'll say,

    41:34

    available. And then when you ask it to give something like

    41:40

    the insights here, it's able to give information based on

    41:47

    the code that's been executed. Got it. So instead of making it up based on what it thinks is going

    41:53

    on, it's actually running code at each step along the way, and then the results from each block pass to the next block,

    41:59

    and so on, and the agent is reading that output and summarizing it into results here. Exactly. Yeah. So that's kind

    42:07

    of the core difference for data science workflows here. It's the execution model. And there's

    42:14

    a fundamental difference, I guess, in terms of where it executes. This is in the cloud, versus it being

    42:22

    local, which means you can have these running in the background. You can be working on

    42:30

    another task, and it's able to work for longer without

    42:36

    hogging any resources on your local machine as well.
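The execution model described above, where blocks run in order and each block's output becomes context for the blocks downstream, can be sketched with a toy DAG runner. This is purely illustrative, not Zerve's actual API; every name here is invented:

```python
# Toy DAG runner: each block executes once its upstream blocks have run,
# and its output is handed to downstream blocks as input.
# Illustrative only; not Zerve's actual implementation.

def run_dag(blocks, edges):
    # blocks: name -> function taking a dict of upstream outputs
    # edges: (upstream, downstream) pairs; block names are assumed to be
    # listed in topological order already
    outputs = {}
    for name, fn in blocks.items():
        upstream = {u: outputs[u] for (u, d) in edges if d == name}
        outputs[name] = fn(upstream)
    return outputs

blocks = {
    "load":    lambda ins: [3, 1, 4, 1, 5],
    "clean":   lambda ins: sorted(ins["load"]),
    "analyze": lambda ins: sum(ins["clean"]) / len(ins["clean"]),
}
edges = [("load", "clean"), ("clean", "analyze")]

results = run_dag(blocks, edges)
print(results["analyze"])   # 2.8
```

The point of the structure is the one made in the conversation: by the time "analyze" runs, the real, executed output of "clean" exists and can be read, so downstream reasoning is grounded in computed results rather than guesses.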

    42:41

    Gotcha. How big is the Zerve agent's context window? So it depends. It's limited by

    42:48

    the model that you're using. So if you're using Sonnet, I think it's 200,000 tokens. Typically our

    42:55

    context windows end up at 15 to 20,000 tokens, I believe.
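Those numbers are easy to sanity-check with the common rule of thumb of roughly four characters per English token. A hedged sketch; the helper names, the 0.8 budget fraction, and the 4-characters-per-token ratio are assumptions for illustration, not how Zerve actually counts tokens:

```python
# Crude context-budget check using the ~4 characters/token heuristic.
# Real agents use the model provider's tokenizer; illustrative only.

MODEL_LIMIT = 200_000   # e.g. a Sonnet-class 200k-token context window

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def needs_compaction(context: str, budget_fraction: float = 0.8) -> bool:
    # Compact (summarize) once we approach the window, leaving headroom
    # for the model's reply and further tool-call results.
    return estimate_tokens(context) > MODEL_LIMIT * budget_fraction

print(needs_compaction("x" * 80_000))    # ~20k tokens  -> False
print(needs_compaction("x" * 800_000))   # ~200k tokens -> True
```

At a typical 15 to 20 thousand tokens of working context, an agent is nowhere near the window; a check like this only starts to matter on the long-running workflows the question was about.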

    43:02

    Gotcha. Just depending on the size of the project and the complexity and so on. Yeah, and how many tool calls it's done, how

    43:08

    much is relevant. So the longer it goes, typically,

    43:13

    the more context builds up, because it has information about the tasks it's already

    43:20

    performed while it's been working. There are techniques as well to

    43:25

    compact it. So you can just summarize: feed the

    43:32

    context window into a large language model and have it effectively

    43:37

    condense down the information, so that you can turn a larger context window into a smaller context window for it

    43:44

    to continue. Gotcha. I actually saw a paper published not long ago about compression and

    43:50

    hallucinations, and being able to run a test to see if a hallucination was

    43:56

    actually happening, rather than having to read and evaluate the actual answers. I haven't fully

    44:03

    internalized how that works, but that's got to be related. Yeah. Oh, definitely. And I've

    44:09

    seen it a few times, actually, where it does compact and

    44:14

    condense the context, and it can often go off on a tangent immediately

    44:19

    afterwards. So it picks up something that it shouldn't:

    44:24

    it's like you told it not to do something previously, and then as soon as the context gets condensed, it'll start

    44:30

    the thing that you asked it not to do. Awesome. Okay. Challenges. I want to

    44:37

    talk about engineering challenges for building this thing. What do you think was the hardest thing, or

    44:44

    maybe one or two of the hard things, that you guys encountered when you were building the agent and getting

    44:49

    it to perform reliably? Oh, there have been a few.

    44:54

    The initial one, actually, was probably just structuring the code base to be agentic. So we had

    45:03

    written the code over a couple of years. It was a standard backend

    45:11

    that has only one way of communicating with it: somebody would press a button on the front end, that would make an API request,

    45:18

    and then it would update a database or take some action, like running a block. When you want

    45:27

    both the user and the agent to be able to do it, you've got to restructure and rethink your code, because

    45:32

    now you want the agent to be able to do it as well as the user, and ideally you want them to be able

    45:39

    to do it at the same time. So when a user does something while the agent is working, you want the context to be

    45:46

    updated. So that was definitely one.

    45:53

    That's like a plumbing and orchestration type issue. Yeah, that's mainly

    45:58

    orchestration. Then there were small differences between all of the different model providers. So,

    46:05

    prompt engineering to get it to work from one model to another,

    46:10

    because basically it is, for the most part, the context that you're providing to it, and they've all been

    46:18

    trained slightly differently. So being able to

    46:25

    easily switch across model providers is somewhat of a challenge. UI/UX is

    46:34

    always a challenge, I think. There's always, I remember, sorry to interrupt, I remember

    46:40

    when I first started testing one of the early versions of the agent, it kept swallowing all the errors. So it

    46:47

    would write logic in to say: try this, and if it doesn't work, just assign these dummy variables and just

    46:52

    keep going as if nothing went wrong. Oh, it did it all the time. Try/except everywhere. It was fun. They

    47:00

    were just so eager to please. Oh, that's a big challenge. That's actually a super

    47:06

    good one, Greg, in terms of, they do want to just give you a

    47:13

    positive result; they're kind of yes-men in that sense. So getting it to actually stop when

    47:21

    it can't do something is a big challenge. Especially when you're working with data, it mightn't be

    47:27

    possible. If you ask it a question and the data set isn't available,

    47:32

    that's an issue. And a related one is uncertainty when working on a

    47:39

    data project. So you have a file, you've never looked at it before, you say

    47:44

    you want to do model training; if you ask an LLM what

    47:51

    the steps should be, or an agent what the steps should be, it could make a to-do list of eight

    47:56

    things, and in reality it could fail after the first one. It's just not

    48:01

    feasible; it should change the approach. But oftentimes what it'll do is

    48:06

    just continue like nothing is wrong. So getting it to do, which was

    48:12

    one of the questions earlier around the safeguards and the guardrails, those kinds of checks, and the

    48:18

    adaptability to then update the plan and give

    48:23

    good, useful information to the user, is critical.
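The contrast between the "eager-to-please" failure mode and the guardrail behaviour described here can be sketched in a few lines. All names are hypothetical, not Zerve's implementation:

```python
# Sketch of the anti-pattern vs. the guardrail discussed above.

def load_dataset_eager(path, cache):
    # Anti-pattern: the agent "swallows" the failure and presses on
    # with dummy data as if nothing went wrong.
    try:
        return cache[path]
    except KeyError:
        return []   # silently fabricated fallback

def load_dataset_guarded(path, cache):
    # Guardrail: stop and surface a useful message so the agent (or user)
    # can revise the plan instead of continuing on fabricated state.
    if path not in cache:
        raise FileNotFoundError(
            f"Dataset {path!r} is not available; aborting this step "
            "so the plan can be updated."
        )
    return cache[path]

cache = {"sales.csv": [1, 2, 3]}
print(load_dataset_eager("missing.csv", cache))    # [] -- looks fine, isn't
try:
    load_dataset_guarded("missing.csv", cache)
except FileNotFoundError as e:
    print("stopped:", e)
```

The eager version is exactly the try/except-everywhere behaviour mentioned earlier: every downstream step then runs against an empty, fabricated dataset. The guarded version fails loudly at step one, which is what lets an agent replan instead of ploughing through the remaining to-do items.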

    48:29

    So, okay, the future. Let's think about the future, because I've been seeing code completion and stuff like that for a

    48:35

    long time, and my first action was always to turn it off, because it's just annoying and it never really worked very

    48:42

    well. That was, of course, before the agents and the LLMs and stuff like that. But now, as I was scrolling through

    48:47

    LinkedIn this morning, you know, kind of getting ready for the day, I'm seeing guys come on and say the same

    48:54

    stuff, right? These crusty engineers going, "All right, show me a product with AI and, you know, my

    49:00

    first question will be, how do I turn this off so I can get back to my manual coding?" When I was the

    49:07

    chief customer officer at DataRobot, we invented automated machine learning back then, and that was a lot of the

    49:13

    response that we got. It was: I don't want an automated machine learning system. I want to build my own machine learning, because, you know, I know best,

    49:20

    and I know what's better, and all that sort of thing. Do you think this large language model stuff is kind of like

    49:26

    that? Like, is it going to be a flash in the pan, and people are going to realize, hey, the emperor's got no clothes,

    49:31

    and this thing is just a fancy autocomplete and it's never really going to work? Or is it just going to get better and better until it's

    49:38

    legitimately replacing people's workflows and stuff like that? So, yeah, I think, even

    49:45

    if it didn't get any better, it's here to stay, is what I would say. Even if it doesn't get much better than fancy

    49:52

    autocomplete, because it's already a big productivity gain. There are

    49:57

    probably three different types of people that are

    50:03

    facilitated by coding agents. There are the experts, for their productivity.

    50:09

    There are non-domain experts who get access to things, so think

    50:14

    Lovable for front-end development. And then there are educational

    50:20

    purposes. So you can ask as stupid a question as you want, and it'll give you an answer and explain things to

    50:27

    you. It has unlimited patience and can show you new techniques for doing things, which is probably where

    50:34

    Stack Overflow didn't work, and people would ask as small and silly a question as they wanted to

    50:41

    ChatGPT. So, yeah. Stack Overflow wasn't the most friendly place for stupid questions, was

    50:47

    it? Yeah, it was a terrible place for stupid questions, but,

    50:52

    they haven't taken that website down yet, have they? No. No, its traffic was way down last year. I think it was back to,

    50:58

    God, what was it, yeah, back to late-2010s kind of levels of

    51:04

    traffic, I think. So my prediction is, I think they'll get better and better.

    51:11

    I think the innovation hasn't slowed down, in

    51:16

    terms of just how good things like the context windows get. There

    51:23

    might be some cool developments, hopefully in the relatively short

    51:29

    term, around reasoning and how they do thinking time at inference, which could be

    51:36

    cool. But yeah, will it replace everybody's workflows?

    51:44

    Probably not. Could it automate some of the tasks? I think it probably could.

    51:51

    We've got a couple of questions coming in from LinkedIn. One around how this helps you with data modeling, and the

    51:57

    other around how it automates ETL data pipelines. They're kind of two questions in the same category. Can you

    52:03

    talk about that a bit? Sure. The ETL bit, I think, is actually where it's,

    52:08

    yeah, EDA, ETL, all super interesting.

    52:13

    One thing I'd like to see more of, with the agents that we're looking at, is data discovery.

    52:21

    So understanding database structures, being able to

    52:26

    potentially index them so that you don't have to do the same iterative steps to understand the

    52:33

    tables, is something that I think is potentially very valuable.

    52:39

    But ETL pipelines, I think, yeah, are a very good application for

    52:46

    agents. Zerve is particularly good at it because it's set up as a DAG and has parallelism. So, yeah,

    52:57

    I'd say give it a try. You can go to app.zerve.ai, try it for

    53:02

    free, give it a prompt, and see if it can build an ETL pipeline. On the data

    53:08

    modeling side as well, yeah, I think there's some work to do there

    53:14

    for agents, potentially, to work on really large databases, just to

    53:21

    get the data discovery. Integrating with things like data catalogs definitely

    53:26

    helps to provide context. But in general, I'd say, working with

    53:34

    data, if you write code to work with data, it should definitely be a

    53:40

    productivity gain if nothing else. Well, yeah, like Jason said, zerve.ai

    53:46

    will get you a free account, so you can get on and play around with it, kick the tires, see

    53:53

    what the agent can do for you. We did mention the university programs that are teaching data science, and there's

    53:58

    definitely an application there as well. Zerve was designed to be self-hosted, so you can install it in

    54:05

    your own cloud environment, so your data doesn't ever have to leave your VPC. So it's great for secure data

    54:11

    as well. We would love to get folks on there, and even more folks on there experimenting, kicking the tires,

    54:17

    sending us feedback, and just benefiting from the way the agent works to deal with data.

    54:24

    Yeah. And on the ETL pipelines as well, there's full scheduling built in, and a Git integration. So you can

    54:30

    version it and schedule it to run. So once your pipeline runs, you can have it run with a custom cron expression,

    54:38

    or every hour, or every day. All right, last question. We're

    54:44

    out of time here. Give us your biggest wish for agents in

    54:50

    the next year or two. I'll do mine, and then I'll give you a second to think about it while I talk about mine. Here's the thing that I want the most,

    54:56

    and that is the ability to cleanly interrupt an agent's process and say, "Hey, I forgot this in my prompt," without

    55:04

    it losing its place or having to start from scratch. I don't think any of the models have

    55:09

    really gotten that down, right? I know you can do some of that inside of Zerve, but I really want to be

    55:16

    able to interrupt, because that's sort of how I communicate, and I never remember everything that I need to include in the

    55:21

    prompt. Oh, 100%. That's a killer one, Greg. That's a really good

    55:27

    one. So I won't top that one. Being able to interrupt is one of the

    55:33

    top things. For agents to be able to ask me questions proactively, I think, is another one. So if it needs

    55:40

    something, there's a really clean UI that tells me, and then it's able to

    55:46

    continue. So, it interrupting me instead of me interrupting it, I guess. And then custom outputs for

    55:56

    data projects, I think, is something that I'd really like. So,

    56:02

    to be able to say: I want a PowerPoint, I want it in this format, I want it in that format, and it's

    56:07

    able to do that in my own bespoke kinds of formats and things like that. So if I say: this is your

    56:13

    template, go off and create this. I think that would potentially save a lot of time.
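The bespoke-template idea sketches easily with the standard library. The template, field names, and filled-in values below are invented for illustration:

```python
from string import Template

# A user-supplied report template; an agent would fill in results it
# actually computed. Template and field names are illustrative.
template = Template(
    "## EDA Summary -- $dataset\n"
    "- rows analysed: $rows\n"
    "- key finding: $finding\n"
)

report = template.substitute(
    dataset="climate.csv",
    rows=1200,
    finding="temperature anomaly trends upward after 1980",
)
print(report)
```

The design point is the same one raised in the conversation: the user owns the format, and the agent only supplies values it has actually computed, rather than inventing both the structure and the content of the deliverable.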

    56:19

    Well, judging by the way things are going, I don't think it'll be long before both of those things are completely within reach.

    56:25

    Yeah. And do you have a favorite agentic tool

    56:31

    at the minute? Oh, besides Zerve. Besides Zerve, yeah. Ah man, my

    56:37

    ChatGPT knows so much about me. I have tried Operator, although I think they actually turned off Operator when

    56:44

    they released the agent. Yeah. So I thought Operator was cool. It was a little slow, but, I asked ChatGPT

    56:52

    the other day what it knew about me, and it was alarming the things that it knew about me. It got a few details wrong. I

    56:58

    posted it on LinkedIn, actually. But yeah, I have a deep personal relationship with ChatGPT.

    57:03

    Oh, very good. What about you? ChatGPT is my go-to one at the minute, I would

    57:10

    still say. I do like, I've tried Comet, the Perplexity browser.

    57:18

    Oh, Grok is hilarious, though. What I like about Grok is that it's unfiltered.

    57:24

    Yeah. Sorry, I interrupted you there. Oh, no, that's a good one. Yeah, and actually,

    57:31

    they've done a really good job. Like, two years ago, would we have thought there'd be so many

    57:38

    good model providers? And even some of the open source models are good. So, yeah, the variety,

    57:46

    and having choices across the different large models, I think, is

    57:51

    a big win in general for everybody. Nice. All right. Well, I think our

    57:57

    time's about up, but thanks for all the insights, Jason. Jason is the true brains behind Zerve. He and the

    58:05

    engineering team have built something truly remarkable. And it's a privilege for me to get on there

    58:10

    and use it and play around and have it amplify my ability to write code. And

    58:16

    I'm just super excited for where we're going and what we're building, and I can't wait for everybody to get a chance to try it.

    58:21

    Yeah. So, Greg, Greg's underplaying it; he's contributed massively there. But, yeah, it's been a

    58:27

    pleasure, and, yeah, like Greg said, getting people in, trying

    58:32

    it, giving feedback, seeing what people build. And if people want to continue the conversation as

    58:39

    well, there's a community Slack channel that they can join as well.

    58:44

    Excellent. All right. Well, in that case, let's sign off, Jason. Yep. Thanks very much, Greg. And thanks

    58:50

    very much everybody for tuning in.

