Fine-Tuning Llama 3 in Zerve
Introduction
In late April, Meta released Llama 3 and it quickly garnered attention for its enhanced capabilities in language understanding and generation. The model supports a context length of up to 8,192 tokens (up from 4,096 in Llama 2) and introduces a new tokenizer with a 128K token vocabulary.
We leveraged our Hugging Face integration in Zerve to import Llama 3 into a canvas and guide it to build a personalized travel itinerary for a trip to Italy. Because Zerve persists your models, variables, and data in the canvas, once you import Llama 3 (or any other LLM), it becomes a fully hosted, accessible, private version of that model.
Understanding the Fine-Tuning Technique Used
For this project we used a light-touch approach. Rather than retraining weights extensively, we combined the pre-trained model with carefully structured prompts that steer generation toward the task. You can think of this as prompt engineering informed by context: it works well when the base model already understands the domain and you want tailored outputs without heavy compute.
Technical Deep Dive: Fine-Tuning Llama 3 on Zerve
Below is the flow we followed to adapt Llama 3 for personalized travel itineraries using a custom scenario as a “tour guide” in Italy.
Code Initialization and Setup
Initialize the tokenizer and model via AutoTokenizer and AutoModelForCausalLM, and set generation configs for the task.
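A minimal sketch of this setup, assuming the 8B Instruct checkpoint from the Hugging Face Hub; the exact checkpoint and dtype are assumptions, not values taken from the canvas:

    # Load Llama 3 with Hugging Face transformers.
    # Assumes access to the gated meta-llama repo and a GPU with enough memory.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # half precision to fit on a single GPU
        device_map="auto",           # place layers on the available GPU(s)
    )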

Data Preparation
Define the system context and a detailed user query for a three-week family trip. This enables consistent structure and easy personalization.
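A sketch of how the system context and user query can be structured as chat messages; the "tour guide" wording and trip details below are illustrative, and the block builds on the tokenizer and model loaded above:

    # System context plus a detailed user request for the itinerary.
    messages = [
        {"role": "system",
         "content": "You are an expert tour guide for Italy. "
                    "Produce detailed, day-by-day travel itineraries."},
        {"role": "user",
         "content": "Plan a three-week family trip to Italy covering Rome, "
                    "Florence, and the Amalfi Coast, with kid-friendly "
                    "activities and food recommendations."},
    ]

    # apply_chat_template formats the messages with Llama 3's chat tokens.
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)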

Generating a Response
Call the model's generate method to produce a draft itinerary, then iterate on constraints (budget, pace, dietary needs) as needed.
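A hedged example of the generation step; the sampling parameters and the <|eot_id|> terminator handling follow common Llama 3 Instruct usage and are assumptions rather than the exact settings from the canvas:

    # Generate a draft itinerary; sampling settings are illustrative and can
    # be tightened as you iterate on budget, pace, or dietary constraints.
    terminators = [
        tokenizer.eos_token_id,
        tokenizer.convert_tokens_to_ids("<|eot_id|>"),  # Llama 3 end-of-turn token
    ]
    outputs = model.generate(
        input_ids,
        max_new_tokens=1024,
        eos_token_id=terminators,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )
    # Decode only the newly generated tokens (skip the prompt).
    itinerary = tokenizer.decode(
        outputs[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
    print(itinerary)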

GPU Infrastructure
With Zerve’s IDE, you can select serverless GPUs per block and mix in CPUs or Lambdas where appropriate, without managing infrastructure.

Results
The workflow generated a family itinerary across Italy with day-by-day plans, lodging suggestions, and food recommendations. It was easy to tweak pacing and preferences to match the family's needs.

Efficiency and Privacy
While our example used non-sensitive data, Zerve's self-hosted option on AWS lets you keep prompts, variables, and outputs private in your own cloud. You can import open-source models, fine-tune with your data, and avoid sending prompts to third parties.
Conclusion
Using Zerve to guide Llama 3 with structured prompts delivered high quality trip plans without full retraining. Serverless GPUs simplified execution so the focus stayed on iteration and quality rather than infrastructure.
Explore the canvas and code here: open the example. Drop comments or questions right in the canvas—we would love your feedback.
FAQs
Is this “fine-tuning” or prompt engineering?
This example uses a light approach that relies on strong prompts and context rather than full weight updates. You can still perform parameter-efficient fine-tuning in Zerve if you need domain adaptation.
Can I run true parameter-efficient fine-tuning (LoRA/QLoRA) in Zerve?
Yes. You can attach GPU blocks, install PEFT libraries, and run LoRA or QLoRA training. Zerve persists artifacts and lets you version checkpoints with Git.
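As a hedged illustration (not the configuration used in this article), a LoRA setup with the PEFT library on top of the loaded model might look like this; the hyperparameters and target modules are assumptions:

    # Wrap the base model with low-rank adapters via PEFT.
    from peft import LoraConfig, get_peft_model

    lora_config = LoraConfig(
        r=16,                                  # rank of the low-rank update matrices
        lora_alpha=32,                         # scaling factor for the adapters
        target_modules=["q_proj", "v_proj"],   # attention projections to adapt
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    peft_model = get_peft_model(model, lora_config)
    peft_model.print_trainable_parameters()    # typically well under 1% of total weights

From here, the adapted model can be trained with a standard Hugging Face Trainer loop on a GPU block, and the resulting adapter checkpoints persisted and versioned in the canvas.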
How are privacy and IP protected?
With self-hosted deployments your prompts, datasets, and models remain in your cloud. You control access, networking, and data residency.
What if I need longer context windows?
Zerve supports multiple model families and variants. You can choose a model with a larger context window or use retrieval to inject relevant context at generation time.
Do I need to manage GPUs or cluster scaling?
No. Select GPU types per block and Zerve provisions serverless compute on demand. Mix CPUs, GPUs, and Lambdas in the same workflow.
Can I expose the itinerary generator as an app or API?
Yes. Use App Builder to publish a form-based app or expose the pipeline as an API endpoint with authentication and versioned deployments.