Fine-Tuning Interview Questions
Core Concepts of Fine-Tuning
What is fine-tuning? (Very Frequent) 🎯
Fine-tuning is the process of taking a large, pre-trained model (like Llama 3 or GPT-4) that has general knowledge and training it a little more on a smaller, specific dataset. This specializes the model for a particular task or to have a certain style.
Analogy: Think of a chef who has graduated from a top culinary school (pre-training). Fine-tuning is like them working for a year at a high-end pastry shop to become a specialist in making desserts.
Why would you fine-tune an LLM? What are the benefits?
You fine-tune a model to teach it something new that can't be taught well through prompting alone. The main reasons are:
To Teach a Style or Persona: To make the model consistently talk like a specific character (e.g., a pirate, a legal expert).
To Teach a New Skill/Format: To make the model reliably perform a structured task, like always generating JSON output or writing code in a specific style.
To Improve Reliability on a Niche Domain: To make the model more accurate and familiar with specific jargon from a field like medicine or finance.
What's the difference between Pre-training and Fine-tuning? (Very Frequent)
Pre-training: This is the initial, massive training phase where a model learns general knowledge, grammar, and reasoning abilities from a huge chunk of the internet. This takes millions of dollars and months of time.
Fine-tuning: This is a second, much shorter and cheaper training phase. It adapts the already-trained model to a specific, narrow task using a small, high-quality dataset.
What is Transfer Learning?
Transfer learning is the general machine learning concept of taking knowledge learned from one task and applying it to a different but related task. Fine-tuning is a form of transfer learning. We are transferring the general language knowledge from the pre-trained model to our specific task.
When should I use Fine-Tuning vs. RAG vs. Prompt Engineering? (Very Frequent)
This is a critical decision. Here's a simple guide:
| Method | Best For... | Example |
|---|---|---|
| Prompt Engineering | Controlling the model on a case-by-case basis. Quick and easy. | "Summarize this article for me in three bullet points." |
| RAG | Providing factual, up-to-date knowledge to reduce hallucinations. | "What were our company's sales figures for last quarter?" |
| Fine-Tuning | Changing the fundamental behavior, style, or skill of the model. | "Make a chatbot that always responds in the persona of Shakespeare." |
What are the risks or downsides of fine-tuning?
Catastrophic Forgetting: The model can become so specialized in its new task that it "forgets" some of its general abilities and performs poorly on things it used to do well.
Cost & Effort: While cheaper than pre-training, it still requires significant GPU resources, time, and effort to prepare the data.
Data Requirement: It requires a high-quality, clean dataset. "Garbage in, garbage out" is very true for fine-tuning.
The Fine-Tuning Process
What are the main steps to fine-tune a model? ⚙️
Goal Definition: Decide exactly what behavior you want from the model.
Data Collection & Preparation: Gather and format high-quality examples for your task. This is the most important step.
Choose a Base Model: Select a pre-trained model that fits your needs (e.g., Llama 3 8B, Mistral 7B).
Training: Run the fine-tuning process on a GPU, where the model learns from your dataset.
Evaluation: Test the new model to see if its performance has improved on your task and check if it has gotten worse on other tasks (regression testing).
Deployment: If you're happy with the results, deploy the model for use.
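The steps above can be sketched end to end with the Hugging Face stack (a minimal sketch, assuming the `datasets` and `trl` libraries and a recent `trl` version; the model name, file name, and hyperparameter values are illustrative placeholders, not recommendations):

```python
# Sketch only: assumes datasets and trl are installed, and that
# train.jsonl (placeholder file) holds {"text": ...} examples.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="train.jsonl", split="train")  # step 2

config = SFTConfig(
    output_dir="my-finetune",        # where checkpoints land
    num_train_epochs=3,              # hyperparameters for step 4
    per_device_train_batch_size=4,
    learning_rate=2e-4,
)

trainer = SFTTrainer(
    model="mistralai/Mistral-7B-v0.1",  # step 3: example base model
    args=config,
    train_dataset=dataset,
)
trainer.train()                          # step 4: training
trainer.save_model("my-finetune/final")  # keep the result for evaluation/deployment
```

Evaluation (step 5) would then run the saved model against a holdout set before anything is deployed.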
What kind of data do you need for fine-tuning?
You need a dataset of high-quality examples. For instruction fine-tuning, this is typically a set of prompt-response pairs. For example, a dataset could have hundreds or thousands of examples like {"instruction": "Who was Leonardo da Vinci?", "output": "Leonardo da Vinci was a..."}.
How do you format data for fine-tuning?
The data is usually formatted into a specific structure, often a JSONL file (JSON Lines), where each line is a JSON object representing one training example. The exact format depends on the training script, but it often involves a template that shows the model where the user's prompt starts and where the AI's response should begin.
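For instance, writing instruction–response pairs into a JSONL file with a simple prompt template (a minimal sketch; the `### Instruction:`/`### Response:` markers are one illustrative convention, not a standard — the real template must match what your training script expects):

```python
import json

# Hypothetical template marking where the user's prompt ends
# and where the AI's response should begin.
TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{output}"

examples = [
    {"instruction": "Who was Leonardo da Vinci?",
     "output": "Leonardo da Vinci was a Renaissance polymath..."},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        record = {"text": TEMPLATE.format(**ex)}
        f.write(json.dumps(record) + "\n")  # one JSON object per line
```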
What are hyperparameters in fine-tuning?
Hyperparameters are the settings you configure before the training starts. The most important ones are:
Learning Rate: How big of a step the model takes when adjusting its weights. Too high, and training becomes unstable; too low, and learning is painfully slow.
Epochs: The number of times the model will see the entire training dataset.
Batch Size: The number of training examples the model processes at one time.
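All three knobs slot into any training loop in the same places (a toy gradient-descent sketch fitting a single weight, not an actual LLM trainer):

```python
# Toy example: fit w in y = w * x to data generated with a true slope of 2.
data = [(x, 2.0 * x) for x in range(1, 9)]

learning_rate = 0.01  # step size for each weight update
epochs = 20           # passes over the whole dataset
batch_size = 4        # examples processed per update

w = 0.0
for epoch in range(epochs):
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        # Gradient of mean squared error 0.5*(w*x - y)^2 with respect to w
        grad = sum((w * x - y) * x for x, y in batch) / len(batch)
        w -= learning_rate * grad

print(round(w, 2))  # ≈ 2.0, the true slope
```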
What is a "base model" versus a "fine-tuned model"?
Base Model: The original, general-purpose, pre-trained model you download (e.g., meta-llama/Llama-3-8B).
Fine-tuned Model: What you get after you've continued training the base model on your own dataset. It's a new, specialized version.
How do you evaluate a fine-tuned model?
Holdout Set: You set aside a portion of your data (a validation or test set) that the model never sees during training. You test the model's performance on this set.
Human Evaluation: For tasks like style or creativity, you often need humans to rate the quality of the model's outputs.
Benchmarks: You can also test the model on standard academic benchmarks to ensure it hasn't gotten worse at general tasks.
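The holdout set is just a partition made before training ever starts (a minimal sketch with a shuffled 90/10 split; the examples here are placeholders):

```python
import random

examples = [{"instruction": f"question {i}", "output": f"answer {i}"}
            for i in range(100)]

random.seed(0)            # fixed seed so the split is reproducible
random.shuffle(examples)  # shuffle before splitting to avoid ordering bias

split = int(0.9 * len(examples))
train_set = examples[:split]   # seen during fine-tuning
test_set = examples[split:]    # never seen; used only for evaluation

print(len(train_set), len(test_set))  # 90 10
```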
Key Techniques & Terminology
What is "full fine-tuning"?
This is the original method where you update all the weights and parameters of the pre-trained model. It gives very good results but requires a lot of memory (VRAM) and computational power.
What is PEFT (Parameter-Efficient Fine-Tuning)? (Very Frequent) ✨
PEFT is a set of techniques that allow you to fine-tune a model by only updating a very small number of parameters instead of all of them. The majority of the model's original weights are "frozen" and not changed.
Analogy: Instead of rewriting a whole 1,000-page textbook (full fine-tuning), PEFT is like adding a few pages of new notes or sticky tabs (training a few new weights). It's much more efficient.
Why is PEFT so popular?
Dramatically Reduces Compute Needs: It lets you fine-tune large models on consumer-grade GPUs.
Faster Training: Since you're training far fewer parameters, the process is much faster.
Avoids Catastrophic Forgetting: Because the original model is mostly frozen, it's much less likely to forget its core abilities.
Smaller Storage: The final "fine-tuned model" is just a small file of the new, changed weights (the "adapter"), which you can place on top of the original base model.
What is LoRA (Low-Rank Adaptation)? (Very Frequent)
LoRA is the most popular PEFT technique. It works by freezing the original model weights and injecting small, trainable "adapter" matrices into the layers of the Transformer. We only train these tiny adapters, which is incredibly efficient. When we're done, we just need to save these small adapter matrices.
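The efficiency comes straight from the shapes involved: a frozen d×d weight matrix W gets a trainable low-rank product B·A added alongside it, so only 2·d·r numbers are trained instead of d² (a back-of-envelope sketch, not a real implementation; the hidden size is a Llama-style example):

```python
d = 4096   # hidden size of one attention projection (example value)
r = 8      # LoRA rank

full_params = d * d      # parameters updated by full fine-tuning of W
lora_params = 2 * d * r  # A is r x d, B is d x r; only these are trained

print(full_params, lora_params)                   # 16777216 vs 65536
print(round(100 * lora_params / full_params, 2))  # 0.39 (% of the layer)
```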
What is Quantization?
Quantization is a technique to make a model smaller and faster by reducing the precision of its weights. For example, you can convert the model's weights from 32-bit floating-point numbers down to 16-bit floats, or even 8-bit or 4-bit integers. This is like rounding numbers to have fewer decimal places. It makes the model take up less memory and run faster, with only a small drop in performance.
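The "rounding" intuition can be made concrete with simple symmetric integer quantization (a toy sketch on a handful of weights; real schemes like NF4 are more elaborate):

```python
weights = [0.12, -0.53, 0.91, -0.27]   # pretend these are 32-bit weights

# Symmetric quantization to 4-bit signed integers (usable range -7..7 here)
scale = max(abs(w) for w in weights) / 7
quantized = [round(w / scale) for w in weights]   # small ints, 4 bits each

# Dequantize when the values are needed: close to, not equal to, the originals
restored = [q * scale for q in quantized]

print(quantized)                        # [1, -4, 7, -2]
print([round(x, 2) for x in restored])  # [0.13, -0.52, 0.91, -0.26]
```

Each weight now needs 4 bits instead of 32, at the cost of a per-weight error bounded by half the scale.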
What is QLoRA?
QLoRA is a very efficient technique that combines Quantization and LoRA. It first quantizes a base model to a very small size (e.g., 4-bit) to save memory, and then it trains LoRA adapters on top of that quantized model. This method has made it possible to fine-tune very large models on a single gaming GPU.
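With the Hugging Face stack, QLoRA is typically just two configs stacked together: quantized loading plus a LoRA adapter (a sketch assuming the `transformers`, `peft`, and `bitsandbytes` libraries; the model name, rank, and target modules are illustrative choices):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Quantization half of QLoRA: load the frozen base model in 4-bit NF4.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3-8B",        # example base model
    quantization_config=bnb_config,
)

# LoRA half: small trainable adapters on the attention projections.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction is trainable
```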
What is "instruction fine-tuning"?
This is a specific type of fine-tuning where the goal is to make a base model better at following user commands and having a conversation. The dataset consists of (instruction, response) pairs. Models like Llama 3-Instruct or GPT-4 are instruction-tuned versions of their base models.
Practical & Scenario-Based Questions
You want to create a customer support bot that can classify user tickets into categories: "Billing," "Technical Issue," or "General Inquiry." What's the best approach?
Fine-tuning would be an excellent approach. You could gather a dataset of thousands of past customer tickets and their correct categories. Then, you can fine-tune a smaller model (like Mistral 7B) on this classification task. It will become an expert at categorizing your specific types of tickets.
You notice your fine-tuned model is just repeating the examples from your training data. What is this called and how do you fix it?
This is called overfitting. The model has memorized the training data instead of learning the underlying pattern. You can fix this by:
Adding more diversity to your training data.
Reducing the number of training epochs.
Adjusting the learning rate.
You only have a single GPU with 24GB of VRAM. Can you fine-tune a 70-billion parameter model like Llama 3 70B?
With full fine-tuning, absolutely not. QLoRA gets you much closer: quantizing to 4-bit shrinks the 70B weights to roughly 35GB, but that is still more than 24GB, so a 70B model only fits with extra tricks like offloading part of the model to CPU RAM (which is slow). In practice, QLoRA on a single 24GB GPU comfortably handles models up to roughly the 30B range; for 70B you'd want offloading, a larger card, or multiple GPUs.
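A back-of-envelope check of the weight memory alone makes the constraint obvious (parameter counts are nominal; activations, the adapter, and optimizer states come on top):

```python
def weight_gb(params_billion, bits):
    """Memory for the raw weights only, in GB (ignores activations and optimizer)."""
    return params_billion * 1e9 * bits / 8 / 1e9

print(weight_gb(70, 16))  # 140.0 GB: 70B in bf16, hopeless on one card
print(weight_gb(70, 4))   # 35.0 GB: 4-bit 70B, still over a 24GB budget
print(weight_gb(33, 4))   # 16.5 GB: why ~30B-class models fit on 24GB
```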
What is the difference between a "base" Llama 3 model and an "Instruct" Llama 3 model?
The "base" model is the pre-trained foundation model that is good at predicting the next word but not necessarily good at following commands. The "Instruct" model is the result of taking that base model and performing instruction fine-tuning on it, making it an expert at being a helpful chatbot.
You are building a chatbot for a hospital. What is more important: RAG or fine-tuning?
Both are important, but for different reasons. You would use RAG to provide the chatbot with up-to-date, factual information about a specific patient's medical records (which are private and not in the training data). You would use fine-tuning to train the model on medical textbooks and conversation examples so it understands medical terminology and communicates with the right clinical tone. A production system would likely use both.
What is a "LoRA adapter"?
A LoRA adapter is the small set of new weights that you train using the LoRA method. After fine-tuning, your output is not a whole new 7-billion parameter model. Instead, it's just this small adapter file (often only a few megabytes), which you can then load "on top" of the original base model to give it its new specialty.
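Loading such an adapter back "on top" of its base model is a two-step affair (a sketch assuming the `transformers` and `peft` libraries; the model name and adapter path are placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3-8B")  # example base
model = PeftModel.from_pretrained(base, "my-finetune/adapter")        # small adapter dir

# Optionally fold the adapter into the base weights for simpler deployment:
model = model.merge_and_unload()
```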