LLM Fine-Tuning w/ Axolotl

May 23, 2024

Table of Contents

  1. Our Options
  2. Axolotl
  3. Fine-Tuning Lifecycle
  4. Fine-Tuning on Modal

Our Options

Base Model

  • Model size: typically we use a 7B model because it is easier to host.
  • Model family: we will try several popular model families and pick the one that works best.

LoRA vs Full Fine-Tune

LoRA Is All You Need

  • (often used) LoRA: reduces the number of trainable parameters by freezing the pre-trained weights and training small low-rank matrices on top (see the sketch after this list).
    • (default first step) QLoRA: additionally quantizes the frozen base model (e.g. to 4-bit), which cuts memory further but may cost some quality.
  • (rarely used) Full Fine-Tune: updates all the parameters of the pre-trained LLM.
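
To make the parameter savings concrete, here is a minimal PyTorch sketch of a LoRA layer (illustrative only, not Axolotl's or PEFT's implementation): the pre-trained weight is frozen and only the small low-rank pair A, B is trained.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Illustrative LoRA layer: y = base(x) + x @ A^T @ B^T * scale
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)                      # frozen pre-trained weight
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)   # low-rank down-projection
        self.B = nn.Parameter(torch.zeros(out_features, r))         # low-rank up-projection, init to 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(4096, 4096, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable:,} of {total:,}")   # ~65K trainable out of ~16.8M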

Axolotl

Axolotl Intro

  • Wrapper for Hugging Face tools.
  • Easy to use.
  • Best practices built-in.

Fine-Tuning Steps

  1. Go to Axolotl's GitHub repo and start from one of the example configs (e.g. examples/openllama-3b/lora.yml).
  2. Follow the Quickstart section of the README.

Quickstart Script

# preprocess datasets - optional but recommended
CUDA_VISIBLE_DEVICES="" python -m axolotl.cli.preprocess examples/openllama-3b/lora.yml

# finetune lora
accelerate launch -m axolotl.cli.train examples/openllama-3b/lora.yml

# inference (not for production)
accelerate launch -m axolotl.cli.inference examples/openllama-3b/lora.yml \
    --lora_model_dir="./lora-out"

# inference with gradio
accelerate launch -m axolotl.cli.inference examples/openllama-3b/lora.yml \
    --lora_model_dir="./lora-out" --gradio

An Example of a Preprocessing Template

Note that the prompt portion (everything in the template up to and including "### Response:") is masked so that it does not affect the loss during training; only the response tokens are trained on. A sketch of this masking follows the template.

<start>
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input}
### Response:
{output}
<end>
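
A rough sketch of that masking, assuming the standard Hugging Face convention that label -100 is ignored by the loss (illustrative, not Axolotl's exact code; the tokenizer id matches the quickstart's OpenLLaMA-3B example):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("openlm-research/open_llama_3b")

prompt = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n"
    "### Instruction:\nSummarize the text.\n"
    "### Input:\nAxolotl wraps Hugging Face fine-tuning tools.\n"
    "### Response:\n"
)
response = "Axolotl is a convenient wrapper around Hugging Face fine-tuning tools."

prompt_ids = tok(prompt, add_special_tokens=False)["input_ids"]
response_ids = tok(response, add_special_tokens=False)["input_ids"] + [tok.eos_token_id]

input_ids = prompt_ids + response_ids
labels = [-100] * len(prompt_ids) + response_ids   # prompt tokens contribute nothing to the loss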

Debugging Axolotl

Fine-Tuning Lifecycle

Prompt Engineering

  • Use an off-the-shelf LLM such as GPT-4 or Claude.
  • Prototype the solution by iteratively applying prompt engineering techniques to the problem.

Evaluation System

  • Establish a virtuous cycle of improving the solution with evals at its center.
  • Level 1 evals (see the sketch after this list)
    • Assertions and unit tests that don't involve calls to an LLM.
    • Used to filter and curate data.
    • Run at inference time to automatically catch and correct bad outputs.
  • Level 2 evals
    • Logging traces
    • Human evaluations
    • Automated evaluations with LLMs
  • Level 3 evals
    • A/B testing with real users
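
A sketch of what a Level 1 check might look like (hypothetical helper, assuming the task requires JSON output; no LLM call involved):

import json

def validate_output(text: str) -> list[str]:
    # Cheap, deterministic checks that can run on every generation.
    failures = []
    if not text.strip():
        failures.append("empty output")
    if len(text) > 2000:
        failures.append("output too long")
    try:
        json.loads(text)
    except json.JSONDecodeError:
        failures.append("not valid JSON")
    return failures

# Run over logged traces to filter training data, or at inference time to trigger a retry.
assert validate_output('{"answer": 42}') == []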

Synthetic Data

  • Generate as much data as resources allow.
  • Use the most powerful LLM available for data augmentation and/or self-instruct (a hedged sketch follows this list).
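
One way to do the augmentation step with the OpenAI Python client (the model name and prompt below are assumptions, not from the source): ask a strong model to paraphrase existing instructions to grow the dataset.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def augment(instruction: str, n: int = 3) -> list[str]:
    # Ask a strong model for paraphrases of an existing training instruction.
    resp = client.chat.completions.create(
        model="gpt-4o",   # hypothetical choice; use whichever strong model you have access to
        messages=[{
            "role": "user",
            "content": f"Rewrite the following instruction in {n} different ways, one per line:\n{instruction}",
        }],
    )
    return resp.choices[0].message.content.splitlines()

print(augment("Summarize the customer support ticket in two sentences."))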

Prepare Data For Fine-Tuning

No need to make the data perfect the first time. Go through the whole pipeline and look at some predictions first.

  • Curate and filter the data
    • Remove duplicates and similar data
    • Remove data that is overly complex or simple
    • Use tests to filter data with Lilac
  • Choose a dataset format
  • Change the yml config
    • Change the dataset
    • Set train_on_inputs: false
    • Log training metrics to Weights & Biases
    • Upload the model to Hugging Face
  • Use the preprocess command: !python -m axolotl.cli.preprocess config.yml
  • Load the prepared dataset from the last_run_prepared directory and inspect it
  • Use verbose debugging to check things: !python -m axolotl.cli.preprocess config.yml --debug
  • Look at special tokens, e.g. tok.decode([42]) (see the inspection sketch after this list)
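
A sketch of that inspection step (the directory name under last_run_prepared is a hash, so the glob and the column names are assumptions; swap in the tokenizer of your base model):

import glob
from datasets import load_from_disk
from transformers import AutoTokenizer

prepared_dir = glob.glob("last_run_prepared/*")[0]   # Axolotl writes the tokenized data here
ds = load_from_disk(prepared_dir)
tok = AutoTokenizer.from_pretrained("openlm-research/open_llama_3b")

row = ds[0]
print(tok.decode(row["input_ids"]))                    # full prompt + response as the model sees it
print(tok.decode([42]))                                # inspect an individual (special) token id
print([l for l in row["labels"] if l != -100][:20])    # only response tokens carry loss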

Training

  • Run training: accelerate launch -m axolotl.cli.train config.yml
  • Experiment with different hyperparameters such as learning rate and batch size (a small sweep sketch follows this list)
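
A small sketch of a manual sweep that reuses the training command above by writing variant configs (the loop itself is an assumption; learning_rate is a standard Axolotl config key):

import subprocess
import yaml

base = yaml.safe_load(open("config.yml"))

for lr in (2e-4, 1e-4, 5e-5):
    cfg = dict(base, learning_rate=lr)
    path = f"config_lr{lr}.yml"
    with open(path, "w") as f:
        yaml.safe_dump(cfg, f)
    # Same training command as above, pointed at the variant config.
    subprocess.run(["accelerate", "launch", "-m", "axolotl.cli.train", path], check=True)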

Sanity Check

  • Download the fine-tuned LLM from Hugging Face for sanity checking
  • Construct the prompt template and make sure it works
  • Sanity-check a few examples (see the sketch after this list)
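
A hedged sketch of the sanity check, assuming the adapter lives in ./lora-out (as in the quickstart) and the base model is OpenLLaMA-3B; adjust both to match your run:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "openlm-research/open_llama_3b"
tok = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, "./lora-out")   # or a Hugging Face repo id

# Use the same template the model was trained on.
prompt = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n"
    "### Instruction:\nSummarize the text.\n"
    "### Input:\nAxolotl wraps Hugging Face fine-tuning tools.\n"
    "### Response:\n"
)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))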

Fine-Tuning on Modal

Modal Intro

  • Feels local, but it's remote ("code in production"); see the minimal sketch after this list
  • Massively parallel
  • Python native
  • Docs: https://modal.com/
  • https://github.com/modal-labs/llm-finetuning
  • Has some additional defaults and a few differences from vanilla Axolotl
  • Merges the LoRA adapter back into the base model
  • Uses a --data flag instead of relying on the dataset path in the config
  • The Deepspeed config comes from the Axolotl repo that is cloned
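
A tiny illustration of the "feels local, but it's remote" point (a generic Modal sketch, not the llm-finetuning repo's actual code; the GPU type and image contents are assumptions):

import modal

app = modal.App("finetune-demo")
image = modal.Image.debian_slim().pip_install("torch")

@app.function(gpu="A100", image=image)
def gpu_name() -> str:
    import torch
    return torch.cuda.get_device_name(0)   # runs inside a remote GPU container

@app.local_entrypoint()
def main():
    # Looks like a local function call, but executes remotely; .map() would fan out in parallel.
    print(gpu_name.remote())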

Debug Data

  • https://github.com/modal-labs/llm-finetuning/blob/main/nbs/inspect_data.ipynb

  • Tip: replace github.com with nbsanity.com to view notebooks


References:

  1. Dan Becker
  2. Hamel Husain
  3. Wing Lian