Llama-2 is a family of open-source LLMs released by Meta. Llama-2 7B is the smallest model of this family in terms of parameter count. The “chat” variant of Llama-2 7B is optimized for chatbot-like dialogue use cases: it is tuned to generate responses in a conversational context, making it particularly useful for applications like chatbots or virtual assistants. The Llama-2 7B chat model is also smaller and faster than its counterparts in the Llama-2 family, making it a good choice when speed and cost-efficiency matter more than a small loss in accuracy.
Fine-tuning LLMs essentially means taking a pre-trained model like Llama-2 that has already been trained on a huge dataset and making small adjustments to the weights of its trainable parameters to optimize its performance on a new, specific task or dataset. During fine-tuning, the overall architecture of the pre-trained Llama-2 model remains unchanged: only a small set of parameter weights is modified to learn the important features of the training dataset.
Fine-tuning offers several advantages:
- Cost-effective and efficient: Training an LLM from scratch can be extremely time-consuming and computationally expensive. Fine-tuning is a great alternative because it builds on a pre-trained model, significantly reducing the time and compute resources needed while still achieving good results.
- Improved performance: Since pre-trained LLMs are already trained on massive amounts of data (~2 trillion tokens for Llama-2), fine-tuning lets us take advantage of this knowledge to improve performance on our new, specific task or dataset.
This tutorial is based on this Google Colab notebook found here, where you can run all the cells sequentially and get your personal fine-tuned Llama-2 chatbot!
In this tutorial, we'll be using the Nvidia T4 GPU with 16 GB of VRAM that is offered in the free version of Google Colab. If you're running the notebook on your own GPU, that works too! The code below will automatically connect to the T4 GPU when run on Colab, or to the first GPU (if you have multiple GPUs) when you run it elsewhere.
!pip install GPUtil

import torch
import GPUtil
import os
GPUtil.showUtilization()
if torch.cuda.is_available():
    print("GPU is available!")
else:
    print("GPU is not available.")
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0" # Set to the GPU ID (0 for T4)
Now that you've established your GPU connection, it's time to install (and import) the required libraries for fine-tuning.
!pip install git+https://github.com/huggingface/peft.git
!pip install accelerate
!pip install -i https://pypi.org/simple/ bitsandbytes
!pip install transformers==4.30
!pip install datasets
import torch
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import BitsAndBytesConfig,LlamaTokenizer
from huggingface_hub import notebook_login
from datasets import load_dataset
from peft import prepare_model_for_kbit_training
from peft import LoraConfig, get_peft_model
from datetime import datetime

if 'COLAB_GPU' in os.environ:
    from google.colab import output
    output.enable_custom_widget_manager()
Since Llama-2 is governed by the Meta license, to download the model weights and tokenizer, please visit Meta's website to accept their license and request access to their models on HuggingFace (it usually takes less than a day to get access).
Once you have access to the Llama-2 models, log in to HuggingFace and enter your write access token when prompted, so the model can be loaded in your notebook.
if 'COLAB_GPU' in os.environ:
    !huggingface-cli login
else:
    notebook_login()
Having completed our setup, it's time to load our model (Llama-2 7B Chat) using QLoRA (quantizing the parameter weights to 4 bits) to reduce memory requirements and increase training speed, while making sure we don't hit the 16 GB GPU memory limit.
Note: In the code below, we load all trainable parameters in the 4-bit normal-float (nf4) datatype and use double quantization for further memory savings. However, our compute precision is 16-bit (bfloat16), since we want to speed up the computation of hidden states relative to the default float32 datatype.
base_model_id = "meta-llama/Llama-2-7b-chat-hf"
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(base_model_id,
                                             quantization_config=bnb_config)
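As a quick, optional sanity check (assuming the model variable loaded above), you can print the quantized model's memory footprint to confirm it fits comfortably within the T4's 16 GB:

# Optional: report how much memory the 4-bit quantized model occupies (in GB).
print(f"Model memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")

# Re-check overall GPU utilization with GPUtil from the setup step.
GPUtil.showUtilization()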
Most of our personal data is in unstructured formats, like text files or PDFs. Reformatting it into structured data like JSON or CSV files can lead to better training results, since there is a clear mapping between question-answer pairs, but that format is labor-intensive to produce and only ideal for scenarios where the data consists purely of Q&A pairs, neatly organized and following a predictable structure. For that reason, this tutorial focuses on fine-tuning Llama-2 directly on data in unstructured .txt files!
Since Llama-2 has been trained on data up to July 2023, for this tutorial we'll be using data about the Hawaii wildfires in August 2023, sourced from the Maui Police Department report found here. We've copied the contents of the PDF into multiple text files without any additional formatting.
We'll clone the GitHub repository containing the text files and load them as training data.
!git clone https://github.com/poloclub/Fine-tuning-LLMs.git
train_dataset = load_dataset("text", data_files={"train":
                             ["hawaii_wf_1.txt", "hawaii_wf_2.txt",
                              "hawaii_wf_3.txt", "hawaii_wf_4.txt",
                              "hawaii_wf_5.txt", "hawaii_wf_6.txt",
                              "hawaii_wf_7.txt", "hawaii_wf_8.txt",
                              "hawaii_wf_9.txt", "hawaii_wf_10.txt",
                              "hawaii_wf_11.txt"]}, split='train')
Having loaded our data, we need to tokenize it (break sequences of text down into smaller parts, or “tokens”) before passing it into Llama-2 for fine-tuning. We'll initialize the LlamaTokenizer with the pre-trained Llama-2 7B chat model and manually set the EOS token so that the model knows how to recognize the “end of sentence”, as well as the PAD token to pad shorter sequences to match the length of longer ones, since the LlamaTokenizer is known to have issues with these.
tokenizer = LlamaTokenizer.from_pretrained(base_model_id, use_fast=False,
                                           trust_remote_code=True,
                                           add_eos_token=True)

if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({'pad_token': '[PAD]'})

# set the pad token to be the end-of-sentence token
tokenizer.pad_token = tokenizer.eos_token
Our tokenizer is configured, which means it's now time to tokenize our training data!
tokenized_train_dataset = []
for phrase in train_dataset:
    tokenized_train_dataset.append(tokenizer(phrase['text']))
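If you'd like to verify that the tokenizer behaves as configured, a small illustrative check (the sentence below is just an example) is to confirm that tokenized text ends with the EOS token:

# Illustrative check: with add_eos_token=True, every tokenized sample should
# end with the EOS token id, and pad_token should now match eos_token.
sample = tokenizer("The wildfires began in August 2023.")
print(sample["input_ids"][-1] == tokenizer.eos_token_id)   # expected: True
print(tokenizer.pad_token, tokenizer.eos_token)            # both should be </s>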
We're one step away from training the model! We need to enable gradient checkpointing to trade computation time for lower memory usage during training. We then set up our LoRA configuration to reduce the number of trainable parameters, which significantly cuts the memory and time required for fine-tuning. LoRA works by decomposing the large weight matrices in the pre-trained model's attention layers into two smaller low-rank matrices, which drastically reduces the number of parameters that need to be fine-tuned (see the rough illustration below). Refer to the LoRA documentation to learn more about the parameters and use cases.
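As a rough back-of-the-envelope illustration of why this helps (assuming a single 4096 × 4096 attention projection, the hidden size used in Llama-2 7B):

# Rough illustration of LoRA's savings for one 4096 x 4096 attention projection.
# A full update of the weight matrix would train every entry; with rank r = 8,
# LoRA instead trains two small matrices A (4096 x 8) and B (8 x 4096).
d, r = 4096, 8
full_update = d * d            # 16,777,216 parameters
lora_update = d * r + r * d    # 65,536 parameters
print(f"{lora_update:,} vs {full_update:,} params "
      f"({100 * lora_update / full_update:.2f}% of the full matrix)")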
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

config = LoraConfig(
    # rank of the update matrices
    # Lower rank results in smaller matrices with fewer trainable params
    r=8,
    # affects low-rank approximation aggressiveness
    # increasing this value speeds up training
    lora_alpha=64,
    # modules to apply the LoRA update matrices to
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "gate_proj",
        "down_proj",
        "up_proj",
        "o_proj"
    ],
    # determines the LoRA bias type, influencing training dynamics
    bias="none",
    # regulates model regularization; increasing it may lead to underfitting
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
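You can confirm how much LoRA shrinks the training problem by printing the trainable parameter count of the wrapped model (the exact figures depend on the rank and target modules chosen above):

# Prints the number of trainable parameters vs. total parameters, e.g.
# "trainable params: ... || all params: ... || trainable%: ..."
model.print_trainable_parameters()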
It's finally time to train our Llama-2 model on our new data (yay!). We'll be using the Transformers library to create a Trainer object for training the model. The Trainer takes the pre-trained model (Llama-2 7B chat), the tokenized training dataset, training arguments (defined below), and a data collator as input.
Training time depends on the size of the training data, the number of epochs, and the configuration of the GPU used. If you use the sample Hawaii wildfire dataset provided and run the notebook on Google Colab's T4 GPU, it should take around 1 hour 30 minutes to complete training for 3 epochs.
If you're fine-tuning on your own data, we highly recommend that you adjust the training parameters, particularly the learning rate and number of epochs, to achieve good performance with the fine-tuned model. While doing this, watch out for overfitting!
Keep in mind that increasing the learning rate might lead to faster convergence, but it might overshoot the optimal solution. Conversely, a lower value may result in slower training but better fine-tuning. Similarly, increasing the number of epochs may allow the model to learn more from the data, but it can lead to overfitting.
trainer = transformers.Trainer(
    model=model,                             # llama-2-7b-chat model
    train_dataset=tokenized_train_dataset,   # tokenized training data
    args=transformers.TrainingArguments(
        output_dir="./finetunedModel",       # directory where checkpoints are saved
        per_device_train_batch_size=2,       # number of samples processed in one forward/backward pass per GPU
        gradient_accumulation_steps=2,       # [default = 1] number of update steps to accumulate gradients for
        num_train_epochs=3,                  # [IMPORTANT] number of full passes through the entire training dataset
        learning_rate=1e-4,                  # [IMPORTANT] smaller LR for better fine-tuning
        bf16=False,                          # whether to train parameters in bf16 precision
        optim="paged_adamw_8bit",            # use paging to improve memory management of the default adamw optimizer
        logging_dir="./logs",                # directory to save training log outputs
        save_strategy="epoch",               # [default = "steps"] save a checkpoint at the end of each epoch
        save_steps=50,                       # save a checkpoint every 50 steps (ignored when save_strategy="epoch")
        logging_steps=10                     # how often to log the training loss
    ),
    # used to form a batch from a list of elements of train_dataset
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
# if use_cache is True, past key values are used to speed up decoding
# if applicable to the model. This defeats the purpose of fine-tuning
model.config.use_cache = False

# train the model based on the above config
trainer.train()
If you've made it this far, congratulations! You've successfully fine-tuned Llama-2 on your own data. Now, let's load the fine-tuned model using the same BitsAndBytesConfig we used previously. Make sure to choose the model checkpoint with the lowest training loss.
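One way to pick that checkpoint is to scan the trainer's logged history for the lowest recorded loss (a sketch that assumes the trainer object from the training step is still in memory):

# Inspect the logged training losses to help decide which checkpoint to load.
loss_logs = [entry for entry in trainer.state.log_history if "loss" in entry]
for entry in loss_logs:
    print(f"step {entry['step']}: loss = {entry['loss']:.4f}")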
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import BitsAndBytesConfig,LlamaTokenizer
from peft import PeftModel

base_model_id = "meta-llama/Llama-2-7b-chat-hf"
nf4Config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16
)
tokenizer = LlamaTokenizer.from_pretrained(base_model_id, use_fast=False,
trust_remote_code=True,
add_eos_token=True)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,                   # same as before
    quantization_config=nf4Config,   # same quantization config as before
    device_map="auto",
    trust_remote_code=True,
    use_auth_token=True
)
# reload the tokenizer for inference (without add_eos_token)
tokenizer = LlamaTokenizer.from_pretrained(base_model_id, use_fast=False,
                                           trust_remote_code=True)
modelFinetuned = PeftModel.from_pretrained(base_model,
"finetunedModel/checkpoint-1455")
It's now time to test your personal fine-tuned Llama-2! Below are some example responses to questions about details of the 2023 Hawaii wildfires.
### ENTER YOUR QUESTION BELOW
query = "Just answer this question: Tell me about the role of the Maui Emergency Management Agency (MEMA) in the 2023 wildfires??"

# Format the question
eval_prompt = f"{query}\n\n"

promptTokenized = tokenizer(eval_prompt, return_tensors="pt").to("cuda")
modelFinetuned.eval()
with torch.no_grad():
    print(tokenizer.decode(modelFinetuned.generate(**promptTokenized, max_new_tokens=1024)[0], skip_special_tokens=True))
torch.cuda.empty_cache()
Just answer this question: Tell me about the role of the Maui Emergency Management Agency (MEMA) in the 2023 wildfires??

MEMA played a crucial role in coordinating and facilitating the response to the 2023 wildfires. MEMA's primary role is to coordinate and facilitate the response to emergencies and disasters in Maui County.
In the case of the 2023 wildfires, MEMA's responsibilities included:
1. Coordinating with other emergency response agencies, such as the Maui Police Department, Maui Fire Department, and the Hawaii National Guard, to ensure a unified response.
2. Activating the Maui Emergency Operations Center (EOC), which served as the centralized location for coordination of response and support activities.
3. Providing situational reports and information to the County of Maui Mayor's office and other stakeholders.
4. Coordinating with media outlets to ensure accurate and timely dissemination of information to the public.
5. Managing and coordinating volunteer and donated resources.
6. Coordinating with other government agencies, such as the Federal Emergency Management Agency (FEMA), the American Red Cross, and other aid agencies.
7. Providing support to emergency shelters and other response efforts.
8. Coordinating with the Maui County Corporation Counsel's office to ensure legal support for emergency response and recovery efforts.
9. Coordinating with the Maui Police Department's Wailuku and Lanai districts to ensure support for the Wailuku and Lanai communities.
10. Coordinating with the Maui Fire Department's Wailuku and Lanai stations to ensure support for the Wailuku and Lanai fire units.
In summary, MEMA played a crucial role in coordinating and facilitating the response to the 2023 wildfires in Maui County. MEMA's responsibilities included coordinating with other emergency response agencies, activating the Maui EOC, providing situational reports and information, managing and coordinating volunteer and donated resources, and providing support to emergency shelters and other response efforts.
Another example:
# User enters their question below
user_question = "When did the Hawaii wildfires take place?"

# Format the question
eval_prompt = f"Question: {user_question}. Just answer this question accurately and concisely\n\n"

promptTokenized = tokenizer(eval_prompt, return_tensors="pt").to("cuda")
modelFinetuned.eval()
with torch.no_grad():
    print(tokenizer.decode(modelFinetuned.generate(**promptTokenized, max_new_tokens=1024)[0], skip_special_tokens=True))
torch.cuda.empty_cache()
Question: When did the Hawaii wildfires take place?. Just answer this question accurately and concisely

Answer: The Hawaii wildfires took place from August 8, 2023 to August 12, 2023.
We can see from the above examples that the model performs very well and demonstrates a strong understanding of the 2023 Hawaii wildfire incident!
This brings us to the end of the tutorial! Feel free to tinker with the notebook, fine-tune your personal Llama-2 chatbot on your own data, and have fun 🙂
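If you want to keep experimenting with your own questions, you could wrap the prompt formatting, generation, and decoding steps above into a small helper function (an illustrative sketch; the question shown is just an example):

# Illustrative helper that bundles the prompt/generate/decode steps into one call.
def ask(question: str, max_new_tokens: int = 512) -> str:
    prompt = f"Question: {question}. Just answer this question accurately and concisely\n\n"
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    modelFinetuned.eval()
    with torch.no_grad():
        output_ids = modelFinetuned.generate(**inputs, max_new_tokens=max_new_tokens)[0]
    torch.cuda.empty_cache()
    return tokenizer.decode(output_ids, skip_special_tokens=True)

print(ask("What areas were affected by the wildfires?"))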
Credit