Orca-Math is a 7B-parameter small language model (SLM) based on Mistral-7B. It achieves 86.81% accuracy on the GSM8k benchmark without requiring multiple model calls, verifiers, or external tools. The key elements of Orca-Math's approach are:
- A high-quality synthetic dataset of 200,000 math problems, created using a multi-agent setup in which agents collaborate to generate the data.
- An iterative learning technique that lets the SLM practice solving problems, receive feedback on its solutions, and learn from preference pairs built from the SLM's solutions and that feedback.
The dataset is available on HuggingFace.
Recommended Reading [Papers Explained 160: Orca] [Papers Explained 161: Orca-2]
Seed set
A total of 36,217 math word problems are collected from existing open-source datasets, namely NumGLUE, AddSub, ALGES, ASDiv, DRAW, GSM8k, MATHQA, MultiArith, SingleOP, and SingleEQ.
Agent — Ask Me Anything
The seed set is expanded by creating multiple word problems from each problem in the set, using an agent prompt.
This agent creates a total of 120,445 new problems. The solutions to these word problems are generated using GPT-4 Turbo.
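The expansion step can be sketched roughly as follows, assuming an OpenAI-style chat API; the prompt text, model alias, and function names are placeholders rather than the actual pipeline.

```python
# Rough sketch of the "Ask Me Anything" expansion step (illustrative only).
from openai import OpenAI

client = OpenAI()

ASK_ME_ANYTHING_PROMPT = "..."  # placeholder for the expansion prompt used by the agent


def expand_problem(seed_problem: str) -> str:
    """Ask GPT-4 Turbo to create multiple new word problems from one seed problem."""
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # assumed model alias
        messages=[
            {"role": "system", "content": ASK_ME_ANYTHING_PROMPT},
            {"role": "user", "content": seed_problem},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content


def solve_problem(problem: str) -> str:
    """Generate a solution for a generated word problem with GPT-4 Turbo."""
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": problem}],
    )
    return response.choices[0].message.content
```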
Agent — Suggester & Editor
The seed set is further expanded by creating challenging problems using two new agents, namely Suggester and Editor. The Suggester examines a given problem and proposes several methods for increasing its complexity without writing the new problem itself. The Editor then takes the original word problem and the Suggester's recommendations and generates an updated, more challenging problem.
Two rounds of iterations are performed per problem. Each round uses GPT-4 Turbo to generate a response; if the generated answer exceeds 1800 characters, it is filtered out. This process yields 37,157 problems.
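A rough sketch of how the two Suggester/Editor rounds and the 1800-character filter could be wired together; the prompt wording and helper below are assumptions, and only the round count and length cutoff come from the description above.

```python
# Illustrative sketch of the Suggester & Editor loop, assuming an OpenAI-style chat API.
from openai import OpenAI

client = OpenAI()
MAX_ANSWER_CHARS = 1800  # answers longer than this are filtered out


def chat(prompt: str) -> str:
    out = client.chat.completions.create(
        model="gpt-4-turbo",  # assumed model alias
        messages=[{"role": "user", "content": prompt}],
    )
    return out.choices[0].message.content


def harden_problem(problem: str, rounds: int = 2) -> list[str]:
    """Run suggest-then-edit rounds and keep variants whose answers pass the length filter."""
    kept, current = [], problem
    for _ in range(rounds):
        # Suggester: propose ways to increase difficulty without writing the new problem.
        suggestions = chat(f"Suggest ways to make this problem more complex:\n{current}")
        # Editor: rewrite the problem using those suggestions.
        current = chat(
            f"Rewrite the problem using these suggestions.\n"
            f"Problem: {current}\nSuggestions: {suggestions}"
        )
        answer = chat(current)  # GPT-4 Turbo answers the updated problem
        if len(answer) <= MAX_ANSWER_CHARS:
            kept.append(current)
    return kept
```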
DMath
Additionally, 6,216 problems sourced from DMath are included. These are the subset of the 7,943 problems in the DMath training set for which the solution computed by GPT-4 Turbo matches the exact gold-standard answer.
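The DMath selection amounts to a simple exact-match filter; the field names in this sketch are assumptions.

```python
def filter_dmath(problems: list[dict]) -> list[dict]:
    """Keep only problems where the GPT-4 Turbo answer matches the gold answer exactly."""
    return [
        p for p in problems
        if p["gpt4_turbo_answer"].strip() == p["gold_answer"].strip()
    ]
```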
Supervised Fine-Tuning Experiment (Iteration #1)
Mistral-7B is fine-tuned on the Orca-Math-200K dataset for one epoch without packing. The loss is computed only on the answer tokens, and the data is formatted with a fixed instruction template.
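A minimal sketch of computing the loss only on answer tokens: prompt tokens receive a label of -100 so cross-entropy ignores them. The template string here is a placeholder, not the paper's actual instruction format.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")


def build_example(question: str, answer: str, max_len: int = 2048) -> dict:
    """Tokenize one (question, answer) pair with the loss masked over the prompt."""
    prompt = f"### Question:\n{question}\n\n### Answer:\n"  # placeholder template
    prompt_ids = [tokenizer.bos_token_id] + tokenizer(prompt, add_special_tokens=False)["input_ids"]
    answer_ids = tokenizer(answer + tokenizer.eos_token, add_special_tokens=False)["input_ids"]

    input_ids = (prompt_ids + answer_ids)[:max_len]
    # -100 tells the cross-entropy loss to ignore prompt tokens; only answer tokens are trained on.
    labels = ([-100] * len(prompt_ids) + answer_ids)[:max_len]
    return {"input_ids": input_ids, "labels": labels}
```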
Iterative Learning from both Positive and Negative Signals
Dataset Construction Iteration #2
To generate additional positive and negative solutions for each problem, four responses are sampled from the iteration-1 SFT-tuned model (top_p = 0.95 and temperature = 0.7). GPT4-based exact match is then used to assess the agreement between the teacher's (GPT-4 Turbo) answer and the student's answer. All solutions where the student-generated answer does not match the teacher's answer are labeled as negative; the rest are labeled positive. A preference dataset is then constructed.
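In code, the iteration-2 preference-data construction might look roughly like the sketch below; the sampling helper, matching function, field names, and the fallback to the teacher solution are all assumptions.

```python
def build_preference_data(problems, sample_student, answers_match, n_samples: int = 4):
    """Sample student solutions per problem and pair positives with negatives."""
    pairs = []
    for p in problems:
        # Four samples from the iteration-1 SFT model with top_p = 0.95, temperature = 0.7.
        candidates = [
            sample_student(p["question"], top_p=0.95, temperature=0.7)
            for _ in range(n_samples)
        ]
        positives = [c for c in candidates if answers_match(c, p["teacher_answer"])]
        negatives = [c for c in candidates if not answers_match(c, p["teacher_answer"])]
        # Assumption: fall back to the teacher's solution when no sampled solution is correct.
        positives = positives or [p["teacher_solution"]]
        for good in positives:
            for bad in negatives:
                pairs.append({"prompt": p["question"], "chosen": good, "rejected": bad})
    return pairs
```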
Dataset Construction Iteration #3
Let M2 denote the model trained with KTO on the dataset constructed for Iteration #2. The same dataset-construction procedure is replicated for Iteration #3; however, M2 is used to generate the four responses instead of the SFT-tuned model from Iteration #1.
Mistral-7B is fine-tuned for up to three iterations. In the first iteration, supervised fine-tuning is used to obtain M1. For the second iteration, SFT, DPO, and KTO are compared; the model trained with KTO performs better in this group and is referred to as M2. M2 is then used to generate the dataset for Iteration #3. In the third iteration, DPO and KTO are compared, with M2 serving as the starting point. These models are also compared against three epochs of SFT training on the Orca-Math-200K dataset.
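One way to reproduce the KTO stage is with Hugging Face TRL's KTOTrainer, sketched below under stated assumptions: argument names follow recent TRL versions, and the model path, hyperparameters, and data rows are placeholders, not the paper's configuration.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

# Placeholder path for the iteration-1 SFT checkpoint (M1).
model = AutoModelForCausalLM.from_pretrained("path/to/M1")
tokenizer = AutoTokenizer.from_pretrained("path/to/M1")

# KTO uses unpaired examples: each row is one completion labeled good (True) or bad (False).
train_dataset = Dataset.from_list([
    {"prompt": "A word problem ...", "completion": "A sampled solution ...", "label": True},
    {"prompt": "A word problem ...", "completion": "An incorrect solution ...", "label": False},
])

trainer = KTOTrainer(
    model=model,
    args=KTOConfig(output_dir="orca-math-kto", per_device_train_batch_size=4),
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```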
Performance Against Other LLMs
- The model exceeds much larger models such as LLaMA-2-70B (56.8%), WizardMath-70B (81.6%), Gemini Pro (86.5% with 32 trials), and GPT-3.5 (77.4%).
- Most notably, it reaches this level with only 200K examples (orders of magnitude fewer than other datasets).
Orca-Math: Unlocking the potential of SLMs in Grade School Math 2402.14830
Recommended Reading [Orca Series] [Small LLMs]