Orca-Math is a 7B-parameter small language model (SLM) based on Mistral-7B. It achieves 86.81% accuracy on the GSM8k benchmark without requiring multiple model calls, verifiers, or external tools. The key elements of Orca-Math's approach are:
- A high-quality synthetic dataset of 200,000 math problems, created using a multi-agent setup in which agents collaborate to generate the data.
- An iterative learning technique that lets the SLM practice solving problems, receive feedback on its solutions, and learn from preference pairs built from the SLM's solutions and that feedback.
The dataset is available on HuggingFace.
Recommended Reading [Papers Explained 160: Orca] [Papers Explained 161: Orca-2]
Seed set
A total of 36,217 math word problems are collected from existing open-source datasets, namely NumGLUE, AddSub, ALGES, ASDiv, DRAW, GSM8k, MATHQA, MultiArith, SingleOP, and SingleEQ.
Agent — Ask Me Anything
The seed set is expanded by creating multiple word problems from each problem in the set, using an agent prompt.
This agent creates a total of 120,445 new problems. The solutions to these word problems are generated using GPT-4 Turbo.
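The expansion step can be sketched roughly as follows, assuming an OpenAI-style chat API; the prompt text, model alias, and function names are placeholders rather than the actual pipeline.

```python
# Rough sketch of the "Ask Me Anything" expansion step (illustrative only).
from openai import OpenAI

client = OpenAI()

ASK_ME_ANYTHING_PROMPT = "..."  # placeholder for the expansion prompt used by the agent


def expand_problem(seed_problem: str) -> str:
    """Ask GPT-4 Turbo to create multiple new word problems from one seed problem."""
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # assumed model alias
        messages=[
            {"role": "system", "content": ASK_ME_ANYTHING_PROMPT},
            {"role": "user", "content": seed_problem},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content


def solve_problem(problem: str) -> str:
    """Generate a solution for a generated word problem with GPT-4 Turbo."""
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": problem}],
    )
    return response.choices[0].message.content
```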
Agent — Suggester & Editor
The seed set is further expanded by creating challenging problems using two new agents, namely Suggester and Editor. The Suggester examines a given problem and proposes several methods for increasing its complexity without writing the new problem itself. The Editor then takes the original word problem and the Suggester's recommendations and generates an updated, more challenging problem.
Two rounds of iterations are performed per problem. Each round uses GPT-4 Turbo to generate a response; if the generated answer exceeds 1800 characters, it is filtered out. This process yields 37,157 problems.
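A rough sketch of how the two Suggester/Editor rounds and the 1800-character filter could be wired together; the prompt wording and helper below are assumptions, and only the round count and length cutoff come from the description above.

```python
# Illustrative sketch of the Suggester & Editor loop, assuming an OpenAI-style chat API.
from openai import OpenAI

client = OpenAI()
MAX_ANSWER_CHARS = 1800  # answers longer than this are filtered out


def chat(prompt: str) -> str:
    out = client.chat.completions.create(
        model="gpt-4-turbo",  # assumed model alias
        messages=[{"role": "user", "content": prompt}],
    )
    return out.choices[0].message.content


def harden_problem(problem: str, rounds: int = 2) -> list[str]:
    """Run suggest-then-edit rounds and keep variants whose answers pass the length filter."""
    kept, current = [], problem
    for _ in range(rounds):
        # Suggester: propose ways to increase difficulty without writing the new problem.
        suggestions = chat(f"Suggest ways to make this problem more complex:\n{current}")
        # Editor: rewrite the problem using those suggestions.
        current = chat(
            f"Rewrite the problem using these suggestions.\n"
            f"Problem: {current}\nSuggestions: {suggestions}"
        )
        answer = chat(current)  # GPT-4 Turbo answers the updated problem
        if len(answer) <= MAX_ANSWER_CHARS:
            kept.append(current)
    return kept
```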
DMath
Additionally, 6,216 problems sourced from DMath are included. These are the subset of the 7,943 problems in the DMath training set for which the solution computed by GPT-4 Turbo matches the exact gold-standard answer.
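The DMath selection amounts to a simple exact-match filter; the field names in this sketch are assumptions.

```python
def filter_dmath(problems: list[dict]) -> list[dict]:
    """Keep only problems where the GPT-4 Turbo answer matches the gold answer exactly."""
    return [
        p for p in problems
        if p["gpt4_turbo_answer"].strip() == p["gold_answer"].strip()
    ]
```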
Supervised Fine-Tuning Experiment (Iteration #1)
Mistral-7B is fine-tuned on the Orca-Math-200K dataset for one epoch without packing. The loss is computed only on the answer tokens, and the data is formatted with a fixed instruction template.
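A minimal sketch of computing the loss only on answer tokens: prompt tokens receive a label of -100 so cross-entropy ignores them. The template string here is a placeholder, not the paper's actual instruction format.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")


def build_example(question: str, answer: str, max_len: int = 2048) -> dict:
    """Tokenize one (question, answer) pair with the loss masked over the prompt."""
    prompt = f"### Question:\n{question}\n\n### Answer:\n"  # placeholder template
    prompt_ids = [tokenizer.bos_token_id] + tokenizer(prompt, add_special_tokens=False)["input_ids"]
    answer_ids = tokenizer(answer + tokenizer.eos_token, add_special_tokens=False)["input_ids"]

    input_ids = (prompt_ids + answer_ids)[:max_len]
    # -100 tells the cross-entropy loss to ignore prompt tokens; only answer tokens are trained on.
    labels = ([-100] * len(prompt_ids) + answer_ids)[:max_len]
    return {"input_ids": input_ids, "labels": labels}
```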
Iterative Learning from both Positive and Negative Signals
Dataset Construction Iteration #2
To generate additional positive and negative solutions for each problem, four responses are sampled from the iteration-1 SFT-tuned model (top_p = 0.95 and temperature = 0.7). GPT4-based exact match is then used to assess the agreement between the teacher's (GPT-4 Turbo) answer and the student's answer. All solutions where the student-generated answer does not match the teacher's answer are labeled as negative; the rest are labeled positive. A preference dataset is then constructed.
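In code, the iteration-2 preference-data construction might look roughly like the sketch below; the sampling helper, matching function, field names, and the fallback to the teacher solution are all assumptions.

```python
def build_preference_data(problems, sample_student, answers_match, n_samples: int = 4):
    """Sample student solutions per problem and pair positives with negatives."""
    pairs = []
    for p in problems:
        # Four samples from the iteration-1 SFT model with top_p = 0.95, temperature = 0.7.
        candidates = [
            sample_student(p["question"], top_p=0.95, temperature=0.7)
            for _ in range(n_samples)
        ]
        positives = [c for c in candidates if answers_match(c, p["teacher_answer"])]
        negatives = [c for c in candidates if not answers_match(c, p["teacher_answer"])]
        # Assumption: fall back to the teacher's solution when no sampled solution is correct.
        positives = positives or [p["teacher_solution"]]
        for good in positives:
            for bad in negatives:
                pairs.append({"prompt": p["question"], "chosen": good, "rejected": bad})
    return pairs
```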
Dataset Construction Iteration #3
Let M2 denote the model trained with KTO on the dataset constructed for Iteration #2. The same dataset-construction procedure is replicated for Iteration #3; however, M2 is used to generate the four responses instead of the SFT-tuned model from Iteration #1.
Mistral-7B is fine-tuned for up to three iterations. In the first iteration, supervised fine-tuning is used to obtain M1. For the second iteration, SFT, DPO, and KTO are compared; the model trained with KTO performs better in this group and is referred to as M2. M2 is then used to generate the dataset for Iteration #3. In the third iteration, DPO and KTO are compared, with M2 serving as the starting point. These models are also compared against three epochs of SFT training on the Orca-Math-200K dataset.
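One way to reproduce the KTO stage is with Hugging Face TRL's KTOTrainer, sketched below under stated assumptions: argument names follow recent TRL versions, and the model path, hyperparameters, and data rows are placeholders, not the paper's configuration.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

# Placeholder path for the iteration-1 SFT checkpoint (M1).
model = AutoModelForCausalLM.from_pretrained("path/to/M1")
tokenizer = AutoTokenizer.from_pretrained("path/to/M1")

# KTO uses unpaired examples: each row is one completion labeled good (True) or bad (False).
train_dataset = Dataset.from_list([
    {"prompt": "A word problem ...", "completion": "A sampled solution ...", "label": True},
    {"prompt": "A word problem ...", "completion": "An incorrect solution ...", "label": False},
])

trainer = KTOTrainer(
    model=model,
    args=KTOConfig(output_dir="orca-math-kto", per_device_train_batch_size=4),
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```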
Performance Against Other LLMs
- The model exceeds much larger models such as LLaMA-2-70B (56.8%), WizardMath-70B (81.6%), Gemini Pro (86.5% with 32 trials), and GPT-3.5 (77.4%).
- Most notably, it reaches this level with only 200K examples (orders of magnitude fewer than other datasets).
Orca-Math: Unlocking the potential of SLMs in Grade School Math 2402.14830
Recommended Reading [Orca Series] [Small LLMs]