Anand Das for Bito

Phi-1.5: Microsoft's 1.3B Parameters AI Model Beats Llama 2

Microsoft's recent unveiling of the Phi-1.5 AI model has sent ripples throughout the tech community. Its ability to match or even surpass larger models has made it a hot topic of conversation. This article delves into Phi-1.5's capabilities, how it differs from other models, and why it's generating so much buzz.

Introducing Phi-1.5: Small Size, Big Impact

Microsoft's Phi-1.5 is a groundbreaking language model boasting 1.3 billion parameters. What's impressive is its performance on tasks like common sense reasoning and coding, which is comparable to that of models 5-10 times its size.

Phi-1.5 was trained on a dataset of roughly 30 billion tokens, the core of which is synthetically generated "textbook-style" data concentrating on general knowledge and common sense.

Key Features:

  • Robust performance on benchmarks such as WinoGrande, ARC, and BoolQ.
  • Demonstrated expertise in multi-step reasoning tasks like math word problems and coding.
  • Exhibits capabilities like thinking step-by-step and executing simple coding prompts.

Read the Research Paper: Textbooks Are All You Need II: phi-1.5 technical report
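
To make the "simple coding prompts" capability above concrete, here is a minimal sketch of querying Phi-1.5 locally with Hugging Face Transformers. It assumes the checkpoint is published under the microsoft/phi-1_5 model ID and that transformers and torch are installed; the prompt and generation settings are illustrative, not an official recipe.

```python
# Minimal sketch: prompting Phi-1.5 as a plain text-completion model.
# Assumes the checkpoint is available on Hugging Face as "microsoft/phi-1_5";
# adjust the model ID if it is published under a different name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# On some transformers versions you may also need trust_remote_code=True here.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

# Phi-1.5 is a base model (not chat-tuned), so the task is phrased as a
# code-completion prompt in the "textbook" style it was trained on.
prompt = '''def print_primes(n):
    """Print all prime numbers between 2 and n."""
'''

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=120)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because it is a base completion model rather than a chat assistant, Phi-1.5 tends to respond best to textbook- or code-style prompts like the docstring above.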


Benchmark Results

How does Phi-1.5 stack up against heavyweights in the AI domain?

1. Common Sense Reasoning Benchmarks

| Model | WinoGrande | ARC-Easy | ARC-Challenge | BoolQ | SIQA |
| --- | --- | --- | --- | --- | --- |
| Vicuna-13B (v1.1) | 0.708 | 0.754 | 0.432 | 0.835 | 0.437 |
| Llama2-7B | 0.691 | 0.763 | 0.434 | 0.779 | 0.480 |
| Llama-7B | 0.669 | 0.682 | 0.385 | 0.732 | 0.466 |
| MPT-7B | 0.680 | 0.749 | 0.405 | 0.739 | 0.451 |
| Falcon-7B | 0.662 | 0.719 | 0.363 | 0.685 | 0.452 |
| Falcon-rw-1.3B | 0.607 | 0.633 | 0.282 | 0.632 | 0.405 |
| OPT-1.3B | 0.610 | 0.570 | 0.232 | 0.596 | — |
| GPT-Neo-2.7B | 0.577 | 0.611 | 0.274 | 0.618 | 0.400 |
| GPT2-XL-1.5B | 0.583 | 0.583 | 0.250 | 0.618 | 0.394 |
| phi-1.5-web-only (1.3B) | 0.604 | 0.666 | 0.329 | 0.632 | 0.414 |
| phi-1.5-web (1.3B) | 0.740 | 0.761 | 0.449 | 0.728 | 0.530 |
| phi-1.5 (1.3B) | 0.734 | 0.756 | 0.444 | 0.758 | 0.526 |

(Dashes here and in the tables below mark values not reported in the source.)

2. Language Understanding and Knowledge Benchmarks

| Model | PIQA | Hellaswag | MMLU | OpenbookQA | SQuAD (EM) |
| --- | --- | --- | --- | --- | --- |
| Vicuna-13B | 0.774 | 0.578 | 0.330 | — | — |
| Llama2-7B | 0.781 | 0.571 | 0.453 | 0.314 | 0.67 |
| Llama-7B | 0.779 | 0.562 | 0.352 | 0.284 | 0.60 |
| MPT-7B | 0.789 | 0.571 | 0.268 | 0.314 | 0.60 |
| Falcon-7B | 0.794 | 0.542 | 0.269 | 0.320 | 0.16 |
| Falcon-rw-1.3B | 0.747 | 0.466 | 0.259 | 0.244 | — |
| OPT-1.3B | 0.690 | 0.415 | 0.240 | — | — |
| GPT-Neo-2.7B | 0.729 | 0.427 | 0.232 | — | — |
| GPT2-XL-1.5B | 0.705 | 0.400 | 0.224 | — | — |
| phi-1.5-web-only (1.3B) | 0.743 | 0.478 | 0.309 | 0.274 | — |
| phi-1.5-web (1.3B) | 0.770 | 0.484 | 0.379 | 0.360 | 0.74 |
| phi-1.5 (1.3B) | 0.766 | 0.476 | 0.376 | 0.372 | 0.72 |

3. Multi-Step Reasoning Benchmarks

| Model | GSM8K | HumanEval | MBPP |
| --- | --- | --- | --- |
| Llama-65B | 50.9 | 23.7 | 37.7 |
| Vicuna-13B | 13.4 | — | — |
| Llama2-7B | 14.6 | 12.8 | 20.8 |
| Llama-7B | 11.0 | 11.4 | 17.7 |
| MPT-7B | 6.8 | 18.3 | 22.6 |
| Falcon-7B | 6.8 | 0 | 11.7 |
| Falcon-rw-1.3B | < 3 (random guessing) | 0 | 0 |
| OPT-1.3B | < 3 | 0 | 0 |
| GPT-Neo-2.7B | < 3 | 6.41 | — |
| GPT2-XL-1.5B | < 3 | 0 | 0 |
| phi-1.5-web-only (1.3B) | < 3 | 17.2 | 27.3 |
| phi-1.5-web (1.3B) | 44.6 (via coding) | 41.4 | 43.5 |
| phi-1.5 (1.3B) | 40.2 (via coding) | 34.1 | 37.7 |
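
A note on the "(via coding)" annotation: for GSM8K, the phi-1.5 models were scored by letting them write a short Python program whose printed output is taken as the answer, rather than having them state the number directly. The sketch below shows roughly what such an evaluation step could look like; the prompt template, the generate_fn wrapper, and the answer check are illustrative assumptions, not the paper's exact harness.

```python
# Illustrative sketch of scoring a math word problem "via coding": the model
# writes a Python program, we execute it, and its printed output is its answer.
# The prompt template and extraction logic are assumptions for demonstration;
# the paper's actual evaluation harness may differ.
import io
import contextlib

def answer_via_code(generate_fn, question: str) -> str:
    """generate_fn(prompt: str) -> str is any text-completion call, e.g. Phi-1.5."""
    prompt = (
        f'"""{question}\n'
        'Write a Python program that computes the answer and prints it."""\n'
    )
    program = generate_fn(prompt)

    # Run the generated program and capture what it prints.
    # (A real harness would sandbox this and enforce a timeout.)
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(program, {})
    return buffer.getvalue().strip()

def is_correct(predicted: str, gold: str) -> bool:
    """Compare numeric answers, tolerating formatting differences."""
    try:
        return abs(float(predicted) - float(gold)) < 1e-6
    except ValueError:
        return predicted == gold
```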

These benchmarks paint a clear picture: Phi-1.5 is a serious contender even against models with far larger parameter counts.


What Makes Phi-1.5 Special?

1. Data Quality Over Quantity:

One of the standout features of Phi-1.5 is its focus on high-quality training data. Instead of sheer volume, Microsoft emphasized the significance of using "textbook-style" data for training.

2. Enhanced with Filtered Web Data:

Apart from its primary training, the model has a sibling named phi-1.5-web. This version, augmented with filtered web data, showed even more promising results across multiple benchmarks.

3. Not Just About Size:

Size isn't everything. While Phi-1.5 has only 1.3 billion parameters, it consistently matches or outperforms models many times its size. This breakthrough has dispelled the myth that bigger is always better in the world of AI.


Areas for Further Exploration

While Phi-1.5 represents a significant leap in model efficiency, there are some unanswered questions:

  • How will it perform outside research environments?
  • Despite its prowess in reasoning, can it truly match human-like thinking?

The model's real-world applicability and flexibility remain to be tested extensively.


The Potential Future of AI Models

Microsoft's Phi-1.5 presents a compelling case for the AI community. It challenges the age-old belief that "bigger is better", proving that with the right kind of training data, even smaller models can achieve wonders.

This introduces the exciting possibility of a more environmentally sustainable AI, given the vast amounts of energy required to train large models.


Conclusion

In a world where data is constantly expanding, Microsoft's Phi-1.5 has redefined what's possible with AI. It's not just about having more data or a bigger model; it's about using the right kind of data effectively.

As Phi-1.5 continues to be tested and refined, one thing is clear: the future of AI looks promising, efficient, and more accessible to a wider audience.
