LLaMA language model tamed by ancient Windows 98 computer with 128MB RAM

In brief: A group of artificial intelligence researchers has demonstrated running a powerful AI language model on a Windows 98 machine. And we’re not talking about just any old PC, but a vintage Pentium II system with a mere 128MB of RAM. The team behind the experiment is EXO Labs, an organization formed by researchers and engineers from Oxford University.

In a video shared on X, EXO Labs fired up a dusty Elonex Pentium II 350MHz system running Windows 98. Instead of playing Minesweeper or browsing with Netscape Navigator, the PC was put through its paces with something far more demanding: running an AI model.

This model was based on Andrej Karpathy’s Llama2.c code. Against all odds, the computer managed to generate a coherent story on command. It did it at a decent speed, too, which is usually difficult with AI models run locally.

Pace is already a big enough challenge, but another hurdle the team had to overcome was getting modern code to compile and run on an operating system from 1998.

Eventually, they managed to sustain a performance of 39.31 tokens per second running a Llama-based LLM with 260,000 parameters. Cranking up the model size significantly reduced the performance, though. For instance, the 1 billion parameter Llama 3.2 model barely managed 0.0093 tokens per second on the vintage hardware.

As for why the team is trying so hard to run AI models that typically require powerful server hardware on such ancient machines, the goal is to develop AI models that can run on even the most modest of devices. EXO Labs’ mission is to “democratize access to AI” and prevent a handful of tech giants from monopolizing this game-changing technology.

For this, the company is developing what it calls the “BitNet” – a transformer architecture that uses ternary weights to drastically reduce model size. With this architecture, a 7 billion parameter model needs just 1.38GB of storage, making it feasible to run on most budget hardware.

Related reading: Meet Transformers: The Google Breakthrough that Rewrote AI’s Roadmap

BitNet is also designed to be CPU-first, avoiding the need for expensive GPUs. More impressively, the architecture can leverage a staggering 100 billion parameter model on a single CPU while maintaining human reading speeds at 5-7 tokens per second.

If you’re keen to join the locally-run models revolution, EXO Labs is actively seeking contributors. Just check out the full blog post to get a better idea of the mission.

Related Content

Rogers receives highest number of CCTS complaints for the second consecutive year

OnePlus 13R now available in Canada for $849.99

OnePlus Open 2 might be the thinnest foldable yet

Leave a Comment