A London-based startup has trained an artificial intelligence model on 4.1 million recipes written in seven different languages, then compressed what the model learned into a single 2-megabyte file. If the claim holds up, it suggests that vast amounts of culinary knowledge can be distilled into a tiny digital footprint, raising questions about how much data is really needed for AI to understand cooking patterns.
The recipe trove
The startup gathered recipes from online sources, cookbooks, and user submissions across English, Spanish, French, German, Italian, Chinese, and Arabic. The collection spans 4.1 million recipes, from a simple omelet to elaborate multi-step feasts. According to the startup, the model learned ingredient combinations, cooking techniques, and regional variations without memorizing every single instruction.
Instead of storing each recipe as a separate entry, the AI found the underlying structures: which spices pair with which proteins, how heat changes textures, what substitutions are common. The 2MB file holds the compressed essence of that knowledge.
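A quick back-of-the-envelope calculation shows why the file cannot be storing recipes verbatim. The sketch below uses the two figures reported here and assumes that "2-megabyte" means 2,000,000 bytes; the variable names are illustrative only.

```python
# How much storage does a 2 MB file leave for each training recipe?
# Assumption: "2-megabyte" means 2,000,000 bytes (decimal megabytes).

RECIPES = 4_100_000          # recipe count reported in the article
FILE_BYTES = 2 * 1_000_000   # assumed file size in bytes

bytes_per_recipe = FILE_BYTES / RECIPES
bits_per_recipe = bytes_per_recipe * 8

print(f"{bytes_per_recipe:.2f} bytes per recipe")  # 0.49 bytes per recipe
print(f"{bits_per_recipe:.1f} bits per recipe")    # 3.9 bits per recipe
```

At roughly four bits per recipe, less than a single character of text, the file can only hold shared patterns, not the individual recipes themselves.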
Why size matters
Most large language models today require gigabytes or even terabytes of storage. A 2MB model that captures broad culinary knowledge could run on a smartphone, a smart appliance, or even a low-power chip. That opens up new possibilities, such as a fridge that suggests dinner based on what’s inside, or a portable device that translates a foreign recipe in real time.
The startup declined to share the exact architecture or training method, but said the compression was achieved by pruning redundant patterns and encoding only the most efficient representations of culinary logic. The model isn’t just a lookup table; it can infer missing ingredients or adjust portion sizes.
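The startup has not said what "pruning" means in its case. For readers unfamiliar with the term, one common compression technique is magnitude pruning, which zeroes out a model's smallest weights so that the rest can be stored compactly. The sketch below is a generic illustration of that idea, not the company's method; the function name `magnitude_prune` and the sample weights are hypothetical.

```python
def magnitude_prune(weights, keep_fraction):
    """Zero out all but the largest-magnitude weights (generic illustration)."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be between 0 and 1")
    keep = round(len(weights) * keep_fraction)
    # Rank weight positions by absolute value, largest first.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    kept = set(ranked[:keep])
    # Keep the top-ranked weights unchanged and set the others to zero.
    return [w if i in kept else 0.0 for i, w in enumerate(weights)]


# Hypothetical weights, invented for this example.
weights = [0.91, -0.02, 0.40, 0.003, -0.75, 0.08]
pruned = magnitude_prune(weights, keep_fraction=0.5)
print(pruned)  # [0.91, 0.0, 0.4, 0.0, -0.75, 0.0]
```

Zeroed weights do not need to be stored individually, which is where the space saving comes from. How much accuracy survives depends on the model and on how aggressively it is pruned.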
What it might be used for
So far the company hasn’t announced a product. Potential applications include personalized meal planning, cross-cultural recipe translation, nutritional analysis, and automated cooking assistants. A 2MB model could be embedded into kitchen gadgets without needing a cloud connection, meaning faster responses and better privacy.
But there’s a catch. The model was trained on 4.1 million recipes, yet the compressed version’s performance hasn’t been publicly benchmarked. Does it generate novel recipes as well as it classifies existing ones? Can it handle a cuisine it barely saw in training? The startup hasn’t released test results.
For now, the team is refining the approach and looking for partners in the food-tech industry. Whether the 2MB model will end up in a smart oven, a recipe app, or stay a research curiosity is the open question they’ll need to answer next.




