LLMs, Lossy Compression and The Creation of the Universe

On the meaning of intelligence, and how it relates to lossy compression and the Big Bang.


If we look closely at Large Language Models (LLMs), they are not just text predictors; they are extraordinary engines of lossy compression.

Think about it. A model like GPT or DeepSeek is trained on trillions of tokens, entire slices of the internet, and compresses all that chaotic, high-dimensional information into just a few billion parameters. That is an unimaginable reduction in data size, and yet the resulting model still preserves the essence of what it was trained on.

Just as JPEG compression preserves the perceptual quality of an image while discarding detail the eye barely notices, LLMs preserve the informational core of human communication: the patterns, syntax, semantics, and relationships that make our language meaningful, while filtering out the noise, redundancy, and incidental specifics.

When we query an LLM, it doesn't recall an original source file. It reconstructs meaning from this compressed representation. That's why its responses feel familiar yet fresh. It has retained the structure of human thought, not the literal text.

What LLMs Actually Do: Predictive Compression

At their core, LLMs like GPT are next-token predictors trained to minimize a loss function (cross-entropy) over massive text corpora. That training objective, predicting the next token as efficiently as possible, is a form of compression.

In fact, Shannon's source coding theorem tells us that the better you can predict a source, the fewer bits you need to encode it; a perfect predictor compresses it all the way down to its entropy limit.

So LLMs are implicitly learning to compress natural language data into internal statistical representations that are:

  • smaller (in bits) than the raw text,
  • but sufficient to reconstruct the distribution of plausible continuations.
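As a minimal sketch of this idea (the corpus and the unigram character model below are toy assumptions, nothing like a real LLM), even a crude statistical model assigns text a code length far below the raw byte encoding:

```python
import math
from collections import Counter

# Toy corpus (hypothetical): repetitive English-like text.
text = "the cat sat on the mat. the dog sat on the log. " * 50

# A crude "model": unigram character probabilities estimated from counts.
counts = Counter(text)
total = len(text)

# Ideal code length under the model: -log2 p(c) bits per character.
# By Shannon's source coding theorem, no lossless code can beat the
# source entropy on average, and a good model approaches that limit.
model_bits = sum(-math.log2(counts[c] / total) for c in text)
raw_bits = 8 * len(text)  # naive one-byte-per-character encoding

print(f"model: {model_bits / len(text):.2f} bits/char, raw: 8.00 bits/char")
```

A real LLM plays the same game with a vastly better conditional model, so its implied code length per token is far shorter still.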

It's Lossy by Necessity and That's the Point

Unlike ZIP or PNG compression, LLMs don't reconstruct the original data exactly. Instead, they preserve meaning, structure, and probability distributions, not literal words.

This is the essence of lossy compression:

  • Irrelevant detail is discarded.
  • Only information that's predictively useful for understanding or generating the world remains.

An LLM therefore builds a semantic lossy compressor for human language, one that preserves information that matters to future prediction, and throws away the rest.

LLMs Learn Compressed World Models

When you train a large model on billions of sentences, it doesn't memorize them verbatim (though it may memorize fragments). Instead, it constructs latent representations that summarize broad regularities:

  • syntactic rules,
  • conceptual relationships,
  • causal and temporal patterns,
  • even physical and social models of the world.

In other words, LLMs learn a compressed world model, a representation that can generate plausible language without needing to store every observed sequence.

That's exactly what good lossy compression does. It builds a generative model that can reconstruct perceptually equivalent data from a smaller code.

Philosophically, you could view meaning as a compression function:

The meaning of "cat" is a compact encoding of thousands of sensory and linguistic experiences of cats.

LLMs teach us that intelligence itself may be a process of building increasingly powerful lossy compressors of the universe. Each new token prediction forces the model to refine its internal compression scheme, balancing what to retain (semantic essence) and what to discard (surface noise).

Language Modeling as Compression

DeepMind’s 2024 paper "Language Modeling Is Compression" formalized this idea beautifully. They showed that language modeling can be seen as a general-purpose form of compression, where predicting the next token is mathematically equivalent to minimizing the code length of the text. In other words, a good language model is also a good compressor.

That unites two worlds, information theory and intelligence, under one principle:

To predict well is to compress well.

This view reframes intelligence as a continuous process of lossy compression, keeping what's relevant for future prediction while discarding the irrelevant.
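The equivalence can be made concrete: the ideal arithmetic-coding length of a sequence under a model is exactly the model's cross-entropy in bits. In this sketch (both "models" are invented toys for illustration), the model that predicts better encodes the same sequence in fewer bits:

```python
import math

# Ideal code length under a predictive model is
# sum(-log2 p(symbol | context)): the model's cross-entropy in bits.
sequence = list("abababab")

def code_length_bits(predict, seq):
    """Total bits needed to encode seq under the model's predictions."""
    return sum(-math.log2(predict(seq[:i], s)) for i, s in enumerate(seq))

def uniform(ctx, s):
    # Knows nothing: 1 bit per symbol over the alphabet {a, b}.
    return 0.5

def alternating(ctx, s):
    # Has learned the pattern: puts 0.9 on the expected next symbol.
    expected = "a" if len(ctx) % 2 == 0 else "b"
    return 0.9 if s == expected else 0.1

print(code_length_bits(uniform, sequence))      # 8.0 bits
print(code_length_bits(alternating, sequence))  # ~1.2 bits
```

Minimizing cross-entropy during training is therefore literally minimizing the length of the code the model assigns to its training data.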

Practical Implications

This concept isn't just theoretical. Recent work is already exploring the boundary between text modeling and compression.

  • DeepSeek OCR demonstrated that LLMs can perform zero-shot optical character recognition by learning latent compression mappings between images and language. This is another case where representation learning compresses one modality into another, effectively treating vision and text as two sides of the same compressed code.
  • Fabrice Bellard's ts_zip takes it a step further. Bellard, famous for creating FFmpeg and QEMU, showed that LLMs can actually outperform traditional compressors like gzip on certain textual domains. This bridges practical compression with generative modeling, turning the theory into engineering reality.
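The prediction-compression link is visible even with the standard library: gzip exploits the predictability (repetition) in its input, and a tool like ts_zip swaps that statistical model for an LLM's far stronger predictions. A quick baseline sketch (the sample text is an arbitrary placeholder):

```python
import gzip

# Highly predictable text: gzip's dictionary matching exploits the repetition.
# ts_zip replaces this kind of statistical model with an LLM's predictions,
# which can compress natural-language text further still on suitable domains.
raw = ("the quick brown fox jumps over the lazy dog. " * 200).encode()

compressed = gzip.compress(raw, compresslevel=9)
print(f"raw: {len(raw)} bytes, gzip: {len(compressed)} bytes, "
      f"ratio: {len(compressed) / len(raw):.3f}")
```

The less surprising the next byte is to the compressor's model, the fewer bits it costs, which is exactly the language-modeling objective in disguise.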

Nature's Compression Engine

What fascinates me most about this concept is how it mirrors biology. Think of a seed, a microscopic structure that contains the complete instruction set to grow an entire tree. Inside that seed lies a form of biological lossy compression: a compact encoding of structure, growth algorithms, and adaptive responses to the environment.

A seed doesn't store every branch or leaf, it stores the rules that generate them.

That's exactly what LLMs do. They don't store every sentence or paragraph from the internet. They store the generative process, the distribution of how humans produce meaning.

If we could ever decode how biological systems achieve such perfect, adaptive compression, how a seed unfolds into a dynamic, learning organism, then we'd be much closer to building true artificial intelligence.

Before the Big Bang, the Ultimate Compression

What's even more fascinating is that this compression principle doesn't stop at biology; it seems woven into the very fabric of the universe itself.

Think of the Big Bang as the ultimate act of compression and expansion. In an instant, all the information that would ever define the cosmos, the constants of physics, the distribution of matter and energy, the rules that govern galaxies and atoms alike, was encoded into an unimaginably dense singularity.

Just like a seed contains the potential of a forest, that singularity contained the instruction set of reality. We may speculate at this point that the universe might have actually executed a generative process, governed by laws defined by its Creator, that produced infinite complexity.

Matter cooled, atoms formed, stars ignited, and through the recursive unfolding of these compressed rules, structure emerged: galaxies, solar systems, planets, and eventually life. As a tiny species trying to comprehend the grand scale of the universe, we will most likely be wrong in places. But surprisingly, this universe seems to be governed by consistent laws that still allow us to do predictable science. Every layer of our existence seems to follow the same meta-pattern:

  • The universe stores the rules of physics, not every star.
  • DNA stores the rules of growth, not every cell.
  • LLMs store the rules of language, not every word.

Compression, it seems, is nature's way of preserving essence while enabling creation. From the Big Bang to the human brain, from cosmic inflation to neural inference, everything we see is possibly the result of compact generative rules unfurling across time.

If we ever truly grasp how nature compresses and regenerates information so elegantly, how it turns one singularity into infinite diversity, we wouldn't just understand AI better. We'd understand the universe we live in.

Towards the Compression of Intelligence

When you zoom out, the lesson becomes clear. LLMs aren't just statistical parrots or pattern matchers; they're compression-based simulators of the world. They work because the structure of human knowledge itself is compressible.

And that might be the same principle that underlies life itself. From DNA to neural networks, everything that learns seems to do so by compressing past experience into compact, generative representations.

Perhaps the next great breakthrough in AI won't come from adding more parameters, but from discovering the biological equivalent of the seed: the minimal, general-purpose algorithm capable of compressing the world's complexity into an adaptive, generative code.

In the end, intelligence, be it human, artificial, or biological, might just be the universe's most elegant form of lossy compression!