Tensors: Language’s Geometric Foundation


When Albert Einstein sought to describe gravity as the curvature of spacetime, he confronted a representational crisis. Time and distance, once considered absolute, proved relative to each observer’s frame of reference. How could physics describe something objective when its basic measurements depended on who was measuring? The answer arrived through the work of Italian mathematicians Gregorio Ricci-Curbastro and Tullio Levi-Civita: tensors. A century later, these same mathematical objects would become the computational substrate of large language models.

Tensors are not merely arrays of numbers. While they can be represented as scalars, vectors, matrices, or higher-dimensional grids, their defining property is geometric invariance. A tensor captures information that transforms in predictable ways across coordinate systems while preserving underlying quantities—areas, volumes, proper times, physical invariants. As Dionysios Anninos of King’s College London notes, tensors are “the most efficient packaging device we have to organize equations” and “the natural language for geometric objects.” Einstein used this property to condense what would have been sixteen interdependent equations into a single tensorial statement relating matter to the fabric of spacetime.
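For reference, the single statement in question is the Einstein field equation,

\[ R_{\mu\nu} - \tfrac{1}{2} R\, g_{\mu\nu} = \frac{8\pi G}{c^{4}}\, T_{\mu\nu}, \]

where each of the indices \(\mu\) and \(\nu\) runs over the four spacetime coordinates: one tensor equation packaging sixteen component equations (ten of them independent, since both sides are symmetric tensors).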

The parallel to modern AI is structural, not metaphorical. Transformer architectures—the foundation of models like GPT, Claude, and their successors—are tensor algebra made computational. Pedro Domingos’s recent work on tensor logic demonstrates that “essentially all neural networks can be constructed using tensor algebra” and provides explicit tensor equations implementing the attention mechanism at the core of LLMs. The embedding matrix that converts words to vectors, the attention weights that determine which tokens influence others, the forward pass through multiple layers—each operation is a tensor contraction expressible through Einstein summation notation.
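As a small concrete illustration of that claim, here is a minimal sketch in NumPy; the vocabulary size, embedding dimension, and token ids are invented for the example. Even the embedding lookup that turns token ids into vectors can be written as an Einstein-summation contraction over the shared vocabulary index.

```python
import numpy as np

# Hypothetical sizes: a 10-token vocabulary and a 4-dimensional embedding space.
V, d = 10, 4
E = np.random.randn(V, d)        # embedding matrix (learned during training)

tokens = np.array([3, 1, 7])     # a toy sequence of token ids
onehot = np.eye(V)[tokens]       # shape (position, vocabulary)

# "Convert words to vectors" as a tensor contraction: join on the shared
# vocabulary index v, then sum it out. This is Einstein summation in one line.
X = np.einsum('pv,vd->pd', onehot, E)
assert np.allclose(X, E[tokens])  # identical to direct row indexing
```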

Consider the attention mechanism specifically. Given an embedding matrix X indexed by position and dimension, the model computes queries, keys, and values by multiplying X with learned weight tensors. The comparison between the query at position p and the key at position p′ yields attention weights through a dot product that is scaled and normalized with a softmax. The final attention output aggregates value vectors weighted by these comparisons. Each step is a tensor operation: joins along shared indices followed by projections that sum out the contracted dimensions. Without tensor formalism, implementing this would require nested loops and manual index management. With tensors, the entire computation becomes expressible in a dozen equations that GPUs can execute in parallel.
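A minimal single-head version of that computation, written as Einstein summations in NumPy, might look like the sketch below. The weight names (Wq, Wk, Wv), the head dimension, and the omission of masking and multiple heads are simplifications for illustration, not the implementation of any particular model.

```python
import numpy as np

def softmax(scores, axis=-1):
    scores = scores - scores.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, Wq, Wk, Wv):
    """Single-head attention as a chain of Einstein summations.

    X          : (position p, embedding dim d) token embeddings
    Wq, Wk, Wv : (d, k) learned weight matrices
    """
    Q = np.einsum('pd,dk->pk', X, Wq)   # queries: contract the embedding index d
    K = np.einsum('pd,dk->pk', X, Wk)   # keys
    V = np.einsum('pd,dk->pk', X, Wv)   # values
    # Compare the query at position p with the key at position p' (index q),
    # scale by sqrt(k), and normalize across p' to get attention weights.
    scores = np.einsum('pk,qk->pq', Q, K) / np.sqrt(Q.shape[-1])
    A = softmax(scores, axis=-1)
    # Aggregate value vectors, weighted by those comparisons.
    return np.einsum('pq,qk->pk', A, V)

# Toy shapes: 6 positions, embedding dimension 16, head dimension 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 16))
Wq, Wk, Wv = [rng.normal(size=(16, 8)) for _ in range(3)]
print(attention(X, Wq, Wk, Wv).shape)   # (6, 8)
```

Each einsum here is exactly what the paragraph describes: a join along a shared index followed by a projection that sums it out, with the nested loops absorbed into the index notation.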

The correspondence between Einstein’s problem and the transformer’s goes deeper than notation. Einstein needed descriptions that remained valid regardless of an observer’s motion or position. Transformers need representations of meaning that remain stable regardless of where words appear in a sequence or how the same concept is phrased. The physical invariant in relativity is objective reality; the semantic invariant in language models is meaning itself. These are categorically different types of invariance: physical invariants like mass-energy are precisely quantifiable under Lorentz transformations, while semantic invariants resist such formalization—“meaning” cannot be measured with the precision of a four-vector. Yet both domains require mathematical frameworks that preserve structure across transformations, and tensors provide this framework at the appropriate level of abstraction for each. When a transformer processes “The cat sat on the mat” and “On the mat, the cat sat,” the tensor operations extract equivalent semantic representations despite different surface structures—not because meaning is identical to mass-energy, but because both problems demand coordinate-independent packaging of relational information.

This isn’t coincidental engineering. The success of deep learning depends fundamentally on expressing complex functions as compositions of tensor algebra amenable to parallelization. Modern GPUs are optimized for tensor operations—matrix multiplications, contractions, broadcasts. The entire scaling paradigm of LLMs, from billions to trillions of parameters, rests on the computational tractability that tensor formalism provides. A model that couldn’t express its computations as tensor operations couldn’t leverage the hardware that makes training and inference feasible at scale.

Other mathematical frameworks address problems of structure preservation and transformation. Geometric algebra offers alternative representations for rotations and reflections. Category theory provides abstract machinery for compositional reasoning. Graph neural networks encode relational structure through mechanisms not purely reducible to tensor contractions. Yet tensors uniquely combine properties essential to both relativistic physics and language models: they are differentiable (enabling gradient-based learning), composable (enabling deep layering), parallelizable (enabling GPU acceleration), and expressively complete for the multilinear operations that attention requires. The convergence is not proof of necessity, but it is strong evidence of adequacy—tensors emerged independently in physics, linear algebra, and machine learning because each domain encountered problems they happen to solve well.

Domingos’s tensor logic work suggests implications extending beyond current architectures. He demonstrates that logical rules and Einstein summation are “essentially the same operation,” differing only in atomic data types. This observation opens paths toward systems that combine the scalability of neural networks with the reliability of symbolic reasoning. His framework allows reasoning in embedding space with controllable temperature parameters: at zero temperature, purely deductive inference with no hallucination risk; at higher temperatures, analogical reasoning that borrows inferences between similar concepts. The mathematical unity of neural and symbolic approaches through tensor algebra may address fundamental limitations of current LLMs.
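To make the claimed identity concrete, here is a toy sketch, not Domingos’s own notation or code: a Datalog-style rule over a Boolean relation is literally an Einstein summation, and relaxing the same equation over embedding similarities with a temperature gives the analogical, “soft” version. The relation, sizes, and similarity kernel below are assumptions made for the example.

```python
import numpy as np

# Toy universe of 4 individuals; parent[x, y] == 1 means "x is a parent of y".
parent = np.zeros((4, 4))
parent[0, 1] = parent[1, 2] = parent[1, 3] = 1

# The rule  grandparent(x, z) :- parent(x, y), parent(y, z)
# is a join on the shared index y followed by a projection that sums y out:
grandparent = np.einsum('xy,yz->xz', parent, parent) > 0
print(grandparent.astype(int))   # 0 is a grandparent of 2 and 3: pure deduction

# Analogical relaxation (a rough sketch of the temperature idea): borrow facts
# across an embedding-similarity kernel. With distinct unit-norm embeddings,
# the kernel approaches the identity as T -> 0, collapsing back to deduction.
E = np.random.randn(4, 8)
E /= np.linalg.norm(E, axis=1, keepdims=True)    # hypothetical unit embeddings
T = 0.1
sim = E @ E.T / T
K = np.exp(sim - sim.max(axis=1, keepdims=True))
K /= K.sum(axis=1, keepdims=True)                # row-wise softmax
soft_parent = np.einsum('xa,ay->xy', K, parent)  # similarity-weighted facts
soft_grandparent = np.einsum('xy,yz->xz', soft_parent, soft_parent)
```

At zero temperature only the exact facts fire; at higher temperatures, individuals with similar embeddings lend each other inferences, which is the analogical mode described above.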

Einstein famously begged his friend, the mathematician Marcel Grossmann, for help understanding tensors, fearing he was losing his mind. He persisted because the physics demanded it—there was no other way he could find to describe gravity objectively. A similar practical necessity operates in AI. The representational demands of language processing—variable-length inputs, contextual relationships, compositional meaning—require mathematical machinery that preserves structure while enabling transformation. Tensors provide this machinery. They may not be the only possible foundation, but they are a natural one—perhaps optimal given current hardware and algorithmic constraints—for the geometric problem that language understanding presents.

The tools shape the thoughts they enable. Newton’s calculus made dynamics tractable. Maxwell’s vector notation made electromagnetism comprehensible. Tensors made both relativity and transformers possible. In each case, the notation didn’t just describe what was already known—it revealed what could be computed, extended, and eventually understood. Whether tensors will remain AI’s dominant mathematical language or yield to some future formalism remains open. What is certain is that they have already enabled a generation of capabilities that seemed impossible without them.


