Since starting the AFFINE Online Superintelligence Alignment Seminar in late April, I have been swimming in concepts I had no idea existed. The seminar is well structured, and the readings pile up fast. But one word kept appearing across different materials without ever being on the official course outline: ontology. It showed up in discussions about reward hacking, in papers about corrigibility, in footnotes about value specification. I kept glossing over it until I realised I could not, because ontology turns out to be the bedrock beneath almost every alignment problem I have been writing about. So I went down the rabbit hole on my own, and this is what I found.
Imagine building a robot to navigate your house. Before it can do anything useful, you have to answer a question that sounds simple but is not: what things exist in the house? Rooms? Walls? Objects? Relationships between objects? The way you answer that question determines everything the robot can and cannot do. That is ontology.
First Things First: What Is Ontology?
The word "ontology" comes from ancient Greek: ontos (being) and logos (study). In philosophy, it is the branch of metaphysics that asks what kinds of things exist and how they relate to each other.
In AI, the definition is more practical. An ontology is a formal, structured description of concepts in a domain and the relationships between them. Think of it as a map of meaning.
Put another way, an ontology is a vocabulary of concepts plus a rulebook for how those concepts connect. It is how you tell a machine what categories of things exist, what properties they have, and how they relate to each other. When researchers, engineers, or AI safety folks talk about ontologies, they are asking: how does this system represent and categorise reality? The answer matters because how a system represents the world determines how it reasons about the world, and where it can go wrong.
A Concrete Example: The Hospital
Imagine you are building an AI to help doctors at a hospital. You need to tell the AI about the domain it is working in. You create an ontology that includes:
Classes (types of things): Patient, Doctor, Medication, Diagnosis, Symptom, Hospital Ward.
Properties (attributes of things): A Patient has a name, a date of birth, a blood type. A Medication has a dosage, a chemical name, possible side effects.
Relationships (how things connect): A Doctor treats a Patient. A Patient has a Diagnosis. A Diagnosis is treated with a Medication. A Medication can cause a Symptom.
Rules (what must be true): Every Diagnosis must be associated with at least one Symptom. No Medication can be prescribed without a valid Doctor's order.
Once you have built this ontology, the AI can reason about the domain. It knows that "paracetamol" is a Medication, that medications are given to Patients by Doctors, and that side effects are a kind of Symptom. Without this structure, the AI is just processing strings of text. It has no idea what anything actually means.
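To make the hospital example concrete, here is a minimal Python sketch of two of its classes and both of its rules. Everything in it is an illustrative assumption: the class names, the violations helper, and the sample doctor and records are invented for this post, and a real clinical system would more likely use a dedicated ontology language than plain dataclasses.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Diagnosis:
    name: str
    symptoms: List[str] = field(default_factory=list)


@dataclass
class Prescription:
    medication: str
    ordered_by: Optional[str] = None  # name of the Doctor who ordered it, if any


def violations(diagnoses, prescriptions, doctors):
    """Check the two rules from the hospital ontology and list any breaches."""
    problems = []
    for d in diagnoses:
        # Rule: every Diagnosis must be associated with at least one Symptom.
        if not d.symptoms:
            problems.append(f"Diagnosis '{d.name}' has no associated Symptom")
    for p in prescriptions:
        # Rule: no Medication can be prescribed without a valid Doctor's order.
        if p.ordered_by not in doctors:
            problems.append(f"Medication '{p.medication}' lacks a valid Doctor's order")
    return problems


# Hypothetical sample data, invented for illustration.
doctors = {"Dr. Adeyemi"}
diagnoses = [Diagnosis("malaria", ["fever", "chills"]), Diagnosis("unspecified")]
prescriptions = [Prescription("paracetamol", "Dr. Adeyemi"), Prescription("artemether")]

problems = violations(diagnoses, prescriptions, doctors)
for problem in problems:
    print(problem)
```

The point of the sketch is that the rules only become checkable once the categories exist. Without a Diagnosis class that has a symptoms slot, there is nothing for the first rule to inspect.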
The Everyday Version: Your Phone Already Does This
You do not have to look far to see ontologies at work. Your phone's contacts app has an implicit ontology: a contact is a Person who has a Name, Phone Number, and Email Address. A Person can be part of a Group (family, work, and so on). These categories and relationships let your phone do smart things, like suggest who to call when you are running late for a meeting with "David from work."
Google's Knowledge Graph, the box that appears on the right side of search results with facts about a person or place, is a massive ontology. It knows that "Lagos" is a City, that a City is in a Country, that Nigeria is a Country in Africa, and that Africa is a Continent. That chain of relationships is what lets Google answer the question "What continent is Lagos in?" without having to have seen those exact words together before.
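The Lagos chain can be sketched in a few lines of Python. This is a toy illustration only, not how Google's system is implemented: the two dictionaries and the containing function are hypothetical names, and a real knowledge graph stores far richer relations.

```python
# Two kinds of facts: what class each entity belongs to, and where it is located.
IS_A = {"Lagos": "City", "Nigeria": "Country", "Africa": "Continent"}
LOCATED_IN = {"Lagos": "Nigeria", "Nigeria": "Africa"}


def containing(entity, wanted_class):
    """Follow located-in links upward until an entity of the wanted class is found."""
    current = entity
    seen = set()  # guards against accidental cycles in the data
    while current in LOCATED_IN and current not in seen:
        seen.add(current)
        current = LOCATED_IN[current]
        if IS_A.get(current) == wanted_class:
            return current
    return None


answer = containing("Lagos", "Continent")
print(answer)  # Africa
```

No single stored fact says "Lagos is in Africa." The answer falls out of chaining two facts together, which is exactly what the ontology's relationships make possible.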
A library's cataloguing system is also an ontology. Books are categorised by genre, subject, author, and decade. Fiction and non-fiction are classes. "Historical thriller" is a subclass of Fiction. The Dewey Decimal System is essentially a formalised ontology for human knowledge. When a librarian says "that belongs under the 900s," they are applying an ontological classification.
Why Does This Matter for AI Safety?
As I explored in The Quiet Emergency and The Optimizer's Curse, AI safety is concerned with making sure AI systems behave as intended, reliably, without causing harm.
When an AI's ontology is wrong, incomplete, or misaligned with the real world, the system can fail in strange and dangerous ways. Let us look at three key risks.
Risk 1: Ontology Mismatch
This happens when the categories in the AI's world-model do not match what actually exists in the real world. It is closely related to what the AI safety literature discusses under names like reward hacking, where a system satisfies its specification while missing the intent behind it. Imagine you train a robot to collect apples, and its ontology classifies all round red objects as apples. It starts picking up red balls, tomatoes, and red balloons. The AI is doing exactly what its ontology tells it to. The flaw is in the ontology itself.
In 2016, researchers demonstrating an explanation technique deliberately trained an image classifier to tell wolves from huskies, using training data in which the wolf photos had snowy backgrounds. The classifier learned to rely on the snow rather than the animal. In ontological terms, it had collapsed "wolf" and "snowy landscape" into a single concept. It was doing the right thing by its own internal map, but that map was wrong.
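The apple-collecting robot can be written down as a short Python sketch. The Thing class, the looks_like_apple rule, and the sample objects are all hypothetical, chosen only to show how a category defined too loosely misfires in both directions.

```python
from dataclasses import dataclass


@dataclass
class Thing:
    label: str   # ground truth, which the robot never sees
    shape: str
    colour: str


def looks_like_apple(thing):
    # The flawed ontology: "apple" means any round red object.
    return thing.shape == "round" and thing.colour == "red"


# Hypothetical objects the robot encounters.
world = [
    Thing("apple", "round", "red"),
    Thing("ball", "round", "red"),
    Thing("tomato", "round", "red"),
    Thing("green apple", "round", "green"),
    Thing("pear", "pear-shaped", "green"),
]

collected = [t.label for t in world if looks_like_apple(t)]
print(collected)  # ['apple', 'ball', 'tomato']
```

The robot follows its rule perfectly. It still collects a ball and a tomato, and it walks past a green apple, because the category it was given does not line up with the category we meant.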
Risk 2: Ontological Incompleteness
An ontology that is missing key concepts can cause an AI to make decisions as if those concepts do not exist. An AI financial advisor whose ontology has no concept of "systemic risk" or "market contagion" might give advice that looks locally rational but contributes to global collapse. It cannot account for what it has no category for.
This is related to what AI safety researchers call the "unknown unknowns" problem: the AI does not know what it does not know, because its ontology does not include the missing concepts in the first place.
Risk 3: Ontological Rigidity
The world changes. Categories that were stable can become unstable. What happens when an AI's ontology cannot update fast enough to keep up? Consider a medical AI trained before a new disease category is discovered. Its ontology has no slot for the new disease, so it misclassifies symptoms into the nearest existing category. This rigidity can be dangerous in fast-moving situations.
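A small Python sketch shows the "nearest existing category" failure and one possible mitigation. The disease profiles, symptom names, and the 0.5 threshold are invented for illustration and carry no medical meaning. The overlap score used is Jaccard similarity, picked here only because it is simple.

```python
# Hypothetical symptom profiles the system was built with.
KNOWN = {
    "influenza": {"fever", "cough", "fatigue"},
    "food poisoning": {"nausea", "vomiting", "diarrhoea"},
}


def jaccard(a, b):
    """Overlap between two symptom sets, from 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b)


def rigid_classify(symptoms):
    # A rigid ontology must pick one of its existing categories, however poor the fit.
    return max(KNOWN, key=lambda name: jaccard(symptoms, KNOWN[name]))


def open_classify(symptoms, threshold=0.5):
    # One mitigation: allow an explicit "unrecognised" answer when the best fit is weak.
    best = rigid_classify(symptoms)
    return best if jaccard(symptoms, KNOWN[best]) >= threshold else "unrecognised"


novel_case = {"fever", "loss of smell", "breathlessness"}
print(rigid_classify(novel_case))  # influenza
print(open_classify(novel_case))   # unrecognised
```

The rigid version confidently files the new case under influenza on the strength of a single shared symptom. Adding an "unrecognised" slot does not teach the system the new disease, but it does stop the ontology from hiding the fact that something does not fit.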
A mind can only reason about what it has concepts for. An AI's ontology is the boundary of its thinkable thoughts.
Ontology in Large Language Models
Large language models (LLMs), the systems that power modern chatbots, do not have an explicit, hand-crafted ontology written by engineers. Instead, they develop an implicit ontology from training on vast amounts of text.
This is a fundamental shift from classical AI systems. Where older AI encoded its ontology manually ("a mammal is an animal that has fur and feeds young with milk"), LLMs absorb categories and relationships from statistical patterns in language. The AI learns that "dog" and "pet" and "loyal" tend to cluster together, and that "dog" and "mammal" appear in similar contexts. From this, it builds an internal representation of what a dog is.
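The idea that categories can emerge from word statistics can be illustrated with a toy Python sketch. This is a deliberate simplification and not how an LLM is actually trained: it only counts which words share a sentence in a six-sentence corpus invented for this post, then compares the resulting count vectors with cosine similarity.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations

# A tiny, made-up corpus.
corpus = [
    "the dog is a loyal pet",
    "the cat is a quiet pet",
    "the dog is a mammal",
    "the cat is a mammal",
    "the car needs new tyres",
    "the car needs fuel",
]

# For every word, count the other words that appear in the same sentence.
vectors = defaultdict(Counter)
for sentence in corpus:
    for word, neighbour in permutations(sentence.split(), 2):
        vectors[word][neighbour] += 1


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)  # a Counter returns 0 for missing keys
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


dog_cat = cosine(vectors["dog"], vectors["cat"])
dog_car = cosine(vectors["dog"], vectors["car"])
print(f"dog~cat: {dog_cat:.2f}, dog~car: {dog_car:.2f}")
```

Nobody told this program that dogs and cats belong together and cars belong elsewhere. "Dog" ends up closer to "cat" than to "car" purely because of the company each word keeps, which is the seed of an implicit ontology.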
Classical AI ontology: Explicitly designed by humans. Transparent, inspectable, and easy to audit for errors. Brittle, often missing edge cases. Hard to update.
LLM implicit ontology: Learned from data. Harder to inspect or audit. Richer coverage of edge cases. Flexible and adaptive, but can absorb biases from training data.
The challenge with implicit ontologies is that they can encode the biases and errors of the training data. If the text the AI trained on consistently associates certain groups of people with certain professions, that association becomes part of the AI's implicit ontology of who belongs in which roles. The AI did not decide this. The structure of the data did.
Four Examples That Bring It Home
The content moderation system. An AI moderating social media content needs an ontology that includes concepts like "harassment," "satire," "threat," and "criticism." If its ontology conflates "criticism of a public figure" with "harassment," it will over-remove legitimate speech. If it has no concept for "coded language" or "dog whistles," it will miss harmful content that is expressed obliquely. The quality of the ontology directly determines where free speech ends and harm begins, in the machine's view.
The hiring algorithm. A recruitment AI trained on historical hiring data might develop an ontology where "qualified engineer" has implicit features like "graduated from certain universities" or "male name on CV." These are not explicit rules. They are statistical regularities baked into the AI's world-model. The ontology has learned that certain properties correlate with being hired, and conflated that with being hireable. This is ontological error causing real-world discrimination.
The self-driving car. A self-driving car must have an ontology that classifies objects on the road: pedestrian, cyclist, vehicle, obstacle, traffic sign. The consequences of getting this wrong are immediately clear. Some failures of self-driving technology have been attributed in part to ontologies that did not cover edge cases: a pedestrian wearing an unusual outfit, a cyclist carrying a large load, a makeshift sign held by a human. When the car's world-model has no category for these, it classifies them as something else, or fails to classify them at all.
The AI alignment problem. Researchers worry about AI systems that have an ontology misaligned with human values, where the machine's categories for "good outcomes" do not match what humans actually want. The classic thought experiment: an AI told to "maximise human happiness" might develop an ontology where "happy" means "exhibiting measurable neural pleasure signals" and pursue that goal in ways deeply at odds with what any human would want. The values looked aligned at the surface; the ontology underneath was different.
Ontology Engineering as a Safety Tool
If bad ontologies cause problems, the natural response is to build better ones. This is the field of ontology engineering: deliberately designing, testing, and validating the conceptual structures that AI systems work with.
Some approaches being explored in AI safety include:
Ontology alignment: Comparing the ontology used by an AI system against a human-defined reference ontology, to find gaps or mismatches. Like checking if a map matches the territory it claims to represent.
Interpretability research: Trying to extract and understand the implicit ontology inside a neural network, to see what categories and relationships the model has actually learned, as opposed to what its designers intended it to learn.
Value specification: Building explicit ontologies for human values, formalising what we mean by "harm," "consent," "fairness," and "wellbeing," so that these concepts can be directly incorporated into AI reasoning. This is one of the hardest open problems in AI safety.
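The first of these approaches, ontology alignment, can be sketched in Python as a comparison between two sets of facts. This is a minimal illustration under strong assumptions: both ontologies are written as (subject, relation, object) triples using identical names, and the concepts and compare helpers are hypothetical. Real alignment work is harder, partly because two ontologies rarely use the same labels for the same idea.

```python
def concepts(triples):
    """Every concept mentioned as a subject or an object."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}


def compare(reference, system):
    """Report where a system's ontology departs from a human-defined reference."""
    return {
        "missing_concepts": concepts(reference) - concepts(system),
        "missing_relations": reference - system,
        "unexpected_relations": system - reference,
    }


# Hypothetical reference ontology for a content moderation system.
reference = {
    ("threat", "is_a", "harmful content"),
    ("harassment", "is_a", "harmful content"),
    ("satire", "is_a", "protected speech"),
    ("criticism", "is_a", "protected speech"),
}

# Hypothetical ontology recovered from the deployed system.
system = {
    ("threat", "is_a", "harmful content"),
    ("harassment", "is_a", "harmful content"),
    ("criticism", "is_a", "harmful content"),
}

report = compare(reference, system)
for kind, items in report.items():
    print(kind, sorted(items))
```

In this made-up case the report surfaces two different problems: the system has no concept of satire or protected speech at all, and it files criticism under harmful content. The first is a gap and the second is a mismatch, and they call for different fixes.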
The Deeper Question
There is a philosophical dimension to all of this that goes beyond engineering. When we ask "what ontology should an AI have?" we are actually asking: what is the right way to carve up reality? That turns out to be one of the oldest and hardest questions in philosophy.
Different cultures, different scientific paradigms, and different individuals all carve up reality differently. Western medicine classifies illness through pathogens and organ systems. Traditional Chinese medicine classifies it through qi, yin, and yang. Both are ontologies. Both work in their own contexts. Both fail when applied outside those contexts.
When we build AI systems, we are making an ontological commitment on behalf of everyone those systems will affect. We are deciding, often without realising it, what categories of things the machine will acknowledge as real. The categories we leave out do not just become hard for the AI to reason about. They become invisible.
This is why ontology is not just a technical concern. It is an ethical one. Every ontology is a frame, and every frame excludes something. The question is not whether our AI systems will have ontologies. They always do. The question is whether we will examine those ontologies carefully enough to see what they are missing, before the missing things start to matter.
This is the third piece in an ongoing series exploring AI Alignment and the mechanics of Artificial General Intelligence. Earlier pieces: The Quiet Emergency and The Optimizer's Curse.