102 – What is a model – The Conscious Look

What is a model? Explanatory power, prediction, and the art of knowing the limits

There is a map of the London Underground that has been reproduced more than a billion times. It hangs in train carriages and ticket halls, appears on tourist guides and phone screens, and is recognized by more people worldwide than almost any other piece of graphic design. It is also, by any conventional cartographic standard, wrong. The distances between stations on the map bear no relationship to the actual distances underground. The curves and bends of the tunnels have been regularized into clean horizontal, vertical, and diagonal lines. The River Thames, which the trains cross at several points, is reduced to a mild decorative squiggle. And yet the map works. For the purpose for which it was designed, helping passengers navigate from one station to another, it is arguably one of the most successful maps ever made. Its usefulness is a direct consequence of its incompleteness.

This observation is not merely a curiosity about graphic design. It is a description of how all useful representations of the world operate. What we call a model — in the precise sense this series uses the word — is any internal or external representation that organizes the features of some part of the world, generates expectations about what will happen next within that part, and guides action accordingly. The Underground map is a model. Newton’s equations of motion are a model. Our understanding of a colleague’s personality is a model. The story we tell ourselves about who we are and what we value is a model. What distinguishes a useful model from a useless one is not its correspondence to every feature of the territory it represents — no model achieves that, and the ones that try to are generally the least useful. What distinguishes a useful model is something more specific: the combination of explanatory power, predictive power, grounding in deeper principles, and — the most neglected of the four — an honest account of where the model stops working.

This article examines each of these four properties in turn. Together they constitute a standard against which any claim to knowledge can be measured. Whether we are evaluating a scientific theory, a political argument, a personal belief, or our own most confidently held convictions, the same four questions apply: does this model explain the relevant observations? Does it make predictions that could be wrong? Is it grounded in something more fundamental? And does it know its own limits?

The first property: explanatory power

A model has explanatory power when it takes a range of observations that previously appeared unrelated or arbitrary and organizes them into a coherent account that makes them less surprising. Before Newton, the motion of a cannonball, the fall of an apple, and the orbit of the Moon were three separate phenomena, each requiring its own description. After Newton, they were three instances of a single underlying relationship between mass, distance, and force. The observations did not change. What changed was the framework within which they were placed — and with that change came the satisfying sense of understanding that marks genuine explanatory progress.

This feeling of understanding is worth examining carefully, because it can mislead us as easily as it can guide us. A conspiracy theory, a superstition, and a well-confirmed scientific law all produce the same subjective feeling of explanatory satisfaction — the sense that previously puzzling observations now make sense, that the pieces fit together, that we understand why things happened as they did. What distinguishes the conspiracy theory from the scientific law is not the feeling of understanding but the quality of the explanatory structure: whether it genuinely organizes the observations it claims to explain, whether it does so more efficiently than competing explanations, and whether it would work equally well if the observations had been different.

That last criterion is one of the most useful tests available. A model with genuine explanatory power would not explain just any observations equally well. It would explain this particular set of observations, and a different model would be needed to explain a different set. A model that can explain everything with equal facility, regardless of what the data shows, explains nothing in any meaningful sense. The astrologer who “reveals” a client’s personality after being told the client’s sun sign does not demonstrate explanatory power; the performance demonstrates the human capacity to find confirming patterns in sufficiently vague descriptions. Genuine explanatory power is selective: it makes some observations expected and others surprising, and the observations that have actually occurred are the expected ones.[1]
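
One common way to make this selectivity concrete is a likelihood comparison. The sketch below is an illustration, not something the article itself specifies: the four outcome labels and the probabilities attached to them are invented for the example, and the Bayesian framing is an assumption brought in to show the idea.

```python
# Illustrative only: the outcome labels and probabilities are made up.
OUTCOMES = ["A", "B", "C", "D"]

# A "vague" model treats every outcome as equally expected.
vague = {o: 0.25 for o in OUTCOMES}

# A "selective" model expects A and is surprised by anything else.
selective = {"A": 0.85, "B": 0.05, "C": 0.05, "D": 0.05}


def likelihood_ratio(observed: str) -> float:
    """How strongly one observation favors the selective model over the vague one."""
    return selective[observed] / vague[observed]


def posterior_selective(observed: str, prior: float = 0.5) -> float:
    """Probability of the selective model after one observation (Bayes' rule)."""
    numerator = prior * selective[observed]
    denominator = numerator + (1.0 - prior) * vague[observed]
    return numerator / denominator


for outcome in OUTCOMES:
    print(outcome,
          round(likelihood_ratio(outcome), 2),
          round(posterior_selective(outcome), 3))
```

Under these assumed numbers, observing A favors the selective model by a factor of about 3.4, while observing anything else counts against it by a factor of 5. The vague model can never be embarrassed by the data, and for the same reason it can never earn support from it.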

The theory of evolution by natural selection illustrates an important asymmetry that any honest account of explanatory power must acknowledge: a model can possess explanatory power of the highest order while its predictive power remains inherently limited in scope. Darwin’s theory organizes an extraordinary range of observations (the fossil record, the geographic distribution of species, the molecular similarities between organisms, the existence of vestigial structures, the emergence of antibiotic resistance in bacteria) into a single coherent framework of variation, selection, and inheritance. Yet what the theory predicts, in specific terms, is constrained by the nature of the mechanism it describes. Evolution predicts that adaptive pressures will produce adaptations. It cannot say which specific adaptations will arise, because the available genetic variation, the precise character of the selective pressure, and the sequence of contingent historical events cannot be known in advance. We can predict with confidence that a bacterial population exposed to an antibiotic for long enough will tend to evolve resistance; we cannot predict which specific mutations will achieve this. We can predict that a population isolated on an island will diverge from its mainland relatives; we cannot predict whether the divergence will manifest in body size, coloration, beak morphology, or some combination of all three. The theory is not weakened by this limitation; it is clarified by it. Explanatory power and predictive power are distinct properties, and a model can have one in abundance while the other is available only partially or within a restricted domain.[2]

The second property: predictive power

Explanatory power looks backward, at observations already made. Predictive power looks forward, at observations not yet made. A model has predictive power when it specifies, in advance, what should be observed under conditions that have not yet been examined — and when those predictions are specific enough to be capable of being wrong.

The importance of that last clause cannot be overstated. A prediction that is consistent with any possible outcome is not a prediction at all. It is a description dressed in the grammatical form of a forecast. The economic model that predicts either growth or contraction depending on “conditions,” the medical treatment whose advocates claim it works except when it does not, the political theory that explains every election result as a confirmation of its principles — none of these is making genuine predictions, because none of them is sticking its neck out in the way that genuine prediction requires.

What makes a prediction genuine is its falsifiability: the existence of a clearly specifiable outcome that would, if it occurred, constitute evidence against the model.[3] Consider the physicist who predicts the deflection of light around the Sun to a specific angular value and then observes a solar eclipse to check whether the measured deflection matches. This is prediction in the relevant sense. The observation could have come out differently. That it did not is informative. It counts, in a way that unfalsifiable claims cannot count, as evidence in the model’s favor.
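
The eclipse example can be made numerical. The sketch below assumes the standard general-relativistic formula for light grazing the edge of the Sun, angle = 4GM/(c²R), together with approximate modern values for the Sun’s gravitational parameter and radius; none of these figures appear in the article. The older Newtonian-style estimate is half the relativistic value, which is what made the measurement a real test: the two candidate models staked out different numbers in advance.

```python
import math

# Approximate modern values (assumptions of this sketch, not from the article).
GM_SUN = 1.32712440018e20   # gravitational parameter of the Sun, m^3 / s^2
R_SUN = 6.957e8             # nominal solar radius, m
C = 299_792_458.0           # speed of light, m / s

RAD_TO_ARCSEC = 180.0 / math.pi * 3600.0


def deflection_arcsec(impact_radius_m: float, relativistic: bool = True) -> float:
    """Predicted bending of a light ray passing at the given distance from the Sun's center."""
    factor = 4.0 if relativistic else 2.0   # the Newtonian-style estimate is half the GR value
    angle_rad = factor * GM_SUN / (C ** 2 * impact_radius_m)
    return angle_rad * RAD_TO_ARCSEC


gr_prediction = deflection_arcsec(R_SUN)
newtonian_prediction = deflection_arcsec(R_SUN, relativistic=False)
print(f"general relativity: {gr_prediction:.2f} arcsec")
print(f"Newtonian-style:    {newtonian_prediction:.2f} arcsec")
```

A measured value near zero, or near the smaller figure, would have counted against general relativity. That exposure to failure is the property the paragraph above describes.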

There is a practical implication here that extends well beyond physics. When we encounter a model (of the economy, of a social phenomenon, of a person’s likely behavior) that seems to predict everything, it is worth asking specifically what it would predict if we changed a particular variable. If the answer is always compatible with whatever we subsequently observe, the model is not generating genuine predictions; it is generating post-hoc explanations masquerading as predictions. The retrospective sense that events were inevitable, that the model saw them coming all along, is one of the most reliable signs that we are dealing with explanation rather than prediction.[4]

The third property: grounding in deeper principles

The third criterion distinguishes models that stand alone from models that are embedded within a larger explanatory structure. A model is grounded in deeper principles when its claims are not merely empirical regularities — patterns that happen to hold in the cases observed — but can be derived from, or shown to be consistent with, more fundamental facts about how the world works.

The deepest form of this grounding is deduction from first principles: the derivation of a model’s predictions through logical inference from foundational axioms, without any appeal to empirical observation at the intermediate steps. When Maxwell derived the existence and speed of electromagnetic waves purely from his equations of electromagnetism (now usually written as four, and themselves built on well-established experimental findings about electric and magnetic fields), he was not extrapolating a pattern from data. He was following a chain of logical necessity from premises to conclusion. The electromagnetic wave was not discovered by looking; it was deduced by reasoning, and the looking came afterward to confirm what the reasoning had already established. This is the standard against which all other forms of grounding are measured. A model that can be derived deductively from first principles does not merely fit the observations; it had to fit them, given the truth of the premises.[5]
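
The last step of that deduction can be sketched numerically. In modern SI notation, Maxwell’s equations imply a wave speed of 1/√(μ₀ε₀). The constants below are approximate modern values and are assumptions of this sketch. Maxwell himself worked from nineteenth-century measurements of the ratio of electromagnetic to electrostatic units, and in the modern SI system these constants are tied to the defined speed of light, so the calculation illustrates the logical relationship rather than re-enacting an independent historical test.

```python
import math

# Approximate modern SI values (assumptions of this sketch).
MU_0 = 4.0e-7 * math.pi        # magnetic constant, H / m
EPSILON_0 = 8.8541878128e-12   # electric constant, F / m
C_REFERENCE = 299_792_458.0    # speed of light in vacuum, m / s


def wave_speed(mu: float, epsilon: float) -> float:
    """Propagation speed of an electromagnetic wave implied by the two field constants."""
    return 1.0 / math.sqrt(mu * epsilon)


c_derived = wave_speed(MU_0, EPSILON_0)
print(f"derived from field constants: {c_derived:.0f} m/s")
print(f"reference speed of light:     {C_REFERENCE:.0f} m/s")
```

The point of the sketch is the direction of inference: two constants that describe electric and magnetic behavior fix a speed, and that speed turns out to be the speed of light.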

Most models in science and everyday life do not achieve this standard, and that is not a criticism of them. Deductive grounding from first principles is available only in domains where the foundational axioms are themselves well-established and the relevant mathematics is tractable. Outside of physics and, to a degree, chemistry, models are more commonly grounded in what might be called mechanistic understanding: knowledge of the causal process by which an outcome is produced. Consider two models of the same phenomenon — one that simply describes an observed pattern (“in the studied sample, people who sleep fewer than six hours per night perform worse on memory tasks the following day”) and one that connects this observation to deeper mechanistic knowledge about synaptic consolidation, the role of slow-wave sleep in memory formation, and the biochemistry of adenosine clearance. Both models make the same prediction about the relationship between sleep and memory. But the grounded model does more: it tells us which interventions should affect the outcome, which apparently similar phenomena should show the same pattern, and which apparently dissimilar phenomena are, at the deeper level, the same thing in a different guise. It also tells us where the pattern might break down — under what conditions the relationship between sleep and memory consolidation would change, because the deeper mechanism operates differently under those conditions.

The spectrum runs, then, from the purely empirical generalization at one end — a pattern observed in the data, with no explanation of why it holds — through mechanistic grounding in the middle, to full deductive derivation from first principles at the other end. As we move along this spectrum, models gain in reach and reliability. A purely empirical generalization holds within the observed cases and may or may not hold outside them; there is no principled way to know. A mechanistically grounded model holds wherever the mechanism operates; its limits are the limits of the mechanism. A deductively grounded model holds wherever the axioms hold; its limits are the limits of its foundations. The distance between these positions is not merely a matter of intellectual tidiness. It is a practical matter of how far we can trust the model when we carry it into conditions we have not yet encountered.

This is what William of Ockham was pointing at when he formulated the principle now called Occam’s razor: that, all else being equal, explanations that invoke fewer independent assumptions are preferable to those that invoke more.[6] The deeper the grounding, and above all when it reaches all the way to deduction from first principles, the fewer independent assumptions are required. One foundational equation does the explanatory work that would otherwise require many separate empirical generalizations. Grounding in deeper principles is not merely an aesthetic preference for elegance. It is a practical criterion for models that will continue to work when taken outside the specific conditions in which they were developed.

This criterion has an immediate application to the models we use in everyday reasoning. A belief about a person, a policy, or a social phenomenon is more reliable when it is grounded in a general principle that has been established across many different cases than when it rests solely on a specific observation. The belief that a particular person is untrustworthy because they once told a lie has less grounding — and is accordingly less reliable — than a belief grounded in knowledge of the psychological conditions under which deception occurs, how persistent deceptive tendencies are, and what distinguishes strategic deception from situational dishonesty. The first belief might be right. The second is more likely to make accurate predictions across a wider range of future situations.

It is worth pausing here to ask a question that the preceding analysis raises but has not yet answered: can a model have strong predictive power while possessing little or no explanatory power? The answer is yes — and the examples are instructive. Actuarial mortality tables predict with remarkable precision how many people in a given demographic cohort will die in a given year, without explaining why any particular person will die. Credit scoring models predict default rates across large populations with high accuracy, without providing a causal account of the mechanism linking any specific variable to financial failure. Most strikingly, the machine learning systems that now identify tumors in radiology scans with accuracy rivaling experienced clinicians have no explanatory model of cancer at all; they have learned to recognize patterns in training data whose causal structure remains entirely opaque to them. In each case, predictive accuracy has been purchased without explanatory understanding.

These models share a specific and serious vulnerability that explanatory grounding would prevent. A model that captures patterns without understanding mechanisms borrows its reliability from the stability of the conditions in which it was trained. When those conditions change (when a pandemic arrives and overturns the mortality tables, when a financial crisis rewrites the relationship between credit variables and default, when a novel tumor type appears that was absent from the training data), the model fails without warning, because nothing in it signals that its domain of validity has been exceeded. Explanatory power, by contrast, converts this borrowed reliability into owned reliability: a model grounded in the mechanism retains its validity wherever the mechanism operates and loses it only when the mechanism itself changes, which is a predictable and often detectable event. The distinction between prediction that knows why it works and prediction that merely works is, therefore, not merely academic. It is the difference between a model that can anticipate its own limits and one that discovers them only in failure.[7]
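
A toy version of this failure can be sketched in a few lines. Everything in the example is hypothetical: the saturating `mechanism` function stands in for whatever process really generates the data, the straight-line fit stands in for a pattern-only model, and the fitted range of 0.0 to 0.3 stands in for the stable conditions of training. The guard at the end is one simple way, among many, for a model to refuse to answer outside the range it was fitted on.

```python
def mechanism(x: float) -> float:
    """Hypothetical 'true' process: a response that saturates as x grows."""
    return 10.0 * x / (1.0 + x)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# "Stable conditions": the pattern model only ever sees x between 0.0 and 0.3.
train_x = [i / 100 for i in range(31)]
train_y = [mechanism(x) for x in train_x]
slope, intercept = fit_line(train_x, train_y)


def pattern_model(x: float) -> float:
    """Pure curve fit: it has no knowledge of why the pattern holds."""
    return slope * x + intercept


def guarded_model(x: float) -> float:
    """Same fit, but it refuses to answer outside the range it was fitted on."""
    if not min(train_x) <= x <= max(train_x):
        raise ValueError(f"x={x} lies outside the fitted range; prediction not trusted")
    return pattern_model(x)


in_range_error = max(abs(pattern_model(x) - mechanism(x)) for x in train_x)
shifted_error = abs(pattern_model(5.0) - mechanism(5.0))
print(f"worst error inside the fitted range: {in_range_error:.3f}")
print(f"error after conditions shift to x=5: {shifted_error:.3f}")
```

Under these assumptions the line tracks the process closely inside its fitted range and misses badly once x moves far outside it. Nothing in `pattern_model` signals the change; only the explicit range check in `guarded_model` does.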

This observation connects the third property directly to the fourth. A model with strong predictive but weak explanatory power is particularly vulnerable to the failure mode that the next section describes: the failure to know its own limits in advance.

The fourth property: knowing the limits

The fourth criterion is the one that receives least attention and causes most damage when neglected. A model knows its limits when its proponents can specify — in advance, not merely in retrospect — the conditions under which the model’s predictions become unreliable, the domain beyond which its explanations no longer apply, and the observations that would constitute evidence against it.

Every model has limits. This is not a flaw to be apologized for; it is a structural feature of what it means to be a model at all. A model that captured every feature of the territory it represented would not be a model; it would be the territory itself. The Underground map that showed every curve of every tunnel, every geological formation the tracks pass through, every change in gradient and air pressure — this map would be useless precisely because of its completeness. Omission is not a failure of the model; it is what makes the model a model. But what matters is whether the omissions are acknowledged, and whether the model’s users know which questions to take elsewhere.

Newton’s laws of motion are among the most precisely confirmed scientific claims in the history of human inquiry. Tested against the orbits of planets, the trajectories of projectiles, and the motion of every macroscopic object moving at speeds far below that of light, they have proven reliable to an extraordinary degree. They also fail, gradually at first and then badly, at speeds approaching that of light, where special relativity takes over, and at atomic and subatomic scales, where quantum mechanics applies. A physicist who did not know these limits would apply Newton’s laws where they do not hold and be systematically wrong without knowing why. The fact that Newton’s laws break down at relativistic speeds and quantum scales does not diminish their value in the domain where they work. What would diminish their value is using them outside that domain while believing they still apply.[8]
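
How quickly the Newtonian approximation degrades with speed can be stated in advance. The sketch below compares Newtonian kinetic energy, ½mv², with the special-relativistic expression (γ − 1)mc², as a function of β = v/c. It covers only the high-speed limit, not the quantum one, and the sample speeds are arbitrary choices for illustration.

```python
import math


def newtonian_shortfall(beta: float) -> float:
    """Fraction by which (1/2) m v^2 underestimates the relativistic
    kinetic energy (gamma - 1) m c^2, where beta = v / c."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    s = math.sqrt(1.0 - beta * beta)   # s = 1 / gamma
    # The Newtonian-to-relativistic energy ratio simplifies to s * (1 + s) / 2,
    # which avoids the cancellation in computing gamma - 1 directly at small beta.
    ratio = 0.5 * s * (1.0 + s)
    return 1.0 - ratio


for beta in (0.001, 0.1, 0.5, 0.9):
    print(f"v = {beta:>5} c  ->  Newtonian shortfall = {newtonian_shortfall(beta):.2e}")
```

Under these assumptions the shortfall is below one part in a million at 0.1% of light speed, about 0.75% at 10%, and well over half at 90%. The limit is a boundary whose location can be computed before anyone crosses it.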

The same principle holds for every model we use. The model of economic behavior that works well in stable, competitive markets does not necessarily work in conditions of extreme scarcity or social disruption. The model of a friend’s personality that accurately predicts their behavior in familiar situations may fail entirely when they face circumstances they have never encountered before. The self-model that correctly describes who we were at thirty may not describe who we are at fifty. In each case, what matters is not that the model eventually fails — that is inevitable — but whether we knew in advance that it would fail under those specific conditions.

The diagnostic question that The Conscious Look recommends across all domains applies here in its most fundamental form: what would have to be true for this model to be wrong? If the answer is nothing — if every possible observation is compatible with the model — then the model’s limits are invisible, which means it can be applied anywhere and will eventually cause harm in exactly those places where it no longer applies.

The map and the territory

The philosopher Alfred Korzybski observed, in a phrase that has become the organizing metaphor of this series, that the map is not the territory.[9] This observation is so simple that it can be received as a truism and set aside. It deserves more sustained attention than that.

The map is not the territory in a specific and important sense: the map is a selective, simplified, purposive representation of a territory that exists independently of any representation. The territory has features the map does not show. The territory has boundaries that are not the edges of the map. Things happen in the territory that the map gives no warning of. And — most importantly — the territory does not change when the map changes: a cartographer who draws a road where no road exists does not thereby create a road.

The four criteria discussed in this article — explanatory power, predictive power, grounding in deeper principles, and known limits — are ways of asking how good a particular map is. A map with strong explanatory power organizes the territory’s features into a legible pattern. A map with strong predictive power tells us what we will find when we go somewhere we have not yet been. A map grounded in deeper principles derives its layout from knowledge of the terrain rather than from arbitrary convention. And a map that knows its limits has a legend that specifies where the map’s accuracy has been verified and where it is extrapolation.

No map satisfies all four criteria perfectly. The appropriate response to this fact is not to abandon the map — we cannot navigate without one — but to hold it with the combination of confidence and humility that its actual quality warrants. Confidence, because the map is the best available guide to the territory, and acting without a guide is not a form of freedom but a form of blindness. Humility, because the territory is always larger, more complex, and more surprising than the map.

This is the beginning of The Conscious Look: not the abandonment of our models, but the practice of evaluating them honestly against the standards that distinguish a good map from a bad one.

Notes

[1] The philosophical term for this property is the contrastive character of genuine explanation: an explanation is genuine when it can answer not just the question “why did this happen?” but the question “why did this happen rather than something else?” An explanation that would work just as well if the observation had been different fails the contrastive test and therefore fails to explain in any deep sense. The psychologist Peter Wason documented the human tendency to seek confirming rather than disconfirming evidence in his famous 2-4-6 task, published in 1960, which is among the most replicated findings in the study of human reasoning.
[2] The distinction between the explanatory power and the predictive limitations of evolutionary theory has been discussed extensively in the philosophy of biology. The philosopher Elliott Sober, in The Nature of Selection (1984) and subsequent work, has argued that evolutionary theory is best understood as explaining why adaptations exist rather than predicting which ones will arise, a distinction that maps closely onto the contrast between retrospective explanation and prospective prediction developed in this article. The theory’s apparent weakness in specific prediction is not a defect but a consequence of the nature of the mechanism it describes: natural selection operates on variation that is itself produced by processes (genetic mutation, recombination, environmental perturbation) that are either intrinsically random or too sensitive to initial conditions to permit precise long-range prediction. This is not a failure of the theory. It is an honest account of what the mechanism can and cannot tell us in advance.
[3] The requirement that genuine predictions be falsifiable is associated primarily with the philosopher Karl Popper, who developed it as a criterion of demarcation between scientific and non-scientific claims. Popper’s original motivation was to distinguish Einstein’s general relativity, which made specific, precise predictions that could in principle be tested and refuted, from Freudian psychoanalysis and Marxist historical theory, which he argued could accommodate any possible observation. The falsifiability criterion has been debated extensively since Popper proposed it, and many philosophers of science now regard it as, at best, necessary but not sufficient as a criterion of scientific status. For the purposes of this series, the important point is the weaker claim: predictions that cannot be wrong provide no evidence for the model that generates them.
[4] Kahneman, borrowing the term from Nassim Nicholas Taleb, describes this as the narrative fallacy: the tendency to construct coherent stories about past events that give them a sense of inevitability they did not have before they occurred. The related phenomenon of hindsight bias is the tendency to believe, after an outcome is known, that we would have predicted it in advance. Both biases systematically inflate our confidence in our models’ predictive power by retroactively converting explanations into predictions.
[5] The clearest historical example of deduction from first principles producing empirical predictions is James Clerk Maxwell’s derivation of electromagnetic radiation in his 1865 paper “A Dynamical Theory of the Electromagnetic Field.” Starting from his equations describing the behavior of electric and magnetic fields (some twenty in the 1865 paper, later condensed by Oliver Heaviside into the four familiar today), which themselves summarized decades of experimental work by Faraday and others, Maxwell derived by pure mathematical reasoning that oscillating electric and magnetic fields would propagate through space as a wave, and that this wave would travel at the speed of light. The identity of light as an electromagnetic wave was therefore not an experimental discovery in the usual sense; it was a deductive consequence of Maxwell’s equations. Heinrich Hertz confirmed the existence of electromagnetic waves experimentally in 1887, twenty-two years after Maxwell’s derivation. This sequence (deduction first, observation second, with observation confirming what the logic had already established) is the gold standard of scientific grounding, and its rarity makes each instance all the more instructive. The same logical structure appears in Einstein’s general relativity, where the deflection of light around the Sun was a deductive consequence of the field equations and was confirmed by the eclipse observations of 1919 only after the mathematics had determined what they must show. The anomalous precession of Mercury’s orbit is also a consequence of the field equations, but it had already been measured; there the theory accounted for a known discrepancy rather than predicting a new one.
[6] William of Ockham (c. 1287-1347) was an English Franciscan friar and philosopher whose principle of parsimony, that entities should not be multiplied beyond necessity, has been reformulated many times in the history of science and philosophy. The version commonly called Occam’s razor in contemporary usage is a rough approximation of his original argument, which was embedded in a broader nominalist philosophy. The razor is a heuristic, not a logical principle: simpler explanations are not always correct. But in the context of choosing between models of equal predictive power, the simpler model, the one that requires fewer independent assumptions, is generally to be preferred, because it is more likely to generalize correctly to new cases.
[7] The vulnerability of pattern-based models to regime change has been studied extensively in the context of machine learning under the heading of distribution shift: the failure of a model trained on data from one distribution when it encounters data from a different distribution. The financial crisis of 2008 provided a dramatic real-world illustration: credit risk models trained on data from the preceding period of relative stability systematically underestimated default correlations under stress conditions, because the stable-period data contained no information about how the system would behave under the novel conditions of a simultaneous collapse across multiple asset classes. The models were not wrong within their domain of training; they were wrong about where their domain ended. This is precisely the failure that explanatory grounding is designed to prevent: a model that understands the mechanism by which credit defaults occur can, in principle, reason about how that mechanism behaves under novel stress conditions, even without having observed such conditions directly.
[8] Newtonian mechanics fails at velocities approaching that of light. Einstein’s special relativity of 1905, motivated largely by the tension between Newtonian mechanics and Maxwell’s electrodynamics, replaced it in that regime. A separate limit appears in gravitation: Mercury’s orbit around the Sun shows a small precession, a slow rotation of the orbital ellipse, that Newtonian mechanics could not fully account for. The anomaly had been known since the mid-nineteenth century, and accounting for its precise size was one of the first quantitative successes of general relativity in 1915. The fact that Newton’s laws are an approximation, accurate to extraordinary precision within a defined domain and systematically wrong outside it, is among the clearest available demonstrations of the point this section is making: models can be extraordinarily useful while being, in some sense, wrong.
[9] Alfred Korzybski (1879-1950) was a Polish-American philosopher and engineer who introduced the map-territory distinction in a 1931 address to the American Mathematical Society. His broader project, which he called General Semantics, was an attempt to improve human reasoning by clarifying the relationship between words, thoughts, and the world. His major work, Science and Sanity (1933), is demanding but rewarding. The phrase “the map is not the territory” has since passed into common usage far beyond the context of Korzybski’s original argument. This is unfortunate in one respect, since its full philosophical force is often lost in the popularization, and fortunate in another: it has become one of the most widely shared epistemic cautions in circulation.
