S1-202 – The 7 plus or minus 2 problem

Why the brain is always compressing

In 1956, the cognitive psychologist George Miller published a paper in the journal Psychological Review that has become one of the most widely cited in the history of the field. Its title was deliberately playful: “The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information.” What Miller had noticed, across a series of experiments involving the memorization of digits, letters, and words, was a convergence so consistent that it seemed almost impossible: the number of distinct items that a person could hold in active awareness at any given moment hovered, regardless of the nature of the items, reliably around seven. Not three. Not eleven. Seven, give or take two.¹

The finding has been refined and debated since Miller published it, and contemporary cognitive neuroscience suggests the true limit may be closer to four items in its purest form. But Miller’s core observation has proven durable in a way that the exact number has not: there is a strict and small ceiling on the number of distinct elements that consciousness can manage at once, and that ceiling is far lower than most of us realize. This, too, is easy to verify immediately. Anyone can read a seven-digit phone number once, look away, and try to recite it. Most people succeed, just barely, and only if they have been silently rehearsing. With two more digits, most fail. The limit is not a matter of intelligence or effort. It is a structural property of the cognitive system, as fixed as the fovea’s resolution or the ear’s range of audible frequencies. We are working with a very small workspace.

What makes this limit philosophically interesting, and what makes it the subject of this article in a series about models and their limits, is not the number itself but what the mind does to work around it. The solution the brain has evolved to this constraint is one of the most elegant and consequential operations in all of cognition. It is called chunking, and understanding it illuminates not only how we think but what happens to knowledge as it is compressed and why expertise and the knowledge illusion arise from the same underlying mechanism.

The constraint and the workaround

The working memory limit is not, as it might initially seem, a limit on how much information a person can process. It is a limit on how many distinct items the conscious workspace can hold simultaneously. The crucial distinction is between an item and a chunk. A chunk is a unit of information that has been built up through prior learning into a single, integrated object of attention. The individual letters H, E, A, R, T are five items when encountered for the first time, five distinct squiggles that must be memorized individually. For a literate adult who has encountered the word 10,000 times, H-E-A-R-T is a single chunk: one item in working memory, but an item that carries within it an enormous amount of associated meaning, sound, and connection.²

This is why an expert chess player can reconstruct the positions of 20 pieces from a 5-second glance at a mid-game board, a feat that seems to violate the working memory limit until we realize that the expert is not holding 20 pieces in mind. They are holding four or five familiar patterns, each of which contains multiple pieces in known relationships. The items in the expert’s working memory are chunks, and each chunk is a compressed representation of a structure that would require many separate items to represent explicitly. The limit has not changed. What has changed is the resolution of the items that are loaded into it. More meaning per slot.
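The difference between items and chunks can be made concrete with a toy model. The Python sketch below is an illustration only, not a model of actual memory: the function names, the greedy longest-match rule, and the slot count of four are all assumptions introduced for the example. It shows the same sequence occupying five slots for someone with no learned patterns and one slot for someone who already knows the word.

```python
from typing import List, Set

# Assumed slot count for illustration; the article cites estimates
# ranging from about four to about seven.
WORKING_MEMORY_SLOTS = 4


def chunk(sequence: str, known_chunks: Set[str]) -> List[str]:
    """Greedily split `sequence` into the longest known chunks,
    falling back to single characters when nothing familiar matches."""
    longest = max((len(c) for c in known_chunks), default=1)
    result = []
    i = 0
    while i < len(sequence):
        # Try the longest candidate first so a familiar pattern
        # wins over its individual parts.
        for size in range(min(longest, len(sequence) - i), 0, -1):
            piece = sequence[i:i + size]
            if size == 1 or piece in known_chunks:
                result.append(piece)
                i += size
                break
    return result


def fits_in_working_memory(sequence: str, known_chunks: Set[str],
                           slots: int = WORKING_MEMORY_SLOTS) -> bool:
    """True if the chunked sequence needs no more slots than are available."""
    return len(chunk(sequence, known_chunks)) <= slots


novice: Set[str] = set()        # no learned patterns
literate_adult = {"HEART"}      # the word is already a single unit

print(chunk("HEART", novice))          # ['H', 'E', 'A', 'R', 'T']
print(chunk("HEART", literate_adult))  # ['HEART']
```

The slot count never changes between the two calls. Only the vocabulary of known chunks does, which is the sense in which expertise puts more meaning into each slot.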

The same principle applies in every domain of expertise. The experienced internist who looks at a cluster of symptoms does not hold each symptom separately in working memory while reasoning about their combination; they recognize the cluster as a familiar pattern, a single item in the diagnostic vocabulary built up over years of practice. The seasoned programmer who reads a block of code does not process each line independently; they recognize familiar structures, design patterns, and control flows as single units of attention. The fluent reader of any language does not decode each letter in a word; they perceive the word, and often the phrase, as a single perceptual unit. All of these are chunking in action: the compression of complex structures, through experience and learning, into single objects of awareness.

What chunking costs

The efficiency of chunking is real. It is what makes expertise possible. Without it, we could not navigate the complexity of the environments we actually inhabit (social, professional, physical), because the complexity exceeds what the raw cognitive system could process item by item in the time available. Chunking is not a workaround for a deficient system. It is the mechanism by which a system with strict limits on conscious bandwidth manages to function in a world that does not respect those limits.

But chunking has a cost that is not always visible from inside the expertise it enables. When a complex structure is compressed into a chunk, the internal details of that structure are no longer separately available to working memory. The chunk is accessed as a unit, not decomposed. The expert who perceives a clinical pattern or a chess position or a code structure as a single item gains the efficiency of rapid recognition at the expense of conscious access to the individual components. This is usually not a problem: in most situations, the chunk is the appropriate unit of analysis, and the internal details are not needed. But it becomes a problem in two characteristic situations.

The first is when the chunk is wrong. Because chunks are built through experience with typical cases, they encode what is normally true of structures of a given kind. When an atypical case arrives (the patient whose symptoms fit the familiar cluster but have an unusual cause, the chess position that looks like a known pattern but has a critical difference, the code structure that resembles a standard design but is subtly broken), the chunking system misidentifies it. The chunk fires on superficial similarity and suppresses the signals that would reveal the difference. The expert has become, in the specific sense that expertise produces, unable to see what the novice might notice: the detail that falls outside the expected pattern. The same compression that enables rapid and accurate recognition in typical cases produces confident misidentification in atypical ones.³

The second situation is when the chunk is transferred outside its domain of origin. Chunks are built in specific contexts, from specific data, for specific purposes. A political analyst who has developed rich chunks for the dynamics of elections in one country may find that those chunks fire, inappropriately, when applied to elections in a different political culture with different institutional arrangements. A physician trained in one disease population may apply diagnostic chunks built on that population to patients whose background risk profile is entirely different. In each case, the chunk is not wrong within its domain; it is a genuine compression of real regularities. It becomes wrong when applied outside the conditions under which it was built, because the details that the compression discarded are the details that differ between domains.

There is a striking demonstration of this tradeoff that comes from a direction nobody expected. In 2007, researchers at the Primate Research Institute of Kyoto University led by Tetsuro Matsuzawa published a finding that became one of the most discussed in comparative cognition: a young chimpanzee named Ayumu consistently outperformed adult humans on a rapid numeral-location memory task.⁴ In the experiment, the numerals 1 through 9 appeared randomly arranged on a touchscreen for a fraction of a second (as briefly as 210 milliseconds, less than the time it takes to blink) before being simultaneously covered by white squares. The task was to touch the squares in the ascending numerical order of the now-hidden numerals. Adult chimpanzees performed comparably to humans on this task. Young chimpanzees, Ayumu above all, performed dramatically better: faster, more accurate, and seemingly unaffected by the brevity of the exposure that caused human performance to collapse.

Matsuzawa’s proposed explanation, the cognitive tradeoff hypothesis, is directly relevant to this article’s central argument. He suggests that humans have sacrificed the eidetic-like rapid visuospatial capture that young chimpanzees possess to make room for language. When a human sees the numeral 5 on a screen, they cannot prevent the activation of its rich semantic associations (fiveness, counting, arithmetic, its position in a sequence), and this semantic richness, the product of exactly the kind of chunking this article describes, interferes with the pure positional snapshot that the task requires. Ayumu sees a symbol with a location. The human sees a concept with a history. The cognitive compression that gives us language, mathematics, and the accumulated knowledge of civilization is the same compression that makes us lose, decisively and embarrassingly, to a chimpanzee ordering numerals at 210 milliseconds. The capacity that produces our greatest cognitive achievements is the same capacity whose richness costs us the speed.

The knowledge illusion

The cognitive scientists Steven Sloman and Philip Fernbach have documented a phenomenon they call the knowledge illusion: the systematic tendency of human beings to believe they understand things in more depth and detail than they actually do.⁵ Their most striking demonstration involves what is known as the illusion of explanatory depth. People are asked to rate their understanding of everyday objects (a toilet, a bicycle, a zipper) on a scale from one to seven. They typically rate themselves as reasonably knowledgeable. They are then asked to write a detailed causal explanation of how the object works: step by step, in enough detail that someone could understand the mechanism from the explanation alone. Their ratings collapse. The experience of trying to articulate the mechanism reveals that what felt like understanding was something considerably shallower.

Chunking is the cognitive mechanism that produces this illusion. When we encounter a word (bicycle, democracy, inflation, quantum), we recognize it immediately. The recognition is smooth, fast, and accompanied by a feeling of familiarity that is easily mistaken for understanding. But recognition is not understanding. The chunk fires in response to the word, producing the feeling of access to associated meaning, without actually unpacking that meaning into its constituent parts. What we have accessed is the label for a structure, not the structure itself. And because chunking is rapid and automatic, the difference between recognizing a label and understanding the thing it names is largely invisible from the inside.

This is the mechanism behind the finding from the upcoming article 212: that knowing the name of a thing is not the same as knowing the thing, and that most of what passes for knowledge in public discourse, and in private conviction, is recognition of labels rather than genuine understanding of mechanisms. The word democracy can be held in working memory as a single chunk. What it compresses (all the institutional arrangements, historical contingencies, philosophical arguments, and practical trade-offs that constitute actual democratic governance) is vastly more than any working memory can hold simultaneously. The chunk feels like understanding because the label is fluent and the associated network of connections fires rapidly. It is not understanding. It is a map of a name.

The political consequence of this is direct and underappreciated. Research by Sloman, Fernbach, and their colleagues has consistently found that people who hold the most extreme political positions, who are most confident about the correctness of their views on complex policy questions, show the most severe illusion of explanatory depth when asked to explain the mechanisms by which their preferred policies would work.⁶ This is not a partisan observation. It applies to extreme positions across the ideological spectrum. The pattern is a prediction of the chunking model: strong emotional and tribal association with a position increases the fluency of the label, which increases the feeling of understanding, which increases the confidence, which reduces the motivation to decompose the chunk into its actual components. The chunk becomes self-reinforcing. The more strongly held the position, the less the mechanism is examined.

The same illusion operates not only in individual minds but in the relationship between the individual and the civilizational systems that surround them. In 1958, the economist Leonard Read published a short essay called “I, Pencil,” written in the first person from the point of view of a humble pencil narrating the extraordinary complexity of its own production: the cedar logged in Oregon, the saws made of steel derived from iron ore, the graphite mined in Ceylon, the eraser compound made with rapeseed oil from the Dutch East Indies, the brass ferrule whose supply chain disappears into further supply chains. Read’s central claim was as simple as it was disorienting: not a single person on the face of the earth knows how to make a pencil. The knowledge required is distributed across thousands of people, none of whom possesses more than a small part of it, coordinated not by any central understanding but by the price system, which signals what is needed without anyone having to comprehend the whole. Milton Friedman brought the essay to a mass audience in his 1980 television series Free to Choose, holding up a pencil on screen and demonstrating, in 2 minutes, that he had no idea where the brass ferrule came from, or the yellow paint, or the glue. “This black center,” he said, “we call it lead but it’s really graphite. I’m not sure where it comes from.” The pencil is a chunk. It sits in working memory as a single, familiar, effortlessly accessed unit. What it compresses (the interlocking knowledge of thousands of people across dozens of countries) is entirely invisible from inside the chunk. The Sloman and Fernbach experiments demonstrate the illusion at the scale of the individual explaining a toilet or a zipper. Read and Friedman demonstrate it at the scale of civilization explaining itself.⁷

Jordan Peterson uses a closely related example in his lectures that approaches the same point from the direction of failure rather than complexity. We drive our cars with complete ease, he observes, navigating traffic, changing lanes, reaching our destination, without the slightest conscious access to the engineering that makes the journey possible. The internal combustion engine, the fuel injection system, the transmission, the differential, the braking mechanism, the suspension geometry: none of it enters awareness. The car is a chunk, accessed as a single operational unit, and it functions as such right up to the moment it stops functioning. When the car breaks down by the side of the road, the chunk dissolves, not into understanding, but into the sudden, vertiginous realization that there was never any understanding there to begin with. The breakdown does not reveal knowledge that was previously hidden. It reveals the absence of knowledge that was previously invisible. This is the knowledge illusion in its most visceral form: not the gentle deflation that comes from trying to explain a zipper, but the abrupt confrontation with the gap between competent use and genuine comprehension. Peterson’s observation is that this gap characterizes not just cars but almost every tool and institution on which modern life depends, and that the ease with which we navigate the world is a measure not of our understanding but of how rarely those invisible systems fail.

The compression of experience

There is a further dimension to the chunking problem that extends beyond factual knowledge into the domain of judgment and values. Experience, as it accumulates, is itself compressed. We do not carry the full detail of everything that has happened to us in explicit, articulate form; we carry summaries, interpretive frameworks, and condensed representations of patterns observed across time. These compressed representations are genuinely valuable: they are the basis of practical wisdom, social intuition, and the kind of tacit knowledge that article 604 of this series examines. But they are also subject to the same distortions that affect all chunking: they encode what was typically true in the contexts where they were built, and they apply that typical-case template to new situations that may differ in crucial ways.

The experienced parent who applies a parenting chunk built from their first child to their second child of different temperament. The veteran manager who applies a leadership chunk built in one organizational culture to a very different one. The long-married person whose model of their partner was chunked years ago and has not been consciously updated as the partner changed. In each case, the chunk is a genuine compression of real experience, and in each case it is being applied with a confidence that the evidence does not fully support, because the evidence is being filtered through the chunk rather than examined directly. The compression that made the experience manageable has, over time, become a barrier to seeing what is actually there.

This is the connection between the cognitive limit and the central argument of this series. Every model is a compression. Every compression discards details. The question is whether we know which details have been discarded, and what would happen if those details turned out to matter. The working memory limit is not the enemy of good thinking. It is a structural feature of the cognitive system within which all thinking must occur. The practice the series calls The Conscious Look is, at this level, the practice of occasionally and deliberately decompressing the chunks we rely on: asking what is inside them, whether the internal structure still matches the reality it was built to represent, and what kinds of experience might have been discarded in the compression that would be relevant now.

Notes

¹ Miller’s original paper estimated the limit at seven plus or minus two, meaning the range from five to nine items. Subsequent research, particularly by Nelson Cowan beginning in the late 1990s, has argued that the true capacity of working memory when chunking is carefully controlled is closer to four items. The discrepancy partly reflects the difficulty of preventing experimental participants from grouping items into chunks even when instructed not to. The distinction between four and seven matters less for this article’s purposes than the core point: the limit is small, well below what most people intuitively believe. Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24(1), 87-114.

² The term chunking was introduced by Miller himself in the 1956 paper, and he was explicit that the critical unit of working memory capacity is the chunk (a unit of information that varies enormously in the amount of raw information it encodes) rather than the bit in the information-theoretic sense. A single chunk might encode one letter or an entire sentence or a complex perceptual scene, depending on the observer’s prior learning. This insight, that the relevant unit of cognitive capacity is the meaningful unit rather than the physical signal, is one of the founding insights of cognitive psychology.

³ The phenomenon of expert misidentification through chunking is related to what Gary Klein has called recognition-primed decision-making: expert decision-makers in naturalistic settings typically do not evaluate options analytically but recognize situations as typical of familiar types and apply the response that has been effective for that type before. This is fast and usually effective. It becomes dangerous in two conditions: when the situation is atypical in a way that resembles a familiar type, and when the expert has been trained in a domain with long feedback loops, where the consequences of a decision are delayed and the chunking system receives inadequate correction. The literature on this is reviewed in Klein, G. (1998). Sources of Power: How People Make Decisions. MIT Press.

⁴ Inoue, S., and Matsuzawa, T. (2007). Working memory of numerals in chimpanzees. Current Biology, 17(23), R1004-R1005. The research was conducted at the Primate Research Institute of Kyoto University as part of the long-running Ai Project, which has studied chimpanzee cognition since the late 1970s. The finding generated significant debate in the literature. Cook and Wilson (2010) argued in Psychonomic Bulletin & Review that Ayumu’s advantage partly reflected his extensive training on the task compared to the untrained human participants, and subsequent studies showed that humans with equivalent practice could approach, though generally not match, Ayumu’s performance on trials with five numerals at 210 milliseconds. The question of whether the chimpanzee advantage is absolute or a function of training history remains contested. What the debate leaves standing is the usefulness of Matsuzawa’s cognitive tradeoff hypothesis as a framework: the proposal that human semantic encoding of symbols, the foundation of language and abstract thought, occupies precisely the cognitive resources that pure visuospatial positional memory would otherwise employ. For a comprehensive treatment of the Ai Project and its findings, see Matsuzawa, T. (2013). Evolution of the brain and social behavior in chimpanzees. Current Opinion in Neurobiology, 23(3), 443-449.

⁵ Sloman, S., and Fernbach, P. (2017). The Knowledge Illusion: Why We Never Think Alone. Riverhead Books. The illusion of explanatory depth was first systematically demonstrated in Rozenblit, L., and Keil, F. (2002). The misunderstood limits of folk science: An illusion of explanatory depth. Cognitive Science, 26(5), 521-562. Sloman and Fernbach’s contribution was to extend the finding to political and policy beliefs and to develop the social dimension of the knowledge illusion, the observation that much of what feels like individual understanding is actually distributed across a community of specialists whose knowledge the individual can access but does not personally possess.

⁶ Fernbach, P. M., Rogers, T., Fox, C. R., and Sloman, S. A. (2013). Political extremism is supported by an illusion of understanding. Psychological Science, 24(6), 939-946. The study found that asking participants to explain the mechanisms underlying their favored policy positions, rather than simply listing arguments for them, reduced their confidence in those positions and their reported extremity. This effect held across a range of policy topics and was not mediated by the persuasiveness of arguments. The mechanism of the effect is consistent with the knowledge illusion account: the act of attempting to articulate the mechanism reveals the limits of the understanding, which calibrates the confidence more accurately. The finding suggests a specific and testable intervention for reducing political extremism (not argument, but the requirement of mechanistic explanation), though the authors note that the effect was modest in magnitude and may not be durable.

⁷ Leonard Read, “I, Pencil: My Family Tree as Told to Leonard E. Read,” The Freeman, December 1958. Reprinted many times since, with an introduction by Milton Friedman who used it in Free to Choose (1980), both the television series and the accompanying book of the same title written with Rose Friedman. The essay is freely available at econlib.org. Friedman’s introduction notes that he knows of no other piece of literature that so effectively illustrates both Adam Smith’s invisible hand and Hayek’s emphasis on the importance of dispersed knowledge. The connection to Hayek is direct and important: the pencil argument is not merely a demonstration of the knowledge illusion in individual minds but an argument about the structural impossibility of any central authority possessing the knowledge that is distributed across millions of participants in a complex economy. This argument is developed further in article 804 of this series, which examines what political systems assume about human knowledge and why central planning fails on Hayekian grounds regardless of its assumptions about human motivation.
