The Bottleneck and the Coastline
Every statistical operation does the same thing. It takes a high-dimensional world and crushes it through a lower-dimensional aperture. What survives the crush is your result. What doesn’t survive is invisible to you — and that invisibility has a shape.
These two papers develop that observation from first principles to operational consequences.
The Bottleneck — Statistics as Compression
The mean is the most aggressive bottleneck possible: project an entire distribution onto a single point. What survives? A single number. What dies? Everything else — shape, spread, multimodality, dependence. The variance isn’t an independent quantity. It’s the residual of the first compression — it quantifies what remains after the mean has done its work.
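The mean-as-compression claim can be checked directly. In this minimal sketch (illustrative data, not from the paper), the mean is the single point that minimizes squared reconstruction error, and the variance is exactly the residual that this optimal compression leaves behind:

```python
import statistics

# Illustrative sample — any numbers work.
data = [1.0, 2.0, 2.0, 3.0, 9.0]

mean = statistics.fmean(data)

def residual(c):
    """Mean squared error left after compressing the sample to the single point c."""
    return sum((x - c) ** 2 for x in data) / len(data)

# The population variance is the residual at the optimal compression point (the mean),
# and any other choice of point leaves a strictly larger residual.
assert abs(residual(mean) - statistics.pvariance(data)) < 1e-12
assert residual(mean) < residual(mean + 0.5)
assert residual(mean) < residual(mean - 0.5)
```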
Once you see this, the classical “paradoxes” dissolve:
Simpson’s paradox is two different bottlenecks (aggregated vs. stratified) destroying different information. Same data, different compressions, different conclusions. Not a paradox — a projection mismatch.
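The projection mismatch is easy to exhibit numerically. A sketch in Python, with counts shaped like the classic kidney-stone example (hypothetical here, not from either paper):

```python
# Two compressions of the same treatment data: stratified by severity vs. aggregated.
groups = {
    "mild":   {"A": (81, 87),   "B": (234, 270)},  # (successes, trials)
    "severe": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, trials):
    return successes / trials

# Stratified bottleneck: treatment A wins within every stratum.
for g in groups.values():
    assert rate(*g["A"]) > rate(*g["B"])

# Aggregated bottleneck: pool the strata and treatment B wins.
agg = {t: tuple(map(sum, zip(*(g[t] for g in groups.values())))) for t in ("A", "B")}
assert rate(*agg["A"]) < rate(*agg["B"])
```

Same data, two compressions, opposite conclusions: the aggregation destroys the severity variable, which is correlated with both treatment choice and outcome.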
The base rate fallacy is confusing a bottleneck-property (the test’s sensitivity) for a world-property (the probability you’re actually sick). The test measures itself — how well it sorts patients. Translating that into a posterior requires reintroducing what the test destroyed: the base rate.
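Reintroducing what the test destroyed is one line of Bayes' rule. A sketch with illustrative numbers (a 99%-sensitive, 95%-specific test for a 1-in-1000 condition):

```python
# Properties of the bottleneck (the test) vs. a property of the world (the base rate).
sensitivity = 0.99   # P(positive | sick)
specificity = 0.95   # P(negative | healthy)
base_rate   = 0.001  # P(sick)

# Total probability of a positive result, then Bayes' rule for the posterior.
p_pos = sensitivity * base_rate + (1 - specificity) * (1 - base_rate)
posterior = sensitivity * base_rate / p_pos  # P(sick | positive)

# A positive result from a "99% accurate" test leaves under a 2% chance of being sick.
assert posterior < 0.02
```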
P-value misinterpretation is the same error at industrial scale. The p-value tells you how the apparatus behaves when the null is true. It tells you nothing about the world unless you supply the prior — which the test’s bottleneck never carried. The ASA’s 2016 clarification, the replication crisis, decades of confusion: all consequences of mistaking the instrument for the phenomenon.
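The same arithmetic applies to significance testing. Under illustrative assumptions (5% alpha, 80% power, and a field where only one tested hypothesis in ten is true), the false discovery rate among significant results is already over a third:

```python
# The p-value threshold fixes P(reject | H0 true); converting that into
# P(H0 true | reject) requires a prior the test's bottleneck never carried.
alpha, power, p_true = 0.05, 0.8, 0.10  # hypothetical field-level numbers

p_reject = power * p_true + alpha * (1 - p_true)
false_discovery = alpha * (1 - p_true) / p_reject  # P(H0 true | significant result)

# More than a third of "significant" findings are false — with alpha still at 5%.
assert false_discovery > 1 / 3
```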
Regression to the mean is the first projection mistaken for signal. Extreme values are partly bottleneck artifact; repeat measurement regresses because the noise component doesn’t replicate.
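A short simulation makes the artifact visible (illustrative model: stable skill plus independent per-test measurement noise):

```python
import random

random.seed(0)

# Each subject has a fixed skill; each test adds fresh, independent noise.
n = 10_000
skill = [random.gauss(0, 1) for _ in range(n)]
test1 = [s + random.gauss(0, 1) for s in skill]
test2 = [s + random.gauss(0, 1) for s in skill]

# Select the top 1% on test 1 — partly genuine skill, partly lucky noise.
top = sorted(range(n), key=lambda i: test1[i], reverse=True)[: n // 100]
mean1 = sum(test1[i] for i in top) / len(top)
mean2 = sum(test2[i] for i in top) / len(top)

# The retest regresses toward the mean: the noise component didn't replicate.
assert mean2 < mean1
```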
The paper then turns this lens on two contemporary cases. First, adversarial AI attacks — where the attacker succeeds by characterizing what the defender’s safety classifier can’t see (its kernel) and routing the payload through that blind spot. Encoding arbitrage, context-window manipulation, persona injection: all bottleneck navigation. Second, Anthropic’s Sabotage Risk Report — where the institution tries to triangulate past the blind spots of any single evaluation by deploying multiple independent bottlenecks whose intersection constrains the unknown more tightly than any one alone. Honest institutional epistemics as multi-bottleneck triangulation.
The deeper point: the choice of bottleneck is itself data about the chooser. Every time an analyst picks a test, a model, a summary statistic, they’re choosing what to preserve and what to destroy. That choice is a self-portrait.
The Coastline — Surveillance Power as Fractal Scaling
Mandelbrot showed that Britain’s coastline gets longer as you measure it with a shorter stick. The same thing happens with surveillance.
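The effect is easy to reproduce on any sufficiently jagged curve. This sketch walks a random-walk "coastline" with sticks of decreasing length (the random walk is a stand-in for Britain, not data from the paper):

```python
import math
import random

random.seed(1)

# A jagged synthetic coastline: a 2-D random walk.
pts = [(0.0, 0.0)]
for _ in range(2000):
    x, y = pts[-1]
    pts.append((x + random.gauss(0, 1), y + random.gauss(0, 1)))

def measured_length(points, stick):
    """Divider method: walk the curve with a fixed-length stick, summing stick lengths."""
    total, anchor = 0.0, points[0]
    for p in points:
        if math.dist(anchor, p) >= stick:
            total += stick
            anchor = p
    return total

# Shorter sticks resolve more wiggles, so the measured length keeps growing.
lengths = [measured_length(pts, s) for s in (16.0, 8.0, 4.0, 2.0)]
assert lengths == sorted(lengths)
```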
The predictability coastline traces how an observer’s predictive power grows with data resolution. Each new data type doesn’t sharpen existing predictions — it opens an entirely new axis, as the radar chart shows. The shape of the coastline, not any single number extracted from it, is the object of interest.
But the coastline depends on what you’re trying to predict. So we define the coherent coastline — a worst-case envelope over diverse prediction targets. What survives that adversarial re-questioning is system-intrinsic. What doesn’t is measurement artifact.
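Operationally, the coherent coastline is a pointwise worst case over targets. A minimal sketch with hypothetical scores (target names and numbers invented for illustration):

```python
# Hypothetical predictive scores in [0, 1] for one system, indexed by resolution,
# under three different prediction targets.
scores = {
    "next_state":  [0.30, 0.55, 0.80, 0.90],
    "sign_change": [0.20, 0.50, 0.70, 0.85],
    "long_range":  [0.10, 0.40, 0.60, 0.75],
}

# A naive coastline reports the most favourable target at each resolution;
# the coherent coastline keeps only what survives every re-questioning.
naive    = [max(col) for col in zip(*scores.values())]
coherent = [min(col) for col in zip(*scores.values())]

assert coherent == [0.10, 0.40, 0.60, 0.75]
assert all(c <= n for c, n in zip(coherent, naive))
```

The gap between the two curves is the measurement artifact; the coherent curve is what the sketch treats as system-intrinsic.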
Across six systems (Lorenz attractor, Hénon map, three financial time series, and a thermostat baseline), the coherent coastline produces a clean three-tier separation: chaotic dynamics at the top, the financial series in the middle, the thermostat at the bottom.
The diagonal tells the story: the Lorenz attractor keeps its structure no matter how you interrogate it. The thermostat loses 97% — almost everything it appeared to have was measurement artifact.
The Capture Threshold
The paper’s sharpest result is the capture threshold: the data resolution at which an observer’s model of you exceeds your own self-model.
The key is kernel asymmetry — not finer-grained observation of the same variables, but the observer seeing variables the self-model projects away entirely. You know your daily routine. You don’t know the pattern in your heart rate variability that predicts your decisions before you make them. The observer who has your biometrics does.
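Kernel asymmetry can be sketched in a few lines: a decision driven partly by a variable the self-model keeps (routine) and partly by one it projects away (heart rate variability, here just a hidden Gaussian). All variable names and coefficients are illustrative:

```python
import random

random.seed(2)

# The decision depends on routine (visible to the self-model) and hrv (invisible to it).
n = 5000
routine = [random.gauss(0, 1) for _ in range(n)]
hrv     = [random.gauss(0, 1) for _ in range(n)]  # projected away by the self-model
decision = [r + 2 * h + random.gauss(0, 0.5) for r, h in zip(routine, hrv)]

def mse(pred, truth):
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)

# Self-model: the best prediction from routine alone (coefficient 1, by construction,
# since hrv is independent of routine).
self_err = mse(routine, decision)
# Observer: sees both variables.
obs_err = mse([r + 2 * h for r, h in zip(routine, hrv)], decision)

assert obs_err < self_err  # the observer's model of you beats your own
```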
The diagram shows the phase transition: below, asymmetric information where you retain sovereignty. Above, asymmetric agency — the observer predicts behaviors you haven’t decided on yet.
Defense Isn’t About Volume
The fractal structure of the coastline implies something counterintuitive about privacy defense: deleting data uniformly barely helps. What matters is which data you protect.
Data that bridges otherwise disconnected behavioral clusters — the link between your work-self and your home-self, between stated beliefs and revealed preferences — is 7.7× more valuable to the observer than data that merely adds density within an existing cluster. Protecting bridge data is the high-leverage intervention. Uniform data minimization is security theater with extra steps.
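A toy graph shows why bridges dominate: two dense behavioral clusters joined by one edge. Node names and edges are invented for illustration, using only the standard library:

```python
# Two dense clusters ("work" and "home") joined by a single bridge edge.
work   = {("w1", "w2"), ("w2", "w3"), ("w1", "w3")}
home   = {("h1", "h2"), ("h2", "h3"), ("h1", "h3")}
bridge = {("w3", "h1")}
edges  = work | home | bridge

def components(edge_set):
    """Count connected components with a simple depth-first search."""
    adj = {}
    for a, b in edge_set:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, comps = set(), 0
    for node in adj:
        if node in seen:
            continue
        comps += 1
        stack = [node]
        while stack:
            cur = stack.pop()
            if cur not in seen:
                seen.add(cur)
                stack.extend(adj[cur] - seen)
    return comps

# Deleting an intra-cluster edge changes nothing the observer cares about;
# deleting the bridge splits the graph — the two selves can no longer be linked.
assert components(edges - {("w1", "w2")}) == 1
assert components(edges - bridge) == 2
```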
The Connection
The filtration is the bottleneck, parameterized by resolution, and the coastline measures what survives the projection. The capture threshold is the point where the observer’s bottleneck resolves distinctions that your self-model’s bottleneck collapses.
The bottleneck paper establishes the grammar. The coastline paper measures how that grammar scales with data — and where it breaks the subject.
Citations:
Close, L. J. (2026). The Bottleneck Primitive: Statistics as the Study of Information Compression. Zenodo. https://doi.org/10.5281/zenodo.18667644
Close, L. J. (2026). The Coastline of Predictability: Coherent Multi-Scale Measurement of Surveillance Power. Zenodo. https://doi.org/10.5281/zenodo.18668211