Was The 2020 Census Algorithmically Polluted?

Here’s a provocative Substack essay that argues that the 2020 Census was systemically, algorithmically polluted by a single data scientist.

The 2020 census was marketed as an “actual enumeration,” a neutral count of people for apportionment and funding. It was not. The same official who helped block a basic citizenship question in 2018, John M. Abowd, then the Census Bureau’s Chief Scientist, pushed through a new, opaque methodology in 2020 called differential privacy. The new system deliberately injected mathematical noise into every block count in America, turning the census from a headcount into a model with knobs. The knob that mattered most was a single parameter, epsilon, a secrecy shroud known only to a small inner circle. Abowd argued that a single added question about citizenship posed an intolerable risk to data quality because there was, he said, not enough time to test it. Then he rushed an untested algorithm that altered every count in every neighborhood. The irony is so sharp it cuts: the man who warned that one question might distort the census approved a method that guaranteed distortion.

Start with the record. On January 19, 2018, Abowd sent Commerce a technical memo urging rejection of a citizenship question. He then testified for several days in federal court. The transcript, nearly 700 pages, cemented a narrative that any citizenship question would degrade data and impede participation. The courts cited this drumbeat of doubt, and the question was blocked. The administration lost the public fight. But the inside fight over how to publish the data was only beginning. Abowd immediately advanced a quiet revolution in disclosure avoidance, adopting differential privacy for the first time ever in a US census. That choice, made outside the glare that attended the citizenship question, had far more sweeping consequences.

Differential privacy sounds harmless. In truth, it is a mechanism that turns correct data into false data according to a secret recipe. Abowd did not merely suppress a few cells in tiny places. Instead, he ran an algorithm across the map that perturbed the population of every census block, and it postprocessed the results so the fabricated numbers looked tidy. The output retained familiar columns, but the counts were no longer the counts. Abowd convinced his colleagues in the Bureau that implementing differential privacy was merely compliance with 13 U.S.C. § 9, its duty to protect confidentiality. Privacy is important. But privacy, as a constitutional matter, follows the enumeration; it does not negate it. A 2021 Harvard analysis of Abowd’s manipulation showed what this means in real life. When researchers simulated Abowd’s algorithm using public test data, they found that differential privacy moves people around on paper, shifting them from one neighborhood to another in ways that make communities look less diverse and change their apparent political makeup. In plain terms, the system can make a mixed neighborhood look whiter or more uniform, and a balanced district look more partisan than it is. The study also showed that the noise makes it impossible to meet the Supreme Court’s “One Person, One Vote” rule, which requires legislative districts to have nearly equal populations. If each district’s population count is warped by secret noise, some citizens’ votes end up weighing more than others. When a method, by design, destabilizes the precise block totals that redistricting depends on, it stops being disclosure avoidance and becomes statistical alteration. The framers mandated counting people, not blurring them.
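For readers who want to see the two steps the essay describes (perturb each block count, then tidy the result), here is a minimal sketch. It uses the textbook Laplace mechanism, swapped in for illustration; it is not the Bureau's production algorithm, and the block counts, the epsilon value, and the function names are all hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Zero-mean Laplace sample, built as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def postprocess(noisy: float) -> int:
    """Tidy a noisy value into a publishable count: an integer, never negative."""
    return max(0, round(noisy))


def publish_count(true_count: int, epsilon: float, rng: random.Random,
                  sensitivity: float = 1.0) -> int:
    """Add Laplace noise with scale sensitivity/epsilon, then post-process."""
    noisy = true_count + laplace_noise(sensitivity / epsilon, rng)
    return postprocess(noisy)


if __name__ == "__main__":
    rng = random.Random(2020)        # fixed seed so the run is repeatable
    true_blocks = [105, 3, 0, 42]    # hypothetical block populations
    epsilon = 0.5                    # hypothetical per-query budget
    for true_count in true_blocks:
        print(true_count, "->", publish_count(true_count, epsilon, rng))
```

A smaller epsilon means a larger noise scale, so the published figure strays further from the true one; the clamp at zero is one example of the post-processing the essay refers to.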

The core lever in differential privacy is epsilon, the privacy loss budget. Abowd kept this number secret throughout 2020. Cities, states, researchers, and map drawers who saw the early demonstration files warned that the counts were veering away from reality. They had no way to tell whether errors in their communities were genuine undercounts or synthetic artifacts of the algorithm. Abowd’s system also crippled the ability of local governments, analysts, and other record‑keepers to find and fix mistakes. Normally, if a city discovers a counting error that affects federal funding, it can appeal through the Count Question Resolution (CQR) Program. With differential privacy, that safeguard collapses: because the published data are wrong on purpose, no one can separate genuine miscounts from the algorithm’s fake ones. This nullifies the traditional oversight process and leaves states helpless to correct funding or representation errors. Alabama tried to challenge this secrecy in State of Alabama v. U.S. Department of Commerce (2021), arguing that differential privacy was unconstitutional and illegal, but the court dismissed the case for lack of standing, a dismissal that cost the state billions in lost federal funding. Lawsuits and FOIAs followed. Only in 2021 did the Bureau reveal that its chosen global epsilon was 19.61, and even then, the design of the system prevented outsiders from verifying that this figure was actually used. The system was structured so that no one, not even Congress, could audit the dial that governed the size and allocation of the noise across the nation. Abowd’s answer was simply, “Trust me.”

Epsilon is not a philosophy, it is a number with consequences. The average census block contains about 105 people. With an epsilon of 19.61 and the Bureau’s noise allocation strategy, the algorithm effectively invented or erased on the order of ten to thirty people in many small areas. A block of 105 real residents could be published as 95, 115, or even further off, depending on postprocessing and the way the privacy budget was spent in that region. Across millions of blocks those errors do not cancel. They compound in the design of wards, precincts, and districts. Redistricting is a sum of blocks. Distort the blocks, and you distort the districts, the legislatures, and the House. This practice is not merely bad policy; it is plainly unconstitutional. The Supreme Court’s opinion in Department of Commerce v. House of Representatives (1999) made clear that statistical sampling for apportionment is illegal on statutory grounds. Abowd’s algorithmic manipulation is statistical sampling by another name, an unlawful substitution of estimated data for an actual enumeration required by the Constitution.
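The arithmetic linking epsilon to error size can be sketched as follows. This again assumes the textbook Laplace mechanism on a simple count (sensitivity 1), which is a stand-in and not the Bureau's production method. The per-query shares below are hypothetical, since the essay gives only the global figure of 19.61 and not how much of it reached any single block-level count. The sum formula assumes independent zero-mean noise and does not model post-processing.

```python
import math

SENSITIVITY = 1.0  # one person added or removed changes a block count by at most 1


def laplace_scale(epsilon_share: float) -> float:
    """Scale b of the Laplace noise for a count given its share of epsilon."""
    return SENSITIVITY / epsilon_share


def expected_abs_error(epsilon_share: float) -> float:
    """Mean absolute error of Laplace(b) noise, which equals b."""
    return laplace_scale(epsilon_share)


def share_for_error(target_abs_error: float) -> float:
    """Epsilon share at which the mean absolute error equals the target."""
    return SENSITIVITY / target_abs_error


def std_of_sum(n_blocks: int, epsilon_share: float) -> float:
    """Standard deviation of the total noise over n independent blocks."""
    per_block_std = math.sqrt(2.0) * laplace_scale(epsilon_share)
    return math.sqrt(n_blocks) * per_block_std


if __name__ == "__main__":
    for share in (19.61, 1.0, 0.1, 0.05):  # hypothetical per-query shares
        print(f"share {share}: mean |error| = {expected_abs_error(share):.2f} people")
    for target in (10, 30):                # the essay's range of ten to thirty
        print(f"mean |error| of {target} needs a share of {share_for_error(target):.4f}")
    print(f"1,000 blocks at share 0.1: std of sum = {std_of_sum(1000, 0.1):.1f}")
```

Under these assumptions the error on one block depends on the share of the budget spent on that block's count, not on the global total alone, so the allocation matters as much as the headline number.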

The proof arrived in March and May of 2022 when the Bureau’s own quality checks exposed a lopsided pattern. Fourteen states had statistically significant coverage errors, eight with overcounts and six with undercounts. The tilt was unmistakable. Democratic-leaning states were widely overcounted. Republican-leaning states were widely undercounted. Florida’s undercount was roughly three quarters of a million people. Texas’s undercount was on the order of a half million. Minnesota and Rhode Island kept seats they would have lost under an accurate count. Colorado gained a seat it did not deserve. Florida and Texas each missed multiple seats they should have gained. Analysts estimate the net effect was a shift of nine House seats away from Republican-leaning states and toward Democratic-leaning states. The Electoral College moved with them. More than $86 billion in federal formula funds followed.
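To see how a change in state totals can move a House seat, here is a minimal sketch of the method of equal proportions (Huntington-Hill), the formula used to apportion the House. The state names and populations are hypothetical and are not the 2020 figures; the function name is illustrative.

```python
import heapq
import math


def apportion(populations: dict[str, int], seats: int) -> dict[str, int]:
    """Method of equal proportions: one seat each, then highest priority first."""
    if seats < len(populations):
        raise ValueError("every state must receive at least one seat")
    alloc = {state: 1 for state in populations}
    # Priority for a state's next seat is pop / sqrt(n * (n + 1)), n = seats held.
    # heapq is a min-heap, so priorities are stored negated.
    heap = [(-pop / math.sqrt(1 * 2), state) for state, pop in populations.items()]
    heapq.heapify(heap)
    for _ in range(seats - len(populations)):
        _, state = heapq.heappop(heap)
        alloc[state] += 1
        n = alloc[state]
        heapq.heappush(heap, (-populations[state] / math.sqrt(n * (n + 1)), state))
    return alloc


if __name__ == "__main__":
    counted = {"A": 2_150_000, "B": 1_450_000, "C": 700_000}   # hypothetical
    adjusted = dict(counted, A=2_100_000, B=1_500_000)          # hypothetical shift
    print("as counted: ", apportion(counted, 10))
    print("as adjusted:", apportion(adjusted, 10))
```

Because the last seats are awarded by comparing priority values across states, a difference of a few tens of thousands of people in one state's total can decide which state receives the final seat.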

Defenders say the pandemic caused the problem. That explains some fog, not the direction of the wind. The pattern of overcounts and undercounts tracked politics too cleanly to dismiss as random. A privacy method that was sold as neutral in theory coincided with partisan advantage in practice, and the guardians of the method refused to allow a transparent audit of its settings or its state by state allocation. Abowd, a Democrat donor, insisted that publishing epsilon values and the allocation mechanics would let bad actors reverse engineer the data to identify individuals. That claim collapses under basic scrutiny. If the risk of disclosing individuals is truly so sensitive that even the budget of the noise must be hidden, then differential privacy is the wrong tool for a decennial census that decides representation. The constitutional priority is accuracy of the count for apportionment. Privacy can be protected with targeted suppression or an “undetermined” flag for sensitive attributes. What cannot be justified is injecting falsity into the total number of people who live in each place.

If all this is true, President Trump’s call for a mid-decade census is more than justified. The Constitution calls for an enumeration of citizens, not an algorithmic approximation poisoned by partisan pollution. A new count is needed to restore accuracy and remove illegal aliens from the census.

(Hat tip: Director Blue.)


9 Responses to “Was The 2020 Census Algorithmically Polluted?”

  1. Kurt says:

    The Constitution calls for an enumeration of persons, not citizens.

    OTOH, the census should differentiate between citizens and others, for many and good reasons, including apportionment.

    Kurt

  2. Bob G says:

    If I understand this report correctly–the original census data exists; “epsilon” was applied to its analysis and reporting.

    If this understanding is correct, then the original 2020 census data, still existing, can be published straight, in unaltered form. States adversely affected (i.e., fewer congressional districts improperly apportioned) could file suit in federal court for rectification of the congressional seat apportionment, without waiting for another census. Since the federal DOJ–currently directed by the Trump administration–would be in the position of defending against the suit, such a case could be quickly and fairly resolved…

  3. JackWayne says:

    Kurt is correct about “people”. However, the House has the sole power of apportionment. They could easily define People to be whatever they want. Citizens only, citizens and non-citizens. It would be interesting to see how the parties would vote. It’s not a given that all Republicans would define People as citizens only.

  4. Malthus says:

    The people who cast the votes don’t decide an election, the people who count the votes do.–Joseph Stalin.

    The bureau that enumerates the US population does not determine US House representation, the algorithm does. Josef Stolen (AKA Joe Biden)

    When an “unlawful substitution of estimated data for an actual enumeration required by the Constitution” is accepted as the basis for political representation, you effectively usurp the legitimacy of representative government.

    John M. Abowd, the Census Bureau’s Chief Scientist, did not conceive, organize, and execute this scheme without authorization. Depose him before Congress, expose his handlers and indict them for seditious conspiracy to defraud the US Treasury.

    It may not be possible to convict the conspirators of treason but financial crimes are sufficient grounds for imprisonment. Just ask Al Capone how that works.

  5. Malthus says:

    “States adversely affected (i.e., fewer congressional districts improperly apportioned) could file suit in federal court for rectification of the congressional seat apportionment, without waiting for another census.”

    And John Roberts could dismiss the plaintiffs for “lack of standing”.

  6. Leland says:

    Abowd blocked a necessary census question and substituted a new variable in its place. Despite claims the variable is consistent, it apparently was not, as it created errors that can’t be cancelled out. Don’t let the noise of data manipulation confuse that the data collected was also garbage.

  7. Northern Redneck says:

    Reading this gave me a strong sense of deja-vu.

    This is the same genre of cleverly-named (as cover) mathematical game-playing that has long underpinned the “science” (sic) behind “global warming” (sic) or “climate change” (sic) or “excessive flatulence” (okay, I made that one up) or whatever it is that they’re calling “it” this week.

    Remember the 2009 leak from the University of East Anglia which showed how the data were being “adjusted” and/or computer simulations were “repackaged” as being actual measured data, mysterious undisclosed “algorithms,” and so forth? (Most folks focused on the e-mails, but just reading the comments in the leaked code was very, very enlightening.) Different gamers, same game.

    Everything about “the left” (sick) is totally fraudulent…

    (And as Yogi Berra famously said, “It’s deja-vu all over again.”)

  8. Tregonsee314 says:

    JackWayne said

    Kurt is correct about “people”. However, the House has the sole power of apportionment.

    The relevant portion of the US Constitution is Article I, Section 2, Clause 3, in part:

    Representatives and direct Taxes shall be apportioned among the several States which may be included within this Union, according to their respective Numbers, which shall be determined by adding to the whole Number of free Persons, including those bound to Service for a Term of Years, and excluding Indians not taxed, three fifths of all other Persons. The actual Enumeration shall be made within three Years after the first Meeting of the Congress of the United States, and within every subsequent Term of ten Years, in such Manner as they shall by Law direct.

    The Three Fifths clause was made moot by the 13th Amendment. At first glance one might argue that it takes a constitutional amendment to change this clause, BUT this section, “and excluding Indians not taxed,” was changed by the Indian Citizenship Act of 1924, a simple law passed by Congress and signed by the president (then Calvin Coolidge). From this I derive these points:

    1) Any change in the meaning of “Number of free Persons” to mean citizens must come from the legislative branch. Given the Indian Citizenship Act a simple law might suffice, but a detailed constitutional amendment would probably be less prone to judicial hijinks

    2) Abowd’s Epsilon and scrambling seems utterly adverse to the intent of enumeration and the “respective Numbers” language. There is NO need to obfuscate the data; individual records are not released to the public until 72 years after the census (the 1950 records having been released in 2022). There is nothing in the Constitution that permits or requires this. On top of that, Census data is incredibly valuable in a historic sense. Having amused myself with genealogy, I find the censuses of the late 19th and early 20th century INCREDIBLY useful.

    3) Presuming the raw data still exists, States such as Texas and Florida should immediately move to have the 2020 allocation recalculated based on the raw data. There is clear standing as they have suffered damage. In addition, as this is an interstate issue, it is clearly in the purview of the Supreme Court; the inferior Article III courts have no part to play in this (thus avoiding the current trouble makers in the Appeals and District courts).

    4) If the raw data has been tampered with such that it is no longer reflective of a simple enumeration, Congress should act to have the 2030 enumeration done early. The Constitution implies that they may do this: “and within every subsequent Term of ten Years, in such Manner as they shall by Law direct.” It requires an enumeration within every term of ten years, not near or at exactly 10 years. The executive may NOT initiate this, as the apportionment and enumeration are within Congress’ powers. The Executive may, however, specify the details of the enumeration (such as the questions). Any attempts by Census employees to impede this count should be viewed as clear insubordination and grounds for dismissal for cause. In particular, there should be NO manipulation of the data, nor should estimates of homeless or transient populations be permitted, only actual counts by a census taker on a date certain.

  9. 10x25mm says:

    In October 2019, the Census Bureau released 2010 Census DAS (Disclosure Avoidance System, Census speak for Differential Privacy) demonstration data products which even the Census Bureau admitted contained notable distortions and errors that would have produced substantial redistricting errors.

    Supposedly their Data Stewardship and Executive Policy Committee (DSEP) reduced these distortions in their June 2021 final DAS release. The final approved DAS production settings were a total privacy-loss budget for the redistricting data product of ε=19.61, which includes ε=17.14 for the persons file and ε=2.47 for the housing unit data. The Census Bureau then rereleased 2010 Census DAS demonstration data products which the Bureau claimed reduced the redistricting distortions found in the 2019 demonstration release.

    The Census Bureau is lying. Their 2021 approved privacy loss budget still compromises redistricting, particularly redistricting of the more numerous state legislative districts.
