The Internet Was Weeks Away From Disaster and No One Knew
Published Feb 25, 2026 | 53:00 video | 36 min read | Added Jun 14, 2026
At a glance
In late March 2024 the internet came within weeks of catastrophe, and almost nobody outside a small circle of engineers noticed. A patient attacker operating under the name Jia Tan spent more than two years grooming his way into a tiny, beloved data compression tool called XZ, then planted a backdoor so surgically engineered that it would have handed him a master key to OpenSSH, the lock on essentially every Linux server on Earth. The exploit was already shipping in pre-release versions of Fedora, Debian testing, and Ubuntu, and it was racing toward Red Hat Enterprise Linux, the operating system that runs governments, banks, and hospitals.
It was caught by accident. Andres Freund, a German engineer at Microsoft who works on the PostgreSQL database and is not a security researcher, noticed his SSH logins were running about half a second slower than they should. He could not let it go. Pulling that thread unraveled one of the most sophisticated software supply chain attacks ever attempted, very likely the work of a nation state.
This Veritasium film, written and co-presented by Henry van Dyck alongside Derek Muller, rebuilds the whole story from first principles: why Linux ended up running the world, how secure logins work, how compression works, exactly how the backdoor threaded itself through the loader and into the authentication path, and the uncomfortable lesson that the real vulnerability was never the code. It was the people. Below is the entire video, rebuilt in order, with every name, date, and mechanism intact.
The jammed printer that started everything
The story does not open with a hacker. It opens with a Xerox 9700, one of the first commercial laser printers, freshly installed in the MIT AI lab and constantly jamming. Richard Stallman, a researcher there, had a fix in mind. Years earlier he had solved an identical annoyance by writing a small program that pinged you whenever a printer jammed, so you never again walked over to find an hour of nothing. He wanted to do the same here. As he tells it, "You'd wait an hour figuring, I know it's gonna be jammed, I'll wait an hour and go collect my printout, and then you'd see that it'd been jammed the whole time. Frustration up the wazoo."
The catch was that Xerox had not given the lab the printer's source code, and without it Stallman could not write his patch. So he tracked down the original developer and asked for a copy. The answer changed his life: "No, I promised not to give you a copy." Stallman was stunned, then angry, and turned on his heel and walked out, maybe slamming the door. Thinking about it later, he realized he was not looking at one isolated jerk but at a social phenomenon that was quietly reshaping computing.
That phenomenon had a history. In the late 1960s, engineers at AT&T's Bell Labs invented an operating system called Unix and shared it freely across universities and research labs. It was a time of openness. By the 1980s AT&T reversed course, pursuing Unix clone developers for copyright infringement, even suing the University of California at Berkeley. Companies started making employees sign non-disclosure agreements barring them from ever sharing code. As Stallman puts it, "This was my first encounter with a non-disclosure agreement, and I was the victim. And the lesson it taught me was that non-disclosure agreements have victims. They're not innocent, they're not harmless."
He could have adapted. He could have had fun coding and made money. But, he says, at the end he would have had to look back and say "I have spent my life building walls to divide people," and he would have been ashamed. So he chose another path.
Stallman builds the free world: GNU, the GPL, and four freedoms
In 1985 Stallman quit MIT and founded the Free Software Foundation to promote four basic freedoms: the freedom to run software for any purpose, to study it, to change it, and to share it. To guarantee them he wrote a legal license that any developer could attach to their code, the General Public License, or GPL. And to stick it to AT&T, he began rebuilding a Unix-style operating system entirely from scratch so AT&T could not sue. He named it GNU, a recursive acronym for "GNU is Not Unix."
Recreating Unix meant rebuilding three layers: the utilities (the everyday tools and commands), the shell (the terminal you type into), and the kernel (the core that talks to the hardware and manages memory). Over seven years the GNU Project rebuilt almost all of it from nothing, producing the GCC compiler, the Bash shell, and a host of core utilities. One piece was always missing: the kernel.
That gap closed in the fall of 1991. Stallman traveled to the University of Helsinki to give a talk promoting the project, and sitting in the audience was a computer science student already writing his own kernel from scratch. His version was not free, but after hearing Stallman, he adopted the GPL. He first wanted to call it Free Unix, or Freax, until a friend told him it sounded terrible, so it was renamed after the student himself: Linus Torvalds. Linus plus Unix gave Linux. Bolted onto the GNU utilities and shell, that kernel became a complete operating system. (Strictly, "Linux" is only the kernel, which is why some insist on calling the whole thing GNU/Linux.)
Because the code was open and free, a new development model took hold. Anyone could inspect it, improve it, fix flaws, and push the whole ecosystem forward. Software cleaved into two ideologies: proprietary closed source systems controlled by companies, and open source projects where the code was free. As one engineer in the film puts it, free here means two things, free as in no cost and free as in you can change it however you want, and the second matters far more. People will happily pay for technology, but they hate filing a support ticket with a giant company and waiting, when an engineer is itching to just fix it themselves.
Why Linux quietly ate the world
Because developers could grab the freely available base and bolt on whatever their specific device needed, without reinventing the wheel, Linux spread into everything. The film leans into the joke: not just a Mac and a PC, but a Linux too, with "an estimated 30 million Linux users out there," standing in the corner the whole time. Your robot vacuum is Linux. Your camera is Linux. Most TVs and most electronics are Linux.
And it runs the serious machines. As one expert notes, Linux is used "in anything of high-security need," partly because building something like a weapon system in secret means you do not want to involve an outside tech company or one extra person more than necessary. Every single one of the world's top 500 supercomputers runs Linux. It is in the Pentagon and on US nuclear submarines. Banks, manufacturers, hospitals, governments, and defense organizations all run Linux servers. Android, with over 3 billion devices, is built on Linux, and Linux powers the majority of internet servers. Windows and macOS are familiar, but they are dwarfed by Linux. No single company could have imagined every use, but because anyone can tweak Linux to fit, it now covers them all.
All of it rests on one assumption: that the code is secure. The faith behind that assumption has a name, Linus's Law, "given enough eyeballs, all bugs are shallow." With so many people reading the code, bugs, accidental or malicious, should not stay hidden for long.
The crack in the foundation: one volunteer, no pay
There is a fatal flaw in that comforting idea. The open source movement is not one big well-staffed project. It is an ecosystem of thousands of small tools and libraries, each doing one job: networking, security, compression. Many begin because a single person wanted to fix one problem and built the tool themselves, unpaid, on nights and weekends. If it is useful, one project adopts it, then another, until millions of machines depend on one person's passion project. The film invokes the famous xkcd "Dependency" comic: the entire structure of modern digital infrastructure balanced on a tiny block "some random person in Nebraska has been thanklessly maintaining since 2003." What happens when that block is compromised?
In this story the maintainer is not from Nebraska. Lasse Collin is from Finland, and since 2005 he had been building a small compression tool called XZ. It compresses so well that it ended up in almost every major Linux distribution. For two decades nearly all the work of keeping it compatible with ever-changing hardware fell on Lasse alone. He was never paid, and for a long time that was fine.
Then came the pressure. The film reads the actual mailing list messages aloud. Strangers needling him: "Over one month and no closer to being merged. Not a surprise." Then harder: "Progress will not happen until there is a new maintainer. Submitting patches here has no purpose these days. The current maintainer lost interest or doesn't care to maintain anymore." Lasse's reply is heartbreaking: "I haven't lost interest, but my ability to care has been fairly limited, mostly due to long-term mental health issues, but also due to some other things. It's also good to keep in mind that this is an unpaid hobby project." The reply that comes back is colder still: "The community desires more. You ignore the many patches bit rotting away on this mailing list. Right now, you choke your repo."
Lasse is burning out. And exactly then a kind, capable helper appears, signing off as a "helper elf": Jia Tan. For months Jia has been quietly taking load off Lasse, responsive and helpful, and now he offers to step up as maintainer. Lasse, exhausted after nearly two decades, replies that "Jia Tan may have a bigger role in the project in the future." It sounds too good to be true. It was. Those pressuring accounts and the helpful new contributor were two faces of the same operation, a social engineering pincer designed to manufacture an opening and then fill it.
2005: Lasse Collin starts XZ. A Finnish volunteer begins maintaining a small lossless compression tool, unpaid, on his own time.
2009: XZ released. Over the next decade and a half it becomes the default choice whenever a Linux project needs strong compression, eventually a dependency in the chain behind OpenSSH.
2021: "Jia Tan" appears. A new contributor begins submitting harmless, competent patches and building a reputation for being helpful and responsive.
2022: The pressure campaign. Sock puppet accounts with free email addresses and almost no other footprint badger Lasse to hand over the project, while Jia plays the supportive "helper elf."
2023: Jia gains trust, then control. Lasse, worn down by burnout and mental health struggles, signals that Jia may take a bigger role. Jia gradually becomes a co-maintainer with commit access.
Feb 2024: Jia courts the packagers. He emails Rich at Red Hat about exciting new XZ features and wins him over almost immediately, racing to make a March or April deadline tied to RHEL 10.
Mar 2024: The backdoor ships. Compromised XZ lands in pre-release Fedora, Debian testing, and Ubuntu, just as Andres Freund notices a strange half-second slowdown in his SSH connections.
29 Mar 2024: Exposed. Andres emails the Debian security team and posts a detailed writeup to a public security list. Red Hat convenes an emergency meeting and rolls Fedora back. Jia Tan vanishes.
Figure 1. The patience of the attack is the point. This was not a smash-and-grab but a multi-year campaign of trust-building and psychological pressure, the human exploit that made the technical exploit possible. The actual code change was the last and smallest step.
Secure logins, explained: how SSH actually works
To understand the prize, you have to understand the lock. The film rewinds to 1995 at the Helsinki University of Technology, where a hacker captured thousands of usernames and passwords flowing across the campus network in a sniffing attack. The flaw was obvious in hindsight: logins were sent in plain text, so anyone intercepting them could just read them. Tatu Ylonen, a researcher there, made it his mission to fix it. "Password sniffing was perhaps the most serious security issue on the internet back then," he says.
His solution had to do two things. First, establish a secure connection: if two machines could agree on a shared secret to scramble their data, an eavesdropper would only hear gibberish. You could agree on a secret in person, but on the open internet that is impractical. You have to agree on a shared secret with someone you have never met, while an attacker listens to every word. The film demonstrates the trick with jars of paint, an analogy for the classic Diffie-Hellman key exchange. Both sides agree on a public color (say red). Each adds a private color in secret. They swap the mixtures, then each stirs in their own private color again. Because mixing is easy but unmixing is practically impossible, both ends arrive at the identical final shade, an olive green, without ever transmitting their private colors. In the real exchange, paint becomes huge numbers, and reversing the mix becomes the discrete logarithm problem, computationally hopeless to undo.
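The paint analogy maps onto modular exponentiation. The following is a minimal sketch with toy parameters, not how SSH actually negotiates keys: the prime `P`, the base `G`, and the names `make_keypair` and `shared_secret` are assumptions chosen for illustration, and real deployments use far larger groups or elliptic curves.

```python
import secrets

# Public parameters, the "public color". Toy values for illustration only.
P = 2**61 - 1   # a Mersenne prime; far too small for real-world use
G = 3           # public base; not verified to be a generator, which this demo does not need

def make_keypair():
    """Pick a private exponent (the secret color) and derive the public mixture."""
    private = secrets.randbelow(P - 3) + 2   # random value in [2, P - 2]
    public = pow(G, private, P)              # easy to compute, hard to reverse
    return private, public

def shared_secret(their_public, my_private):
    """Stir your own secret into the mixture received from the other side."""
    return pow(their_public, my_private, P)

alice_private, alice_public = make_keypair()
bob_private, bob_public = make_keypair()

# Only the public values cross the wire; both ends derive the same number.
alice_secret = shared_secret(bob_public, alice_private)
bob_secret = shared_secret(alice_public, bob_private)
print("secrets match:", alice_secret == bob_secret)
```

An eavesdropper sees `P`, `G`, and both public values, but recovering either private exponent from them is the discrete logarithm problem mentioned above.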
That solves eavesdropping but not impersonation. A man in the middle, Casper in the film, can sit between the two parties, run a legitimate key exchange with each, and relay and tamper with everything while both ends think the connection is private. So the second problem is authentication: proving the other party is who they claim to be. The film walks through RSA. One party picks two enormous secret prime numbers and multiplies them into a bigger public number. Anyone can scramble a message using that public number, but only the holder of the two prime factors can unscramble it, because factoring that product is practically impossible. If you trust that a given public key really belongs to the right person, anything you encrypt to it can be read only by them, and the man in the middle is foiled.
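The RSA idea can be sketched with deliberately tiny numbers. This is textbook RSA under stated assumptions: the primes 61 and 53 and the exponent 17 are illustrative choices, there is no padding, and nothing here resembles a production key, which would use primes hundreds of digits long.

```python
# Textbook RSA with toy numbers (requires Python 3.8+ for pow(e, -1, phi)).
p, q = 61, 53                 # the two secret primes
n = p * q                     # 3233, the public modulus
phi = (p - 1) * (q - 1)       # 3120, computable only if you know p and q
e = 17                        # public exponent, coprime with phi
d = pow(e, -1, phi)           # private exponent: modular inverse of e mod phi

def encrypt(message: int) -> int:
    """Anyone can do this with the public pair (n, e)."""
    return pow(message, e, n)

def decrypt(ciphertext: int) -> int:
    """Only the holder of d, and therefore of p and q, can do this."""
    return pow(ciphertext, d, n)

message = 65
ciphertext = encrypt(message)
recovered = decrypt(ciphertext)
print("round trip ok:", recovered == message)
```

With a modulus this small, factoring `n` is trivial; the security of real RSA comes entirely from making that factoring step infeasible.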
Tatu combined both steps, securing the channel and authenticating the user, into a single protocol for remote logins. It gave the familiar plain text shell, a terminal where you type commands, but now over an encrypted connection. He called it Secure Shell, or SSH. It was instantly useful: most Linux servers have no keyboard or monitor, so you need to log in remotely. As Linux spread, SSH spread with it. As one expert says, "SSH is literally the maintenance backbone of the entire internet." The dominant open source implementation is OpenSSH, and because it is so important it is one of the most heavily scrutinized projects in existence. Bypassing its authentication, in the film's words, "is like having the master key to the hotel. It lets you into every room."
That is why Jia wanted in. But attacking OpenSSH head on is nearly impossible. The open source model offered a side door: not only are operating systems stitched together from many programs, each program is itself stitched together from other programs, its dependencies. OpenSSH is scrutinized; its dependencies are not equally so. And Lasse Collin's XZ was linked, through a chain of dependencies, right into OpenSSH.
How compression works: Huffman, LZ77, and LZMA
Before the attack, the film honors the tool itself, because XZ is genuinely elegant. Lasse's goal was better lossless compression on Linux: shrink any data (code, an image, text) and get back exactly what you put in.
It builds the idea in layers, using the lyrics of Rick Astley's "Never Gonna Give You Up" as the test text. Naively, every character gets a fixed 8-bit code. But some characters appear far more often than others (N shows up 430 times, J just once), so why give them all the same length? Huffman coding assigns short codes to frequent symbols and long codes to rare ones. You count frequencies, repeatedly merge the two least frequent symbols into a combined node, and reinsert until you have a single Huffman tree. To read a symbol's code you walk the tree: right is 1, left is 0, so R might be 1001. Common symbols sit near the top with short codes; rare ones sit deep with long ones.
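The merge-and-reinsert procedure can be sketched in a few lines. This is a minimal illustration, not XZ's code: the function names and the sample sentence are assumptions, and ties between equal frequencies are broken arbitrarily, so the exact bit patterns can differ from the film's tree.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a symbol -> bit-string table (left edge = 0, right edge = 1)."""
    freq = Counter(text)
    if not freq:
        return {}
    # Heap entries are (frequency, tie_breaker, tree); a tree is either a
    # single symbol or a (left, right) pair of subtrees.
    heap = [(count, i, sym) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    next_id = len(heap)
    while len(heap) > 1:
        count1, _, left = heapq.heappop(heap)    # least frequent
        count2, _, right = heapq.heappop(heap)   # second least frequent
        heapq.heappush(heap, (count1 + count2, next_id, (left, right)))
        next_id += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

def encode(text, codes):
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    """Works because no code is a prefix of another."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)

text = "never gonna give you up never gonna let you down"
codes = huffman_codes(text)
bits = encode(text, codes)
print(len(bits), "bits versus", 8 * len(text), "bits at 8 bits per character")
```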
Huffman has a blind spot: it never notices that whole chunks repeat, like "NEVER " over and over. So the next idea looks at repeating chunks instead of single symbols, keeping a rolling dictionary of what was just seen. When a chunk reappears, you do not rewrite it; you store a short code of two numbers, how far back to look and how many characters to copy. On decompression you read along, and at each pointer you jump back, copy, and paste. Abraham Lempel and Jacob Ziv published this in 1977, hence LZ77. Feed the resulting stream of literals and pointers, which themselves have frequencies, into a second Huffman pass and you get DEFLATE, the algorithm behind the ubiquitous .zip file. In the demo it shrinks the file by about 85 percent.
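The "how far back, how many characters" pointers can be sketched as a toy encoder and decoder. This is a naive illustration under stated assumptions: the token format, the `window` and `min_match` parameters, and the brute-force search are simplifications, and real DEFLATE implementations use hash chains and a bit-level format instead.

```python
def lz77_encode(data, window=4096, min_match=3):
    """Emit literals (single characters) and (distance, length) back-references."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Brute-force search of the rolling dictionary of recently seen text.
        for j in range(max(0, i - window), i):
            length = 0
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append((best_dist, best_len))   # jump back, copy this many
            i += best_len
        else:
            tokens.append(data[i])                 # nothing worth pointing at
            i += 1
    return tokens

def lz77_decode(tokens):
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            distance, length = token
            for _ in range(length):
                out.append(out[-distance])         # copy one character at a time
        else:
            out.append(token)
    return "".join(out)

text = "never gonna give you up, never gonna let you down"
tokens = lz77_encode(text)
print(tokens)
print("round trip ok:", lz77_decode(tokens) == text)
```

The second "never gonna " is stored as a single pointer back to the first one rather than as twelve separate characters.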
You can do better. Real data is not random chunks; after "Never gonna" you might get "give you up," "let you down," or "run around and desert you," each with its own probability. A Markov chain captures those probabilities, letting the encoder spend few bits on likely next chunks and more on unlikely ones. Pair that with a much larger search window, so it can point much further back in memory, and you get the Lempel-Ziv-Markov chain algorithm, LZMA, developed by Igor Pavlov around 1998. It often shrinks files to about 70 percent of the size of a typical .zip. Lasse adapted LZMA to Linux and named it XZ, not as an acronym but simply because it sounded cool. As one expert says, "I think XZ is a wonderful project." Released in 2009, it spread everywhere over the next fifteen years, because you compress a file once and everyone who downloads it forever gets the smaller version. And so, through a chain of other libraries, it ended up among the dependencies of OpenSSH.
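Python's standard library ships both algorithms, so the comparison is easy to try. This is a rough sketch: the sample data is an assumption, and results depend heavily on the input. On very small inputs the .xz container overhead can make LZMA output larger than DEFLATE output, so the film's ratios should not be expected from this snippet.

```python
import lzma
import zlib

# Repetitive sample text, assumed here purely for illustration.
phrases = [
    b"never gonna give you up\n",
    b"never gonna let you down\n",
    b"never gonna run around and desert you\n",
]
data = b"".join(phrases) * 500

deflated = zlib.compress(data, level=9)                       # DEFLATE, as in .zip and gzip
xz_data = lzma.compress(data, format=lzma.FORMAT_XZ, preset=9)  # LZMA2 in an .xz container

print("original:", len(data), "bytes")
print("DEFLATE :", len(deflated), "bytes")
print("XZ      :", len(xz_data), "bytes")

# Lossless means the exact input comes back out.
restored_ok = zlib.decompress(deflated) == data and lzma.decompress(xz_data) == data
print("lossless round trip:", restored_ok)
```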
The three-step plan: how the XZ backdoor worked
With trust earned and a path into OpenSSH identified, Jia made his move under a hard deadline. Red Hat ships two flavors of Linux: Fedora, free and public, and Red Hat Enterprise Linux (RHEL), a paid, stable, secure subscription used on the most important machines, governments and hospitals among them. Jia wanted his code in RHEL, but RHEL only gets a major release roughly every three years, with RHEL 10 due around March or April 2024. He had to move fast. The attack had three steps.
Step one: the Trojan horse. XZ's code lives on GitHub, which tracks every edit using Git, itself written by Linus Torvalds. Jia began with small, innocuous changes, including quietly making himself the primary contact for bug reports and tweaking tools he would need later. He could not paste the malicious payload into the source, because that would be too obvious. Compression software, though, is full of binary test blobs, lumps of binary data used to check that compression and decompression still work. Nobody reads them; they are assumed to be garbage. So Jia hid his payload inside a test file, a Trojan horse that looked harmless. Then, in the project's build script, he slipped in a small, easy-to-miss change, buried among auto-generated code, that quietly unpacked the payload from the test blob and stitched it into the XZ library at build time. The poison never appeared as readable source on GitHub.
Step two: Goldilocks. Now the payload had to hijack the exact right moment in SSH's login flow, the RSA authentication step. The goal: make Jia's code run first every time SSH checks a key, look for a secret master key only he holds, let him in if present, and otherwise call the real code so no one notices. But Jia could not simply rewrite RSA_public_decrypt, the function that verifies the client. That function does not even live in OpenSSH; it comes from a shared crypto library. Modern systems use shared libraries so ten programs needing the same code share one copy instead of bundling ten. When a program starts, the dynamic linker fills in a table of addresses pointing to the functions it needs, the Global Offset Table, or GOT. To call a shared function, the program looks it up in the GOT and jumps there. So to hijack authentication, Jia needed to overwrite the GOT entry for RSA_public_decrypt.
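The lookup-then-jump behavior, and why overwriting one table entry is enough, can be shown with a toy model. A Python dict stands in for the GOT here; the function names, the string comparison, and the keys are hypothetical placeholders, and none of this reflects the real C signature of RSA_public_decrypt or the actual payload.

```python
# Toy model: a dict plays the role of the Global Offset Table (GOT).

def real_check(key: str) -> bool:
    """Stand-in for the genuine library function behind the table entry."""
    return key == "authorized-user-key"

got = {"RSA_public_decrypt": real_check}

def call_through_got(symbol: str, key: str) -> bool:
    """Callers never jump to the function directly; they look it up first."""
    return got[symbol](key)

def interpose(symbol: str, secret: str) -> None:
    """Replace the table entry with a wrapper that falls through to the original."""
    original = got[symbol]
    def wrapper(key: str) -> bool:
        if key == secret:          # hypothetical master key: accept immediately
            return True
        return original(key)       # otherwise behave exactly as before
    got[symbol] = wrapper

before = call_through_got("RSA_public_decrypt", "demo-master-key")
interpose("RSA_public_decrypt", "demo-master-key")
after = call_through_got("RSA_public_decrypt", "demo-master-key")
still_accepts = call_through_got("RSA_public_decrypt", "authorized-user-key")
still_rejects = call_through_got("RSA_public_decrypt", "wrong-key")
print(before, after, still_accepts, still_rejects)
```

Because every other caller still gets the original behavior, nothing looks different from the outside, which is the property the paragraph above describes.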
To get a foothold he abused an IFUNC resolver. IFUNCs exist so a library can keep several versions of a function, say one tuned for Intel and one for AMD, and pick the right one at startup based on the hardware. Crucially, the resolver runs your own code very early in program startup. But there was a catch: a library can only define IFUNC resolvers for its own functions, and RSA_public_decrypt does not belong to XZ. Worse, the startup timing fought him from both sides. Too early, and the loader would later overwrite his change with the real address. Too late, and the system marks the GOT read-only to prevent exactly this kind of tampering, freezing the entry. He had to strike in a tiny window after the real RSA_public_decrypt address is written in but before the table is locked. That window is the Goldilocks zone.
To hit it precisely, he weaponized a debugging feature. Because shared library linking is bug-prone, Linux offers a dynamic linker audit hook that lets you run code every time the linker writes a symbol's address into the GOT, normally for profiling. There are no real guardrails; the hook can run arbitrary code. Jia used his IFUNC resolver to install the audit hook early, so that when the linker wrote in the genuine RSA_public_decrypt address, the hook fired and swapped in his payload at exactly the right instant. One last wrinkle: audit hooks are normally configured by the system, not by a library like XZ, so the variable he needed was hidden from him. From inside the IFUNC he scanned a small region of binary code looking for the hook, and because it was raw bytes he wrote a tiny disassembler to turn them into readable instructions, located the hook in memory, and planted his code. From then on, when RSA_public_decrypt was called legitimately, it triggered the payload, and he was in.
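The timing problem can be modeled in a few lines. This is a toy simulation only, assuming a simplified loader: `ToyGOT`, `toy_startup`, and the address strings are invented for illustration, and the real mechanism lives in the C dynamic linker rather than anything resembling this Python.

```python
class ToyGOT:
    """A table that becomes read-only once startup finishes."""
    def __init__(self):
        self.entries = {}
        self.read_only = False

    def write(self, symbol, target):
        if self.read_only:
            raise PermissionError("GOT is read-only")
        self.entries[symbol] = target

def toy_startup(got, real_symbols, audit_hook=None):
    """Resolve every symbol, optionally consulting a hook, then lock the table."""
    for symbol, address in real_symbols.items():
        if audit_hook is not None:
            address = audit_hook(symbol, address)   # hook sees each binding as it happens
        got.write(symbol, address)
    got.read_only = True

REAL = {"RSA_public_decrypt": "real_address"}

# Too early: the loader later overwrites the tampered entry with the real one.
early = ToyGOT()
early.write("RSA_public_decrypt", "payload_address")
toy_startup(early, REAL)

# Too late: the table is already locked.
late = ToyGOT()
toy_startup(late, REAL)
try:
    late.write("RSA_public_decrypt", "payload_address")
    late_blocked = False
except PermissionError:
    late_blocked = True

# Just right: a hook that fires at the moment of binding.
def hook(symbol, address):
    return "payload_address" if symbol == "RSA_public_decrypt" else address

hooked = ToyGOT()
toy_startup(hooked, REAL, audit_hook=hook)

print(early.entries, late_blocked, hooked.entries)
```

Only the third scenario leaves the swapped entry in place after the table is locked, which is why the audit hook was the piece worth hunting for in memory.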
Step three: the cat burglar. Being in is not enough; you must get out cleanly and stay invisible. The hidden master key is not a simple password but a miniature cryptographic exchange in its own right: the backdoor first checks for a shared secret, then authenticates the user, and only if both pass does the payload run. As the film puts it, it is "a miniature version of the encryption from SSH inside of SSH," except SSH uses crypto to keep attackers out while this backdoor uses crypto to ensure only the attacker can get in. To avoid detection, the payload wipes evidence of itself from SSH logging, and Jia laid in numerous safety checks so it only fires on the right kind of connection and never crashes in a way that draws attention. Meticulous and cautious by design, it would slip through only where it could run invisibly. With all three steps in place, he could own the machine undetected.
Figure 2. The full mechanism in one picture. A payload hidden in a never-read test blob is unpacked by the build script into the compiled liblzma library, which sshd loads as a dependency. An IFUNC resolver installs a dynamic linker audit hook that, at exactly the right instant, rewrites the GOT entry for RSA_public_decrypt to point at the attacker's code. From then on, every login attempt runs the backdoor first, granting root to anyone holding the secret master key.
Almost caught by a memory leak: the Valgrind scare
The plan nearly tripped over itself. With the deadline closing, an open source developer requested removing the very dependency that links XZ to OpenSSH, which would have wrecked everything. Jia grew frantic, pushing his compromised XZ everywhere he could: into an experimental Debian build, filing to add it to Ubuntu, and pressing Rich at Red Hat, over weeks of escalating messages, to pull the updated XZ into Fedora. Rich, who loves keen upstream contributors ("That kind of makes my day, it's my happy place"), eventually added the updated XZ to a Fedora pre release. Jia had succeeded. Except there was a bug.
Low-level code like the backdoor does not get automatic memory management; if a function grabs memory it must hand it back, and if it does not, the program keeps swelling, a memory leak. To catch this, developers run Valgrind, which executes the program slowly while watching every memory operation. Valgrind started "raising hell" on Jia's code. Red Hat put XZ version 5.6.0 into Fedora 40 and immediately got a bug report: the backdoor was generating invalid write errors, because its logic was written by hand, bypassing the compiler's safety checks, and accidentally wrote outside its stack memory. The real fault was inside the malicious test file, but Jia could not simply fix that without exposing the backdoor. So he invented a cover story: he claimed the random data used to generate the original test files was not reproducible and he was replacing it, sneaking the memory fix into the new blob. Rich's colleague calls it "a very convincing and plausible explanation for why this test blob has to be updated. But of course, it's not the real reason."
Then, to hide the fact that the bug had vanished suspiciously, Jia padded the surrounding IFUNC code with comments and cosmetic changes that did nothing functional but looked like a plausible fix. Rich's reaction captures the whole tragedy: "I know that this is the evil hacker Jia Tan, but I'm like, ooh, that's clever." And from a packager's seat, none of it was suspicious. New software has bugs; software is full of bugs; it is not a packager's job to fix every upstream bug, especially when the upstream author who wrote it can fix it faster. Rich sent the bug to Jia and got a fix back a day later. "From my point of view, it's problem solved. It worked, system worked, right? I made the right call."
Henry hacks Derek: the backdoor on live TV
To prove it was real, the team downloaded Jia Tan's publicly available compromised XZ from Fedora and swapped in their own secret key instead of Jia's, taking control through the genuine backdoor. Their target was the veritasium.com website, though to avoid wrecking real traffic they cloned the site to a near-identical URL. Derek, watching, is visibly nervous: "Man, when you guys do these things, I just, I start to get more and more scared now. I want it to work for the video, but I also don't want it to work 'cause I don't wanna screw stuff up."
Henry runs a script that opens a port (nc) on the server, then connects from a second terminal, copies files over, and ends with root access: "it thinks that we own the thing." The hacked page becomes "Henrytasium," topped with "Videos Derek would never approve of," a gallery of rejected pitches: surviving seven days underwater at -1,000 feet, why it is almost impossible to shoot 4,000 meters, a CIA torture exposé, and climbing Everest on xenon gas. Derek, half amused and half alarmed, declares the whole video is a ploy to green-light Henry's projects, then offers to actually make whichever idea the comments most upvote. "Look, I'm not pleased, I would like you to change it back. It doesn't seem like this should be possible on a Linux server."
The deliberately loud demo makes the point that the real attack would be silent. As they note, you would not deface a site so everyone notices; you would change it subtly to skim data, harvest credit card details, or reroute payments. With root you can copy, change, or delete anything: documents, crypto tokens, secret communications (and since our communication networks also run Linux, those streams are yours too), or you could encrypt everything and demand ransom. The possibilities, Henry says, "really are endless." After two and a half years, Jia had free rein on any machine installing the new Fedora pre release, plus Debian testing and Ubuntu pre release environments, with RHEL 10 looming. All that was left was to wait for release.
The half second that saved the internet
Then Andres Freund ruined everything for Jia. A German programmer at Microsoft working on the open source database PostgreSQL, Andres is not a security researcher and not a hacker. In March 2024 he was testing the unstable Debian release to make sure Postgres ran smoothly, and while checking server connection times he noticed something odd: a slowdown. Not much, at worst about half a second, but enough to nag at him. The film reproduced it on their own version of the hack and found the same thing, consistent slowdowns of roughly 400 to 500 milliseconds. Having already seen the earlier XZ and Valgrind weirdness, Andres dug deeper. He examined recent additions to OpenSSH, traced the delay to an XZ update, and found binary test files that were never actually used in any test. He could not stop thinking about it: "I remember sitting in a bunch of meetings and not really being able to concentrate because it feels like, I need to continue looking into this."
Eventually he saw it: not a bug, a backdoor. And it was meticulous, hunting through memory for the audit hook, decoding raw bytes, wrapping everything in custom encryption and safety checks so it triggered only on the right connection, even garbling its own strings to dodge detection. Ironically, that very paranoia is what exposed it. "If they had done less obfuscation, I probably would not have noticed that anything was wrong." The extra machinery cost time, and that time was the half second Andres felt.
Figure 3. The tell. The backdoor's elaborate obfuscation, scanning memory, decoding bytes, running its own crypto, added hundreds of milliseconds to every SSH login, a slowdown of roughly 400 to 500 ms over the normal baseline. That tiny, persistent lag is what a meticulous database engineer noticed and could not let go. Had the attacker been lazier, nobody would have felt a thing.
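The kind of check that surfaces such a lag can be sketched as a simple timing comparison. This is an assumption-laden illustration, not Andres Freund's actual method: `median_latency`, `is_regression`, the 0.2 second threshold, and the `time.sleep` stand-ins for a login are all hypothetical.

```python
import statistics
import time

def median_latency(action, runs=5):
    """Time a callable several times and return the median duration in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def is_regression(baseline_s, observed_s, threshold_s=0.2):
    """Flag a slowdown larger than the chosen threshold (an arbitrary assumption)."""
    return observed_s - baseline_s > threshold_s

# Stand-ins for "log in with the old library" and "log in with the new one".
def fake_login_old():
    time.sleep(0.01)

def fake_login_new():
    time.sleep(0.05)

baseline = median_latency(fake_login_old, runs=3)
observed = median_latency(fake_login_new, runs=3)
print(f"baseline {baseline:.3f}s, observed {observed:.3f}s")
print("half-second style regression:", is_regression(0.1, 0.6))
```

Using the median rather than a single run keeps one noisy measurement from hiding or faking a consistent slowdown, which matters when the signal is only a few hundred milliseconds.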
Exposure, panic, and a quiet hero
Because XZ's listed security contact was Jia Tan himself, Andres could not report through normal channels. Instead he emailed the Debian security team directly and posted a detailed writeup to a public security mailing list (oss-security). Then all hell broke loose. Rich was pulled into an emergency Red Hat meeting on a Friday evening, obviously not routine because the head of security was there, and learned the community had found a backdoor in XZ. "Immediately I'm like, WTF? How did this happen?" Red Hat rolled Fedora back and told users to revert, while the whole open source world began dissecting the project.
Andres became, deservedly, a hero, even getting a shout-out from the CEO of Microsoft. As one expert says, "what are the chances that someone who isn't looking for a security bug spends days investigating this? Big kudos to the researcher, saved us all from possibly a doomsday on the internet." Rich is gracious about his own role: "Andres did a brilliant job because he did what I should have done, actually," chasing the bug "like a crazy hound sniffing around." Yet when the story broke, mainstream coverage was strangely muted, a fact the experts in the film still find baffling given that millions of systems were at stake and the damage could have ranged from spying to ransom to taking down entire countries.
Who is Jia Tan?
The most chilling section is the one with no answer. Rich believes the person he corresponded with was one individual, but that a group stood behind him, working for at least the two and a half years we know about. The accounts that pressured Lasse share telltale traits: free email addresses, almost no footprint outside the XZ threads, the signature of sock puppet identities manufactured to apply pressure as one stage of a long social engineering campaign.
Who would spend perhaps a million dollars and two and a half years to forge a master key to every hotel room on the internet? Not a criminal gang, the experts argue, because no criminal group has the patience to wait that long with no near term return. The patience points to a nation state. The clues are deliberately contradictory. The alias Jia Tan reads as an Asian name, and commits are timestamped in UTC+8, Beijing time, pointing at China, which is exactly why analysts suspect it is not China, because every other part of the operation was so meticulous that such an obvious tell feels staged. The attacker also worked on Chinese New Year but not on Christmas, and nine changes fall in UTC+2, a zone covering Israel and parts of Western Russia, which is why some experts floated APT29, a Russian state backed group known as Cozy Bear. But, as the film admits, "of course we don't know who it is, and we likely will never know." The moment the exploit went public, Jia Tan vanished and was never heard from again. As one expert puts it, in a sense it does not matter whether the source was Russian, Chinese, or Iranian; the point is to defend against such backdoors regardless of origin.
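The time zone reasoning above amounts to tallying the UTC offset recorded on each commit. The sketch below shows that tally in Python. The timestamps in `commit_times` are invented for illustration and are not taken from the XZ repository; `offset_histogram` is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime

# Hypothetical ISO 8601 commit timestamps, made up for this example.
commit_times = [
    "2023-01-11T21:14:03+08:00",
    "2023-02-02T19:40:51+08:00",
    "2023-06-27T17:27:09+03:00",
    "2023-10-06T22:53:10+02:00",
    "2024-02-23T23:09:59+08:00",
]

def offset_histogram(stamps):
    """Count how many timestamps carry each UTC offset."""
    counts = Counter()
    for stamp in stamps:
        offset = datetime.fromisoformat(stamp).utcoffset()
        hours = offset.total_seconds() / 3600
        counts[f"UTC{hours:+g}"] += 1
    return counts

histogram = offset_histogram(commit_times)
for zone, count in histogram.most_common():
    print(zone, count)
```

A tally like this only reports what the committing machine claimed. The offset is set by the author's own clock configuration, which is why an obvious pattern can be staged as easily as it can be leaked.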
The real lesson: it was never the code
The film ends on the uncomfortable part. This near miss, several experts say, is "the canary in the coal mine." As attackers grow more sophisticated and make fewer mistakes, the gloves are off, and the Linux community is not fully ready. In the aftermath, the open source world combed countless small projects for similar campaigns and found almost nothing, which is precisely what frightens Andres: "I'm worried that we didn't find other backdoors. The incentives are just too clear," with state sponsored actors preparing for the next cyber escalation. If so many are incentivized to plant backdoors, where are they all?
Some called the episode proof of a fundamental flaw in open source. The counterargument in the film is sharp. Closed source would be no better: who is to say state spies are not already employed as engineers at large companies, inserting exactly these backdoors? There, no volunteer is running free tests that might catch one by chance. It took a multi year social engineering campaign, layers of misdirection, and code built to survive constant scrutiny to pull this off in public. Compare that to a closed source hack, where sometimes all it takes is a court order, or a company quietly burying a breach. As one former open source researcher (formerly at the Japanese telecom giant NTT) puts it, it is only because XZ was open that it was picked apart, analyzed, and turned into a conversation about security at all. And that conversation lands on the real vulnerability: "It's not the code, it's the people. The system has not supported them enough."
| Question | Open source (XZ / Linux) | Closed source |
| --- | --- | --- |
| How a backdoor gets in | Years of social engineering, sock puppet pressure, code built to survive scrutiny (hard) | A court order, an insider hire, or a quietly buried breach (often easy) |
| Who can inspect the code | Anyone, anywhere, anytime (fully public) | Only the company and whoever it permits (opaque) |
| Chance an outsider catches it | A passing engineer noticed a 500 ms lag and dug in (it happened) | No community tester exists to stumble onto it (unlikely) |
| The real weak point | One unpaid, burned out volunteer carrying a critical tool (people, not code) | Corporate incentives, secrecy, and legal pressure (people, not code) |
| Aftermath | Public post mortem, global scrutiny, a security conversation (transparent) | Often no disclosure at all (hidden) |
The final word belongs to the human cost. As Rich says of Lasse Collin: "I feel for Lasse that he's given this beautiful gift to the whole world and what has humanity done back to him? We've poisoned his gift." And then, implicitly, blamed him for not maintaining it free, forever, for everyone. "But why are we demanding that Lasse do anything when he's not being paid for this stuff? That's, in my opinion, quite unfair." Even after all of it, on a Saturday evening, Lasse worked with the Red Hat team on a workaround for a bug in RHEL 9 he had added to XZ. He could have told them to get lost. He didn't. "What a brilliant guy."
Key takeaways
The world runs on Linux, and Linux runs on a sprawl of small dependencies, many maintained by single, unpaid volunteers. That structure is its great strength and its single greatest weakness.
The attack on XZ was overwhelmingly a human exploit. More than two years of trust building and a coordinated sock puppet pressure campaign created the opening; the malicious code was the smallest, last piece.
The payload never appeared in readable source. It hid in a binary test blob and was unpacked at build time, then used an IFUNC resolver plus a dynamic linker audit hook to hijack the GOT entry for RSA_public_decrypt in the narrow Goldilocks window before the table was locked.
The backdoor was a master key to OpenSSH, the lock on nearly every server on the internet. Compromise ranged from silent data theft and ransom to, in the experts' words, taking down entire countries.
It was caught by luck and obsession, not by process. Andres Freund, not even a security researcher, noticed SSH logins were about half a second slow, and the attacker's own obfuscation is what created that tell.
Attribution points everywhere and nowhere: deliberately contradictory time zones and work patterns, very likely a nation state, identity unknown, the actor gone the moment it went public.
The deepest lesson is not technical. Critical open source infrastructure depends on people the system does not pay, protect, or support, and that is the vulnerability worth fixing.
Chapters
0:00 The Free Software Foundation
5:03 Why is Linux so popular?
9:57 The XZ Weakness
12:07 End To End Encryption - SSH
18:40 How To Compress Data
23:47 How The .XZ Hack Worked
34:24 A Bug In Jia's Code
38:27 Henry Hacks Derek
43:16 The Back Door Is Exposed
47:16 Who is Jia Tan?
50:33 Open Vs Closed Source
Notable quotes
We were weeks away from millions of internet servers being accessible to whoever crafted the backdoor. Anything from spying, to ransom, to taking down entire countries, you could have done it with this backdoor.
Derek Muller, 0:42
This was my first encounter with a non-disclosure agreement, and I was the victim. And the lesson it taught me was that non-disclosure agreements have victims. They're not innocent, they're not harmless.
Richard Stallman, 2:13
With enough eyeballs, all bugs are shallow.
Derek Muller, on Linus's Law, 8:35
I haven't lost interest, but my ability to care has been fairly limited, mostly due to long-term mental health issues. It's also good to keep in mind that this is an unpaid hobby project.
Lasse Collin, read aloud, 10:18
Having a way to bypass the authentication in secure shell is like having the master key to the hotel. It lets you into every room.
Derek Muller, 10:46
If they had done less obfuscation, I probably would not have noticed that anything was wrong.
Andres Freund, 46:38
I think it has to be a nation state actor, here.
Derek Muller, 48:48
It's not the code, it's the people. The system has not supported them enough.
Derek Muller, 53:30
I feel for Lasse that he's given this beautiful gift to the whole world, and what has humanity done back to him? We've poisoned his gift.
Rich, 54:08
Resources mentioned
Veritasium, Derek Muller's channel, written and co presented here by Henry van Dyck.
The most important infrastructure on Earth, the software running banks, hospitals, governments, submarines, and most of the internet, rests in places on a single exhausted volunteer working for free. An attacker did not break the cryptography or outsmart the code review. They befriended a burned out maintainer, wore him down with fake pressure, and were handed the keys. The internet was saved not by a system but by one stubborn engineer who could not ignore a half second of lag. The fix is not more eyeballs on the code. It is supporting the people who hold the whole thing together.
Full transcript
[Derek] In 2021, a hacker uncovered a fatal weakness in the world's most important operating system.
- What would you do with a key that gets you into any server on the internet?
- Is this live to the public right now?
- Yeah, it's live on the server.
- Look, I'm not pleased. I would like you to change it back.
[Narrator] At the time, just about everyone believed that hacking this system was impossible, but they were wrong.
- Well, I can tell you how many systems would have been compromised, which would have been millions. Actually, I'm still surprised the mainstream news outlets haven't really covered this very much.
- How close did we come?
- We were weeks away from millions of internet servers being accessible to whoever crafted the backdoor. Anything from spying, to ransom, to taking down entire countries, you could have done it with this backdoor.
This hacker had realized the entire operating system rested on a single part, maintained by a single person, and that by compromising that one part, they could infect almost any server on the internet. So, how could we ever let ourselves get this vulnerable? Well, the story begins with a jammed printer.
[Narrator] The AI lab was buzzing. They had just installed the Xerox 9700. It was one of the first ever commercial laser printers. It was a pretty big deal. The only problem was it kept jamming.
[Stallman] You'd wait an hour figuring, I know it's gonna be jammed, I'll wait an hour and go collect my printout, and then you'd see that it'd been jammed the whole time. Frustration up the wazoo.
Richard Stallman, a researcher at the lab, thought that he had a solution. Years earlier, he had solved a similar problem by coding a simple program that sent an alert whenever there was a jam. Now, it didn't fix the problem mechanically, but it did make sure that a jam wouldn't go unnoticed. He thought he could do a similar thing now. The only problem was that Xerox hadn't provided them the source code for the printer, and without it, Stallman couldn't write his code. So he tracked down the original developer.
[Stallman] And I said, "Hi, I'm from MIT. Could I have a copy of the printer source code?" And he said, "No, I promised not to give you a copy." I was stunned. I was angry. All I could think of was to turn around on my heel and walk out of his room. Maybe I slammed the door. And I thought about it later on because I realized that I was seeing not just an isolated jerk but a social phenomenon that was important and affected a lot of people.
[Henry] This social phenomenon had slowly invaded the world of computer research. In the late 60s, engineers at AT&T's Bell Labs invented an operating system called Unix, which they shared widely across universities and research labs. This was a time of freedom. But by the 80s, AT&T started going after Unix clone developers for copyright infringement. Later, they even sued the University of California at Berkeley. The tech landscape had shifted. They wanted to close off software development. Companies were now making their employees sign non-disclosure agreements, prohibiting them from ever sharing their code with other programmers.
[Stallman] See, this was my first encounter with a non-disclosure agreement, and I was the victim. And the lesson it taught me was that non-disclosure agreements have victims. They're not innocent, they're not harmless.
[Henry] Stallman wondered, maybe he could adapt to this new world.
[Stallman] But I realized that that way I could have fun coding and I could make money. But at the end, I'd have to look back at my career and say, "I have spent my life building walls to divide people." and I would've been ashamed of my life.
So Stallman chose a different path. He quit his job at MIT and in 1985 established the Free Software Foundation, and it worked to promote four basic freedoms. You should be free to run software for any purpose, free to study it, free to change it, and free to share it. Now, to ensure those freedoms, he created a legal license that developers could attach to their code called the General Public License. And to stick it to AT&T, he started to work on a project based on Unix but built from the ground up, so AT&T couldn't sue. He called the project GNU, a recursive acronym for GNU is Not Unix.
Now, to replicate a Unix system, the GNU Project had to recreate three layers of functionality. They needed the utilities, which were the everyday tools and commands, the shell, which is the terminal that people use to interact with the machine, and finally, the kernel, which is the core that talks to the hardware and manages memory. Now, over the next seven years, the GNU Project made much of that from scratch. They created the GCC code compiler, the Bash shell, and a host of other core utilities. But they were always missing one key component. The kernel.
That changed in the fall of 1991 when Stallman visited the University of Helsinki to give a talk promoting the project. In the audience was a young computer science student who just happened to be building his own kernel from scratch. His version wasn't free, but after hearing Stallman speak, the student changed his mind and adopted the General Public License. At first, he wanted to call it Free Unix, or Freax, but his friend thought that sounded terrible, so he renamed it after the student himself, Linus Torvalds. Linus, Unix. Well, that's how he got Linux.
That kernel, combined with the other components from the GNU Project, became a full operating system. Now, technically, Linux only refers to that kernel, but a lot of people use it to refer to the whole operating system, so GNU and Linux and whatever else. Because the code was open and free and the projects built on it were too, a new model of software development took hold. Anyone could inspect the code, improve it, fix flaws, and generally just push development forward for everyone. So, software split into two competing ideologies. Proprietary closed source systems controlled by companies, and open source projects where the code was free.
- It's free in two ways. It's free as in you don't have to pay for it, but it's also free to change it in any way you want, and that seems to be the much more important aspect. People are happy to pay for technology, but so often do they run into some roadblocks where you have to file a support ticket with some large company, they may or may not get the help they need, and engineers are just itching to just fix it themselves.
Developers could take that basic code which was freely available and then add on their own features relevant to their specific device. They didn't have to reinvent the wheel every time. So that's why Linux spread into all sorts of different applications.
- Hello, I'm a Mac.
- And I'm a PC. No one else.
- No one.
- Hi, I'm Linux. There are an estimated 30 million Linux users out there.
- How long you been standing there?
- A long time.
And it's not even just limited to computers. Your electronic vacuum is definitely Linux. Your camera is definitely Linux. Most TVs, most electronics are Linux. Linux even runs some of the most sensitive machines on the planet.
- You can assume that Linux is pretty much used in anything of high-security need, not necessarily because Microsoft, for instance, couldn't build something equally secure, but because usually there's secrecy involved in building, let's say, a new weapon system, and you don't necessarily want to have to work with some tech company. You don't want to involve more people than absolutely necessary.
[Henry] Of the top 500 supercomputers in the world, every single one runs Linux. It's used in the Pentagon and on US nuclear submarines.
- Every bank you can think of really, manufacturers, hospitals, governments, defense organizations and things like that, they're all running Linux servers.
Today, Linux is everywhere, and most people are familiar with Windows and macOS, but they are not the most popular operating systems in the world. No, they are dwarfed by systems running a Linux kernel. Android, with over 3 billion devices, is built on Linux. And it also powers the majority of internet servers.
- There is no one company that could have imagined all the different cases where computers are used these days, and Linux, thanks to its adaptability where everyone can just tweak it in little ways to make it fit their use case, now covers all the use cases.
But all of this, it all relies on one key assumption. That the code is secure. Now, there's a good reason to feel this way. Because there are so many people looking at the code, there's this idea that bugs, either intentional or unintentional, won't be too deep to catch. It's known simply as Linus's Law. That with enough eyeballs, all bugs are shallow.
But there's a big problem with this assumption. The open source movement isn't one big project. It's an ecosystem. You need thousands of small tools and libraries each doing a different job, like networking, security, or compression. Now, a lot of these projects start because one person wants to fix a specific problem, so they build it themselves. They're often unpaid, coding on nights and weekends just to make the tool work. If it's useful, one open source project adopts it, then another, and suddenly you have millions of machines all relying on one person's passion project. That's how the entire ecosystem can end up quietly resting on a project maintained by a single volunteer. There's a famous XKCD comic that captures this idea perfectly. But what happens when that block is compromised?
In our story, our person isn't from Nebraska. No, Lasse Collin is from Finland, and he's been working on a small data compression tool called XZ since 2005. XZ is so good at compression that it's now used in almost every major Linux distribution. For the past 20 years, almost all of the work of keeping the tool compatible with ever-evolving hardware, it's all fallen on Lasse. He's never been paid for it, but up till now, he's been okay with that. Recently, though, he's been under more and more pressure.
"Over one month and no closer to being merged. Not a surprise."
"Progress will not happen until there is a new maintainer. Submitting patches here has no purpose these days. The current maintainer lost interest or doesn't care to maintain anymore."
Lasse responds, "I haven't lost interest, but my ability to care has been fairly limited, mostly due to long-term mental health issues, but also due to some other things. It's also good to keep in mind that this is an unpaid hobby project."
But it's not enough.
"I'm sorry about your mental health issues, but it's important to be aware of your own limits. The community desires more. You ignore the many patches bit rotting away on this mailing list. Right now, you choke your repo."
Lasse is burning out. But just when he thinks he can't handle it anymore...
"Nice job to both of you for getting this feature as far as it is already. Just trying to do my part as a helper elf." Signed, Jia Tan.
For months, Jia has been taking some of the load off Lasse. He's been incredibly helpful. Now he offers to step up and take over as maintainer of the project. To Lasse, it sounds almost too good to be true.
"As I've hinted in earlier emails, Jia Tan may have a bigger role in the project in the future."
Finally, Lasse can step back and breathe after 20 years of hard work. But Jia is not who he appears to be. And he's identified Lasse Collin's XZ project as a weak link in the Linux ecosystem, one that could give him access to almost every computer on the internet.
Today we take secure remote logins for granted. I mean, they've worked reliably for over 30 years. But it all started in 1995 at the Helsinki University of Technology when a hacker captured thousands of usernames and passwords sent over the campus network in a sniffing attack. In hindsight, the problem's obvious. These login requests were being sent totally in plain text, so anyone who intercepted the data could just read it. When Tatu Ylonen, a computer researcher at the university, learned of the attack, he made it his mission to ensure that it would never happen again.
[Tatu] Password sniffing was perhaps the most serious security issue on the internet back then.
To do this, his solution needed to ensure two things. First, machines had to establish a secure connection. If both computers could agree on a shared secret code that they would use to scramble their data, then even if they were overheard, anyone without that secret code would just get gibberish. Now, you could agree on that shared secret ahead of time in person.
- Password.
But on the internet, that's rarely practical. No, you have to agree on that shared secret ahead of time without ever having met and also with someone listening in the entire time. It sounds really tricky, but there is a way to do it, and I can show you how using this jar of paint. Say I'm trying to send a message to Gregor over there. First step is we agree on a shared public color. Let's pick this red. This is no secret, anyone can see this. Now we each pick our own private color. I'm gonna pick yellow, and he can pick whatever he wants. So we take our private color, and then I'm gonna mix that with the public color. It's worth saying now that these mixtures are assumed to be impossible to unmix, so even if you know this orange and you know this red, you can't exactly deduce the exact shade of yellow we used to create it, and this is important for the actual computer example later. Okay, so I'm gonna send this over to Gregor.
- So, I mixed in my secret color with the public, and I'm gonna pass this to Henry.
- So, Gregor sent me this, which looks like a sort of dark green sort of color. And what we're gonna do now is we're gonna mix it with my original private color.
- Okay, now that I have Henry's secret color mixed in with the public, I'm gonna add some of my own.
- So we end up with this sort of distinct olive color. There's my yellow in there, I can see, and whatever Gregor had in his side. And the thing is because each set of paints went through the same process, they both end up with this same olive green, even though we never shared our secret colors. So we end up with this shared secret color at the end that no one else can get, and that means that we can use it as our secret code when sending information.
Now, in the real exchange, we use big public numbers instead of colors, but the idea is the exact same. Each side mixes in their own private number using some math that, when you try to reverse it, leads to a discrete log problem, which makes it practically impossible to unmix them. That way, we solve the first problem.
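The paint mixing maps onto modular exponentiation, the construction usually called Diffie-Hellman key exchange. The following is a minimal sketch in Python, not what any SSH implementation literally runs. The modulus `P`, the base `G`, and the function names are illustrative assumptions; real deployments use standardized groups with far larger primes, or elliptic curves.

```python
import secrets

# Public parameters, the "shared public color". Anyone may see these.
# P is the Mersenne prime 2**127 - 1, chosen here only because it is easy
# to write down; it is NOT a recommended group for real use.
P = 2**127 - 1
G = 5

def private_number() -> int:
    """Pick a secret exponent, the 'private color'. Range is [2, P - 2]."""
    return secrets.randbelow(P - 3) + 2

def mix(private: int) -> int:
    """Mix the private number into the public one: G**private mod P."""
    return pow(G, private, P)

def shared_secret(their_mixture: int, my_private: int) -> int:
    """Mix my private number into the other side's mixture."""
    return pow(their_mixture, my_private, P)

henry_private = private_number()
gregor_private = private_number()

# These two values cross the network in the clear.
henry_mixture = mix(henry_private)
gregor_mixture = mix(gregor_private)

henry_secret = shared_secret(gregor_mixture, henry_private)
gregor_secret = shared_secret(henry_mixture, gregor_private)
print(henry_secret == gregor_secret)
```

Both sides arrive at the same number because (G^a)^b and (G^b)^a are equal modulo `P`. An eavesdropper sees only the two mixtures, and recovering a private exponent from a mixture is the discrete log problem.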
But there is another threat that's unaccounted for. Say a hacker, like Casper here, tries to sit in between us. Now we can create a legitimate connection, so we end up with a shared secret code, and Casper could do the exact same thing with Gregor. Now, whenever I send a message, he can relay that to Gregor, he can change and modify it and send his response back. And to each of us, the connection looks legitimate, but Casper's sitting between us the whole time. He's a man in the middle. So, I need a way of authenticating that Gregor is really who he says he is. Now, we could do this again by agreeing on a password ahead of time in person, but we need a practical way to do it over the internet. This was the second problem that Tatu had to solve.
To make that happen, Gregor can take two really big prime numbers, which he keeps secret. He then multiplies them together to get an even bigger number, which he then makes public. Now, when I want to send Gregor a message, I just take that big public number and I scramble it in a way that only Gregor, who knows the two prime factors that make up that big public number, can successfully unscramble. For anyone else, getting those two prime factors is practically impossible. So, as long as I know that that big public number actually belongs to Gregor, I know that anything encrypted to that key can only be read by him. This is called RSA encryption, and it means that if I know the certificate is valid, then I accept the connection. And by authenticating Gregor, it foils our man in the middle, Casper Devious. All right.
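The arithmetic behind that description can be shown with deliberately tiny numbers. This is a sketch of textbook RSA in Python under illustrative assumptions: the primes `p` and `q`, the exponent `e`, and the message value are all toy choices. Real keys use primes hundreds of digits long and add padding, which this example omits.

```python
# Gregor's two secret primes (tiny on purpose).
p, q = 61, 53

n = p * q                  # the big public number
phi = (p - 1) * (q - 1)    # easy to compute only if you know p and q
e = 17                     # public exponent, chosen coprime with phi
d = pow(e, -1, phi)        # private exponent: inverse of e mod phi (Python 3.8+)

def encrypt(message: int) -> int:
    """Anyone can do this with the public pair (n, e)."""
    return pow(message, e, n)

def decrypt(ciphertext: int) -> int:
    """Only the holder of d, derived from p and q, can undo it."""
    return pow(ciphertext, d, n)

message = 65               # must be smaller than n
ciphertext = encrypt(message)
recovered = decrypt(ciphertext)
print(n, ciphertext, recovered)
```

Everything an attacker needs in order to compute `d` follows from `p` and `q`, so the security rests on how hard it is to factor `n`. With a four digit `n` that takes no effort at all, which is why this is only a demonstration of the mechanics.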
Tatu Ylonen combined these two steps, securing the channel and authenticating the user, into a protocol for remote logins between machines. It gave you the same simple text shell people were used to, a plain terminal where you type commands, but now the connection was encrypted. He called it Secure Shell, or SSH. And it was immediately useful. Many Linux machines don't even have keyboards or monitors, especially not servers, so you wanna be able to log in and control them remotely. So SSH was soon adopted on almost every machine that ran Linux. And as Linux spread, so too did SSH. Today, when you control a machine remotely, there's a good chance you're using SSH.
- SSH is literally the maintenance backbone of the entire internet.
And the most widely used open source SSH implementation is called OpenSSH. And because it's so popular, it's heavily protected.
- I mean, OpenSSH is probably one of the most closely examined projects out there because it's just so vitally important to the security of servers everywhere. Having a way to bypass the authentication in secure shell is like having the master key to the hotel. It lets you into every room.
[Henry] This is why Jia Tan wants a way into OpenSSH, but trying to hack it directly would be almost impossible. Lucky for Jia, the open source model doesn't just mean that operating systems are stitched together from many programs, but that each of those programs is itself stitched together from other programs. Those are called dependencies.
- OpenSSH is one of the most scrutinized software packages, but that doesn't extend to all of its dependencies.
Jia believes that if he can compromise a dependency of OpenSSH, he can sneak an exploit into the main project. And it just so happens that Lasse Collin's compression tool XZ is linked through a chain of these dependencies.
Now, Lasse's original goal with XZ was to find a better way to compress data on Linux. That data could be anything. Code, an image, text. But what was important to Lasse was that once you compressed and decompressed it, it had to come back exactly the same. The method had to be lossless. Let me give you an example. We're gonna take the lyrics to Rick Astley's hit "Never Gonna Give You Up" and we're gonna try to compress it. Now, say we take this and we represent it as a stream of characters, and each one gets a fixed-width 8-bit code. Now, that works, but it's inefficient. If we go through this stream and just count up how often each symbol appears, you'll notice there's a pattern. Some appear more frequently, like N with 430 uses, and some, barely at all, like J with one use. To save space, why don't we give the ones that appear more frequently shorter codes, and the rarer ones, well, they can afford to be long. But how do we do that? So, let's start by counting up how often each symbol appears and sorting that from most frequent to least frequent. We take the two least frequent symbols and join them together into a pair. We then treat that pair as a new combined symbol whose frequency is the sum of the two it represents. We can then reinsert that back into the list. Then we do it again. We take the two least frequent items, combine them, and then reinsert them back into the list. And we do that over and over again until we get this massive structure called a Huffman tree. Now, to get our codes, we just walk the tree. A step right is a 1, a step left is a 0. So, for example, to get R, we just go right, left, left, right, so the code is 1001. So what you'll notice is the more commonly occurring symbols naturally appear at the top of the tree, so they get shorter codes, while the ones that appear less frequently are at the bottom of the tree.
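The merge-the-two-rarest procedure described here can be sketched in a few lines of Python using a priority queue. The function names and the short lyric excerpt are assumptions for illustration, and the exact bit patterns depend on how ties are broken, so they will not necessarily match the tree drawn in the video.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Return {symbol: bit string}; frequent symbols get shorter codes."""
    freq = Counter(text)
    if len(freq) == 1:  # degenerate input with a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries are (frequency, tie_breaker, {symbol: code_so_far}).
    # The tie breaker keeps Python from ever comparing the dicts.
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f_left, _, left = heapq.heappop(heap)    # least frequent item
        f_right, _, right = heapq.heappop(heap)  # second least frequent
        merged = {s: "0" + c for s, c in left.items()}          # step left = 0
        merged.update({s: "1" + c for s, c in right.items()})   # step right = 1
        heapq.heappush(heap, (f_left + f_right, tie, merged))
        tie += 1
    return heap[0][2]

def encode(text, codes):
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:  # no code is a prefix of another
            out.append(reverse[current])
            current = ""
    return "".join(out)

lyric = "never gonna give you up never gonna let you down"
codes = huffman_codes(lyric)
bits = encode(lyric, codes)
print(len(lyric) * 8, "bits fixed-width versus", len(bits), "bits Huffman")
```

Decoding works without separators between symbols because every symbol sits at a leaf of the tree, so no code can be the beginning of another.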
The system works well, but it also has a weakness. In our "Never Gonna Give You Up" example, it always encodes N-E-V-E-R space. It doesn't realize that this whole chunk repeats. So, what if instead of looking at symbols, we looked at those chunks? Now, they don't have to be words, they can be parts of words or even longer. They just have to be patterns that repeat. So let's scan through the text but keep a rolling dictionary of what we've just seen. Then, as we move forward, we can check whether the next chunk has already appeared. And if it has, we don't need to write that chunk again. We just write a code with two numbers, how far back to look, and how many characters to copy. Now, when we decompress, we can just read along and whenever we hit one of these codes, we jump back, copy the matching chunk, and paste it into place. Two scientists, Lempel and Ziv, published this algorithm in 1977, so it became known as LZ77.
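The look-back-and-copy idea can be sketched as follows. This is a simplified LZ77-style coder in Python, written for clarity rather than speed. The token layout, the `window` size, and the minimum match length of 3 are assumptions of this example, not parameters from the 1977 paper or from any real file format.

```python
def lz77_compress(data, window=255, min_match=3):
    """Return tokens: ('lit', char) or ('copy', distance_back, length)."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Search the rolling dictionary: the last `window` characters.
        for j in range(max(0, i - window), i):
            length = 0
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append(("copy", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens

def lz77_decompress(tokens):
    out = []
    for token in tokens:
        if token[0] == "lit":
            out.append(token[1])
        else:
            _, dist, length = token
            # Copy one character at a time so a match may overlap itself.
            for _ in range(length):
                out.append(out[-dist])
    return "".join(out)

lyric = "never gonna give you up, never gonna let you down"
tokens = lz77_compress(lyric)
restored = lz77_decompress(tokens)
print(len(lyric), "characters became", len(tokens), "tokens")
```

The brute-force search inside `lz77_compress` is quadratic in the window size. Practical implementations index the window with hash chains or similar structures, but the output has the same shape: literals plus (distance, length) pairs.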
But some of these symbols and pointers show up more often than others. They actually have their own frequencies. So we can feed that whole stream into another Huffman tree to get a second layer of compression. And in our demo, it actually gets the file down 85% smaller than the original. This might look new, but you've almost certainly used it yourself. It's called deflate, but it's better known for the files it creates, .zip. If you ever clicked Close on this before, you've definitely used it.
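Python's standard library exposes deflate through the `zlib` module, which makes it easy to try the combined scheme. This is a small sketch, assuming a repeated lyric excerpt as input. Note that `zlib` wraps the deflate stream in its own small header and checksum rather than the .zip container, and the size you see will depend on the input, so no particular ratio is claimed here.

```python
import zlib

# Repetitive sample text; the excerpt and the repeat count are arbitrary.
text = ("never gonna give you up, never gonna let you down, "
        "never gonna run around and desert you. ") * 20
raw = text.encode("utf-8")

packed = zlib.compress(raw, 9)      # 9 is the highest compression level
restored = zlib.decompress(packed)

print(len(raw), "bytes before,", len(packed), "bytes after")
print("lossless:", restored == raw)
```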
But Huffman only uses the overall frequency of a chunk repeating. Real data isn't just random chunks. In our example, after "Never gonna", you might get "give you up", "let you down", or "run around and desert you". You might get "make you cry", you might get "say goodbye" or "tell a lie and hurt you". Each one has its own probability, and you can represent these probabilities with a mathematical tool called a Markov chain. The algorithm can then encode the stream of data so that the more probable next chunks cost few bits and the less probable ones cost more. If you combine that with a much bigger search window so it can point much further back in memory, then you get the Lempel Ziv Markov chain algorithm, or LZMA. LZMA was developed by Igor Pavlov around 1998, and it often beats much more familiar methods. In many cases, it can shrink files to about 70% of the size of a typical .zip.
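The idea that likelier continuations should cost fewer bits can be made concrete with a word-level toy model. This Python sketch counts which word follows which and reports the ideal code length, minus log base 2 of the probability, for each continuation. It is an illustration of the principle only: LZMA's actual modeling works at a much finer grain than whole words, and the function names here are hypothetical.

```python
import math
from collections import Counter, defaultdict

def next_word_model(words):
    """First-order Markov model: for each word, count what follows it."""
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def bit_cost(model, current, nxt):
    """Ideal code length in bits for seeing `nxt` right after `current`."""
    counts = model[current]
    probability = counts[nxt] / sum(counts.values())
    return -math.log2(probability)

words = ("never gonna give you up never gonna let you down "
         "never gonna run around and desert you").split()
model = next_word_model(words)

for current, nxt in [("never", "gonna"), ("you", "up"), ("gonna", "give")]:
    print(f"{current} -> {nxt}: {bit_cost(model, current, nxt):.3f} bits")
```

In this excerpt "gonna" always follows "never", so that continuation is free, while the three possible continuations of "gonna" each cost about 1.585 bits. An encoder that tracks such probabilities spends its bits only where the data is actually surprising.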
Lasse took this elegant compression algorithm and made it work on Linux, and he called it XZ not because it stood for anything, but just because it sounded cool.
- I'm using XZ quite a lot. I think XZ is a wonderful project. There are lots of different ways of compressing data. Some of them are fast but they don't compress very well, and some of them are slow but they get extremely good compression.
But across Linux, projects are constantly shipping the same files and updates to millions of machines, so XZ is perfect. You compress something once, then you get a smaller file to download forever. Lasse released XZ in 2009, and over the next decade and a half, it went from a niche tool to the common choice whenever a project needed effective lossless compression. So, XZ quietly spread everywhere, eventually becoming a dependency of OpenSSH.
- So, it was at some point in about February 2024 and Jia Tan, he emails me. He's got all these new features in the new version of XZ.
[Henry] He wins Rich over almost immediately.
- So I get to talk to hundreds of contributors all the time, and I do get a feel for them. I feel, you know, are they good coders, which is what I really care about. Are they conscientious people, are they helpful? Do they respond to bug reports quickly? And in all of the dimensions, Jia Tan would be a very good contributor because he's obviously a good coder. He's very responsive, he's very keen, and I love all that.
All indications are that Jia is a great contributor, and this puts Rich at ease, so he lets his guard down. And that's often where the problems start on the internet. You can't keep your guard up forever. But lucky for us, with today's sponsor, NordVPN, you don't have to. NordVPN's Threat Protection Pro blocks dangerous websites before they load. It stops malicious downloads and it strips out trackers and intrusive ads automatically. And it works even when you're not connected to the VPN, so a lot of these attacks never get the chance to start in the first place. I use NordVPN whenever I'm traveling or working on public wifi because it means that I don't have to think about who's running the network. It's just one click and it's so fast that I often forget that it's on. Not just that, if there's a show that's no longer available in my region or a sports team that's blacked out, like I'm often watching international football and they don't quite have it where I'm going, well, in that case, I can just switch my server location with one click to unlock the content. Apparently you can even use it to find better deals on plane tickets by changing your IP address to another country. I haven't tried it yet, but that sounds fascinating. So, if you wanna try it, you can get the best deal by going to nordvpn.com/veritasium. When you use that link or this QR code, you'll get a huge discount. Also, you get a 30-day money back guarantee through Nord. It's a no brainer. So again, that's nordvpn.com/veritasium or you can click the link in the description below. Thanks so much to Nord, and let's get back to Jia and the prize he's got his eyes on.
- At this point, we were preparing RHEL 10.
[Henry] See, Red Hat ships two major flavors of Linux. Fedora, which is free and publicly available, and Red Hat Enterprise Linux, or RHEL, which is available through a paid subscription. This one has to be stable and secure because it's widely used on the most important machines, like in governments and hospitals. Jia wants his code in RHEL, but RHEL only has a new major release about once every three years.
- So, there's definitely a deadline, and that deadline was around sort of March, April in 2024.
Jia has to act fast. He wants complete control of any compromised machine. And to pull it off, he has three steps in his plan.
Step one, the Trojan horse. The code for XZ lives on a website called GitHub, which tracks all edits to XZ's code using a tool called Git, which was also developed by Linus Torvalds. So, Jia starts by making small changes. He changes the primary contact for bug reports to his own email. He tweaks small tools that will help him later. But he can't sneak in the payload this way. I mean, it'd be too obvious. So he needs a way to sneak it in without it ever appearing as normal source code on GitHub.
- So, when you're writing compression software, it's very often the case that your software is full of these binary blobs, as we call them, so just lumps of binary which are used to test the compression or the decompression is still working.
Nobody reads these test blobs. They're included without ever appearing in the human readable source code. They're assumed to be garbage data. But for Jia, this is the perfect place to hide his payload, inside something that at first glance looks harmless. But in reality, it's a Trojan horse. But with a Trojan horse inside of XZ, it's still just a lump of data in a binary blob. He has to unpack it. So, in the code that builds the project, he slips in a small easy-to-miss change. It hides among all the automatically generated code and quietly unpacks his payload, inserting it into the XZ library. But now that it's inside of XZ, it still has to pick the right time to act.
On to step two, Goldilocks. Jia's end goal is to compromise a very specific part of the SSH connection process, the RSA authentication step. He realizes that if he can slip a small malicious component in there, let's call it the payload, then every time SSH checks for a key, his code will run first. It will quietly look for a special master key that only he knows, and if it sees that key, it'll let him straight in. If it doesn't, it'll call the real code and no one's the wiser. So, he will have his backdoor entrance to OpenSSH.
But he can't just go in and rewrite RSA Decrypt, the function that verifies the client's identity during the login. It's not that easy. See, when you build an application, you could take all the code you need from different libraries and bundle it into your application. But there's a big drawback to this approach. If 10 different applications on a system all bundle the same library, you end up with 10 separate copies on your machine, so it's redundant. That's why modern systems mostly use shared libraries. When an application starts, the linker fills in a table of addresses. These addresses point to the functions and variables it needs from the libraries it links to. That table is called the Global Offset Table, or GOT. Now, when it wants to use something from a shared library, it just checks the GOT and jumps to the right spot in memory. RSA Decrypt doesn't belong to OpenSSH at all. It comes from a shared crypto library. So to hijack authentication, Jia can overwrite the GOT entry that tells SSH where it is. And to do that, he can use a little known tool called an IFUNC resolver.
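To make the Global Offset Table idea concrete, here is a toy model in Python. It is only an analogy: a real GOT is an array of machine addresses filled in by the dynamic linker, not a dictionary, and every name in it (`got`, `rsa_decrypt_real`, `call_via_got`, `impostor`) is invented for illustration.

```python
# Toy model of a Global Offset Table (GOT): the application never calls a
# library function directly, it looks the target up in a table first.
# All names are hypothetical; a real GOT holds raw addresses, not Python objects.

def rsa_decrypt_real(blob: bytes) -> str:
    """Stand-in for the function provided by the shared crypto library."""
    return f"real check of {len(blob)} bytes"

# The "linker" fills in the table when the program starts.
got = {"RSA_decrypt": rsa_decrypt_real}

def call_via_got(symbol: str, *args):
    """How the application reaches library code: indirectly, via the table."""
    return got[symbol](*args)

def impostor(blob: bytes) -> str:
    """Whoever controls the table entry controls what actually runs."""
    return "impostor ran instead"

before = call_via_got("RSA_decrypt", b"abc")
got["RSA_decrypt"] = impostor          # overwrite one table entry
after = call_via_got("RSA_decrypt", b"abc")
```

The calling code is identical both times. Only the table entry changed, which is why the table is such an attractive target.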
- The IFUNC is used where let's say you wanna optimize your code to run on Intel's hardware and AMD hardware. Now, you could write the software just for Intel, and it would run very fast on Intel and it probably would run very badly on AMD hardware.
[Henry] Instead, you keep multiple versions of the same function and the IFUNC resolver picks the right one for the hardware you're on. At first glance, that sounds like a way for Jia to trick the system into thinking it's running hardware that needs his own compromised version of RSA Decrypt. But there is a catch. A library can only define IFUNC resolvers for its own functions. And since RSA Decrypt doesn't belong to XZ, it can't use an IFUNC resolver to override it. But IFUNC can still help him.
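The resolver idea can be sketched in a few lines of Python. This is a loose analogy, not how IFUNC is actually implemented: the function names and the `wide_vectors` feature flag are made up, and a real resolver returns a machine address rather than a Python function.

```python
# Toy model of an IFUNC resolver: several implementations of one function,
# and a resolver that runs once, early, to pick which one gets bound.
# Names and the feature flag are hypothetical.

def checksum_generic(data: bytes) -> int:
    """Portable version that works everywhere."""
    total = 0
    for byte in data:
        total = (total + byte) % 65536
    return total

def checksum_fast(data: bytes) -> int:
    """Stand-in for a hardware-tuned version; same result, different code path."""
    return sum(data) % 65536

def resolve_checksum(cpu_features: set):
    """The resolver: inspect the hardware, return the implementation to use."""
    return checksum_fast if "wide_vectors" in cpu_features else checksum_generic

# "Load time": the resolver is called once and its answer is bound to the name.
checksum = resolve_checksum({"wide_vectors"})
```

What matters for the story is that the resolver body is ordinary code that gets to run very early, before the program proper has started.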
- So it will, very, very early on in the running of the program it will do this sort of determination of what hardware is available, and crucially, it does let you run your own code in the library very early on.
Now, at this early stage, from within an IFUNC resolver, Jia could try to directly rewrite the GOT entry for RSA Decrypt. But at this point, the system is still filling in the GOT, so even if Jia changes the RSA Decrypt slot, the loader will come along later and write the real address back in, wiping out his change. And there's a limit on the other side as well. To make this sort of hijacking harder, once every entry is filled on the GOT, the system marks the table Read Only. That means that if Jia waits too long, the RSA Decrypt entry is frozen. So he has to slip it in at a very precise moment. After the RSA Decrypt entry is filled in legitimately, but before the table gets marked Read Only. And that tiny window is the Goldilocks zone. And to hit it, he's gonna need another tool.
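The timing problem described above can be modelled with a toy loader. This sketch assumes a simplified three-symbol table and uses Python's read-only `MappingProxyType` as a stand-in for the memory protection a real system applies. The symbol names and the `run_loader` helper are hypothetical.

```python
from types import MappingProxyType

# Toy loader illustrating the timing window. Hypothetical names throughout;
# a real loader writes addresses and later changes page permissions.

def run_loader(tamper_at: str):
    """Bind symbols, then freeze the table. `tamper_at` is 'early', 'window' or 'late'."""
    table = {}
    if tamper_at == "early":
        table["RSA_decrypt"] = "attacker"      # written before the loader gets there
    for symbol in ("malloc", "RSA_decrypt", "free"):
        table[symbol] = "real"                 # loader fills in the real address
        if tamper_at == "window" and symbol == "RSA_decrypt":
            table[symbol] = "attacker"         # after the fill, before the freeze
    frozen = MappingProxyType(table)           # read-only view, like the protected GOT
    if tamper_at == "late":
        try:
            frozen["RSA_decrypt"] = "attacker" # too late: the view rejects writes
        except TypeError:
            pass
    return frozen["RSA_decrypt"]

results = {when: run_loader(when) for when in ("early", "window", "late")}
```

Only the `"window"` attempt survives: the early write is overwritten by the loader, and the late write is refused.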
So, linking shared libraries in the GOT often leads to bugs, so Linux has a special debugging feature that tracks what the system's doing. It lets you run code whenever the linker writes a symbol's address into the GOT. It's called a dynamic audit hook, and normally you'd use it to profile performance. But crucially for Jia, there are no real guardrails. The hook can run any code he wants. And this is where IFUNC finally pays off. Jia uses an IFUNC resolver to set the audit hook early. Then, when the linker writes in the real RSA Decrypt address, the hook fires and swaps in his payload. Right in the middle of the Goldilocks zone.
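A minimal sketch of the hook mechanism, again as a Python analogy. `ToyLoader`, `early_resolver`, and the string "addresses" are invented for illustration, and this is not the real glibc auditing interface. It shows only the shape of the idea: a callback that fires on every symbol binding and may substitute a different target.

```python
# Toy model of an audit hook: a callback the loader invokes each time it binds
# a symbol, with the chance to return a different target. Hypothetical names;
# this is not the real glibc auditing interface.

class ToyLoader:
    def __init__(self):
        self.table = {}
        self.audit_hook = None   # normally unset, or set by the system

    def bind(self, symbol: str, address: str) -> None:
        """Bind one symbol; if a hook is installed, it may substitute the address."""
        if self.audit_hook is not None:
            address = self.audit_hook(symbol, address)
        self.table[symbol] = address

def early_resolver(loader: ToyLoader) -> None:
    """Runs before symbols are bound (the IFUNC moment) and installs the hook."""
    def hook(symbol: str, address: str) -> str:
        return "payload" if symbol == "RSA_decrypt" else address
    loader.audit_hook = hook

loader = ToyLoader()
early_resolver(loader)
for name in ("malloc", "RSA_decrypt", "free"):
    loader.bind(name, f"real_{name}")
```

Every other symbol passes through untouched, which is part of why nothing looks wrong from the outside.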
There is one final complication, though. Audit hooks are normally configured by the system, not by libraries like XZ. So when Jia is first looking for the audit hook variable that he's supposed to rewrite, it's actually hidden from him, so he first has to find it. Within the IFUNC, he scans a small region of binary code, hunting for signs of the hook. But it's just raw bytes, so he writes a tiny decoder to turn them back into instructions that he can read. Now Jia can find where the hook lives in memory and finally plant his code. Then, when RSA Decrypt gets called legitimately, it triggers the payload and he's in. But now that he's in, what does he do? And how does he get out of there cleanly?
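Hunting through raw bytes for something recognizable can be illustrated with a simple signature scan. Note that this is a swapped-in, much simpler technique than the instruction decoder described above: it matches a fixed byte pattern with wildcards rather than decoding instructions. The `find_pattern` helper and the sample bytes are made up.

```python
from typing import Optional, Sequence

def find_pattern(code: bytes, pattern: Sequence[Optional[int]]) -> int:
    """Return the first offset where `pattern` matches, or -1.

    `None` in the pattern is a wildcard that matches any single byte.
    """
    width = len(pattern)
    for offset in range(len(code) - width + 1):
        window = code[offset:offset + width]
        if all(want is None or want == got for want, got in zip(pattern, window)):
            return offset
    return -1

# Made-up bytes standing in for a region of machine code.
region = bytes([0x90, 0x90, 0x48, 0x8B, 0x05, 0x10, 0x20, 0x00, 0x00, 0xC3])
# Look for 48 8B 05, then any four bytes, then C3.
signature = [0x48, 0x8B, 0x05, None, None, None, None, 0xC3]
hit = find_pattern(region, signature)
```

The wildcards matter because the interesting bytes (here, the four in the middle) differ from build to build, while the bytes around them stay recognizable.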
Step three, the cat burglar. With Jia's exploit in place, SSH isn't just checking for a legitimate login anymore. It's also listening for a hidden master key. And Jia is careful, he doesn't want anyone else stumbling onto the backdoor, so that master key isn't just a simple password. It's actually a mini cryptographic exchange of its own. First, the backdoor code checks for a shared secret, and then, second, it authenticates the user. And only if both checks pass does the payload run. In effect, it's like the backdoor is running a miniature version of the encryption from SSH inside of SSH. But in SSH, it uses that encryption to keep the attackers out. In this case, the backdoor is using that encryption to make sure that it's only the attackers that can get in. But he's still careful. One of the main ways defenders catch intrusions is through SSH logging. So, to cover his tracks, he wipes evidence of the backdoor ever firing. And this is on top of the numerous safety checks that he's inserted throughout the process to make sure the system supports the backdoor and doesn't crash and draw attention. And this is the genius of Jia's trap. It's cautious and meticulous, designed to slip through only where it will run invisibly. With all three of these steps complete, he can finally control the machine undetected. All he needs to do now is get his updated XZ implemented in the next release.
But just as Jia is completing his backdoor, an open source developer requests to remove the dependency that links XZ to OpenSSH. This would spell disaster for Jia Tan. He becomes frantic, pushing harder and harder to get his compromised XZ into major Linux releases. He gets it into an early experimental build of Debian. He files a request to have it added to Ubuntu. He's trying to land the backdoor everywhere he can before anyone realizes what's going on. And it's then that Rich gets his first message from Jia. Over the next few weeks, he gets more and more insistent, urging Rich to add the updated XZ into the next release of Fedora.
- I'm always very keen to talk to keen upstream contributors, contributors who are really excited about new things in their software, who are really willing to help us get stuff into Fedora. So, you know, that's great, love it. That kind of makes my day, it's my happy place.
Eventually, Jia gets what he wants. Rich adds the updated XZ to a pre-release version of Fedora. Jia has succeeded. Except there's a bug.
In low-level code like the backdoor, things you normally take for granted, like memory management, are not done automatically. If a function grabs a bit of memory, it also has to give that memory back when it's done. And if it doesn't, then every time the function runs, it grabs more and more memory and then never releases it. Over time, the program just keeps growing. That's called a memory leak. And to catch problems like this, developers use a tool called Valgrind. It runs the program more slowly but watches every memory operation for anything suspicious. Valgrind is raising hell on Jia's code.
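Python frees memory automatically, so a leak has to be simulated. The sketch below uses a hypothetical `TrackingAllocator` that records every block handed out and not yet returned, which is roughly the kind of bookkeeping a memory checker does. It is an illustration of the concept, not of how Valgrind works internally.

```python
class TrackingAllocator:
    """Toy allocator that remembers every block still outstanding."""

    def __init__(self):
        self._live = {}      # handle -> size in bytes
        self._next = 0

    def malloc(self, size: int) -> int:
        self._next += 1
        self._live[self._next] = size
        return self._next

    def free(self, handle: int) -> None:
        del self._live[handle]

    def leaked_bytes(self) -> int:
        return sum(self._live.values())

def careful(heap: TrackingAllocator) -> None:
    block = heap.malloc(64)
    heap.free(block)          # gives the memory back

def leaky(heap: TrackingAllocator) -> None:
    heap.malloc(64)           # grabs memory and forgets about it

heap = TrackingAllocator()
for _ in range(1000):
    careful(heap)
after_careful = heap.leaked_bytes()
for _ in range(1000):
    leaky(heap)
after_leaky = heap.leaked_bytes()
```

A thousand calls to the careful function leave nothing behind, while a thousand calls to the leaky one leave 64,000 bytes outstanding, and the total keeps growing with every further call.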
- We put XZ, this version, 5.6.0, into Fedora 40. We get a bug report initially.
And the backdoor in XZ specifically is generating invalid write errors. Well, the logic was written by hand, bypassing the compiler's safety checks, and so they accidentally wrote outside the memory stack. Now, lucky for Jia, all this isn't immediately obvious. Rich still hasn't noticed what's happening.
- New software has bugs, right? It's the state of nature of software. Software is absolutely full of bugs all the time.
[Henry] Now, the real problem is inside the malicious code in the test file. But Jia can't just go and fix that, that would completely expose the backdoor. So he invents a cover story. He claims that the random data he used to generate the original test files, well, it's not reproducible, so he's replacing it. And in this updated code, he fixes the memory error.
- It's a very convincing and plausible explanation for why this test blob has to be updated. But of course, it's not the real reason.
All right, so now the real fix is in, but if the bug just magically went away, it would look a bit suspicious. So he has to find a way to cover it up.
- So what he then does is he changes the IFUNC code in a way where he adds like a whole bunch of comments and changes to the code around it that doesn't actually change the code but is plausible enough to look like he's changing how the IFUNC works to fix the Valgrind bug.
- It does, listening to it and I'm like I know that this is the evil hacker Jia Tan, but I'm like, ooh, that's clever. You know?
- Yeah, I mean, look, the guy is obviously not an idiot, right? But none of this is suspicious. This is what we expect from compression software. And as a packager, it's not really my job to fix every bug in upstream software. As soon as it gets to a certain level of difficulty, my thought here is, well, Jia Tan has actually been writing this software, right? So he's got it all in his head, he knows how it works. It's easier for me to just give him the problem. And I send the bug over to him and like a day later he sends the fix back. From my point of view, it's problem solved. It worked, system worked, right? I made the right call. I don't see, at that point, knowing what I know then, I don't see that there's any problem.
- So we downloaded Jia Tan's version of XZ, which was available on Fedora publicly, but we made a slight modification. Instead of using Jia's secret code, we're using our own, and that means that we can take advantage of Jia's backdoor. In this case, we're targeting the veritasium.com website. And once we get control of it, I got a little trick in store for Derek. Now, to make sure I don't mess with any real traffic too bad and lose my job, we actually cloned the Veritasium website and put it on a very similar URL, but it will work the same. Of course, Derek doesn't know that I've covered my bases.
- Oh no. Man, when you guys do these things, I just, I start to get more and more scared now. I want it to work for the video, but I also don't want it to work 'cause I don't wanna screw stuff up, so.
- Yeah, it's the risk you take, I guess, letting us run rampant.
- It is a concern.
- I'm gonna execute a script here, which is gonna open up. It's opening up a port on the Veritasium server. And then on this side I'm gonna execute a little script.
- Uh-oh. Henrytasium. Who is this goof? On the main photo, you spent time getting all suited up there.
- Of course.
- Looking sharp, sir.
- Thank you, thank you.
[Derek] "Videos Derek would never approve of." Uh-oh.
- The concept was over the years that we've worked together, you've said no to a bunch of my ideas, and I figured now with control of the website it's about time the world saw it.
- "Surviving 7 days living underwater. How do saturation divers live at -1,000 feet?" I mean, you wouldn't be outside, right? So I don't know why you need goggles there and like a respirator but you're not underwater. "Why it's almost impossible to shoot 4,000 meters." It's a sniper video. Yeah. "The CIA lied: exposing how the CIA lied about torture." I feel like that still goes into a tough territory for us. "How xenon gas replaced oxygen. I attempted to climb Mount Everest on xenon gas." That sounds like a terrible idea. This is what this whole video is about, this whole video is just about trying to get me to green light your projects. You know, if people like these video ideas, they can feel free to let us know in the comments and we can actually make them. The top upvoted comment, I will green light happily.
- Let's go!
- It is live, yeah, it's live on the server, yeah.
- If anyone's on the website right now, that would be very strange for them. Look, I'm not pleased, I would like you to change it back. It doesn't seem like this should be possible on a Linux server. So the big question is, how did you do it?
- The address is the server, the seed is our code to get in, and then the command is what we're doing to essentially open up, in this case nc, which is like opening up a port on the machine that we can then access from this second terminal. Then what we're doing is on this side we're running a script that's connecting to that port that's just been opened up, copying our files and then by the end we're gonna have root access on the server. That means that it thinks that we own the thing.
- That's so crazy. This is a very scary hack. I do not like it.
- Another thing is that this is a very obvious way of demonstrating this attack. Like I've changed everything on the website, you immediately know that I've gone in and hacked the server. If we were doing this for real, we would do it a lot sneakier.
- I mean, as you say, right? The thing to do would not be to totally rework someone's website so everyone notices, but to change it subtly so nobody notices so you can skim data or, yeah, like get credit card details or take payments to a different location, stuff like that.
- So you can copy anything you want, you can change anything you want, you can delete anything you want. So if there's any interesting documents or crypto tokens, any files you're interested in, those are yours now. If there's secret communications going across these, and let's keep in mind all of our communication networks are also built around Linux, those communication streams are yours now. If you wanted to encrypt something and ask for ransom, that's possible now.
[Henry] The possibilities really are endless. After two and a half years of hard work, slowly infiltrating the XZ Project and weaving in this ingenious backdoor, Jia's done it. He now has free rein on any machine that installs the new Fedora pre-release. And he also gets the same access on Debian testing and Ubuntu's pre-release environments. And with RHEL 10 coming up, his code could infect some of the most important computers. Now he should be able to relax, wait for the release, and he's got his backdoor key.
But just when he thinks everything's going right... Andres Freund is a German programmer. He's not a security researcher, he's not a hacker. He's just an employee at Microsoft working on an open source project called Postgres. One day in March 2024, he tries out the unstable release of Debian to make sure that Postgres will run smoothly. But while checking the server connection times, he notices something odd. A slowdown. It's not much. In the worst case, it's only half a second, but it's enough to make Andres suspicious. We tested the connection times ourselves on our own version of the XZ hack and we found the exact same thing. Consistent slowdowns of about 400 to 500 milliseconds. Andres had already seen the problems with XZ and Valgrind weeks earlier and this only makes him more suspicious, so he digs in deeper. He looks at recent additions to OpenSSH and traces the delay back to an update in XZ. He sees the binary test files but notices that they were never used in a test. It's even stranger. Andres tries to get back to work, but he can't stop thinking about it.
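The kind of comparison that exposes such a slowdown can be sketched as follows. This is a minimal illustration, not Andres's actual method: `fast_login` and `slow_login` are stand-ins that simply sleep, and the threshold is an arbitrary assumption. In practice the two actions would be real connection attempts against a known-good and a suspect system.

```python
import statistics
import time

def median_seconds(action, runs: int = 5) -> float:
    """Run `action` several times and return the median wall-clock duration."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def looks_slower(baseline, candidate, threshold: float, runs: int = 5) -> bool:
    """True if the candidate's median exceeds the baseline's by more than `threshold` seconds."""
    return median_seconds(candidate, runs) - median_seconds(baseline, runs) > threshold

# Stand-ins for "log in to a known-good server" and "log in to the suspect one".
def fast_login():
    time.sleep(0.001)

def slow_login():
    time.sleep(0.05)

suspicious = looks_slower(fast_login, slow_login, threshold=0.02)
```

Using the median of several runs rather than a single measurement keeps one unlucky network hiccup from looking like a consistent slowdown.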
[Andres] I remember sitting in a bunch of meetings and like not really being able to concentrate because it feels like, I need to continue looking into this.
Eventually, Andres sees it. This isn't some bug, this is a backdoor. And this backdoor is meticulous. It hunts through memory to find the audit hook, it implements a decoder to read those raw bytes, and then it wraps everything in custom encryption and safety checks so that it only triggers on the right kind of connection. I mean, it even garbles its own strings so that it won't be detected. It's incredibly cautious. But all of that takes time, and in the end, that's what grabs Andres's attention.
- If they had done less obfuscation, I probably would not have noticed that anything was wrong.
[Henry] Now, XZ's security contact is Jia Tan, so Andres can't exactly report it through the usual channels. Instead, he emails the Debian security team directly and posts a detailed report to a public security mailing list. Then, all hell breaks loose.
- I'm called up on I think it was a Friday evening, in fact, I'm sure it was a Friday evening, to join an internal Red Hat meeting. It's immediately obvious that this is not a normal meeting because like our head of security is there. It's explained to me that it's been found by somebody in the community that XZ has a backdoor, and immediately I'm like, WTF? How did this happen?
To cover their bases, Red Hat quickly rolls Fedora back and tells all their users to revert, and the whole open source community starts digging into the project to understand what went wrong. One thing is clear, though. Andres is a hero.
- Now, the fact that this was discovered in a different test at all, that was lucky. But then what are the chances that someone who isn't looking for a security bug spends days investigating this? So, big kudos to the researcher, and yeah, saved us all from possibly a doomsday on the internet.
- I think that Andres did a brilliant job because he did what I should have done, actually, which is I should have looked at the, you know, I should have looked at the bug when I saw it and I should have gone there, you know, like a crazy hound sort of sniffing around trying to find out what's going on.
[Henry] Andres even gets a shout out from the CEO of Microsoft. But when the story breaks, the mainstream response is surprisingly muted.
- Actually, I'm still surprised now that the mainstream news outlets haven't really covered this very much.
- Well, I can tell you how many systems would have been compromised, which would have been millions.
- Anything from spying, to ransom, to just taking down entire countries, you could have done it with this backdoor.
[Henry] I guess the big question is, who is Jia Tan?
- That's the question, isn't it? Okay, so my feeling is that Jia Tan, the person that I talked to I believe is one person, but I also believe that behind him must be a group of people. And they worked for quite a while. I mean, they were at this for perhaps two and a half years that we know about.
If you look back at the accounts pressuring Lasse, they share some similarities. They use free email addresses and they have almost no footprint outside of the XZ threads. These were very likely sock puppet accounts, identities manufactured to apply pressure as part of a multi-stage social engineering campaign.
- Now, who spends a million dollars and takes two and a half years to attempt to break into every hotel room on the internet with a master key? I think it's not a criminal organization because I don't think a criminal organization would have that patience to spend that time without any real return. So I think it has to be a nation state actor, here.
- A lot of the aliases, like Jia Tan, they sound like Asian names, and the published changes are all timestamped in UTC+8, Beijing time. So the signs point to China. And that's why it's probably not China. I mean, why would they make it that obvious? Every other part of the operation has been so meticulous, so cautious. And they also worked on Chinese New Year, but not on Christmas. And over the years, there were nine changes that fall outside of the Beijing time into UTC+2, which is a time zone that includes Israel and parts of Western Russia. That's why some experts have speculated that this could be the work of APT29, a Russian-state-backed hacker group also known as Cozy Bear.
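The time zone argument boils down to tallying the UTC offset recorded on each change. Here is a minimal sketch of that tally, assuming the timestamps are available as ISO 8601 strings with an explicit offset. The `offset_histogram` helper is hypothetical, and the sample timestamps are made up for illustration, not taken from the real XZ history.

```python
from collections import Counter
from datetime import datetime

def offset_histogram(timestamps):
    """Count how many ISO 8601 timestamps carry each UTC offset."""
    counts = Counter()
    for stamp in timestamps:
        offset = datetime.fromisoformat(stamp).utcoffset()
        hours = offset.total_seconds() / 3600
        counts[f"UTC{hours:+g}"] += 1
    return counts

# Made-up sample timestamps, NOT taken from the real XZ history.
sample = [
    "2023-01-10T21:15:00+08:00",
    "2023-02-03T19:40:00+08:00",
    "2023-06-27T17:38:00+02:00",
    "2023-07-01T22:05:00+08:00",
]
histogram = offset_histogram(sample)
```

The offset on a change is whatever the author's machine reported, so a tally like this shows what was recorded, not where anyone actually was.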
- But again, do we know? No, of course we don't know who it is, and we likely will never know. Jia Tan himself just disappeared as soon as this exploit became publicly known and never heard from again.
- In a sense it doesn't matter whether this was Russian or Chinese or Iranian. We need to protect from these types of backdoors no matter where they're coming from.
- I see this as like, you know, the canary in the coal mine of what's gonna be happening as attackers get more sophisticated, they make fewer mistakes. You know, the gloves are off in a way. I don't think that the Linux community is fully, you know, is fully ready for this yet.
In the aftermath of XZ, the open source community pored over countless similar small projects looking for comparable campaigns, but they found almost nothing.
- I'm worried that we didn't find other backdoors. The incentives are just too clear. There are state-sponsored parts of either governments, militaries or even private contractors working for states that are all preparing for the next cyber escalation, some kind of a war, some kind of a geopolitical conflict, and where are all of those backdoors? There's just too many people incentivized to put backdoors for the few backdoors that we're actually discovering.
Now, some experts have argued this reveals a fundamental flaw in the open source model, but not everyone agrees.
- Closed source software would be no better here. In fact, who's to say that there aren't already state spies working as paid software engineers at some of the larger companies putting in exactly backdoors like this? But then there would be no community member running free testing and detecting this by chance. This backdoor, if anything, underlines the ethos of open source.
- I mean, just think of what it took to get this done in public. There was a multiple-year social engineering campaign, there were all these layers of misdirection, and then there was code that was designed to withstand constant scrutiny. Compare that now with a closed source hack. Sometimes all it takes to get a backdoor installed there is a court order, or you have a public company that can just brush a breach under the rug. I actually used to work as an open source researcher myself at the Japanese telecom giant NTT, and my perspective is that it's only because this is an open source project that it's been picked apart, analyzed, and turned into a conversation about security at all. One that focuses on the fundamental vulnerability. It's not the code, it's the people. Now, the system has not supported them enough.
- I feel for Lasse that he's given this beautiful gift to the whole world and, you know, what have we, what has humanity done back to him, right? We've poisoned his gift. And then I think implicitly a little bit, not everyone's saying this, but implicitly we're blaming him for not being there to maintain this stuff for free forever. But why are we demanding that Lasse do anything when he's not being paid for this stuff? And that's, in my opinion, quite unfair. On this Saturday evening, we were working together on a workaround for this bug in RHEL 9 that he's added to XZ, and he absolutely could have told us to get lost, and didn't. What a brilliant guy.