Hard Drives of Doom

travis+web@subspacefield.org

1 Introduction

Sometimes you go through life feeling like you haven’t gotten anywhere near your full potential, and that if someone would only give you a chance, you’d be able to do something impressive. Maybe you know it, and your friends know it, but how in the world do you explain it to someone without sounding like a pompous fool? You could talk about the thousands of little things that come up, but by themselves, they’re probably unimpressive. What you really need is an anecdote that exemplifies your potential. After some thinking, I realized that I have one of these, something I’ve told in bits and pieces, and which has been buried in my homepage, but has never been told in its entirety, in part because hard drive encryption wasn’t considered prudent back then; it was considered outright paranoid. Time has shown the wisdom of my approach, and now I suppose that hard drive encryption is common enough that this tale can be told in full, and in the incremental, investigative way that it occurred.

2 The Enclosure

The time is sometime around 1995. The Internet, though open to the public, is not a household word. “The Web” is two years old, and the main browser is Mosaic. Netscape, which would later become Mozilla Firefox, is in beta. Almost nobody has “home pages”. I am working for a company that sells database migration tools. The largest SCSI drives are single digit gigabytes. There is one AT&T “Teradata” machine in the building, and it is the size and shape of a full-sized refrigerator. The name suggests that it actually holds a terabyte of data, and is meant to impress, but at the time I was still a bit skeptical.
Now for those of you who don’t remember SCSI, let’s have a history lesson. At the time, most computers had ATA/IDE drives. These drives had to be put two to a bus, and one had to be master, and one had to be slave, and the slave suffered some serious performance penalties. In most computers, this limited you to two hard drives; you sometimes hooked up two CD-ROM drives to feel that the capacity wasn’t wasted. Since most hard drives were only a few gigabytes, and since ATA was slow, if you were serious, you forked out bigger bucks for SCSI. SCSI allowed you to “daisy chain” up to 7 hard drives for each controller, though it was hard to fit that many in your case.
Around this time, the company is getting rid of some older parts. One of them is a large external RAID storage array, apparently custom-made by a company named “Eagle Storage” or something like that, which held eight enterprise-class, full-height, 5.25” 2.5GB “wide” differential SCSI drives. It was the size of the refrigerators meant for dorm rooms, had two built-in 300W power supplies and 80mm fans (one in either half of the unit), was awkward to carry, and weighed about 100 pounds when loaded with those drives.
Now, this got my attention! I was a year or so out of college, still poor, and I had the ability to upgrade to 20GB of space. Not only that, but I’d have differential SCSI, which allows you to run much longer cables to the drives (allowing for external enclosures), and wide SCSI, which allows you to have up to 15 drives on a chain. The only problem was that I had to buy a relatively rare HVD SCSI controller. I was able to find one on the Internet; it cost several hundred dollars, but I was very happy to have it all. The drives were built like tanks, probably several pounds each. Larger capacity drives were on the market, but not “enterprise class” ones. So I had about 16-20GB of storage in one computer, and I was quite happy with it.

3 Seduction by Pure Evil

Now, over the next year or two, some new 4.5GB drives came on the market. Specifically, Micropolis had some 7200RPM 4.5GB HVD SCSI drives, and I stumbled across a wonderful deal on used ones. I don’t have the prices now, but I could easily buy several as a recent graduate getting paid $35-40k/yr, so it couldn’t have been very much, perhaps $50-100 per drive.
I assumed at the time that the resale market for wide, HVD SCSI drives was simply too small; it was a rare technology, and so trying to sell them was a bit like trying to sell Betamax tapes when VHS dominated. But I should not have accepted such a glib explanation; I should have dug further, because this is where I became ensnared. I bought four.

4 Death’s Icy Embrace

For a while, they seemed to be working fine. However, every once in a while, I’d notice the computer lock up. Later, I noticed that the SCSI bus light was on. In some cases, the computer would still be running, but anything that caused access to the disk drives would simply freeze. You couldn’t kill the processes that were frozen like this; if you tried, your process would freeze. Eventually, the system would slowly freeze “to death”, and I’d have to do a hard reboot. Attempting to do a graceful shutdown at any point would simply cause the computer to freeze up completely, and so a hard reboot was necessary.
Now, the whole point of doing a clean shutdown is to avoid the possible data corruption from a hard reboot, or “power cycle”. So it was no surprise to me to find, on occasion, that there was some corruption. Files would end up in lost+found, and I’d move them back to their original places.
But over time, this happened more and more. I blamed something about the OS, not knowing the drives were at fault. I wrote a tool that would examine what was in lost+found, particularly when it was a directory with files and subdirectories, and automatically try to find where to reattach it to the main file system tree. It was like you had a picture of a tree, and every day you find pieces of branches lying at its feet, and you have to graft them back.
Then I started noticing sectors of NULs in the middle of certain files. This was a seriously insidious kind of corruption. It wasn’t just an annoyance now; I was having silent corruption of the contents of my files. I had never lost significant data in my life, and this computer held everything, including all my class assignments from college. It was like a scrapbook and diary for me, and something was eating away at it, silently. I had no idea how extensive the damage was, since I didn’t keep checksums of every file (indeed, that’s silly, since certain files are supposed to change contents).

5 The Security Landscape

Living in Austin as I was, the location of Steve Jackson Games, I was familiar with the 1990 Operation Sundevil, which was chronicled in The Hacker Crackdown. I met “Erik Bloodaxe” of LoD, who later became an editor of Phrack, as well as “Minor Threat”, who had a few years earlier finished “Tone Loc”, which is (as of 2010) still a widely-used wardialing program.
The first edition of Applied Cryptography had come out in 1994, just a few years earlier. It was widely regarded as a watershed event; it came out just after the Clipper Chip, and during a time when you could not export encryption technology, under the same laws that prevented exporting sophisticated military weaponry. Although it is accepted now, many specialists back then who saw encryption as necessary for the security of the Internet were having a hard time getting their point across. By publishing Applied Cryptography, Bruce Schneier was a modern Prometheus, stealing fire and putting knowledge of it in the minds of the people.
Of course, this watershed moment had historical precedents; for example, David Kahn’s book “The Codebreakers”, while largely historical, did give out some information on how to apply cryptography. But Applied Cryptography was the first description of modern algorithms like DES, and better yet, IDEA.
Were there encryption programs at this time? Well yes, Phil Zimmermann wrote Pretty Good Privacy (PGP) in 1991 (see http://www.philzimmermann.com/EN/essays/WhyIWrotePGP.html), literally putting working software in your hand, and he is perhaps more Promethean than Bruce Schneier. Prometheus was chained to a rock, with an eagle tearing out his liver each day, which regrew. Phil was hounded by the government after PGP’s worldwide spread; he was forced to defend himself against these arms-trafficking regulations, trying to prove that he didn’t export it himself. That legal hounding to prove a negative cost him quite a bit of money ($1M?) and so he was forced to start a business around it; a business with which I interviewed, and which is being purchased by Symantec at this very moment, some 20 years after he wrote the program!
There was also a Norton disk encryption product called Diskreet, but it only ran in DOS, and by this time, I had totally moved over to NetBSD as my main operating system (Linux and FreeBSD were competitors at the time). It was known to some of us that it only did DES encryption, which I considered too weak, and that it didn’t handle the keyspace properly, making it a joke to break. In fact, the DESCHALL project broke a DES-encrypted message in June 1997.
So, to summarize, there were threats against young people or publishers with computers, the information was out there, most of the tools were unsuitable because they were using DES, or simply weren’t designed right, and there was one good program, PGP, which did a lot of what you wanted.

6 My Crypto Contribution (idea_filter)

So, at this time, I was backing up hard drives to tapes. I wanted to be able to encrypt things as a Unix filter, so that I could encrypt tapes with something like this:
dump -cvf /dev/stdout / | gzip -9c | idea_filter | dd of=/dev/rmt8 bs=8k
I may not have gotten the flags to dump right, but you get the idea. I’d then compress it heavily, encrypt it, and then stream it out to tape. DES was dead, and with no obvious candidate to choose, I took Schneier’s suggestion and picked IDEA. It was also getting used in PGP.
So, as of 1997, I was encrypting my backup tapes, and I could therefore leave them in an untrusted place, and nobody who found them would be able to do anything with them. Other companies were paying Iron Mountain to truck them off and store them deep in guarded vaults; I kept mine at my friend’s house. This is the power that crypto gives you; you no longer need to care (much) about the confidentiality of the data.
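The shape of such a filter can be sketched in a few lines of Python. This is only an illustration under stated assumptions: idea_filter itself is not reproduced here, and the cipher below is a toy SHA-256 keystream XOR standing in for IDEA, not something to protect real backups with. The names xor_filter, keystream and BLOCK are hypothetical.

```python
import hashlib

BLOCK = 8192  # read size; chosen to match the bs=8k given to dd


def keystream(key: bytes, counter: int) -> bytes:
    """Return 32 pad bytes for one counter value (toy stand-in, NOT IDEA)."""
    return hashlib.sha256(key + counter.to_bytes(8, "big")).digest()


def xor_filter(src, dst, key: bytes) -> None:
    """Copy the binary stream src to dst, XORing each byte with the pad.

    The pad is indexed by absolute stream position, so the result does not
    depend on how the input happens to be split into reads.
    """
    pos = 0
    pad = b""
    while True:
        chunk = src.read(BLOCK)
        if not chunk:
            break
        out = bytearray(chunk)
        for i in range(len(out)):
            offset = pos + i
            if i == 0 or offset % 32 == 0:
                pad = keystream(key, offset // 32)
            out[i] ^= pad[offset % 32]
        pos += len(chunk)
        dst.write(bytes(out))
```

To sit in the middle of a pipeline, a function like this would be called with sys.stdin.buffer and sys.stdout.buffer. Because XOR is its own inverse, running the same filter with the same key over the output gives back the input.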
But wait, there’s more. Once I had written this as a “streaming filter” that read STDIN, I decided to try doing it a different way. I’d take an IV and a filename as an argument, and I’d encrypt that file in place, overwriting the old data. This was nice, because if you were encrypting a file, you typically didn’t want the plaintext still lying around. Not even PGP could do this. And the way I did it was interesting; I’d read a sector, encrypt it, back up a sector (using lseek), and then write it. So the file was incrementally encrypted from beginning to end.
This seems unimportant until you realize what this allows you to do. Instead of just encrypting files, you can encrypt any block device, like a disk partition, or the entire hard drive!
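The read, seek back, write loop is easy to sketch. The following Python is a minimal illustration, not the original tool: toy_encrypt_sector is a hypothetical SHA-256-based XOR pad standing in for IDEA with an IV, and the function names and the 512-octet SECTOR constant are assumptions for the example.

```python
import hashlib
import os

SECTOR = 512


def toy_encrypt_sector(key: bytes, index: int, data: bytes) -> bytes:
    """XOR one sector with a pad derived from key and sector index.

    Toy stand-in for a real cipher such as IDEA; do not rely on it.
    """
    pad = b""
    block = 0
    while len(pad) < len(data):
        pad += hashlib.sha256(
            key + index.to_bytes(8, "big") + block.to_bytes(4, "big")
        ).digest()
        block += 1
    return bytes(a ^ b for a, b in zip(data, pad))


def encrypt_in_place(path: str, key: bytes) -> int:
    """Overwrite path with its encrypted form, one sector at a time.

    Returns the number of sectors processed. Assumes each read returns a
    full sector except possibly the last one.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        index = 0
        while True:
            data = os.read(fd, SECTOR)
            if not data:
                break
            # Back up over the sector just read, then overwrite it.
            os.lseek(fd, -len(data), os.SEEK_CUR)
            os.write(fd, toy_encrypt_sector(key, index, data))
            index += 1
        return index
    finally:
        os.close(fd)
```

Nothing in the loop cares whether path names an ordinary file or a raw device node, which is what makes the jump from files to partitions possible. One consequence of working in place is that an interrupted run leaves the front of the target encrypted and the rest in plaintext.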

7 The Boot Disks

A big part of security is being prepared, thinking through your problems. Back then, I used to worry about not being able to boot my system (disks were less reliable back then). I did this in part by mirroring the first slice of the disk (including MBR, boot sector, and /) to another identical drive, so if one wouldn’t boot I could swap in the other. I had a little script which would do the mirroring, and then fix up the names of the drives so that the mount points (which took up most of these disks) all worked still.
Another part had to do with getting a minimal operating system on a floppy. At that time, I could build a kernel containing only the drivers I needed (using BSD’s kconfig system). But userland was a bigger problem, especially since most programs were statically linked (dynamic linking hadn’t spread to us yet). It turned out there was a program called crunchgen, which would allow you to compile several programs together, and they’d share the same library functions, making a huge executable you could link to several names, and when called with that name, it behaved as that program. This worked wonderfully, and so I included idea_filter in that list of programs, and I made “rescue” CDs which could be used to repair a broken system.
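The trick of one executable answering to several names comes down to looking at the name it was invoked under. Here is a rough Python sketch of that dispatch; it is not how crunchgen is implemented (it generates and links C programs), and the applet functions and the APPLETS table are hypothetical.

```python
import os


def applet_echo(args):
    """Stand-in for a bundled 'echo' program."""
    return " ".join(args)


def applet_basename(args):
    """Stand-in for a bundled 'basename' program."""
    return os.path.basename(args[0])


# Hypothetical table of bundled programs, keyed by invocation name.
APPLETS = {"echo": applet_echo, "basename": applet_basename}


def dispatch(argv):
    """Choose behaviour from argv[0], the name the binary was called by."""
    name = os.path.basename(argv[0])
    if name not in APPLETS:
        raise SystemExit(f"{name}: not built into this binary")
    return APPLETS[name](argv[1:])
```

With every name hard-linked to the same file, the shared library code is stored only once, which is what makes the approach attractive when space is tight.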

8 The Big Con

It’s 1998 and I’m preparing for the drive to San Antonio for the 7th USENIX Security Symposium. Well, time to put this thing to work! I boot off the floppies, and start idea_filter encrypting in-place on the raw disk devices. There are nine drives total, and nine idea_filters running in parallel, at surprisingly different rates of speed. It takes longer than I thought; it’s almost time to leave and they aren’t even close to done! I can’t wait, and I can’t stop the process, so I leave with the drives being encrypted. Hopefully they’ll finish in an hour or two and nobody will ever figure out how to use the floppy (not that guessing the passphrase would be easy).

9 The Abomination

I came back and - guess what, the SCSI bus was hung. Of all the bad luck!
I could tell that not all the drives had finished encrypting, so now I had nine drives, all with their beginnings encrypted, and the ends still plaintext.
This was terrible. I sat and thought about what I could do.

10 A Spark of Light

It occurred to me that encrypted data has a normalized entropy of nearly 1 (close to 8 bits per octet), and that disks write a whole sector at a time (or none at all), so the boundary between encrypted and unencrypted data is a boundary between sectors. It’s also possible to distinguish the entropy of a sector of plaintext from a sector of encrypted data, even if the plaintext is part of a binary, and maybe even if it’s part of an mp3 or compressed file. So, I could do a binary search on each disk, testing one sector for entropy, deciding whether it’s encrypted or unencrypted, and making my next test further ahead or behind, respectively.
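A search of this kind might look something like the following sketch. The predicate looks_encrypted is a hypothetical callback that reads one sector and classifies it; find_boundary and its argument names are likewise made up for illustration, not taken from the original script.

```python
from typing import Callable


def find_boundary(total_sectors: int,
                  looks_encrypted: Callable[[int], bool]) -> int:
    """Return the index of the first sector that does not look encrypted.

    Assumes every sector before the boundary looks encrypted and every
    sector from the boundary onward does not.
    """
    lo, hi = 0, total_sectors          # the boundary lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if looks_encrypted(mid):
            lo = mid + 1               # boundary is further ahead
        else:
            hi = mid                   # boundary is here or behind
    return lo
```

On a 4.5GB drive with 512-octet sectors (roughly 8.8 million of them), a search like this needs only a couple dozen sector reads.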
I found source code to “ent” online, which did a wonderful suite of entropy tests. Using “dd” and a script I cobbled together, I extracted a sector at a time and decided with ent whether it was encrypted or not. Ent conducts various tests, and the chi-square test is the most common in the academic literature; it tells you how likely a random distribution (e.g. encrypted) would be to exceed it. However, I found it rather useless for this purpose. Instead, I used the Shannon information-theory entropy measurement (using a threshold of about 7.5 bits per octet, determined empirically), and performed a binary search. Once I found the boundary, I did a few tests for a few octets on either side to make sure it wasn’t some local aberration. It was surprising how reliable an entropy test on 512 octets was; I found no aberrations.
On most of the drives, I found the boundary just fine.
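For readers who want to see the measurement itself, here is a minimal Python sketch of a per-sector Shannon entropy test. It is not ent's code; the function names are hypothetical, and the only value taken from the text is the threshold of about 7.5 bits per octet.

```python
import math
from collections import Counter

THRESHOLD = 7.5  # bits per octet; the empirical cut-off quoted in the text


def shannon_entropy(sector: bytes) -> float:
    """Shannon entropy of the byte-value histogram, in bits per octet."""
    n = len(sector)
    if n == 0:
        return 0.0
    counts = Counter(sector)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_encrypted(sector: bytes, threshold: float = THRESHOLD) -> bool:
    """Classify a sector as encrypted if its entropy reaches the threshold."""
    return shannon_entropy(sector) >= threshold
```

A sector of all zeros scores 0 bits per octet, and a sector in which every byte value appears equally often scores the full 8. A 512-octet sample of genuinely random data typically lands somewhat below 8, because 512 samples fill a 256-bin histogram only sparsely, which fits with an empirical threshold well under 8.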
But on one, the one which had encrypted the most, there was a hitch.

11 Eureka!

It seemed that this fast Micropolis drive had reached the 2GB barrier, and when I used dd to sample the sector there, something weird happened.
No, dd didn’t crash; the hard drive crashed, and it hung the SCSI bus just like what had been going on all along.
Put simply: if I performed a disk operation that spanned the 2^31 barrier, the SCSI bus hung.
It wasn’t my tools at all, or my operating system’s device driver.
It turns out that the hard drive itself had an unhandled signed integer overflow.
And further tests showed that all of the Micropolis 4.5GB HVD SCSI drives had this flaw; the reason why only one hung was that it had encrypted the fastest; once it hit this barrier, it hung the bus, and the host computer could no longer continue to operate on the other drives, so they stalled at some point earlier on.
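To see why 2^31 is a natural place for such a bug, consider what happens when a byte offset is kept in a signed 32-bit integer. The Python below is purely illustrative: it is not the firmware's code, and the assumption that the drive computed a byte offset as sector number times 512 is a guess made for the sake of the example.

```python
SECTOR = 512
BARRIER = 2 ** 31  # 2GB, counted in bytes


def as_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    return (value + 2 ** 31) % 2 ** 32 - 2 ** 31


def byte_offset_int32(sector_number: int) -> int:
    """Byte offset of a sector as hypothetical 32-bit signed math sees it."""
    return as_int32(sector_number * SECTOR)


last_good = BARRIER // SECTOR - 1   # last sector wholly below the barrier
first_bad = BARRIER // SECTOR       # first sector at the barrier
```

Here byte_offset_int32(last_good) is still a sensible positive number, while byte_offset_int32(first_bad) wraps around to a large negative one, so any comparison or table lookup based on it goes wrong from that sector onward.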

12 The Fix is In

Now, it turns out that hard drives aren’t just little machines. They actually have some little microprocessors and related electronics, and these run software (called firmware) that interpret the SCSI commands and do stuff with the servo motors. This firmware is just software, written in assembly language.
I procured the tools that could download and upload the software to the drive over the SCSI bus. This was proprietary stuff written by Micropolis in DOS, and so I dual-booted into it, extracted the firmware images, and started doing research. I was in luck! The microprocessor was a common model (80186 if I recall correctly), and so with some disassembly and cooperation with people on the Internet (mostly a guy in Germany), I was able to fix the firmware. The drives never crashed that way again, and some lasted at least ten years.
Incidentally, the software used to reprogram the drives did not require any jumpers to be set on the disks. Thus, the possibility remains that some malware could actually reprogram the drives without the user knowing, turning them into “hot bricks” as one friend put it. Of course, that’s evil, so I’m glad people with those kinds of skills have better things to do with their time.

13 Coda

My best guess is that they used the same software from their 2GB models on their 4GB models, and it suddenly developed this bug. Coincidentally, Micropolis went out of business around this time. Too bad they hadn’t hired me to help them :-)

Copyright (C) 2013 travis+web@subspacefield.org