Drives of Doom
travis+web@subspacefield.org
1 Introduction
Sometimes you go through life feeling like you haven't gotten anywhere
near your full potential, and that if someone would only give you
a chance, you'd be able to do something impressive. Maybe you know
it, and your friends know it, but how in the world do you explain
it to someone without sounding like a pompous fool? You could talk
about the thousands of little things that come up, but by themselves,
they're probably unimpressive. What you really need is an anecdote
that exemplifies your potential. After some thinking, I realized
that I have one of these, something I've told in bits and pieces,
and which has been buried in my homepage, but has never been told
in its entirety, in part because hard drive encryption wasn't considered
prudent back then; it was considered outright paranoid. Time
has shown the wisdom of my approach, and now I suppose that hard drive
encryption is common enough that this tale can be told in full, and
in the incremental, investigative way that it occurred.
2 The Enclosure
The time is sometime around 1995. The Internet, though open to the
public, is not a household word. "The Web" is two years old,
and the main browser is Mosaic. Netscape, which would later become
Mozilla Firefox, is in beta. Almost nobody has "home pages".
I am working for a company that sells database migration tools. The
largest SCSI drives are single digit gigabytes. There is one AT&T
"Teradata" machine in the building, and it is the size and shape
of a full-sized refrigerator. The name suggests that it actually holds
a terabyte of data, and is meant to impress, but at the time I was
still a bit skeptical.
Now for those of you who don't remember SCSI, let's have a history
lesson. At the time, most computers had ATA/IDE drives. These drives
had to be put two to a bus, and one had to be master, and one had
to be slave, and the slave suffered some serious performance penalties.
In most computers, this limited you to two hard drives; you sometimes
hooked up two CD-ROM drives to feel that the capacity wasn't wasted.
Since most hard drives were only a few gigabytes, and since ATA was
slow, if you were serious, you forked out bigger bucks for SCSI. SCSI
allowed you to "daisy chain" up to 7 hard drives for each controller,
though it was hard to fit that many in your case.
Around this time, the company is getting rid of some older parts.
One of them is a large external RAID storage array, apparently custom-made
by a company named "Eagle Storage" or something like that, which
held eight enterprise-class, full-height, 5.25" 2.5GB "wide"
differential SCSI drives. It was the size of the refrigerators meant
for dorm rooms, and had two built-in 300W power supplies and 80mm
fans (one in either half of the unit), was awkward to carry, and weighed
about 100 pounds when loaded with those drives.
Now, this got my attention! I was a year or so out of college, still
poor, and I had the ability to upgrade to 20GB of space. Not only
that, but I'd have differential SCSI, which allows you to run much
longer cables to the drives (allowing for external enclosures), and
wide SCSI, which allows you to have up to 15 drives on a chain. The
only problem was that I had to buy a relatively rare HVD SCSI controller.
I was able to find one on the Internet; it cost several hundred dollars,
but I was very happy to have it all. The drives were built like tanks,
probably several pounds each. Larger capacity drives were on the market,
but not "enterprise class" ones. So I had about 16-20GB of storage
in one computer, and I was quite happy with it.
3 Seduction by Pure Evil
Now, over the next year or two, some new 4.5GB drives came on the
market. Specifically, Micropolis had some 7200RPM 4.5GB HVD SCSI drives,
and I stumbled across a wonderful deal on
used ones. I don't have the prices now, but I could easily buy several
as a recent graduate getting paid $35-40k/yr, so it couldn't have
been very much, perhaps $50-100 per drive.
I assumed at the time that the resale market for wide, HVD SCSI drives
was simply too small; it was a rare technology, and so trying to sell
them was a bit like trying to sell Betamax tapes when VHS dominated.
But I should not have accepted such a glib explanation; I should have
dug further, because this is where I became ensnared. I bought four.
4 Death's Icy Embrace
For a while, they seemed to be working fine. However, every once in
a while, I'd notice the computer lock up. Later, I noticed that the
SCSI bus light was on. In some cases, the computer would still be
running, but anything that caused access to the disk drives would
simply freeze. You couldn't kill the processes that were frozen like
this; if you tried, your process would freeze. Eventually, the system
would slowly freeze "to death", and I'd have to do a hard reboot.
Attempting to do a graceful shutdown at any point would simply cause
the computer to freeze up completely, and so a hard reboot was necessary.
Now, the whole point of doing a clean shutdown is to avoid the possible
data corruption from a hard reboot, or "power cycle". So it
was no surprise to me to find, on occasion, that there was some corruption.
Files would end up in lost+found, and I'd move them back to their
original places.
But over time, this happened more and more. I blamed something about
the OS, not knowing the drives were at fault. I wrote a tool that
would examine what was in lost+found, particularly when it was a directory
with files and subdirectories, and automatically try and find where
to reattach it to the main file system tree. It was like you had a
picture of a tree, and every day you find pieces of branches lying
at its feet, and you have to graft them back.
Then I started noticing sectors of NULs in the middle of certain files.
This was a seriously insidious kind of corruption. It wasn't just
an annoyance now; I was having silent corruption of the contents
of my files. I had never lost significant data in my life, and this
computer held everything, including all my class assignments from
college. It was like a scrapbook and diary for me, and something was
eating away at it, silently. I had no idea how extensive the damage
was, since I didn't keep checksums of every file (indeed, that's silly,
since certain files are supposed to change contents).
5 The Security Landscape
Living in Austin as I was, the location of Steve Jackson Games,
I was familiar with the 1990 Operation Sundevil, which was
chronicled in The Hacker Crackdown. I met "Erik Bloodaxe"
of LoD, who later became an editor of Phrack, as well as "Minor
Threat", who had a few years earlier finished "Tone Loc",
which is (as of 2010) still a widely-used wardialing program.
The first edition of Applied Cryptography had come out in 1994, just
a few years earlier. It was widely regarded as a watershed event;
it came out just after the Clipper Chip, and during a time when you
could not export encryption technology by the same laws that prevented
exporting sophisticated military weaponry. Although it is accepted
now, many specialists back then who saw encryption as necessary for
the security of the Internet were having a hard time getting their
point across. By publishing Applied Cryptography, Bruce Schneier was
a modern Prometheus, stealing fire and putting knowledge of it in
the minds of the people.
Of course, this watershed moment had historical precedents; for example,
David Kahn's book "The Codebreakers", while largely historical,
did give out some information on how to apply cryptography, but this
was the first description of modern algorithms like DES, and better
yet, IDEA.
Were there encryption programs at this time? Well yes, Phil Zimmermann
wrote Pretty Good Privacy (PGP) in 1991 [1], literally putting working software in your hand, and he is
perhaps more Promethean than Bruce Schneier. Prometheus was chained
to a rock, with an eagle tearing out his liver each day, which regrew.
Phil was hounded by the government after PGP's worldwide spread; he
was forced to defend himself against these arms-trafficking regulations,
trying to prove that he didn't export it himself. That legal hounding
to prove a negative cost him quite a bit of money ($1M?) and so he
was forced to start a business around it; a business with which I
interviewed, and which is being purchased by Symantec at this very
moment, some 20 years after writing the program!
There was also a Norton disk encryption product called Diskreet, but
it only ran in DOS, and by this time, I had totally moved over to
NetBSD as my main operating system (Linux and FreeBSD were competitors
at the time). It was known to some of us that Diskreet only did DES encryption,
which I considered too weak, and that it didn't handle the keyspace
properly, making it a joke to break. In fact, the DESCHALL project
broke a DES-encrypted message in June 1997.
So, to summarize, there were threats against young people or publishers
with computers, the information was out there, most of the tools were
unsuitable because they were using DES, or simply weren't designed
right, and there was one good program, PGP, which did a lot of what
you wanted.
6 My Crypto Contribution (idea_filter)
So, at this time, I was backing up hard drives to tapes. I wanted
to be able to encrypt things as a Unix filter, so that I could encrypt
tapes with something like this:
dump -cvf /dev/stdout / | gzip -9c | idea_filter | dd of=/dev/rmt8 bs=8k
I may not have gotten the flags to dump right, but you get the idea.
I'd then compress it heavily, encrypt it, and then stream it out to
tape. DES was dead, and with no obvious candidate to choose, I took
Schneier's suggestion and picked IDEA. It was also getting used in
PGP.
So, as of 1997, I was encrypting my backup tapes, and I could therefore
leave them in an untrusted place, and nobody who found them would
be able to do anything with them. Other companies were paying Iron
Mountain to truck them off and store them deep in guarded vaults;
I kept mine at my friend's house. This is the power that crypto gives
you; you no longer need to care (much) about the confidentiality of
the data.
But wait, there's more. Once I had written this as a "streaming
filter" that read STDIN, I decided to try doing it a different way.
I'd take an IV and a filename as an argument, and I'd encrypt that
file in place, overwriting the old data. This was nice, because if
you were encrypting a file, you typically didn't want the plaintext
still lying around. Not even PGP could do this. And the way I did
it was interesting; I'd read a sector, encrypt it, back up a sector
(using lseek), and then write it. So the file was incrementally encrypted
from beginning to end.
This seems unimportant until you realize what this allows you to do.
Instead of just encrypting files, you can encrypt any block
device, like a disk partition, or the entire hard drive!
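The read, seek back, and write loop described above can be sketched in Python. This is only an illustration under stated assumptions: the names encrypt_in_place and toy_transform are hypothetical, the sector size of 512 bytes is assumed, and the XOR keystream is a placeholder standing in for IDEA (it is not IDEA and is not secure).

```python
import hashlib
import os

SECTOR = 512  # bytes per sector; an assumption for this sketch


def toy_transform(key: bytes, index: int, block: bytes) -> bytes:
    """Placeholder for a real cipher: XOR with a SHA-256-derived keystream.

    NOT IDEA and not secure; it only stands in for "encrypt one sector".
    """
    stream = b""
    counter = 0
    while len(stream) < len(block):
        seed = key + index.to_bytes(8, "big") + counter.to_bytes(4, "big")
        stream += hashlib.sha256(seed).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(block, stream))


def encrypt_in_place(path: str, key: bytes) -> int:
    """Overwrite each sector of `path` with its transformed version.

    Returns the number of sectors processed.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        index = 0
        while True:
            block = os.read(fd, SECTOR)
            if not block:
                break  # end of file or device
            # Back up over the sector just read, then overwrite it.
            os.lseek(fd, -len(block), os.SEEK_CUR)
            os.write(fd, toy_transform(key, index, block))
            index += 1
        return index
    finally:
        os.close(fd)
```

Because the loop only needs read, seek, and write, the same code path works on anything that behaves like a file, which is what makes the jump from files to block devices possible. The plaintext never exists in a second copy, but an interruption partway through leaves the target half transformed.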
7 The Boot Disks
A big part of security is being prepared, thinking through your problems.
Back then, I used to worry about not being able to boot my system
(disks were less reliable back then). I prepared for this in part by mirroring
the first slice of the disk (including MBR, boot sector, and /) to
another identical drive, so if one wouldn't boot I could swap in the
other. I had a little script which would do the mirroring, and then
fix up the names of the drives so that the mount points (which took
up most of these disks) all worked still.
Another part had to do with getting a minimal operating system on
a floppy. At that time, I could build a kernel containing only the
drivers I needed (using BSD's kconfig system). But userland was a
bigger problem, especially since most programs were statically linked
(dynamic linking hadn't spread to us yet). It turned out there was
a program called crunchgen, which would allow you to compile several
programs together, and they'd share the same library functions, making
a huge executable you could link to several names, and when called
with that name, it behaved as that program. This worked wonderfully,
and so I included idea_filter in that list of programs, and I made
"rescue" CDs which could be used to repair a broken system.
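The idea behind a crunched binary, one executable that behaves differently depending on the name it was invoked under, can be sketched as follows. This is an illustration of the dispatch idea only, not crunchgen's actual output (which is generated C); the applet names and functions here are hypothetical.

```python
import os


def applet_echo(args):
    """Hypothetical applet: join the arguments with spaces."""
    return " ".join(args)


def applet_count(args):
    """Hypothetical applet: report how many arguments were given."""
    return str(len(args))


# One table maps every name the binary is linked under to its entry point.
APPLETS = {"echo": applet_echo, "count": applet_count}


def dispatch(argv):
    """Choose the applet from the name the program was invoked as."""
    name = os.path.basename(argv[0])
    try:
        applet = APPLETS[name]
    except KeyError:
        raise SystemExit(f"{name}: not compiled into this binary")
    return applet(argv[1:])
```

For example, dispatch(["/rescue/echo", "a", "b"]) runs the echo applet, while the same code invoked as "count" runs the other one. Sharing one copy of the library code across all the applets is what saves the space on a floppy.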
8 The Big Con
It's 1998 and I'm preparing for the drive to San Antonio for the 7th
USENIX Security Symposium. Well, time to put this thing to work! I
boot off the floppies, and start idea_filter encrypting in-place
on the raw disk devices. There are nine drives total, and nine idea_filters
running in parallel, at surprisingly different rates of speed. It
takes longer than I thought; it's almost time to leave and they aren't
even close to done! I can't wait, and I can't stop the process, so
I leave with the drives being encrypted. Hopefully they'll finish
in an hour or two and nobody will ever figure out how to use the floppy

(not that guessing the passphrase would be easy).
9 The Abomination
I came back and - guess what, the SCSI bus was hung. Of all the bad
luck!
I could tell that not all the drives had finished encrypting, so now
I had nine drives, all with their beginnings encrypted, and the ends
still plaintext.
This was terrible. I sat and thought about what I could do.
10 A Spark of Light
It occurred to me that encrypted data has a normalized entropy of nearly 1, and
that disks write sectors at a time (or none at all), so the boundary
between encrypted and unencrypted is a boundary between sectors. It's
also possible to distinguish the entropy of a sector of plaintext
from a sector of encrypted data, even if the plaintext is part of
a binary, and maybe even if it's part of an MP3 or compressed file.
So, I could do a binary search on each disk, testing one sector for
entropy, and either deciding it's encrypted or unencrypted, and make
my next test further ahead or behind, respectively.
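The search itself can be sketched like this, assuming every sector before the boundary tests as encrypted and every sector after it does not. The function name find_boundary and the is_encrypted callback are hypothetical names for this sketch; in practice the callback would read one sector and run the entropy test on it.

```python
def find_boundary(num_sectors, is_encrypted):
    """Return the index of the first sector that is NOT encrypted.

    Assumes sectors [0, boundary) are encrypted and [boundary, num_sectors)
    are plaintext. Returns num_sectors if every sector is encrypted.
    """
    lo, hi = 0, num_sectors
    while lo < hi:
        mid = (lo + hi) // 2
        if is_encrypted(mid):
            lo = mid + 1  # boundary lies further ahead
        else:
            hi = mid      # boundary is here or behind
    return lo
```

A multi-gigabyte drive has a few million 512-octet sectors, so the search needs only a couple of dozen probes. The assumption that the test is perfectly monotonic is the weak point: a high-entropy plaintext sector could mislead a single probe, which is why checking the neighborhood of the result is worthwhile.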
I found source code to "ent" online, which did a wonderful suite
of entropy tests. Using a script I cobbled together and "dd",
I extracted a sector at a time and decided if it was encrypted or not
with ent. Ent conducts various tests, and the chi-square test is the
most common in the academic literature; it tells you how often a
truly random sequence (e.g. encrypted data) would exceed the computed value. However,
I found it rather useless for this purpose. Instead, I used the Shannon
information-theory entropy measurement (using a threshold of about
7.5 bits per octet, determined empirically), and performed a binary
search. Once I found the boundary, I did a few tests for a few octets
on either side to make sure it wasn't some local aberration. It was
surprising how reliable an entropy test on 512 octets was; I found
no aberrations.
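A minimal sketch of the per-sector entropy measurement follows. It is written from the description above rather than from ent's source; the function names are hypothetical, and the 7.5 threshold is the empirically chosen value mentioned in the text.

```python
import math
from collections import Counter

SECTOR_SIZE = 512  # octets per sector
THRESHOLD = 7.5    # bits per octet; the empirical cutoff from the text


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of `data` in bits per octet (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_encrypted(sector: bytes, threshold: float = THRESHOLD) -> bool:
    """Classify one sector as encrypted if its entropy reaches the threshold."""
    return shannon_entropy(sector) >= threshold
```

A sector of NULs scores 0, ordinary English text scores well below 7.5, and a sector in which every octet value appears equally often scores exactly 8. A 512-octet sample of random data typically measures somewhat below 8 bits per octet, simply because the sample is small, which is consistent with a cutoff around 7.5 rather than one close to 8.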
On most of the drives, I found the boundary just fine.
But on one, the one which had encrypted the most, there was a hitch.
11 Eureka!
It seemed that this fast Micropolis drive had reached the 2GB barrier,
and when I used dd to sample the sector there, something weird happened.
No, dd didn't crash; the hard drive crashed, and it hung
the SCSI bus just like what had been going on all along.
Put simply: if I performed a disk operation that spanned the 2^31 barrier,
the SCSI bus hung.
It wasn't my tools at all, or my operating system's device driver.
It turns out that the hard drive itself had an unhandled signed
integer overflow.
And further tests showed that all of the Micropolis 4.5GB HVD
SCSI drives had this flaw; the reason why only one hung was that it
had encrypted the fastest; once it hit this barrier, it hung the bus,
and the host computer could no longer continue to operate on the other
drives, so they stalled at some point earlier on.
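To make the 2^31 barrier concrete, here is a small sketch of what a signed 32-bit overflow looks like at that offset. It assumes, purely for illustration, that the firmware tracked positions as byte offsets in a signed 32-bit integer; the essay only establishes that the overflow was signed and occurred at 2^31, and the helper names here are hypothetical.

```python
SECTOR_SIZE = 512
BARRIER = 2**31  # 2 GB expressed in bytes


def as_int32(value: int) -> int:
    """Wrap an integer the way a 32-bit two's-complement register would."""
    return (value + 2**31) % 2**32 - 2**31


def spans_barrier(start_byte: int, length: int) -> bool:
    """True if a transfer touches bytes on both sides of the barrier."""
    last_byte = start_byte + length - 1
    return start_byte < BARRIER <= last_byte


# The last offset that still fits, and the first one that wraps negative.
last_good = as_int32(BARRIER - 1)
first_bad = as_int32(BARRIER)

# With 512-byte sectors, the barrier falls at this sector number.
barrier_sector = BARRIER // SECTOR_SIZE
```

Under that assumption, any offset up to 2^31 - 1 is represented correctly, while 2^31 itself wraps to a large negative number, so a transfer that starts just below the barrier and ends just above it hands the firmware a position it was never written to handle.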
12 The Fix is In
Now, it turns out that hard drives aren't just little machines. They
actually have some little microprocessors and related electronics,
and these run software (called firmware) that interprets the
SCSI commands and do stuff with the servo motors. This firmware is
just software, written in assembly language.
I procured the tools that could download and upload the software to
the drive over the SCSI bus. This was proprietary stuff written by
Micropolis in DOS, and so I dual-booted into it, extracted the firmware
images, and started doing research. I was in luck! The microprocessor
was a common model (80186 if I recall correctly), and so with some
disassembly and cooperation with people on the Internet (mostly a
guy in Germany), I was able to fix the firmware. The drives never
crashed that way again, and some lasted at least ten years.
Incidentally, the software used to reprogram the drives did not require
any jumpers to be set on the disks. Thus, the possibility remains
that some malware could actually reprogram the drives without the
user knowing, turning them into "hot bricks" as one friend put
it. Of course, that's evil, so I'm glad people with those kinds of
skills have better things to do with their time.
13 Coda
My best guess is that they used the same software from their 2GB models
on their 4.5GB models, and it suddenly developed this bug. Coincidentally,
Micropolis went out of business around this time. Too bad they hadn't
hired me to help them :-)
Footnotes:
[1] http://www.philzimmermann.com/EN/essays/WhyIWrotePGP.html
File translated from TeX by TTH, version 3.85, on 4 Sep 2010, 11:23.