Database leaks are as immortal and toxic as nuclear spills -- let's start acting like it
My latest Guardian column is online: "Personal data is as hot as nuclear waste," which looks at the immortality of databases -- just as it's impossible for the Internet to scourge itself of Paris Hilton's terrible genitals, it is likewise impossible that the personal information hemorrhaged by the likes of Her Majesty's Revenue and Customs (25 million records!) will ever go away. In the era of infinite copying, this information is like a nuclear disaster, immortal and terrible in its consequence. The only way to contain future spills is to make every person who gathers information on his neighbours pay in advance for the long-term handling and storage of that undying, toxic sludge:
If we are going to contain every heap of data plutonium for 200 years, that means that every single person who will ever be in a position to see, copy, handle, store, or manipulate that data will have to be vetted and trained every bit as carefully as the folks in the rubber suits down at the local fast-breeder reactor.
Every gram - sorry, byte - of personal information these feckless data-packrats collect on us should be as carefully accounted for as our weapons-grade radioisotopes, because once the seals have cracked, there is no going back. Once the local sandwich shop's CCTV has been violated, once the HMRC has dumped another 25 million records, once London Underground has hiccoughed up a month's worth of travelcard data, there will be no containing it.
And what's worse is that we, as a society, are asked to shoulder the cost of the long-term care of business and government's personal data stockpiles. When a database melts down, we absorb the crime, the personal misery, the chaos and terror.


Look at the HMRC website and see if you can find a coherent policy on data encryption.
One document they have says they only accept WinZip self-extracting .EXE's, another says they'll take WinPT -- a version of PGP.
Neither of those options handles companies whose information is on mainframe computers, which is the case for the employees of large businesses.
There isn't a coherent policy.
What we desperately need is encryption implemented as a layer beneath existing applications, so that the entire staff of the UK government + public sector can be brought onto a "safer" system with minimal training expense.
AND: A key-management system that prevents key loss and enforces policy, or at least flags up cases where policy might be being broken.
eg. no one person needs to download the entire HMRC database. A person can't realistically assess more than one record every minute 9-5, so a system that limits the number of records any one account can retrieve to a sensible number would prevent all 25 million records being retrieved in one go.
The reason we had a 25 million record hemorrhage was that it was *quicker for the guy on the shop floor* to dump the entire database than to select a random subset and send that, which was actually what was asked for in the first place.
Queries run in aggregate would not need a rate limit imposed, but would need some noise adding to prevent personal data being identified by making aggregate queries that differed by just one record and subtracting.
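A rough sketch of the noise idea, assuming Laplace noise on a simple count (my choice of distribution and of the `scale` value; nothing above specifies either). The function name `noisy_count` and the sample records are made up for illustration.

```python
import random


def noisy_count(records, predicate, scale=2.0, rng=random):
    """Return the number of matching records plus Laplace noise.

    `scale` is an assumed tuning knob: larger values hide single records
    better but make the answer less accurate.
    """
    true_count = sum(1 for r in records if predicate(r))
    # The difference of two exponentials with mean `scale` is a
    # Laplace(0, scale) random variable.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise


# Hypothetical data: two aggregate queries that differ by exactly one person.
people = [{"name": f"p{i}", "income": 20000 + i} for i in range(1000)]
everyone = noisy_count(people, lambda r: True)
all_but_one = noisy_count(people, lambda r: r["name"] != "p0")

# Without noise the difference would always be exactly 1, which gives the
# game away. With noise, it is smeared out.
difference = everyone - all_but_one
```

Repeating the same query many times and averaging would wash the noise out again, so in practice this would probably have to be combined with some cap on repeated queries.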
The mechanics of encryption are well understood. I've got AES-256 on the command-line in Linux with a simple sudo apt-get install.
Key management is what is preventing large scale implementation.
Suppose the UK police have a court order to look at some data you hold as evidence, say on one of your employees or your customers - but your business has lost the key to that data.
Then you are held liable for obstructing justice.
If I had to guess, I'd say adopting this proposal wouldn't discourage data collection, just pass on another cost. Or, worse, provide a rationale for your government to make sure you don't have any "plutonium-grade data" on your computer.
Companies should be masking their toxic data to reduce the chances of the really nasty stuff getting out to the world...
http://www.datamasking.com/
... and this works on mainframe too ;)
What we need to do is to stop using identifiers as secret keys.
And if we put the liability of identity theft squarely on the institutions that fall for it, then this problem will fix itself. It's too easy to pass the buck now.
(Why is there no "post" button on the preview page?)
Dammit Cory, you posted the link right after I used the Build A Data Breach Bear case to develop my own data analogy:
...thinking it was familiar and annoying to a large part of my readership.
Maybe I ought to liken data to the contents of a newborn's nappy: Despite your best efforts at containment, it WILL leak into places you don't want it and cause an almighty stink.
Great article. I have expressed similar thoughts - but not as eloquently as this.
See: http://canton.elegal.ca/archives/2007/05/#a001129
http://canton.elegal.ca/archives/2007/12/#a001283
http://canton.elegal.ca/archives/2007/11/#a001265
http://canton.elegal.ca/archives/2007/03/#a001052
Perhaps our man-made data, in its complex digital form, is reflecting its nature. Information Wants To Be Free from the tyranny of bondage. Our data, and all the issues surrounding it, are interacting in a flawed human system. The leaks in security may simply be liable to the Damn Law, which states that Damns always leak, at least just a little. Water always finds a way, even if it's a micro-crack in the surrounding rock. Nature "leaks." Isn't Hawking radiation the ultimate example of this? So, should we ever expect data that is 100% safe? No, of course not, because this Universe of ours seems to require hazard, chaos, randomness, sudden failure, evolution. Nothing will keep us or our data safe when HUMANS are involved. And who knows if our AI will be any better at keeping secrets. I tend to think they will be a tight bunch that will gossip amongst themselves.
I always make sure any externally held personal data is seeded with some actionable falsehoods.
That way, I can sue the ass off anyone who leaks it.
Who said we ever owed the government or business the truth?
Visit China, discover a whole nation that understood millennia ago the value of lies and silence where the Emperor was concerned.
Personal data may last a long time, but after you are dead, it is not such a big deal. Nuclear waste could poison your great-grandchildren.
Personal data in the wrong hands can ruin your year and create a major pain in the ass. Nuclear waste can give you cancer and kill you.
I think nuclear waste is a poor choice for analogy, trivializing a problem larger than data leaks.
And if your family is disgraced and/or bankrupted because of electronic lies about you, this is not hurting the children? Oy vey!
The fact that a person can be bankrupted (extremely unlikely) as a result of personal data leaks is a problem with the laws in this country, not a problem with data.
Data != Nuclear Waste
Ever try getting a lie about you out of government records?
Takuan said, "Who said we ever owed the government or business the truth?"
It depends on the person and the situation. A blanket rule is not going to work. But, I would think that in an ideal world, the government would run better if it was using the best data. As for a business: that depends on what you're talking about. The Truth is often subjective in nature.
Ever try getting nuclear waste out of your government records?
@14
I think it depends on the government. Would you trust your current government?
As to subjectivity; exactly. What I am concerned about is the effect on ME of what the government perceives to be the truth.
In any case, when a database is widely known to be riddled with mistakes and inaccuracies it is no longer lethal. In a democracy. The people's best defense is to spoof the system with a snowstorm of false data.
"Scourge" doesn't mean what you think it means.
@16: "In a democracy. The people's best defense is to spoof the system with a snowstorm of false data."
Why is that the best defense? At what level of Government does the snowstorm of false data begin? Because, I'm thinking of telling the post office that I live at a different address than I actually do. And I was thinking that spoofing the IRS would be good for a lot of laughs this year. Ha! And, the local school board needs some good data, so I'm going to feed them some real crap. And...You get the point. As much as we would like to think that we are islands, we are not.
Then go for peninsula. Abject surrender is NOT the way to go.
I don't think anyone is talking about abject, or any other sort of surrender. We have relationships with our government, and at times they can be rewarding. The goal is to make the system better, not mess it up for our own personal satisfaction.
"Paris Hilton's terrible genitals" - dang, Cory went and named yet another alt/quirk rock band.
You have a touching faith. Myself, I watch what actually happens.
Perhaps what you perceive as a system exists more in your hopes and aspirations than in the minds of those running it.
Government ALWAYS betrays. It is up to us to decide how much we will tolerate. Either that or get on the gravy train (if one belongs to a permitted category of course).
Takuan, don't try to minimalize my opinion by mentioning a word that you seem to have little regard for: Faith. Government does not Always betray. If you think otherwise then I guess you have an abstract notion of government. All human systems are imperfect, but the answer is not to retreat to Paranoid Island. Unless that's where you are most happy. And they do have nice weather there.
I'm with Jeff @7, without the "humans suck" overtones.
The problem of data security is exactly the same as the problem of DRM. You are creating a "secure" database and then connecting it to a bunch of systems designed to... give lots of people access to it. If you didn't do this, it wouldn't be useful. And the security system only has to fail once to produce permanent risk to the individuals whose data has been copied.
The problem is that unlike physical leaks, which can be sealed and cleaned up, data "leaks" can't, in exactly the same way that DRM cracks can't be "sealed" once they're in the wild (lame attempts at key revocation notwithstanding.)
So "leak" is an inappropriate metaphor for what is happening in these cases. Extreme and compelling as it is, Cory's nuclear waste analogy actually doesn't go far enough.
Letting personal data into the wild is more like the introduction of a new species into an ecosystem: a permanent, ineradicable nuisance that can sometimes be life threatening. Think "killer bees" rather than "nuclear waste".
Copies of your personal data will be made and will multiply in the hands of the black hats for the rest of your life, no matter what measures are taken to stop them. And it only takes one "queen" to escape from the "secure" hive for an ineradicable infestation to occur.
There's even a chance that mishandling or deliberate editing of the data will cause it to mutate along the way, depending on what purposes those who have their hands on a copy want to put it to.
No security system is foolproof. Physical security systems don't have to be to do their job. Digital security systems, be they DRM or government databases, do.
This is not to say that there are not policies that will slow the escape of data. Those policies should be implemented. But the only way to make data secure is not to gather it in digital form in the first place, which suggests that in general government and corporate policy should be aimed at collecting as little data as possible.
Furthermore, the digital collection of any information that is not specifically required to serve the individual's needs should simply be forbidden, whether it be by corporations or governments or anyone else.
minimalize ? Pas du tout! I state what I see. You espouse benign government. I have observed such does not exist. I have evidence, where is yours? Hence: "faith".
The weather here is indeed fine. Unclouded, one might say.
Paris Hilton's terrible genitals are on teh Internets?
I'm going to have to get on that thing one day.
Takuan, don't you have any friends that work for the government? I do. People that make the government work: local folks, congressmen, senators, FBI, NSA, CIA...all government workers. They aren't evil because they work for the government. They're good, honest, loving people that work hard to make a living while trying to do what they think is right. Don't throw the baby out with the bath water. YES government is flawed, at least ours is here in the USA. So what. My family is flawed! Just saying something is bad all the time doesn't make it work any better.
hmmm, friends, friends, friends (riffle riffle). Um, nope! Last one I knew quit because the stress of watching children be abused and neglected for lack of funds finally got to her.
I never said all civil serpents were evil, just the system. Though some indeed are evil.
Many government employees let bad things happen and continue under their watch because they, like us, are really just trying to stay alive till the next cheque. This is still morally and ethically wrong even if it is understandable. "I vas only following orders,ja!".
I don't trust government for good reason. If others wish to, I can't do much about that. They are enablers though. Even if they are convinced what they are doing is "right". Too bad I'm dragged along for the ride.
The best way to keep them semi-honest is to continually beat them with a large, be-nailed stick: question EVERYTHING, never freely cooperate and ALWAYS keep records of all they say and do.
The title was such ridiculous hyperbole.
danegeld (#1) - I like the rate-limiting idea. I'm not sure about the rate - for scanning a list, I think most people could do 10/s or so in bursts - but even then, 25 million records is way beyond the limit. A quota of 100,000 records per day is probably generous, but even just keeping count would mean that the data compliance officer could follow up on any large queries.
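For what it's worth, here is roughly what "just keeping count" could look like: a minimal sketch, assuming an in-memory counter per account per day. The class and method names (`RecordQuota`, `charge`, `flagged`) are invented for this example; a real system would sit inside the database or an access layer and persist its counts.

```python
from collections import defaultdict
from datetime import date

DAILY_QUOTA = 100_000  # the "probably generous" figure from this comment


class QuotaExceeded(Exception):
    """Raised when a query would take an account over its daily quota."""


class RecordQuota:
    def __init__(self, daily_quota=DAILY_QUOTA):
        self.daily_quota = daily_quota
        self._used = defaultdict(int)  # (account, day) -> records retrieved
        self.flagged = []              # refused queries, for the compliance officer

    def charge(self, account, n_records, day=None):
        """Count n_records against the account, or refuse and flag the query."""
        day = day or date.today()
        key = (account, day)
        if self._used[key] + n_records > self.daily_quota:
            self.flagged.append((account, day, n_records))
            raise QuotaExceeded(f"{account} asked for {n_records} records")
        self._used[key] += n_records
        return self._used[key]
```

An ordinary day's work goes through and is counted; a request for all 25 million rows is refused and lands in `flagged`, where somebody can follow it up.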
I would suggest aggregate queries should be simply subject to the same limits, but on the result rows. After all, that's what you're looking at...
Limiting aggregate queries to avoid subtraction etc is probably not workable; besides, either the person needs access to individual records in any case, so there's no point, or they shouldn't have access to the detailed data at all. They can get a sanitized copy, or if they need live data the DB admin can set up a sanitized VIEW for them.
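A small sketch of the sanitized VIEW idea, using Python's built-in sqlite3 so it can be run anywhere. The table, the column names and the two sample rows are all made up, and the choice of which columns count as "sanitized" is only an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical detailed table holding the sensitive columns.
    CREATE TABLE claimants (
        id INTEGER PRIMARY KEY,
        name TEXT,
        ni_number TEXT,
        bank_account TEXT,
        postcode TEXT,
        birth_year INTEGER
    );
    INSERT INTO claimants VALUES
        (1, 'Alice Example', 'QQ123456A', '12345678', 'SW1A 2AA', 1970),
        (2, 'Bob Example',   'QQ654321B', '87654321', 'M1 1AA',   1985);

    -- The view drops names, NI numbers and bank details entirely and
    -- keeps only the first half of the postcode.
    CREATE VIEW claimants_sanitized AS
        SELECT id,
               substr(postcode, 1, instr(postcode, ' ') - 1) AS postcode_area,
               birth_year
        FROM claimants;
""")

rows = conn.execute(
    "SELECT * FROM claimants_sanitized ORDER BY id"
).fetchall()
```

SQLite has no per-user permissions, so this only shows the shape of the view. On a server database the admin would presumably grant the analyst read access to the view and nothing on the underlying table.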
As for a coherent security policy, one of the stories I've read claimed that there was, in fact, a manual, but it was considered too sensitive and classified above the level of the person who leaked those 25 million records. Which shows the down-side of security policies: "need to know basis" assumes that the need is accurately assessed...
Cory, you have it backwards. Free the data and then it's worthless:
http://securosis.com/2008/01/23/cory-has-it-wrong-we-should-free-the-data/