Petascale data-centers in Nature

I wrote a feature for this week's issue of the journal Nature on "petascale" data-centers -- giant data-centers used in scholarship and science, from Google to the Large Hadron Collider to the Human Genome and Thousand Genome projects to the Internet Archive. The issue is on stands now and also available free online. Yesterday, I popped into Nature's offices in London and recorded a special podcast on the subject, too. This was one of the coolest writing assignments I've ever been on, pure sysadmin porn. It was worth doing just to see the giant, Vader-cube tape-robots at CERN.

At this scale, memory has costs. It costs money: 168 million Swiss francs (US$150 million) for data management at the new Large Hadron Collider (LHC) at CERN, the European particle-physics lab near Geneva. And it also has costs that are more physical. Every watt that you put into retrieving data and calculating with them comes out in heat, whether it be on a desktop or in a data centre; in the United States, the energy used by computers has more than doubled since 2000. Once you're conducting petacalculations on petabytes, you're into petaheat territory. Two floors of the Sanger data centre are devoted to cooling. The top one houses the current cooling system. The one below sits waiting for the day that the centre needs to double its cooling capacity. Both are sheathed in dramatic blue glass; the scientists call the building the Ice Cube.

Blank slate

The fallow cooling floor is matched in the compute centre below (these people all use 'compute' as an adjective). When Butcher was tasked with building the Sanger's data farm he decided to implement a sort of crop rotation. A quarter of the data centre — 250 square metres — is empty, waiting for the day when the centre needs to upgrade to an entirely new generation of machines. When that day comes, Butcher and his team will set up in that empty space the yet-to-be-specified systems for power, cooling and the rest of it. Once the new centre is up, they'll be able to shift operations from the obsolete old centre in sections, dismantling and rebuilding without a service interruption, leaving a new patch of the floor fallow — in anticipation of doing it all again in a distressingly short space of time.

The first rotation may come soon. Sequencing at the Sanger, and elsewhere, is getting faster at a dizzying pace — a pace made possible by the data storage facilities that are inflating to ever greater sizes. Take the human genome: the fact that there is now a reference genome sitting in digital storage brings a new generation of sequencing hardware into its own. The crib that the reference genome provides makes the task of adding together the tens of millions of short samples those machines produce a tractable one. It is what makes the 1000 Genomes Project, which the Sanger is undertaking in concert with the Beijing Genomics Institute in China and the US National Human Genome Research Institute, possible — and with it the project's extraordinary aim of identifying every gene-variant present in at least 1% of Earth's population.

Big data: Welcome to the petacentre, Podcast about Petacentres, My Flickr photos of petacenters

Discussion

Oh squee. Awesome article, and the Nature podcast is one of my favourites.

I hope you had a chance to stroll around the Genome Campus - it's well pretty out there.

Powering and cooling all that Computronium is the next big challenge. ISTR reading that Datacentres were using 2-3% of US electricity.

More and more I'm thinking that the key to Cleantech is not trying to solve the oil problem but in creating sources of cheap clean electricity. It just seems like plentiful electricity solves all the other problems.

Holy crow, it's Shalmaneser!

hmm, a krell data-center.
better watch out for those 'monsters from the Id'

My first thought was how similar this is to the HAL computer from the 1968 film "2001: A Space Odyssey" by Stanley Kubrick and Arthur C. Clarke
http://en.wikipedia.org/wiki/Image:Hal_brain_room605.JPG

@2 Interesting points. I would think there is a positive impact, in the form of economies-of-scale, to 'one' large center rather than several smaller centers. Great article also.

Based on the headline, I was expecting something significantly more awesome, like spontaneously-assembling data centers discovered at the bottom of the ocean.

Two floors devoted to air conditioning. Sigh. When are the designers of datacentres going to realize that for at least six months of the year (more in Sweden and Canada), all they have to do is open a vent to the outside and they can get all the cooling they need for free, dramatically reducing their energy bill and their environmental impact?

If only there was a way to harvest that heat....

Two steps forward, one step back. Hard storage of backup data is still required. We're going to have to figure out something better than tape or we're going to have to give up the notion of having hard records. Either that or keep a hardened facility deep under the lunar surface.

Thanks for this!

One of the more amusing anecdotes I heard on the problem of heat-generating data centres was to locate them far to the north, and in wintertime, pile a mountain of snow and ice on the roof for overall cooling. The cost in snowblowers just about makes sense. *g*

Not to be confused with PETAscale data-centers, the data-centers which use naked women to try to entice you to become a vegetarian.

gotta love them Maenads! Or else.

The first thing that popped into my head (well, okay, the second thing - the first was that this picture looks remarkably like the hallway in the prison block Princess Leia was being held in) is what a great opportunity for cogeneration this is.

Put this in the middle of a northern city somewhere and run the coolant out to various tenements, and you could probably do your data processing almost for free (not counting sunk costs, of course).

I don't think a two hour Solexa run produces 320 TB. That's 1/3 a petabyte. My gut tells me that's wrong. In addition I've used the Solexa (Illumina Genome Analyzer) and it takes three days to output about 700 GB.

#9, #15: What should they do about the heat for the other 6-8 months of the year? Regardless of what they can do about the heat during the winter, they can't shut the data center down from April to November. So they still need the monster cooling facilities.
