TimesMachine: hackable browser for the public domain NY Times archive

The New York Times has quietly launched "TimesMachine," a slick, API-enabled browser for PDFs of the paper's public domain archives, covering its run from 1851 to 1922. The API allows anyone to hack together their own custom browser for this amazing archive. (Note: the items retrieved by this tool bear copyright notices, but various public statements by the Times affirm that this is freely usable, public domain stuff.)

As part of eliminating TimesSelect, The New York Times has decided to make all the public domain articles from 1851-1922 available free of charge. These articles are all in the form of images scanned from the original paper. In fact, from 1851-1980, all 11 million articles are available as images in PDF format. To generate a PDF version of the article takes quite a bit of work: each article is actually composed of numerous smaller TIFF images that need to be scaled and glued together in a coherent fashion.
Link, Link to blog post with background
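
To give a rough feel for the "scaled and glued together" step described in the quote, here is a minimal sketch. The Times has not published its layout logic here, so everything in it is an assumption: it supposes the tiles of one article simply stack top to bottom, and the names `Placement` and `stack_tiles` are hypothetical. It only computes where each scaled tile would land on the page; it does not read TIFF files or write a PDF.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    index: int    # position of the tile in the input list
    scale: float  # factor applied to the tile's width and height
    top: int      # y offset of the scaled tile on the page, in pixels
    height: int   # height of the scaled tile, in pixels


def stack_tiles(tile_sizes, page_width):
    """Scale each (width, height) tile to page_width and stack them vertically.

    Assumption: tiles are given in reading order and share one column.
    Returns the list of placements and the total page height.
    """
    placements = []
    top = 0
    for index, (width, height) in enumerate(tile_sizes):
        if width <= 0 or height <= 0:
            raise ValueError(f"tile {index} has a non-positive dimension")
        scale = page_width / width
        scaled_height = round(height * scale)
        placements.append(Placement(index, scale, top, scaled_height))
        top += scaled_height
    return placements, top
```

A real pipeline would then hand these offsets to an imaging library to resize and paste each tile, and would likely need a smarter layout for multi-column articles.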

Discussion

'So we uploaded 4TB of data to S3.' Holy shamoly, how long would that take and how much would it cost? I'm worried about moving 100GB of data about at the moment...

$1000? Do they at least supply your Sappho juice?
