May 22

New York Times TimesMachine

Tag: Amazon @ 5:17 pm

Derek Gottfrid and his colleagues at the New York Times have obviously been having a lot of fun with Amazon EC2.
Their latest offering is the TimesMachine. Print subscribers can access any issue of the New York Times, dating back to Volume 1, Number 1 in 1851. Non-subscribers can take a peek at 6 different (and historically significant) issues, including the inaugural edition, the end of World War I, and the sinking of the Titanic.
As they explained in their blog post, they used EC2, Hadoop, and some of their own code to convert 405,000 large TIFF images, 3.3 million SGML files, and 405,000 XML files to 810,000 PNG images and 405,000 JavaScript files. This didn’t take all that long:
"By leveraging the power of AWS and Hadoop, we were able to utilize hundreds of machines concurrently and process all the data in less than 36 hours."
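The post doesn't say how the Times structured its Hadoop jobs. As a rough illustration only: a batch conversion like this fits a map-only job. Each input record names one file, mappers on many machines work through the list independently, and there is no reduce step. The sketch below is a hypothetical Hadoop Streaming mapper in Python. `convert_page`, the `--stream` flag, and the output file names are invented for illustration, and the actual image conversion is left as a placeholder. The two-outputs-per-page ratio only mirrors the 405,000 TIFF and 810,000 PNG counts above.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop Streaming mapper for a map-only batch conversion job.

Assumption: each input record is the path of one scanned page image (TIFF).
This is NOT the Times' code; convert_page() is a placeholder that only
computes output names instead of decoding or rendering any images.
"""
import sys
from pathlib import PurePosixPath
from typing import Iterable, Iterator


def convert_page(tiff_path: str) -> list[str]:
    """Placeholder for the real per-page work (hypothetical).

    Returns two PNG names per input, mirroring the 405,000 TIFF -> 810,000 PNG
    counts; the "-a"/"-b" suffixes are invented for illustration.
    """
    stem = str(PurePosixPath(tiff_path).with_suffix(""))
    return [f"{stem}-a.png", f"{stem}-b.png"]


def map_records(lines: Iterable[str]) -> Iterator[str]:
    """Turn input lines into Hadoop Streaming output records (key<TAB>value)."""
    for line in lines:
        path = line.strip()
        if not path:
            continue  # ignore blank lines in the input listing
        for png in convert_page(path):
            yield f"{path}\t{png}"


if __name__ == "__main__" and "--stream" in sys.argv[1:]:
    # Only read stdin when launched as a mapper (hypothetical --stream flag),
    # so the functions above can be imported and tested without blocking.
    for record in map_records(sys.stdin):
        print(record)
```

You would launch it with something like `hadoop jar <path-to-streaming-jar> -input <listing> -output <dir> -mapper "mapper.py --stream" -numReduceTasks 0`, with the script available on every node. Hadoop then splits the input listing across the cluster's map tasks. Setting the reduce count to zero makes this a map-only job, which is what lets a conversion like this spread across many EC2 instances at once.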

The content itself is really interesting, […]

