Monday, October 21, 2013

Big Data Travails

A lot has been said recently about Big Data, so I figured I'd try it out. If anyone wants to commiserate, I've spent the last few months wrestling with a dataset that has 47 billion observations. Even simple tasks have been painful. Downloading the data meant taking the time to learn how to program against Amazon's S3 interface, and then took weeks to run. Just transferring the files to a different computer has had me cursing that the only compatible file system between Macs and Windows is FAT32, which can't possibly handle the number of files I have. Simple zipping and unzipping requires shell, Perl, and Python programming, days to run, and fault-tolerant code that can handle the inevitable crashes that come with running things that take days to complete. And all of that has to be done while keeping the data encrypted and secure. Anyway, all that said, if anyone needs help with Big Data questions, I may have some relevant experience.
