Hacker News

There's an important distinction to be made between the storage layer and the analysis layer. Something like HDFS can make sense as a storage layer once you hit the >10TB range, even if your average dataset for analysis is reasonably small (and it should be; 99% of the time you can get by with sampling down to single-machine size). That doesn't mean you need to set up all your analysis jobs to run via map-reduce; you can usually dump the dataset to a dedicated machine and do it all in one go with sequential algorithms. As a side benefit, you get access to algorithms that are really difficult to express efficiently as map-reduce (e.g., computations over ordered time series).
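To illustrate that last point, here's a minimal sketch (hypothetical example, not from any particular codebase): an exponentially weighted moving average over an ordered series. It's a one-line recurrence as a sequential pass on a single machine, but awkward in map-reduce because each output depends on the previous one, so the computation can't be freely partitioned.

```python
def ewma(values, alpha=0.5):
    """Sequential pass over an ordered time series.

    Each output depends on the previous one, which is exactly the
    dependency structure that map-reduce handles poorly.
    """
    result = []
    avg = None
    for v in values:
        # Recurrence: avg_t = alpha * v_t + (1 - alpha) * avg_{t-1}
        avg = v if avg is None else alpha * v + (1 - alpha) * avg
        result.append(avg)
    return result

print(ewma([1.0, 2.0, 3.0, 4.0]))  # [1.0, 1.5, 2.25, 3.125]
```

Once the (sampled) dataset fits on one box, this kind of order-dependent pass is both simpler to write and usually faster than the distributed equivalent.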


