Sure, if you're dealing with 1GB of data it probably isn't worth spinning up a Hadoop cluster to run your analysis. However, if you already have Hadoop up and running for something that genuinely requires it, running that 1GB job there might make sense. The data may already be in HDFS, and you already have the infrastructure to manage and monitor jobs.
The references to Facebook & Yahoo running small jobs on huge clusters may be a little misleading. The cluster may simply be the easiest place for them to deploy those jobs consistently.
But yeah... "Big Data" is a totally meaningless buzzword.
It's like the huge fire truck that gets sent to put out small fires. Cities only need one for big fires, but if you have to own one and keep it ready anyway, it makes sense to deploy it every time.