But it appears that for both Facebook and Yahoo, those same clusters are unnecessary for many of the tasks which they’re handed. In the case of Facebook, most of the jobs engineers ask their clusters to perform are in the “megabyte to gigabyte” range (pdf), which means they could easily be handled on a single computer—even a laptop.

    The story is similar at Yahoo, where it appears the median task size handed to Yahoo’s cluster is 12.5 gigabytes. (pdf) That’s bigger than what the average desktop PC could handle, but it’s no problem for a single powerful server.



mk:

Unless you deal with large datasets, I think it's hard to appreciate how difficult designing meaningful analysis becomes. In the end, when it comes to market analysis, you'd probably do just as well going with the 'gut feeling' of someone who has demonstrated a keen understanding of the market. Otherwise, mining data like user behavior will probably only let you tweak things around the edges. The problem is that you might learn two important things and, in reacting to them, work counter to a third and possibly more important thing that you didn't even know to look for. That's why I think A/B testing is most often bunk.

As an aside, I just started changing the way tags are stored, and have been trying to anticipate what information might be best to store with them so that we can use them more effectively. I can't anticipate needing a separate machine to crunch these data any time soon.
