linked from here: https://www.techdirt.com/articles/20110725/05335715239/dailydirt-big-data-isnt-necessarily-better.shtml
- For example, Google’s algorithm was quite vulnerable to over-fitting to seasonal terms unrelated to the flu, like “high school basketball.” With millions of search terms being fit to the CDC’s data, there were bound to be searches that were strongly correlated by pure chance, and these terms were unlikely to be driven by actual flu cases or predictive of future trends.
I've painfully experienced data over-fitting myself. It's so subtle, powerful, and deadly that it will probably be the demise of any future serious machine-learning system.
The more data you throw at the system (millions of candidate search terms, as in the quote), the more prone it is to over-fitting.
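To make the chance-correlation point from the quote concrete, here's a rough sketch (my own toy illustration, not Google's method). The function names `pearson` and `max_spurious_correlation` are made up for this example, and everything is pure random noise: a fake "flu" series of 20 points and a pile of fake "search term" series. The only thing that changes is how many candidate terms we get to pick from.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def max_spurious_correlation(n_features, n_points, seed=0):
    """Best |correlation| found between a random target and random features.

    Nothing here is related to anything else: the target and every
    feature are independent Gaussian noise.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(feature, target)))
    return best


if __name__ == "__main__":
    for n_features in (10, 100, 2000):
        best = max_spurious_correlation(n_features, n_points=20)
        print(f"{n_features:5d} random 'search terms': best |r| = {best:.2f}")
```

The best match should look more and more convincing as the number of candidate terms grows, even though none of them has anything to do with the target. That's the "high school basketball" effect in miniature.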
And it seems machine learning functions like a black box: the programmers weren't able to understand the processes leading to certain moves by AlphaGo (the Go-playing machine).
If you can't check how a decision came to be, you're even more likely to end up with an obvious over-fitting problem on your hands, e.g. basketball linked to flu.