- Github recently introduced hugely improved code search, one of those rare moments when a service I use adds a feature that directly and measurably measurably improves my life. Predictably, there was soon a flurry of breathless stories about the security implications. This shouldn't have been news to anyone - by now, it should be clear that better search in almost any context has security or privacy implications, a law of the universe almost as solid as the second law of thermodynamics. We saw this with Google's own code search, as well as Google proper, Facebook's Graph Search and even Bing. A certain fraction of people will always make mistakes, and and any sufficiently powerful search will allow bad guys to find and take advantage of the outliers.
After the dust had settled a bit I started wondering what else we could do with Github's search - other than snookering schmucks who checked in their private keys. I'm always enticed by data, and the combination of search and the ability to download raw checked-in files seemed like a promising avenue to explore. Lets see what we can come up with.