Analyzing One Million robots.txt Files

This is a delightfully fun read analyzing the robots.txt files of the top million websites. It covers some interesting history about the origins of the robots specification and the fact that it was never standardized. There isn't even an RFC for it! As a bonus, the code excerpts showing the analysis are all in Python, and it was the first time I'd seen the collections module.
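The article's own code isn't reproduced here. This is only a minimal sketch of the kind of tallying `collections.Counter` makes easy. It assumes the robots.txt bodies have already been downloaded as plain strings. The function name `count_user_agents` and the sample inputs are hypothetical. The sketch counts how many files mention each `User-agent` value:

```python
from collections import Counter


def count_user_agents(robots_files):
    """Tally how many robots.txt files mention each User-agent value (hypothetical helper)."""
    counts = Counter()
    for text in robots_files:
        agents = set()
        for line in text.splitlines():
            # Drop trailing comments, then look for "Field: value" lines
            line = line.split("#", 1)[0].strip()
            if ":" not in line:
                continue
            field, value = line.split(":", 1)
            # Field names are matched case-insensitively
            if field.strip().lower() == "user-agent":
                agents.add(value.strip())
        # A set means each agent is counted at most once per file
        counts.update(agents)
    return counts


# Hypothetical sample files, not data from the article
samples = [
    "User-agent: *\nDisallow: /admin\n",
    "User-agent: Googlebot\nDisallow:\n\nUser-agent: *\nDisallow: /private # secret\n",
    "# no rules here\n",
]
print(count_user_agents(samples).most_common(2))  # [('*', 2), ('Googlebot', 1)]
```

`Counter.update` accepts any iterable and increments each element's count. `most_common` then returns the results already sorted, which is most of the work in a "which crawlers get named most often" analysis.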

📌 Posted on September 20, 2017

