Sunday, September 11, 2011

opensubtitles.com pretending to be big

For one of my data-mining projects, I decided to spend a weekend finding out how many subtitles you can actually get online for free. It took a bit of parallel Python to scrape an entire website, but I found a surprising discrepancy between the database size the site claims and the number of subtitles it actually provides.