The schema evolved to:
After two iterations (100 000 each), run a week apart, my spider following the http://wordpress.com/next/ link found only 1733 blogs. But when I had the spider crawl the blogrolls of the blogs it had found, it turned up another 4322 blogs (on wordpress.com alone!). Why? Does the next link show only the active blogs, or are the rest just spam blogs? We'll find out soon (I hope :).
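For the curious, here is a minimal sketch of the blogroll step. It is not the spider's actual code, only an illustration under a few assumptions: the page HTML has already been downloaded, every link on the page is treated as a blogroll candidate, and a blog is identified by its `*.wordpress.com` hostname. The names `LinkCollector` and `wordpress_blogs` are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def wordpress_blogs(html, known=()):
    """Return wordpress.com blog hostnames linked from html, minus known ones."""
    parser = LinkCollector()
    parser.feed(html)
    found = set()
    for href in parser.hrefs:
        # Relative links have no hostname; treat them as an empty string.
        host = (urlparse(href).hostname or "").lower()
        # Assumption: a blog lives on a subdomain of wordpress.com.
        if host.endswith(".wordpress.com") and host != "www.wordpress.com":
            found.add(host)
    return found - set(known)
```

Feeding it a page and the set of blogs already found through the next link returns only the new hostnames, which is how the two counts above can be kept apart.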
Some preliminary analysis results:
When do we blog (number of posts)?