TechEcho

6 comments

shrikant over 14 years ago

IIRC, sriramk from around here (http://news.ycombinator.com/user?id=sriramk) also 'rolled his own' web crawler as a college project about 5-6 (?) years back. He blogged about it fairly actively back then, and I really enjoyed following his journey (esp. when, after months of dev and testing, he finally 'slipped it into the wild'). I tried to dredge up those posts, but he seems to have taken them down :( A shame really - they were quite a fascinating look at the early-stage evolution of a programmer!

Sriram, you around? ;)

rb2k_ over 14 years ago

Uh, look what the cat dragged in: my thesis :)

Hope some of you enjoy the read. I'm open to comments and criticism.

yesno over 14 years ago

I like Ted Dziuba's solution: http://teddziuba.com/2010/10/taco-bell-programming.html

Full-stack programmer at work!

inovica over 14 years ago

A good read, and very timely from my perspective. We created a crawler in Python a couple of years ago for RSS feeds, but we ran into a number of issues with it, so we put it on hold while we concentrated on work that made money :) We picked the project back up last week and have been weighing rolling our own against frameworks like Scrapy. The main thing for us is being able to scale. If anyone has experience building a distributed crawler in Python, I'd welcome some advice.

Thanks again. Really good post.

richcollins over 14 years ago

I'm having good luck using node.js's httpClient and vertex.js for crawl state / persistence.

nl over 14 years ago

Can someone please explain what FPGA-aware garbage collection is?

Building blocks of a scalable webcrawler.