TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Ask HN: Do you know a good resource for large data scraping job?

9 pointsby hugo31370over 13 years ago
My company, Easy Vino (easyvino.com), is gearing up for beta release and we need to populate our database with wine lists. The job consists of extracting information from wine lists (which we have and are usually PDF, HTML or Pictures) to put it into our database.<p>We have a simple back office that connects to a wine API to search for wine info and we need help inputing the data. I'd rather have the same person (or team) doing this as the learning curve is significant.<p>Does anyone know a cheap resource for this type of task? Any help or reference is appreciated.<p>Thanks a lot!

3 comments

devs1010over 13 years ago
I'm not sure exactly what sort of answer you are expecting. Unless the data you want is in a standardized format (such as a standardized XML schema), any effort to extract data would require writing custom parsers for each set of data that has a different structure. I'm not sure if you are asking for advice on which technology stack to use for writing this or are looking for a pre-made tool that can extract this for you? There may be some tools that can "attempt" to do this without requiring you to write custom code but I am not sure how effective they would be.
评论 #3573986 未加载
ig1over 13 years ago
The typical way of doing this is to use mechanical turk, there are some third party services (their name escapes me) which are built on top of mturk to provide reliability.<p>The typical way they do this is to have two different people enter the data and when there's a mismatch have a supervisor decide which is right.
评论 #3573994 未加载
polyfractalover 13 years ago
You might have good luck just hiring some cheap Virtual Assistants to do this work for you. oDesk or elance are pretty good for these types of administrative tasks
评论 #3575067 未加载