Extract Web Data Easily with AI

29 points by t_a_v_i_s over 1 year ago

7 comments

xz18r over 1 year ago
I'm someone who regularly scrapes websites and uses bs4 to format the data. Admittedly it takes me some time to get what I want exactly, but the past year ChatGPT has _drastically_ sped up the creation of my scripts. Given that I am your target audience, why would I pay a pretty steep 39 USD for something I can do myself in 30 minutes - and without limits? There is a high correlation between scraping websites and being tech-savvy enough to produce a (simple) script to do so. The only reason I'd consider buying your product is if I have to do this so often that it would save me the time of writing ONLY the bs4 part of my scraping script (which is often the easy part; websites are awfully incoherent and buggy), but that would only happen if I wanted to scrape vastly more than 25k "lines in a csv" (which is what I assume 1 credit gives you, but is never explained).
ninjin over 1 year ago
Amazing. Pitched the same idea for a startup with my students back in 2020 (never got very far though as every member of the budding team got busy with other things). Was convinced we could turn the language models of the time into this back then and now a few years later it is done through API calls to a third party. We truly live in interesting times.
a_bonobo over 1 year ago
Nine months ago HN user genmon posted the results of his experiments to classify the BBC In Our Time podcast using ChatGPT: https://news.ycombinator.com/item?id=35073603

It's the same principle. Give the podcast text to ChatGPT in chunks, ask it to classify using Dewey decimal system, pull out guests and mentioned books, and summarise the episode.

It's funny that user nl posits that 'We are months away from being able to do this with images too' - we were indeed months away! End of September, to be precise: https://openai.com/blog/chatgpt-can-now-see-hear-and-speak
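The "give the text in chunks" step can be sketched roughly as follows. This is only an illustration under assumptions, not genmon's actual code: the function names `chunk_text` and `build_prompt`, the character budget, and the prompt wording are all hypothetical, and no model API is called.

```python
def chunk_text(text, max_chars=2000):
    """Split text into chunks of at most max_chars characters.

    Paragraphs (separated by blank lines) are kept together where possible;
    a paragraph longer than max_chars is hard-split.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Hard-split any paragraph that cannot fit in a single chunk.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def build_prompt(chunk):
    # Hypothetical prompt wording, paraphrasing the tasks listed in the comment.
    return (
        "Classify this transcript excerpt using the Dewey Decimal system, "
        "list the guests and any books mentioned, and summarise it.\n\n" + chunk
    )
```

Each prompt returned by `build_prompt` would then be sent to whatever model is in use, and the per-chunk answers merged afterwards. Splitting by characters is a simplification; a real pipeline would more likely budget by tokens.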
mthoms over 1 year ago
One of the examples is scraping rei.com. The URLs extracted by the tool look like this:

    https://rei.comhttps://www.rei.com/product/207125/asolo-eldo-gv-approach-shoes-mens
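A doubled URL like the one above is what you would get from prepending a base to an `href` that is already absolute. How the tool actually builds its URLs is not known, so the snippet below is only a sketch of that assumed failure mode and of the usual fix with the standard library's `urllib.parse.urljoin`. The `BASE` value and the helper names are hypothetical.

```python
from urllib.parse import urljoin

BASE = "https://rei.com"  # assumed base URL, taken from the prefix in the output above


def naive_join(base, href):
    # Plain concatenation: only correct when href is a root-relative path.
    return base + href


def resolve(base, href):
    # urljoin leaves absolute hrefs untouched and resolves relative ones against base.
    return urljoin(base, href)


absolute_href = "https://www.rei.com/product/207125/asolo-eldo-gv-approach-shoes-mens"
relative_href = "/product/207125/asolo-eldo-gv-approach-shoes-mens"

print(naive_join(BASE, absolute_href))  # reproduces the doubled URL
print(resolve(BASE, absolute_href))     # the absolute href, unchanged
print(resolve(BASE, relative_href))     # base plus the relative path
```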
malfist over 1 year ago
Extract data easily, no mention of accurately.
tucnak over 1 year ago
"What We Do

Kadoa is an AI-powered no-code platform that allows anyone to build complex data workflows effortlessly. We use AI to navigate, understand, and transform unstructured data from any source. The orchestrating AI agent chooses the best strategy for each task, such as where to go, what to extract, and how to format the data. We do this at scale and"

I especially liked the "We do this at scale and" (verbatim) bit.

/ Trusted by

Probably no-one.

/ Why Kadoa

Because founders need to eat, too!

/ Popular use cases

There aren't any, because this isn't actually an established business but a GPT-4 wrapper that didn't exist months ago.
nebula8804 over 1 year ago
I tried to scrape a simple imdb actor page and it failed and required support to look at it. :/