TechEcho

Ask HN: How do AI companies collect data to train models?

1 point by piotrke, about 1 year ago
From the Internet, obviously, but how? Are they crawling every website out there, based on IPs or domain names? Do they piggyback on Google? Or is there an all-Internet data store from which they can just download the latest 'Internet data' dump?

1 comment

richardjam73, about 1 year ago
They use datasets like Common Crawl.
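
To make "datasets like Common Crawl" a bit more concrete, here is a minimal sketch of how one might look up and fetch a single archived page from Common Crawl's public URL index. It is only an illustration, not a description of any company's actual pipeline. The crawl ID is an example value (current IDs are listed on index.commoncrawl.org), and the helper names `index_query_url` and `record_request` are hypothetical. The sketch assumes the index returns one JSON object per line with `filename`, `offset`, and `length` fields.

```python
import json
from urllib.parse import urlencode

# Assumption: this crawl ID is only an example; pick a current one from
# the list published at https://index.commoncrawl.org/.
CRAWL_ID = "CC-MAIN-2024-10"


def index_query_url(page_url: str, crawl_id: str = CRAWL_ID) -> str:
    """Build a URL-index query asking for captures of page_url in one crawl."""
    params = urlencode({"url": page_url, "output": "json"})
    return f"https://index.commoncrawl.org/{crawl_id}-index?{params}"


def record_request(index_line: str) -> tuple[str, dict[str, str]]:
    """Turn one JSON line of index output into (url, headers) for a ranged fetch.

    Each index line says which archive file holds the capture and where in
    that file the compressed record starts and how long it is.
    """
    rec = json.loads(index_line)
    start = int(rec["offset"])
    end = start + int(rec["length"]) - 1  # HTTP byte ranges are inclusive
    url = f"https://data.commoncrawl.org/{rec['filename']}"
    return url, {"Range": f"bytes={start}-{end}"}
```

Fetching the URL from `index_query_url` with any HTTP client should return the index lines; passing one of them to `record_request` gives the URL and `Range` header for downloading just that gzip-compressed record rather than the whole archive file. Neither helper touches the network itself, so rate limiting and error handling are left to the caller.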