
Strategies to download constantly changing data via an API

2 points by rupestrecampos, about 1 month ago
I have to download a dataset through an API (a WFS provided by GeoServer) that reports the total number of items, delivers at most 1000 items per request, and lets me sort by one field and offset each request's start index. The layer has about 1 million items. I can use at most 5 parallel requests before the API gets overloaded.

The problem is that items are added and removed in real time, so by the end of the copy process I already have stale data, and new items have appeared that still need to be copied. So what would you do, or have done, in this situation? Would starting a never-ending loop to crawl the data all day long be something evil, or is this something to be fixed on the provider's side?

The API URL is https://geoserver.car.gov.br/geoserver/sicar/wfs

Source data website: https://consultapublica.car.gov.br/publico/imoveis/index
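The paging described above maps directly onto WFS 2.0's `count`/`startIndex`/`sortBy` GetFeature parameters. Below is a minimal sketch of building the page URLs; the layer name `sicar:imoveis` and the sort field `id` are placeholders, not confirmed names from this server.

```python
# Sketch of paged WFS 2.0 GetFeature requests, assuming the server
# supports standard paging (count/startIndex) and sorting. The layer
# name "sicar:imoveis" and sort field "id" are hypothetical.
from urllib.parse import urlencode

BASE = "https://geoserver.car.gov.br/geoserver/sicar/wfs"
PAGE = 1000  # the server delivers at most 1000 items per request

def page_url(start_index, type_name, sort_field):
    """Build a GetFeature URL for one page, sorted by a stable field
    so that consecutive pages don't shuffle items between requests."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,
        "outputFormat": "application/json",
        "count": PAGE,
        "startIndex": start_index,
        "sortBy": sort_field,
    }
    return BASE + "?" + urlencode(params)

# Example: URLs for the first three pages of a hypothetical layer.
urls = [page_url(i * PAGE, "sicar:imoveis", "id") for i in range(3)]
```

Sorting by a stable field matters here: without a deterministic order, items inserted mid-crawl can shift the offsets and cause pages to skip or duplicate features.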

1 comment

stop50, about 1 month ago
I currently only have my phone, so I can't judge the API. From my point of view, a full scrape at regular intervals is not that bad. It's only 1000 requests. Depending on the data and query methods, you can make fresh data appear sooner than you remove old data. The major question is: how fresh do you need your data?

Not every application needs realtime data; querying it only on occasion or every few hours can be good enough.
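The periodic full scrape the commenter suggests pairs naturally with a reconcile step: after each snapshot, diff it against the local copy keyed by feature id, so new items are added and stale ones dropped. A minimal sketch, where the id-keyed dicts stand in for whatever store the downloaded snapshot actually lands in:

```python
# Sketch of reconciling a fresh snapshot against a local copy keyed by
# feature id. "local" and "fresh" are placeholder id -> record dicts;
# in practice "fresh" would come from the paged WFS download.
def reconcile(local, fresh):
    """Bring the local copy in line with the fresh snapshot.

    Returns (added_ids, removed_ids) so the caller can log or react
    to churn between scrapes.
    """
    added = fresh.keys() - local.keys()      # new items since last scrape
    removed = local.keys() - fresh.keys()    # items deleted upstream
    for fid in removed:
        del local[fid]
    local.update(fresh)                      # insert new, refresh changed
    return added, removed

# Toy example: "a" was removed upstream, "c" was added.
local = {"a": 1, "b": 2}
added, removed = reconcile(local, {"b": 2, "c": 3})
```

Tracking the added/removed counts per cycle also answers the freshness question empirically: if churn between scrapes is small, a longer interval is probably fine.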