What's good about offset pagination; designing parallel cursor-based web APIs

100 points by clra over 4 years ago

9 comments

gampleman over 4 years ago
To point out the obvious: generally API providers don’t particularly want you to parallelize your requests (they even implement rate limiting to make it harder on purpose). If they wanted to make it easy to get all the results, they would let you access the data without pagination and just download everything in one go.
felixhuttmann over 4 years ago
A few thoughts:

1) AWS DynamoDB has a parallel scan feature for this exact use case: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html#Scan.ParallelScan

2) A typical database already maintains an approximately balanced b-tree for every index. In principle it should therefore be cheap for the database to return a list of keys that divide the key range into N roughly equal ranges, even when the key distribution is very uneven. Is anybody aware of a way to obtain this information in a query, e.g. in Postgres?

3) The term 'cursor pagination' is sometimes used for different things, either referring to an in-database concept of cursor, or sometimes to an opaque pagination token. For the concept described in the article, I have therefore come to prefer the term keyset pagination, as described in https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/. The term keyset pagination makes it clear that we are paginating using conditions on a set of columns that form a unique key for the table.
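For readers unfamiliar with the term in point 3: keyset pagination boils down to a WHERE clause on the ordering key instead of an OFFSET. A minimal sketch in Go, assuming a hypothetical items(id, created) table and Postgres-style placeholders and row comparison (none of the names come from the article):

```go
// Package pagination sketches keyset pagination against a hypothetical
// items(id, created) table. Table name, column names, and the Postgres-style
// placeholders/row comparison are assumptions, not taken from the article.
package pagination

import (
	"context"
	"database/sql"
)

type Item struct {
	ID      int64
	Created int64
}

// NextPage returns up to limit rows strictly after the (created, id) pair of
// the last row on the previous page. Comparing the composite key keeps the
// ordering stable even when many rows share the same created value.
func NextPage(ctx context.Context, db *sql.DB, lastCreated, lastID int64, limit int) ([]Item, error) {
	rows, err := db.QueryContext(ctx, `
		SELECT id, created
		FROM items
		WHERE (created, id) > ($1, $2)
		ORDER BY created, id
		LIMIT $3`, lastCreated, lastID, limit)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var page []Item
	for rows.Next() {
		var it Item
		if err := rows.Scan(&it.ID, &it.Created); err != nil {
			return nil, err
		}
		page = append(page, it)
	}
	return page, rows.Err()
}
```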
adontz over 4 years ago
I believe data export and/or backup should be a separate API, one that is low priority and ensures consistency.

Here we just see regular APIs being abused for data export. I'm rather surprised the author did not face rate limiting.
eyelidlessness over 4 years ago
I think keeping temporal history and restricting paginated results to the data at the point in time where the first page was retrieved would be a pretty decent way to solve offset-based interfaces (regardless of the complexity of making the query implementation efficient). Data with a lot of churn could churn on, but clients would see a consistent view until they return to the point of entry.

Obviously this has some potential caveats if that churn is also likely to quickly invalidate data, or revoke sensitive information. Time limits for historical data retrieval can be imposed to help mitigate this. And individual records can be revised (e.g. with bitemporal modeling) without altering the set of referenced records.
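A rough sketch of that idea, reusing the hypothetical items table and imports from the keyset sketch above. The cursor shape is an assumption, and it ignores the deletion/revision cases that bitemporal modeling would cover:

```go
// Cursor pins the snapshot taken when the client fetched the first page, so
// later pages never include rows created after the client entered the list.
// A hypothetical shape; a real token would be serialized opaquely.
type Cursor struct {
	Snapshot int64 // unix timestamp captured on the first request
	LastID   int64 // last key returned on the previous page
}

// PageAt returns the next page of the collection as it looked at c.Snapshot.
func PageAt(ctx context.Context, db *sql.DB, c Cursor, limit int) (*sql.Rows, error) {
	return db.QueryContext(ctx, `
		SELECT id, created
		FROM items
		WHERE created <= $1 AND id > $2
		ORDER BY id
		LIMIT $3`, c.Snapshot, c.LastID, limit)
}
```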
ppeetteerr over 4 years ago
Pagination of an immutable collection is one thing and can be parallelized. Pagination of a mutable collection (e.g. a database table), on the other hand, is risky since two requests might return intersecting data if new data was added between the requests being executed.

True result sets require relative page tokens and a synchronization mechanism if the software demands it.
jasonhansel over 4 years ago
It's important here that "created" is an immutable attribute. Otherwise you could get issues where the same item appears on multiple lists (or doesn't appear at all) because its attributes changed during the scanning process.
arcbyte over 4 years ago
I think you could accomplish something similar with token pagination by requesting a number of items that will result in multiple "pages" for your user interface. Then as the user iterates through, you can request additional items. This isn't parallelizing, but provides the same low-latency user experience.
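A rough sketch of that pattern, assuming a hypothetical token-based fetch function and the Item type from the earlier sketch (all names invented here): the client over-fetches enough items for several UI pages and only calls the API again when the local buffer runs low.

```go
// buffered wraps a token-paginated fetch so the UI pages through a local
// buffer and only hits the network when the buffer is nearly exhausted.
type buffered struct {
	fetch func(token string, n int) ([]Item, string, error) // hypothetical API call
	items []Item
	token string // "" means start from the beginning
	done  bool   // set once the API reports no further pages
}

// Page returns UI page i of size pageSize, keeping one extra page buffered.
func (b *buffered) Page(i, pageSize int) ([]Item, error) {
	need := (i + 2) * pageSize // current page plus one page of lookahead
	for len(b.items) < need && !b.done {
		batch, next, err := b.fetch(b.token, 3*pageSize) // over-fetch several UI pages at once
		if err != nil {
			return nil, err
		}
		b.items = append(b.items, batch...)
		b.token, b.done = next, next == ""
	}
	start := i * pageSize
	if start > len(b.items) {
		start = len(b.items)
	}
	end := start + pageSize
	if end > len(b.items) {
		end = len(b.items)
	}
	return b.items[start:end], nil
}
```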
gigatexal over 4 years ago
From the code sample in the article I didn’t know you could append to a slice from within a go func
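For context (and not the article's exact code): appending inside a go func compiles fine, but concurrent appends to a shared slice are a data race, so fan-out code typically guards the append with a mutex or gives each goroutine its own pre-allocated index. A minimal, hypothetical example:

```go
package main

import (
	"fmt"
	"sync"
)

// fetchAll fans out one goroutine per shard and collects their results into a
// shared slice. The mutex around append is what makes this safe; without it,
// concurrent appends would race on the slice header and backing array.
func fetchAll(shards int) []int {
	var (
		mu      sync.Mutex
		results []int
		wg      sync.WaitGroup
	)
	for s := 0; s < shards; s++ {
		wg.Add(1)
		go func(shard int) {
			defer wg.Done()
			items := []int{shard * 10, shard*10 + 1} // stand-in for one paginated API call
			mu.Lock()
			results = append(results, items...)
			mu.Unlock()
		}(s)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(fetchAll(4))
}
```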
draw_down over 4 years ago
> it uses offsets for pagination... understood to be bad practice by today’s standards. Although convenient to use, offsets are difficult to keep performant in the backend

This is funny. Using offsets is known to be bad practice because... it’s hard to do.

Look, I’m just a UI guy, so what do I know. But this argument gets old because I’m sorry, but people want a paginated list and to know how many pages are in the list. Clicking “next page” 10 times instead of clicking to page 10 is bullshit, and users know it.