
What's good about offset pagination; designing parallel cursor-based web APIs

100 points by clra over 4 years ago

9 comments

gampleman over 4 years ago
To point out the obvious: generally API providers don't particularly want you to parallelize your requests (they even implement rate limiting to make that harder on purpose). If they wanted to make it easy to get all the results, they would allow you to access the data without pagination - just download all the data in one go.
felixhuttmann over 4 years ago
A few thoughts:

1) AWS DynamoDB has a parallel scanning feature for this exact use case: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html#Scan.ParallelScan

2) A typical database already maintains an approximately balanced B-tree internally for every index. Therefore, it should in principle be cheap for the database to return a list of keys that approximately divide the key range into N similarly large ranges, even if the key distribution is very uneven. Is anybody aware of a way to obtain this information in a query, e.g. in Postgres?

3) The term 'cursor pagination' is sometimes used for different things, either referring to an in-database concept of cursor, or sometimes to an opaque pagination token. Therefore, for the concept described in the article, I have come to prefer the term keyset pagination, as described in https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/. The term keyset pagination makes it clear that we are paginating using conditions on a set of columns that form a unique key for the table.
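A minimal sketch of keyset pagination in Go, assuming a hypothetical Postgres "items" table with a unique (created, id) key (table, columns, and driver are illustrative, not the article's code). Because the WHERE clause anchors on the key instead of an offset, each page is an index range scan, and disjoint key ranges can be handed to independent workers:

    // Keyset pagination sketch: hypothetical schema, not the article's code.
    package keyset

    import (
        "database/sql"
        "time"

        _ "github.com/lib/pq" // assumed Postgres driver
    )

    type Item struct {
        ID      int64
        Created time.Time
    }

    // nextPage returns up to limit items strictly after the (created, id)
    // cursor, in key order.
    func nextPage(db *sql.DB, afterCreated time.Time, afterID int64, limit int) ([]Item, error) {
        rows, err := db.Query(
            `SELECT id, created FROM items
             WHERE (created, id) > ($1, $2)
             ORDER BY created, id
             LIMIT $3`,
            afterCreated, afterID, limit,
        )
        if err != nil {
            return nil, err
        }
        defer rows.Close()

        var items []Item
        for rows.Next() {
            var it Item
            if err := rows.Scan(&it.ID, &it.Created); err != nil {
                return nil, err
            }
            items = append(items, it)
        }
        return items, rows.Err()
    }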
adontz over 4 years ago
I believe data export and/or backup should be a separate API, which is low priority and ensures consistency.

Here we just see regular APIs being abused for data export. I'm rather surprised the author did not face rate limiting.
eyelidlessness over 4 years ago
I think keeping temporal history and restricting paginated results to the data at the point in time when the first page was retrieved would be a pretty decent way to fix offset-based interfaces (regardless of the complexity of making the query implementation efficient). Data with a lot of churn could churn on, but clients would see a consistent view until they return to the point of entry.

Obviously this has some potential caveats if that churn is also likely to quickly invalidate data, or revoke sensitive information. Time limits for historical data retrieval can be imposed to help mitigate this. And individual records can be revised (e.g. with bitemporal modeling) without altering the set of referenced records.
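A rough sketch of that idea in Go, assuming a hypothetical append-only "items" table: the snapshot timestamp is captured when the first page is served and carried in the page token, so later pages ignore rows created afterwards (names and schema are invented for illustration):

    // Point-in-time pagination sketch: hypothetical schema, not the article's code.
    package snapshot

    import (
        "database/sql"
        "time"
    )

    type PageToken struct {
        Snapshot time.Time // time the first page was served
        AfterID  int64     // keyset cursor within that snapshot
    }

    // PageAt returns the next page of ids as of tok.Snapshot, plus the
    // token to pass back for the following page.
    func PageAt(db *sql.DB, tok PageToken, limit int) ([]int64, PageToken, error) {
        rows, err := db.Query(
            `SELECT id FROM items
             WHERE created <= $1 AND id > $2
             ORDER BY id
             LIMIT $3`,
            tok.Snapshot, tok.AfterID, limit,
        )
        if err != nil {
            return nil, tok, err
        }
        defer rows.Close()

        var ids []int64
        for rows.Next() {
            var id int64
            if err := rows.Scan(&id); err != nil {
                return nil, tok, err
            }
            ids = append(ids, id)
        }
        if len(ids) > 0 {
            tok.AfterID = ids[len(ids)-1]
        }
        return ids, tok, rows.Err()
    }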
ppeetteerr over 4 years ago
Pagination of an immutable collection is one thing and can be parallelized. Pagination of a mutable collection (e.g. a database table), on the other hand, is risky, since two requests might return intersecting data if new data is added between the requests being executed.

True result sets require relative page tokens, and a synchronization mechanism if the software demands it.
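To make the failure mode concrete, here is a toy Go simulation (not from the article) of offset pages over a newest-first list: a row inserted between the two requests shifts everything down by one, so the second page repeats an item:

    // Toy demonstration of offset-page overlap on a mutable, newest-first list.
    package main

    import "fmt"

    // page returns items[offset : offset+limit], clamped to the slice bounds.
    func page(items []int, offset, limit int) []int {
        if offset >= len(items) {
            return nil
        }
        end := offset + limit
        if end > len(items) {
            end = len(items)
        }
        return items[offset:end]
    }

    func main() {
        // Newest-first list of IDs.
        items := []int{105, 104, 103, 102, 101}

        p1 := page(items, 0, 2) // client fetches page 1: [105 104]

        // A new row (106) arrives before the client asks for page 2.
        items = append([]int{106}, items...)

        p2 := page(items, 2, 2) // page 2 now begins at 104 again: [104 103]

        fmt.Println(p1, p2) // prints [105 104] [104 103]: item 104 is returned twice
    }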
jasonhansel over 4 years ago
It's important here that "created" is an immutable attribute. Otherwise you could get issues where the same item appears on multiple lists (or doesn't appear at all) because its attributes changed during the scanning process.
arcbyte over 4 years ago
I think you could accomplish something similar with token pagination by requesting a number of items that will fill multiple "pages" of your user interface. Then, as the user pages through, you can request additional items. This isn't parallelizing, but it provides the same low-latency user experience.
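A minimal sketch of that batch-ahead approach in Go, with a hypothetical fetchBatch standing in for the real API client: one large request is sliced into several small UI pages, and the next batch is pulled only when the local buffer runs out:

    // Batch-ahead pager sketch; fetchBatch and its token format are hypothetical.
    package pager

    type Batch struct {
        Items []string
        Next  string // opaque token for the next batch
    }

    type Pager struct {
        fetchBatch func(token string, n int) (Batch, error) // stand-in for the real API call
        buf        []string
        next       string
        pageSize   int
    }

    // Page returns the next UI page, requesting another large batch from
    // the API only when the local buffer can no longer fill a page.
    func (p *Pager) Page() ([]string, error) {
        if len(p.buf) < p.pageSize {
            b, err := p.fetchBatch(p.next, 10*p.pageSize) // one request covers ~10 UI pages
            if err != nil {
                return nil, err
            }
            p.buf = append(p.buf, b.Items...)
            p.next = b.Next
        }
        n := p.pageSize
        if n > len(p.buf) {
            n = len(p.buf)
        }
        page := p.buf[:n]
        p.buf = p.buf[n:]
        return page, nil
    }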
gigatexal over 4 years ago
From the code sample in the article: I didn't know you could append to a slice from within a go func.
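Not the article's code, just a general Go note on that pattern: appending to a shared slice from several goroutines is a data race unless the appends are serialized, for example with a mutex. A minimal sketch:

    // Not the article's code: concurrent appends need synchronization.
    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        var (
            mu      sync.Mutex
            results []int
            wg      sync.WaitGroup
        )

        for i := 0; i < 4; i++ {
            wg.Add(1)
            go func(segment int) {
                defer wg.Done()
                // ... fetch one key-range segment here ...
                mu.Lock()
                results = append(results, segment) // append serialized by the mutex
                mu.Unlock()
            }(i)
        }
        wg.Wait()

        fmt.Println(len(results)) // always 4; without the mutex this would be a data race
    }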
draw_down over 4 years ago
> it uses offsets for pagination... understood to be bad practice by today's standards. Although convenient to use, offsets are difficult to keep performant in the backend

This is funny. Using offsets is known to be bad practice because... it's hard to do.

Look, I'm just a UI guy, so what do I know. But this argument gets old because, I'm sorry, people want a paginated list and to know how many pages are in the list. Clicking "next page" 10 times instead of clicking to page 10 is bullshit, and users know it.