TIL different ways to paginate API calls
I think about REST APIs a lot, both at the consumer level (I am consuming the API data!) and at the architecture level.
Imagine you were building something like Twitter, where you make an API call to /[user_name]/tweets. What would it send back?
If your answer is "Well, all of the user's tweets", great!
We're halfway there.
What if the user is addicted to your service and generates 1,000 tweets per day for a year straight? Will your API respond with 365,000 tweets? What if they want that data sorted by date? How about by most reactions? How about tweets from when they were in Chicago during the summer?
Sure, you can create filters and queries to output a specific amount. But that assumes the consumer knows what to look for.
When an API exposes a large data set, it needs to provide a mechanism to paginate the list of resources.
That way, instead of sharing 365,000 tweets... maybe you're sharing segments of 1,000 tweets. Smaller payload, faster delivery, less angry person.
Welcome to thinking about REST APIs!
TIL about the various ways to paginate API requests, via Ignacio Chiazzo's post Paginating requests in API!
A major rule of creating endpoints: Exposing endpoints is very easy. Deprecating and deleting them is extremely hard and potentially impossible.
Things to consider:
- Can the list be sorted?
- Is there any default order?
- Can the list be filtered? If so, which filter params should it accept?
- Are the queries, executed under the hood, performant enough?
- When exposing a large data set, how much do you expose per request without inviting abuse?
Pagination
The most common pagination techniques are page-based pagination (also called offset-based pagination), keyset-based pagination, and cursor-based pagination.
Page-based pagination
tl;dr - Divide the content into pages.
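A minimal sketch of the idea in Python, assuming a SQLite table named tweets and a helper named fetch_page (both hypothetical, not from the original post): the page number and page size map directly onto SQL's LIMIT and OFFSET, so any page can be requested on its own.

```python
import sqlite3


def make_db(n):
    """Create an in-memory tweets table with ids 1..n (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO tweets (id, body) VALUES (?, ?)",
        [(i, f"tweet {i}") for i in range(1, n + 1)],
    )
    return conn


def fetch_page(conn, page, per_page=3):
    """Return the ids on a 1-indexed page using LIMIT/OFFSET."""
    if page < 1:
        raise ValueError("page must be >= 1")
    offset = (page - 1) * per_page
    rows = conn.execute(
        "SELECT id FROM tweets ORDER BY id LIMIT ? OFFSET ?",
        (per_page, offset),
    ).fetchall()
    return [row[0] for row in rows]


conn = make_db(10)
print(fetch_page(conn, 1))  # [1, 2, 3]
print(fetch_page(conn, 4))  # [10], reached without fetching pages 1 to 3
```

Because each request carries everything the server needs (page and page size), this is stateless and trivially parallelizable, which matches the pros listed next.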
Pros:
- You can jump to any particular page; no need to query 99 pages to get page 100.
- It allows sending parallel requests with different pages.
- Stateless on the server-side.
- Easy to understand and debug.
- This approach requires very little business logic. There are a lot of libraries written in different languages which are easy to use.
Cons:
- Bad performance for large OFFSET values in SQL. When doing OFFSET N in SQL, the database needs to scan and count N rows. “The larger the offset, the slower the request is, up until the point that it times out.” (Shopify Blog)
- It can return repeated or missing records if any are added or deleted while paginating. E.g., if the first request asks for page 1 and a new record is then inserted into the first page, the request for page 2 will repeat a record that was already returned by the previous request.
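The duplicate-record problem is easy to reproduce. This is a small sketch against a hypothetical in-memory tweets table sorted newest first: one new tweet arriving between two requests shifts every row down by one position, so the last row of page 1 shows up again on page 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO tweets (id) VALUES (?)", [(i,) for i in range(1, 7)])


def fetch_page(page, per_page=3):
    """Offset pagination, newest first like a timeline."""
    rows = conn.execute(
        "SELECT id FROM tweets ORDER BY id DESC LIMIT ? OFFSET ?",
        (per_page, (page - 1) * per_page),
    ).fetchall()
    return [r[0] for r in rows]


page1 = fetch_page(1)                               # [6, 5, 4]
conn.execute("INSERT INTO tweets (id) VALUES (7)")  # a new tweet arrives
page2 = fetch_page(2)                               # [4, 3, 2]
print(set(page1) & set(page2))                      # {4} was returned twice
```

Deleting a row from page 1 between the two requests has the opposite effect: the rows shift up by one, and one row is skipped entirely.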
Keyset-based pagination
tl;dr: use a key param. Examples: since_id, since_updated_at, since_created_at.
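A minimal sketch of keyset pagination in Python with SQLite, treating since_id as "return rows whose id is greater than this value" (the table and the fetch_since helper are hypothetical). The client passes back the last id it received, and the server answers with a WHERE clause instead of an OFFSET.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO tweets (id, body) VALUES (?, ?)",
    [(i, f"tweet {i}") for i in range(1, 8)],
)


def fetch_since(since_id=0, limit=3):
    """Return up to `limit` ids strictly greater than since_id, in id order."""
    # The primary-key index lets the database seek straight to since_id
    rows = conn.execute(
        "SELECT id FROM tweets WHERE id > ? ORDER BY id LIMIT ?",
        (since_id, limit),
    ).fetchall()
    return [r[0] for r in rows]


# The client walks the set by remembering the last id it saw.
pages = []
since_id = 0
while True:
    page = fetch_since(since_id)
    if not page:
        break
    pages.append(page)
    since_id = page[-1]

print(pages)  # [[1, 2, 3], [4, 5, 6], [7]]
```

Note the loop: there is no way to ask for "page 3" directly, which is exactly the trade-off listed in the cons below.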
Pros:
- The SQL query is more efficient than OFFSET (in most cases) since it uses a WHERE condition (assuming it has good SQL indexes).
- Unlike page-based pagination, new records inserted on previous pages won’t cause duplicated elements.
Cons:
- It’s tied to the sort order. If you want to use since_id, the set must be sorted by id.
- There is no way to jump to a specific page. It needs to iterate through all the prior pages.
- It doesn’t allow sending parallel requests for different batches.
- The API needs to expose multiple key-params (e.g. since_id, since_updated_at).
- The client needs to keep track of the last key value it received.
- Items are missed if they are added to previous pages.
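The "tied to the sort order" con gets sharper when the key is not unique. Below is a sketch, assuming a hypothetical tweets table sorted by since_updated_at with id added as a tie-breaker (the tie-breaker is an assumption of this sketch, not something the post prescribes). Filtering on updated_at alone would silently skip rows 3 and 4, which share a timestamp with row 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, updated_at TEXT)")
# A composite index matching the sort order keeps the WHERE clause cheap
conn.execute("CREATE INDEX idx_updated ON tweets (updated_at, id)")
conn.executemany(
    "INSERT INTO tweets (id, updated_at) VALUES (?, ?)",
    [(1, "2024-01-01"), (2, "2024-01-02"), (3, "2024-01-02"),
     (4, "2024-01-02"), (5, "2024-01-03")],
)


def fetch_since_updated(since_updated_at="", since_id=0, limit=2):
    """Keyset page ordered by (updated_at, id); id breaks timestamp ties."""
    return conn.execute(
        """
        SELECT id, updated_at FROM tweets
        WHERE updated_at > ? OR (updated_at = ? AND id > ?)
        ORDER BY updated_at, id
        LIMIT ?
        """,
        (since_updated_at, since_updated_at, since_id, limit),
    ).fetchall()


pages = []
key = ("", 0)  # start before every row
while True:
    rows = fetch_since_updated(*key)
    if not rows:
        break
    pages.append([row_id for row_id, _ in rows])
    last_id, last_updated = rows[-1]
    key = (last_updated, last_id)  # the client must echo BOTH key params

print(pages)  # [[1, 2], [3, 4], [5]]
```

This also illustrates why the API ends up exposing multiple key params: the client has to send back both the timestamp and the id.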
Cursor-based pagination
tl;dr: A cursor is a piece of data that contains a pointer to an element and the information needed to get the next/previous elements. The server should return the cursor pointing to the next page in each request. In most cases, the cursor is opaque, so users cannot manipulate it.
Clients should not store the cursor on their side. The Google API documentation suggests adding an expiration date to the token and rejecting expired cursors sent in requests.
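One way to build such a token, sketched in Python under stated assumptions: the cursor is JSON holding a hypothetical last_id position plus an expiry, signed with HMAC-SHA256 using a server-side secret, then base64url-encoded. The secret, the TTL, and the field names are all made up for illustration. Signing stops clients from manipulating the cursor, but base64 is not encryption, so a curious client can still read the payload; encrypt it if the contents must stay hidden.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"server-side-secret"  # hypothetical; load from config in practice
TTL_SECONDS = 3600              # hypothetical cursor lifetime
SIG_LEN = 32                    # SHA-256 digest size in bytes


def encode_cursor(last_id, now=None):
    """Pack the position and an expiry time into a signed, opaque token."""
    now = time.time() if now is None else now
    payload = json.dumps({"last_id": last_id, "exp": now + TTL_SECONDS}).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(payload + sig).decode()


def decode_cursor(token, now=None):
    """Verify signature and expiry; return last_id or raise ValueError."""
    now = time.time() if now is None else now
    raw = base64.urlsafe_b64decode(token)  # bad base64 raises binascii.Error (a ValueError)
    payload, sig = raw[:-SIG_LEN], raw[-SIG_LEN:]
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("invalid cursor")
    data = json.loads(payload)
    if now > data["exp"]:
        raise ValueError("expired cursor")
    return data["last_id"]


token = encode_cursor(42, now=1000)
print(decode_cursor(token, now=1500))  # 42
```

Because the client only ever sees the token, the server could later switch from last_id to, say, a (timestamp, id) pair without changing the API surface.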
Pros:
- If the cursor is opaque, the implementation underneath can change without having to introduce an API change.
- In SQL, in most cases it is much faster than page-based pagination since it doesn’t use OFFSET in the database.
- There is no issue when a record is deleted, as opposed to page-based pagination.
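The deletion advantage can be checked side by side. In this sketch (hypothetical tweets table; the cursor simply wraps the last seen id, which is one common way to build cursors but an assumption here, and it is left unsigned for brevity), a row on page 1 is deleted between requests. Offset pagination then skips a row, while the cursor resumes exactly where it left off.

```python
import base64
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO tweets (id) VALUES (?)", [(i,) for i in range(1, 10)])


def encode(last_id):
    # Unsigned for brevity; a real cursor should be signed or encrypted
    return base64.urlsafe_b64encode(json.dumps({"last_id": last_id}).encode()).decode()


def decode(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor))["last_id"]


def fetch_offset(page, per_page=3):
    rows = conn.execute(
        "SELECT id FROM tweets ORDER BY id LIMIT ? OFFSET ?",
        (per_page, (page - 1) * per_page),
    ).fetchall()
    return [r[0] for r in rows]


def fetch_cursor(cursor=None, per_page=3):
    """Return (ids, next_cursor); next_cursor is None once a page comes back empty."""
    last_id = decode(cursor) if cursor else 0
    rows = conn.execute(
        "SELECT id FROM tweets WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, per_page),
    ).fetchall()
    ids = [r[0] for r in rows]
    return ids, (encode(ids[-1]) if ids else None)


offset_page1 = fetch_offset(1)                   # [1, 2, 3]
cursor_page1, cursor = fetch_cursor()            # [1, 2, 3]
conn.execute("DELETE FROM tweets WHERE id = 2")  # a row on page 1 disappears
offset_page2 = fetch_offset(2)                   # [5, 6, 7], so 4 is skipped
cursor_page2, _ = fetch_cursor(cursor)           # [4, 5, 6]
```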
Cons:
- There is no way to skip pages. If the user wants page X, it needs to request pages from 1 to X.
- It doesn’t allow sending parallel requests for different batches.
- The implementation is more complex than LIMIT/OFFSET.
- Hard to debug. Given a request, you have to decode the cursor to see what it’s doing.
- Items are missed if they are added to previous pages.
That's three different ways to paginate your endpoints!
I highly recommend reading the post Paginating requests in API, by Ignacio Chiazzo, to see how other major companies are implementing it!