How to Paginate (the right way) in SQL

Many ways to scroll through a table. Easy to misuse them.

Posted by Tariq Abughofa on September 25, 2019 · 8 mins read #pagination #sql

Server-side pagination is a commonly-used feature in SQL databases. It helps when showing a huge set of results on user interfaces, it’s required in RESTful APIs, and it comes to the rescue whenever you need to process large data in bulk and it doesn’t fit in memory. The problem is that if it’s done wrong it can be as inefficient as loading the full set or more.

There is a number of ways to implement pagination with SQL systems. I will go through the methods in this article with the advantages, disadvantages, and the reasonable scenarios of using each one.

Client-side Pagination:

The requested query is passed as it is without pagination. After the client gets the full set of data, it divides the data and shows it paginated to the user.

Advantages

It reduces the number of requests to the database. The data can even be cached on the client-side to avoid future requests which makes later loads even faster.

Disadvantages

Not suitable for regularly changing data. The dataset should be small to fit in memory and not painfully slow the initial load.

Suitable Scenarios

Small data sets which are not regularly updated and needed frequently such as the categories.

Using Limit/Offset:

The limit and offset are standard SQL keywords and the first solution that comes to mind for pagination over datasets. The query syntax would look something like this:

SELECT *
  FROM products
 ORDER BY sale_date DESC
 WHERE sold = TRUE
 LIMIT 10 OFFSET 30;
Advantages

Easy to implement which made many famous ORM solution use it. It’s as easy as chaining the query function like this:

Product.limit(10).offset(30).findAll()

It also allows you to filter the table while pagination with the WHERE statement. You can sort on any column and it still works. Which makes the query very customizable. Not the whole data is loaded into memory which means no running out of memory.

Disadvantages

It can be HORRIBLE. The query performance goes downhill as the offset value increases. Let’s say you are fetching page n. How would offset skip the first n-1? It wouldn’t. It has to linearly scan the table for the first n-1 and then load page n.

An offset means that a certain number of records will be skipped from the start, but what if new data where inserted in page n-1 while page n is being loaded? rows from page n-1 will be pushed into page n causing inconsistency, and worst; if the data is being updated by the paginating process, rows might be processed multiple times causing data inconsistency.

Suitable Scenarios

Great for user interfaces, easy to implement and very customizable to different filter/order preferences. As long as the deepness of the search results is limited. Perfect for pages where users paginate down the results with scrolling. Have this functionality on a large dataset with a “last page” button showing up on the interface and embrace yourself for the a database server out of service.

For migrations and large dataset processing, my advice: Offset the offset statement.

Cursors

SQL cursors makes the server do the pagination for you. All what you need to worry about is the query you want to paginate. It looks something like this:

-- Create a cursor for the query and open it
DECLARE cur CURSOR FOR SELECT * FROM products;
OPEN curEmail;
-- Retrieve ten rows each time
FETCH 10 FROM cur;
FETCH 10 FROM cur;
-- All done
CLOSE cur;
Advantages

Supports arbitrary queries. No performance drop when going further in the pages.

Disadvantages

the implementation details is different from an SQL engine to the other but in general they held resources on the server and create locks. Clients can’t share cursors and thus they would have to open each it’s own which mean they can’t share the same pagination.

Suitable Scenarios

One client which needs to paginate over large-sets of data and cares about pagination consistency. There are many types that can suite different applications: READ_ONLY which avoids locks on the table, STATIC which copies the result into a temporary table which is good when updates doesn’t matter, KEYSET which only copies the primary keys to provide you with updates but not with new rows, DYNAMIC same as KEYSET but primary keys copy is updated so it can also see newly inserted and deleted rows. …etc. the availability of these solutions depends on the engine.

Key-based Pagination

This technique is my favourite for data migrations. You need an indexed column to use it but that introduces great optimization while staying stateless on the server side.

The first page is fetched with the following statement:

SELECT *
FROM products
ORDER BY id ASC
LIMIT 1000;

For the second page, the query uses the maximum id value fetched in the first page. Let’s say it’s 1000, the query is as follows:

SELECT *
FROM products
WHERE id > 1000
ORDER BY id ASC
LIMIT 1000;

An so on for the next pages.

Advantages

Offers great scalability: It’s as fast for the 1,000,000th page as for the first page. Also pagination consistency is preserved. It supports filtering and ordering on any column or multiple columns as long as you have indices on them.

Disadvantages

In general, there is no way to jump to a specific page. However, If you have an auto incremented identifier in the dataset, that can be done with some simple calculations. To retrieve page n, the lower bound for the query would be like this: n * 1000 where 1000 is the page size or the limit.

Suitable Scenarios

Mostly any one. Very convenient for scalable applications where a lot of server requests are expected.


Card image cap
10 Books Every Programmer Should Read

If you’ve just decided to become a programmer or you are already in the path...

Card image cap
How to Ace the Job interview? - Search Algorithms

Search algorithms are designed to check or retrieve an element from a data structure. The...