PDF streaming
If one is using the web server as an online repository of references, chances are PDF is one of the main document formats.
Freshly created PDFs typically do not support streaming, because object references are stored in the ''xref'' table located at the end of the file. The PDF 1.2 specification introduced the concept of "Linearized PDF", which lets the browser send byte-range requests to load specific pages of the PDF. The feature is marketed as "Fast Web View". Such files define a linearization dictionary near the start of the file.
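As a quick illustration of the linearization dictionary, here is a minimal Python sketch of a heuristic check. It relies on the rule that the linearization parameter dictionary must lie within the first 1024 bytes of the file. The function names are made up for this note, and the check only looks for the /Linearized key; it does not validate the hint tables, so it is no substitute for a real checker.

```python
def looks_linearized(data: bytes) -> bool:
    """Heuristic: is there a /Linearized key in the first 1024 bytes?"""
    head = data[:1024]
    # A PDF starts with a "%PDF-" header; the linearization dictionary
    # is expected to follow it right at the start of the file.
    return head.startswith(b"%PDF-") and b"/Linearized" in head


def file_looks_linearized(path: str) -> bool:
    """Apply the same heuristic to a file on disk (reads only 1024 bytes)."""
    with open(path, "rb") as fh:
        return looks_linearized(fh.read(1024))
```

Calling file_looks_linearized("some.pdf") (a hypothetical path) reads only the head of the file, so it stays cheap even for very large documents.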
Linearization itself is easily done with qpdf (available on Ubuntu PPA as qpdf):
qpdf --linearize [INPUT.PDF] [OUTPUT.PDF]
To check if PDF is linearized:
qpdf --check-linearization [INPUT.PDF]
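For a whole directory of references, the qpdf call above can be wrapped in a small script. The following is only a sketch: the helper names, the flat directory layout, and the dry-run default are assumptions made for this note. It builds one qpdf --linearize command per PDF and runs them only when asked to.

```python
import subprocess
from pathlib import Path


def linearize_command(src: Path, out_dir: Path) -> list[str]:
    """Build the qpdf invocation for one file, writing into out_dir."""
    return ["qpdf", "--linearize", str(src), str(out_dir / src.name)]


def linearize_all(src_dir: Path, out_dir: Path, dry_run: bool = True) -> list[list[str]]:
    """Build (and optionally run) one command per *.pdf directly in src_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    commands = [linearize_command(p, out_dir) for p in sorted(src_dir.glob("*.pdf"))]
    if not dry_run:
        for cmd in commands:
            # Assumes qpdf is on PATH; a non-zero return code is left
            # for the caller to inspect rather than raised here.
            subprocess.run(cmd)
    return commands
```

With dry_run left at True the function only returns the command lists, which makes it easy to review what would be executed before touching any files.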
Incidentally, the default PDF viewer in Ubuntu 22.04 LTS (evince) does not support this feature. One can opt to use qpdfview instead (also on Ubuntu PPA).
On the server side, thankfully, there is little to configure other than try_files: nginx sends the header Accept-Ranges: bytes by default. Ensure that the PDF file is readable by the user running the web server (typically www-data).
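To confirm that the server actually honours byte-range requests, one can send a request with a Range header and look for a 206 Partial Content reply. The sketch below uses only the Python standard library; the function names and the 1024-byte probe size are arbitrary choices for this note, and the URL must be supplied by the reader.

```python
import urllib.request


def range_probe_request(url: str, first: int = 0, last: int = 1023) -> urllib.request.Request:
    """Build a GET request asking only for bytes first..last (inclusive)."""
    return urllib.request.Request(url, headers={"Range": f"bytes={first}-{last}"})


def honours_ranges(status: int, headers: dict) -> bool:
    """True if the reply looks like a proper partial-content response."""
    names = {name.lower() for name in headers}
    return status == 206 and "content-range" in names


def probe(url: str) -> bool:
    """Send the probe over the network and evaluate the reply."""
    with urllib.request.urlopen(range_probe_request(url)) as resp:
        return honours_ranges(resp.status, dict(resp.headers.items()))
```

A server that ignores the Range header will typically answer 200 with the full body, in which case probe returns False.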
With a linearized file, the default PDF viewer in Chrome will display "Fast web view: Yes".
Chrome's default PDF viewer does not handle linearized PDFs properly, though: PDF load times are still pretty high. It seems that the SingleFile extension hooks in and takes over loading of the entire PDF.
The PDF.js Chrome extension, in contrast, handles this correctly with 65 kB chunks; the response to the first ranged request carried the header Content-Range: bytes 248971264-249036799/249098349, corresponding to page 100 of 696.
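The Content-Range value quoted above can be unpacked to see where the "65 kB" figure comes from. This is a small worked example in Python; the parser is a simplified sketch that only accepts the plain bytes first-last/total form.

```python
import re


def parse_content_range(value: str) -> tuple[int, int, int]:
    """Split 'bytes first-last/total' into three integers."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range value: {value!r}")
    first, last, total = map(int, match.groups())
    return first, last, total


first, last, total = parse_content_range("bytes 248971264-249036799/249098349")
chunk_size = last - first + 1      # both ends are inclusive: 65536 bytes
chunk_index = first // chunk_size  # 3799, counting chunks from zero
```

The chunk is exactly 65,536 bytes (64 KiB), which is the roughly 65 kB seen in the network log, and its start offset is an exact multiple of that size.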
If the PDF still doesn't load dynamically, consider adding the following configuration to Nginx:
# Explicitly advertise byte-range support
add_header Accept-Ranges bytes;
# Force byte-range support regardless of cache status
proxy_force_ranges on;
# Explicitly specify the file type
default_type application/pdf;