We will explain how this works with the new open source image format JPEG XL, but we’ll start by taking a step back and describing how images are currently delivered and rendered on the web.
How partial images are displayed on the web
It’s important that web sites including images load quickly, because waiting for images to load causes frustration. Two techniques in particular are used to make images appear fast: One is showing an approximation of the image before all bytes of the image are transmitted, often known as “progressive image loading.” Another is making the byte size of the image smaller by using strong image compression.
What is progressive image loading?
Some image formats are implemented in a way that does not allow any kind of progressive image loading; all the bytes of the image have to be received before rendering can begin. The next, most simple, type of image loading is sometimes called “sequential image loading.” For these images, the data is organized in a way that pixels come in a particular order, typically in rows and from top to bottom.Formats with this kind of image loading include PNG, webp, and JPEG. The JPEG format allows more sophisticated forms of progressive images. Here, we can organize the data so that it comes in multiple scans, with each scan showing more detail than the previous one.
For example, even if only approximately 15% of the data for an image is loaded, it often already has decent results. See the following images comparing no progression:
15% of bytes loaded, no progressive image loading
15% of bytes loaded, sequential image loading
15% of bytes loaded, progressive JPEG
Progressive JPEG XLs
While the quality (and therefore byte-sizes) of the individual scans in a progressive JPEG image can be controlled, the order within a scan is still top to bottom, like in a sequential JPEG. JPEG XL goes beyond that by making it possible to send the data necessary to display all details of the most salient parts first, followed by the less salient parts. For example, in a portrait, we can decide to first send the bytes for the face, and then, for the out-of-focus background.In general, progressive JPEG XL works in the following way:
- There is always an 8x8 downsampled image available (similar to a DC-only scan in a progressive JPEG). The decoder can display that with a nice upsampling, which gives the impression of a smoothed version of the image.
- The image is divided into square groups (typically of size 256 x 256) and it is possible to provide an order of these groups during encoding. In particular, we can order the groups by saliency and choose an order that anticipates where the viewer might look first, while not being disturbing.
(a) JPEG XL image only
(b) JPEG XL image compared to sequential jpeg
(c) JPEG XL image compared with progressive jpeg (with 3 scans)
(d) JPEG XL image compared to no progression (grey image until the end)
How to find good saliency maps for images
Saliency prediction models (overview) aim at predicting which regions in an image will attract human attention. To predict saliency effectively, our model leverages the power of deep neural nets to consider both high level semantic signals like face, objects, shapes etc., as well as low or medium level signals like color, intensity, texture, and so on. The model is trained on a large scale public gaze/saliency data set, to make sure the predicted saliency best mimics human gaze/fixation behaviour on each image. The model takes an image as the input and output a saliency map, which can serve as a visual importance map, and hence help determine the decoding order for each region in the image. Example images and their predicted saliency are as follows:
Different users have different experiences when it comes to looking at images loading on the web.We hope that this way of progressively delivering images will improve user experience especially on lower-bandwidth connections.
By Moritz Firsching and Junfeng He – Google Research