What is the point of this format? How is it any better than png or webp? Do we really need yet another format? I mean 44k really isn’t that great of a savings in the example used.
The article discusses how it’s better than webp. Specifically, it’s much better at both compression ratios and performance, at all quality levels. WebP has problems where the compression falls off due to being locked to yuv420
JPEG XL provides comparable image quality to ordinary JPEG compression at around 80% of the file size. It also supports lossless encoding at smaller sizes than PNG, and can handle layers, transparency and CMYK, so in principle it could conveniently replace almost every existing raster image format.
So I agree with your sentiment for the most part. Mainly, it’s frustrating to see all of these new image standards come out which somehow compete with each other due to lack of browser support.
That said 44k isn’t peanuts. That’s a huge reduction, especially on lower end connection speeds.
it has the best lossy image compression (not counting extremely low bitrate images, where AVIF starts to win)
it can losslessly recompress JPEGs for a free 20% space savings - no image quality loss
it supports parallel decoding for extra speed
it supports progressive decoding (viewing a lower quality version of the image while it loads), unlike WebP/AVIF which just “pop up” when you’ve downloaded the whole thing
it supports lossless
it compresses lossless extremely well (notably unlike AVIF and PNG which fall on their face with lossless compression)
it supports animation (though AVIF is generally a better format for animation, because it’s based on a proper video codec)
it supports HDR
it has a very strong resilience against generation loss (the classic “JPEG degradation” of resaving images)
it is royalty-free
it otherwise has roughly every image format feature we’ve ever thought of included in its spec
If JXL is not the next image format then we will never ever get rid of JPEG and PNG. There has never been a more obviously superior image format in history.
You don’t need to use the high compression profiles to get good performance though. If you have a usecase where you are resource limited you should stick to effort levels 5-7 for very little loss in quality, or even 3-4 for lightning quick speed (the default is effort 7). Reference this benchmark against AVIF for effort values vs. speed (SSIMULACRA 2 is a deterministic psychovisual metric - higher is better).
Also, an important consideration in this realm is that JXL makes really clever use of variable-DCT (how big a chunk is) and adaptive quantization (what quality should be used for that chunk), allowing “quality levels” that you specify to be much more visually consistent across every image, instead of other codecs that make some images look bad at quality level 90 and some images look good at level 70. This allows you to select a consistent quality level and lower your encoding effort to compensate, instead of needing to always drive a high quality+effort level to account for every region in a picture looking good.
(If you want a slightly deeper dive into JXL’s performance, this is a concise post on various metrics)
With effort level 7 you should be getting images roughly 2/3’s of the size of the original PNG on average (assuming the PNG is already properly optimized). I would try again with at least effort levels 3, 4, 5, and 7. Also consider that PNGs need very expensive CPU time to properly compress them, using a tool like oxipng.
What sort of balance are you looking for with regards to filesize and encode time? At the very least, effort levels 1 through 3 will probably still give you better results than PNG while being ridiculously quick, so there shouldn’t be any configuration where PNG is a better choice than JXL with regards to speed.
No one uses hardware decoding for images - it’s just not a good fit for the reality of how we use images. Images are small and easy to decode, whereas starting up a hardware decoder takes a non-trivial amount of time. Additionally, GPU decoders only work single-threaded, so each image would have to be decoded one by one, instead of all at once like with CPU decoding. This was already attempted with VP8/WebP and they gave up trying to make it any good. Videos are good candidates for hardware decoding since they’re large and you’re only looking at one at a time.
If you have benchmarks or some proof showing otherwise by all means post here.
It is when you’re a cloud hosting platform and you have 1000’s of photos uploaded daily. That 44k saving scales massively when talking about cloud hosting platforms. The jpeg xl format license is more open than webp which is controlled by google.
The new format also enables more features than just file size, a quick google shows it supports animation, 360 photos, and image bursts (as well as more technical specifics that allow for better share ability without needing to have an accompanying json file or dropping to RAW).
This is more important because it means websites can embed photos and the web engine whether it be chromium, Firefox, or safari can handle it natively without needing JavaScript or some other intermediary.
What about png? It’s just another competing standard. At the end of the day it doesn’t really matter, but by not having competing standards we end up having one company controlling it. So since at the very least it gives a decent file size saving it’s good enough for me.
Even better, this must be fantastic when you’re training AI models with millions of images. The compression level AND performance should be a game changer.
Hmm, I haven’t delved into image training in a couple years so I’m assuming they still downscale images anyway, so I’m not sure how much the format helps? Do you know if better compression helps at lower resolution? I could see it helping but I could also seeing it be marginal gains and depending on processing time it might not be worth it to convert whole image sets to jpeg xl. And for performance does jpeg xl require less power/time to decode than other formats? Maybe for new image sets going forward it will be the standard.
Oh, I’ve just been toying around with Stable Diffusion and some general ML tidbits. I was just thinking from a practical point of view. From what I read, it sounds like the files are smaller at the same quality, require the same or less processor load (maybe), are tuned for parallel I/O, can be encoded and decoded faster (and there being less difference in performance between the two), and supports progressive loading. I’m kinda waiting for the catch, but haven’t seen any major downsides, besides less optimal performance for very low resolution images.
I don’t know how they ingest the image data, but I would assume they’d be constantly building sets, rather than keeping lots of subsets, if just for the space savings of de-duplication.
(I kinda ramble below, but you’ll get the idea.)
Mixing and matching the speed/efficiency and storage improvement could mean a whole bunch of improvements. I/O is always an annoyance in any large set analysis. With JPEG XL, there’s less storage needed (duh), more images in RAM at once, faster transfer to and from disc, fewer cycles wasted on waiting for I/O in general, the ability to store more intermediate datasets and more descriptive models, easier to archive the raw photo sets (which might be a big deal with all the legal issues popping up), etc. You want to cram a lot of data into memory, since the GPU will be performing lots of operations in parallel. Accessing the I/O bus must be one of the larger time sinks and CPU load becomes a concern just for moving data around.
I also wonder if the support for progressive loading might be useful for more efficient, low resolution variants of high resolution models. Just store one set of high res images and load them in progressive steps to make smaller data sets. Like, say you have a bunch of 8k images, but you only want to make a website banner based on the model from those 8k res images. I wonder if it’s possible to use the the progressive loading support to halt reading in the images at 1k. Lower resolution = less model data = smaller datasets to store or transfer. Basically skipping the downsampling.
Any time I see a big feature jump, like better file size, I assume the trade off in another feature negates at least half the benefit. It’s pretty rare, from what I’ve seen, to have improvements on all fronts.
JXL has been ready for practical use for a while now - the only place where JXL support is still missing is browsers (due to Google’s politically-motivated removal from chromium). I’m not sure if anyone has tried using JXL with ML, but it’s certainly ready to be tested right now. IMO JXL has been ready since their libJXL 0.7.0 release, which happened September 2022 last year. They’re still working towards a 1.0 but every image-related application has built-in support for JXL already and it can more or less be considered ready.
haven’t seen any major downsides, besides less optimal performance for very low resolution images
Just to note here, to be precise AVIF starts (barely) winning at low fidelity ranges, not low resolution. Meaning if you want a blurry mess that looks like this, AVIF will compress slightly better (that’s an actual AVIF converted to PNG by the way).
At the risk of sounding like sour grapes, this compression advantage doesn’t truly matter. This level of compression is almost never used, and even if it was, even drastic relative filesize savings would ultimately amount to bytes/kilobytes in the grand scheme of all images you’re serving. It’s more impactful to compress large images simply because they are larger. Smaller images are already small and efficiency deltas in a 1kB vs 1.1kB image are meaningless compared to a 600kB vs 800kB image.
I also wonder if the support for progressive loading might be useful for more efficient, low resolution variants of high resolution models. Just store one set of high res images and load them in progressive steps to make smaller data sets. Like, say you have a bunch of 8k images, but you only want to make a website banner based on the model from those 8k res images. I wonder if it’s possible to use the the progressive loading support to halt reading in the images at 1k
I’m not fully confident on this aspect but I’m pretty sure that JXL supports more than just traditional progressive decoding - you can actually pull “complete” images out of the bitstream from arbitrary ranges. Meaning you could efficiently store a full range of quality options in just one image, then serve them on the fly.
Any time I see a big feature jump, like better file size, I assume the trade off in another feature negates at least half the benefit. It’s pretty rare, from what I’ve seen, to have improvements on all fronts.
JXL is self-described “alien technology from the future”, and it was made by a “dream team” of image engineers who have had a hand in just about every image codec and compression technique from our past. It also benefits from being a real image codec, whereas every recent image format that has gained widespread adoption has been derived from a video codec (WebP, AVIF, HEIC).
The only truly useful thing it doesn’t perform best-in-class at is animation encoding (losing to AVIF because it’s based on the amazing AV1 video codec), and I would honestly recommend just serving AV1 videos instead, and skipping image formats entirely.
A neutral aspect of JXL is that it does worse in single-core decode speed compared to JPEG (which is disgustingly fast), but JXL can be parallelized whereas JPEG cannot. This is ultimately an advantage for JXL for general usecase where users have at least 4 cores available, but for large-scale distributed processing I imagine this property of JPEG may still have an edge use-case?
If you’re curious about the technical aspects of JXL, I recommend reading their official slidedeck. The nitty-gritty details start at page 59, but the whole thing is a good read.
What is the point of this format? How is it any better than png or webp? Do we really need yet another format? I mean 44k really isn’t that great of a savings in the example used.
The article discusses how it’s better than webp. Specifically, it’s much better at both compression ratios and performance, at all quality levels. WebP has problems where the compression falls off due to being locked to yuv420
JPEG XL provides comparable image quality to ordinary JPEG compression at around 80% of the file size. It also supports lossless encoding at smaller sizes than PNG, and can handle layers, transparency and CMYK, so in principle it could conveniently replace almost every existing raster image format.
So I agree with your sentiment for the most part. Mainly, it’s frustrating to see all of these new image standards come out which somehow compete with each other due to lack of browser support.
That said 44k isn’t peanuts. That’s a huge reduction, especially on lower end connection speeds.
A shortlist:
If JXL is not the next image format then we will never ever get rid of JPEG and PNG. There has never been a more obviously superior image format in history.
This might help: Image format comparison table
It’s very slow on high compression profiles though, and consumes a lot of resources.
You don’t need to use the high compression profiles to get good performance though. If you have a usecase where you are resource limited you should stick to effort levels 5-7 for very little loss in quality, or even 3-4 for lightning quick speed (the default is effort 7). Reference this benchmark against AVIF for effort values vs. speed (SSIMULACRA 2 is a deterministic psychovisual metric - higher is better).
Also, an important consideration in this realm is that JXL makes really clever use of variable-DCT (how big a chunk is) and adaptive quantization (what quality should be used for that chunk), allowing “quality levels” that you specify to be much more visually consistent across every image, instead of other codecs that make some images look bad at quality level 90 and some images look good at level 70. This allows you to select a consistent quality level and lower your encoding effort to compensate, instead of needing to always drive a high quality+effort level to account for every region in a picture looking good.
(If you want a slightly deeper dive into JXL’s performance, this is a concise post on various metrics)
I tried getting benefit from the format by recompressing PNGs at some point and it just seemed worthless due to reasons I listed in my comment.
With effort level 7 you should be getting images roughly 2/3’s of the size of the original PNG on average (assuming the PNG is already properly optimized). I would try again with at least effort levels 3, 4, 5, and 7. Also consider that PNGs need very expensive CPU time to properly compress them, using a tool like oxipng.
What sort of balance are you looking for with regards to filesize and encode time? At the very least, effort levels 1 through 3 will probably still give you better results than PNG while being ridiculously quick, so there shouldn’t be any configuration where PNG is a better choice than JXL with regards to speed.
Graph conveniently omits hardware support, where avif (av1) is supported across the board
No one uses hardware decoding for images - it’s just not a good fit for the reality of how we use images. Images are small and easy to decode, whereas starting up a hardware decoder takes a non-trivial amount of time. Additionally, GPU decoders only work single-threaded, so each image would have to be decoded one by one, instead of all at once like with CPU decoding. This was already attempted with VP8/WebP and they gave up trying to make it any good. Videos are good candidates for hardware decoding since they’re large and you’re only looking at one at a time.
If you have benchmarks or some proof showing otherwise by all means post here.
It is when you’re a cloud hosting platform and you have 1000’s of photos uploaded daily. That 44k saving scales massively when talking about cloud hosting platforms. The jpeg xl format license is more open than webp which is controlled by google.
The new format also enables more features than just file size, a quick google shows it supports animation, 360 photos, and image bursts (as well as more technical specifics that allow for better share ability without needing to have an accompanying json file or dropping to RAW).
This is more important because it means websites can embed photos and the web engine whether it be chromium, Firefox, or safari can handle it natively without needing JavaScript or some other intermediary.
What about png? It’s just another competing standard. At the end of the day it doesn’t really matter, but by not having competing standards we end up having one company controlling it. So since at the very least it gives a decent file size saving it’s good enough for me.
Even better, this must be fantastic when you’re training AI models with millions of images. The compression level AND performance should be a game changer.
Hmm, I haven’t delved into image training in a couple years so I’m assuming they still downscale images anyway, so I’m not sure how much the format helps? Do you know if better compression helps at lower resolution? I could see it helping but I could also seeing it be marginal gains and depending on processing time it might not be worth it to convert whole image sets to jpeg xl. And for performance does jpeg xl require less power/time to decode than other formats? Maybe for new image sets going forward it will be the standard.
Oh, I’ve just been toying around with Stable Diffusion and some general ML tidbits. I was just thinking from a practical point of view. From what I read, it sounds like the files are smaller at the same quality, require the same or less processor load (maybe), are tuned for parallel I/O, can be encoded and decoded faster (and there being less difference in performance between the two), and supports progressive loading. I’m kinda waiting for the catch, but haven’t seen any major downsides, besides less optimal performance for very low resolution images.
I don’t know how they ingest the image data, but I would assume they’d be constantly building sets, rather than keeping lots of subsets, if just for the space savings of de-duplication.
(I kinda ramble below, but you’ll get the idea.)
Mixing and matching the speed/efficiency and storage improvement could mean a whole bunch of improvements. I/O is always an annoyance in any large set analysis. With JPEG XL, there’s less storage needed (duh), more images in RAM at once, faster transfer to and from disc, fewer cycles wasted on waiting for I/O in general, the ability to store more intermediate datasets and more descriptive models, easier to archive the raw photo sets (which might be a big deal with all the legal issues popping up), etc. You want to cram a lot of data into memory, since the GPU will be performing lots of operations in parallel. Accessing the I/O bus must be one of the larger time sinks and CPU load becomes a concern just for moving data around.
I also wonder if the support for progressive loading might be useful for more efficient, low resolution variants of high resolution models. Just store one set of high res images and load them in progressive steps to make smaller data sets. Like, say you have a bunch of 8k images, but you only want to make a website banner based on the model from those 8k res images. I wonder if it’s possible to use the the progressive loading support to halt reading in the images at 1k. Lower resolution = less model data = smaller datasets to store or transfer. Basically skipping the downsampling.
Any time I see a big feature jump, like better file size, I assume the trade off in another feature negates at least half the benefit. It’s pretty rare, from what I’ve seen, to have improvements on all fronts.
JXL has been ready for practical use for a while now - the only place where JXL support is still missing is browsers (due to Google’s politically-motivated removal from chromium). I’m not sure if anyone has tried using JXL with ML, but it’s certainly ready to be tested right now. IMO JXL has been ready since their libJXL 0.7.0 release, which happened September 2022 last year. They’re still working towards a 1.0 but every image-related application has built-in support for JXL already and it can more or less be considered ready.
Just to note here, to be precise AVIF starts (barely) winning at low fidelity ranges, not low resolution. Meaning if you want a blurry mess that looks like this, AVIF will compress slightly better (that’s an actual AVIF converted to PNG by the way).
At the risk of sounding like sour grapes, this compression advantage doesn’t truly matter. This level of compression is almost never used, and even if it was, even drastic relative filesize savings would ultimately amount to bytes/kilobytes in the grand scheme of all images you’re serving. It’s more impactful to compress large images simply because they are larger. Smaller images are already small and efficiency deltas in a 1kB vs 1.1kB image are meaningless compared to a 600kB vs 800kB image.
I’m not fully confident on this aspect but I’m pretty sure that JXL supports more than just traditional progressive decoding - you can actually pull “complete” images out of the bitstream from arbitrary ranges. Meaning you could efficiently store a full range of quality options in just one image, then serve them on the fly.
JXL is self-described “alien technology from the future”, and it was made by a “dream team” of image engineers who have had a hand in just about every image codec and compression technique from our past. It also benefits from being a real image codec, whereas every recent image format that has gained widespread adoption has been derived from a video codec (WebP, AVIF, HEIC).
The only truly useful thing it doesn’t perform best-in-class at is animation encoding (losing to AVIF because it’s based on the amazing AV1 video codec), and I would honestly recommend just serving AV1 videos instead, and skipping image formats entirely.
A neutral aspect of JXL is that it does worse in single-core decode speed compared to JPEG (which is disgustingly fast), but JXL can be parallelized whereas JPEG cannot. This is ultimately an advantage for JXL for general usecase where users have at least 4 cores available, but for large-scale distributed processing I imagine this property of JPEG may still have an edge use-case?
If you’re curious about the technical aspects of JXL, I recommend reading their official slidedeck. The nitty-gritty details start at page 59, but the whole thing is a good read.