- cross-posted to:
- sysadmin@lemmy.world
This article will describe how to download an image from a (docker) container registry.
Manual Download of Container Images with wget and curl
Intro
Remember the good ol’ days when you could just download software by visiting a website and clicking “download”?
Even apt and yum repositories were just simple HTTP servers that you could just curl (or wget) from. Using the package manager was, of course, more secure and convenient – but you could always just download packages manually, if you wanted.
But have you ever tried to curl an image from a container registry, such as Docker Hub? Well friends, I have tried. And I have the scars to prove it.
It was a remarkably complex process that took me weeks to figure out. Lucky you, this article will break it down.
Examples
Specifically, we’ll look at how to download files from two OCI registries.
Terms
First, here’s some terminology used by OCI:
- OCI - Open Container Initiative
- blob - A “blob” in the OCI spec just means a file
- manifest - A “manifest” in the OCI spec means a list of files
Prerequisites
This guide was written in 2024, and it uses the following software and versions:
- debian 12 (bookworm)
- curl 7.88.1
- OCI Distribution Spec v1.1.0 (which, unintuitively, uses the ‘/v2/’ endpoint)
Of course, you’ll need ‘curl’ installed. And, to parse JSON, ‘jq’ too.
sudo apt-get install curl jq
What is OCI?
OCI stands for Open Container Initiative.
OCI was originally formed in June 2015 by Docker, CoreOS, and others. Today it’s a wider, general-purpose (and annoyingly complex) way that many projects host files (that are extremely non-trivial to download).
One does not simply download a file from an OCI-compliant container registry. You must:
- Generate an authentication token for the API
- Make an API call to the registry, requesting to download a JSON “Manifest”
- Parse the JSON Manifest to figure out the hash of the file that you want
- Determine the download URL from the hash
- Download the file (which might actually be many distinct file “layers”)
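The five steps above can be sketched in shell roughly as follows. This is a minimal sketch, not the full procedure from the linked examples. The token URL, registry host, repository name, and reference are all caller-supplied placeholders. The "token" field in the auth response and the "layers" array in the manifest assume a bearer-token registry that returns a single-image manifest, and blob_filename is a hypothetical helper. Nothing touches the network until fetch_layers is called.

```shell
#!/bin/sh
# Sketch only: every host, repository name, and URL is supplied by the caller.

# Hypothetical helper: turn "sha256:abc..." into a filename-safe "sha256_abc...".
blob_filename() {
  printf '%s\n' "$1" | tr ':' '_'
}

# Usage: fetch_layers <token_url> <registry_host> <name> <reference>
fetch_layers() {
  token_url=$1; registry=$2; name=$3; reference=$4

  # Step 1: generate an authentication token.
  # ASSUMPTION: the auth service answers with JSON containing a "token" field.
  token=$(curl -fsSL "$token_url" | jq -er '.token') || return 1

  # Step 2: request the JSON manifest (endpoint end-3 in the distribution spec).
  manifest=$(curl -fsSL \
    -H "Authorization: Bearer $token" \
    -H 'Accept: application/vnd.oci.image.manifest.v1+json' \
    "https://$registry/v2/$name/manifests/$reference") || return 1

  # Step 3: parse the manifest for the digest (hash) of each layer.
  # ASSUMPTION: a single-image manifest with a "layers" array, not an index.
  digests=$(printf '%s\n' "$manifest" | jq -r '.layers[].digest') || return 1

  # Steps 4 and 5: derive each blob URL from its digest (end-2) and download it.
  for digest in $digests; do
    curl -fsSL \
      -H "Authorization: Bearer $token" \
      -o "$(blob_filename "$digest")" \
      "https://$registry/v2/$name/blobs/$digest" || return 1
  done
}
```

If the registry answers with a multi-platform index instead of a single manifest, the "layers" lookup will fail; in that case one more manifest request (by the digest of the platform you want) is needed before step 3.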
One does not simply download from a container registry
In order to figure out how to make an API call to the registry, you must first read (and understand) the OCI specs here.
OCI APIs
OCI maintains three distinct specifications:
- image spec
- runtime spec
- distribution spec
OCI “Distribution Spec” API
To figure out how to download a file from a container registry, we’re interested in the “distribution spec”. At the time of writing, the latest “distribution spec” can be downloaded here:
- https://github.com/opencontainers/distribution-spec/releases/tag/v1.1.0
- https://github.com/opencontainers/distribution-spec/releases/download/v1.1.0/oci-distribution-spec-v1.1.0.pdf
The above PDF file defines a set of API endpoints that we can use to query, parse, and then figure out how to download a file from a container registry. The table from the above PDF is copied below:
ID | Method | API Endpoint | Success | Failure |
---|---|---|---|---|
end-1 | GET | /v2/ | 200 | 404/401 |
end-2 | GET / HEAD | /v2/<name>/blobs/<digest> | 200 | 404 |
end-3 | GET / HEAD | /v2/<name>/manifests/<reference> | 200 | 404 |
end-4a | POST | /v2/<name>/blobs/uploads/ | 202 | 404 |
end-4b | POST | /v2/<name>/blobs/uploads/?digest=<digest> | 201/202 | 404/400 |
end-5 | PATCH | /v2/<name>/blobs/uploads/<reference> | 202 | 404/416 |
end-6 | PUT | /v2/<name>/blobs/uploads/<reference>?digest=<digest> | 201 | 404/400 |
end-7 | PUT | /v2/<name>/manifests/<reference> | 201 | 404 |
end-8a | GET | /v2/<name>/tags/list | 200 | 404 |
end-8b | GET | /v2/<name>/tags/list?n=<integer>&last=<integer> | 200 | 404 |
end-9 | DELETE | /v2/<name>/manifests/<reference> | 202 | 404/400/405 |
end-10 | DELETE | /v2/<name>/blobs/<digest> | 202 | 404/405 |
end-11 | POST | /v2/<name>/blobs/uploads/?mount=<digest>&from=<other_name> | 201 | 404 |
end-12a | GET | /v2/<name>/referrers/<digest> | 200 | 404/400 |
end-12b | GET | /v2/<name>/referrers/<digest>?artifactType=<artifactType> | 200 | 404/400 |
end-13 | GET | /v2/<name>/blobs/uploads/<reference> | 204 | 404 |
In OCI, files are (cryptically) called “blobs”. In order to figure out the file that we want to download, we must first reference the list of files (called a “manifest”).
The above table shows us how we can download a list of files (manifest) and then download the actual file (blob).
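To make that concrete, the read-only endpoints a download relies on (end-3 for manifests, end-2 for blobs, plus end-8a for listing tags) can be turned into URLs with a few shell helpers. This is an illustrative sketch: the function names are made up for this example, and https:// is assumed as the scheme.

```shell
#!/bin/sh
# Hypothetical helpers; arguments are <registry_host> <name> [<reference|digest>].

# end-3: GET /v2/<name>/manifests/<reference>  (reference is a tag or a digest)
manifest_url() {
  printf 'https://%s/v2/%s/manifests/%s\n' "$1" "$2" "$3"
}

# end-2: GET /v2/<name>/blobs/<digest>
blob_url() {
  printf 'https://%s/v2/%s/blobs/%s\n' "$1" "$2" "$3"
}

# end-8a: GET /v2/<name>/tags/list
tags_url() {
  printf 'https://%s/v2/%s/tags/list\n' "$1" "$2"
}
```

For example, with a placeholder host and repository, running manifest_url registry.example.com myorg/myapp latest prints https://registry.example.com/v2/myorg/myapp/manifests/latest, which is the URL you would then pass to curl along with your token.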
Examples
Let’s look at how to download files from a couple different OCI registries:
Docker Hub
To see the full example of downloading images from Docker Hub, click here.
GitHub Packages
To see the full example of downloading files from GitHub Packages, click here.
Why?
I wrote this article because many, many folks have inquired about how to manually download files from OCI registries on the Internet, but their simple queries are usually met with a barrage of useless counter-questions: why the heck would you want to do that!?!
The answer is varied.
Some people need to get files onto a restricted environment. Either their org doesn’t grant them permission to install software on the machine, or the system has firewall-restricted internet access – or doesn’t have internet access at all.
3TOFU
Personally, the reason that I wanted to be able to download files from an OCI registry was for 3TOFU.
Verifying Unsigned Releases with 3TOFU
Unfortunately, most apps using OCI registries are extremely insecure. Docker, for example, will happily download malicious images. By default, it doesn’t do any authenticity verification on the payloads it downloads. Even if you manually enable Docker Content Trust (DCT), there are loads of pending issues with it.
Likewise, the macOS package manager brew has this same problem: it will happily download and install malicious code, because it doesn’t use cryptography to verify the authenticity of anything that it downloads. This introduces watering hole vulnerabilities when developers use brew to install dependencies in their CI pipelines.
My solution to this? 3TOFU. And that requires me to be able to download the file (for verification) on three distinct Linux VMs using curl or wget.
⚠ NOTE: 3TOFU is an approach to harm reduction.
It is not wise to download and run binaries or code whose authenticity you cannot verify using a cryptographic signature from a key stored offline. However, sometimes we cannot avoid it. If you’re going to proceed with running untrusted code, then following a 3TOFU procedure may reduce your risk, but it’s better to avoid running unauthenticated code if at all possible.
Registry (ab)use
Container registries were created in 2013 to provide a clever & complex solution to a problem: how to package and serve multiple versions of simplified sources to various consumers spanning multiple operating systems and architectures – while also packaging them into small, discrete “layers”.
However, if your project is just serving simple files, then the only thing gained by uploading them to a complex system like a container registry is headaches. Why do developers do this?
In the case of brew, their free hosting provider (JFrog’s Bintray) shut down in 2021. Brew was already hosting their code on GitHub, so I guess someone looked at “GitHub Packages” and figured it was a good (read: free) replacement.
Many developers using Container Registries don’t need the complexity, but – well – they’re just using it as a free place for their FOSS project to store some files, man.
Hey, a bit off-topic, but how did you get the whole blog post here? Are you using the Wordpress plugin, or did you copy the source into your Lemmy client UI?
it’s a lot of work, but basically I copy the html from wordpress into pandoc, convert it to markdown, and then do a lot of cleanup
I didn’t copy the whole post here because it’s so much work, but I usually do when there’s less images and my articles aren’t likely to exceed the max char limit :)
Ok, then I’m confused… why would you be interested in doing that? Wouldn’t it be better to just post a (short) summary in the body of the post and let people check things out in your blog?
what happens if I die? what happens if my site goes down? what happens if a site is “protected” by cloudflare (and therefore makes the content inaccessible to at-risk folks)? what happens if a site has an authwall (and therefore is inaccessible to less-privileged folks)?
I think it’s important for us to federate content, not just links.
archive.org?
IPFS for your site?
Use the wordpress AP plugin to federate the posts directly?
I mean, I don’t mean to discourage you from posting. I was just curious for your reasons, and was wondering if you’d also consider these other possibilities.
The Wordpress Activity Pub plugin is on my TODO list. I opened some bug reports with them recently.