Mark Peek, VP and principal engineer, and Alex Xu, senior project manager, co-authored this post.
On November 1, 2020, Docker Hub will begin limiting image pulls from anonymous and free accounts. While some may be upset about the change, it reflects a larger reality: consuming public content carries risks, and hosting it carries costs. Most public repositories apply some level of rate limiting to prevent denial-of-service attacks and to meter customer usage. However, Harbor provides features that will help keep public container images accessible to teams that use them across multiple clouds.
In mid-2018, VMware donated the Harbor open source container registry project to the CNCF, where it has flourished and is now a graduated project. Harbor is a foundational component of the VMware Tanzu portfolio, and of Tanzu Kubernetes Grid in particular, providing a production-quality registry for use in multi-cloud deployments. Harbor can help you mitigate the effects of the upcoming Docker Hub limits via both replication capabilities and a proxy cache feature.
Configuring replication in Harbor allows images to be synchronized, either manually or periodically through automation, with all major repositories, including Docker Hub, Docker Registry, AWS Elastic Container Registry, Azure Container Registry, Alibaba Cloud Container Registry, Google Container Registry, Huawei SWR, Helm Hub, GitLab, Quay, and JFrog Artifactory. Replication mirrors a repository down into your private Harbor repository. Combined with the right retention policy, replication is best used when an exact duplicate of the repository is needed. Caution is advised, however, as any upstream changes could be reflected in the mirror.
Even before the latest Docker terms of service announcement, Harbor users were already impacted by pre-existing pull throttling from Docker Hub. To solve this issue, the Harbor proxy cache feature was proposed and implemented.
The Harbor proxy cache feature allows for a pull-through cache of external repositories. This allows for more ad hoc caching of artifacts, which reduces the number of external requests for frequently used containers. A proxy cache is also useful for scenarios where clusters have little or no connectivity to another target registry due to security issues or limited egress options from Kubernetes worker nodes, and can therefore use Harbor as a secure intermediary registry.
The proxy cache workflow now splits the Docker pull command into two legs—pulling the image once to a deployed Harbor registry and caching it on the first leg, then having Harbor serve subsequent pull requests on the second leg from the cache. When an update on the source image is detected, a repull is triggered automatically. Moreover, the proxy cache enjoys the same set of capabilities available to a regular project, such as retention and immutability policies, vulnerability scanning, and deployment policies based on scanning and signing.
To configure a proxy cache, create a project in the Harbor UI as you would any project, but select the “proxy cache” option and enter the target registry endpoint with its already-configured credentials. Next, modify your pull commands and Pod specs to pull from the proxy project instead of from your target registry.
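As a rough sketch of what modifying a pull command involves, the Python helper below rewrites a Docker Hub image reference so that it points at a proxy project. The host name harbor.example.com, the project name dockerhub-proxy, and the function itself are hypothetical. The library/ prefix for Docker Hub official images is an assumption that should be checked against the documentation for your Harbor version.

```python
def to_proxy_reference(image: str, harbor_host: str, proxy_project: str) -> str:
    """Rewrite a Docker Hub image reference to go through a Harbor proxy project.

    Assumes `image` is a Docker Hub reference such as "nginx:1.19" or
    "docker.io/bitnami/redis:6.0"; references to other registries are
    not handled by this sketch.
    """
    # Drop an explicit Docker Hub host so only the repository path remains.
    for prefix in ("index.docker.io/", "docker.io/"):
        if image.startswith(prefix):
            image = image[len(prefix):]
            break

    # Assumption: single-level (official) images need the "library" namespace
    # when pulled through the proxy project.
    if "/" not in image:
        image = "library/" + image

    return f"{harbor_host}/{proxy_project}/{image}"


if __name__ == "__main__":
    # Hypothetical Harbor host and proxy project name.
    print(to_proxy_reference("nginx:1.19", "harbor.example.com", "dockerhub-proxy"))
```

The resulting string is what would go into a docker pull command or the image field of a Pod spec in place of the original reference.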
When pulling an image by tag from a proxy project, Harbor will always check its cached copy against the target registry to make sure it has the latest one. Simply comparing manifests does not count toward your pull quota. Even when the limit is reached, the cached image can be served as a fallback.
Let’s go over all the possible scenarios:
Requested image is not cached but exists in the target registry, in which case Harbor will pull from the target and cache it before serving it to the client
Requested image is cached and has not been updated in the target registry since it was cached, in which case Harbor serves the cached copy directly
Requested image is cached but an updated copy is found in the target registry, in which case Harbor pulls down the updated copy and caches it before serving it
Requested image is cached and the target registry is unreachable, in which case Harbor serves the cached copy
Requested image is cached but the image has been removed from the target registry, in which case the pull request is rejected
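The scenarios above can be summarized as a small decision function. This is only an illustrative model of the behavior described here, not Harbor's actual implementation; the names Action and decide are made up for the example, and the handling of an uncached image that cannot be fetched is an assumption, since those cases are not listed above.

```python
from enum import Enum


class Action(Enum):
    PULL_AND_CACHE = "pull from target, cache, then serve"
    SERVE_CACHED = "serve cached copy"
    REJECT = "reject the pull request"


def decide(cached: bool, target_reachable: bool,
           exists_in_target: bool, updated_in_target: bool) -> Action:
    """Map the state of the cache and the target registry to an outcome."""
    if not target_reachable:
        # Cached image, unreachable target: serve from the cache.
        # Assumption: with nothing cached, there is nothing to serve.
        return Action.SERVE_CACHED if cached else Action.REJECT

    if not exists_in_target:
        # Image removed from the target registry: the pull is rejected.
        # Assumption: the same holds when the image was never cached.
        return Action.REJECT

    if not cached or updated_in_target:
        # First pull, or a newer copy exists upstream: refresh the cache.
        return Action.PULL_AND_CACHE

    # Cached and unchanged upstream.
    return Action.SERVE_CACHED


if __name__ == "__main__":
    print(decide(cached=True, target_reachable=False,
                 exists_in_target=False, updated_in_target=False).value)
```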
In principle, Harbor will serve the cached image in lieu of downloading the image again whenever possible. When used in combination with Harbor’s retention policy capability, you can manage the life cycle of your images efficiently and keep them colocated with the rest of your Kubernetes stack. For additional security, Harbor can perform vulnerability scans on these images locally, with policy enforcement, to ensure they are free of any known vulnerabilities.
The Docker rate-limiting changes will be in effect soon, so you might start observing pull-related failures in development or CI/CD workflows. It is best to be proactive and start taking inventory of the container usage in your source code to understand where public images are being used. Harbor offers a secure, multitenant artifact registry that enables developer self-service while providing operators with the necessary policy and guard rails. To take advantage, simply switch your image pulls to go through the Harbor proxy cache.
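One way to start such an inventory is to scan a source tree for image references. The script below is a minimal sketch, assuming images are declared in Dockerfile FROM lines and in YAML image: fields; the function names and the regular expressions are illustrative and will miss references built from variables or templates.

```python
import os
import re

# "FROM [--platform=...] <image>" lines in Dockerfiles.
FROM_RE = re.compile(
    r"^[ \t]*FROM[ \t]+(?:--platform=\S+[ \t]+)?(\S+)",
    re.IGNORECASE | re.MULTILINE,
)
# "image: <image>" fields in Kubernetes or Compose YAML.
IMAGE_RE = re.compile(
    r"""^[ \t]*(?:-[ \t]*)?image:[ \t]*["']?([^\s"']+)""",
    re.MULTILINE,
)


def find_image_references(text: str) -> list:
    """Return image references found in Dockerfile or YAML text.

    Note: "FROM <stage-name>" references to earlier build stages are
    also returned and need to be filtered out by hand.
    """
    return FROM_RE.findall(text) + IMAGE_RE.findall(text)


def scan_tree(root: str) -> dict:
    """Walk `root` and map each matching file path to its image references."""
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            lower = name.lower()
            if "dockerfile" not in lower and not lower.endswith((".yaml", ".yml")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                found = find_image_references(handle.read())
            if found:
                results[path] = found
    return results


if __name__ == "__main__":
    for path, images in sorted(scan_tree(".").items()):
        for image in images:
            print(f"{path}: {image}")
```

References without a registry host, such as nginx:1.19, typically resolve to Docker Hub and are the ones most likely to be affected by the new limits.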
Learn more about Harbor at goharbor.io.