Kubernetes must stay pure, upstream open-source

In a short time, the open-source ecosystem has evolved from niche projects with limited corporate backing into the de facto way to build software.

Today, companies small and large are adopting open-source software to accelerate product development and innovation.

A state of enterprise open source survey found that 95% of enterprises are taking open-source seriously, with 75% reporting open-source software is extremely important to their IT strategies. 77% of respondents plan to ramp up open-source use in the next year.

Even in the UK government, 71% of tech workers report they use more open source than five years ago. Meanwhile, the U.S. Department of Defense issued a memorandum stating a preference for open source software over proprietary software, calling open-source “critical in delivering software faster.”

With Kubernetes’ maturation, the market has become crowded. There are many Kubernetes offerings with different architectures, features, and interfaces. Some are less open and flexible than others, with different restrictions, dependencies, and licensing terms in place.

Why do you need openness? Deviating from the Kubernetes standard can create problems. As The Journal of Cloud Computing notes, “without an appropriate standardized format, ensuring interoperability, portability, compliance, trust, and security is difficult.”

What Is Pure Upstream Open-Source Kubernetes?

Upstream Kubernetes is the open-source version of Kubernetes hosted and maintained by the Cloud Native Computing Foundation, where code and documentation are developed and contributions are made. It consists of core Kubernetes components (often called “plain vanilla Kubernetes”) for orchestrating containers without add-on applications. All of these are publicly accessible for inspection, modification, and redistribution.

These free and open-source software projects start with good intentions – making tech that benefits the whole community. Anyone can access the code, so public collaboration can fix bugs, add patches, and improve performance relatively fast. But as a project grows and matures, different goals, perspectives, and needs arise.

This is where project contributors introduce ‘forks’ in the code.

What Is a Fork of Kubernetes?

A fork of Kubernetes is a version of the open-source project developed along a separate workstream from the main trunk. Forking occurs when part of the development community or a third-party vendor makes a copy of the upstream project with modifications to start a completely independent line of development.

Why would you fork Kubernetes? There are good reasons.

There might be differences in opinion (technical or personal). Development of the upstream project might have stagnated. Someone might want to create different functionality or steer the project in a new direction. This can happen in either open-source or proprietary environments.

In an open-source environment, when you have a fork of Kubernetes that improves the original source code, other forks can take advantage of it. And because the code is freely available to use, other forks can merge the code into their fork to better meet the needs of developers and end users.

But a fork of Kubernetes in a proprietary environment? Vendors or cloud companies will modify the source code to meet their specific needs, repackage the software, and offer it to customers as a proprietary distribution, or they may modify the add-ons needed to run Kubernetes in production.

This can complicate the management of the solution and create vendor lock-in.

What Are the Challenges of a Fork of Kubernetes?

Deploying and managing Kubernetes at scale in an enterprise is hard! Many organizations turn to proprietary distributions to obtain enterprise support for their container platforms. But significantly forked versions of Kubernetes have emerged as proprietary Kubernetes deployments.

Customers deploying proprietary Kubernetes distributions face many challenges:

Complications with Patches, Bug Fixes, Upgrades, and New Features

Every time you introduce changes, it becomes more difficult to make them work with your custom distribution. This process is slow, error-prone, and costly. In fact, by the time a new feature comes out, your custom distribution is a few releases behind the latest. Vendors who fork Kubernetes often have an older version of the cluster API because it takes them six months or more to get improvements and bug fixes from the upstream.
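To make "a few releases behind" concrete, here is a minimal Python sketch that counts how many minor releases a distribution trails upstream. The function names and the version strings are hypothetical and chosen only for illustration; they do not describe any real vendor.

```python
import re
from typing import Tuple


def minor_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a Kubernetes-style version such as 'v1.27.3'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def releases_behind(distribution: str, upstream: str) -> int:
    """Number of minor releases the distribution trails upstream (0 if not behind)."""
    d_major, d_minor = minor_version(distribution)
    u_major, u_minor = minor_version(upstream)
    if d_major != u_major:
        raise ValueError("this sketch does not compare across major versions")
    return max(0, u_minor - d_minor)


# Hypothetical example values, not taken from any real distribution.
print(releases_behind("v1.24.9", "v1.27.3"))  # prints 3
```

Only the major and minor numbers are compared, so a vendor suffix after the patch number is ignored.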

Vendor Lock-in

Forks of Kubernetes create lock-in. Vendor lock-in, or proprietary lock-in, happens when a customer cannot easily replace or migrate the solution. A fork of Kubernetes doesn’t give you the flexibility to move your applications and data seamlessly between public, private, and on-premises services. It also doesn’t provide you with multiple options as your company grows.

Even if the source code is open-source, vendors can wrap Kubernetes in many features that make it difficult to migrate to other platforms without incurring cost and excess resource allocation. And because most custom distributions are not built with FluxCD, you can’t switch vendors without re-architecting your whole stack.

Lack of Functionality

A forked version of Kubernetes can break application functionality. Some custom distributions rely on proprietary APIs and CLIs to get full functionality, which creates lock-in. And if the custom distribution only runs on the vendor’s custom Linux kernel, it also creates lock-in. As time goes on, it will get harder to maintain this fork. Merging the latest upstream patches into the fork won’t be possible without major additional work for patch and feature compatibility. And if a vendor discontinues a product or application, you may be out of luck.
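One rough way to spot dependence on non-upstream APIs is to look at the `apiVersion` values your manifests use. The Python sketch below flags any API group that is not in a (deliberately partial) list of groups served by upstream Kubernetes. The helper names and the `platform.vendor.example` group are hypothetical. A flagged group is only a prompt for review, since it may equally belong to an open-source custom resource.

```python
# Partial list of API groups served by upstream Kubernetes; "" is the core group.
UPSTREAM_API_GROUPS = {
    "", "apps", "batch", "autoscaling", "policy",
    "networking.k8s.io", "rbac.authorization.k8s.io", "storage.k8s.io",
    "apiextensions.k8s.io", "admissionregistration.k8s.io",
}


def api_group(api_version: str) -> str:
    """Return the group part of an apiVersion ('apps/v1' -> 'apps', 'v1' -> '')."""
    return api_version.rpartition("/")[0]


def non_upstream_api_versions(api_versions):
    """Return the sorted apiVersions whose group is not in the upstream list above."""
    return sorted({v for v in api_versions if api_group(v) not in UPSTREAM_API_GROUPS})


# Hypothetical apiVersion values collected from a set of manifests.
manifests = ["v1", "apps/v1", "networking.k8s.io/v1", "platform.vendor.example/v1alpha1"]
print(non_upstream_api_versions(manifests))  # prints ['platform.vendor.example/v1alpha1']
```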

Less Secure

A fork of Kubernetes can end up running less secure code. If a vulnerability is found in open-source code and fixed by the community upstream, a forked version of the code may not benefit from this fix because its code has diverged from upstream.

Lack of Interoperability

Vendors may modify code for their custom distributions or for the supporting applications you need to run Kubernetes in production. While a modified version of Kubernetes will work with a particular vendor’s application stack and management tools, these proprietary modifications tie you to customized component builds and prevent you from integrating freely with other upstream open-source projects. And if the vendor’s stack comprises multiple products, interoperability becomes very hard to achieve, which can cause many downstream issues as you scale.

Technical Debt

It’s incredibly difficult to merge back a fork that has diverged drastically over the years from the upstream. This is called technical debt – the cost of maintaining source code caused by deviation from the main branch where joint development happens. The more changes to forked code, the more money and time it costs to rebase the fork to the upstream project.
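As a rough proxy for this divergence, you can count the commits that exist on only one side of the fork. The Python sketch below wraps `git rev-list --left-right --count`, which prints two numbers for a pair of refs. The ref names, repository path, and sample counts are hypothetical, and commit counts are only a crude indicator of the actual rebase effort.

```python
import subprocess
from typing import NamedTuple


class Divergence(NamedTuple):
    behind: int  # commits reachable only from the upstream ref
    ahead: int   # commits reachable only from the fork ref


def parse_left_right_count(output: str) -> Divergence:
    """Parse the '<left>\\t<right>' output of git rev-list --left-right --count."""
    left, right = output.split()
    return Divergence(behind=int(left), ahead=int(right))


def measure_divergence(repo_path: str, upstream_ref: str, fork_ref: str) -> Divergence:
    """Run git in repo_path and compare the two refs (requires git on PATH)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "--left-right", "--count",
         f"{upstream_ref}...{fork_ref}"],
        check=True, capture_output=True, text=True,
    )
    return parse_left_right_count(result.stdout)


# Hypothetical sample output; a real call might look like
# measure_divergence("/path/to/repo", "upstream/master", "fork/main").
print(parse_left_right_count("1840\t312\n"))
```

A large `behind` count means many upstream changes still have to be absorbed, while a large `ahead` count means many fork-only changes have to be carried forward on every rebase.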

Why Is Pure Upstream Kubernetes Important?

Pure upstream open-source Kubernetes is more than just where the project lives. It’s the focal point where decisions are made and contributions happen, and it comes with a built-in community that continuously improves the source code.

By doing work upstream first, you can share ideas with the larger community and get new features and releases accepted upstream. When features and patches are accepted upstream, every project and product based on the upstream can benefit from that work when they pick up the future release or merge recent (or all) upstream patches.

Conversely, it takes more time and effort to send patches or report bugs from the downstream project back upstream. Often, the code has already changed by the time it has shipped downstream, making it harder to integrate changes developed against an older version of the project.

While anyone can copy, install, or distribute Kubernetes from the upstream repository, larger companies and organizations need certified products, tested and hardened for enterprise use. As such, organizations rely on vendors to turn upstream Kubernetes into downstream products that meet their business needs.


About the Author

Tobi Knaup, CEO of D2iQ. D2iQ provides the leading independent Kubernetes platform, which simplifies and automates the really difficult tasks needed for enterprise-grade production at scale while reducing operational burden and costs. As a cloud native pioneer, we have more than a decade of experience tackling the most complex, mission-critical deployments in the industry.
