Yesterday, there was a conversation on Twitter after Janakiramm wrote an article in which he refers to some analysts calling OpenShift a fork of Kubernetes. As someone who has been part of open source since the early Slackware days and who has seen the forking process play out in many communities, I thought I would offer my thoughts on the debate. The Twitter conversation centered on vendors on both sides of the spectrum pushing their own marketing agendas, but I am approaching this debate based on the history of Open Source Software.
What constitutes a fork?
In today’s world, the term fork is used much more loosely, covering everything from forking the code from the main branch of a git repo to forking communities. This loose usage has allowed marketing teams to use the term to their advantage, confusing the market. So, what is a fork? A simplified definition that could act as a basic smell test would be:
If there is a symbiotic relationship with upstream software acting as a “kernel”, it is a distribution. If software takes the original code and exists without a symbiotic relationship, in a parallel universe, it is a fork.
A deeper examination of the term fork is more multifaceted and spans multiple aspects of an open source project:
- Software Code
- Governance Model
- API compatibility
How do you define forking at the software code level? If you consider a developer forking the master to change the code, either to add a feature or to fix a bug, as a fork, then everything is a fork. If you consider forking at the build level, I can assure you that every single user’s build can be considered a fork based on the contents of the build. For something to constitute a fork at the code level, the code should leave the project and end up with an entirely different community, a different governance model and an incompatible API.
The beauty of OSS is that anyone can fork the code, but the difficult part is forking the community. That is why code forks happen every time developers touch the code, but community forks happen very rarely. Most attempts to fork a community have failed miserably for want of a critical mass. Some startups, where I have worked or advised in the past, have asked me whether they should open source their product. In spite of being a big advocate of open source, I often advise against it because building a community around an open source project is not for the faint-hearted. This difficulty in fostering open source communities is the reason why competitors work together in today’s open source projects. So, calling something a fork just based on how the source code is used in packaging a product is too simplistic.
Even though licensing played a critical role in offering the freedom to tinker with the code, it is community governance that played an important role in the success of many OSS communities. Later on, the Apache Foundation, the Eclipse Foundation, etc. helped maintain a governance model for individuals contributing to OSS. When contribution to OSS went from individuals scratching their own itch to vendors scratching a commercial itch, we saw OSS foundations springing up left and right, trying to act as umbrella organizations for both governance and marketing. Early OSS foundations like OpenStack and CloudFoundry focussed more on marketing and less on governance and fostering communities, resulting in quite a bit of chaos in these communities. The Cloud Native Computing Foundation seems to have learned from the failures of these early foundations and has a better governance model in place, but it is still early days and I am still skeptical (but open minded) of their ability to get this one right. Only time will tell if they can pull it off. I digressed a bit here, but the point is that a fork constitutes a break from the governance of the original project into an independent governance model.
The key requirement for a product to be considered a distribution of an OSS project is API compatibility with the original project. This should be one of the critical criteria in your evaluation of various distributions. (It is also important from the point of view of the Modern Enterprise Framework we advocate for continuous innovation.)
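The three aspects discussed above (code and community, governance, API compatibility) can be condensed into a rough decision sketch. This is only an illustration of the smell test as described in this post: the `Downstream` class, its fields and the `classify` function are hypothetical names invented for the example, and the "unclear" outcome for mixed cases is my own assumption rather than a settled definition.

```python
from dataclasses import dataclass


@dataclass
class Downstream:
    """Hypothetical description of a product built on an upstream OSS project."""
    shares_upstream_governance: bool  # bound by the upstream governance model
    shares_upstream_community: bool   # works within and contributes to the upstream community
    api_compatible: bool              # the upstream API still works unchanged


def classify(p: Downstream) -> str:
    """Apply the smell test: symbiosis means distribution, a full break means fork."""
    signals = (
        p.shares_upstream_governance,
        p.shares_upstream_community,
        p.api_compatible,
    )
    if all(signals):
        return "distribution"
    if not any(signals):
        return "fork"
    # Mixed signals: the simple test cannot decide, so a closer look is needed.
    return "unclear"


print(classify(Downstream(True, True, True)))     # distribution
print(classify(Downstream(False, False, False)))  # fork
```

Real projects rarely reduce to three booleans, so treat this as a checklist for asking the right questions rather than a verdict.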
Is OpenShift a fork of Kubernetes?
Let us take a step back and understand the role played by Kubernetes. Kubernetes is a container orchestration engine. When containers were hot, people said that just having a bunch of containers makes no sense and you need an orchestration engine to make sense of containers in terms of management, scaling, etc. When Kubernetes gained momentum, the vendors in the ecosystem said Kubernetes by itself doesn’t make much sense and you need additional tooling to make Kubernetes the foundation of modern application infrastructure. The addition of various tools created a spectrum of platforms with varying degrees of flexibility, from container platforms to application platforms.
OpenShift tries to sit somewhere in the middle of this spectrum, trying to meet end user needs on both the container platform side and the application platform side. OpenShift, or for that matter any platform in the ecosystem, cannot do this with stand-alone Kubernetes alone. These platforms need to add tooling around Kubernetes to make it useful as a container or application platform (see this post about CaaS, PaaS, Container or Application Platforms for information on the differences). The Cloud Native Computing Foundation (CNCF) supports some of the additional tools needed for this, but most vendors in the CNCF ecosystem go beyond these tools as they productize Kubernetes.
With this context, OpenShift is not a fork because:
- They contribute heavily to the Kubernetes project (one of the top three vendors contributing to the project)
- They are part of the CNCF and the Kubernetes community and are bound by their governance model. The OpenShift community, wrapped under the guise of OpenShift Commons, is more of a user community and works as a complement to the Kubernetes community
- When you deploy apps on OpenShift, you can use either the OpenShift API or the Kubernetes API. The choice exists even though the documentation may not be clear
Clearly, OpenShift doesn’t represent a fork in the traditional sense of the term “OSS fork”; it is a distribution rather than a fork. It may not be a distribution closely aligned with vanilla Kubernetes, but the OpenShift model tends more towards the symbiotic relationship that existed between Ubuntu and Debian. This is not an exact mapping, but it helps to give an idea of how OpenShift is not a fork but a distribution a little far away from vanilla Kubernetes (a derivative?).
Again, this debate is not restricted to OpenShift alone. People had these debates on OpenStack and some people even called Pivotal CloudFoundry a fork of CloudFoundry. Clearly, in all these conversations, the term fork is used more loosely than how it is meant to be used.
Having said that, I want to conclude with the following two arguments:
- Forks in OSS should be celebrated; "fork" should not be treated as a four-letter F-word
- I just want to quote this conclusion from the early Ubuntu days:
Because of the size and usefulness of their code and the size of their development communities, large projects like Debian and Ubuntu have been forced into confronting and attempting to mediate the problems inherent in forking and deriving. However, as these problems are negotiated and tools and processes are advanced toward solutions, free software projects of all sizes will be able to offer users exactly what they want with minimal redundancy and little duplication of work. In doing this, free software will harness a power that proprietary models cannot compete with. They will increase their capacity to produce better products and better processes. Ultimately, it will help free software capture more users, bring in more developers, and produce more free software of a higher quality.