This is a short write-up of our ongoing effort to move our infrastructure to modern technologies like Kubernetes. It all started shortly before Hack Week 17, with the project "microservices and serverless for the openSUSE.org infrastructure". As the project description mentions, the trigger is that our infrastructure keeps getting bigger and more complicated, so the need to migrate to a better solution keeps growing as well. Docker containers and Kubernetes (for container orchestration) seemed like the right choices, so after reading tutorials and docs, it was time to get our hands dirty!
Installing and experimenting with Kubernetes / CaaSP
We started by installing the SUSE CaaSP product on the internal Heroes VLAN. It provides an additional admin node that sets up the Kubernetes cluster. The cluster is not that big for now: it consists of the admin node, three kube-master nodes behind a load balancer, and four kube-minions (workers). The product was at version 2 when we started, but version 3 has since become available, and that is the one we're running right now. It worked flawlessly, and we were even able to install the Kubernetes dashboard on top of it, which was our first Kubernetes-hosted webapp.
Since the containers inside Kubernetes are on their own internal network, we also needed a load balancer to expose the services to our VPN. We therefore experimented, also successfully, with Ingress for load balancing the applications deployed inside Kubernetes. A lot of experiments around deployments, scaling and permissions followed, to get us more familiar with the new concepts, which of course ended with us destroying our cluster multiple times. We were surprised, though, to see the self-healing mechanisms take over.
Although the experiments have so far involved only static pages, they still allowed us to learn a lot about Docker itself, e.g. how to create our own images and deploy them to our cluster. The amazing kctl tool is also worth mentioning: just take a look at its README to see how much more useful it is than the official kubectl.
Time to move to the next layer.
Installing and experimenting with Cloud Foundry / CAP
The next step was to install yet another SUSE product, this time the Cloud Application Platform (CAP), which offers a Platform as a Service solution based on Cloud Foundry. Here we hit our first blocker, though. CAP requires a working Kubernetes StorageClass, which means that we needed a persistent storage backend. A good solution would be a distributed filesystem like Ceph, but given the time limits of Hack Week, we decided to go with something simpler for now, and the simplest option was an NFS server. The CAP installation went smoothly from that point on, and we managed to log in to our Cloud Foundry installation via the command line tool, as well as via the Stratos web UI. A wildcard domain *.cf.mydomain.tld was also needed here.
The idea here was quite straightforward: go to your git repository and type a simple command like cf push -m 256M -k 512M myapp. This deploys a new app directly to Cloud Foundry, giving it 256MB of RAM and 512MB of disk space. As a bonus, it immediately created the domain myapp.cf.mydomain.tld! The benefits were quite obvious: no need to build our own container image with the app and set up a mechanism to deploy it, and no need for the manual step of setting up a DNS record. The Ingress load balancer for Kubernetes mentioned earlier is no longer needed either, as Cloud Foundry handles this too. The cf scale command additionally lets us scale up/down (increase/decrease memory/disk) or scale in/out (increase/decrease the number of instances).
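To make the push and scale commands a bit more concrete, here is a minimal Python sketch that only assembles the command lines and the resulting hostname; it does not run anything. The helper names (cf_push_command, cf_scale_command, app_hostname) are made up for illustration, and cf.mydomain.tld is just the placeholder wildcard domain used in this post. The -m, -k and -i flags are the memory, disk and instance options of the cf CLI.

```python
import shlex


def cf_push_command(app, memory="256M", disk="512M"):
    """Build the `cf push` argument list with a memory (-m) and disk (-k) limit."""
    return ["cf", "push", "-m", memory, "-k", disk, app]


def cf_scale_command(app, instances=None, memory=None, disk=None):
    """Build the `cf scale` argument list.

    -i changes the number of instances (scale in/out);
    -m and -k change memory and disk per instance (scale up/down).
    Options left as None are simply not passed.
    """
    command = ["cf", "scale", app]
    if instances is not None:
        command += ["-i", str(instances)]
    if memory is not None:
        command += ["-m", memory]
    if disk is not None:
        command += ["-k", disk]
    return command


def app_hostname(app, apps_domain="cf.mydomain.tld"):
    """Hostname an app gets under the wildcard domain: <app>.<apps domain>."""
    return f"{app}.{apps_domain}"


def as_shell(command):
    """Render an argument list as a single shell-quoted command line."""
    return shlex.join(command)
```

For example, as_shell(cf_push_command("myapp")) gives back the command from the paragraph above, and cf_scale_command("myapp", instances=3) corresponds to scaling out to three instances. Passing the resulting list to subprocess.run would execute it, assuming the cf CLI is installed and logged in.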
Time for stress testing
Hack Week 17 was over, so in the following days we deployed a few static apps (and one dynamic app that also needed a memcached backend), giving them the absolute minimum of disk and RAM (around 10MB of RAM and 10MB to 256MB of disk, depending on the webapp). We triggered a number of bots that requested the web pages in a loop, and the results were really impressive: even with such minimal resources and only one instance running, CPU usage never rose above 15%!
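For readers who want to try something similar, the kind of bot described above can be sketched in a few lines of Python. This is not the tooling we actually used, only a minimal illustration using the standard library: a few threads, each requesting the same page in a loop and recording whether the request succeeded and how long it took. The function names and default numbers are assumptions, and CPU usage still has to be observed on the cluster side.

```python
import threading
import time
import urllib.request


def fetch_status(url, timeout=5.0):
    """Fetch a URL once and return its HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def run_bot(url, requests_per_bot, results, fetch=fetch_status):
    """One bot: request the same page in a loop, recording (ok, seconds)."""
    for _ in range(requests_per_bot):
        started = time.monotonic()
        try:
            ok = fetch(url) == 200
        except OSError:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            ok = False
        results.append((ok, time.monotonic() - started))


def stress(url, bots=4, requests_per_bot=25, fetch=fetch_status):
    """Run several bots in parallel threads and summarise the outcome."""
    results = []
    threads = [
        threading.Thread(target=run_bot, args=(url, requests_per_bot, results, fetch))
        for _ in range(bots)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return {
        "requests": len(results),
        "failures": sum(1 for ok, _ in results if not ok),
        "slowest_seconds": max((seconds for _, seconds in results), default=0.0),
    }
```

Calling stress("http://myapp.cf.mydomain.tld/", bots=10, requests_per_bot=100) would send 1000 requests to the placeholder app from this post. The fetch parameter exists only so the loop can be exercised without a network.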
As a second step, we moved some static webapps of low importance into the cluster and made them public. We're not going to reveal which ones yet, though; feel free to guess :) We plan to monitor resource usage and activity for a few days and, if everything is fine, to move some more important webapps in as well.
A lot of tasks still need to be resolved before we fully hit production. First of all, as mentioned, the NFS StorageClass needs to be replaced with a proper distributed filesystem solution; the SUSE Enterprise Storage product is a good candidate. Furthermore, we need to integrate our LDAP server with both CaaSP and CAP accounts. Last but not least, we are very close to getting dynamic web pages that need a relational database working.
The overall progress is tracked on a Trello board, and of course the internal Heroes documentation has more info about the setup. Volunteers are always welcome; feel free to contact us if you'd like to come on board.
Thanks to everybody who helped set up the cluster, and to the SUSE CaaSP and CAP teams for answering our tons of questions. Special thanks go to Dimitris Karakasilis and Panagiotis Georgiadis for joining me before, during and after Hack Week 17 and still being around, turning this from a simple idea into a production-ready project.
On behalf of the openSUSE Heroes team,