
Step out of the cluster, please?

At Every Turn TakeTurns Blog | by Fabien Bontemps, Senior Software Architect
Engineering Notes

This is the first in an infrequent series of notes from our engineering team about the technology behind TakeTurns: it's #TakeTurnsTechTuesday!

In this interview, Conrad Chuang and Fabien Bontemps discuss how we migrated our front-end services from our Amazon Elastic Kubernetes Service (EKS) cluster to Amazon CloudFront.

A Cynical CTO has a conversation with his chief architect[1] 

What can you tell me about our architecture?

Like many other solutions, TakeTurns is built on a cloud-provider-enhanced implementation of Kubernetes; in our case, Amazon Elastic Kubernetes Service (EKS).

Our architecture is event-driven[2], and the backend is structured around microservices[3] corresponding to our business domains. For the frontend, we opted for a microfrontend[4] architecture (again, one microfrontend per domain), with React.js as the underlying framework.

At some point, we'll publish an article on the overall application architecture and the rationale behind it, but that's not the subject of this one.

How did we end up here? 

In the first version of our architecture, we hosted our frontend in the Kubernetes cluster, using pods with containers running a Node.js server that essentially served static content.

The only dynamic element was the use of environment variables to define environment-specific values for integrating with the other layers (backend, identity provider, storage, and third-party integrations). We could just as easily have defined these in a configuration file.
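As a rough illustration (the variable names and defaults below are hypothetical, not our actual configuration), such environment-specific values might look like this in a Node.js-served frontend:

```typescript
// config.ts -- hypothetical sketch of environment-specific values
// injected via environment variables at deployment time.
export interface FrontendConfig {
  apiBaseUrl: string;          // backend layer
  identityProviderUrl: string; // identity provider
  storageUrl: string;          // storage layer
  analyticsKey: string;        // third-party integration
}

export const config: FrontendConfig = {
  apiBaseUrl: process.env.API_BASE_URL ?? "http://localhost:4000",
  identityProviderUrl: process.env.IDP_URL ?? "http://localhost:8080",
  storageUrl: process.env.STORAGE_URL ?? "http://localhost:9000",
  analyticsKey: process.env.ANALYTICS_KEY ?? "dev-key",
};
```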

One major benefit of using Kubernetes for this hosting is 24/7 availability and its blue/green deployment mechanisms. As Amazon describes quite succinctly:

“A blue/green deployment is a deployment strategy in which you create two separate, but identical environments. One environment (blue) is running the current application version and one environment (green) is running the new application version. Using a blue/green deployment strategy increases application availability and reduces deployment risk by simplifying the rollback process if a deployment fails. Once testing has been completed on the green environment, live application traffic is directed to the green environment and the blue environment is deprecated.”

Indeed, rolling out a new version is as smooth as possible; we don't experience any service interruptions when upgrading. The switch between versions A and B is handled by the Kubernetes proxy and its integration with AWS load balancers, which is itself a Kubernetes-managed mechanism. DNS registration is also managed by the cluster, as are the DNS cache mechanisms.

So, until today, «Tout va pour le mieux dans le meilleur des mondes»[5]

But then again, running Node servers, multiplied by the number of environments, plus the number of pods needed to ensure regional distribution and availability, across multiple front-end components, costs a lot.

Ah, given the costs, «Il faut cultiver notre “Cluster”»? (We must cultivate our Cluster?)

Exactly. Cost optimization is always one of the key transverse objectives for many SaaS companies, especially at the beginning of the adventure.  

It was decided: we must migrate our front-end resources out of the Cluster!

What was the most cost-effective solution?

Considering what we are hosting, one of the more efficient solutions is to use a Content Delivery Network (CDN)[6]. 

Our cloud provider, Amazon, offers its own implementation (AWS CloudFront), which has some advantages:

  • It is directly integrated with the DNS (Route53) mechanism.
  • It manages caching, advanced filtering, security, and metrics gathering (CloudWatch).
  • It ensures multi-region availability and accelerators for cross-region access.
  • It can be directly mapped on a static content hosting solution (S3 bucket storage in our case) and maps a domain name (in relation with Route53, again).

But every coin has two sides, unfortunately.

Don’t you mean every rose has its thorn?

Hrm. 

Anyhow, CloudFront only allows one distribution (one mapping from an S3 origin to the network) per domain name.

So, if we want to use CloudFront, how can we switch from the "old" distribution to the "new" one following the blue/green principle, ensuring no interruption to our service?

What were the options we considered? 

We came up with a few options:

Option 1: Use the "weighted" load-balancer feature. One can apply a weight (from 0 to 255, relative) to multiple load balancers to distribute traffic across multiple underlying targets.

Challenge: A limitation on Route53 aliasing to CloudFront distributions forbids multiple aliases for the same domain name. Aliasing with a wildcard (*.taketurns.app) is a well-known hack described on the web, but it does not allow us to define multiple environment domain names (integration.taketurns.app, testing.taketurns.app, etc.), which is mandatory for us.

Option 2: Use Lambda@Edge. Lambda@Edge intercepts each request and can perform distribution/redirection based on custom rules (geographic, etc.).

Challenge: It uses cookies to remain sticky for each requester (and we do not want to use cookies), and it adds systematic overhead (time + cost) to every request.

Option 3: Use CloudFront staging distributions. In November 2022, AWS introduced non-production (staging) distributions that can coexist with active ones. We could prepare the next version in a staging distribution, then promote it to production.

Challenge: Two major obstacles: 1) one can't totally disable a staging distribution (its relative weight is between 1 and 100%), and 2) nothing exists to roll back from the new version to the previous one, meaning that once staging is promoted, it can't be "un-promoted".

How would you summarize the challenge? 

Sure. If we go with a CDN, what we gain on cluster consumption, we lose on deployment capabilities, especially the smoothness of version management.

In these kinds of cases, what I try to do is to prendre de la hauteur, or take a step back and reconsider the problem with KISS in mind (Keep It Simple, Stupid). 

We focused on what we really needed:

  • A folder/file structure representing multiple versions
  • A structure representing the currently active version
  • An easy mechanism to switch from one version to another

Au bon vieux temps (in the good old days), we used to manage this with symbolic links. Does S3 (where we host our static content) provide this possibility? Nope!

What was the path forward?

We can manage all of this with simple mechanisms (a sketch of the switch follows the list):

  • Copy version A and version B into a dedicated file structure (subfolders A and B, respectively) of the S3 bucket, and tag the resources accordingly.
  • If version A is the active one, the content of folder A is copied into the root folder of the S3 bucket.
  • The CloudFront distribution always serves the content of the root folder. It uses caching, so once at least one user has requested the static content through the distribution, no further requests to the S3 root content are made.
  • If we want to enable version B, we simply copy the content of version B to the root folder. Thanks to S3 versioning, we do not lose anything, and all users keep using version A thanks to CloudFront caching.
  • Once the copy is done, we use the CloudFront invalidation mechanism, which empties the cache and forces a reload of version B for all users.
  • To go back to version A, we can either proceed the same way (redeploying version A) or remove version B's components using the applied tags, a more cherry-picked approach.
  • We can keep as many versions as we want and define a purge mechanism (again based on the applied tags) to keep only the version depth we desire.
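Here is a minimal sketch of that switch, assuming the AWS SDK for JavaScript v3; the bucket name and distribution ID are hypothetical placeholders, not our real values:

```typescript
// switch-version.ts -- hypothetical helper illustrating the copy + invalidation steps.
// BUCKET and DISTRIBUTION_ID are placeholders, not TakeTurns' real identifiers.
import {
  S3Client,
  ListObjectsV2Command,
  CopyObjectCommand,
} from "@aws-sdk/client-s3";
import {
  CloudFrontClient,
  CreateInvalidationCommand,
} from "@aws-sdk/client-cloudfront";

const s3 = new S3Client({});
const cloudfront = new CloudFrontClient({});
const BUCKET = "example-frontend-bucket";
const DISTRIBUTION_ID = "EXXXXXXXXXXXXX";

// Bucket layout: versioned subfolders (A/, B/, ...) plus a root "folder"
// that always holds the currently active version served by CloudFront.
export async function activateVersion(version: string): Promise<void> {
  const prefix = `${version}/`;
  let token: string | undefined;

  // 1. Copy every object from the version's subfolder to the bucket root.
  do {
    const page = await s3.send(
      new ListObjectsV2Command({
        Bucket: BUCKET,
        Prefix: prefix,
        ContinuationToken: token,
      })
    );
    for (const obj of page.Contents ?? []) {
      const rootKey = obj.Key!.slice(prefix.length);
      if (!rootKey) continue; // skip the folder placeholder object itself
      await s3.send(
        new CopyObjectCommand({
          Bucket: BUCKET,
          CopySource: `${BUCKET}/${obj.Key!}`, // URL-encode if keys contain special characters
          Key: rootKey,
        })
      );
    }
    token = page.NextContinuationToken;
  } while (token);

  // 2. Invalidate the CloudFront cache so every user is served the new root content.
  await cloudfront.send(
    new CreateInvalidationCommand({
      DistributionId: DISTRIBUTION_ID,
      InvalidationBatch: {
        CallerReference: `switch-to-${version}-${Date.now()}`,
        Paths: { Quantity: 1, Items: ["/*"] },
      },
    })
  );
}
```

The final invalidation of /* is what makes the cut-over visible: until it completes, CloudFront keeps serving the cached previous version, which is exactly the behavior described in the steps above.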

Wow! Simple and cost-effective! 

Yes! In conclusion, we managed to change the way our front end is hosted and reduce Kubernetes costs, all while keeping our super smooth deployment capabilities!

Notes:

[1] The artwork used in the meme is Christian Winck: Alexander der Große und Diogenes, 1782. The image alludes to the story of Diogenes of Sinope, a Greek philosopher, and Alexander the Great, a Macedonian king and military leader. The story goes that Alexander was quite impressed with Diogenes (one of the most prominent figures in the Cynic school of philosophy) and offered to grant him any favor or request. Diogenes, who was sunbathing, simply said, "ἀπὸ τοῦ ἡλίου μετάστηθι", which translates to "Step out of the sun," since Alexander was blocking the sun.

[2] An event-driven architecture is one in which software components communicate asynchronously through events. An event is emitted when something changes (i.e., changes state) or something occurs. It's different from the classic request-response style, where communications are synchronous.

[3] Microservices is an architectural style for developing software systems by breaking them down into small, independent, and loosely coupled components or services (they’re small or micro!).  Each microservice is responsible for specific functionality and can be deployed, updated, or scaled independently.

[4] Micro-front end … it’s microservices for the front end! 

[5] "Tout va pour le mieux dans le Meilleur des mondes"  is from Voltaire’s Candide (1759).  It translates to "All is for the best in the best of all possible worlds,” which is the catchphrase of Dr. Pangloss, the extreme optimist, who suggests that everything that happens, even if it’s negative, is ultimately for the best and serves a greater purpose. (fwiw, our French team enjoys quoting Voltaire and to that we say:Tout va pour le mieux dans le Meilleur des mondes!). The artwork used in the meme is an illustration for Candide; ou, L'optimisme; 1900

[6] A content delivery network (CDN) is a geographically distributed group of servers that caches content close to end users.
