By moving to continuous deployment, the portal no longer goes down during releases, customers benefit from new features earlier, and bugs are found and fixed much more quickly.
So why didn’t Inter.link start with the continuous deployment approach?
Drawing on an interview with Poh Chiat Koh (Tech Lead, CI/CD & Systems), this article answers that question.
Deploying software changes
In the early days of the Inter.link portal, we needed a way to get all the features into production.
This is typical for every company building software: a new feature is developed locally on a laptop and has to make its way into production. This needs to happen safely, so tests need to run automatically, and the deployment needs to follow a consistent process rather than someone typing commands by hand and hoping they are correct.
A semi-automatic approach for software deployment
Originally Inter.link opted for a semi-automatic approach.
This means that feature deployment was automated in the sense that the steps taken to ship software into production were scripts (manually triggered GitLab jobs), making the process repeatable and consistent.
However, deploying to production did not happen every time someone finished a new feature. Releases went out weekly or even biweekly, depending on how many changes or updates needed to go out. The process was also manual: someone decided on the timing of a release, notified the team ahead of time, and then clicked the necessary buttons to execute the scripts.
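To make this concrete, a semi-automatic pipeline of this kind might look roughly like the sketch below. The stage, job, and script names are illustrative assumptions, not Inter.link's actual configuration.

```yaml
# .gitlab-ci.yml -- illustrative sketch of a semi-automatic setup
# (job names and scripts are hypothetical, not Inter.link's real pipeline)
stages:
  - test
  - deploy

test:
  stage: test
  script:
    - ./scripts/run-tests.sh          # tests run automatically for every change

deploy_production:
  stage: deploy
  script:
    - ./scripts/deploy.sh production  # the repeatable, scripted deployment
  when: manual                        # a person still has to click the button
  only:
    - main
```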
Inter.link worked this way for quite a while, and one of the main reasons was the need to introduce many and frequent breaking changes in those earlier days.
These breaking changes included non-backward-compatible database changes, incompatible API changes, and so on. To reduce the risk of causing data inconsistencies, we bundled all of these changes across our stack and released them at the same time.
Why didn’t Inter.link continue this semi-automatic approach?
One of the biggest downsides of this semi-automatic approach to software deployment is that changes accumulate. Over two weeks we could build up 20 or 30 changes and then release them all at the same time.
The main disadvantage is that putting out so many changes at once made it very difficult, when a bug appeared, to find out exactly which change had introduced it. Tracking down the problem took time.
And once we had tracked down the bug, we could not simply roll back to the previous deployment, as that would undo all the accumulated changes at once. Reverting the offending commit also required another manual deployment.
Moving to continuous deployment
As Inter.link grew and we started to settle on a data model, the number of breaking changes went down. The downsides of the semi-automatic approach became increasingly apparent, and it made sense to move to continuous deployment.
To achieve this, we had to make changes across the board. For example, our deployment scripts never handled graceful shutdowns, and every deployment took down the entire service, making our public portal inaccessible for a short period of time.
In the context of continuous deployment, we could be deploying 3, 5, or 10 times a day, and we cannot afford to have the Inter.link portal going down multiple times a day, even if it is only for a couple of minutes each time. So we went through all our services, made sure they terminate gracefully, and switched to rolling updates for all our deployments.
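As a minimal sketch of what graceful termination can look like for an HTTP service: the article does not say which language or framework Inter.link's services use, so Go and net/http here are purely illustrative.

```go
// Minimal sketch of graceful shutdown for an HTTP service.
// The language and server library are assumptions for illustration only.
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{Addr: ":8080"}

	// Serve traffic in the background.
	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("server error: %v", err)
		}
	}()

	// Wait for the termination signal sent during a rolling update.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, syscall.SIGINT)
	<-stop

	// Stop accepting new connections and let in-flight requests finish.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("graceful shutdown failed: %v", err)
	}
}
```

Combined with rolling updates, which replace instances one at a time rather than all at once, old instances can finish their in-flight requests while new ones are already serving traffic, so the portal stays reachable throughout a deployment.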
We also introduced guidelines on making incompatible changes. For example, if we needed to break API compatibility, we would break the work down into small, backward-compatible merge requests. Not only are smaller merge requests easier to review; keeping them backward compatible also meant we could easily roll back if we needed to.
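As a hypothetical illustration of that guideline (the type and field names below are made up, not taken from Inter.link's API): instead of renaming a response field in one breaking change, the new field is first added alongside the old one, clients are migrated, and only then is the old field removed in a later merge request.

```go
// Hypothetical API response type, shown only to illustrate the guideline.
// Step 1 (backward compatible): add the new field while keeping the old one.
type ServiceResponse struct {
	Name        string `json:"name"`         // old field, kept so existing clients keep working
	DisplayName string `json:"display_name"` // new field, introduced in its own small merge request
}

// Step 2: migrate all clients to display_name.
// Step 3 (a later merge request): remove the deprecated name field.
```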
Finally, we had to add alerts so that we would be notified when a deployment failed or left the service in a bad state.
Once all of that was in place, the switch to continuous deployment was actually very simple: we made the existing GitLab jobs run on every merge to main instead of leaving them set to manual.
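In GitLab CI terms, that change amounts to removing the manual gate from the deploy job, roughly as in the hypothetical snippet below (continuing the illustrative configuration sketched earlier, not Inter.link's actual pipeline):

```yaml
# Illustrative change to the hypothetical deploy job from earlier:
deploy_production:
  stage: deploy
  script:
    - ./scripts/deploy.sh production
  rules:
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'  # every merge to main deploys
      when: on_success                               # instead of "when: manual"
```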
Continuous deployment in the network industry
Continuous deployment is a very common approach among software companies and web services, where disrupting the service with every deployment is unthinkable because so many changes are pushed in a single day.
However, it is still relatively new to the network industry, which, with the introduction of network-as-a-service, has only recently come around to operating more like cloud services.
Inter.link's portal is a critical part of how we operate because our connectivity services are provisioned on demand, so it is essential that the portal shows all our locations and never goes down.
Continuous deployment directly supports Inter.link’s approach to innovation, letting customers receive benefits faster.
How does Inter.link’s move to continuous deployment benefit customers?
Portal no longer goes down –
In the past, the portal would be down for brief periods during each release; now, with continuous deployment, it does not go down at all.
Customers benefit from new features earlier –
Developers can move much faster. A developer can ship the feature they are working on the same day it is finished, instead of waiting for the next release cycle. As a result, customers benefit from new features much sooner.
Bugs are found and fixed much quicker –
Previously, the lack of continuous deployment forced developers to come up with workarounds for bugs just to avoid making a manual release. Now, when a bug is found, a developer can either fix forward and ship to production quickly, or roll back the change to resolve the issue.
Visit the Inter.link Portal
Interested in seeing all the latest features and network locations from Inter.link?
Visit the portal and explore!