The story of iOS CI at Appunite (Part 2)

Welcome back to the story of iOS CI at Appunite. In the previous article I described how we moved from an outdated and problematic CI infrastructure to a modern virtualized approach. I showed you which problems we were able to fix by introducing it; today I would like to look at the consequences of that migration and how we dealt with them!

New issues

After using and maintaining the virtualized setup for a while, we started to see two main issues.

Disk space

Every image with macOS and Xcode is around 25 GB, which became problematic when we wanted to keep a lot of Xcode versions on one machine. Of course, we could fix this by adding disk space, but we also wanted to reduce the size of each image. Almost all virtualization software allows you to create linked images: you link a new image to an existing one, and the new image shares the contents of the original. By leveraging this feature, we could share one macOS image between all target images, so all system files are stored only once. We also created a separate intermediate linked image that has all the tools needed for iOS development installed, such as Python, Ruby and many others. With this idea we arrived at another modification of our scheme:

As you can see, for 3 Xcodes we reduced the size from 75 GB (3 x 25) to around 42 GB. That's a huge gain! With such a setup, we didn't even need to increase the disk space on our computers, which saved us some money.
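The saving follows from simple arithmetic. As a rough sketch, assume that every full image splits into a shared part (macOS plus tools) and an Xcode-specific part. That split and the function names below are illustrative assumptions; only the 25 GB, the 42 GB and the 3 images come from the numbers above.

```python
def split_image(full_gb, linked_total_gb, n_images):
    """Estimate the shared and per-Xcode sizes from two observations.

    Assumed model (not measured):
        shared + delta            = full_gb          (one standalone image)
        shared + n_images * delta = linked_total_gb  (n linked images)
    """
    delta = (linked_total_gb - full_gb) / (n_images - 1)
    shared = full_gb - delta
    return shared, delta


def linked_size(shared_gb, delta_gb, n_images):
    """Disk usage when all images share one base and each adds its own delta."""
    return shared_gb + n_images * delta_gb


def standalone_size(full_gb, n_images):
    """Disk usage when every image is a full, independent copy."""
    return full_gb * n_images


if __name__ == "__main__":
    shared, delta = split_image(full_gb=25, linked_total_gb=42, n_images=3)
    print(f"shared part: {shared} GB, per-Xcode part: {delta} GB")
    for n in (3, 6):
        print(n, "images:", standalone_size(25, n), "GB standalone vs",
              linked_size(shared, delta, n), "GB linked")
```

Under this model the shared part is about 16.5 GB and each additional Xcode costs about 8.5 GB, so the gap between linked and standalone images keeps growing with every Xcode version added.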

Build time

The second issue we struggled with was build time, which directly affected the consumers of CI: developers. It was pretty frustrating for them that their projects now took approximately 3 times as long to build as before.

The first thing we did was analyze which steps of a given pipeline consumed the most time. We immediately discovered that the main reason builds had been faster in the past was persistence: the same persistence that we were trying to avoid when we introduced templates for virtual machine images. This time, however, we wanted to persist data deliberately, storing exactly the data needed to make a pipeline faster. To solve this issue, we introduced our own caching mechanism, heavily inspired by the CircleCI caching system. We decided to implement this tool in Swift; in essence, it uploads directories to AWS S3 under special cache keys. I highly recommend reviewing its source code; a sample call looks like this:

appunite-cache store --key 'gems-{{ checksum "Gemfile.lock" }}' --paths 'vendor'
appunite-cache restore --keys 'gems-{{ checksum "Gemfile.lock" }}'
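
To make the idea of a checksum-based key concrete, here is a minimal Python sketch of how a template like the one above could be resolved. It is not the actual appunite-cache implementation (which is written in Swift): the function names, the regular expression and the choice of SHA-256 are assumptions made for illustration only.

```python
import hashlib
import re
import tempfile
from pathlib import Path

# Matches placeholders of the form {{ checksum "some/file" }} (assumed syntax,
# modelled on the sample call above).
CHECKSUM_PATTERN = re.compile(r'\{\{\s*checksum\s+"([^"]+)"\s*\}\}')


def file_checksum(path):
    """Hex digest of the file contents (SHA-256 is an assumption)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def render_cache_key(template, root=Path(".")):
    """Replace every checksum placeholder with the digest of the named file."""
    def replace(match):
        return file_checksum(Path(root) / match.group(1))
    return CHECKSUM_PATTERN.sub(replace, template)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "Gemfile.lock").write_text("GEM\n  specs:\n")
        print(render_cache_key('gems-{{ checksum "Gemfile.lock" }}', root))
```

The point of such a key is that it changes only when the lock file changes, so an unchanged Gemfile.lock restores the previously stored directory, while any dependency update produces a new key and therefore a fresh cache entry.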

When it comes to caching iOS-specific dependencies, probably the first thing that comes to mind is how to speed up building external libraries. Of course, there is a great tool called Carthage, which we've started to adopt in every project in our organization. We cache such dependencies in several different ways, which I described in another article. Without going into all the details that you can find there: we decided to start using Rome. While we could still use our dedicated tool to cache the whole Carthage directory, with Rome we also benefit from working together as a company, speeding up not only CI runs but also local ones.

We also made several software and hardware improvements:

  • Moved from VirtualBox to Parallels, which gave us around a 30% speedup
  • Upgraded our Mac mini fleet to 2019 models with 12 threads
  • Started to parallelize builds on a single machine - 6 threads and 4 GB of RAM per runner

After all these improvements, we noticed that builds had become even faster than on the non-virtualized architecture.

Approximate build times for the same project:

  • 1st approach with no virtualization ~ 10 minutes
  • 1st iteration of virtualization with no cache and other improvements ~ 30 minutes
  • virtualization with several improvements ~ 4-5 minutes
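
Expressed as ratios, the list above can be worked through with a few lines of Python. This is plain arithmetic on the approximate times quoted above; the constant and function names are only illustrative.

```python
# Approximate build times in minutes, taken from the list above.
NO_VIRTUALIZATION = 10.0
FIRST_VIRTUALIZED = 30.0
IMPROVED_RANGE = (4.0, 5.0)


def speedup(before_minutes, after_minutes):
    """How many times faster the 'after' build is than the 'before' build."""
    return before_minutes / after_minutes


if __name__ == "__main__":
    for after in IMPROVED_RANGE:
        print(f"{after} min build:",
              f"{speedup(FIRST_VIRTUALIZED, after)}x vs first virtualized setup,",
              f"{speedup(NO_VIRTUALIZATION, after)}x vs no virtualization")
```

In other words, the improved setup is roughly 6 to 7.5 times faster than the first virtualized iteration and about 2 to 2.5 times faster than the original non-virtualized machines.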

As you can see, we've achieved a very scalable CI system that seems to be ideal...

Current issues

As you may guess, everything has its trade-offs, and no matter how many improvements we make, they will always bring some new issues. Currently, when we want to add a new Xcode version, we still have to manually prepare a macOS virtual image with that Xcode installed. There are also several operations needed to fit the image into our system and, of course, to register it in GitLab. All of this became quite time-consuming when it had to be repeated on 3 computers. On top of that, it involves a lot of remote connections over SSH / VNC, which requires knowing the relevant IP addresses and credentials.

That got me thinking that we could automate this process by introducing some tooling on top of our CI system. At the time I was very interested in server-side Swift, so quite quickly we had a new web application that lets us add a new Xcode image by pressing a few buttons at a publicly known address (we authenticate with our company GitLab account). As a result, scheduling the addition of a new Xcode version now takes around 10 seconds, and when the process completes, a Slack message is sent with information about the updated list of supported Xcode versions. With this app we can also monitor all runners and know what is currently being built on a particular machine, or who is occupying which version of Xcode.

Another problem, possibly specific to gitlab-runner, is that our virtual machines are sometimes randomly suspended, around 2-3 times per week. After spending dozens of hours on it and learning nothing, I came up with a workaround of sorts. We could automatically reset gitlab-runner after detecting such an issue, but that is not the best solution. Why? Because other users may be running important work on CI at the same time while some jobs are getting suspended. Instead, we send an interactive warning message on Slack that shows what is currently being built and allows resetting the runner on that computer.
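
A message of this kind can be described with Slack's Block Kit. The Python sketch below only builds such a payload; it is not our actual implementation (that lives in the server-side Swift app), and the runner name, the job descriptions and the `reset_runner` action id are hypothetical. How the suspension is detected and how the button click is handled are outside the scope of this sketch.

```python
import json


def build_suspension_warning(runner, running_jobs):
    """Build a Block Kit payload warning that a VM on `runner` looks suspended.

    `runner` and `running_jobs` are hypothetical inputs; the "reset_runner"
    action id is an assumed name that a click handler would have to match.
    """
    if running_jobs:
        jobs_text = "\n".join(f"• {job}" for job in running_jobs)
    else:
        jobs_text = "_Nothing else is being built on this machine._"

    return {
        # Fallback text for notifications and clients that do not render blocks.
        "text": f"Virtual machine suspended on {runner}",
        "blocks": [
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": (f":warning: A virtual machine on *{runner}* "
                             f"appears to be suspended.\n"
                             f"*Currently building:*\n{jobs_text}"),
                },
            },
            {
                "type": "actions",
                "elements": [
                    {
                        "type": "button",
                        "text": {"type": "plain_text", "text": "Reset runner"},
                        "style": "danger",
                        "action_id": "reset_runner",
                        "value": runner,
                    }
                ],
            },
        ],
    }


if __name__ == "__main__":
    payload = build_suspension_warning("mac-mini-1", ["example-app: build"])
    print(json.dumps(payload, indent=2, ensure_ascii=False))
```

Listing the running jobs next to the button is the important design choice here: whoever clicks can see whose work a reset would interrupt before deciding.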

With those improvements, maintenance of the whole system takes only a few minutes per week, while previously it took more than an hour. That's not all: we have faster CI, we can build many more projects concurrently, we can add a new Xcode faster than anyone else, and we are immediately informed on Slack about stability issues of the system. There are really huge money savings behind this.

Guess what - we even recently started to use the same CI system for building Flutter applications. We are able to create virtual images in seconds and define versions of all tools like Xcode, Flutter, Android SDK, Android Tools and many, many more...

Have we reached the silver bullet?

No! We are still improving our CI infrastructure to make it better and better. Recently I've started to explore Elixir / Phoenix, and I'm rewriting the server-side Swift application in Elixir to make it even more flexible and generic. This time it's going to be open sourced, and hopefully more companies will be able to take advantage of it to build their own fast and easily maintainable CI system.

Lesson learned

In the end, I would like to tell you what the point of this article is. This story sounds like a beautiful process of constant improvement with each iteration of our system. But it wasn't as pleasant as it looks. The most critical moment was when we introduced the 1st version of the new, virtualized approach and it turned out to be much, much slower than the previous one. This was done by one person in our company to whom I would like to give huge kudos now: Emil Wojtaszek. At that moment, his approach was questioned by almost all iOS developers in our company; everyone was complaining about CI build time and stability, me included.

The lesson learned? Try to help your friends build great tools together rather than criticizing them at the beginning. Without risk, without tripping, without iterations, you just can't make software great again.

If you enjoyed the article, follow me on Twitter to stay up to date with my articles.