Table of Contents
- What is the purpose of automated testing?
- Worsening performance constraints with a growing test suite
- The costs of slow tests
- How we improved test suite performance
- Life with a faster test suite
- Build quickly with confidence
What is the purpose of automated testing?
Our team is a strong proponent of automated testing. Automated tests ensure that our code is working as expected, and that it will continue working even if our app changes over time.
Every application we build has a comprehensive suite of unit tests, integration tests, and end-to-end (E2E) tests. We run these tests every time we change code or deploy a new version of the app. Tests ensure we don’t break existing functionality when we make changes, and they help us catch bugs before they reach production. In short, automated tests are a safety net for building excellent apps.
Worsening performance constraints with a growing test suite
Testing is a critical part of our development process with significant benefits, but there are some costs associated with the practice. As an app matures, the number of tests in its test suite grows. Some of our apps have thousands of automated tests, and the more tests you have, the longer it takes to run the suite. In some instances, software developers need to wait on the results of the test suite before moving on to other work. That wait time reduces developer productivity.
We found test execution time to be a significant bottleneck for one of our clients. While working on several web applications for the client, we found ourselves waiting an hour for a Java application’s test suite to complete. This hindered our own productivity and slowed down development work.
The client’s team was also frustrated by the long test execution time but had grown accustomed to waiting for the test suite, as they had dealt with this problem for a long time. Unfortunately, it’s not a matter of just waiting for tests to pass. Sometimes code changes introduce test failures, meaning developers would need to wait an hour to find out if their changes broke anything. A team of approximately 20 developers was spending hours each day waiting for tests simply so they could do their job!
The costs of slow tests
This time adds up quickly, especially across such a large project team. While idle time is not inherently bad, it becomes a problem when your project’s constraint—your bottleneck—is idle. Work can only be done as quickly as the constraint allows, and on most software projects, the developers are the constraint.
So what is the cost of waiting for tests to pass? Let’s perform some naive calculations. If each developer pushes code changes twice a day, and each test run takes an hour, then the team is idle for 200 person-hours each week. That’s 10,400 person-hours per year. If your total cost (salary, benefits, equipment, office space, etc.) for a developer is $100,000 per year, that idle time would add up to $500,000 in lost opportunity each year!
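The arithmetic behind those figures can be sketched as follows. The headcount, push frequency, and cost values are the illustrative numbers from above, not measured data:

```java
public class IdleCostEstimate {
    public static void main(String[] args) {
        int developers = 20;          // team size
        int pushesPerDay = 2;         // test runs each developer triggers daily
        double hoursPerRun = 1.0;     // test suite duration
        int workDaysPerWeek = 5;

        // Idle time: every push forces an hour of waiting
        double idleHoursPerWeek = developers * pushesPerDay * hoursPerRun * workDaysPerWeek;
        double idleHoursPerYear = idleHoursPerWeek * 52;

        // Convert idle hours to dollars using a fully loaded cost per developer
        double annualCostPerDeveloper = 100_000.0;
        double workHoursPerYear = 52 * workDaysPerWeek * 8.0; // 2,080 hours
        double hourlyCost = annualCostPerDeveloper / workHoursPerYear;
        double idleCostPerYear = idleHoursPerYear * hourlyCost;

        System.out.printf("Idle hours per week: %.0f%n", idleHoursPerWeek);  // 200
        System.out.printf("Idle hours per year: %.0f%n", idleHoursPerYear);  // 10400
        System.out.printf("Idle cost per year: $%.0f%n", idleCostPerYear);   // $500000
    }
}
```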
Now, thankfully, those naive calculations do not represent the lost opportunity cost our client experienced. In reality, developers had adapted to pushing their code less frequently, and only a subset of the team was regularly working on the slowest Java application. However, a similar back-of-the-napkin calculation demonstrated that the client was leaving nearly $200,000 per year on the table due to slow test suite execution.
The financial impact of slow tests is significant, but there is also a productivity impact. When developers are waiting for tests to pass, they are not working on other tasks. Or, if they do decide to work on another task while they wait, they may not be as productive as they would be if they were focused on a single task.
Consider test failures. If a developer introduces a code change that causes a test to fail, they will need to wait for the test suite to complete before they can fix the test. In the meantime, they have moved on to another task and are no longer thinking about the problematic code. Once they are notified of the test failure, they have to once again switch their focus to the code that caused the failure. Context switching requires time, so switching back and forth between tasks will reduce the developer’s productivity.
How we improved test suite performance
Identifying the problem
We began investigating ways to improve test suite performance. The first thing we noticed was the number of integration tests. Integration tests verify that different parts of the application work together as expected. However, these tests are often slower than unit tests because they require setting up a database, starting a web server, and other time-consuming tasks.
Integration tests are very valuable for catching problems prior to deployment, and we wanted to maintain the quality of the test suite. Therefore, removing integration tests was not an option. Instead, we needed to find a way to run the integration tests faster. Looking at system metrics during test execution, we quickly identified the performance bottleneck: the CPU.
CPU utilization was consistently at 100% during test execution, but only for one CPU core. Most processors now have multiple cores, and each core can run its own processes. The test suite was not set up to take advantage of this capability.
Parallelizing tests to utilize available hardware
We made the single-process discovery on a development machine. After reviewing the CI server’s configuration, we realized it also only executed one test at a time. This made our next step clear: we would parallelize the test suite. Instead of running a single test at a time, we would run as many tests as possible at the same time.
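The article doesn’t name the client’s build tooling, so as one concrete illustration only: if a Java suite runs under Maven, the Surefire plugin’s forkCount setting will spawn multiple test JVMs so tests execute in parallel processes:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Run tests in 5 forked JVMs instead of one -->
    <forkCount>5</forkCount>
    <!-- Reuse each JVM across test classes to amortize startup cost -->
    <reuseForks>true</reuseForks>
  </configuration>
</plugin>
```

Other build tools and test frameworks (Gradle, JUnit 5’s parallel execution settings) offer equivalent knobs.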
It’s important to recognize that the test server is doing other work aside from running tests. Operating system processes still need CPU time. Therefore, you generally won’t want to dedicate every CPU core to running tests. Instead, a good rule of thumb is to start your parallel test efforts with (N/2 + 1) processes, where N is the number of CPU cores. For example, if your test server has an 8-core CPU, you would start with 5 parallel processes. If you’re unsure whether you can get more out of the system, you can always try increasing the number of parallel processes, but the (N/2 + 1) rule is a sound starting point.
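The rule of thumb can be expressed as a small sketch. Note that Runtime.availableProcessors() reports logical cores, which may differ from physical cores on machines with hyper-threading:

```java
public class ParallelismHint {
    /** Rule-of-thumb starting point: half the cores, plus one. */
    static int suggestedProcesses(int cpuCores) {
        return cpuCores / 2 + 1;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Detected cores: " + cores);
        System.out.println("Suggested parallel test processes: " + suggestedProcesses(cores));
        // Example from the text: an 8-core server -> 8/2 + 1 = 5 processes
    }
}
```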
Since the test suite included integration tests, we needed to make sure that the tests would not interfere with each other. This meant giving each process its own database and web server. We used Docker to create a separate database for each process, and started a web server bound to a unique port number for each process (e.g., 8080 + N, where N represents the process number). Starting up multiple databases and seeding them with required test data was slow, taking up to 60 seconds. However, since we could reuse the same database for multiple test runs, we only needed to do this once per process. Over the course of test suite execution, we still realized significant performance improvements.
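A minimal sketch of the per-process isolation scheme, assuming each test process can learn its own number (e.g., from Surefire’s surefire.forkNumber system property under Maven). The helper names, database name, and PostgreSQL URL are hypothetical illustrations, not the client’s actual configuration:

```java
public class TestProcessResources {
    /** Unique web server port per process: 8080 + N. */
    static int port(int processNumber) {
        return 8080 + processNumber;
    }

    /** One Docker-hosted database per process; the naming scheme is illustrative. */
    static String jdbcUrl(int processNumber) {
        return "jdbc:postgresql://localhost:5432/app_test_" + processNumber;
    }

    public static void main(String[] args) {
        // Fall back to process 1 when the property is absent (e.g., local runs)
        int n = Integer.getInteger("surefire.forkNumber", 1);
        System.out.println("Web server port: " + port(n));
        System.out.println("Database URL: " + jdbcUrl(n));
    }
}
```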
Parallelizing test suite execution brought test time down from an hour to 20 minutes on existing hardware. What a drastic improvement! This still falls short of the general Extreme Programming (XP) guideline of a ten-minute build, but it was a significant decrease in the time it took this team to receive feedback on their code changes.
At this point, we recalculated the lost opportunity cost of slow tests. Whereas the client’s lost opportunity cost from idle time was previously approaching $200,000 per year, it was now closer to $60,000 per year, amounting to an overall reduction of $140,000 in lost opportunity cost.
Adding more hardware
After parallelizing the test suite, we knew there were more ways to improve performance. We had already identified the CPU as the bottleneck, and parallel tests fully utilized the client’s available hardware. Once we had exhausted the available CPU cores, we knew additional hardware could further decrease build times. We provided the client with information about our findings and recommended they add more hardware to execute tests even faster.
On large teams, multiple test servers can be just as valuable as more powerful hardware. While we had focused on parallelizing the test suite, and the overall execution time was significantly lower, the build server was limited to running only one instance of the test suite at a time. That meant that if multiple developers pushed code changes at the same time, they’d have to wait for their turn to test their code changes on the build server. Multiple build servers would alleviate this constraint.
Beyond the aforementioned improvements, there are other ways to improve test suite performance. Optimizing individual tests, eliminating unnecessary tests, and generally improving application performance are all ways to decrease test suite execution time. However, those improvements can be a much more significant investment than the relatively simple changes we made to the test suite. Parallelizing tests and buying more hardware is incredibly cost-effective compared to optimizing individual tests. Whereas our approach took a few days to implement, optimizing hundreds or thousands of individual tests could take weeks or months.
Life with a faster test suite
The client was thrilled with the results of our work. Decreased test execution time meant developers could get feedback on their code changes faster. This allowed them to be more productive, and they were able to push code changes more frequently.
Build quickly with confidence
Automated testing ensures you can deploy new code with confidence. However, slow test suites can be a major productivity drain. You can improve test suite performance by parallelizing your tests and adding more hardware, increasing your team’s productivity while preserving the benefits of automated testing.