QuiskTech: Continuous Performance Testing

This post comes courtesy of Amandeep Lubana, Senior Test Engineer at Quisk HQ.


This week we want to take a look at Continuous Performance Testing, the way that mobile money or financial services companies such as Quisk measure how their systems respond in various situations.


Overview

We take pride in the robustness of our global platform from both a functional and performance point of view.

In order to ensure the best consumer experience, we measure our platform on the following factors:

  • Response time
  • Resource utilization
  • Stability
  • Throughput (a combination of the above)

Performance Testing at Quisk is designed to accurately test our systems on these criteria, and to set baselines for further testing. In other words, rather than trying to find functional defects in an application or product (something we do via separate QA processes as part of every build), our team uses performance testing to simulate various ‘what-if’ scenarios in a carefully sandboxed environment, gathering quantifiable data that clearly answers those what-if questions.

This is an important rule of applying continuous performance testing in your own organization or engineering team: every test must align with your predetermined measurement criteria, have a clear goal, and provide an answer to the specific what-if scenario you are testing against.

This practice of ensuring that every Quisk release is not only functionally tested, but also performance tested before it is pushed to our pre-production environment, is the Continuous Performance Testing we are discussing here.

Some common sources of performance issues uncovered in these scenarios are the following:

  • Database access
  • Network latency, or time spent on the network
  • Time spent reading or writing files on disk
  • Time spent making HTTP or other calls
  • Inefficient code that can cause memory leaks or poor resource utilization

As we repeat various system tests, we continuously gather data that provides near real-time feedback to our engineering and development teams. Where criteria are negatively affected, we are able to form hypotheses about how to tweak or improve code in future releases, or can test theories by proposing additional new scenarios to gather data on and quickly make improvements.

By understanding where releases sit against our baselines and Quisk SLAs, or how hardware and software utilization might be affected in various scenarios, our management teams are ultimately able to determine whether each agile iteration is ready to be pushed to production.


Setup and Testing

Another important rule of continuous performance testing is that all test scenarios must accurately reflect real-world environments in order to be useful.

This may seem obvious, but, like properly instrumenting applications to gather good data, it is much easier to understand than to execute. At Quisk, we carefully model our sandboxed performance environment on the production environment in all aspects: servers, routers, switches, load balancers, etc.

As Quisk is a mobile payment system, let’s take a closer look at how this works using ‘virtual machine’ (VM) clients in our Quisk performance environment to generate as many transactions as needed. These clients are scalable and allow our teams to generate millions of simultaneous transactions if needed, using scripts that we create for each scenario.

A typical setup for a VM client might look something like this:

[Diagram: a typical VM client test setup]

After simulating a huge volume of transactions in a single second, or many thousands of simultaneous API calls, we use a product called JMeter to provide our team with results in the form of convenient pre-formatted graphs and .csv files.

These results can be further analyzed by our team to extract critical information. We might want to examine how much load the system was able to sustain and for how long, or compare our CPU and memory utilization scores during various componentized scenarios.
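
As a minimal sketch of that kind of analysis, here is how a JMeter results file might be summarized in Python. It assumes JMeter’s default CSV columns (‘timeStamp’ in epoch milliseconds, ‘elapsed’ in milliseconds, ‘success’ as ‘true’/‘false’); the file name results.csv is a placeholder:

    # Summarize a JMeter CSV results file: throughput and response times.
    # Assumes JMeter's default CSV columns ('timeStamp' in epoch ms,
    # 'elapsed' in ms, 'success' as 'true'/'false'); results.csv is a placeholder.
    import csv
    from statistics import mean, median

    with open("results.csv", newline="") as f:
        rows = [r for r in csv.DictReader(f) if r["success"] == "true"]

    elapsed = [int(r["elapsed"]) for r in rows]    # response times, ms
    stamps = [int(r["timeStamp"]) for r in rows]   # epoch ms

    duration_s = (max(stamps) - min(stamps)) / 1000.0
    print(f"samples: {len(rows)}")
    print(f"TPS:     {len(rows) / duration_s:.1f}")
    print(f"response ms (min/avg/median/max): {min(elapsed)} / "
          f"{mean(elapsed):.0f} / {median(elapsed):.0f} / {max(elapsed)}")

    # Flag a run whose average response time breaches our 200 ms target.
    if mean(elapsed) > 200:
        print("WARNING: average response time exceeds the 200 ms target")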

An advantage of using VM clients to create end-to-end tests programmatically is not just consistent data outputs, but consistent scenario creation. The programmed scripts we create for performance tests are reusable and scalable, so that we can understand the impact that everything from a tiny code change to a fully formed new feature might have on a legacy system, and conduct multiple simultaneous tests.

Another advantage is that clients can be multi-threaded to reduce or eliminate idle CPU processing and run threads in parallel. In the above test, four different VMs are sampling 100 users completing transactions at 100 terminals across just 5 merchant locations, an extreme scenario that nonetheless does a good job of isolating our test criteria. If the tests are run with 5 threads on each VM (5 × 4 = 20 threads in total), then each thread is simulating 5 users on 5 terminals, and each merchant location sees 20 users on 20 terminals. Scripts might be programmed to make sure that no tests are duplicating or resending the same data in multiple requests. These tests might be run for 30 minutes or 1 hour.
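
To make the thread layout concrete, here is a simplified, hypothetical sketch of one of those four VM clients in Python: 5 threads, each driving 5 simulated users on 5 terminals, with a unique reference on every request so no data is resent. The endpoint, payload fields, and IDs are placeholders, not Quisk’s actual API:

    # One simulated VM client: 5 threads x (5 users on 5 terminals) each.
    # Across 4 such VMs this yields 20 threads, 100 users, and 100 terminals,
    # with each of the 5 merchants covered by one thread per VM.
    # The endpoint and payload below are hypothetical placeholders.
    import json
    import threading
    import time
    import urllib.request

    ENDPOINT = "https://perf-sandbox.example.com/transactions"  # placeholder
    TEST_SECONDS = 30 * 60                                      # 30-minute run

    def run_thread(thread_id, merchant_id):
        users = [f"user-{thread_id}-{i}" for i in range(5)]
        terminals = [f"term-{thread_id}-{i}" for i in range(5)]
        deadline = time.time() + TEST_SECONDS
        seq = 0
        while time.time() < deadline:
            for user, terminal in zip(users, terminals):
                seq += 1
                body = json.dumps({
                    "user": user,
                    "terminal": terminal,
                    "merchant": merchant_id,
                    "reference": f"{thread_id}-{seq}",  # unique within this client
                    "amount": "10.00",
                }).encode()
                req = urllib.request.Request(
                    ENDPOINT, data=body,
                    headers={"Content-Type": "application/json"})
                try:
                    urllib.request.urlopen(req, timeout=5)
                except OSError:
                    pass  # a real script would log and count failures

    threads = [threading.Thread(target=run_thread, args=(t, f"merchant-{t}"))
               for t in range(5)]  # thread t covers merchant t on this VM
    for th in threads:
        th.start()
    for th in threads:
        th.join()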

A third rule of continuous performance testing is to carefully choose a data period that clearly aligns with your test scenarios.

When reporting the test results here, our team might skip a 5-minute ‘starting window’ as well as the 5-minute ‘ending window’ of the test to quantify a clean 20-minute middle segment where the system is at load (i.e. 5-20-5 in a 30-minute test). This keeps statistics like the maximum, median, average, and minimum from being affected by any ramp-up or shut-down conditions, and generates consistent, actionable results. It also allows us to repeat our performance tests continuously, with comparable data that we can plot against past performance baselines.
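
A small sketch of that 5-20-5 trimming in Python, where samples pairs epoch-millisecond timestamps with response times as in the JMeter CSV above:

    # Keep only the steady 20-minute middle of a 30-minute run by dropping
    # a 5-minute ramp-up window and a 5-minute shut-down window.
    WINDOW_MS = 5 * 60 * 1000

    def steady_state(samples):
        """samples: list of (epoch_ms_timestamp, response_ms) tuples."""
        start = min(ts for ts, _ in samples) + WINDOW_MS  # skip ramp-up
        end = max(ts for ts, _ in samples) - WINDOW_MS    # skip shut-down
        return [(ts, ms) for ts, ms in samples if start <= ts <= end]

    # One synthetic sample per second over 30 minutes -> 20 minutes survive,
    # so max/median/average/min are computed on comparable, at-load data.
    run = [(i * 1000, 150) for i in range(30 * 60)]
    print(len(steady_state(run)) // 60, "minutes of steady-state samples")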

Finally, before we conduct 20-concurrent-thread tests we might first test with both 1-thread and 10-thread runs. In this way we are able to capture the number of transactions that occur on a single thread, so that we can verify that later 10-thread tests are conducted at capacity with regard to the number of concurrent users and the total transaction count. In the past, little tricks like this have allowed us to identify multiple hidden issues that could have bogged down the system or dropped our transaction count.
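
A tiny sketch of that sanity check, using illustrative placeholder counts: measure the single-thread transaction count first, then confirm a 10-thread run lands near ten times that baseline:

    # Verify that an N-thread run is at capacity relative to the 1-thread
    # baseline; a large shortfall hints at hidden contention or dropped work.
    def check_scaling(single_thread_count, n_threads, observed_count,
                      tolerance=0.9):
        expected = single_thread_count * n_threads
        ok = observed_count >= expected * tolerance
        status = "OK" if ok else "capacity shortfall - investigate"
        print(f"expected ~{expected}, observed {observed_count}: {status}")
        return ok

    check_scaling(single_thread_count=4200, n_threads=10,
                  observed_count=39500)  # placeholder counts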

Here are some obvious baseline metrics that might be compiled and examined in a continuous performance test of an open payment system like Quisk:

Transactions per second – We measure how many transactions are executed per second as we increase the number of threads, tracking this ‘TPS score’ as we scale the performance of Quisk’s Digital Transaction Processing Platform. Where transactions come in slightly lower than expected, our team can explore hypotheses across various subroutines to find and fix these issues quickly.

[Graph: transactions per second (TPS) during the test run]

Response Time – We test the total time it takes from when a user makes a given request until they receive a response. This response time should never be more than 200 milliseconds, as we know user experience rapidly degrades with lag beyond that point. Response times are closely related to TPS and are often measured together, with increases in response time tracked alongside decreases in TPS scores.

CPU Utilization – We look to keep CPU utilization between 40% and 50%. While performing our tests we always keep the greater payment network in consideration, as it plays a vital role in getting each transaction in and out of the system. We test the workflows using internal IPs, external IPs, and load balancer URLs to simulate and identify any bottlenecks in this network.


Memory Utilization – This is monitored to identify any memory leaks in the code, and our tests look for patterns of constantly increasing memory utilization over a testing period.
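
As a sketch of how both of these checks might be scripted on a server under test, the snippet below samples CPU and memory with the third-party psutil library; the sampling cadence and the 10-point leak heuristic are illustrative choices, not Quisk’s actual thresholds:

    # Sample host CPU and memory during a test run using psutil
    # (pip install psutil). Sustained CPU far above the 40-50% band,
    # or steadily climbing memory, warrants investigation.
    import time
    import psutil

    def sample(duration_s=600, interval_s=5):
        readings = []
        for _ in range(duration_s // interval_s):
            cpu = psutil.cpu_percent(interval=interval_s)  # % over the interval
            mem = psutil.virtual_memory().percent          # % of RAM in use
            readings.append((time.time(), cpu, mem))
            print(f"cpu={cpu:5.1f}%  mem={mem:5.1f}%")
        return readings

    data = sample()
    # Crude leak heuristic: memory ends well above where it started
    # (threshold is illustrative); a real pass would plot the whole series.
    if data[-1][2] - data[0][2] > 10:
        print("WARNING: memory climbed more than 10 points - possible leak")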


Results and Remediation

Once the test results are collected, they are analyzed by the engineering team to see if there is any degradation in performance compared with the baseline numbers.

In the example above, one issue that was flagged and identified during a particular performance test was a brief drop in transactions per second (TPS). During our subsequent investigation the team found that the system was making a database call and retrieving data into the cache without first checking the cache itself. This was immediately corrected and verified over the following testing cycles.
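
The pattern behind the fix looks roughly like this (an illustrative sketch, not Quisk’s actual code): consult the cache first, and hit the database only on a miss:

    # Check the cache before making a database call, rather than fetching
    # from the database and populating the cache unconditionally.
    cache = {}

    def fetch_from_db(key):
        return f"row-for-{key}"  # stand-in for a real database query

    def get_record(key):
        if key in cache:            # cache check first...
            return cache[key]
        value = fetch_from_db(key)  # ...database only on a miss
        cache[key] = value
        return value

    print(get_record("acct-42"))  # miss: one database call, result cached
    print(get_record("acct-42"))  # hit: served from cache, no database call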

There are a range of technical tools that can be used to observe these performance issues, some of which we look forward to reviewing in future pieces.


If you have any questions about Continuous Performance Testing in your organization, drop us a note in the comments section or reach out to our technology team on Twitter @GetQuisk or via email at tech@quisk.co.

We’d love to hear from you.


-AL