Performance and scalability case study of an Online Banking Application
Co-author: Anu Ramamoorthy
This post summarizes the performance and scalability characteristics of an online banking (OLB) application deployed on the IBM Bluemix cloud platform. The performance case study was conducted on both Bluemix Dedicated and Bluemix Local platforms. The main focus of this work is to understand the capabilities of the Bluemix platform and demonstrate that it can meet the performance and scalability requirements of an OLB application at peak load. Specifically:
- Demonstrate Bluemix can support 500-600 transactions per second (52 million API calls per day) at sub-second response time
- Showcase horizontal scalability with both static and auto scaling
- Demonstrate platform stability by running the application at peak load for more than 12 hours with consistent performance.
We believe that understanding the performance and scalability characteristics of the retail OLB application also helps guide successful deployment of related banking applications on the Bluemix cloud platform. This report begins with an overview of the case study methodology and concludes with observations and tuning guidance.
The workload simulates a customer-facing retail online banking scenario showcasing transactions like an individual account summary page. The end-to-end workload flow is shown in the diagram below.
The application design loosely follows the microservices design pattern and consists of two microservices exposing REST APIs. First, the “orchestrator” service receives REST API calls from the browser UI, an Angular.js account summary page. Rendering a UI page requires two separate REST calls to the “orchestrator” microservice, so the workload's transactions-per-second metric (defined as pages rendered per second) is half the REST calls per second handled by the application.
Second, the “stub” service, which represents back-end data requests, returns pre-loaded back-end data without actually invoking any back-end service; it simulates back-end service latency by putting each request to sleep for a set amount of time. The load simulation tools generate REST traffic.
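As a minimal sketch of the stub pattern described above (class and field names are our own illustration, not the PoC's actual code), the service sleeps to simulate back-end latency and then returns canned data:

```java
// Illustrative sketch of the "stub" microservice: it simulates a slow
// back-end by sleeping, then returns pre-loaded data. Names are ours,
// not from the original PoC.
public class StubService {
    private final long backendLatencyMs;  // simulated back-end latency

    public StubService(long backendLatencyMs) {
        this.backendLatencyMs = backendLatencyMs;
    }

    // Stands in for a real back-end call: sleep, then return canned JSON.
    public String accountSummary() {
        try {
            Thread.sleep(backendLatencyMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve interrupt status
        }
        return "{\"accountId\":\"12345\",\"balance\":1234.56}";
    }
}
```

In the PoC this logic sat behind a REST endpoint; the key point is that the simulated latency is a single tunable parameter, which is what made the high-latency experiments described later possible.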
Workload security is provided by the on-premises enterprise security reverse proxy (IBM Security Access Manager), a VPN connection to Bluemix Dedicated instance, and firewalls. The traffic is encrypted with SSL, which is terminated at DataPower® inside Bluemix and subsequently encrypted via IPSec inside Bluemix. The Orchestrator microservice is configured with Trust Association Interceptor (TAI) login module for Liberty Application Server, which allows it to authenticate ISAM credentials and decode user identity propagated from ISAM.
In our tests, we also used a shortened workload path: JMeter generated traffic from inside an IBM SoftLayer environment, using simulated ISAM credentials and user identities captured in earlier full-path end-to-end tests, which dramatically simplified the test setup. Our tests detected no performance difference between the full-path and shortened-path scenarios.
Test environment topologies
Performance tests were conducted on three different Bluemix environments (Dedicated and Local), with the same OLB application used across all of them. Two JMeter load drivers loaded the application with millions of HTTP requests from hundreds of concurrent clients. Both static and auto-scaling mechanisms are used to horizontally scale the application to support peak load. The sustained peak load was applied continuously for 6-24 hours as an endurance test to verify that the application withstands peak load for long durations without stability issues.
The figure below represents the network view of the platform showcasing how the transactions flow in the platform.
As shown in the figure above, the initial REST call goes through the Vyatta firewall ①, DataPower Gateway ②, and GoRouter ③ to reach the Orchestrator service ④ deployed in one of the DEA’s warden containers. The Orchestrator then makes an HTTP call to the Stub service, which again passes through the firewall ⑤, DataPower ⑥, and GoRouter ⑦ before reaching the Stub service ⑧ deployed in another DEA’s warden container. This route illustrates the network hops involved in each transaction.
Even though Bluemix Local and Dedicated share the same Cloud Foundry architecture at the PaaS layer, the Bluemix Local cloud platform has multiple IaaS options. The test platform used for these performance evaluations is based on the IBM PureApplication® system (see the next section for more details).
Key configuration details of Bluemix Local used for tests in this report:
- Four DEA compute nodes, each with 4 vCPUs, 32 GB RAM, 16 GB disk, and 128 GB persistent disk
- IaaS layer based on IBM PureApplication System Gen 2 hardware
- VMware ESXi 5.1 virtualization nodes running the latest Cloud Foundry version, with a matching vCenter version
Performance results – Bluemix Dedicated
All traffic is encrypted with one-way SSL, and all users are authenticated to ensure that requests come through ISAM. To support the required peak load, horizontal scalability of the application is demonstrated through both static and auto-scaling mechanisms.
The figure below shows the static scaling scenario, in which more than 21 million HTTP requests were served by six application instances at a throughput of 1662 REST operations/sec (831 transactions/sec, defined as page loads, since each page load requires two separate REST calls) with a sub-second response time of 741 milliseconds, exceeding the goal of 500-600 transactions/sec at peak load. The sustained peak load ran continuously for roughly 6-24 hours as an endurance test for the platform.
Even at this peak load, only 25% of the DEAs’ capacity is used, leaving plenty of room for other applications. The figure below shows the average CPU consumption of the Orchestrator and Stub applications deployed across multiple DEAs (for example, the average per-instance CPU for the Orchestrator application is only about 98% of one vCPU, leaving almost 300% of the four-vCPU capacity available for other applications).
Performance results – Bluemix Local
The same application is rerun on the Bluemix Local platform using both static and auto-scaling techniques for horizontal scalability. The figure below shows the performance and scalability of the OLB application with one JMeter driver sending more than 7.2 million HTTP requests at 874 REST operations/sec, exceeding the goal. Continuous stress tests were conducted at peak load for more than 24 hours to evaluate the stability of the cloud platform. The data in the figure reflects the auto-scale service dynamically provisioning multiple application instances based on a throughput policy.
The CPU and memory characteristics of the application during the 24-hour stress test were closely monitored. The CPU data shows that the platform has plenty of room to grow, even though a total of 10 application instances (five each of Orchestrator and Stub) were dynamically provisioned to support the peak load.
Performance observations and tuning guidance
The comprehensive performance evaluation of the OLB application produced a clear understanding of the performance characteristics of similar Java applications on the cloud. Other major banking applications, such as wealth management, customer reward management, and investment banking, can benefit directly from this work. This section highlights some of the observations and recommends best practices for similar applications.
General guidelines for diagnosing performance problems
Typically, a performance problem can manifest in multiple ways, for example low throughput, high response times, or high CPU or memory consumption. The root cause may be due to many factors, some of which are given below:
- Application architecture issues
- Network bandwidth or latency issues
- Insufficient computational resources
- Application or platform tuning issues
- Service integration performance issues
- Cloud platform issues like noisy neighbors
There are many tools that can be used to understand the performance problems and diagnose the root cause. Some of the tools used for this PoC are given below:
- Network tools to understand latency and bandwidth (many built-in and free tools available)
- Application level performance metrics on Bluemix can be obtained through tools like Dynatrace, New Relic, and Monitoring & Diagnostic APM services
- There are also many Cloud Foundry CLI tools available which can be scripted to get valuable performance-related data.
Impact of back-end service latency
One of the major issues that impacted OLB performance is back-end service latency, which is simulated in the Stub application. Normal mainframe back-end latency is around 100-200 milliseconds; the initial PoC used a value of more than 1000 milliseconds, which resulted in poor performance.
One of the main reasons for this poor performance can be attributed to the Liberty threading algorithm. The Liberty runtime has a sophisticated autonomic thread management algorithm that adjusts the thread pool size dynamically based on the workload, removing the need for manual, iterative tuning. The default algorithm works well for normal-latency scenarios, as shown earlier, but it adapts poorly to very high back-end latency, which can result in very poor performance as shown in this figure.
Guidance for high back-end latency scenarios
- Always measure the back-end service latency (typical mainframe latency is about 100-200 milliseconds).
- One way to identify back-end latency issues in Java applications is to observe the executor thread pool behavior at peak load using Dynatrace, AppDynamics, New Relic, or the IBM Monitoring and Analytics service.
- The figure below shows average thread pool usage metrics for the OLB application at peak load with different back-end service latency values, captured with the IBM Monitoring and Analytics service bound to the application. The chart on the left shows thread pool usage in the low-latency scenario, and the chart on the right shows the high-latency scenario. The data shows that the default Liberty autonomic threading algorithm works well in the normal-latency scenario, managing the load with a small number of executor threads. In the very high-latency scenario, however, the required thread pool is much larger and the default algorithm cannot grow the pool quickly enough, resulting in a performance bottleneck that is also visible in the cloud-fabric-level data.
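Besides the APM dashboards mentioned above, a quick in-process spot check of thread growth is possible with the JDK's standard ThreadMXBean (a sketch of our own for illustration; the PoC itself relied on the monitoring services named above):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadSpotCheck {
    // Returns a one-line snapshot of live vs. peak JVM thread counts.
    static String snapshot() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        return "live=" + mx.getThreadCount()
                + " peak=" + mx.getPeakThreadCount();
    }

    public static void main(String[] args) {
        // A steadily climbing live count under constant load suggests the
        // executor pool is still growing to cover back-end latency.
        System.out.println(snapshot());
    }
}
```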
Even though you should first work toward reducing back-end service latency, that may not always be possible. In those high-latency scenarios, bypass the default Liberty threading algorithm by explicitly setting the executor thread pool in the application's server.xml file, adding the following stanza:
<executor maxThreads="100" coreThreads="100" />
The number of threads is application-specific and must be tested for each application against its back-end latency data; it also depends on the load and the number of application instances. A common starting point for a well-designed, CPU-bound application is an initial thread pool of about twice the number of available hardware threads; a typical enterprise Java application needs more than that, since factors such as lock contention come into play. It is also a best practice to determine the required thread pool size at peak load during QA testing and then add an additional 10-15% more threads for peak-load fluctuations. An excessively large thread pool with many idle threads, however, can reduce performance due to context switching.
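To make the latency dependence concrete, Little's Law gives a rough lower bound on the threads a latency-bound instance needs: concurrent requests ≈ arrival rate × back-end latency. The numbers below are illustrative, not measurements from the PoC:

```java
// Rough Little's Law estimate of executor threads needed per instance
// when requests mostly wait on a slow back end. All numbers below are
// illustrative assumptions, not figures from the original study.
public class ThreadPoolEstimate {
    // threads ≈ arrival rate (req/s) × latency (s), plus ~15% headroom
    static int requiredThreads(double reqPerSec, double latencyMs) {
        double concurrent = reqPerSec * (latencyMs / 1000.0);
        return (int) Math.ceil(concurrent * 115 / 100.0);
    }

    public static void main(String[] args) {
        // e.g. ~280 REST ops/s per instance at 200 ms back-end latency
        System.out.println(requiredThreads(280, 200));   // prints 65
        // the same rate at 1000 ms latency needs five times the threads
        System.out.println(requiredThreads(280, 1000));  // prints 322
    }
}
```

This is why a jump from 200 ms to 1000 ms of back-end latency can overwhelm a default-sized pool: the required thread count grows linearly with latency at a fixed request rate.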
Impact of sticky sessions on auto-scaling
Another important issue that needs attention is sticky HTTP sessions. When using the auto-scale service for horizontal scalability based on policies such as a throughput policy, a memory policy, and others, sticky sessions need to be disabled by adding the following to web.xml, as shown in the excerpt below. Otherwise, the load may not be distributed uniformly among dynamically provisioned application instances, which impacts horizontal scalability.
<session-config>
    <session-timeout>1</session-timeout>
    <cookie-config>
        <max-age>0</max-age>
    </cookie-config>
</session-config>
Java heap tuning guidance
The optimum size of an application instance depends on multiple factors, including the runtime in use (Java, Node.js, Swift, or others). For Java applications, the correct heap size can be found through iterative testing at peak load; verbose garbage collection data is useful for tuning the Java heap. Even though it is always better to choose the right size and avoid Out of Memory (OOM) situations, OOMs are handled gracefully by the platform: the load is redistributed to the remaining instances while the affected application instances restart.
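As one illustrative way to collect that verbose GC data on Liberty with the IBM JDK (the exact mechanism depends on the buildpack and JDK in use, so treat these flags as an assumption to verify for your environment), options can be added to the application's jvm.options file:

```
# assumed jvm.options fragment; -Xverbosegclog is an IBM J9 JDK option
-Xmx512m
-verbose:gc
-Xverbosegclog:verbosegc.%seq.log,5,10000
```

Rotating logs (five files of 10,000 GC cycles in this sketch) keeps the disk usage of a 24-hour endurance run manageable.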
The online banking PoC demonstrated that the IBM Bluemix cloud platform is designed for mission-critical enterprise banking applications. The PoC clearly showed that the Bluemix cloud platform can support the required peak performance levels (more than 72 million API calls per day with sub-second response time) with the required agility and elasticity. The endurance tests proved the stability and resiliency of the platform, and OLB performance is similar on the Bluemix Dedicated and Bluemix Local cloud platforms. Back-end service latency and auto-scale session affinity need to be evaluated to avoid performance issues; both considerations apply equally to most Java applications.
via Bluemix Blog https://ibm.co/2pQcNaA
May 2, 2017 at 03:12AM