Presenter: Jim Brady
A web application load test performed correctly can be a valuable tool for identifying bottlenecks, determining resource consumption levels, and quantifying system capacity. One done incorrectly is revealed as a waste of time and effort when the system under test (SUT) goes live with completely different performance characteristics than the load test indicated. One problem with load test quality, almost always overlooked, is the potential for the load generator’s user thread pool to sync up and dispatch queries in bunches rather than independently of one another, the way real users initiate their requests. A spiky launch pattern mischaracterizes workload flow and yields erroneous application response time statistics. This talk describes what a real user request timing pattern looks like, illustrates how to identify it in the load generation environment, and exercises a free downloadable tool that measures how well the load generator is mimicking the timing pattern of real web user requests.

Most web application load testing professionals assume their load generator’s virtual user thread pool precisely mimics the request timing that a comparable set of real users produce. However, a traffic generator is one computer initiating web requests with a large fixed set of user threads operating in closed loops, while real users each have their own computing device and make queries independently of each other as a dynamically changing subset of a larger population. The simulator’s heavier workload, the think time method it uses, and the feedback produced by its fixed closed loops may cause the user thread pool to sync up and offer unrealistic surges of traffic to the SUT. Few practitioners think about user thread synchronization, and those who do find the problem difficult to quantify when it occurs. This talk describes the request pattern produced by real users and provides a measurement methodology for evaluating request pattern quality. The approach taken is illustrated with a free downloadable tool I developed: the web-generator-toolkit. It can be found at Dr. Neil Gunther’s GitHub location: github.com/DrQz/web-generator-toolkit.

We plan a set of approximately 15 Microsoft PowerPoint slides as the medium for presenting the material and, although this topic involves some queueing theory concepts, no complex or subtle equations are planned. We use intuitively appealing illustrations supplemented with a few simple algebraic expressions. Presentation flow:

1. Real User vs Virtual User Request Timing – The differences between Real Users vs Virtual Users (load generator user threads) as described above are illustrated pictorially with contrasting images of how each makes web requests to a server. 

2. Request Timing Illustration – A simple timeline with independent (real user) query start times marked is used to illustrate that the number of queries per fixed time interval is Poisson distributed and time intervals between queries are Negative-Exponentially distributed. Histograms, not equations, are used to show these “Poisson Process” attributes.

3. Request Timing Simulation – Given these basic ideas, the results of a simulation consisting of two hundred independent start times drawn across a one-thousand-second time interval are shown. Again, histograms summarize the results and highlight the fact that for the Poisson-distributed start times per fixed interval the mean count equals the variance, and for the Negative-Exponentially distributed time intervals between start times the mean interval equals the standard-deviation. 
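The simulation described above can be sketched in a few lines of Python. This is a minimal illustration, not the toolkit's code; the 10-second bin width and the random seed are my choices for the example.

```python
import random
import statistics

random.seed(42)

# Two hundred independent start times drawn uniformly across a
# one-thousand-second interval (a Poisson process conditioned on N).
N, T = 200, 1000.0
starts = sorted(random.uniform(0, T) for _ in range(N))

# Counts per fixed 10-second interval: the mean should be close to the variance.
bins = [0] * 100
for t in starts:
    bins[min(int(t // 10), 99)] += 1
print(statistics.mean(bins), statistics.pvariance(bins))

# Time intervals between successive starts: the mean should be close
# to the standard-deviation (a Negative-Exponential signature).
gaps = [b - a for a, b in zip(starts, starts[1:])]
print(statistics.mean(gaps), statistics.stdev(gaps))
```

Running this shows the two histogram facts numerically: counts per interval with mean roughly equal to variance, and gaps with mean roughly equal to standard-deviation.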

4. Request Timing Evaluation – These statistical properties of real user requests are exploited to evaluate request timing pattern quality. The Negative-Exponential’s properties are chosen because its time-interval-between-events statistic is more straightforward to compute than the Poisson’s count per fixed (arbitrarily long) interval. Given that the Coefficient of Variation (CoV) is defined as the ratio of the standard-deviation to the mean, the following guiding principle is stated for deciding whether the load generator is producing “Real Web User Request Timing.” To determine if the load generator is creating web requests independently of each other like real users, sort the launch times in ascending order, calculate their differences, and compute the CoV of those differences. If the CoV is approximately equal to one, CoV ~ 1.0, real user request timing is being produced. If the CoV > 1.0, requests are being launched in bunches, and if the CoV < 1.0, they are too evenly spaced. Note: the web-generator-toolkit performs this calculation and generates a time interval between requests report containing the CoV statistic.
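The guiding principle above (sort, difference, compute CoV) is easy to sketch. This is my own illustrative function, not the toolkit's implementation; the sample launch-time data is synthetic.

```python
import random
import statistics

def interarrival_cov(launch_times):
    """CoV of time intervals between successive requests.
    ~1.0 -> real-user-like timing; >1.0 -> bunched; <1.0 -> too evenly spaced."""
    ts = sorted(launch_times)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return statistics.stdev(gaps) / statistics.mean(gaps)

# Perfectly evenly spaced launches: every gap is identical, so CoV is 0.0.
print(interarrival_cov([i * 0.5 for i in range(200)]))  # -> 0.0

# Exponentially distributed gaps (a Poisson stream): CoV comes out near 1.0.
random.seed(1)
t, ts = 0.0, []
for _ in range(5000):
    t += random.expovariate(2.0)
    ts.append(t)
print(round(interarrival_cov(ts), 2))
```

The same calculation applied to a load generator's recorded launch timestamps tells you at a glance which of the three regimes the test is in.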

5. Load Generator Launch Time Distortions – Armed with this guiding principle and the tools for its implementation, we take a look at some of the ways launch time distortions can occur focusing on three of the most common situations where time interval between requests CoV > 1.0. The three listed in the introduction expanded upon here are heavy workload, think time method used, and closed loop feedback.

6. Heavy Workload – We use a diagram to show that under heavy load there can be a significant amount of launch time bunching caused by the large number of user threads sharing processor execution resources. That diagram also illustrates that when a user thread’s think time expires it does not execute immediately, but waits in a queue at its assigned priority before launching its query.

7. Think Time Method Used – The think time method used can also be a source of launch time distortion. Fixed think times usually cause this problem because, under those conditions, user threads are more likely to queue up and cluster behind the long response time events. Drawing think times from a probability distribution reduces the severity of this lining up behind the long-latency tasks in the test suite. In fact, the desired independent request times can be obtained using any probability distribution if the number of user threads is sufficiently large. This “Principle of Superposition” is symbolized with a diagram showing multiple Uniform distributions from which think times are drawn for individual user threads, together producing a Negative-Exponential distribution of time intervals between requests for the thread pool. Uniformly distributed think times are used in the example load test as well.

8. Closed Loop Feedback – A lower percentage of think time in the closed loop’s round-trip time (think time + response time) lessens the think time probability distribution’s ability to maintain the desired asynchronous user thread pool with time interval between requests CoV ~ 1.0. This is because response times are primarily composed of time in the SUT, which can vary widely across the set of web events being tested. Applying data from the example load test discussed in items 9 through 12 below, a pie chart is created to show how large the response time percentage of round-trip time needs to be before a time interval between requests CoV > 1.0 is produced.

9. Example Load Test – The load test included with the web-generator-toolkit is used as the example for this talk. The toolkit documentation shows the practitioner how to set the load generator’s (JMeter is used) data collection options, what the test run output reports contain, and which report provides the time interval between requests CoV information. Seven test runs are performed at increasing traffic levels using two hundred user threads for all tests. Load is increased from test to test by reducing the mean think time of the Uniform distribution used. This testing setup is intended to answer the question: “How many user threads does it take for the time intervals between requests to become statistically independent of each other, CoV ~ 1.0, when each thread is drawing its think times from a Uniform distribution?” It also shows under what circumstances the CoV diverges, CoV > 1.0, as the user thread count increases.

10. Load Test Description and Results – Images are shown that contain a tabular list of web events being tested, a topological layout of the test environment, and a JMeter testing script snapshot. A summary of results table is also provided that includes think times used, request rate achieved, and response time observed for the seven tests performed. Network link latency is not a significant factor in any of the tests performed.

11. The Number of User Threads Required for CoV ~ 1.0 – Two tables and corresponding graphs are provided which show convergence of time interval between requests CoV ~ 1.0 as a function of thread count. The first table/graph combination illustrates convergence to CoV ~ 1.0 for the lowest traffic volume test with time interval between requests mean, standard-deviation, and CoV included in the table. The second table/graph pairing contains just the CoV values as a function of user thread count for all seven tests. This table and graph show a divergence away from CoV ~ 1.0 for the two highest traffic volume tests where response time is a large percent of round trip time.

12. Closed Loop Feedback Issues and Nuances – The divergent situation for the highest traffic volume test is analyzed in detail with a bar chart and table showing that this test has the largest response time percentage of round-trip time across all the tests, as well as a “response time CoV” (as opposed to a time interval between requests CoV) that is significantly greater than 1.0. This large response time CoV indicates wide variability in time spent in the SUT across the set of web events being tested, which adds to the divergent situation. The last column of this table shows that the number of user threads can be computed by multiplying the request rate by the mean round-trip time. This calculation can also be used to estimate the number of real active users supported if the user’s web page rate is substituted for the load generator’s GET/POST request rate.
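The thread-count calculation in the last column is Little's Law applied to the closed loop. A quick worked example, using illustrative numbers rather than the talk's actual test data:

```python
# Little's Law sketch: user threads (or active real users)
#   = request rate x mean round-trip time.
# The figures below are hypothetical, chosen only to make the arithmetic clear.
request_rate = 40.0        # requests per second achieved by the generator
mean_response_time = 0.8   # seconds spent in the SUT per request
mean_think_time = 4.2      # seconds between a response and the next request
round_trip_time = mean_response_time + mean_think_time   # 5.0 seconds

user_threads = request_rate * round_trip_time
print(user_threads)  # -> 200.0
```

Substituting a real user's page rate for the generator's GET/POST rate turns the same arithmetic into an estimate of concurrent live users supported.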

13. Summary – Some summary remarks and insights gained are listed including an observation that the divergent case, CoV > 1.0, raises an important question. How robust is the pure virtual user model so many load testing professionals rely on when request independence is such an important factor in the real world? Unfortunately, few practitioners are aware of the time interval between requests CoV and how significant it is for determining the quality of traffic produced by the load generator. Reporting time interval between requests CoV as a standard metric in load test results would add significant credibility to the effort and if the CoV ~ 1.0 the test results are much more likely to match live application performance.

About the Presenter
I have worked for 40 years in the telecommunications and computer industries for GTE, Tandem Computers, and Siemens, and am currently the Computer Capacity Planner for the State of Nevada. At GTE I applied my skills to both Data Center Capacity Planning and Digital Switching Traffic Capacity determination. While at Siemens I obtained EU and US patents for a traffic overload control mechanism used in an Advanced Intelligent Network platform and a VoIP switch. My State of Nevada responsibilities include Capacity Planning for the IBM Mainframe, a large virtualized Unix environment, and the State’s data network. In addition, I load test new and existing web applications and am the author of several Computer Measurement Group (CMG) technical papers regarding load testing techniques and system performance data analysis methods. I hold BS and MS degrees in Industrial and Systems Engineering from The Ohio State University.

Event Timeslots

Room 2 – 2/21