Performance vs. scalability

December 13th, 2011

With the growing interest in cloud computing over the last few years, scalability has become one of the buzzwords of the day. However, along with its popularity, some of the confusion around it has also spread, probably the most common misconception being that performance and scalability are the same thing. Technically speaking, performance is the speed with which a system completes a single request, while scalability refers to the number of requests a system can serve in a unit of time. At first glance it might seem that the two are just different ways of expressing the same thing.

The reason for this mistake is that we often think about software problems in terms of a single core or processor, and from that perspective it is easy to assume that the more performant a system is, the more scalable it will be. For example, let's consider a server that can perform a given operation in 20 milliseconds. Using basic arithmetic, we can divide one second by the time it takes a single operation to complete and conclude that the server can perform 50 operations per second. But that might not be entirely correct: if the server has multiple cores or processors, it can perform multiple operations at the same time and should therefore be able to handle more than 50 operations per second.
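The arithmetic above can be captured in a couple of lines. This is just an idealized sketch: it assumes operations parallelize perfectly across cores, which real workloads rarely do.

```python
def max_throughput(latency_ms: float, cores: int = 1) -> float:
    """Upper bound on operations per second, assuming operations
    run fully in parallel across all cores (an idealized assumption)."""
    ops_per_core = 1000.0 / latency_ms  # one second divided by per-op latency
    return ops_per_core * cores

print(max_throughput(20))     # 50.0 ops/s on a single core
print(max_throughput(20, 4))  # 200.0 ops/s on four cores
```

The single-core figure matches the naive calculation, while the multi-core figure shows why dividing one second by the latency understates what the server can actually serve.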

Things become even more obvious with applications that are scaled horizontally. Such applications are deployed on multiple machines, usually with a load balancer in front of them to make sure that requests are evenly distributed between all the machines in the system. Each individual request is handled by just one machine, so the performance of the system is the performance of a single machine, but the scalability limit is the scalability limit of one machine multiplied by the total number of machines.
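To make the distinction concrete, here is the same point in code, with hypothetical numbers: horizontal scaling multiplies the throughput limit but leaves per-request latency unchanged, because each request is still served by exactly one machine.

```python
LATENCY_MS = 20        # per-request latency on a single machine (hypothetical)
OPS_PER_MACHINE = 50   # requests/s one machine can serve (hypothetical)
MACHINES = 8           # machines behind the load balancer

# Performance does not change: a request still hits one machine.
system_latency_ms = LATENCY_MS

# Scalability multiplies with the number of machines.
system_throughput = OPS_PER_MACHINE * MACHINES

print(system_latency_ms)   # 20
print(system_throughput)   # 400
```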

Usually, measuring a system's performance is not very complicated, since in most cases it comes down to timing how long a given request takes to complete. Determining the scalability limit is more involved and also requires more computing power. It is done by issuing simultaneous requests to the system and checking the number of requests per time interval that it can serve. Since chances are that the machine issuing the requests is weaker than the system serving them, multiple request-making machines must be used. In most cases the number of machines needed for the test cannot be predicted upfront, and the best strategy is to progressively add request-making machines until the total number of requests served across all of them stops increasing. The graph below illustrates this strategy.
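The add-machines-until-throughput-plateaus strategy can be sketched as a simple loop. The `measure` callback and the saturation numbers below are hypothetical stand-ins for a real load test.

```python
def find_scalability_limit(measure, max_generators=20, tolerance=0.01):
    """Progressively add load-generating machines until total served
    throughput stops increasing (within a small tolerance).

    measure(n) should return the total requests/s served when n
    request-making machines run against the system simultaneously.
    """
    best = 0.0
    for n in range(1, max_generators + 1):
        total = measure(n)
        if total <= best * (1 + tolerance):
            return best  # adding another generator no longer helps
        best = total
    return best

# Hypothetical system that saturates at 500 req/s, probed by
# generators that can each push 120 req/s.
measured = find_scalability_limit(lambda n: min(n * 120.0, 500.0))
print(measured)  # 500.0
```

In a real test, `measure` would coordinate actual load-generating machines and aggregate their counters; the loop structure stays the same.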

Determining scalability limit

For those interested in the Windows Azure platform, some real-life scalability limit tests on different aspects of the platform can be found on the Azure Scope page (read update). Those who want to run the benchmarks themselves can do so by downloading the benchmark application from here. Please note that the published results, at the moment I am writing this post, are probably a year old. In the meantime it seems that Microsoft has optimized the Azure platform, since I was able to get better results by running some of the benchmarks myself.

Update: Unfortunately, the Azure Scope site is no longer online. Probably the main reason was the continuous evolution of the Azure infrastructure and services, which meant that in order to keep the benchmark results up to date, the tests would have had to be run periodically and their results updated on the site. Considering the complexity and the number of resources involved in some of the benchmarks, the easier decision was taken, which was to take the site down. The application for running your own tests is still available, though, at the address mentioned above.
