Benchmarking throughput and scalability on Sql Azure Federations

January 25th, 2012

I recently developed a benchmark to measure the number of queries per second that Sql Azure, with its recently launched Federations feature, can provide. In this article I present the methodology I used for benchmarking, the benchmark results, as well as a few personal considerations.

Purpose of the benchmark and disclaimer

There were two main reasons for creating the benchmark whose results are presented in this article. The first was that I was curious to see federations in action and to get firsthand experience of how well they can scale the load on a database.

My second reason was to gather data that would help approximate the amount of resources needed to support different load levels. As a solution provider you are often asked questions like “what would it take to make our application support X number of users per day?”, and while the Azure platform is all about elasticity and about being able to add or remove resources as needed, it is still useful to be able to provide a rough estimate of the infrastructure an application would need for a requested load.

That being said, please bear in mind that the results published in this article are not intended to be used as a reference but rather as informative figures. Use them only to get an overview. For any critical application, make sure you properly research and test before proposing and implementing a solution.

Benchmark and test methodology

The benchmark measures the number of queries per second (qps) that a database can provide, i.e. the total number of queries issued divided by the elapsed time. Each query is a single select operation retrieving just one row from the database. This is in slight contrast with some other benchmarks, which measure the number of rows a database can read per second and therefore retrieve multiple rows in each select operation.

Below is a representation of the infrastructure I have been using:

Benchmark infrastructure

The test database contained only one table, with the structure below:

Column name    Column type
Id             int, primary key
FirstName      nvarchar(100)
LastName       nvarchar(100)
Age            int
Comments       nvarchar(max)

The table contained only one index, the default one created by Sql Azure on the primary key. The table was populated with 500k (500,000) rows. The Comments field was filled with a 1,024-character text, so with nvarchar storing two bytes per character, the rough size of each row is just above 2KB.
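For reference, here is a sketch of the DDL for such a table, issued from the client the same way the other commands in this article are. The article does not show the exact DDL that was used, so treat the details as assumptions:

// Hedged sketch of the test table; the exact DDL is not shown in the article.
cmd.CommandText = @"
    CREATE TABLE Person (
        Id        int NOT NULL PRIMARY KEY, -- default clustered index created here
        FirstName nvarchar(100),
        LastName  nvarchar(100),
        Age       int,
        Comments  nvarchar(max))";          // Comments holds the 1024-character filler text
cmd.ExecuteNonQuery();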

During the tests the database was split into a variable number of federations, detailed below alongside the benchmark results. Sql Azure Web edition databases were used throughout, all of them limited to 1GB, the smallest size you can purchase, which at the time of writing costs only $9.99/month, as described here.
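For readers new to the feature, a federation is created from the root database and federated tables declare their distribution key. Below is a minimal sketch; only the federation name (PersonFederation) and the key name (FederationId) are taken from the USE FEDERATION command shown later in this article, the rest is an assumption:

// Hedged sketch of the federation setup, executed against the root database.
cmd.CommandText = "CREATE FEDERATION PersonFederation (FederationId BIGINT RANGE)";
cmd.ExecuteNonQuery();
// Tables created inside a federation member then carry a
// FEDERATED ON (FederationId = Id) clause so that rows follow the key.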

The benchmark execution instances were all part of a single Windows Azure Web Role. Only small instances were used for the tests presented in this article, but their number was increased or decreased across the various benchmark runs.

Each execution instance had the job of executing 100k (100,000) select queries and returning the time it took to complete the operation. Each select retrieved a single row, looked up by its ID, using the query below:

select id, firstName, lastName, age, comments from Person where id = @id

In order to avoid any sort of result caching on the Sql Azure side, each of the 100k queries selected a different row. Each execution instance was assigned a range of IDs from which to select (e.g. 1…100000, 100001…200000, etc.), so that at the database level the queries were well distributed over the entire table.

Each execution instance used 25 threads to perform the 100k queries, with the queries distributed equally among the threads.
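Put together, here is a minimal sketch of how one execution instance could drive its workload; ExecuteSelect (the per-query work, shown in the next snippet) and rangeIndex are illustrative names, not the author's code:

// Hedged sketch of one execution instance's workload. ExecuteSelect and
// rangeIndex (which 100k-wide slice of the 500k rows this instance owns)
// are illustrative assumptions.
const int TotalQueries = 100000;
const int ThreadCount  = 25;
const int PerThread    = TotalQueries / ThreadCount; // 4000 queries per thread
int firstId = rangeIndex * 100000 + 1;               // start of this instance's ID range

var threads = new List<Thread>();
var stopwatch = Stopwatch.StartNew();
for (int t = 0; t < ThreadCount; t++)
{
    int threadFirstId = firstId + t * PerThread;
    var thread = new Thread(() =>
    {
        for (int i = 0; i < PerThread; i++)
            ExecuteSelect(threadFirstId + i);        // every query selects a different row
    });
    thread.Start();
    threads.Add(thread);
}
threads.ForEach(th => th.Join());
stopwatch.Stop(); // the elapsed time is what gets reported back to the master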

All queries were issued and their data retrieved by means of the DataReader class. Each query retrieved the entire data for its row (approximately 2KB), but that data was not deserialized, since that operation isn't relevant when measuring a database's throughput. Below is the code that was used:

cmd = conn.CreateCommand();
cmd.CommandText = @"select id, firstName, lastName, age, comments from Person where id = @id";
cmd.Parameters.Add(new SqlParameter() { ParameterName = "@id", DbType = DbType.Int32, Value = id });

using (var reader = cmd.ExecuteReader())
{
    // Read the single expected row; the non-id columns arrive over the wire
    // but are never materialized, since deserialization is not what is measured.
    reader.Read();

    result = reader["id"] as int?;
    return result;
}

The benchmark master had the role of running the benchmark on multiple execution instances at the same time and computing the global result. The communication between the benchmark master and the benchmark instances was done through SOAP web services. On each benchmark run there was a single call between the master and each of the involved machines, keeping the master very loosely coupled from the instances.

Considering this you might wonder why I needed the master to be a machine inside Azure instead of using my PC directly. The reason is that from a PC outside Azure you cannot call each of the instances individually. You can only access the Web Role as a whole and let the load balancer choose which machine serves the request. Windows Azure uses a round-robin balancer, which in theory should ensure that an equal number of requests reaches each machine, but in practice I found that the same machine can end up serving two or more consecutive requests.

Since inside Azure each machine has a local IP, the solution was to create the benchmark execution master inside Azure and let it call each of the machines directly by using the local IP.
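The direct calls rely on the Azure service runtime exposing each instance's internal endpoint. Here is a sketch of how the master can enumerate them; the role name and endpoint name are illustrative assumptions:

// Hedged sketch: enumerate the internal endpoints of the role's instances.
// "BenchmarkRole" and "BenchmarkEndpoint" are illustrative names.
using Microsoft.WindowsAzure.ServiceRuntime;
using System.Net;

foreach (RoleInstance instance in RoleEnvironment.Roles["BenchmarkRole"].Instances)
{
    if (instance.Id == RoleEnvironment.CurrentRoleInstance.Id)
        continue; // skip the master itself

    IPEndPoint endpoint = instance.InstanceEndpoints["BenchmarkEndpoint"].IPEndpoint;
    // Address the instance's SOAP service directly at endpoint.Address:endpoint.Port
}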

The benchmark execution master and all the execution instances were part of the same Azure Web Role. The Web Role and the database were hosted in the same Azure datacenter, the one for West Europe.

Benchmark results

This section presents the benchmark results. Please note that each displayed result is the average of at least 3 benchmark runs.

Not federated database

This first test was made on a standard Sql Azure database, without any federation. The database was made of a single table with the structure and number of rows as described above.

(Chart: benchmark results)

Federated database, 2 federations

This test was made on a database with an identical structure as above, but with the test table split between two federations. The split was made at row 200001. The test involving a single machine was hitting a single federation, so it is not very relevant, since it behaved much like the test on the not federated database. All the other tests had the queries equally distributed between the two federations, which in real life would be an ideal situation.
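The split itself is performed with an ALTER FEDERATION command issued against the root database. A minimal sketch, using the federation and key names from the USE FEDERATION command in the performance note below (the exact command the author used is not shown):

// Hedged sketch: split the PersonFederation member at row 200001.
cmd.CommandText = "ALTER FEDERATION PersonFederation SPLIT AT (FederationId = 200001)";
cmd.ExecuteNonQuery();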

(Chart: benchmark results)

Important performance note: my initial approach to making the database calls was to create a new SqlConnection object for each call. However, on a federated database, each time a connection is created the federation selection also needs to be made, as in the code below:

federationCommand.CommandText = String.Format("USE FEDERATION PersonFederation (FederationId = {0}) WITH RESET, FILTERING = OFF", ID);
federationCommand.ExecuteNonQuery();

The problem is that the above command is quite costly: in my first test I was only able to obtain 2338 queries per second using 8 executing instances. That is less than half of what a single testing instance obtained against the not federated database! After that disappointing result, I decided to use a single SqlConnection object per thread in each executing instance, and the results got a lot better. All the results published in this article use this approach.

Cihan Biyikoglu, a Program Manager on SQL Azure, has recently blogged about the way the USE FEDERATION command works. As it turns out, it is handled by a layer outside the database, which has multiple nodes and is load balanced. Even so, as my tests revealed, it is best to reuse connections as often as possible to avoid reselecting the federation.
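To make the fix concrete, here is a minimal sketch of the per-thread connection reuse; the helper name is illustrative, and the routing uses the same USE FEDERATION command as above:

// Hedged sketch: open one connection per worker thread, route it to the
// right federation member once, then let the thread reuse it for all of
// its queries. OpenFederationConnection is an illustrative name.
SqlConnection OpenFederationConnection(string connectionString, long memberKey)
{
    var conn = new SqlConnection(connectionString);
    conn.Open();

    using (var federationCommand = conn.CreateCommand())
    {
        federationCommand.CommandText = String.Format(
            "USE FEDERATION PersonFederation (FederationId = {0}) WITH RESET, FILTERING = OFF",
            memberKey);
        federationCommand.ExecuteNonQuery();
    }
    return conn;
}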

Federated database, 4 federations

This test was made on the same database as above, but split into 4 federations. The splits were made at rows 100001, 200001 and 300001. All the tests had the queries equally distributed between the four federations, which in real life would be an ideal situation.

(Chart: benchmark results)

Federated database, 4 federations, simple selects

This test was made under conditions identical to the above, with the single difference that the select query didn't include the Comments column (the one holding 2KB per row):

select id, firstName, lastName, age from FederatedPerson where id = @id

Because the Comments field represented most of each row's size, the returned data is now much smaller (somewhere around 60-70 bytes per row).

(Chart: benchmark results)

Since each federation is in fact a standalone Sql Azure instance, as long as the queries only hit individual federations, a federated database should scale linearly with the number of federations. Considering this and the results above, we can estimate that a Sql Azure database with 10 federations would easily deliver 100k (100,000) queries per second (for simple queries like the one above). If each federation stays below 1GB in size, you can have a database delivering that power for about $110/month: the 10 federation members plus the root database make 11 databases, and 11 × $9.99 ≈ $110.

Some thoughts on the benchmark results

As mentioned above, each benchmark was run at least 3 times and the published result is the average of the runs. Across individual runs I noticed differences in the results as high as 50%, though usually they stayed below 15%. The reason is that, as in any virtualized infrastructure, the performance of each machine is influenced by the overall load on the physical machine hosting it.

As far as Web or Worker Role machines are concerned, processor, memory and bandwidth are allocated in advance, so they should perform quite predictably. When it comes to Sql Azure instances, though, things are a bit more complicated, as those instances run on shared database servers. There is no predefined resource quota that each instance will get on a server, so from time to time an individual instance may be affected by the load of the other instances sharing the same physical machine. Over many runs, however, I expect the performance variations to stay within a reasonable band around the averages reported here.

5 comments to “Benchmarking throughput and scalability on Sql Azure Federations”

  1. Craig January 8, 2013

This is really fantastic work. It was something I was just wondering about yesterday, as my software runs on Azure with SqlAzure. Currently I only have half a dozen concurrent users, but I could scale up to hundreds soon, so I am interested in the queries per second.

  2. Eric April 12, 2013

The performance with 4 instances (‘clients’) doesn't change much from federating. With 1, 2 and 4 federations the numbers go from 9.8k to 10.9k and then to 10.8k respectively, which suggests you haven't been testing the performance of federations but your ability to submit queries to the federated database. The latency of individual statements executed sequentially determines your benchmark result, rather than the processing time on the server side. The question you should have been trying to answer is how many concurrent requests you could give the server to process before latency goes up significantly. I'd expect to see the numbers double, or at least multiply by some number between 1 and 2, for a fixed number of instances (1) when the number of federations doubles. Unless your clients were limited by network throughput and/or CPU, what use was there in having more than one client firing requests asynchronously and in parallel?

    So in fact these benchmark results are useless imho.

    • Florin Dumitrescu April 13, 2013

@Eric, the fact that the performance with 4 clients is roughly the same for 1, 2 and 4 federations means that the clients were not able to generate enough load to stress any of the database configurations to the maximum. In other words, the scalability limit was reached on the client side rather than on the server. 8 clients are enough to fully stress the configuration with just one federation, and because of that the difference in throughput between 1 and 2 federations is obvious.

I am not measuring client-side latency but rather the number of queries per second the different database setups can provide (procedure detailed above). The results clearly depend on the number of clients used. To find the maximum throughput of the server you usually add clients until the results flatten out. Because of the complexity of the benchmark procedure I didn't do that. Judging by the relative growth, I would put the maximum throughput at around 12k qps for 1 federation and around 26k qps for four federations.

I don't agree that the results are useless. They provide a very rough estimate of the throughput to expect from Azure SQL databases with 1, 2 and 4 federations. One thing to keep in mind, though, is that due to the shared nature of Azure SQL databases there are significant variations in the delivered performance (more details here). So you will never be able to get exact scalability limits.

If readers like you continue to express interest in finding more precise scalability limits, I might repeat the tests with a bigger number of client instances and take measurements until the results flatten out.

  3. Eric April 20, 2013

Try writing your code in .NET 4.5 with async/await and run a single client with an increasing number of statements per second, up to the point where you get timeouts. That's cheaper and easier to test than increasing the number of instances, and you *will* see performance go up when the number of federations increases, as you should. In the graphics above, the numbers are just random numbers between zero and max performance that depend on the ability of your code to generate load on the server. That the bars go up when there are more instances is a by-product of the fact that the load each instance generates is mostly independent.

    Yet with 16 instances you select 1.6M records, so there's huge caching involved there, worsened if you select at random on each instance, since then a percentage of records never gets retrieved, meaning others were requested more often, increasing caching efficiency even further.

    The fact that performance varies wildly is not (only) because SQL Azure is on shared servers, but (mostly) because the tests are very short, especially the ones that perform better. At 40k qps the test takes 2.5 seconds and you report queries per second based on just 2.5 seconds of samples. Yes, they contain 100k queries, but if that made them statistically relevant you'd see stable results. Wrong shape of my argument but still very true.

    Moreover, starting and ending 25 threads won't happen instantly, so you're bound to report lower-than-average performance just because the average number of threads executing commands during the test will be less than 25. You'd have to start measuring time and count from the moment 25 threads have warmed up until a moment early enough to be sure they aren't already cooling down again. So yes, those results are statistically irrelevant, but that doesn't make the article useless. Thanks for it, I learned something, Eric

    • Florin Dumitrescu April 20, 2013

@Eric, there is nothing “random”, “by chance” or “statistically irrelevant” about my tests and their results. There are some valid points in your comments and I appreciate that you shared them, but I don't agree with your radical conclusions.

It's true that getting data from the database using an async I/O approach is more efficient than using threads, and it would have allowed more load to be generated with fewer benchmark execution instances, but that doesn't mean a single such machine would have been able to generate the entire load a federated Azure SQL server can handle. I would have taken an async approach if .NET 4.5 had been available at the time I made the tests.

There is no chance that my results were affected by caching. Each of the 100k selects performed by each benchmarking instance retrieved a different row from the database. Also, different benchmarking instances retrieved data from different areas of the database. In the test involving 16 benchmarking instances, across 1.6 million calls no row was retrieved more than 4 times, and never on consecutive calls. The distribution of the selects over the database was very good.

None of the tests were very short. The one that obtained in excess of 40k qps involved a total of 1.6 million select operations, which means it took close to 40 seconds to complete (not 2.5). I agree that threads and instances didn't start and end at exactly the same time and that this had an impact on the end results, but since none of the tests were very short, that impact was quite low.

