Tests and thoughts on asynchronous IO vs. multithreading

May 13th, 2013

Asynchronous I/O has been present in the .Net Framework since version 1.0, through the Begin/End methods, but it is only in recent years that it has become a popular topic. There were probably two main reasons for that. One of them was the rise in popularity of node.js, which has continuously advertised the event-driven, non-blocking I/O model on which it is built. The other was the addition of the async and await pair in the .Net Framework 4.5, which greatly simplified asynchronous programming.

Introduction

Input and output (I/O) operations are extremely slow compared to CPU processing and they include reading or writing data from hard drives, accessing network resources, calling web services or retrieving data from databases. When performed synchronously, I/O operations block the calling thread until the operation is finished. When performed asynchronously, they release the calling thread as soon as they are started, making it available for other operations (or allowing it to terminate), and then, when the I/O operation completes, they continue the execution by running a callback method (or by running the code after the awaited method in .Net 4.5).
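To make the difference concrete, here is a minimal sketch of the two models using the .Net 4.5 file APIs (the file path, buffer size and method name are illustrative, not part of the tests below):

private async Task ReadFileAsync()
{
	byte[] buffer = new byte[4096];

	// Synchronous: Read() blocks the calling thread until the data is available.
	using (var stream = new FileStream(@"C:\temp\data.bin", FileMode.Open))
	{
		stream.Read(buffer, 0, buffer.Length);
	}

	// Asynchronous: the calling thread is released at the await; the code after
	// it runs as a continuation once the read completes.
	using (var stream = new FileStream(@"C:\temp\data.bin", FileMode.Open,
		FileAccess.Read, FileShare.Read, 4096, useAsync: true))
	{
		await stream.ReadAsync(buffer, 0, buffer.Length);
	}
}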

For server applications, having threads that are just waiting for an I/O operation to complete is not very efficient. That is because threads consume resources whether they are in a waiting state or active. Each thread uses an amount of system memory, and each thread also takes a share of CPU time; waiting threads waste CPU time that would otherwise be used by threads that have actual CPU work to perform. Each thread has a context associated with it. Switching the execution from one thread to another also involves switching the context, and this adds an overhead to the overall application execution. The more threads are used, the bigger the context switching overhead is.

Asynchronous I/O doesn’t use threads while the I/O operations are performed. This reduces the number of concurrent threads used by an application by removing the ones that are in a waiting state, which theoretically increases the application’s scalability. In order to make sure that is actually the case, I created a series of tests which are presented below. Those not interested in the details of each individual test can jump directly to the conclusions section at the end of the article.

Tests

Tests were performed on my 64 bit Windows 7 PC, with a dual core i5 M560 processor, 4 GB of RAM and a 7200 rpm hard disk drive. While there are multiple types of I/O operations, during the tests I focused only on HTTP requests. Most of the conclusions apply to other kinds of operations as well. The tests are implemented using C# and the .Net Framework 4.5.

Huge number of very short calls using multithreading

The first test consisted of measuring the amount of time taken to perform one million HTTP requests using a multithreaded approach. All requests were made to a static text file hosted on the local IIS server. The file was very small, containing only the text “Hello world!”, and each individual request took roughly one millisecond to perform.

The multithreading behavior was implemented using Task.Run(), which is built on the .Net thread pool. The HTTP calls were made using the HttpWebRequest class. The most relevant part of the code is listed below:

public void TestParallel2()
{
	this.TestInit();
	for (int i = 0; i < NUMBER_OF_REQUESTS; i++)
	{
		// Queue each request on the .Net thread pool.
		Task.Run(() =>
		{
			try
			{
				this.PerformWebRequestGet();
				Interlocked.Increment(ref _successfulCalls);
			}
			catch (Exception)
			{
				Interlocked.Increment(ref _failedCalls);
			}

			lock (_syncLock)
			{
				_itemsLeft--;
				if (_itemsLeft == 0)
				{
					// The last completed request stops the timer and reports.
					_utcEndTime = DateTime.UtcNow;
					this.DisplayTestResults();
				}
			}
		});
	}
}

private void PerformWebRequestGet()
{
	HttpWebRequest request = null;
	HttpWebResponse response = null;

	try
	{
		request = (HttpWebRequest)WebRequest.Create(URL);
		request.Method = "GET";
		// Keep-alive allows the underlying TCP connections to be reused.
		request.KeepAlive = true;
		response = (HttpWebResponse)request.GetResponse();
	}
	finally
	{
		if (response != null) response.Close();
	}
}

On average, the above test took around 57 seconds to execute, used around 40% of the CPU and 50 MB of memory. Memory and CPU were measured using the Windows Task Manager. Execution time was measured by the test application. All the HTTP requests performed during the test were successful.

It must be said that because each request was so fast, the .Net thread pool didn’t have the time to create many threads for running the test. The Windows Task Manager showed a total of around 22 threads being used by the test application.

Huge number of very short calls using asynchronous I/O

For the second test I tried to perform the same one million HTTP requests on the same static text file hosted on the local IIS, but this time the requests were made in an asynchronous I/O fashion. The most relevant part of the code is listed below:

public async void TestAsync()
{
	this.TestInit();
	HttpClient httpClient = new HttpClient();

	for (int i = 0; i < NUMBER_OF_REQUESTS; i++)
	{
		// Fire-and-forget: all requests are started without waiting
		// for any of them to complete.
		ProcessUrlAsync(httpClient);
	}
}

private async void ProcessUrlAsync(HttpClient httpClient)
{
	HttpResponseMessage httpResponse = null;

	try
	{
		// The calling thread is released here until the response arrives.
		httpResponse = await httpClient.GetAsync(URL);

		Interlocked.Increment(ref _successfulCalls);
	}
	catch (Exception)
	{
		Interlocked.Increment(ref _failedCalls);
	}
	finally
	{
		if (httpResponse != null) httpResponse.Dispose();
	}

	lock (_syncLock)
	{
		_itemsLeft--;
		if (_itemsLeft == 0)
		{
			_utcEndTime = DateTime.UtcNow;
			this.DisplayTestResults();
		}
	}
}

The first time I ran the test things didn’t go well. After more than 5 minutes of waiting it hadn’t produced any result. Since I wanted to get a reading, I stopped the test and reduced the total number of requests from 1 million to 200 thousand. That made the test finish in about 2 minutes, but with more than 3000 of the requests failing. After some digging I found out that the failing requests didn’t even get to the IIS server; they failed inside the test application with the following exception:

An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.

As it turned out, the application depleted the system of ephemeral ports (even though on Windows 7 there is a generous range for them, from 49152 to 65535). Unlike the multithreaded test, which was automatically throttled by the .Net thread pool, the asynchronous implementation above didn’t have anything to limit the number of concurrent HTTP requests, and when there were more concurrent requests than available ephemeral ports, the application started to throw exceptions.

The only way to limit concurrency when performing asynchronous HTTP requests from a .Net application is by setting ServicePointManager.DefaultConnectionLimit (either in code or in the configuration file). This limits the number of simultaneous connections that can be created for a host.
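For reference, a minimal sketch of setting the limit in code (the app.config alternative is shown in the comment; the value 100 is the one used in the test below):

// Cap the number of simultaneous connections per host, before any request is made.
// The equivalent app.config setting is:
// <system.net><connectionManagement><add address="*" maxconnection="100"/></connectionManagement></system.net>
ServicePointManager.DefaultConnectionLimit = 100;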

So I set the DefaultConnectionLimit to 100 and repeated the asynchronous test for 1 million requests. This time the test completed in about 3 minutes with no failed requests. The average CPU usage during the test was about 50% and memory usage was around 300 MB. The results were a lot worse than those from the multithreaded test.

As you may have noticed, during the async test I made the HTTP requests using the HttpClient class. Introduced in .Net 4.5, it offers only asynchronous methods and a simplified interface compared to HttpWebRequest. Behind the scenes, HttpClient is built on HttpWebRequest, and chances are that it adds a bit of overhead to the execution time. So I decided to perform a second asynchronous test, this time based on HttpWebRequest.
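A minimal sketch of what such a call might look like (the method name is illustrative and the actual test code is in the sources linked at the end; GetResponseAsync is the .Net 4.5 task wrapper over the older BeginGetResponse/EndGetResponse pair):

private async Task PerformWebRequestGetAsync()
{
	HttpWebRequest request = (HttpWebRequest)WebRequest.Create(URL);
	request.Method = "GET";
	request.KeepAlive = true;

	// The calling thread is released here until the response arrives.
	using (WebResponse response = await request.GetResponseAsync())
	{
	}
}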

Results were mixed. It took around 1 minute and 45 seconds to execute the 1 million requests, worse than the multithreaded approach, but better than the async one based on HttpClient. The unpleasant surprise was that during the test the memory usage grew continuously until it peaked at 1.6 GB.

All in all, for very fast I/O operations multithreading proved superior to asynchronous I/O.

Big number of long calls using multithreading

For a second series of tests I replaced the very short HTTP calls with longer ones, implemented using ASP .Net MVC Web API and programmed to take 0.5 seconds to execute. That is about 500 times longer than in the previous tests.

Since each request was much longer, I decreased the total number of requests from 1 million to 5000 and started the first test with the same client as in the previous multithreaded tests. That didn’t go smoothly and revealed a couple of problems.

The first one was related to IIS. Since my machine was running Windows 7 Professional, it had a limit of 10 concurrent IIS requests. That was not a problem with the very fast requests from the previous tests, but with the half-second ones IIS was only able to deliver around 20 requests per second. That made the test irrelevant, because the server, rather than the client, was the limiting factor.

Luckily, IIS Express, the web server bundled with Visual Studio 2012, doesn’t seem to have any limit on the number of concurrent requests, so simply switching from the Windows 7 IIS to IIS Express solved the server side problem.

The second problem was on the client side. The multithreading behavior was implemented using Task.Run(), which is based on the .Net thread pool. When an application starts, the thread pool only has a limited number of threads available. If it receives more tasks than it has available threads, it starts queuing the excess ones. Based on various internal algorithms, it can also increase the number of available threads, but that is a slow process, and it meant that for most of my test the degree of thread parallelism was quite low. Again, that wasn’t much of a problem when performing the very short HTTP requests from the first tests, but for the long lasting ones I needed a much bigger degree of concurrency.

So I decided to dump Task.Run() for a custom parallel execution engine with limited concurrency. The engine created its threads directly with the .Net Thread class rather than relying on the thread pool; a sketch of the idea is shown below.
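The actual engine is part of the sources linked at the end of the article; a minimal sketch of the idea, with illustrative names, might look like this:

private void RunWithLimitedConcurrency(Action workItem, int count, int maxConcurrency)
{
	var queue = new ConcurrentQueue<Action>();
	for (int i = 0; i < count; i++)
	{
		queue.Enqueue(workItem);
	}

	var threads = new Thread[maxConcurrency];
	for (int t = 0; t < maxConcurrency; t++)
	{
		threads[t] = new Thread(() =>
		{
			// Each thread keeps pulling work until the queue is drained, so the
			// threads stay alive for the entire run instead of being recreated.
			Action action;
			while (queue.TryDequeue(out action))
			{
				action();
			}
		});
		threads[t].Start();
	}

	foreach (Thread thread in threads)
	{
		thread.Join();
	}
}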

With the above modifications done, I restarted testing in two configurations. For the first one I limited the execution engine to a maximum of 300 concurrent threads. The test completed the 5000 requests in about 10 seconds, using around 4% of the CPU and 30 MB of memory. The application was able to create the 300 threads very quickly and kept them alive until all requests were executed.

For the second test I increased the limit on the execution engine to 500 threads. The problem was that, unlike in the previous test, the engine could not feed work fast enough to keep all threads alive from start to finish. Instead, it continuously terminated and created new threads. That had a profound effect on the results. The test took around 14 seconds to execute, four seconds more than previously, and the CPU usage jumped from 4% to 60%. This revealed a very important aspect. Threads, at least in the modern .Net and Windows implementations, are quite efficient while running. Their memory usage is quite low and context switching seems very effective. The major cost comes from creating new threads.

Big number of long calls using asynchronous I/O

This series of tests was executed against the same long running Web API method, but this time using the asynchronous I/O client previously implemented for the “Huge number of very short calls using asynchronous I/O” test.

The .Net Framework does not provide a built in mechanism for limiting the concurrency of asynchronous I/O operations, but for those involving HTTP requests this can be achieved indirectly through ServicePointManager.DefaultConnectionLimit, which is what I used in order to create similar conditions between the multithreading and the asynchronous I/O tests.

With the DefaultConnectionLimit set to 300, the tester was able to perform the 5000 requests in about 10 seconds, using an average of 3% CPU and 30 MB of memory during its execution. The numbers were almost identical to those obtained using the multithreading approach. The only difference was that, according to the Windows Task Manager, the asynchronous I/O tester used around 22 threads during its execution while the multithreaded one used around 310. So 15 times fewer threads managed to deliver the same overall execution time with the same CPU and memory usage.

However, an important aspect to consider about the above tests is that they involve almost exclusively I/O bound operations. There is very little CPU work inside them. In the multithreaded version, the 300 threads spent most of their execution time waiting for I/O operations to complete. In most real life applications, there are both I/O and CPU bound operations and having so many threads in a waiting state means that there is less execution time available for the CPU bound ones, which are those that actually need it.

For a second asynchronous I/O test on the long running HTTP requests, I increased the DefaultConnectionLimit from 300 to 500. This allowed the tester to run the 5000 requests in about 7 seconds, using around 3% CPU and around 30 MB of memory. The asynchronous tester was able to scale quite well with the increased concurrency provided by the higher DefaultConnectionLimit.

Async I/O vs. multithreading on server side

I previously mentioned that the long running HTTP requests were implemented server side using ASP .Net MVC Web API. As was the case on the client, the server could be implemented both in an asynchronous I/O manner and in a multithreaded one.

The asynchronous one, which was the one actually used during the tests, was implemented like below:

public async Task<string> Get()
{
	// The request thread is released here; a continuation runs on a
	// thread pool thread once the delay expires.
	await Task.Delay(500);
	return "Hello world";
}

During each call, the request thread was released immediately after the call to Task.Delay(). Then, as soon as the delay expired, the continuation of the method was executed on the first available thread from the thread pool. With this implementation on the server, an asynchronous tester with 500 concurrent connections was able to perform the 5000 requests in around 7 seconds.

A multithreaded approach for the server implementation can be achieved by blocking the request thread for a desired amount of time:

public string Get()
{
	// The request thread is blocked for the entire duration of the call.
	System.Threading.Thread.Sleep(500);
	return "Hello world";
}

Running the asynchronous tester with 500 concurrent requests against the above server implementation produced highly variable results (depending on the current state of the server, more exactly on the number of threads available in its thread pool). All of the runs took a lot longer than with the asynchronous server implementation, with an average execution time of more than 1 minute.

The problem with the Thread.Sleep() implementation was that it kept a thread busy during the entire execution of each request. This caused the IIS Express server to quickly deplete all available threads from its pool. After that, all incoming requests were queued and executed only when threads became available. IIS Express continuously increased the number of threads in its pool in an attempt to accommodate the high number of requests, but the increase happened in a slow, gradual fashion, which made it ineffective in the given context.

Conclusions

Tasks involving asynchronous I/O operations have an overhead produced by the fact that the thread executing the task is released when the I/O operation is started, and the continuation is then queued for execution as a callback method when the I/O operation completes. Depending on the host, the callback might get executed on the same thread that started the task or on a thread pool thread (this depends on synchronization contexts, more info available here). This overhead is a problem for very short I/O operations, which are more efficient when completed synchronously on the thread that started their execution.
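As an illustration of the synchronization context aspect (not something used in the tests above), ConfigureAwait(false) tells the awaiter not to marshal the continuation back to the captured context; the method name below is illustrative:

private async Task GetWithoutContextCaptureAsync(HttpClient httpClient)
{
	// Without ConfigureAwait(false), the continuation is posted back to the
	// captured synchronization context (e.g. the UI or ASP .Net request context);
	// with it, the continuation runs on a thread pool thread instead.
	using (HttpResponseMessage response = await httpClient.GetAsync(URL).ConfigureAwait(false))
	{
	}
}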

When possible, it is better to execute a single callback for multiple asynchronous operations rather than an individual callback for each of them. This reduces the above mentioned overhead, and it is very easy to implement in .Net 4.5 using await and Task.WhenAll (a detailed sample is available here).
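A minimal sketch of the idea (the method name and the httpClient and urls parameters are illustrative, not taken from the linked sample):

private async Task ProcessUrlsAsync(HttpClient httpClient, string[] urls)
{
	// Start all requests up front, without awaiting them individually.
	Task<HttpResponseMessage>[] tasks = new Task<HttpResponseMessage>[urls.Length];
	for (int i = 0; i < urls.Length; i++)
	{
		tasks[i] = httpClient.GetAsync(urls[i]);
	}

	// A single continuation runs once, after all the requests have completed,
	// instead of one callback per request.
	HttpResponseMessage[] responses = await Task.WhenAll(tasks);

	foreach (HttpResponseMessage response in responses)
	{
		response.Dispose();
	}
}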

One of the biggest advantages delivered by asynchronous I/O is that it reduces the number of concurrent threads used by an application. As the above tests showed, threads are very efficient in terms of memory and context switching. In I/O bound applications they easily deliver the same performance as asynchronous I/O. However, in applications that run both I/O and CPU bound operations, having a lot of threads waiting for I/O operations to complete means that there is less CPU time for the threads running CPU bound operations, which are the only ones that actually need it.

It is also very important to note that during the tests, multithreading solutions only proved efficient as long as the threads were kept alive during the entire execution of the application. When threads were continuously created and destroyed, they became very inefficient in terms of CPU usage and the overall performance of the application was visibly affected.

One area where asynchronous I/O absolutely shines is performing long or potentially long operations in environments where the total number of available threads is limited or slow to increase. This is the case for web applications. Information related to thread limits in ASP .Net is available here and here. Running long I/O operations synchronously from inside ASP .Net applications can exhaust the thread pool and make an application freeze or respond slowly.

.Net does not provide a built in mechanism for limiting the parallelism of asynchronous I/O operations. As the first tests showed, it is possible to start millions of operations concurrently, but that has a negative impact on the application’s performance and reliability. Depending on the server configuration and on the kind of I/O performed, it is usually safe to perform hundreds or even thousands of operations in parallel, but above that a mechanism should be put in place to limit the asynchronous I/O concurrency. As far as HTTP requests are concerned, a limit should always be set on ServicePointManager.DefaultConnectionLimit. The limit should be large enough to allow a good level of parallelism, but low enough to prevent performance and reliability problems (such as the exhaustion of ephemeral ports).
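One general purpose throttling mechanism, not used in the tests above, is a SemaphoreSlim that caps the number of in-flight operations; a minimal sketch (the limit of 100 and the method name are illustrative):

// Illustrative throttle: at most 100 asynchronous operations in flight at a time.
private static readonly SemaphoreSlim _throttle = new SemaphoreSlim(100);

private async Task ProcessUrlThrottledAsync(HttpClient httpClient)
{
	// Waits without blocking a thread until a slot becomes available.
	await _throttle.WaitAsync();
	try
	{
		using (HttpResponseMessage httpResponse = await httpClient.GetAsync(URL))
		{
			Interlocked.Increment(ref _successfulCalls);
		}
	}
	catch (Exception)
	{
		Interlocked.Increment(ref _failedCalls);
	}
	finally
	{
		_throttle.Release();
	}
}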

Source code

Complete source code for the tests used in this article can be found on GitHub.
