Asynchronous I/O has been present in the .Net Framework from version 1.0, through the Begin/End methods, but it’s only in the recent years when it became a popular topic. There were probably two main reasons for that. One of them was the raise in popularity of node.js which has continuously advertised the event-driven, non-blocking IO … Continue reading
Choosing between Windows Azure SQL Database and Windows Azure VM hosted SQL Server
Recently, Windows Azure has entered in the Infrastructure as a Service (IaaS) territory by announcing Windows Azure Virtual Machines (VMs), currently in public preview. Along with VMs came the possibility to host SQL Server databases directly inside Azure cloud.
Previously, the only option for relational databases inside Windows Azure was represented by Windows Azure SQL Databases (former SQL Azure Databases), a Platform as a Service (PaaS) implementation based on SQL Server, which sacrifices some of the features in exchange for a much simplified management. Now, with the database that inspired it in public preview and probably soon to get into general availability, a new challenge arises: what to choose for storing relational data between SQL Server databases hosted inside Windows Azure VMs and Windows Azure SQL Databases?
PaaS vs IaaS
Windows Azure SQL Databases are implemented as PaaS, taking care by themselves on as much as possible of the management and configuration. SQL Databases automatically allocate physical resources, automatically handle size growth, automatically ensure high availability and data durability and automatically handle system updates.
For each SQL Database instance, behind the scenes, there are always two copies, one of them transactionally consistent with the live instance. This means that the possibility to lose any data due to hardware failure is near zero and also ensures a very high availability (when there is a problem with the live instance, the transactionally consistent copy automatically replaces it behind the scenes). In fact the SQL Database high availability feature is guaranteed in the SLA with 99.9% up time.
Things are different with SQL Server databases hosted on Windows Azure VMs. VMs are IaaS products, meaning that you are supplied with the environment into which the software is being run but as the software itself is concerned (OS, DB server) you are responsible for all of the management and configuration. Using SQL Server inside Azure is pretty much identical with using it on premises, except that it will be deployed on a virtual machine and hosted in the Windows Azure data centers.
Just like on premises, when using SQL Server you will not only be responsible in configuring the database server but also the OS that hosts it. On the OS side configurations will involve things like managing updates and configuring the firewall. On the database server they will involve things like ensuring disk space for the database, ensure high availability, backups etc.
While SQL Server involves more manual management compared Windows Azure SQL Databases, it also provides more flexibility and a richer feature set. A detailed list with the features missing from SQL Databases can be found here.
Some of the missing features are a result of the PaaS nature of Windows Azure SQL Databases. Others, like full-text search, just didn’t make it to the current version but will be added in the future releases.
Before deciding to go with Windows Azure SQL Databases you should first check that they offer all the features that you need. There are cases when features are not delivered out of the box, but can be replaced by alternative solutions. For example you can supply the lack of Jobs with a custom implementation using a Worker Role, or you can supply the lack of backup / restore feature by using the SQL Azure Migration Wizard tool. If alternatives are not existent for the needed features, or if they are too costly / complicated to implement, then you should probably go with SQL Server.
Compatibility with existing on premises databases
While based on SQL Server, Windows Azure SQL Databases only support a subset of its functionalities (a rather large one thought). Besides the missing feature set mentioned above, there is also a list of unsupported T-SQL statements, detailed here.
If you have an existent on premises SQL Server database and you want to move it into the cloud, porting to Windows Azure SQL Databases might be a tedious job, based on the amount of unsupported features that the local database might be using. On the other hand, moving to Windows Azure VM hosted SQL Server is going to be a minimal effort operation.
Performance and performance predictability
When it comes to performance, Windows Azure SQL Databases are not fantastic. Chances are that an SQL Server deployed to a modern laptop will outperform them. That being said, this is not surprising considering their pricing level and the fact that they are built with reliability rather than performance in mind. A very effective way to improve the throughput of SQL Databases is by scaling-out through a feature called Federations (I will come to it a bit later).
The rather low level of performance is not the only problem for Windows Azure SQL Databases. They are deployed in a shared environment, which means that each database instance will share the server on which it is hosted with multiple other instances. Windows Azure will always make sure that all instances are perfectly isolated at the data level and also manage the amount of physical resources that each database is receiving. Part of those resources are allocated dynamically, meaning that if there is little activity among the other instances on the server, your instance will receive a bigger computing power. However, if there is intense activity on multiple instances on the server, the amount of computing power for your instance will be lower.
Windows Azure will always try to ensure that each SQL Database is receiving a fair amount of resources at each time, but matter of fact is that the overall performance of the database will vary within a certain interval from time to time.
In order to protect the other instances on the server from an instance trying to monopolize a big amount of the available resources, Windows Azure uses throttling (more details here). This means that if you perform resource intensive operations on a database, chances are that they might get throttled. Due to throttling and other transient problems that might appear because of the shared and PaaS nature of SQL Databases, it is recommended to implement retry logic with any database operation as described here.
Things are much more predictable on performance side when it comes Windows Azure VM deployed SQL Server databases. The amount of resources available to the server is defined by the size of the virtual machine (available sizes here). The only type of resource that might be less predictable is represented by the disks of the virtual machine (overviewed here). The disks are built on top of Windows Azure Blobs, providing a very good amount of reliability (there will always be at least two copies of the data) but they are also a shared resource and chances are that there will be some variations in their delivered performance.
Windows Azure VM deployed SQL Server databases will have better performance than Windows Azure SQL Databases, but keep in mind that you will still not be able to reach the performance of powerful physical machines. So you will get a good amount of performance, but only up to a certain level. While I have played a bit with Windows Azure VMs and Windows Azure VM hosted SQL Server databases, I didn’t yet have the time to run any benchmarks on them so at the moment I don’t have any numbers related to their throughput. However, you can get an overview of the performance to expect from Windows Azure SQL Databases from a series of benchmarks on single instance and federated databases which I have run a few months ago. The results are published here (please note that at the time Windows Azure SQL Databases were named SQL Azure databases).
As a conclusion to this section, if you need a good amount of predictable performance from a single database instance, Windows Azure VM hosted SQL Server databases would be the best choice for you.
This is one of the sections where Windows Azure SQL Databases gain the upper hand through a feature called Federations, also known as sharding, which allows to easily create additional instances (scale-out) to accommodate an increased demand over the database’s resources.
At the core of Federations is the SPLIT operation (very well explained here), which allows dividing an existing database instance in two new instances, distributing the initial’s instance data among them. The greatest thing about SPLIT is that it is an online operation (the data is still available for almost all of the time while the operation is being performed).
Currently, in a Windows Azure SQL Database you can have up to 500 instances which should provide a very good level of scalability. On the down side, in the modern world of IT we are often seeking for elasticity rather scalability. Elasticity involves that you can scale up or out when the demand is high and scale down when the demand is low, in order to cut down the costs. The rather great SPLIT command is only able to provide the scale out part. For scaling down a MERGE command (I have described and advocated for it here) would be needed, but unfortunately such a command is not yet available. There are alternatives for performing it, but they involve some deal of manual work and down time on the instances on which it will be performed.
Another potential downside is that while horizontal scalability is usually preferred over the vertical one due to the theoretical lack of limits and lower costs, it is also harder to implement at application level. It will always be easier to throw more resources into a machine, rather than designing the code to split the work between multiple machines.
SQL Server is great when it comes to scaling up. It is designed to squeeze the last drop of performance from the most powerful servers available today. When deployed inside Windows Azure, the maximum amount of computing power that it will receive will be the one corresponding to the most powerful VM (currently the Extra Large instance, which 8 cores and 14 GB of RAM), which might be more than enough for many of today’s applications.
Besides scaling up, SQL Server also comes with a few scale-out options (like scalable shared databases, peer-to-peer replication, AlwaysOn). All of them require some manual setup and usually they will result in a system that has a master database for read and write operations and some secondary ones used only for read operations.
So, if a very big amount of scalability is key to your application, federated Windows Azure SQL Databases should be the answer. If your scalability needs are not that high and you rather prefer the implementation ease of scaling-up, then SQL Server is the way to go.
Backups and reliability
As far as reliability is concerned, Windows Azure SQL Databases are in a very good position. As mentioned above, for each database, at any time, there are two copies, one of them being transactionally consistent with the live one. This means that the chance of losing data due to hardware failure is close to zero. It also means that high availability is an out of the box feature (in case of failure on the live DB, the transactionally consistent copy will automatically replace it), guaranteed by a 99.9% uptime in the SLA.
On Windows Azure VM hosted SQL Server things are not very bad either. The main reason for that is that Windows Azure disks are based on the Windows Azure Blob Storage, which means that underneath they also implement data redundancy (two copies at any time), which means that the risk of losing data due to disk failure is close to zero. What’s more, the Windows Azure storage offers the geo-redundant storage option, which will involve that data is going to be replicated in more than one data center.
On the high availability side, there is no out of box the implementation for Windows Azure VM hosted SQL Server databases. The VM itself has a 99.9% up time SLA, but that will only cover the machine itself, not the SQL Server database. You can of course manually implement high availability by using the AlwaysOn feature, but this will involve setting and paying for an additional machine.
Hardware failures are not the only reason for data loss. Often it is the case of human or application error and the only guard against those kinds of incidents is provided by backups. This is where things get a bit complicated on the Windows Azure SQL Databases’ side. At the moment there is no straight forward mechanism of performing backups and things get even more complicated with federated databases. I have recently blogged on the ways to backup such databases (article available here). Things will get much easier when the announced but not yet released point in time restore functionality will get into production.
On SQL Server side doing automatic backups is a process that involves some manual configuration, but considering that SQL Server has been with us for so many years it something that most of the people know how to deal with. Every backup functionality available for on premises servers will also be available for Windows Azure VM hosted SQL Server databases.
So, as this section is concerned, the verdict is mixed. Windows Azure SQL Databases come with out of the box durability and high availability, but, at least for the moment, the backup process is more straight forward on Windows Azure VM hosted SQL Server databases.
One of the main goals of Windows Azure SQL Databases is simplicity. This is also reflected in the pricing model. You are only paying based on the size of your database. The amount of operations that you are performing is not taking into account (though, if you perform very resource intensive operations, you might get throttled). Also the redundancy (2 copies of the DB at any time) is an out of the box free feature. An overview of the pricing model is available here and a pricing calculator here.
On Windows Azure VM hosted SQL Server databases things are a bit more complicated. You are being charged for the VM hosting the database, then for the SQL Server itself, then for the storage disk used for the database and then for the number of transactions performed on that disk. Luckily, the last two items are quite low on price and the storage also comes with out of the box redundancy.
As pricing levels are concerned, on SQL Server it really matters if you are going for the Web or Standard edition. The Standard one is about twelve times more expensive (rates available here).
A bit challenging is when choosing between SQL Databases and SQL Server Web edition. On the small sized databases, the first ones are going to be cheaper (prices start at $5 / month for a 100MB database), but as the size is increasing, a Windows Azure VM hosted SQL Server Web edition database is going to be a better choice. For example a 150 GB SQL Database will cost about $225 / month, while a SQL Server deployed on Small Instance VM will only cost about $135 / month (storage and storage transactions price included).
In a nutshell, if you are after high scalability, easy management, high availability and predictable cost model, then Windows Azure SQL Databases are probably the best choice for you. If instead, you are after full compatibility with existing on premises applications, all the tools and functionalities you are familiar with from SQL Server, easy to achieve (but just up to a point) vertical scalability and performance reliability then Windows Azure VM hosted SQL Server databases are the right choice for you.
To sum up, below I have outlined the pros (+) and cons (-) for each of the two options.
Windows Azure SQL Databases
+ highly scalable through the Federations feature
+ durability and high availability out of the box
+ easy management
+ predictable pricing model
– not straight forward backup procedure (at least for the moment)
– performance variations
– the performance of a single instance
– missing features from SQL Server
Windows Azure VM hosted SQL Server databases
+ all the features and functionalities from SQL Server
+ easy to achieve (but only up to a point) vertical scalability
+ fully compatible with existing on premises applications
+ predictable performance
– involves manual management (though it gives access to a wider range of features)
– limited horizontal scalability
– performance not as high as with modern dedicated servers
– expensive Standard edition (but quite affordable Web edition)