Azure SQL Data Sync, currently in Preview, can help a business solve many problems with syncing data to Azure. This may seem self-explanatory but there are some limitations and pitfalls you should be aware of, so let’s get right to it.
What is Azure SQL Data Sync?
Azure SQL Data Sync is a sync framework that allows businesses or individuals to sync data from one or many disparate SQL data sources – either on-premises or other Azure SQL Databases – to a single Azure SQL Database (the Hub). The data can be synced manually or setup on a schedule to sync automatically anywhere from once every five minutes to once a month. They sync job can also sync data from the Hub to the other on-premises or Azure SQL databases.
One example may be a central office that collects data that needs to be distributed to different regional databases. The central office can sync to the Azure SQL database which in turn can sync the data to the regional databases, keeping them all as close to real-time sync as possible.
The Setup Process
The sync job is able to copy data to or from one or more “Reference Databases” using a hub/spoke topology. The Azure database is the hub, and the one or more reference databases are the spokes, so to speak. Here are the basic steps to get a successful sync set up:
- Install the Azure SQL Data Sync Client Agent on the machine or machines that have access to the reference databases that need to be synced. During the installation you will give this tool the username and password that it will use to register your reference database with Azure. This account must have “log on as a service” rights in Azure and read/write access to your reference database(s). It will create a few tables of its own in the reference database(s) that are used during the sync process.
- Setup a Sync Agent in Azure.
- Get a key from the Azure Sync Agent to paste into the Sync Client Agent so that the client can register your reference database(s) with Azure.
- Register your reference database(s) through the Sync Client Agent interface.
- In Azure, create a Sync Group that will define what your Hub database and Spoke (Reference) databases will be.
- Setup Sync Rules for the group.
- This is where you choose:
- The schema that the sync framework will use as its definition for the sync process to Azure. Note that your Azure SQL database does not have to have any schema at this point. The sync job will set that up for you.
- Which tables and columns will be synced.
- What filters you want to use in the sync process (e.g. “where Age >= 18”).
- To do this you will be asked to choose which database to use for schema information.
- This is where you choose:
- Either manually sync your data or setup the interval for automatic sync.
So, overall it is a fairly simple process; however, there are some limitations and at least one major problem that you need to be aware of.
The Azure SQL Sync team has published several limitations that you need to be aware of before setting up a sync job. There are many, but I’ve done the work for you and listed the most common issues:
- The maximum number of endpoints across all sync groups is 30.
- The maximum character length of any database, table, schema, or column is 50 characters.
- There can be no more than 500 tables across all databases included in the sync group.
- There can be no more than 1000 columns in a single table in a sync group.
- The max data row size on any table in the sync group is 24Mb.
- If more than one database in the sync group already contains data before the initial sync, then every row will be treated as a conflict during the original sync and will require resolution, which on very large data sets could take from hours to months according to Microsoft’s documentation.
So, suffice it to say that Azure SQL Data Sync is not for really big data.
Two Annoying Problems
You may be thinking that, as far as you know, none of your databases exceed the limitations above but that to find out for sure you’ll just try to set up the sync rules, and let Azure tell you if there are any problems.
Read carefully: IT WILL NOT. That’s right. It fails silently with a little spinner just spinning away until your session times out. When you come back to the sync rules page, you will simply see the same message that you saw at the beginning explaining that you have to define your schema in order to setup sync rules.
As a workaround, to get to the error messages, I went into SQL Server Management Studio, right-clicked on the reference DB that I was going to use to define the sync schema, and clicked on the option to deploy to Azure. In this process you give a new database name, and the process will kick off to generate a schema and deploy it to Azure. In this case, if there are any errors, SSMS will show them to you. I hope this little tip saves you hours of frustration.
When the sync fails there are two places that you may find the actual error message that occurred–IF you are given an error message. If one place doesn’t have the message, check the other place.
Conclusion and Documentation
So, Azure SQL Data Sync (Preview) can be a great way to sync data – even from multiple data centers – into a single spot in Azure if you have a business need to do so. It is fairly simple to set up; however, there are some limitations and problems to be aware of before going through the process. Microsoft has documented the process pretty thoroughly here and here, which I used when setting up my first Azure SQL Data Sync job. Questions? Find me on Twitter at @jagraham or with the hashtag #ComposerJason.