With Availability Groups there has always been a common theme across the industry to offload your database backups to a secondary replica. It sounds great in theory. I know people that have done this. I also know people that “thought” they were doing this. But the question is, should you be doing this?
Backup Types available on Secondary Replicas
First, let’s talk about the backup types available on secondary replicase. You will have access to take copy-only full backups or regular log backups (no copy-only here). You do not have the ability to take differential backups, or concurrent backups. That last part meaning if full backups are running on one replica, you cannot simultaneously take log backups on a different replica. This can be a deal breaker for many if you’re using differential backups as well, as these must be taken on your primary replica, as well as your full backups. This is a dependency of the secondary database being read-only, and the backup job not having accessing to update the backup flag within the database.
Controlling where you backups run
When you configure Availability Groups, the wizard presents you with the option to choose where your backups will run. Personally, I wish Microsoft would remove this from the wizard, as I feel it adds a layer of confusion around how this actually works. The confusion comes in around how this setting is actually applied. One would think that if you set this value, your backups will always run on the replica of your choice. That is not true at all. Outside of the Microsoft Maintenance Plans for backups, the only way this setting is ever applied is if your backup jobs explicitly call the sys function sys.fn_hadr_backup_is_preferred_replica ( ‘dbname’ ) and takes action based on the result. I’ve seen it far too many times where someone says “Oh yeah, my backups are running on my secondary”, only to find out all they did was configure the setting in the Availability Group wizard, but they were using a custom backup job that made no reference to the backup preference function.
Log backups and Secondary Replicas
Another concept many people fail to realize is that regardless of what replica you take a log backup on, it becomes part of the log chain that will be required in the event you need to restore your transaction logs. This is due to log chains being persisted across all replicas in an Availability Group. This can be a blessing and a curse. The good news, if you take log backups on a secondary replicas, your primary replica becomes aware of them, and will be able to truncate its transaction log as those backups are completed on the secondary replica. In theory, this sounds great.
I once found a SQL Server Availability Group running a 3rd party backup tool. This tool performed various backup types based on the current state of the database and the backup set. The backup solution was taking full, diff and log backups on the primary replicas. However, there was a diff backup job that ran once a day on the secondary replica. Presummably, for the event the replica was ever to become the primary. Every night it would run, and the backup solution would see that it could not take a diff backup on a secondary, and would on the fly decide to take a log backup instead. Fast forward to the next time a restore was need. That random log backup that the diff backup job was taking on the secondary replica was now needed to fully complete the restore.
One might ask, “but what do integrity checks have to do with offloading backups to a secondary replica?”. Well, for starters, you should be taking your backups where ever you are running your integrity checks. Honestly, you should be taking integrity checks on all replicas, but that’s a topic for a different day. Let’s suppose for a moment you’re running integrity checks on your primary replica and taking backups on your secondary replica. Your databases are running for months, and corruption occurs on the secondary replica during that time. But you have no idea corruption has occurred there, because you’re only running integrity checks on your primary replicas. Then one day, someone drops a table in production. The database is in an Availability Group, so that table is now gone on all replicas. You’re only option to get it back is to restore that corrupted backup from your secondary replica.
This by itself does not mean you shouldn’t run backups on your secondary replicas. This just means you need to be aware that if you do run them there, you must make sure you are are also running integrity checks there.
Backup Preference Options
Ok, so let’s assume you’ve taken everything above into account and you’re referencing sys.fn_hadr_backup_is_preferred_replica ( ‘dbname’ ) in your backup jobs. Which backup preference do you choose? Your options are Prefer Secondary, Secondary Only, Primary and Any Replica. Most of these are pretty self-explanatory. The one I really want to focus in on here is Secondary Only. This option can be truly risky. Where as Prefer Secondary can fall back to the primary replicas if no secondary replica is available to take a backup, with this option, your backups will simply not run if there is no secondary replica available to run a backup. If you still choose to run backups on a secondary replica, I would avoid this option entirely.
So, should you take backups on your secondary replicas?
My opinion is No! If the reasons above were not enough to convince you of the same, let me add some additional context. What’s the number one way to loose your job as a DBA? Data loss! So, let’s say your running database backups on your secondary replicas. There are requirements to be met before a backup can run on a secondary replica. The secondary must be able to communicate with the primary replica and the database must be in a SYNCHRONIZED or SYNCHRONIZING state. If that condition is not true, your backup will not run. Add any latency occurring with your data synchronization, and you must also add that to your RPO. If replication is behind by 5 minutes, and you restore a backup from 15 minutes ago, expecting to meet a 15 minute RPO, you may not realize that you’re really only restoring data from 20 minutes ago.
Oh, right now is probably a good time to call out that synchronization can still fall behind, even when replicas are in synchronous commit! Be aware of this, and do not just assume you are 100% protected by using synchronous commit.
Worse, say your full backups are running over night, once a week, at the same time maintenance is occurring. After-all, we typically perform maintenance on our secondary replicas, thinking there is no impact to production. For the past three weeks, a combination of Windows and SQL Server patches were applied during maintenance windows, and the replica was rebooted while the backups were running. You only keep two weeks of backups online, and it’ll take some time to get older backups from offsite storage. I’ll just say, that’s not a situation I would want to be in.
Bottom line, if you want to confirm you have valid backups, you must take them on your primary replica and test them often. If your primary replica does not have enough resources to run backups concurrently with an active workload, you have a resource issue, not a backup preference issue.