Recovery and Backup for Beginners

2

June 1, 2013 by Mike Hillwig

A few weeks ago, I started putting together a new presentation to give about database backups. The more I thought about it, the more I realized that people don’t need to understand backups as much as they need to understand recovery. That’s where this session came from.

One of the first questions I have for my audience is if they have a backup strategy. And then I ask if they have a recovery strategy. It’s a very pointed question.

A few years ago, I saw Paul Randal do his Mythbusters session at the SQL PASS Summit. He said something that I will never forget: You need restore strategy and not a backup strategy.

I just can’t find a flaw in his reasoning. We can plan our backups, but unless we know how we’re going to recover those backups, we’re asking for failure. That’s when I decided that I wanted to change the name of my presentation. It’s now called Recovery and Backups for Beginners. The emphasis needs to be on the recovery instead of the backups. Once we know what we need in terms of recovery, then we can find the right backup solution to get there.

How often are DBAs required to put in place bad data structures because developers have already written the code. This is the same thing. We’re going to shoe-horn in a recovery to meet our existing backups. It needs to be the other way around. Once we know what our recovery requirements are, only then should we start talking about how to get there.

I have this statement on three slides in the presentation: If you have a backup strategy and not a recovery strategy, you’re doing it wrong!

How do we know what our needs are? This is where my ITIL training comes in. You probably have an SLA or OLA within your organization, formally or informally. And these probably include an RTO and RPO.

Lets break apart the alphabet soup:

  • SLA: Service Level Agreement – This is an agreement between your company and your clients
  • OLA: Operational Level Agreement – This is an agreement between your IT service provider (you) and your internal departments
  • RTO: How quickly you need to be back up after some type of a system failure
  • RPO: How close to the failure point you need to recover. In other words, it’s the amount of data loss is acceptable.

Once you have defined these four things, then you can start putting together your strategy.  Some people seem to think that they can provide an RTO of ten minutes with an RPO of zero. With simple backup and recovery, that’s pretty unlikely.  If you lose an entire server, or the HBA in your needs to be replaced, you’re going to have an RTO much greater than ten minutes. The larger your database, the longer your recovery time. If your RTO is less than the time it takes you to restore your database, you’re simply not going to make it.

Once we know what our needs are, we can then start determining which recovery model works for our databases. I used to work for a company that had an RTO of twelve hours. That means if we lost a server, losing twelve hours of data was acceptable to the business. It would be painful, but all of the data could be recreated. That gave me a lot of flexibility in my recovery strategy. In my current environment, my RTO is 30 minutes. That’s a little tougher.

Recovery models tell us how us the backup methods. If you’re database is in SIMPLE recovery, it means you can only restore from a full backup. With a twelve hour RTO, I was able to keep my databases in SIMPLE mode and back up twice daily.  However, with FULL Recovery mode, we still need to do that full backup and then roll forward the transaction log backups. That works better in situations with much shorter RTOs. The other advantage is that FULL recovery mode lets us do a point-in-time recovery. In some applications, that can be a life saver.

Now that we have a better idea for what we need, we can start to talk about backups. Depending on the size of your database and your infrastructure, you can back up as often as your RTO allows. If you’re in SIMPLE recovery mode and have an RTO of twelve hours, it means you need to do a full backup at least once every twelve hours.  But if you’re in FULL recovery mode, you can back up as infrequently as you like and then roll forward all of your transaction logs. But be VERY careful doing this. Imagine doing a full backup once a month and then retaining 31 days worth of transaction log files. Recovering that could take days. Sure your backups won’t take long, but your recovery will be hell.

There is a reason I use the graphic of the guy with his pants down. If you don’t plan your recovery right, you WILL be caught with your pants down.

Recovery and backup for beginners from Mike Hillwig