
Human Factors of a DR Test


April 29, 2015 by Mike Hillwig

I’m going to go out on a limb and assume that everyone does a regular disaster recovery test. You DO have a disaster recovery plan, don’t you? What happens if a comet hits your data center? Or a terrorist attack. Or the power grid goes offline and you run out of fuel for the generators. How do you recover from that? If you don’t have a plan, you have some catching up to do.

One of the phrases that dovetails nicely with disaster recovery is business continuity. As technology people, we tend to think about how we get the systems back online in case of a failure. But what about the business itself?

The company I work for is owned by a bank, so we don’t talk about disaster recovery and business continuity as individual constructs. We talk about them as a single entity. And our parent company takes it very seriously. A couple of years ago, we had hundreds of employees in New York and New Jersey who were impacted by Hurricane Sandy. Basically, all of our people based in and around New York City were out of commission. Our clients never knew it. Do you know why? It’s because we had business continuity plans. The two primary data centers weren’t impacted because they’re in the Midwest. But our people were. By leveraging our people in western Pennsylvania as well as a multitude of offshore resources, the only thing clients saw was that different team members were responding to their service requests at different times of the day. Tedious planning really paid off.

Every year, we do our regular “India out” exercise. This is in addition to regular technology DR tests. We simulate a situation where all of our offshore teams become unavailable, and our US-based teams fill in the gaps. It demonstrates to our auditors, our clients, and ourselves that we’re ready if a crisis hits. About a year ago, we had to implement those contingencies during a period of civil unrest that threatened to shut down our offices just outside of Mumbai. These “India out” exercises are what I refer to as a “scheduled bad day.” They really suck. Parts of our US teams work hours when we should be sleeping. It always confuses clients when they see me answering an email at 4 AM. When I explain that we’re doing a business continuity test, they always appreciate it.

What gets me is when we do a disaster recovery and business continuity test at the same time. Occasionally during a DR test, they’ll throw us for a loop by saying we can’t use offshore resources. Or they’ll say that we can only use offshore resources. Or that people in a given facility should be assumed offline. The worst is when they tell us that certain tools, such as our ITSM, email, or IM tools, are unavailable during the test.

We train for crazy things. They’re inconvenient. There are always lessons learned. We hope they are things we never have to do in an actual disaster. But then again, nobody thought Hurricane Sandy would happen, either.