Business leaders tend to be hyper-aware of how much money their business makes on average. So when an outage brings business operations, and revenue, to a halt, executives start to calculate how many dollars are being lost each day, hour, and minute. During such a disaster, the chaos is far worse when the response team doesn’t know what it is doing. When the response is improvised, no one knows how many hours will be wasted, how many pieces can be rebuilt, or how many dollars will be lost.
Disaster recovery (DR) is a critical piece of business continuity planning that covers how to recover critical technology and business infrastructure during an emergency service interruption. In terms of process and communication, it involves many people within operational and technical teams. As with any complex performance, rehearsal is key. At Evolve IP, we make sure you get the most out of all your DR tests.
If testing a disaster recovery plan is a rehearsal, a DR runbook is the script. A DR runbook can simply be appended to business continuity planning documentation. This script is not only important for the actors on stage in front of the audience, but also for the technology guys who are keeping the show running behind the scenes.
Operations team: CEO, CTO, CFO, any infrastructure architect, and Evolve IP (or your service provider)
The primary duty of the operations team in DR planning and testing is to ascertain which infrastructure assets are most important to running the business, and then to craft response time expectations that match the SLAs with a service provider.
The executive team should be responsible for delineating emergency response procedures, documenting them, and sharing them with the rest of the team. The executive team should complete the runbook together with the service provider and then make it readily available to the rest of the team. This is an overview of the parts of a DR runbook that operations teams must consider:
- Assess which applications are most important to the business
- Determine priority level for restoration order of infrastructure assets
- Determine RTO/RPO for these assets in the event of a disaster
- If hosting: determine SLAs for availability of these assets
- Determine how much bandwidth is available to support off site backup replications
- Repeat for all sites
- Delineate response roles, call trees, and responsible parties for each critical infrastructure component
- Validate DR test results
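The checklist above can be captured as structured data rather than free text. The following is a minimal sketch, not a prescribed runbook format: the `Asset` fields, the helper names, and the sample assets and numbers are all hypothetical. It shows one way to derive a restoration order from priority and RTO, to estimate how long an off-site replication would take over the available bandwidth, and to check measured DR test results against the agreed RTO/RPO.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """One runbook entry (hypothetical field names)."""
    name: str
    priority: int      # 1 = restore first
    rto_minutes: int   # recovery time objective: maximum tolerable downtime
    rpo_minutes: int   # recovery point objective: maximum tolerable data loss
    owner: str         # responsible party on the call tree


def restoration_order(assets):
    """Sort by priority, breaking ties with the tightest RTO."""
    return sorted(assets, key=lambda a: (a.priority, a.rto_minutes))


def replication_hours(changed_gb, bandwidth_mbps):
    """Estimate hours needed to replicate changed_gb over the link.

    Assumes decimal units (1 GB = 8000 megabits) and ignores protocol
    overhead, compression, and deduplication.
    """
    megabits = changed_gb * 8 * 1000
    return megabits / bandwidth_mbps / 3600


def meets_objectives(asset, recovery_minutes, data_loss_minutes):
    """Validate one DR test result against the asset's RTO and RPO."""
    return (recovery_minutes <= asset.rto_minutes
            and data_loss_minutes <= asset.rpo_minutes)


# Illustrative sample data only; replace with your own inventory.
example_assets = [
    Asset("file-share", 3, 480, 240, "systems admin"),
    Asset("erp-database", 1, 60, 15, "director of infrastructure"),
    Asset("email", 2, 120, 60, "systems admin"),
]
```

Calling `restoration_order(example_assets)` yields the order in which the provider would be asked to stand systems up, and `replication_hours` gives a rough figure to compare against each asset's RPO for every site.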
Technology response team: DR service provider (such as Evolve IP), systems admin, CTO, CIO, Director of Infrastructure
The primary duty of the technology team should be to validate that the DR provider has effectively stood up all business systems according to SLAs and that all end-users are able to access necessary applications.
Successful DR service providers should be doing all the heavy lifting. Customers should never have to access an off-site location to retrieve disks, or really do any manual work at all. DR providers should perform the entire business recovery and let the technology response team test, configure and communicate results to their team. To make it easier for everyone, these are a few considerations the technology response team needs to outline in the DR runbook:
- Survey current infrastructure
- Collect server inventory (OS, CPU, RAM, Disk)
- Confirm the DR environment is architected to meet RTO/RPO
- Validate DR test results
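The server inventory step (OS, CPU, RAM, disk) is easy to script. Below is a minimal sketch using only the Python standard library, assuming it runs on each server in turn; the function names and dictionary keys are hypothetical, and a real survey would likely pull from existing monitoring or hypervisor tooling instead. Total RAM is read through `os.sysconf`, which is not available on every platform, so the sketch records `None` where it cannot be determined.

```python
import os
import platform
import shutil
import socket


def total_ram_gb():
    """Return total physical RAM in GiB, or None if the platform
    does not expose it through os.sysconf (for example, Windows)."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None
    return page_size * page_count / 1024**3


def collect_inventory(disk_path=None):
    """Gather the OS, CPU, RAM, and disk facts for this machine."""
    if disk_path is None:
        # Root of the current drive or filesystem.
        disk_path = os.path.abspath(os.sep)
    disk = shutil.disk_usage(disk_path)
    return {
        "hostname": socket.gethostname(),
        "os": f"{platform.system()} {platform.release()}",
        "cpu_count": os.cpu_count(),
        "ram_gb": total_ram_gb(),
        "disk_total_gb": disk.total / 1024**3,
        "disk_used_gb": disk.used / 1024**3,
    }


if __name__ == "__main__":
    for key, value in collect_inventory().items():
        print(f"{key}: {value}")
```

Collecting the same fields from every server gives the DR provider a consistent baseline for sizing the recovery environment.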
There are, of course, various networking and bandwidth considerations when failing over to a secondary site. One successful method we deploy is to simply back up locally to a SmartFrame and replicate that backup to one of our data centers via VPN. But there’s always more than one way to skin a cat. Contact an Evolve IP cloud expert to learn more about best practices surrounding business continuity planning and DR tests.