A conversation between a DataEndure SME backup engineer and a customer can go like this:
I need to revamp my company’s backup-and-restore solution. The script I’ve been tweaking all along is getting too complicated and error-prone. But while I’m ripping and replacing, where can I get some of that virtual recovery stuff I’ve been hearing about?
We’d be happy to show you, but first let’s map out your current situation. Which of your applications are the most mission-critical – the ones that need to be up 24 by 7 with no hiccups?
All of them?
That would be nice, but far too expensive. What kind of company do you work for?
Manufacturing.
So the server that runs your core business – manufacturing – should have the most sophisticated backup-and-restore solution in your company. Agreed?
Absolutely.
RTO and RPO: Two Guiding Principles
Two key parameters define the service provider’s commitments to maintain and recover business continuity: the recovery-time objective (RTO) and the recovery-point objective (RPO). The RTO specifies the maximum length of time a specific IT service is permitted to be offline during a major IT outage without the service provider incurring penalties such as fines.
In contrast, the RPO specifies the maximum window of time, measured back from the moment of the outage, for which data is permitted to be lost without incurring penalties. The details of these commitments are defined in service-level agreements (SLAs) that both parties sign off on.
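To make the two objectives concrete, here is a minimal Python sketch that checks one outage against both, taking the RTO as the longest tolerable downtime and the RPO as the longest tolerable window of lost data. The function name, the timestamps, and the one-hour targets are all hypothetical values chosen for illustration, not figures from any real SLA.

```python
from datetime import datetime, timedelta


def meets_objectives(last_backup, outage_start, service_restored, rto, rpo):
    """Return (rto_met, rpo_met) for a single outage."""
    # Downtime is measured from the outage to the moment service is back.
    downtime = service_restored - outage_start
    # Data written after the last good backup is lost when the outage hits.
    data_loss_window = outage_start - last_backup
    return downtime <= rto, data_loss_window <= rpo


# Hypothetical outage: last backup at 02:00, failure at 02:45, restored at 04:15.
last_backup = datetime(2024, 1, 1, 2, 0)
outage_start = datetime(2024, 1, 1, 2, 45)
service_restored = datetime(2024, 1, 1, 4, 15)

result = meets_objectives(
    last_backup,
    outage_start,
    service_restored,
    rto=timedelta(hours=1),  # assumed target: back online within one hour
    rpo=timedelta(hours=1),  # assumed target: lose at most one hour of data
)
print(result)  # (False, True)
```

In this made-up case the 45 minutes of lost data fits within the RPO, but the 90 minutes of downtime misses the RTO.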
Businesses pay third parties a premium to guarantee uptime for their risk groups. So let’s segment your workloads according to how mission-critical each one is. We’ll assign each of your servers a rating from 1 to 10.
Mapping Risk Levels to Technology
RTO and RPO requirements also drive the service provider’s calculations as to how sophisticated and high-performing each application server’s backup-and-restore solution needs to be to comply with its SLA. For example, a company’s core, mission-critical manufacturing application that commits a high volume of transactions may call for an RTO of zero and redundant backup servers, including an offsite backup, whereas an ERP solution with a low volume of transactions may be able to get by with replication or backup to tape or disk.
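The mapping described above can be sketched as a small workload inventory in Python. Everything here is an assumption made for illustration: the `Workload` class, the convention that 10 is the most mission-critical rating, the threshold of 8, and the sample RTO and RPO values. A real assessment would set these per SLA.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Workload:
    name: str
    risk_rating: int  # 1 to 10; assumption: 10 is the most mission-critical
    rto: timedelta    # longest tolerable downtime
    rpo: timedelta    # longest tolerable window of lost data


def suggest_approach(workload, critical_threshold=8):
    """Pick a backup approach from the rating (the threshold is hypothetical)."""
    if workload.risk_rating >= critical_threshold:
        return "redundant backup servers plus offsite backup"
    return "replication or backup to tape or disk"


# Hypothetical inventory; the objectives are placeholder values.
inventory = [
    Workload("erp", 4, rto=timedelta(hours=8), rpo=timedelta(hours=4)),
    Workload("manufacturing", 10, rto=timedelta(0), rpo=timedelta(0)),
]

# Review the most critical workloads first.
ranked = sorted(inventory, key=lambda w: w.risk_rating, reverse=True)
plan = {w.name: suggest_approach(w) for w in ranked}

for name, approach in plan.items():
    print(f"{name}: {approach}")
```

Keeping the rating, RTO, and RPO together in one record makes it easier to see when a workload's objectives and its assigned technology have drifted apart.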
The Cost of Downtime
RTO and RPO specifications vary across industries. The telecommunications (“telco”) industry is a classic example of a data-intensive business model that depends so heavily on fast data feeds that telcos contractually oblige their streaming-data providers to guarantee 99.999% uptime, or “five nines,” for the service. And it makes sense:
According to an April 2016 Ponemon Institute survey, unplanned data-center outages cost enterprises an average of $7,900 per minute, a 41% increase from 2010. And according to an Aberdeen research study, the average company loses $163,674 in unused labor and lost revenue for each hour of downtime due to data loss. All in all, outages cost enterprises $700 billion a year, according to an IHS study from 2016.
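The arithmetic behind these figures is simple enough to sketch in Python. The snippet below converts an uptime percentage into minutes of permitted downtime per year and multiplies an outage's length by the per-minute cost quoted above. The function names are hypothetical, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(uptime_percent):
    """Minutes per year a service may be down at the given uptime guarantee."""
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


def outage_cost(minutes, cost_per_minute=7900):
    """Estimated cost in dollars, using the per-minute average cited above."""
    return minutes * cost_per_minute


five_nines = allowed_downtime_minutes(99.999)
print(round(five_nines, 2))  # 5.26
print(outage_cost(60))       # 474000
```

In other words, "five nines" leaves a little over five minutes of downtime per year, and at the cited average a single hour-long outage would cost roughly $474,000.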
Careful Coordination is Key
Only if all developers and administrators work in harmony to alert one another to new dependencies (such as those introduced by a new application going live) can anyone expect error-free scripts and backups. A lot of things can go wrong: an administrator’s worst nightmare is to reluctantly revert to a previous backup, only to realize that the backup never ran.
Questions to Ask Yourself
The key is to describe every workload that may be worth backing up regularly, and to share any other relevant information that can help in designing adequate backup specifications.
- Have you identified your risk groups – i.e., the most mission-critical applications? Typically these run on the servers that support your core business.
- What other workloads might be candidates? Tag each one by risk category. For example, tag CRM as risk category 2, then look at the other key workloads and do the same.
- What’s your recovery-point objective (RPO)? That is, how much recent data can you afford to lose before you’re in trouble?
- If you can’t afford to lose more than an hour’s worth of data, then your RPO is one hour.
- What’s your recovery time objective (RTO)? That is, how long can you tolerate downtime?
- If it’s an hour, then your RTO is an hour.
This is a good start. The next steps are to visit your data center, document the backup requirements, and, time allowing, discuss the pros and cons of virtual recovery.