It’s never too late to re-evaluate your backup plan | Extreme Compute

Written by Lokesh Mandapati | Dec 23, 2020 4:15:00 AM

The data protection architecture for a database should be driven by business requirements. These include the required speed of recovery, the maximum allowable data loss, and backup storage needs. The data protection plan must also take into account the regulatory requirements that apply to the retention and restoration of data. Finally, different data recovery scenarios must be addressed, ranging from routine and predictable recovery after a user error or device failure to disaster recovery scenarios that include total site loss.

Small changes in data protection and recovery policies can have a large effect on the overall storage, backup, and recovery architecture. It is important to identify and document requirements before design work begins, so that the data protection architecture does not become more complicated than it needs to be. Unneeded features or levels of protection add cost and management overhead, and a requirement that is overlooked at the start can lead the project in the wrong direction or force last-minute design changes.

Recovery Time Objective

The recovery time objective (RTO) defines the maximum time allowed for the recovery of a service. For example, a human resources system could have a 24-hour RTO because, although losing access to this data during the working day would be inconvenient, the company can still function. On the other hand, a server that hosts a bank's general ledger would have an RTO measured in minutes or even seconds. An RTO of zero is not feasible, because the system needs some time to distinguish an actual service interruption from a routine event such as a dropped network packet. A near-zero RTO is a common requirement, however.
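The point about telling a real interruption apart from a dropped packet can be illustrated with a small failure detector. This is a minimal sketch, not a production monitor: the class name, the check interval, and the threshold of three consecutive failures are all assumptions chosen for illustration. It shows why some detection delay always sits underneath the RTO.

```python
from dataclasses import dataclass


@dataclass
class OutageDetector:
    """Declares an outage only after several consecutive failed health checks."""

    check_interval_s: float        # hypothetical: seconds between health checks
    failures_to_trip: int          # hypothetical: consecutive failures required
    consecutive_failures: int = 0

    def record(self, check_ok: bool) -> bool:
        """Record one health-check result; return True once an outage is declared."""
        if check_ok:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.failures_to_trip

    def detection_delay_s(self) -> float:
        """Approximate worst-case time to confirm an outage (a floor under the RTO)."""
        return self.check_interval_s * self.failures_to_trip


detector = OutageDetector(check_interval_s=1.0, failures_to_trip=3)

# A single failed check between two good ones does not trigger recovery.
blip = [detector.record(ok) for ok in (True, False, True)]

# Three failed checks in a row do.
outage = [detector.record(ok) for ok in (False, False, False)]

print(blip, outage, detector.detection_delay_s())
```

Lowering the interval or the threshold shortens the detection delay but makes false alarms more likely, which is the trade-off behind a near-zero rather than a zero RTO.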


Recovery Point Objective 

The recovery point objective (RPO) defines the maximum tolerable loss of data. For a database, the RPO is usually a question of how much log data can be lost in a given situation. In a standard recovery situation, where the database is damaged by a software malfunction or user error, the RPO should be zero: there should be no data loss. The recovery procedure involves restoring an earlier copy of the database files and then replaying the log files to bring the database to the desired point in time. The log files needed for this operation should already be in place at the original location. Only in rare scenarios is log data itself lost.
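The restore-then-replay procedure described above can be sketched with a toy key-value "database". This is an illustrative model only: the `LogRecord` type, the integer timestamps, and the function name are assumptions, and a real database replays its own transaction log format with its own tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    timestamp: int   # hypothetical: seconds since an arbitrary start
    key: str
    value: str


def recover_to_point_in_time(backup_state, backup_time, log, target_time):
    """Restore a copy of the backup, then replay log records up to target_time."""
    state = dict(backup_state)  # work on a copy; the backup itself stays untouched
    for record in sorted(log, key=lambda r: r.timestamp):
        # Only changes made after the backup and up to the target are replayed.
        if backup_time < record.timestamp <= target_time:
            state[record.key] = record.value
    return state


backup = {"balance": "100"}            # backup taken at t=10
log = [
    LogRecord(12, "balance", "120"),
    LogRecord(15, "balance", "90"),
    LogRecord(20, "balance", "0"),     # the user error we want to undo
]

# Recover to just before the erroneous change at t=20.
recovered = recover_to_point_in_time(backup, backup_time=10, log=log, target_time=19)
print(recovered)
```

Because every log record up to the target time is still available, nothing committed before the error is lost, which is what an RPO of zero means in this scenario.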

For example, an accidental or malicious rm -rf * run against the database files may remove all data, including the log files. The only option would then be to restore from backup, including the backed-up log files, and some data would likely be lost. The only way to improve the RPO in a traditional backup environment is to back up log data more frequently. However, this has limits, because it means constant data movement and requires the backup system to run as a continuous service. One of the advantages of advanced storage systems is the ability to protect data from accidental or malicious file damage and so deliver a better RPO without data movement.
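The link between log backup frequency and RPO comes down to simple arithmetic: if the live logs are destroyed, everything written since the last log backup is gone. The sketch below assumes a hypothetical schedule of log backups every 15 minutes, with times expressed in seconds; the function name is illustrative.

```python
def data_loss_window(log_backup_times, failure_time):
    """Return the seconds of changes lost if only backed-up logs survive the failure."""
    usable = [t for t in log_backup_times if t <= failure_time]
    if not usable:
        raise ValueError("no log backup exists before the failure")
    return failure_time - max(usable)


# Hypothetical schedule: one log backup every 900 seconds (15 minutes).
schedule = [0, 900, 1800, 2700]

# Files are deleted at t=2000; the last usable log backup was taken at t=1800.
loss = data_loss_window(schedule, failure_time=2000)
print(loss)
```

In the worst case the loss approaches the full backup interval, so halving the achievable RPO this way means doubling the number of log backups.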


Disaster Recovery

Disaster recovery covers the IT infrastructure, policies, and procedures needed to recover a system after a physical disaster. Such a disaster may be a flood, an explosion, or a person acting with malicious or reckless intent. Disaster recovery is more than a collection of recovery procedures. It is a complete process of identifying the different risks, defining the requirements for data recovery and service continuity, and delivering the right architecture together with the related procedures.

In setting data protection criteria, it is important to differentiate between the standard RPO and RTO requirements and the RPO and RTO requirements needed for disaster recovery. Many server environments require an RPO of zero and a near-zero RTO for data loss situations ranging from a relatively normal user error to a fire that destroys a data center. There are, however, costs and operational implications for these high levels of protection.

In general, the criteria for non-disaster data recovery should be stringent for two reasons.
  • First, software bugs and user errors that harm the server are so common that they are almost inevitable.
  • Second, it is not difficult to design a backup plan that delivers an RPO of zero and a low RTO as long as the storage system itself is not destroyed. There is no reason to leave a significant risk unaddressed when it can be easily remedied, which is why the RPO and RTO targets for local recovery should be aggressive.

Given the scenarios above, it is always worth re-evaluating your backup plan. RTO and RPO requirements for disaster recovery vary more widely, depending on the likelihood of a catastrophe and the impact of the resulting data loss or damage on the company. The RPO and RTO specifications should be based on actual business needs and not on general principles. They have to account for multiple logical and physical disaster scenarios.
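One way to keep local and disaster targets separate during a re-evaluation is to write them down per scenario and compare them with what recovery tests actually achieve. The sketch below is only an illustration: the scenario names, the target values, and the measured values are invented for the example and are not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    rpo_s: float   # maximum tolerable data loss, in seconds
    rto_s: float   # maximum tolerable recovery time, in seconds


# Hypothetical targets: aggressive for local recovery, looser for site loss.
targets = {
    "user_error": Target(rpo_s=0, rto_s=300),
    "site_loss": Target(rpo_s=900, rto_s=14400),
}

# Hypothetical results from recovery tests.
measured = {
    "user_error": Target(rpo_s=0, rto_s=120),
    "site_loss": Target(rpo_s=3600, rto_s=7200),
}


def unmet_targets(targets, measured):
    """Return the scenarios that are untested or whose RPO or RTO exceeds the target."""
    return sorted(
        name
        for name, target in targets.items()
        if name not in measured
        or measured[name].rpo_s > target.rpo_s
        or measured[name].rto_s > target.rto_s
    )


gaps = unmet_targets(targets, measured)
print(gaps)
```

Here the site-loss scenario recovers quickly enough but loses an hour of data against a 15-minute target, so it is the one flagged for attention.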

So the vital question remains: what is your Plan B?