Instant recovery strategies for S3 and DynamoDB in the AI era.

Luca Mezzalira
6 min read · Oct 16, 2024

A few weeks ago, I had the opportunity to attend the SHIFT conference organized by Commvault, and it was quite an experience. I wanted to share some insights on a key announcement that caught my attention.
With the increasing complexity of cyber threats, having a robust approach to data protection has never been more critical, and Commvault’s innovations seem well-positioned to address these challenges effectively.

Especially now that their solutions are available on AWS Marketplace, ready to be deployed in our AWS accounts!

The explosion of data volume, particularly in sectors like healthcare and e-commerce, has created significant hurdles in storage, management, and security. This data surge, coupled with the demand for real-time access in industries such as finance and gaming, puts immense pressure on companies to maintain highly responsive and resilient data systems.

The regulatory landscape adds another layer of complexity to data management strategies. For instance, financial institutions in the EU are preparing for the Digital Operational Resilience Act (DORA), set to become mandatory from January 2025. This act mandates robust ICT risk management, including stringent data protection and recovery capabilities. Simultaneously, the rising sophistication of cyber threats has put companies at increased risk of data breaches and ransomware attacks, necessitating constant vigilance and adaptive security measures.

Data resilience is a critical aspect of an organization’s data management strategy, encompassing the ability to safeguard, restore, and maintain data integrity in the face of various unforeseen events or disruptions. For S3 and DynamoDB workloads, this concept takes on heightened importance, as it ensures that data remains both accessible and recoverable under a wide range of scenarios, including but not limited to accidental deletions, sophisticated cyberattacks, or large-scale regional outages.

Key Metrics: RTO and RPO

In the realm of data resilience, two critical metrics stand out: Recovery Time Objective (RTO) and Recovery Point Objective (RPO). RTO represents the maximum acceptable duration for restoring a system after a disruption occurs, while RPO indicates the maximum tolerable period in which data might be lost due to a major incident. For services like Amazon S3 and DynamoDB, striking the right balance between these metrics is crucial to ensure optimal data protection and system availability.
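To make the trade-off concrete, here is a tiny illustration in Python (the function and the numbers are hypothetical, not part of any AWS or Commvault API): the worst-case data loss is roughly the gap between two backups, so the backup interval has to fit inside the RPO target, and the measured restore time has to fit inside the RTO target.

```python
from datetime import timedelta

def meets_objectives(backup_interval: timedelta,
                     measured_restore_time: timedelta,
                     rpo_target: timedelta,
                     rto_target: timedelta) -> bool:
    """Worst-case data loss is the time between two backups, so the backup
    interval must stay within the RPO target, and the measured restore time
    must stay within the RTO target."""
    return backup_interval <= rpo_target and measured_restore_time <= rto_target

# Example: hourly backups and 20-minute restores, against a 1h RPO / 30m RTO
print(meets_objectives(timedelta(hours=1), timedelta(minutes=20),
                       rpo_target=timedelta(hours=1),
                       rto_target=timedelta(minutes=30)))  # True
```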

When it comes to implementing resilience strategies for data, multi-region approaches offer robust solutions. A multi-region active-active strategy involves replicating and actively using data across multiple regions. This approach provides high availability and fault tolerance while reducing latency for geographically distributed users. However, it also increases complexity and the potential for data conflicts. On the other hand, a multi-region active-passive strategy designates one region as primary and others as standby. While simpler to implement and potentially more cost-effective, this approach may result in longer recovery times during failover events.

For Amazon S3, Cross-Region Replication supports both active-active and active-passive setups. S3 also offers versioning and lifecycle policies for enhanced data protection, along with high durability across multiple Availability Zones by default. DynamoDB, meanwhile, provides Global Tables for multi-region active-active replication, supports on-demand backups and point-in-time recovery, and offers automatic scaling and high availability within a region.
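If it helps to see what those native building blocks look like, here is a minimal boto3 sketch; the bucket names, table name, and replication role ARN are placeholders you would swap for your own.

```python
import boto3

s3 = boto3.client("s3")
dynamodb = boto3.client("dynamodb")

# Versioning must be enabled on both the source and destination buckets
# before Cross-Region Replication can be configured (shown here for the source)
s3.put_bucket_versioning(
    Bucket="my-source-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# Replicate every new object to a bucket in another region (active-passive style)
s3.put_bucket_replication(
    Bucket="my-source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [{
            "ID": "replicate-all",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {},
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": "arn:aws:s3:::my-replica-bucket"},
        }],
    },
)

# Enable point-in-time recovery on a DynamoDB table, allowing restores to any
# second within the retention window
dynamodb.update_continuous_backups(
    TableName="orders",
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)
```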

Commvault: Instant Recovery for AWS

While AWS offers native backup and recovery solutions, third-party providers like Commvault have developed specialized tools to enhance data protection. Commvault’s solution for S3 and DynamoDB offers near-instant recovery, significantly reducing RTO to minutes. This is achieved through continuous data protection, which minimizes RPO by constantly capturing changes. Additionally, Commvault provides air-gapped backups, storing them separately from production environments to enhance security.
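Commvault's air-gapped copies are configured from its own console rather than through code, so there is nothing to show for that here. As a rough AWS-native analog of the immutability idea, S3 Object Lock can make backup objects undeletable for a fixed retention period; a minimal sketch, assuming a bucket that was created with Object Lock enabled:

```python
import boto3

s3 = boto3.client("s3")

# Compliance mode: no one, including the root user, can delete or overwrite
# locked object versions in this bucket until the 30-day retention expires
s3.put_object_lock_configuration(
    Bucket="my-backup-vault-bucket",  # must have been created with Object Lock enabled
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 30}},
    },
)
```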

Commvault’s dashboard

Commvault and S3 use case

Imagine you’re a data scientist at a large e-commerce company, where Amazon S3 serves as the backbone for storing critical customer data, product information, and extensive AI/ML training datasets.
Over the years, your reliance on S3 has grown, and with it, the importance of ensuring that this data is not only secure but also easily accessible.

When you first integrate Commvault with your AWS account, the process is seamless. Commvault quickly gains access to your S3 buckets without requiring complex configurations, allowing you to focus on your core tasks rather than getting bogged down in setup. Once integrated, Commvault takes charge by automatically backing up your S3 data according to the policies you set.

For instance, you might decide to schedule daily backups of your customer data while opting for weekly backups of less frequently changed AI training datasets. This automated approach ensures that your data is consistently protected without any manual intervention.
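Commvault expresses these policies through its console, so purely to make the daily/weekly idea concrete, here is how a comparable schedule could look with the AWS Backup API; the plan, vault names, and retention periods are assumptions for illustration, not Commvault's configuration.

```python
import boto3

backup = boto3.client("backup")

backup.create_backup_plan(
    BackupPlan={
        "BackupPlanName": "ecommerce-data-protection",
        "Rules": [
            {
                "RuleName": "daily-customer-data",
                "TargetBackupVaultName": "customer-data-vault",
                "ScheduleExpression": "cron(0 3 * * ? *)",   # every day at 03:00 UTC
                "Lifecycle": {"DeleteAfterDays": 35},
            },
            {
                "RuleName": "weekly-training-datasets",
                "TargetBackupVaultName": "ml-datasets-vault",
                "ScheduleExpression": "cron(0 4 ? * SUN *)",  # every Sunday at 04:00 UTC
                "Lifecycle": {"DeleteAfterDays": 90},
            },
        ],
    }
)
```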

As your company grows and the volume of data skyrockets from terabytes to petabytes, Commvault scales effortlessly alongside you. When your marketing team suddenly uploads a massive dataset for a new AI-driven recommendation engine, Commvault handles the increased backup load without a hitch, ensuring that performance remains uninterrupted.

However, challenges can arise unexpectedly. One day, a junior developer accidentally deletes a crucial product catalog. Instead of panicking, you turn to Commvault’s intuitive interface. Within moments, you locate the specific files from your latest backup and restore them with ease, minimizing downtime and keeping operations running smoothly.
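With versioning enabled on the bucket, the same recovery is also possible natively: a delete in S3 only writes a delete marker, and removing that marker brings the object back. A minimal sketch with a hypothetical bucket and key:

```python
import boto3

s3 = boto3.client("s3")
bucket, key = "product-catalog-bucket", "catalog/2024/products.json"

# Find the delete marker that the accidental delete created...
versions = s3.list_object_versions(Bucket=bucket, Prefix=key)
for marker in versions.get("DeleteMarkers", []):
    if marker["Key"] == key and marker["IsLatest"]:
        # ...and remove it, which restores the most recent real version
        s3.delete_object(Bucket=bucket, Key=key, VersionId=marker["VersionId"])
```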

Implementing Instant Recovery

When setting up an instant recovery solution, it’s crucial to ensure proper configuration by aligning settings with your specific RTO and RPO requirements. Following best practices is essential, including regularly testing recovery processes, implementing least-privilege access controls, and using encryption for data at rest and in transit. It’s also important to validate your strategy by conducting periodic recovery drills to ensure your system performs as expected.
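A recovery drill does not have to be elaborate. Even a small script that restores a sample of objects from a replica or backup bucket into a scratch bucket and verifies what came back catches most configuration drift; a minimal sketch, where the bucket names and sample size are assumptions:

```python
import boto3

s3 = boto3.client("s3")
SOURCE, SCRATCH, SAMPLE = "my-replica-bucket", "recovery-drill-scratch", 25

# Copy a sample of objects out of the replica/backup bucket...
objects = s3.list_objects_v2(Bucket=SOURCE, MaxKeys=SAMPLE).get("Contents", [])
for obj in objects:
    s3.copy_object(Bucket=SCRATCH, Key=obj["Key"],
                   CopySource={"Bucket": SOURCE, "Key": obj["Key"]})

# ...then verify that each restored object matches the expected size
for obj in objects:
    restored = s3.head_object(Bucket=SCRATCH, Key=obj["Key"])
    assert restored["ContentLength"] == obj["Size"], f"Mismatch restoring {obj['Key']}"
print(f"Drill passed: {len(objects)} objects restored and verified")
```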

Remember, while solutions like Commvault offer advanced features, they work best when integrated with AWS’s native capabilities. A comprehensive approach combining AWS services and specialized tools often provides the most robust data resilience strategy.

The AI era has just started

We cannot avoid touching on AI.

When we talk about AI workloads, we’re dealing with a whole new ballgame in terms of data scale and complexity.

Think about it: traditional backup solutions were never designed to handle petabytes of data or billions of objects. That’s where Commvault comes in, and it’s pretty interesting how they’ve approached this challenge.

First off, their architecture is serverless and built to scale horizontally.
This is crucial because AI datasets aren’t just big — they’re massive and growing exponentially.

Commvault’s serverless architecture on the right of the diagram

The nature of AI workloads introduces new challenges. You've got training data, model versions, inference results, all of which need protection. Commvault's platform is designed to handle these diverse data types across the entire AI pipeline.

Now, here's where it gets really interesting from a technical standpoint: compliance and regulation. As AI becomes more prevalent, we're seeing increased scrutiny from regulatory bodies. Commvault's approach to intelligent backup and immutable storage is forward-thinking. It's not just about protecting data; it's about maintaining its integrity and traceability over long periods.

From a technical perspective, what Commvault is doing is essentially consolidating and optimizing data protection mechanisms that were previously disparate and often inefficient. It’s not about reinventing the wheel, but rather about re-engineering it for the demands of AI-driven, cloud-native environments. Of course, it’s not a silver bullet — no solution is. But for teams working with AI on AWS, especially those pushing the boundaries of scale and complexity, Commvault’s approach is worth a closer look. It’s addressing problems that many organizations are only just beginning to grapple with as their AI initiatives mature.

In Summary

As we move further into the AI era, organizations must continually reassess and upgrade their data protection strategies. The goal is to empower businesses to harness the full potential of AI and cloud technologies without compromising on data security or operational efficiency. By adopting these advanced recovery strategies, organizations can confidently push the boundaries of what’s possible in the cloud, knowing their most valuable asset — their data — is well-protected and readily accessible when needed.

Written by Luca Mezzalira

Principal Serverless Specialist Solutions Architect at AWS, O’Reilly Author, International Speaker, YouTuber, creator of Dear Architects newsletter
