As some point, if you're developing an application, you'll need to pause, and reflect on the right approach to store/retrieve/protect the data persisted.
As illustrated in the following diagram, before starting to implement anything, you need to understand the trade-offs, ask yourself these questions :
Choosing the right database will impact the performance of your application, so it's important to understand which type of databases are the most appropriate for your use case.
Your application might be a monolith or microservices, either way, the structure of the data persisted might be more efficient for some services and not for others. You often have to choose between 2 main type of databases :
Relational Database (SQL)
Non-Relational Database (NoSQL)
Relational databases are the most common database used by professionals. The main advantage is to provide strong consistency for your application.
Relational databases offer ACID properties : Atomicity, Consistency, Isolation and Durability.
One other aspect, is that relational database require a schema (fixed structure before storing data), this provides more control on the type of data your application persist.
If you need strong relationship (a query to join multiple tables, etc ..), relational database are the most appropriate choice.
To summarize, if your application require the following, you may choose a relational database (SQL):
ACID properties
Data needs to be structured and consistent
The transactions are a strong priority
You have to perform complex queries between data (joins, etc..)
You need in advance to set a fixed schema (structure of the data persisted)
Customers table in a SQL database
| CustomerID | FirstName | LastName | Email |
|------------|-----------|----------|------------------------|
| 101 | Bob | Smith | bob.smith@email.com |
| 102 | Alice | Johnson | alice.j@email.com |
| 103 | John | Doe | john.doe@email.com |
Students table
| StudentID | Name | Age | GPA | GraduationYear |
|-----------|--------------|-----|------|----------------|
| S001 | Daniel Monto | 20 | 3.7 | 2026 |
| S002 | Emma Brown | 22 | 3.9 | 2024 |
| S003 | Liz Brian | 21 | 3.5 | 2025 |
Use-case 1 (Financial industry) : If you develop an application for the financial sector, there is chance that you may want to choose a relational database, for strong consistency , to prevent double spending or balances. You may also have to perform complex queries between customers, accounts and transactions.
Use-case 2 (Ecommerce platform) : If you develop an ecommerce application, you probably need to have a structured data to store your products, categories, customers, orders or reviews. You may need to perform complex queries between the customer and the orders, and you need to guarantee transactions (strong consistency) to ensure a reliable inventory management or checkout.
Use-case 3 (Healthcare systems) : If you develop a healthcare application, and as you store sensitive data, you probably need strong consistency, and integrity. Also, healthcare industries are heavily regulated, so having a relational database will ensure your data persisted follow the latest compliance and regulations.
Non-Relational databases offer more flexibility. You don't need to have strong consistency and you don't need a fixed schema before storing your data. NoSQL databases are more appropriate for use cases where you need to store unstructured or semi-structured data (JSON Document, Key Value), and you think about having lower latency.
| Types of unstructured data | Examples |
|---|---|
| Text documents | Emails, chat messages, Word/PDF files |
| Media files | Images, videos, audio recordings |
| Social media posts | Tweets, comments, likes |
| Logs & event data | Server logs, clickstreams, IoT device messages |
| Web pages | HTML content with mixed text, links, metadata. |
JSON Document
[
{
"user_id": 123,
"name": "Alice",
"purchases": ["Book", "Laptop", "Headphones"],
"last_login": "2025-09-14"
},
{
"user_id": 124,
"name": "Bob",
"purchases": ["Book", "Laptop", "Headphones"],
"last_login": "2025-09-14"
},
{
"user_id": 125,
"name": "John",
"purchases": ["Book", "Laptop", "Headphones"],
"last_login": "2025-09-14"
}
]
NoSQL databases are a good choice, if you don't want to deal with strict rules to manage your data and the transactions.
To summarize, if your application require the following, you may choose a non-relational database (NoSQL):
Unstructured or semi-structured data, and it's flexible
You expect a huge volume of data, and you don't want to worry about horizontal scaling
You need to have a simple design, with the potential to scale fast.
Use-case 1 (Real-time analytics and Big Data) : You develop an application, at some point you may need to understand more about the users behavior and activity, this will create a huge volume of data (logs, clickstreams, ...), the best way is to use NoSQL databases to handle these type of data (unstructured and semi-structured). Also choosing a non-relational database, is a better option for faster writes and horizontal scaling.
Use-case 2 (Content Management & Product Catalog) : You develop a content management application, or you need to store product catalogs. The content often changes (flexible schema), using JSON documents to store these type of data is more efficient.
| Company | Database used | Use Case | Estimated Daily Active User (DAU) | Sources |
|---|---|---|---|---|
| Stripe | PostgreSQL + Redis | PostgreSQL for transactional payment data (ACID compliance critical), Redis for caching & real-time fraud detection | ~1.3 million active websites using Stripe globally | https://redstagfulfillment.com/how-many-payments-stripe-process-per-day/ |
| Notion | PostgreSQL + Redis | PostgreSQL for document + collaboration metadata storage, Redis for session caching and quick lookups | ~20M monthly active users | https://www.boringbusinessnerd.com/top-startups? |
| Canva | PostgreSQL + Redis | PostgreSQL for design assets and user accounts, Redis for fast retrieval of frequently accessed templates | ~100M monthly active users | https://www.boringbusinessnerd.com/top-startups? |
| Slack | MySQL, PostgreSQL, Redis | MySQL/Postgres for message history & team data, Redis for ephemeral data (presence, typing indicators) | ~42–47M daily active users | https://www.demandsage.com/slack-statistics/? |
Your database is by default in a healthy state, all operations (connections, queries) return a successfull response. But you may experience unexpected downtime due to failures .
Unavailable access to your data can have significant impact on your business activities, revenue and reputation. It's important to understand the most common root cause of database failures.
Replication Lag & Failover Issues
Schema Migrations Gone Wrong
Misconfigurations (DNS, Networking, Permissions)
Lack of Backups & Disaster Recovery Planning
Planning for these scenarios during the design phase is essential. There are 2 main metrics to implement in order to lower the risks if an incident happens :
Recovery Time Objective (RTO) : How long it takes to restore the database in case of failure ?
Recovery Point Objective (RPO) : How much data can you loose (the retention of your backups : minutes, days, months, years) ?
There are several database recovery strategies :
| Database recovery strategies | RTO (Recovery Time Objective) | RPO (Recovery Point Objective) | Cost |
|---|---|---|---|
| Full Backup (Daily/Weekly) | Hours to days (restore from backup) | Up to last backup (24h+ data loss possible) | $ (Cheap) |
| Database recovery strategies | RTO (Recovery Time Objective) | RPO (Recovery Point Objective) | Cost |
|---|---|---|---|
| Asynchronous Replication (Active-Passive) | Seconds to minutes | Seconds to minutes (small lag possible) | $$ (higher cost) |
| Database recovery strategies | RTO (Recovery Time Objective) | RPO (Recovery Point Objective) | Cost |
|---|---|---|---|
| Synchronous Replication (Active-Standby) | Seconds to minutes (automatic failover) | Zero (no data loss) | $$$ (Very expensive) |
Here are the recommended RTO and RPO per industry :
| Type of workload | Recommended RTO (Recovery Time Objective) | Recommended RPO (Recovery Point Objective) | Sources |
|---|---|---|---|
| Mission-critical (tier-1) applications | ≤ 15 minutes | “near-zero” | https://docs.aws.amazon.com/whitepapers/latest/disaster-recovery-of-on-premises-applications-to-aws/recovery-objectives.html |
| Important but not mission critical (tier-2)” apps | ≤ 4 hours | ≤ 2 hours | https://docs.aws.amazon.com/whitepapers/latest/disaster-recovery-of-on-premises-applications-to-aws/recovery-objectives.html |
| Less critical applications | 8-24 hours | 4 hours | https://docs.aws.amazon.com/whitepapers/latest/disaster-recovery-of-on-premises-applications-to-aws/recovery-objectives.html |
Real-world cases where lack of RPO/RTO strategies cost companies revenue
| Company | Incident Description | Revenue / Impact | Root Cause / Lessons | Sources |
|---|---|---|---|---|
| Ma.gnolia (2009) | Production database lost; backups incomplete/unrecoverable. | Complete data loss; company shut down. | No RPO/RTO strategy; backups not tested, no recovery plan. | More details |
| Code Spaces (2014) | Attackers deleted databases and backups on AWS; recovery impossible. | 100% business loss; company shut down within days. | No tested disaster recovery plan, no offsite/immutable backups (RPO/RTO failure). | More details |
| Delta Airlines (2016) | Power outage + backup system failure caused DB corruption; IT systems down. | 2,300+ flights canceled; global operations disrupted for 3 days; ~$150 million loss. | Disaster recovery didn’t meet RTO; systems took too long to restore. | More details |
| British Airways (2017) | Data center power issue led to DB corruption for check-in/reservation systems. | 75,000 passengers stranded; days of disruption; £80+ million in costs. | Lack of resilient recovery plan to meet business-critical RTO for databases. | More details |
The database stores all kind of data, you probably want to limit who can have access to the database, especially if you store sensitive and critical information. Understanding who should have access is a first step, is it a service account or a human user ?
To enhance the access to the database, it might be a good practice to limit the network access, authorizing only requests from specific subnet (usually backend subnet), and use security group to authorize incoming traffic on the database target TCP port from restricted sources (IP address, group of servers, etc..)
Once the user or service account authenticated, it's important to limit the actions on the database (CREATE DATABASE, CREATE TABLE, UPDATE, ...). Best practices include assigning the least privileges (roles to perform only restricted operations on the database). Credentials can be stored in a secrets store (secrets manager, key vault, ..), with password rotation to enhance security.
Breaches due to credentials
| Company | Incident Description | Data Compromised / Key Metrics | Root Cause / Lessons | Sources |
|---|---|---|---|---|
| Uber | Attackers accessed Uber’s AWS database using credentials found in a GitHub repository. | 57 million users and 600,000 drivers’ personal information. | Storing credentials in version control; importance of secret management. | More details |
| Capital One | Attacker exploited a misconfigured AWS IAM role using stolen credentials to access cloud storage. | 106 million customer records in the U.S. and Canada. | Excessive permissions in IAM roles; need for credential rotation and monitoring. | More details |
| Dropbox | Stolen employee password (from a LinkedIn breach) was reused to access Dropbox employee accounts, leading to data exposure. | 68 million user accounts affected. | Highlights the dangers of password reuse and lack of MFA. | More details |
It's a good practice to secure the traffic between your applications and the database. By enabling SSL mode in your database, you lower the risks, and prevent anonymous connections from intercepting or tampering the traffic sent to the database.
Also if you need to follow these regulations (GDPR, HIPAA, PCI DSS, SOX, HDS), it's required to enable encryption to secure the data in transit.
GDPR (General Data Protection Regulation) : A regulation by the EU that governs how organizations collect, process, store, and share personal data of individuals in the EU
HIPAA (Health Insurance Portability and Accountability Act) : sets standards for how healthcare organizations, insurers, and their business partners handle Protected Health Information (PHI) — any data that can identify a patient and relates to their health, treatment, or payment for care
PCI DSS (Payment Card Industry Data Security Standard) : a global security standard created to protect cardholder data (credit/debit card information) and reduce payment fraud. Applies to any organization that stores, processes, or transmits payment card data.
HDS (Hébergement de Données de Santé) : french regulation, It requires that healthcare-related personal data (patient records, medical history, test results, etc.) be stored and processed only by certified Health Data Hosts
Enabling SSL mode on your database does add CPU overhead for encryption and decryption. One option to prevent more load on your CPU is to use connection pooling.
Connection pooling is cache of open database connections that your application can reuse instead of creating a new one each time, reducing latency and CPU load.
Using benchmark to measure if your database performs better with or without connection pooling is recommended.
Breaches due to not protecting data in transit
| Company | Incident Description | Data Compromised / Key Metrics | Root Cause / Lessons | Sources |
|---|---|---|---|---|
| Heartbleed (OpenSSL) | The Heartbleed vulnerability in OpenSSL allowed attackers to read memory, exposing data that was assumed to be encrypted in transit. | Usernames, passwords, session keys, and sensitive application traffic (Yahoo confirmed user credential leaks). | Flawed TLS implementation; always patch crypto libraries and enforce TLS properly. | More details |
| TJX / TJ Maxx | Attackers exploited weak WEP encryption between store POS systems and backend databases to intercept card data. | 94 million credit card numbers stolen. | Outdated WEP protocol; always use strong encryption standards (TLS 1.2/1.3, WPA2+). | More details |
| Equifax | Attackers exploited unpatched Apache Struts, intercepting unencrypted traffic between apps and backend databases. | ~147 million personal records including SSNs and financial data. | Lack of TLS enforcement and unpatched software; highlights need for TLS and regular patching. | More details |
| Marriott / Starwood Hotels | Attackers monitored unencrypted internal traffic between reservation systems and databases for years. | 500 million guest records (passport numbers, PII, payment data). | Legacy systems lacked encryption in transit; internal traffic must also be encrypted. | More details |
You've just secured the traffic sent to your database. But how about the data stored in your database ? It's a good practice to encrypt the data at rest, at the disk level, using key management system. Backups should also be encrypted to prevent unauthorized access to the data.
Breaches due to not protecting databases at rest
| Company | Incident Description | Data Compromised / Key Metrics | Root Cause / Lessons | Sources |
|---|---|---|---|---|
| Health Net | Lost unencrypted hard drives containing sensitive medical data. | ~1.9 million patient records. | No encryption on physical storage; highlights importance of database encryption at rest. | More details |
| US Veterans Affairs | Stolen laptop contained an unencrypted database of veterans’ information. | ~26.5 million personal records. | No encryption on endpoint databases; lesson: enforce full-disk and DB encryption. | More details |
| Ashley Madison | Hackers stole entire unencrypted database, with weak password storage and payment details. | ~32 million user records leaked. | Poor hashing/encryption; highlights need for strong cryptography at rest. | More details |
| Desjardins Group | Employee exfiltrated internal unencrypted databases containing customer information. | ~4.2 million customer records. | Lack of encryption at rest made insider theft possible; highlights need for DB encryption and monitoring. | More details |
You may have to handle more load on your database (traffic, queries, ..), and to prevent any performance degradation, you have some options to scale your database. Usually there are 2 scaling strategies :
Vertical Scaling
Horizental Scaling
Vertical scaling is the first option if you want to handle more traffic, it's easier to implement, you just have to add more resources (more cpu, memory or faster disk and bandwitdh).
Horizental scaling is a little bit more complex to implement. You need to add replicas node which handle only read-only requests. All the write queries goes to the primary DB, and the read queries to the replicas nodes to distribute the traffic.
There are several other options you can implement to handle more traffic
Load balancers or databaes proxies : helping you redistribute the traffic (connection pooling)
Autoscaling : cloud managed databases offer options to scale automatically your database.
Sharding and Partionning : For heavy workloads and traffic, you may need at some point to use sharding and partionning, the goal is to split your database in multiple smaller "datasets". It's usually quite complex to implement.
Use cases with scaling-related database outage/revenue loss
| Company | Incident Description | Revenue / Impact | Root Cause / Lessons | Sources |
|---|---|---|---|---|
| Twitter (Fail Whale Era) | Frequent outages due to MySQL write bottlenecks as user growth exploded. | Millions in lost ad revenue and user churn. | Vertical scaling limits reached; need for sharding, caching, and horizontal scaling. | More details |
| Amazon Prime Day (2018) | Surge of traffic overloaded internal databases and order systems. | Estimated $72–100 million in lost sales during downtime. | Peak load exceeded DB scaling capacity; need for proactive load testing and auto-scaling. | More details |
| Roblox (2021) | A database growth issue caused cascading failure across backend (MongoDB-based). | 3 days offline; tens of millions in lost revenue + developer payouts delayed. | Poor handling of DB growth; need for partitioning, replication, and distributed DB design. | More details |
| Facebook (early scaling) | Multiple outages due to MySQL bottlenecks and insufficient caching in early years. | Millions in lost ad revenue during downtime. | Single-node bottlenecks; drove adoption of sharding and memcached at massive scale. | More details |
| GitHub (2012) | Database cluster failover led to prolonged downtime; writes bottlenecked on a single master. | Lost paid subscriber revenue and SLA penalties to enterprise customers. | Weak HA setup; need for resilient replication and multi-master designs. | More details |
| Ticketmaster (various) | Ticket sales portals crashed during high-demand events due to backend DB bottlenecks. | Hundreds of millions in lost ticket sales and refunds (e.g., Taylor Swift 2022). | Insufficient scaling for sudden spikes; need elastic DB scaling and load balancing. | More details |
Let's assume your application is running smoothly, everything works just fine. Users have access to every features of your application and can perform all operations. Then suddenly, you start getting emails from 1 frustated user, furious about their experiences, seems like some features didn't work fine, or there is a huge latency when performing some actions. But it doesn't end there, more and more users start to send emails with their bad experiences.
You're shocked as you didn't notice any sign of performance degradation, your application seems to work just fine. You then realize by investigating, that there are a lot of warnings, and errors in the logs.
This experience shows the importance of monitoring the activity of your infrastucture, to anticipate and prevent in advance your users/customers from having bad experiences.
It's then crucial to understand which metrics can help your understand what's happening on your database and if you should get any alerts as soon as some threshold are met.
| Monitoring Objectives | Description | Core metrics to collect | Alerts and thresholds |
|---|---|---|---|
| Availability & health |
Is the database up and reachable ? |
Examples of metrics on PostGreSQL : pg_up, pg_connections, pg_stat_activity_count |
Examples with PostgreSQL (pg_up == 0) |
| Performance |
Is there any latency to reach the database ? Are the queries slow ? |
Total queries/sec and transactions/sec 95th/99th percentile query latencies Slow queries count |
connections >= 90% of max_connections (Connections near max) query running > 5–15 min (Long running query) slow_queries / total_queries > 1% over 5m (Slow query rate) |
| Resources & capacity |
Is the database server capacity overloaded ? What is the status of the resources consumption (CPU, Memory, Disk usage, Disk latency, network I/O) |
CPU Memory Disk usage, Disk Latency |
CPU > 85% for 5m Memory > 85% for 5m Disk usage > 80% |
| Errors & Fault |
What are the errors in the logs ? This should indicate the failed queries, connection errors |
Error Logs |
Error rate (failed queries / total queries) |
| Security and Audits |
Logins (successful & failed) Privilege changes Suspicious exports |
failed logins new user creation privilege escalations unusual export activity |
failed login attempts spike (possible brute force) |
| Replication and Bakcups |
Replication lag Backup success WAL/redo lag |
last successful backup time backup duration restore test results |
last_backup_time older than expected retention window lag > 10s (or > 60s depending on app) |
Use cases with revenue loss from not monitoring DB activity
| Company | Incident Description | Revenue / Impact | Root Cause / Lessons | Sources |
|---|---|---|---|---|
| Target (2013) | Attackers exfiltrated payment card data; suspicious DB queries went unnoticed. | ~$162 million in costs (settlements, fines, legal). | Lack of real-time monitoring for abnormal database reads and exports. | More details |
| Sony Pictures (2014) | Attackers had weeks of DB access; leaked unreleased films, scripts, employee data. | Hundreds of millions in lost revenue and IP theft. | No database query monitoring or large export detection. | More details |
| Equifax (2017) | Attackers exploited Struts flaw, accessed DBs, and stole 147M personal records. | >$1.4 billion in breach costs + stock value drop. | No alerts on abnormal DB access to sensitive PII. | More details |
| Capital One (2019) | Misconfigured firewall allowed DB access; 100M+ records exfiltrated without detection. | $190 million in settlements + regulatory fines. | No alerts for massive data pulls from databases. | More details |
| Desjardins Group (2019) | Insider exfiltrated data of 9.7M members over months without being flagged. | Hundreds of millions in costs, lawsuits, and brand damage. | Lack of insider threat monitoring and alerts for unusual exports. | More details |
| Shopify (2020) | Rogue employees accessed merchant DBs and stole customer transaction details. | Reputational harm, legal expenses, and merchant trust loss. | Inadequate monitoring of privileged employee queries on live DBs. | More details |
During the design phase, you should define a cost model and forecasting plan for operating your infrastructure in the cloud, so you understand expected expenses and the trade-offs of architectural decisions
Your cloud cost architecture plan often includes the following :
Expected baseline costs (compute, storage, networking)
Your assumptions on the growth and if your infrastructure scales
The cost tradeoffs of the design decisions (should you implement a multi-AZ, a single-AZ, reserved vs on-demand, open-source vs commercial , etc ...)
There are always hidden costs while operating your cloud database :
| Cost Category | Use Cases | Prevention | Sources |
|---|---|---|---|
| Backup / Snapshot Storage & Retention |
You enabled snapshots on your database, and you don't have a clear retention policies There are copies of your snapshots cross-region |
Snapshot storage are charged per GB/month, unecessary snapshots should be deleted Set retention policies |
More details |
| Data Transfer (Cross-AZ / Cross-Region) |
You application read from database (read-replicas) in another AZs or in another region There is traffic (egress) sent to Internet, We tend to underestimate the volume of data (egress/ingress). |
Collect metrics and setting alerts to prevent the billing to increase unexpectedly. |
More details |
| Idle / Non-Production Environments & Orphaned Resources |
You run multiple environnements (staging, dev, production), but not all the environnements needs to run 24/7 Snapshots remain after a DB deletion The database instance size absorb peaks but it's usually in idle (underutilized) |
Turn off resources in non critical environments (automatic shutdown off of instances, etc..) Right size the database instance type, benchmark if possible, optimize the queries if necessary |
|
| Monitoring, Logging, Metrics Retention |
You enabled multiple metrics, and detailed logs (audit logs, ddl events, etc ...) |
Track the volume of the logs ingested and the storage capacity costs (increase) |
Use cases with revenue loss due to hidden or unexpected costs
| Company | Incident Description | Revenue / Impact | Root Cause / Lessons | Sources |
|---|---|---|---|---|
| Pinterest (2017) | AWS costs spiked during holiday season due to massive unoptimized database and compute usage. | Margins squeezed by unplanned cloud spend during peak traffic. | Lack of monitoring and cost controls on scaling workloads. | More details |
| Adobe (2018) | Large unoptimized AWS workloads ran longer than intended, driving up cloud bills. | Tens of thousands in unplanned charges. | Lack of governance and auto-shutdown policies for workloads. | More details |
| Basecamp (2014) | AWS S3 costs surged due to inefficient data storage and backups. | Direct margin squeeze until re-architecture. | No cost monitoring on object storage growth. | More details |
| NASA JPL (2019) | AWS credentials exposed, attackers used resources for crypto-mining. | ~$30,000 in stolen compute costs. | No monitoring of abnormal API usage or sudden billing spikes. | More details |
| Snapchat (2017) | Google Cloud costs spiraled as user activity surged faster than infra optimization. | Losses of $2.2 billion in IPO year; cloud costs grew faster than revenue. | Poor forecasting and inefficient DB queries increased hidden costs. | More details |
| Multiple Orgs (2018–2021) | Misconfigured open databases exploited for crypto-mining workloads. | Tens of thousands in surprise compute bills. | No alerts on abnormal workloads or sudden cloud billing spikes. | More details |
Databases breaches and incidents happen all the time, 100% protection does not exist. Prestigious companies and firms have experienced security breaches despite having implemented tight security controls, it's important to learn from their experiences to understand how you can prevent these scenarios or at least lower the risks or the gravity if you experience a breach.
The goal is not to implement the latest security tools and think you're out of reach, it's better to implement a progressive and continuous protection, re-assess continuously your security posture, improve the protection of the weakest link in your infrastructure.
Here you can find the top root causes of database breaches :
Misconfiguration (Open Databases, Wrong ACLs,..)
Credentials (weak passwords, reused credentials, no password rotation, no MFA)
Zero-days /Supply Chain attacks
Provider/Partner attacks
Below is the summary of recent database breaches grouped by root cause :
| Root cause (ranked by frequency) | Example of breach | Year | Impact (record/people) | Why It happened | Sources |
|---|---|---|---|---|---|
| 1. Misconfiguration (Unprotected DBs / Poor Access Controls) |
Chinese Surveillance DB |
2025 |
~4 Billions |
Database left exposed, no password protection |
More details |
| 1. Misconfiguration (Unprotected DBs / Poor Access Controls) |
Microsoft/Apple/Google/PayPal DB leak |
2025 |
184 Millions |
Elasticsearch DB exposed without auth |
More details |
| 1. Misconfiguration (Unprotected DBs / Poor Access Controls) |
Texas General Land Office |
2025 |
44,485 people |
Access control misconfig allowed cross-user data view |
More details |
| 2. Credential Compromise / Weak Authentication |
23andMe |
2023-25 |
Dozens of orgs; AT&T, Ticketmaster, Santander, etc. |
Weak MFA/credential hygiene; compromised access |
More details |
| 2. Credential Compromise / Weak Authentication |
Kering / Gucci–Balenciaga |
2025 |
Millions of customers |
Hacker group “Shiny Hunters” stole customer DB creds |
More details |
| 3. Software Vulnerabilities / Supply Chain Exploits |
MOVEit Transfer Zero-day |
2023 |
~93–100 Millions people, 2700+ orgs of customers |
Zero-day exploited in widely used file transfer system |
More details |
| 3. Software Vulnerabilities / Supply Chain Exploits |
Latitude Financial |
2023 |
~14 Millions records |
Exploit in provider’s systems; details not fully disclosed |
More details |
| 4. Direct Cyberattacks / Service Provider Compromise |
Qantas Airways |
2025 |
~5.7 Millions customers |
Hackers gained access to customer DB |
More details |
| 4. Direct Cyberattacks / Service Provider Compromise |
Miljodata |
2025 |
~1.5 Million people |
IT provider cyberattack leaked sensitive records |
More details |
Everything breaks all the time. Databases can break in production, it's critical to understand the main sources of these issues and anticipate any actions to prevent and reduce their impact and cost.
| # | Database Issue | Real-World Example | Business Impact | Estimated Loss | Root Cause | Sources |
|---|---|---|---|---|---|---|
| 1 | Connection Leaks | Barclays IT Outage (Jan 2025) – mainframe software bug impacted core banking access. | Widespread service unavailability, frustrated customers. | Up to £12.5M compensation | Software bug, improper connection handling | Yahoo Finance |
| 2 | Slow Queries / Missing Indexes | Asana Outages (Feb 2025) – excessive logging caused DB and downstream service slowdowns. | Multi-hour service disruption, productivity losses | Millions in lost productivity | Unoptimized queries and logging infrastructure overload | |
| 3 | Deadlocks | Delta Air Lines Outage (Jul 2024) – software update failure triggered system crashes. | Flight delays, operational disruption | $500M estimated losses | Faulty software update, unhandled concurrent operations | New York Post |
| 4 | Lock Contention | Co-op Cyberattack (Apr 2025) – operational systems disrupted across 2,300 stores. | Store outages, supply chain delays | £275M lost revenue | Cyberattack causing DB and system resource contention | Acronis |
| 5 | Schema Changes in Production | GitLab (2017) – schema migration failure led to downtime. | Service unavailable, partial data loss | 6 hours of customer data lost, reputational damage | Uncontrolled migration, no rollback strategy | GitLab Blog |
| 6 | Data Corruption | Delta Air Lines Outage (Jul 2024) – software errors corrupted system data. | Operational disruption, loss of customer trust | $500M estimated losses | Faulty update, unvalidated data integrity | New York Post |
| 7 | Replication Lag | Asana Outages (Feb 2025) – excessive logging caused replica delays. | Inconsistent UX, service errors | Millions in productivity lost | Async replica lag under heavy logging load | |
| 8 | Unsecured Databases | Co-op Cyberattack (Apr 2025) – exposed internal DBs led to operational disruptions. | Data theft risk, system downtime | £275M lost revenue | Lack of authentication, open access points | Acronis |
| 9 | Backup & Restore Failures | GitLab (2017) – untested backups caused permanent data loss during outage. | Loss of customer trust, reputational damage | Priceless, indirect revenue loss | Unverified backup processes, no RPO/RTO drills | GitLab Blog |
| 10 | Capacity & Scaling Issues | Co-op Cyberattack (Apr 2025) – DB overload caused operational outages. | Store closures, supply chain disruption | £275M lost revenue | No auto-scaling, underestimated peak load | Acronis |