Minimalist Solution Architect

Core Concepts
Use Cases
Templates

Objectives :

Learn the difference between vertical and horizental scaling
Learn how to distribute workloads with load balancers
Learn how the difference between consistency and availability depending on your business needs
Learn how indexing works and how it is important to improve your database performance
Understand the different communication protocols and learn to choose one for your use case
Understand the basic security principles to secure your architecture
Learn the basics monitoring principles to keep and maintain your infrastructure operations
Learn the basics principles to control your cloud infrastructure costs

3. CAP Theorem

The Problem

In the previous section we emphasized the importance of having load balancer to handle and distribute requests accross multiple nodes/servers.

One of the challenges with this distributed architecture, is how you can ensure consistency, reliability and availability of your data across multiple nodes/servers.

In the following picture, userA update in the database the field "function" on Node-01, but at the same time, userB is sending a read request to get the content of the field "function".

What will be the content sent to userB "cloud engineer" or "cloud architect" ??

A figure to illustrate the problem.

Here comes the CAP Theorem principle to address this challenge.

What is the CAP Theorem ?

CAP means (Consistency, Availability and Partition Tolerance).

Consistency : the data are the same accross all the nodes, there is no difference. If userA updates the function field, and userB reads the data from another node, he must see the same updated data or get an error.

Availability : the data must be accessible at all time, meaning if Node-02 is offline or in failure mode, userB must be able to access the data from another available node.

Partition Tolerance : the system must continue to work even in case of network failures. If Node01 and Node02 can't synchronize their data, they should still respond to user requests.

The CAP theorem implies that the 3 properties cannot be implemented at the same time. System designer must choose between the 3 possible combinations (CA, AP, CP)

CA (Consistency + Availability): In this scenario, the system is consistent and available all the time. But if there is any partition happening (network disconnections or interruptions), consistency is lost.

AP (Availability + Partition Tolerance): In this scenario, the data is always available, but there is consistency if there is a partition (network failures)

CP (Consistency + Partition Tolerance): In this scenario, we make a priority on consistency, userA and userB must see the same data. This implies, that the system won't be available, during the maintenance phase (repairing the network communication) and ensuring the data is re-synchronized between nodes.

Real-world examples

CP (Prioritizing consistency over availability)

Financial transactions : In the financial industries, banks or stock markets, if a user makes a transaction, the system must be sure that there is not partition between nodes to prevent duplicate transactions. So consistency will be prioritized over availability. If there is a partition, the system must reject the transaction to ensure consistency.

In the following scenario example, the system is considered healthy, when a user sends a request, the system accepts the transaction.

In this scenario, the system is considered unhealthy, when a user sends a request, the system rejects the transaction, to prevent double transactions.

Inventory systems : For online ecommerce, it's crucial to keep consistency, especially to keep track of the inventory; a product cannot be listed if it's no longer available to prevent user to place an order on an unavailable product (over-selling).

In this scenario, the system is healthy, the product can be listed, and the user can place an order; the inventory is consistent across nodes.

In this scenario, the system is unhealthy, the product cannot be listed preventing a user to place an order on non-existing product. The inventory is not consistent across nodes.

AP (Prioritizing availability over consistency)

Social media platforms : In social media, companies like Facebook, Twitter, Linkedin, ensure that the platform is always available, even if there is slight delay of data. If a user send a post, and there is partition between the nodes (de-synchronization between nodes), it's not a real issue, user can wait, the transactions will be validated later.

In this scenario, the system is healthy, users can comments, post, message; all the data are synchronized, others users see the updates at the same time.

In this scenario, the system is unhealthy, users can still comments, post, message; but the data are not synchronized, there is a slight delay, which is tolerable.

CDN (Content Delivery Network) : CDN server static data, and that data must be always available even if the data is not the same between servers.