System Design Principles and Patterns
Overview
System design involves creating scalable, reliable, and maintainable distributed systems. This summary covers key principles, patterns, and strategies used in modern system architecture.
Core Design Principles
1. Scalability
The ability to handle increased load by adding resources to the system.
Horizontal Scaling (Scale Out)
- Add more machines to handle increased load
- Pros: Cost-effective, fault tolerant, virtually unlimited scaling
- Cons: Complex data management, network overhead
- Examples: Web servers, microservices, NoSQL databases
Vertical Scaling (Scale Up)
- Add more power (CPU, RAM) to existing machines
- Pros: Simple to implement, no data distribution complexity
- Cons: Expensive, single point of failure, hardware limits
- Examples: Database servers, monolithic applications
2. Reliability
System continues to work correctly even when failures occur.
Fault Tolerance Strategies
          ┌───────────────┐
          │ Load Balancer │
          └───────┬───────┘
                  │
      ┌───────────┼───────────┐
      │           │           │
  ┌───┴───┐   ┌───┴───┐   ┌───┴───┐
  │ Web 1 │   │ Web 2 │   │ Web 3 │
  └───┬───┘   └───┬───┘   └───┬───┘
      │           │           │
      └───────────┼───────────┘
                  │
         ┌────────┴────────┐
         │    Database     │
         │ (Master/Slave)  │
         └─────────────────┘
Redundancy Patterns
- Active-Active: Multiple active instances
- Active-Passive: Standby instances for failover
- N+1 Redundancy: N active + 1 backup
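As a minimal illustration of the active-passive pattern, the sketch below promotes a standby instance when the active one fails a health check; the health-check function and instance objects are assumptions for illustration, not part of any specific library.
// Active-Passive failover sketch (illustrative only).
// isHealthy is a hypothetical async health check, e.g. an HTTP ping with a timeout.
class FailoverController {
  constructor(active, standby, isHealthy) {
    this.active = active;
    this.standby = standby;
    this.isHealthy = isHealthy;
  }
  // Run periodically; returns the instance that should currently receive traffic.
  async check() {
    if (await this.isHealthy(this.active)) return this.active;
    // Failover: promote the standby and demote the failed instance.
    [this.active, this.standby] = [this.standby, this.active];
    return this.active;
  }
}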
3. Availability
System remains operational over time (measured in "nines").
| Availability | Downtime per Year | Downtime per Month |
|--------------|-------------------|--------------------|
| 99%          | 3.65 days         | 7.31 hours         |
| 99.9%        | 8.77 hours        | 43.83 minutes      |
| 99.99%       | 52.60 minutes     | 4.38 minutes       |
| 99.999%      | 5.26 minutes      | 26.30 seconds      |
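The downtime figures follow directly from the unavailable fraction of the period; a quick sketch of the arithmetic:
// Downtime per year implied by an availability target (e.g. 0.999 -> ~8.77 hours).
function downtimePerYearHours(availability) {
  const hoursPerYear = 365.25 * 24;          // ~8766 hours in an average year
  return (1 - availability) * hoursPerYear;  // unavailable fraction of the year
}
console.log(downtimePerYearHours(0.999).toFixed(2) + ' hours'); // ~8.77 hours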
4. Consistency
All nodes see the same data simultaneously.
CAP Theorem
A distributed system can provide at most two of the following three guarantees at the same time; since network partitions cannot be ruled out in practice, the real trade-off during a partition is between consistency and availability:
- Consistency: All nodes have same data
- Availability: System remains operational
- Partition Tolerance: System continues despite network failures
Consistency Models
- Strong Consistency: All reads receive the most recent write
- Eventual Consistency: System will become consistent over time
- Weak Consistency: No guarantees about when data will be consistent
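A minimal sketch of the difference between strong and eventual reads, using a hypothetical primary/replica key-value store with asynchronous replication:
// Illustrative only: writes hit the primary immediately and reach the replica later,
// so replica reads can be stale (eventual consistency) while primary reads always
// return the most recent write (strong consistency).
class ReplicatedStore {
  constructor(replicationDelayMs = 100) {
    this.primary = new Map();
    this.replica = new Map();
    this.replicationDelayMs = replicationDelayMs;
  }
  write(key, value) {
    this.primary.set(key, value);
    setTimeout(() => this.replica.set(key, value), this.replicationDelayMs);
  }
  readStrong(key) { return this.primary.get(key); }   // most recent write
  readEventual(key) { return this.replica.get(key); } // may lag behind
}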
Key Design Patterns
1. Load Balancing
Types of Load Balancers
// Round Robin Load Balancer
CLASS LoadBalancer:
INITIALIZE:
servers = list_of_servers
current_index = 0
FUNCTION get_server():
server = servers[current_index]
current_index = (current_index + 1) % LENGTH(servers)
RETURN server
// Weighted Round Robin
CLASS WeightedLoadBalancer:
INITIALIZE:
servers = list_of_servers // [{server: 'server1', weight: 3}, ...]
current_weights = MAP servers TO {server, weight, current_weight: 0}
FUNCTION get_server():
total_weight = 0
selected = null
FOR EACH server IN current_weights:
server.current_weight += server.weight
total_weight += server.weight
IF selected IS NULL OR server.current_weight > selected.current_weight:
selected = server
selected.current_weight -= total_weight
RETURN selected.server
Load Balancing Algorithms
- Round Robin: Requests distributed sequentially
- Weighted Round Robin: Servers get requests based on capacity
- Least Connections: Route to server with fewest active connections
- IP Hash: Route based on client IP hash
- Geographic: Route based on client location
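To complement the round-robin pseudocode above, here is a minimal sketch of the least-connections algorithm; the in-process connection counting is an assumption (a real balancer would get these numbers from the proxy layer or health checks).
// Least Connections: route each request to the server with the fewest active connections.
class LeastConnectionsBalancer {
  constructor(servers) {
    this.connections = new Map(servers.map(server => [server, 0]));
  }
  getServer() {
    let selected = null;
    for (const [server, count] of this.connections) {
      if (selected === null || count < this.connections.get(selected)) {
        selected = server;
      }
    }
    this.connections.set(selected, this.connections.get(selected) + 1);
    return selected;
  }
  release(server) {
    // Call when a request completes so the server becomes eligible again.
    this.connections.set(server, Math.max(0, this.connections.get(server) - 1));
  }
}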
2. Database Design Patterns
Master-Slave Replication
┌───────────────┐      ┌───────────────┐
│    Master     │─────▶│    Slave 1    │
│  (Read/Write) │      │  (Read Only)  │
└───────┬───────┘      └───────────────┘
        │
        ├──────────────────────┐
        ▼                      ▼
┌───────────────┐      ┌───────────────┐
│    Slave 2    │      │    Slave 3    │
│  (Read Only)  │      │  (Read Only)  │
└───────────────┘      └───────────────┘
Benefits:
- Read scalability
- Data backup and recovery
- Analytics without affecting main database
Drawbacks:
- Write bottleneck on master
- Replication lag
- Increased complexity
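Application code typically consumes this topology through a simple read/write router; a sketch, assuming placeholder connection objects that expose a query method:
// Read/write splitting over a master-slave setup (illustrative).
// Writes go to the master; reads rotate across read-only slaves.
// Because of replication lag, a read issued right after a write may not see it yet.
class ReplicatedDatabase {
  constructor(master, slaves) {
    this.master = master; // read/write connection (placeholder object)
    this.slaves = slaves; // read-only connections (placeholder objects)
    this.nextSlave = 0;
  }
  write(sql, params) {
    return this.master.query(sql, params);
  }
  read(sql, params) {
    const slave = this.slaves[this.nextSlave];
    this.nextSlave = (this.nextSlave + 1) % this.slaves.length;
    return slave.query(sql, params);
  }
}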
Database Sharding
// Horizontal Sharding Strategy
CLASS DatabaseSharding:
INITIALIZE:
shards = array_of_database_connections
// Hash-based sharding
FUNCTION get_shard(key):
hash = hash_function(key)
shard_index = hash % LENGTH(shards)
RETURN shards[shard_index]
// Range-based sharding
FUNCTION get_shard_by_range(key):
IF key < 1000: RETURN shards[0]
IF key < 2000: RETURN shards[1]
RETURN shards[2]
// Directory-based sharding
FUNCTION get_shard_by_directory(key):
RETURN shard_directory.get(key)
FUNCTION hash_function(key):
    // Simple string hash (multiply-by-31); take the absolute value so the
    // modulo in get_shard() cannot yield a negative index if the hash
    // overflows in fixed-width integer arithmetic
    hash = 0
    FOR EACH char IN key:
        hash = ((hash << 5) - hash) + ASCII_VALUE(char)
    RETURN ABS(hash)
Sharding Strategies:
- Hash-based: Distribute based on hash of key
- Range-based: Distribute based on key ranges
- Directory-based: Lookup service maps keys to shards
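A brief, runnable version of the hash-based strategy above; the shard objects stand in for real database connections.
// Hash-based sharding in JavaScript, mirroring the pseudocode above.
function hashKey(key) {
  let hash = 0;
  for (const char of key) {
    hash = ((hash << 5) - hash) + char.charCodeAt(0);
    hash |= 0; // keep the value in 32-bit range
  }
  return Math.abs(hash); // avoid a negative shard index
}
function getShard(shards, key) {
  return shards[hashKey(key) % shards.length];
}
// Usage
const shards = [{ name: 'shard-0' }, { name: 'shard-1' }, { name: 'shard-2' }];
console.log(getShard(shards, 'user-12345').name); // deterministic shard for this key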
3. Caching Strategies
Cache Patterns
// Cache-Aside Pattern
CLASS CacheAside:
INITIALIZE:
cache = cache_client
database = database_client
FUNCTION get(key):
// Try cache first
data = cache.get(key)
IF data EXISTS: RETURN data
// Cache miss - get from database
data = database.get(key)
IF data EXISTS:
cache.set(key, data, ttl: 3600) // Cache for 1 hour
RETURN data
FUNCTION set(key, value):
// Update database
database.set(key, value)
// Invalidate cache
cache.delete(key)
// Write-Through Cache
CLASS WriteThrough:
INITIALIZE:
cache = cache_client
database = database_client
FUNCTION set(key, value):
// Write to both cache and database
EXECUTE_PARALLEL([
cache.set(key, value),
database.set(key, value)
])
FUNCTION get(key):
// Always read from cache
RETURN cache.get(key)
// Write-Behind (Write-Back) Cache
CLASS WriteBehind:
INITIALIZE:
cache = cache_client
database = database_client
pending_writes = MAP()
batch_interval = 5000 // 5 seconds
SCHEDULE_RECURRING(flush_writes, batch_interval)
FUNCTION set(key, value):
// Write to cache immediately
cache.set(key, value)
// Queue database write
pending_writes.set(key, value)
FUNCTION flush_writes():
IF SIZE(pending_writes) == 0: RETURN
writes = CONVERT_TO_ARRAY(pending_writes.entries())
pending_writes.clear()
// Batch write to database
database.batch_set(writes)
Cache Levels
Client ──▶ CDN ──▶ Load Balancer ──▶ Web Server ──▶ App Cache ──▶ Database
   │                                     │              │
   │                                     │              ▼
   │                                     │       Redis/Memcached
   │                                     ▼
   │                                Local Cache
   ▼
Browser Cache
4. Message Queue Patterns
Publish-Subscribe Pattern
class PubSubSystem {
constructor() {
this.topics = new Map(); // topic -> Set of subscribers
this.messageQueue = new Map(); // topic -> Array of messages
}
subscribe(topic, subscriber) {
if (!this.topics.has(topic)) {
this.topics.set(topic, new Set());
this.messageQueue.set(topic, []);
}
this.topics.get(topic).add(subscriber);
}
publish(topic, message) {
const subscribers = this.topics.get(topic);
if (!subscribers) return;
// Immediate delivery
subscribers.forEach(subscriber => {
try {
subscriber.notify(message);
} catch (error) {
console.error('Delivery failed:', error);
// Could implement retry logic here
}
});
// Store for durability
this.messageQueue.get(topic).push({
message,
timestamp: Date.now()
});
}
unsubscribe(topic, subscriber) {
const subscribers = this.topics.get(topic);
if (subscribers) {
subscribers.delete(subscriber);
}
}
}
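A short usage sketch of the class above; the subscriber can be any object that exposes a notify method.
// Usage
const bus = new PubSubSystem();
const emailSubscriber = {
  notify: message => console.log('Sending email for order', message.orderId)
};
bus.subscribe('order.created', emailSubscriber);
bus.publish('order.created', { orderId: '12345', total: 99.99 });
bus.unsubscribe('order.created', emailSubscriber);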
Message Queue Use Cases
- Decoupling: Services don't need direct connections
- Reliability: Messages persist until processed
- Scalability: Handle traffic spikes with queues
- Async Processing: Long-running tasks don't block responses
Microservices Architecture
Service Decomposition Strategies
By Business Capability
E-commerce System
├── User Service (Authentication, Profiles)
├── Product Service (Catalog, Inventory)
├── Order Service (Cart, Checkout, Orders)
├── Payment Service (Billing, Transactions)
├── Shipping Service (Logistics, Tracking)
└── Notification Service (Email, SMS, Push)
By Data Ownership
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│   User Service  │  │ Product Service │  │  Order Service  │
├─────────────────┤  ├─────────────────┤  ├─────────────────┤
│     User DB     │  │    Product DB   │  │     Order DB    │
└─────────────────┘  └─────────────────┘  └─────────────────┘
Communication Patterns
Synchronous Communication
// HTTP/REST API calls
CLASS OrderService:
INITIALIZE:
user_service = UserService()
product_service = ProductService()
payment_service = PaymentService()
FUNCTION create_order(user_id, items):
// Synchronous calls to other services
user = user_service.get_user(user_id)
FOR EACH item IN items:
product = product_service.get_product(item.product_id)
IF product.stock < item.quantity:
THROW Error('Insufficient stock')
payment = payment_service.process_payment({
user_id: user_id,
amount: calculate_total(items)
})
RETURN save_order({user_id, items, payment_id: payment.id})
Asynchronous Communication
// Event-driven communication
CLASS OrderService:
INITIALIZE:
event_bus = EventBus()
// Listen for events
event_bus.subscribe('payment.completed', handle_payment_completed)
event_bus.subscribe('inventory.updated', handle_inventory_updated)
FUNCTION create_order(user_id, items):
order = save_order({user_id, items, status: 'pending'})
// Publish events for other services
event_bus.publish('order.created', {
order_id: order.id,
user_id: user_id,
items: items,
timestamp: current_timestamp()
})
RETURN order
FUNCTION handle_payment_completed(event):
update_order_status(event.order_id, 'paid')
FUNCTION handle_inventory_updated(event):
// React to inventory changes
check_order_fulfillment(event.product_id)
Performance Optimization Strategies
1. Database Optimization
Indexing Strategies
// Primary key index (automatic)
TABLE users:
id: SERIAL (PRIMARY KEY)
email: STRING(255) (UNIQUE)
created_at: TIMESTAMP
// Single column index
CREATE INDEX idx_users_email ON users(email)
// Composite index
CREATE INDEX idx_users_status_created ON users(status, created_at)
// Partial index
CREATE INDEX idx_active_users_email ON users(email)
WHERE status = 'active'
// Covering index
CREATE INDEX idx_users_covering ON users(id, email, status)
INCLUDE (first_name, last_name)
Query Optimization
// Use EXPLAIN to analyze query performance
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2023-01-01'
GROUP BY u.id, u.name
ORDER BY order_count DESC
LIMIT 10
// Optimize with proper indexing and query structure
CREATE INDEX idx_users_created_at ON users(created_at)
CREATE INDEX idx_orders_user_id ON orders(user_id)
2. Connection Pooling
class ConnectionPool {
  constructor(config) {
    this.config = config;
    this.pool = [];               // idle connections ready for reuse
    this.activeConnections = 0;   // connections created so far
    this.waitingQueue = [];       // callers waiting for a free connection
  }
  getConnection() {
    return new Promise((resolve, reject) => {
      if (this.pool.length > 0) {
        // Reuse an idle connection
        resolve(this.pool.pop());
      } else if (this.activeConnections < this.config.maxConnections) {
        // Create a new connection up to the configured limit
        this.createConnection().then(resolve).catch(reject);
      } else {
        // Pool exhausted: wait until a connection is released
        this.waitingQueue.push({ resolve, reject });
        // Timeout for waiting requests
        setTimeout(() => {
          const index = this.waitingQueue.findIndex(item => item.resolve === resolve);
          if (index !== -1) {
            this.waitingQueue.splice(index, 1);
            reject(new Error('Connection timeout'));
          }
        }, this.config.connectionTimeout);
      }
    });
  }
  createConnection() {
    // config.createConnection() must return a Promise of a connection
    this.activeConnections++;
    return this.config.createConnection();
  }
  releaseConnection(connection) {
    if (this.waitingQueue.length > 0) {
      // Hand the connection straight to the next waiter
      const { resolve } = this.waitingQueue.shift();
      resolve(connection);
    } else {
      // Otherwise return it to the idle pool
      this.pool.push(connection);
    }
  }
}
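A brief usage sketch; the createConnection factory below is a stand-in for a real database driver.
// Usage (illustrative; the fake connection just echoes the query)
const pool = new ConnectionPool({
  maxConnections: 10,
  connectionTimeout: 5000,
  createConnection: async () => ({ query: async sql => `executed: ${sql}` })
});
async function run() {
  const connection = await pool.getConnection();
  try {
    return await connection.query('SELECT 1');
  } finally {
    pool.releaseConnection(connection); // always return the connection to the pool
  }
}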
Monitoring and Observability
Three Pillars of Observability
1. Metrics
class MetricsCollector {
constructor() {
this.counters = new Map();
this.gauges = new Map();
this.histograms = new Map();
}
incrementCounter(name, value = 1, tags = {}) {
const key = this.buildKey(name, tags);
this.counters.set(key, (this.counters.get(key) || 0) + value);
}
setGauge(name, value, tags = {}) {
const key = this.buildKey(name, tags);
this.gauges.set(key, value);
}
recordHistogram(name, value, tags = {}) {
const key = this.buildKey(name, tags);
if (!this.histograms.has(key)) {
this.histograms.set(key, []);
}
this.histograms.get(key).push({ value, timestamp: Date.now() });
}
buildKey(name, tags) {
const tagString = Object.entries(tags)
.sort()
.map(([k, v]) => `${k}:${v}`)
.join(',');
return tagString ? `${name}{${tagString}}` : name;
}
}
// Usage
const metrics = new MetricsCollector();
metrics.incrementCounter('http_requests_total', 1, { method: 'GET', status: '200' });
metrics.setGauge('memory_usage_bytes', process.memoryUsage().heapUsed);
metrics.recordHistogram('request_duration_ms', responseTime);
2. Logging
class StructuredLogger {
  constructor(service, version) {
    this.service = service;
    this.version = version;
  }
  log(level, message, context = {}) {
    const logEntry = {
      timestamp: new Date().toISOString(),
      level,
      message,
      service: this.service,
      version: this.version,
      request_id: context.request_id,
      user_id: context.user_id,
      ...context
    };
    // Emit one JSON object per line so log aggregators can parse it
    console.log(JSON.stringify(logEntry));
  }
  info(message, context) { this.log('INFO', message, context); }
  warn(message, context) { this.log('WARN', message, context); }
  error(message, context) { this.log('ERROR', message, context); }
}
// Usage
const logger = new StructuredLogger('order-service', '1.0.0');
logger.info('Order created', {
  order_id: '12345',
  user_id: 'user-123',
  amount: 99.99,
  request_id: 'req-456'
});
3. Tracing
CLASS DistributedTracing:
INITIALIZE:
active_spans = MAP()
FUNCTION start_span(operation_name, parent_span_id):
span_id = generate_span_id()
trace_id = IF parent_span_id EXISTS ?
active_spans.get(parent_span_id).trace_id :
generate_trace_id()
span = {
span_id: span_id,
trace_id: trace_id,
operation_name: operation_name,
parent_span_id: parent_span_id,
start_time: current_timestamp(),
tags: MAP(),
logs: []
}
active_spans.set(span_id, span)
RETURN span_id
FUNCTION finish_span(span_id, error):
span = active_spans.get(span_id)
IF span IS NULL: RETURN
span.end_time = current_timestamp()
span.duration = span.end_time - span.start_time
IF error EXISTS:
span.tags.error = true
span.logs.ADD({
timestamp: current_timestamp(),
level: 'error',
message: error.message,
stack: error.stack
})
send_to_collector(span)
active_spans.delete(span_id)
FUNCTION add_tag(span_id, key, value):
span = active_spans.get(span_id)
IF span EXISTS:
span.tags[key] = value
FUNCTION generate_span_id():
RETURN RANDOM_STRING(9)
FUNCTION generate_trace_id():
RETURN RANDOM_STRING(16)
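A usage sketch, assuming a JavaScript implementation of the tracer above with the same method names:
// Hypothetical usage of the tracer above (method names follow the pseudocode)
const tracer = new DistributedTracing();
const rootSpan = tracer.start_span('handle_checkout');
const dbSpan = tracer.start_span('query_orders', rootSpan); // child span shares the trace_id
tracer.add_tag(dbSpan, 'db.table', 'orders');
tracer.finish_span(dbSpan);
tracer.finish_span(rootSpan);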
Security Considerations
Authentication and Authorization
// JWT-based authentication (using the jsonwebtoken library)
const jwt = require('jsonwebtoken');
class AuthService {
constructor(secretKey) {
this.secretKey = secretKey;
}
generateToken(user) {
const payload = {
userId: user.id,
email: user.email,
roles: user.roles,
iat: Math.floor(Date.now() / 1000),
exp: Math.floor(Date.now() / 1000) + (60 * 60) // 1 hour
};
return jwt.sign(payload, this.secretKey);
}
verifyToken(token) {
try {
return jwt.verify(token, this.secretKey);
} catch (error) {
throw new Error('Invalid token');
}
}
authorize(requiredRoles) {
return (req, res, next) => {
const token = req.headers.authorization?.replace('Bearer ', '');
try {
const payload = this.verifyToken(token);
const hasRole = requiredRoles.some(role => payload.roles.includes(role));
if (!hasRole) {
return res.status(403).json({ error: 'Insufficient permissions' });
}
req.user = payload;
next();
} catch (error) {
res.status(401).json({ error: 'Authentication required' });
}
};
}
}
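A usage sketch of the authorize middleware, assuming an Express-style application; the route and role names are illustrative.
// Usage (assumes Express; only tokens carrying the 'admin' role pass)
const express = require('express');
const app = express();
const authService = new AuthService(process.env.JWT_SECRET);
app.get('/admin/reports', authService.authorize(['admin']), (req, res) => {
  res.json({ requestedBy: req.user.userId });
});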
Rate Limiting
class RateLimiter {
constructor(maxRequests, windowMs) {
this.maxRequests = maxRequests;
this.windowMs = windowMs;
this.requests = new Map(); // clientId -> [timestamps]
}
isAllowed(clientId) {
const now = Date.now();
const windowStart = now - this.windowMs;
if (!this.requests.has(clientId)) {
this.requests.set(clientId, []);
}
const clientRequests = this.requests.get(clientId);
// Remove expired timestamps
const validRequests = clientRequests.filter(timestamp => timestamp > windowStart);
if (validRequests.length >= this.maxRequests) {
return false;
}
validRequests.push(now);
this.requests.set(clientId, validRequests);
return true;
}
}
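A brief usage sketch; the request limit, window size, and client identifier are illustrative choices.
// Usage: allow at most 100 requests per client in a 15-minute sliding window
const limiter = new RateLimiter(100, 15 * 60 * 1000);
function handleRequest(clientId) {
  if (!limiter.isAllowed(clientId)) {
    return { status: 429, body: 'Too Many Requests' };
  }
  return { status: 200, body: 'OK' };
}
console.log(handleRequest('client-abc').status); // 200 until the limit is reached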
Conclusion
Effective system design requires understanding and applying these key concepts:
- Scalability: Design for growth with horizontal and vertical scaling strategies
- Reliability: Build fault-tolerant systems with proper redundancy
- Performance: Optimize through caching, database tuning, and efficient algorithms
- Security: Implement proper authentication, authorization, and rate limiting
- Observability: Monitor systems with metrics, logging, and tracing
- Consistency: Balance consistency requirements with availability needs
The choice of patterns and technologies depends on specific requirements like scale, consistency needs, team expertise, and business constraints.