URL Shortener System Design (like bit.ly)
Problem Statement
Design a URL shortening service that converts long URLs into shorter, more manageable links with high availability and performance.
Requirements
Functional Requirements
- Shorten long URLs to create unique short links
- Redirect short URLs to original long URLs
- Custom aliases for URLs (optional)
- URL expiration with TTL
- Basic analytics (click count, geographic data)
- User accounts and URL management
Non-Functional Requirements
- Availability: 99.9% uptime
- Scalability: 100M URLs shortened per day, 10B redirections
- Performance: URL redirection < 100ms
- Durability: URLs should not be lost
- Read/Write Ratio: 100:1 (heavily read-oriented)
Capacity Estimation
Traffic
- Write Operations: 100M URLs/day = ~1,160 URLs/second
- Read Operations: 10B redirections/day = ~115K redirections/second
- Peak Load: 5x average = ~575K redirections/second
Storage
- URL Data: 500 bytes per URL mapping
- 5 Years Storage: 100M × 365 × 5 × 500 bytes = ~91TB
- With Metadata: ~150TB total
Bandwidth
- Incoming: 1,160 URLs/sec × 500 bytes = ~580KB/sec
- Outgoing: 115K redirections/sec × ~500 bytes (the redirect response carries the original URL in its Location header) = ~58MB/sec
High-Level Architecture
[Load Balancer] → [Web Servers] → [Application Layer] → [Cache Layer] → [Database]
                                          ↓
                                   [Message Queue]
                                          ↓
                                  [Analytics Service]
(The application layer publishes click events to the message queue; the analytics service consumes them asynchronously, as detailed in Q4.)
Core Components
1. URL Encoding Service
Base62 Encoding Strategy:
- Characters: [a-z, A-Z, 0-9] = 62 characters
- 7-character code = 62^7 ≈ 3.5 trillion combinations (sanity-checked below)
- Counter-based approach for uniqueness
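A quick sanity check on the 7-character length, using the write estimate above: 100M URLs/day × 365 days × 5 years ≈ 183 billion codes needed over the 5-year horizon, versus ~3.5 trillion available, roughly 19× headroom.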
2. Database Design
Primary URL Table:
TABLE urls:
    id: BIGINT (PRIMARY KEY, AUTO_INCREMENT)
    short_code: STRING(7) (UNIQUE, NOT NULL)
    original_url: TEXT (NOT NULL)
    user_id: BIGINT (FOREIGN KEY)
    created_at: TIMESTAMP (DEFAULT: CURRENT_TIMESTAMP)
    expires_at: TIMESTAMP (NULLABLE)
    click_count: BIGINT (DEFAULT: 0)
    INDEXES:
        (short_code)
        (user_id)
        (expires_at)

TABLE users:
    id: BIGINT (PRIMARY KEY, AUTO_INCREMENT)
    username: STRING(50) (UNIQUE)
    email: STRING(255) (UNIQUE)
    created_at: TIMESTAMP (DEFAULT: CURRENT_TIMESTAMP)

TABLE analytics:
    id: BIGINT (PRIMARY KEY, AUTO_INCREMENT)
    short_code: STRING(7)
    ip_address: STRING(45)
    user_agent: TEXT
    country: STRING(2)
    clicked_at: TIMESTAMP (DEFAULT: CURRENT_TIMESTAMP)
    INDEXES:
        (short_code, clicked_at)
3. Caching Strategy
- Redis Cluster for distributed caching
- Cache-Aside Pattern for URL lookups
- LRU Eviction for memory management
- Cache TTL: 24 hours for popular URLs
4. Rate Limiting
- Per-User Limits: 1000 URLs/day for free users
- Per-IP Limits: 100 requests/hour for anonymous users
- Sliding Window algorithm for request counting (see the sketch below)
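A minimal sliding-window limiter sketch using a Redis sorted set (assuming the redis-py client; key and parameter names are illustrative): each request is recorded with its timestamp, entries older than the window are trimmed, and whatever remains is counted.

import time
import redis

r = redis.Redis(host="localhost", port=6379)

def is_allowed(identifier, limit, window_seconds):
    """Allow at most `limit` requests from `identifier` in the last `window_seconds`."""
    key = f"rate:{identifier}"
    now = time.time()
    pipe = r.pipeline()
    # Drop entries that have slid out of the window
    pipe.zremrangebyscore(key, 0, now - window_seconds)
    # Record this request (in production add a random suffix so simultaneous
    # requests do not collide on the same member)
    pipe.zadd(key, {str(now): now})
    # Count what remains inside the window
    pipe.zcard(key)
    pipe.expire(key, int(window_seconds) + 1)
    _, _, count, _ = pipe.execute()
    return count <= limit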
Detailed Questions & Answers
Q1: How do you generate unique short URLs?
Answer: There are several approaches:
Approach 1: Counter-Based (Recommended)
FUNCTION encode_base62(num):
    chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    IF num == 0:
        RETURN chars[0]
    result = []
    WHILE num > 0:
        result.ADD(chars[num % 62])
        num = num // 62
    RETURN REVERSE_AND_JOIN(result)

// Use the auto-increment ID from the database as the counter
short_code = encode_base62(auto_increment_id)
Approach 2: Hash-Based
FUNCTION generate_short_url(original_url, salt):
    hash_input = original_url + salt
    hash_object = MD5_HASH(hash_input)
    RETURN FIRST_7_CHARS(hash_object.hexdigest())
Approach 3: UUID-Based
FUNCTION generate_uuid_based():
    RETURN FIRST_7_CHARS(GENERATE_UUID())
Trade-offs:
- Counter: Guaranteed unique, but codes are sequential (and therefore enumerable) and require a centralized counter or pre-allocated ID ranges
- Hash: Fast and deterministic (the same URL always maps to the same code), but truncating the digest makes collisions possible (see the collision-handling sketch below)
- UUID: Needs no coordination, but truncating a UUID to 7 characters still allows collisions, so a uniqueness check is required
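For the hash-based approach, collisions on the truncated digest must be handled explicitly. A minimal sketch, assuming a `code_exists` lookup against the urls table (hypothetical helper, not defined in this document):

import hashlib

def generate_short_code(original_url, code_exists, length=7, max_attempts=10):
    """Hash-based code generation with simple collision handling."""
    for attempt in range(max_attempts):
        # Re-salt with the attempt number so each retry produces a new digest
        digest = hashlib.md5(f"{original_url}:{attempt}".encode()).hexdigest()
        code = digest[:length]
        if not code_exists(code):  # assumed lookup against the urls table
            return code
    raise RuntimeError("No free short code found after several attempts")

Even with this check, the insert itself should still rely on the unique constraint on short_code, since two concurrent writers can pass the existence check at the same time.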
Q2: How do you handle high read traffic and ensure low latency?
Answer: Multi-layer caching strategy:
1. Browser Cache
HTTP/1.1 301 Moved Permanently
Location: https://example.com/very/long/url
Cache-Control: public, max-age=300
Note the trade-off: a 301 lets browsers and intermediate caches remember the redirect, which cuts origin load but hides repeat clicks from analytics; use a 302/307 redirect when per-click tracking matters more.
2. CDN Layer
- Cache redirections at edge locations
- Geographic distribution reduces latency
- Handle 80% of requests without reaching origin
3. Application Cache
cache_client = CacheClient()

FUNCTION get_original_url(short_code):
    // Try cache first
    cached_url = cache_client.get("url:" + short_code)
    IF cached_url EXISTS:
        RETURN cached_url
    // Fallback to database
    original_url = database.get_url(short_code)
    IF original_url EXISTS:
        // Cache for 24 hours
        cache_client.set_with_expiry("url:" + short_code, 86400, original_url)
    RETURN original_url
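The same lookup with a concrete client, assuming redis-py and a `db_get_url` database helper (both are illustrative names, not prescribed above):

import redis

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)
CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours, matching the TTL above

def get_original_url(short_code, db_get_url):
    """Cache-aside: check Redis first, fall back to the database on a miss."""
    key = f"url:{short_code}"
    original_url = cache.get(key)
    if original_url is not None:
        return original_url
    original_url = db_get_url(short_code)  # assumed database lookup helper
    if original_url is not None:
        # Populate the cache so subsequent redirects skip the database
        cache.setex(key, CACHE_TTL_SECONDS, original_url)
    return original_url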
4. Database Optimization
- Read replicas for geographic distribution
- Proper indexing on short_code
- Connection pooling
Q3: How do you handle database scaling as the system grows?
Answer: Progressive scaling strategy:
Phase 1: Vertical Scaling
- Increase CPU, RAM, and storage
- Optimize queries and add indexes
- Buys time in the early stage, but at 100M new URLs/day a single node is outgrown quickly
Phase 2: Read Replicas
CLASS DatabaseRouter:
    INITIALIZE:
        master = get_master_connection()
        replicas = [get_replica_connection(i) FOR i IN range(3)]

    FUNCTION read(query):
        replica = RANDOM_CHOICE(replicas)
        RETURN replica.execute(query)

    FUNCTION write(query):
        RETURN master.execute(query)
Phase 3: Horizontal Partitioning (Sharding)
FUNCTION get_shard_id(short_code):
    RETURN HASH(short_code) % NUM_SHARDS

FUNCTION get_database_shard(short_code):
    shard_id = get_shard_id(short_code)
    RETURN database_connections[shard_id]
Sharding Strategies:
- Range-based: Partition by short_code prefix
- Hash-based: Hash short_code to pick a shard; simple, but changing NUM_SHARDS remaps almost every key, so plan for consistent hashing if resharding is expected (see the sketch below)
- Directory-based: Lookup service maps each code to its shard
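A minimal consistent-hashing ring sketch (an illustrative alternative to the modulo scheme above, not part of the original design) that keeps most keys on their shard when shards are added or removed:

import bisect
import hashlib

class ConsistentHashRing:
    """Maps short codes to shards; adding a shard only remaps roughly 1/N of the keys."""

    def __init__(self, shards, vnodes=100):
        self._ring = []  # sorted list of (hash, shard) virtual nodes
        for shard in shards:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{shard}#{i}"), shard))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def get_shard(self, short_code):
        # Walk clockwise to the first virtual node at or after the key's hash
        idx = bisect.bisect(self._keys, self._hash(short_code)) % len(self._keys)
        return self._ring[idx][1]

# Usage
ring = ConsistentHashRing(["shard-0", "shard-1", "shard-2"])
print(ring.get_shard("aZ3x9Qk"))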
Q4: How do you implement analytics without affecting performance?
Answer: Asynchronous analytics processing:
1. Event-Driven Architecture
CLASS AnalyticsCollector:
    INITIALIZE:
        message_producer = MessageProducer(
            servers: ['kafka1:9092', 'kafka2:9092']
        )

    FUNCTION log_click(short_code, request_data):
        event = {
            short_code: short_code,
            ip_address: request_data.remote_addr,
            user_agent: request_data.headers.get('User-Agent'),
            timestamp: current_timestamp(),
            country: get_country(request_data.remote_addr)
        }
        // Non-blocking send
        message_producer.send('url_clicks', event)
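A concrete sketch of the same producer, assuming the kafka-python client and a Flask-style request object (both are assumptions, not prescribed by the design):

import json
import time
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["kafka1:9092", "kafka2:9092"],
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

def log_click(short_code, request):
    """Publish a click event without blocking the redirect path."""
    event = {
        "short_code": short_code,
        "ip_address": request.remote_addr,
        "user_agent": request.headers.get("User-Agent"),
        "timestamp": time.time(),
    }
    # send() is asynchronous; batched messages go out on a background thread
    producer.send("url_clicks", event)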
2. Batch Processing
CLASS AnalyticsProcessor:
    FUNCTION process_clicks_batch(events):
        // Batch insert to reduce database load
        click_records = []
        FOR EACH event IN events:
            click_records.ADD({
                short_code: event.short_code,
                ip_address: event.ip_address,
                country: event.country,
                clicked_at: convert_timestamp(event.timestamp)
            })

        // Bulk insert every 1000 records or 30 seconds
        database.bulk_insert('analytics', click_records)

        // Update click counts in cache
        FOR EACH record IN click_records:
            cache_client.increment("clicks:" + record.short_code)
3. Real-time Aggregations
// Use cache for real-time counters
FUNCTION increment_click_count(short_code):
    batch_operations = cache_client.create_pipeline()
    batch_operations.increment("clicks:total:" + short_code)
    batch_operations.increment("clicks:daily:" + short_code + ":" + today)
    batch_operations.increment("clicks:hourly:" + short_code + ":" + current_hour)
    batch_operations.execute()
Q5: How do you handle URL expiration and cleanup?
Answer: Multi-layered expiration strategy:
1. Application-Level Check
FUNCTION get_url_with_expiration_check(short_code):
    url_data = database.get_url(short_code)
    IF url_data IS NULL:
        THROW URLNotFound()
    IF url_data.expires_at EXISTS AND current_timestamp() > url_data.expires_at:
        // Mark as expired but don't delete immediately
        database.mark_expired(short_code)
        THROW URLExpired()
    RETURN url_data.original_url
2. Background Cleanup Service
CLASS ExpirationCleanupService:
    INITIALIZE:
        batch_size = 1000

    FUNCTION cleanup_expired_urls():
        // Find expired URLs
        expired_urls = database.query("
            SELECT short_code FROM urls
            WHERE expires_at < CURRENT_TIMESTAMP
              AND deleted_at IS NULL
            LIMIT batch_size
        ")
        FOR EACH batch IN CHUNKS(expired_urls, 100):
            // Soft delete - mark as deleted but keep for analytics
            database.soft_delete_batch(batch)
            // Remove from cache
            cache_keys = ["url:" + code FOR code IN batch]
            cache_client.delete_multiple(cache_keys)
3. Database-Level TTL (for some databases)
// For databases with native TTL support (e.g., Cassandra row TTLs or DynamoDB's TTL attribute)
TABLE temp_urls:
    short_code: STRING(7) (PRIMARY KEY)
    original_url: TEXT
    expires_at: TIMESTAMP
    TTL_COLUMN: expires_at
Q6: How do you prevent abuse and implement security measures?
Answer: Multi-layered security approach:
1. Rate Limiting
class RateLimiter:
    """Fixed-window request counter backed by Redis."""

    def __init__(self, redis_client):
        self.redis = redis_client

    def is_allowed(self, identifier, limit, window):
        key = f"rate_limit:{identifier}:{window}"
        # INCR is atomic, avoiding the race of a separate GET followed by INCR
        current = self.redis.incr(key)
        if current == 1:
            # Set the expiry only when the key is created; refreshing it on every
            # request would keep extending the window indefinitely
            self.redis.expire(key, window)
        return current <= limit
2. URL Validation
import validators
from urllib.parse import urlparse

def validate_url(url):
    # Basic format validation
    if not validators.url(url):
        raise InvalidURLError("Invalid URL format")

    parsed = urlparse(url)

    # Block dangerous protocols
    if parsed.scheme not in ['http', 'https']:
        raise InvalidURLError("Only HTTP/HTTPS URLs allowed")

    # Block internal/private networks (SSRF protection); see the helper sketch below
    if is_internal_ip(parsed.netloc):
        raise InvalidURLError("Cannot shorten internal URLs")

    # Check against a malicious URL database
    if is_malicious_url(url):
        raise MaliciousURLError("URL flagged as malicious")
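The is_internal_ip helper above is not defined in this design; a minimal standard-library sketch (assuming the hostname resolves to a single IPv4 address) might look like:

import ipaddress
import socket

def is_internal_ip(netloc):
    """Return True if the host points at a private, loopback, or link-local address."""
    host = netloc.split(':')[0]  # strip any port
    try:
        resolved = socket.gethostbyname(host)
        addr = ipaddress.ip_address(resolved)
    except (socket.gaierror, ValueError):
        return True  # fail closed: refuse URLs we cannot resolve
    return addr.is_private or addr.is_loopback or addr.is_link_local

A production check would validate every address the hostname maps to and re-check at fetch time, since DNS answers can change between validation and use.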
3. CAPTCHA for High Volume Users
def create_short_url(original_url, user_info):
    if user_info.daily_count > 100:  # High-volume user
        if not verify_captcha(user_info.captcha_response):
            raise CaptchaRequired("Please complete CAPTCHA")
    return generate_short_url(original_url)
Q7: How would you design the system to handle custom aliases?
Answer: Enhanced database schema and validation:
1. Database Schema Update
CREATE TABLE urls (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    short_code VARCHAR(50) NOT NULL,      -- length increased to allow custom aliases
    original_url TEXT NOT NULL,
    user_id BIGINT,
    is_custom BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP,
    UNIQUE KEY unique_short_code (short_code)
);
2. Custom Alias Validation
import re

class CustomAliasValidator:
    def __init__(self):
        self.reserved_words = {'api', 'admin', 'www', 'app', 'help'}
        self.pattern = re.compile(r'^[a-zA-Z0-9_-]{3,50}$')

    def validate_custom_alias(self, alias):
        if not self.pattern.match(alias):
            raise ValueError("Invalid characters or length")
        if alias.lower() in self.reserved_words:
            raise ValueError("Reserved word not allowed")
        if self.is_alias_taken(alias):
            raise ValueError("Alias already taken")
        return True

    def is_alias_taken(self, alias):
        # Check the urls table; the unique constraint remains the final arbiter
        return database.exists('urls', short_code=alias)
3. Priority System
def create_url_with_alias(original_url, custom_alias=None, user_id=None):
    if custom_alias:
        validator = CustomAliasValidator()
        validator.validate_custom_alias(custom_alias)
        # Two users can pass validation concurrently; catch the duplicate-key
        # error from the unique index on short_code and report "alias taken"
        return database.create_url(
            short_code=custom_alias,
            original_url=original_url,
            user_id=user_id,
            is_custom=True
        )
    else:
        # Generate an automatic short code from the counter
        auto_id = database.get_next_id()
        short_code = encode_base62(auto_id)
        return database.create_url(
            short_code=short_code,
            original_url=original_url,
            user_id=user_id,
            is_custom=False
        )
Advanced Scenarios
Scenario 1: Black Friday Traffic Spike
Challenge: Handle 10x normal traffic during peak shopping days.
Solution:
- Auto-scaling: Implement horizontal pod autoscaling
- Circuit Breakers: Fail fast when the database is overwhelmed (see the sketch after this list)
- Degraded Mode: Serve cached responses only during extreme load
- Queue-based Processing: Buffer URL creation requests
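A minimal circuit-breaker sketch (an illustrative pattern, not a prescribed library) that trips after repeated database failures and fails fast until a cool-down passes:

import time

class CircuitBreaker:
    """Open the circuit after `failure_threshold` consecutive failures; retry after `reset_timeout` seconds."""

    def __init__(self, failure_threshold=5, reset_timeout=30):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                raise RuntimeError("Circuit open: failing fast")  # caller serves a cached/degraded response
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time()
            raise
        self.failures = 0
        return result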
Scenario 2: Database Failover
Challenge: Primary database goes down during peak hours.
Solution:
- Health Checks: Continuous monitoring of database health
- Automatic Failover: Promote read replica to master
- Connection Retry Logic: Exponential backoff for failed connections (see the sketch after this list)
- Split-brain Prevention: Use consensus algorithms for leader election
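A sketch of retry with exponential backoff and jitter, assuming the operation is safe to retry (idempotent reads, or writes guarded by the unique key on short_code):

import random
import time

def retry_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0):
    """Retry `operation`, doubling the wait each time and adding jitter to avoid thundering herds."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.5))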
Scenario 3: Regional Compliance
Challenge: Data residency requirements for EU users (GDPR).
Solution:
- Geographic Sharding: Route EU users to EU data centers
- Data Classification: Tag URLs with user region
- Right to be Forgotten: Implement complete data deletion APIs
- Cross-region Replication: Backup compliance for disaster recovery
This design covers the major aspects of building a scalable URL shortener, with worked answers to the questions and scenarios most commonly raised in interviews.