URL Shortener System Design (like bit.ly)
Problem Statement
Design a URL shortening service that converts long URLs into shorter, more manageable links with high availability and performance.
Requirements
Functional Requirements
- Shorten long URLs to create unique short links
- Redirect short URLs to original long URLs
- Custom aliases for URLs (optional)
- URL expiration with TTL
- Basic analytics (click count, geographic data)
- User accounts and URL management
Non-Functional Requirements
- Availability: 99.9% uptime
- Scalability: 100M URLs shortened per day, 10B redirections
- Performance: URL redirection < 100ms
- Durability: URLs should not be lost
- Read/Write Ratio: 100:1 (heavily read-oriented)
Capacity Estimation
Traffic
- Write Operations: 100M URLs/day = ~1,160 URLs/second
- Read Operations: 10B redirections/day = ~115K redirections/second
- Peak Load: 5x average = ~575K redirections/second
Storage
- URL Data: 500 bytes per URL mapping
- 5 Years Storage: 100M × 365 × 5 × 500 bytes = ~91TB
- With Metadata: ~150TB total
Bandwidth
- Incoming: 1,160 URLs/sec × 500 bytes = ~580KB/sec
- Outgoing: 115K redirections/sec × ~500 bytes (the redirect response carries the original URL in its Location header) = ~58MB/sec
High-Level Architecture
[Load Balancer] → [Web Servers] → [Application Layer] → [Cache Layer] → [Database]
                                          ↓
                                   [Message Queue]
                                          ↓
                                  [Analytics Service]
(The application layer publishes click events to the message queue; the analytics service consumes them asynchronously, as detailed in Q4.)
Core Components
1. URL Encoding Service
Base62 Encoding Strategy:
- Characters: [a-z, A-Z, 0-9] = 62 characters
- 7-character code = 62^7 ≈ 3.5 trillion combinations (sanity-checked below)
- Counter-based approach for uniqueness
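A quick sanity check on the 7-character length, using the write estimate above: 100M URLs/day × 365 days × 5 years ≈ 183 billion codes needed over the 5-year horizon, versus ~3.5 trillion available, roughly 19× headroom.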
2. Database Design
Primary URL Table:
TABLE urls:
    id: BIGINT (PRIMARY KEY, AUTO_INCREMENT)
    short_code: STRING(7) (UNIQUE, NOT NULL)
    original_url: TEXT (NOT NULL)
    user_id: BIGINT (FOREIGN KEY)
    created_at: TIMESTAMP (DEFAULT: CURRENT_TIMESTAMP)
    expires_at: TIMESTAMP (NULLABLE)
    click_count: BIGINT (DEFAULT: 0)
    INDEXES:
        (short_code)
        (user_id)
        (expires_at)

TABLE users:
    id: BIGINT (PRIMARY KEY, AUTO_INCREMENT)
    username: STRING(50) (UNIQUE)
    email: STRING(255) (UNIQUE)
    created_at: TIMESTAMP (DEFAULT: CURRENT_TIMESTAMP)

TABLE analytics:
    id: BIGINT (PRIMARY KEY, AUTO_INCREMENT)
    short_code: STRING(7)
    ip_address: STRING(45)
    user_agent: TEXT
    country: STRING(2)
    clicked_at: TIMESTAMP (DEFAULT: CURRENT_TIMESTAMP)
    INDEXES:
        (short_code, clicked_at)
3. Caching Strategy
- Redis Cluster for distributed caching
- Cache-Aside Pattern for URL lookups
- LRU Eviction for memory management
- Cache TTL: 24 hours for popular URLs
4. Rate Limiting
- Per-User Limits: 1000 URLs/day for free users
- Per-IP Limits: 100 requests/hour for anonymous users
- Sliding Window algorithm for request counting (see the sketch below)
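A minimal sliding-window limiter sketch using a Redis sorted set (assuming the redis-py client; key and parameter names are illustrative): each request is recorded with its timestamp, entries older than the window are trimmed, and whatever remains is counted.

import time
import redis

r = redis.Redis(host="localhost", port=6379)

def is_allowed(identifier, limit, window_seconds):
    """Allow at most `limit` requests from `identifier` in the last `window_seconds`."""
    key = f"rate:{identifier}"
    now = time.time()
    pipe = r.pipeline()
    # Drop entries that have slid out of the window
    pipe.zremrangebyscore(key, 0, now - window_seconds)
    # Record this request (in production add a random suffix so simultaneous
    # requests do not collide on the same member)
    pipe.zadd(key, {str(now): now})
    # Count what remains inside the window
    pipe.zcard(key)
    pipe.expire(key, int(window_seconds) + 1)
    _, _, count, _ = pipe.execute()
    return count <= limit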
Detailed Questions & Answers
Q1: How do you generate unique short URLs?
Answer: There are several approaches:
Approach 1: Counter-Based (Recommended)
FUNCTION encode_base62(num):
    chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    IF num == 0:
        RETURN chars[0]
    result = []
    WHILE num > 0:
        result.ADD(chars[num % 62])
        num = num // 62
    RETURN REVERSE_AND_JOIN(result)

// Use the auto-increment ID from the database as the counter
short_code = encode_base62(auto_increment_id)
Approach 2: Hash-Based
FUNCTION generate_short_url(original_url, salt):
    hash_input = original_url + salt
    hash_object = MD5_HASH(hash_input)
    RETURN FIRST_7_CHARS(hash_object.hexdigest())
Approach 3: UUID-Based
FUNCTION generate_uuid_based():
    RETURN FIRST_7_CHARS(GENERATE_UUID())
Trade-offs:
- Counter: Guaranteed unique, but codes are sequential (and therefore enumerable) and require a centralized counter or pre-allocated ID ranges
- Hash: Fast and deterministic (the same URL always maps to the same code), but truncating the digest makes collisions possible (see the collision-handling sketch below)
- UUID: Needs no coordination, but truncating a UUID to 7 characters still allows collisions, so a uniqueness check is required
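For the hash-based approach, collisions on the truncated digest must be handled explicitly. A minimal sketch, assuming a `code_exists` lookup against the urls table (hypothetical helper, not defined in this document):

import hashlib

def generate_short_code(original_url, code_exists, length=7, max_attempts=10):
    """Hash-based code generation with simple collision handling."""
    for attempt in range(max_attempts):
        # Re-salt with the attempt number so each retry produces a new digest
        digest = hashlib.md5(f"{original_url}:{attempt}".encode()).hexdigest()
        code = digest[:length]
        if not code_exists(code):  # assumed lookup against the urls table
            return code
    raise RuntimeError("No free short code found after several attempts")

Even with this check, the insert itself should still rely on the unique constraint on short_code, since two concurrent writers can pass the existence check at the same time.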
Q2: How do you handle high read traffic and ensure low latency?
Answer: Multi-layer caching strategy:
1. Browser Cache
HTTP/1.1 301 Moved Permanently
Location: https://example.com/very/long/url
Cache-Control: public, max-age=300
Note the trade-off: a 301 lets browsers and intermediate caches remember the redirect, which cuts origin load but hides repeat clicks from analytics; use a 302/307 redirect when per-click tracking matters more.
2. CDN Layer
- Cache redirections at edge locations
- Geographic distribution reduces latency
- Handle 80% of requests without reaching origin
3. Application Cache
cache_client = CacheClient()

FUNCTION get_original_url(short_code):
    // Try cache first
    cached_url = cache_client.get("url:" + short_code)
    IF cached_url EXISTS:
        RETURN cached_url
    // Fallback to database
    original_url = database.get_url(short_code)
    IF original_url EXISTS:
        // Cache for 24 hours
        cache_client.set_with_expiry("url:" + short_code, 86400, original_url)
    RETURN original_url
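The same lookup with a concrete client, assuming redis-py and a `db_get_url` database helper (both are illustrative names, not prescribed above):

import redis

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)
CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours, matching the TTL above

def get_original_url(short_code, db_get_url):
    """Cache-aside: check Redis first, fall back to the database on a miss."""
    key = f"url:{short_code}"
    original_url = cache.get(key)
    if original_url is not None:
        return original_url
    original_url = db_get_url(short_code)  # assumed database lookup helper
    if original_url is not None:
        # Populate the cache so subsequent redirects skip the database
        cache.setex(key, CACHE_TTL_SECONDS, original_url)
    return original_url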
4. Database Optimization
- Read replicas for geographic distribution
- Proper indexing on short_code
- Connection pooling
Q3: How do you handle database scaling as the system grows?
Answer: Progressive scaling strategy:
Phase 1: Vertical Scaling
- Increase CPU, RAM, and storage
- Optimize queries and add indexes
- Buys time in the early stage, but at 100M new URLs/day a single node is outgrown quickly
Phase 2: Read Replicas
CLASS DatabaseRouter:
    INITIALIZE:
        master = get_master_connection()
        replicas = [get_replica_connection(i) FOR i IN range(3)]

    FUNCTION read(query):
        replica = RANDOM_CHOICE(replicas)
        RETURN replica.execute(query)

    FUNCTION write(query):
        RETURN master.execute(query)
Phase 3: Horizontal Partitioning (Sharding)
FUNCTION get_shard_id(short_code):
    RETURN HASH(short_code) % NUM_SHARDS

FUNCTION get_database_shard(short_code):
    shard_id = get_shard_id(short_code)
    RETURN database_connections[shard_id]
Sharding Strategies:
- Range-based: Partition by short_code prefix
- Hash-based: Hash short_code to pick a shard; simple, but changing NUM_SHARDS remaps almost every key, so plan for consistent hashing if resharding is expected (see the sketch below)
- Directory-based: Lookup service maps each code to its shard
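A minimal consistent-hashing ring sketch (an illustrative alternative to the modulo scheme above, not part of the original design) that keeps most keys on their shard when shards are added or removed:

import bisect
import hashlib

class ConsistentHashRing:
    """Maps short codes to shards; adding a shard only remaps roughly 1/N of the keys."""

    def __init__(self, shards, vnodes=100):
        self._ring = []  # sorted list of (hash, shard) virtual nodes
        for shard in shards:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{shard}#{i}"), shard))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def get_shard(self, short_code):
        # Walk clockwise to the first virtual node at or after the key's hash
        idx = bisect.bisect(self._keys, self._hash(short_code)) % len(self._keys)
        return self._ring[idx][1]

# Usage
ring = ConsistentHashRing(["shard-0", "shard-1", "shard-2"])
print(ring.get_shard("aZ3x9Qk"))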
Q4: How do you implement analytics without affecting performance?
Answer: Asynchronous analytics processing:
1. Event-Driven Architecture
CLASS AnalyticsCollector:
    INITIALIZE:
        message_producer = MessageProducer(
            servers: ['kafka1:9092', 'kafka2:9092']
        )

    FUNCTION log_click(short_code, request_data):
        event = {
            short_code: short_code,
            ip_address: request_data.remote_addr,
            user_agent: request_data.headers.get('User-Agent'),
            timestamp: current_timestamp(),
            country: get_country(request_data.remote_addr)
        }
        // Non-blocking send
        message_producer.send('url_clicks', event)
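A concrete sketch of the same producer, assuming the kafka-python client and a Flask-style request object (both are assumptions, not prescribed by the design):

import json
import time
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["kafka1:9092", "kafka2:9092"],
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

def log_click(short_code, request):
    """Publish a click event without blocking the redirect path."""
    event = {
        "short_code": short_code,
        "ip_address": request.remote_addr,
        "user_agent": request.headers.get("User-Agent"),
        "timestamp": time.time(),
    }
    # send() is asynchronous; batched messages go out on a background thread
    producer.send("url_clicks", event)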
2. Batch Processing
CLASS AnalyticsProcessor:
    FUNCTION process_clicks_batch(events):
        // Batch insert to reduce database load
        click_records = []
        FOR EACH event IN events:
            click_records.ADD({
                short_code: event.short_code,
                ip_address: event.ip_address,
                country: event.country,
                clicked_at: convert_timestamp(event.timestamp)
            })

        // Bulk insert every 1000 records or 30 seconds
        database.bulk_insert('analytics', click_records)

        // Update click counts in cache
        FOR EACH record IN click_records:
            cache_client.increment("clicks:" + record.short_code)
3. Real-time Aggregations
// Use cache for real-time counters
FUNCTION increment_click_count(short_code):
    batch_operations = cache_client.create_pipeline()
    batch_operations.increment("clicks:total:" + short_code)
    batch_operations.increment("clicks:daily:" + short_code + ":" + today)
    batch_operations.increment("clicks:hourly:" + short_code + ":" + current_hour)
    batch_operations.execute()
Q5: How do you handle URL expiration and cleanup?
Answer: Multi-layered expiration strategy:
1. Application-Level Check
FUNCTION get_url_with_expiration_check(short_code):
    url_data = database.get_url(short_code)
    IF url_data IS NULL:
        THROW URLNotFound()
    IF url_data.expires_at EXISTS AND current_timestamp() > url_data.expires_at:
        // Mark as expired but don't delete immediately
        database.mark_expired(short_code)
        THROW URLExpired()
    RETURN url_data.original_url
2. Background Cleanup Service
CLASS ExpirationCleanupService:
    INITIALIZE:
        batch_size = 1000

    FUNCTION cleanup_expired_urls():
        // Find expired URLs
        expired_urls = database.query("
            SELECT short_code FROM urls
            WHERE expires_at < CURRENT_TIMESTAMP
              AND deleted_at IS NULL
            LIMIT batch_size
        ")
        FOR EACH batch IN CHUNKS(expired_urls, 100):
            // Soft delete - mark as deleted but keep for analytics
            database.soft_delete_batch(batch)
            // Remove from cache
            cache_keys = ["url:" + code FOR code IN batch]
            cache_client.delete_multiple(cache_keys)
3. Database-Level TTL (for some databases)
// For databases with native TTL support (e.g., Cassandra row TTLs or DynamoDB's TTL attribute)
TABLE temp_urls:
    short_code: STRING(7) (PRIMARY KEY)
    original_url: TEXT
    expires_at: TIMESTAMP
    TTL_COLUMN: expires_at
Q6: How do you prevent abuse and implement security measures?
Answer: Multi-layered security approach:
1. Rate Limiting
class RateLimiter:
    """Fixed-window request counter backed by Redis."""

    def __init__(self, redis_client):
        self.redis = redis_client

    def is_allowed(self, identifier, limit, window):
        key = f"rate_limit:{identifier}:{window}"
        # INCR is atomic, avoiding the race of a separate GET followed by INCR
        current = self.redis.incr(key)
        if current == 1:
            # Set the expiry only when the key is created; refreshing it on every
            # request would keep extending the window indefinitely
            self.redis.expire(key, window)
        return current <= limit
2. URL Validation
import validators
from urllib.parse import urlparse

def validate_url(url):
    # Basic format validation
    if not validators.url(url):
        raise InvalidURLError("Invalid URL format")

    parsed = urlparse(url)

    # Block dangerous protocols
    if parsed.scheme not in ['http', 'https']:
        raise InvalidURLError("Only HTTP/HTTPS URLs allowed")

    # Block internal/private networks (SSRF protection); see the helper sketch below
    if is_internal_ip(parsed.netloc):
        raise InvalidURLError("Cannot shorten internal URLs")

    # Check against a malicious URL database
    if is_malicious_url(url):
        raise MaliciousURLError("URL flagged as malicious")
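The is_internal_ip helper above is not defined in this design; a minimal standard-library sketch (assuming the hostname resolves to a single IPv4 address) might look like:

import ipaddress
import socket

def is_internal_ip(netloc):
    """Return True if the host points at a private, loopback, or link-local address."""
    host = netloc.split(':')[0]  # strip any port
    try:
        resolved = socket.gethostbyname(host)
        addr = ipaddress.ip_address(resolved)
    except (socket.gaierror, ValueError):
        return True  # fail closed: refuse URLs we cannot resolve
    return addr.is_private or addr.is_loopback or addr.is_link_local

A production check would validate every address the hostname maps to and re-check at fetch time, since DNS answers can change between validation and use.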
3. CAPTCHA for High Volume Users
def create_short_url(original_url, user_info):
    if user_info.daily_count > 100:  # High-volume user
        if not verify_captcha(user_info.captcha_response):
            raise CaptchaRequired("Please complete CAPTCHA")
    return generate_short_url(original_url)
Q7: How would you design the system to handle custom aliases?
Answer: Enhanced database schema and validation:
1. Database Schema Update
CREATE TABLE urls (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    short_code VARCHAR(50) NOT NULL,      -- length increased to allow custom aliases
    original_url TEXT NOT NULL,
    user_id BIGINT,
    is_custom BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP,
    UNIQUE KEY unique_short_code (short_code)
);
2. Custom Alias Validation
import re

class CustomAliasValidator:
    def __init__(self):
        self.reserved_words = {'api', 'admin', 'www', 'app', 'help'}
        self.pattern = re.compile(r'^[a-zA-Z0-9_-]{3,50}$')

    def validate_custom_alias(self, alias):
        if not self.pattern.match(alias):
            raise ValueError("Invalid characters or length")
        if alias.lower() in self.reserved_words:
            raise ValueError("Reserved word not allowed")
        if self.is_alias_taken(alias):
            raise ValueError("Alias already taken")
        return True

    def is_alias_taken(self, alias):
        # Check the urls table; the unique constraint remains the final arbiter
        return database.exists('urls', short_code=alias)
3. Priority System
def create_url_with_alias(original_url, custom_alias=None, user_id=None):
    if custom_alias:
        validator = CustomAliasValidator()
        validator.validate_custom_alias(custom_alias)
        # Two users can pass validation concurrently; catch the duplicate-key
        # error from the unique index on short_code and report "alias taken"
        return database.create_url(
            short_code=custom_alias,
            original_url=original_url,
            user_id=user_id,
            is_custom=True
        )
    else:
        # Generate an automatic short code from the counter
        auto_id = database.get_next_id()
        short_code = encode_base62(auto_id)
        return database.create_url(
            short_code=short_code,
            original_url=original_url,
            user_id=user_id,
            is_custom=False
        )
Advanced Scenarios
Scenario 1: Black Friday Traffic Spike
Challenge: Handle 10x normal traffic during peak shopping days.
Solution:
- Auto-scaling: Implement horizontal pod autoscaling
- Circuit Breakers: Fail fast when the database is overwhelmed (see the sketch after this list)
- Degraded Mode: Serve cached responses only during extreme load
- Queue-based Processing: Buffer URL creation requests
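A minimal circuit-breaker sketch (an illustrative pattern, not a prescribed library) that trips after repeated database failures and fails fast until a cool-down passes:

import time

class CircuitBreaker:
    """Open the circuit after `failure_threshold` consecutive failures; retry after `reset_timeout` seconds."""

    def __init__(self, failure_threshold=5, reset_timeout=30):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                raise RuntimeError("Circuit open: failing fast")  # caller serves a cached/degraded response
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time()
            raise
        self.failures = 0
        return result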
Scenario 2: Database Failover
Challenge: Primary database goes down during peak hours.
Solution:
- Health Checks: Continuous monitoring of database health
- Automatic Failover: Promote read replica to master
- Connection Retry Logic: Exponential backoff for failed connections (see the sketch after this list)
- Split-brain Prevention: Use consensus algorithms for leader election
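A sketch of retry with exponential backoff and jitter, assuming the operation is safe to retry (idempotent reads, or writes guarded by the unique key on short_code):

import random
import time

def retry_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0):
    """Retry `operation`, doubling the wait each time and adding jitter to avoid thundering herds."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.5))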
Scenario 3: Regional Compliance
Challenge: Data residency requirements for EU users (GDPR).
Solution:
- Geographic Sharding: Route EU users to EU data centers
- Data Classification: Tag URLs with user region
- Right to be Forgotten: Implement complete data deletion APIs
- Cross-region Replication: Backup compliance for disaster recovery
This design covers the major aspects of building a scalable URL shortener, with worked answers to the questions and scenarios most commonly raised in interviews.