Note: see URL_SHORTENER_FLOW.md for details on the encode and decode flows.
- Docker and Docker Compose installed
- Git
- Clone the repository:

  ```shell
  git clone https://github.com/oramadn/short-link.git
  cd short-link
  ```

- Install Ruby 4.0.0 if not already present:

  ```shell
  rbenv install 4.0.0
  rbenv local 4.0.0
  ```

- Build and start the containers:

  ```shell
  docker-compose up --build
  ```

  Note: the .env file is included in the repo for easier setup.

- In a new terminal, set up the database:

  ```shell
  docker-compose exec web bin/rails db:create db:migrate
  ```

- Access the application at http://0.0.0.0:3000
The application runs the following services:
- Web: Rails server on port 3000
- CSS: Tailwind CSS watcher for live stylesheet compilation
- Database: PostgreSQL on port 5432
- Redis: Redis server on port 6337
- Sidekiq: Background job processor
Stop all services:

```shell
docker-compose down
```

View logs:

```shell
docker-compose logs -f
```

Run Rails commands:

```shell
docker-compose exec web bin/rails <command>
```

Access the Rails console:

```shell
docker-compose exec web bin/rails console
```

Run tests:

```shell
docker-compose exec web bin/rails db:test:prepare
docker-compose exec web bin/rails test
```

If you see "watchman not found" warnings in the CSS service logs, this is expected and can be safely ignored. The CSS watcher will function correctly using an alternative file watcher.
Introduce a column called long_url_hash. Hash any provided URL using SHA-256 and store the result in this column. This allows fast lookups on a fixed-length value instead of comparing strings that can be 1,000–2,000 characters long.
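A minimal sketch of the idea, using a plain Ruby class in place of the real ActiveRecord model (the `Link` name and attributes here are assumptions, not the project's actual schema):

```ruby
require "digest"

# Hypothetical Link model sketch. Before saving, the long URL is hashed so
# lookups compare a fixed 64-character hex digest (with an index on
# long_url_hash) instead of a URL that may be 1,000-2,000 characters long.
class Link
  attr_reader :long_url, :long_url_hash

  def initialize(long_url)
    @long_url = long_url
    @long_url_hash = Digest::SHA256.hexdigest(long_url)
  end
end

# A duplicate-URL lookup would then hash the input and query the indexed
# column, e.g. Link.find_by(long_url_hash: Digest::SHA256.hexdigest(url))
```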
Write-through caching is the better option. At scale, a viral shortened URL is expected to be requested by hundreds of users at once, if not more. If we rely on cache misses, many of those requests will hit the database unnecessarily. Instead, we push the short URL to Redis on write and give it a TTL of 24–48 hours.
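The write-through flow can be sketched as below. An in-memory Hash stands in for Redis here so the example is self-contained; in production the `write` call would be something like `redis.setex(code, ttl, url)`. All names are illustrative:

```ruby
# Write-through cache sketch: the short URL is pushed into the cache at
# write (encode) time, so a viral link is served without any cache miss.
class WriteThroughCache
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @store = {} # code => [url, expires_at]; Redis would handle expiry itself
  end

  # Called on encode, after the record is persisted to the database.
  def write(code, url)
    @store[code] = [url, Time.now + @ttl]
  end

  # Called on decode: hot links never touch the database.
  def read(code)
    url, expires_at = @store[code]
    return nil if url.nil? || Time.now > expires_at
    url
  end
end
```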
Array lookups have a time complexity of O(n), while hashmap lookups are O(1). Using a hashmap results in faster decoding.
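The difference shows up when mapping each character of a base-62 code back to its numeric value. The alphabet order below (digits, lowercase, uppercase) is an assumption; the project's actual alphabet may differ:

```ruby
BASE62 = ("0".."9").to_a + ("a".."z").to_a + ("A".."Z").to_a

# O(n) per character: Array#index scans the alphabet on every lookup.
def decode_with_array(code)
  code.chars.reduce(0) { |acc, ch| acc * 62 + BASE62.index(ch) }
end

# O(1) per character: a precomputed hashmap from character to value.
CHAR_VALUES = BASE62.each_with_index.to_h

def decode_with_hash(code)
  code.chars.reduce(0) { |acc, ch| acc * 62 + CHAR_VALUES[ch] }
end
```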
A URL shortener will typically have far more reads than writes — roughly 100 reads for every 1 write. To handle this load, we have two main approaches:
- Read replicas: Encode requests go to the primary database, while decode requests go to read replicas. This reduces load on the primary but increases infrastructure cost and adds replication delay.
- Sharding: Split the load by assigning subsets of short codes to different databases. For example, links starting with a–m go to DB-1 and n–z go to DB-2. This distributes load, but a viral link can still create hotspots on a single shard. The Redis layer can be sharded as well.
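The sharding approach can be sketched as a simple router keyed on the code's first character. The shard map here is illustrative; a real deployment would more likely use consistent hashing or a config-driven map:

```ruby
# Route a short code to a database shard by its first character.
SHARDS = {
  ("a".."m") => "db-1",
  ("n".."z") => "db-2",
}.freeze

def shard_for(code)
  first = code[0].downcase
  SHARDS.each { |range, db| return db if range.cover?(first) }
  "db-1" # digits and anything unmapped fall back to a default shard
end
```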
Currently, ID generation is a single point of failure and a performance bottleneck due to the lack of async operations. In a multi-region setup, two databases could also attempt to issue the same ID.
To resolve this, we can use a distributed ID generator such as Twitter Snowflake. These systems guarantee unique IDs per request and also solve predictability issues where attackers can guess links. This does add complexity, as a separate microservice must be deployed and maintained.
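A minimal Snowflake-style generator looks roughly like this: 41 bits of milliseconds since a custom epoch, 10 bits of machine ID, and a 12-bit per-millisecond sequence. The epoch value is an arbitrary assumption, and this sketch omits production concerns such as clock-drift handling and spinning when the sequence overflows within one millisecond:

```ruby
# Snowflake-style unique ID sketch: timestamp | machine_id | sequence.
class SnowflakeId
  EPOCH = 1_600_000_000_000 # custom epoch in ms (illustrative)

  def initialize(machine_id)
    @machine_id = machine_id & 0x3FF # 10 bits => 1024 machines
    @sequence = 0
    @last_ms = -1
  end

  def next_id
    ms = (Time.now.to_f * 1000).to_i
    if ms == @last_ms
      @sequence = (@sequence + 1) & 0xFFF # 12 bits => 4096 ids per ms
    else
      @sequence = 0
      @last_ms = ms
    end
    ((ms - EPOCH) << 22) | (@machine_id << 12) | @sequence
  end
end
```

Because the machine ID is baked into each ID, two regions can generate IDs concurrently without coordination or collisions.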
The simplest (but most expensive) approach is horizontal scaling: add more Rails containers for the web server and Sidekiq, and place NGINX in front to load balance across them.
This project currently uses a counter to avoid collisions. In its current form, an attacker could iterate through all links by requesting shortlink.com/1 through /99999, or any range they choose.
This allows an attacker to effectively scrape the entire URL database, including links intended to be private.
Mitigation: ID shuffling or large prime offsets. Twitter uses a similar approach where IDs are distributed in a pseudo-random manner.
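One way to sketch the prime-offset idea: multiply each sequential ID by a large prime modulo the keyspace. Since the prime is coprime to the modulus, the mapping is a bijection that the server can invert with the modular inverse, while the public codes no longer form a guessable sequence. The constants below are illustrative:

```ruby
MOD = 62**7           # keyspace of 7-character base-62 codes
PRIME = 2_654_435_761 # an odd prime coprime to MOD (illustrative choice)

# Sequential DB id -> scrambled public id.
def obfuscate(id)
  (id * PRIME) % MOD
end

# Extended Euclid: returns [g, x, y] with a*x + b*y = g.
def extended_gcd(a, b)
  return [b, 0, 1] if a.zero?
  g, x, y = extended_gcd(b % a, a)
  [g, y - (b / a) * x, x]
end

# Modular inverse of PRIME, so obfuscation can be reversed server-side.
INVERSE = extended_gcd(PRIME, MOD)[1] % MOD

def deobfuscate(n)
  (n * INVERSE) % MOD
end
```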
The decode endpoint checks Redis first and then falls back to the database. An attacker could spam non-existing codes, forcing repeated database hits and potentially degrading performance or causing an outage.
Mitigation: Negative caching can be used. Cache “not found” results in Redis for ~60 seconds to prevent repeated misses from hitting the database.
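A sketch of the negative-caching flow, with a plain Hash standing in for Redis (which would store the sentinel via SETEX with a ~60-second TTL; the Hash stand-in does not expire entries). Names are illustrative:

```ruby
NOT_FOUND = :not_found # sentinel cached for codes known to be missing

def lookup(code, cache, db)
  cached = cache[code]
  return nil if cached == NOT_FOUND # known-missing: skip the database
  return cached unless cached.nil?  # normal cache hit

  url = db[code]                    # fall back to the database
  cache[code] = url.nil? ? NOT_FOUND : url # cache misses as well as hits
  url
end
```

Repeated requests for the same bogus code then hit only Redis until the negative entry expires.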
Attackers can shorten malicious phishing links. If this happens frequently and the links are shared via email or other channels, Google and ISPs may start blocking the ShortLink domain entirely.
Mitigation: Cloudflare provides a service that checks the safety of submitted links. This adds some latency to the workflow but significantly improves safety.
An attacker can spam requests to overload the database, Redis, or the job queue.
Mitigation: a rate limiter should be added so each IP address is limited to X requests per minute.
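A fixed-window limiter keyed by IP can be sketched as below. In production the counter would live in Redis (INCR plus EXPIRE) so it is shared across all web containers; a Hash stands in here, and the limit is illustrative:

```ruby
# Fixed-window rate limiter: at most `limit` requests per IP per window.
class RateLimiter
  def initialize(limit:, window_seconds: 60)
    @limit = limit
    @window = window_seconds
    @counters = Hash.new(0) # [ip, window_number] => request count
  end

  def allow?(ip, now = Time.now.to_i)
    key = [ip, now / @window]
    @counters[key] += 1
    @counters[key] <= @limit
  end
end
```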