A minimal Next.js application that provides intelligent 404 page recommendations by leveraging semantic search and vector embeddings. Instead of showing dead-end 404 pages, it suggests relevant on-site content to help users find what they're looking for.
Visit better404.dev to get started instantly:
- Enter your domain
- Verify ownership with a DNS record
- Copy the snippet to your 404 page
- Done! Your site will be automatically crawled and indexed
Benefits: Zero setup, automatic updates, managed infrastructure, no maintenance
Deploy your own instance for full control:
- Requires: Kernel API key, OpenAI API key, PostgreSQL with pgvector, hosting platform
- Smart 404 Recommendations: Show relevant pages instead of dead-end 404s
- Semantic Search: Uses vector embeddings for intelligent content matching
- Easy Integration: Simple JavaScript snippet for any website
- Direct Crawling: Uses Kernel browsers to crawl and vectorize sites directly
- PostgreSQL + pgvector: Efficient vector similarity search
- OpenAI Embeddings: High-quality semantic understanding
- Content Ingestion: Your website content is crawled and vectorized using Kernel browsers
- Vector Storage: Content chunks and embeddings are stored in PostgreSQL with pgvector
- Smart Recommendations: When a 404 occurs, the system performs semantic search to find relevant pages (see the sketch after this list)
- User Experience: A simple snippet displays helpful suggestions instead of a dead-end page
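To make the recommendation step concrete, here is a minimal sketch of a lookup under some assumptions: the `chunks` table described later holds an `embedding` vector column plus `url`, `title`, and `content` columns (the column names, embedding model, and helper shape are guesses, not the project's actual implementation).

```ts
import OpenAI from "openai";
import { Pool } from "pg";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Given the path that 404'd, embed it and return the closest indexed chunks.
// Table and column names (chunks, embedding, url, title, content) are assumptions.
export async function recommend(missingPath: string, topN = 5) {
  const { data } = await openai.embeddings.create({
    model: "text-embedding-3-small", // model choice is an assumption
    input: missingPath.replace(/[-_/]+/g, " "), // treat the broken slug as a query
  });
  const queryVector = JSON.stringify(data[0].embedding); // pgvector accepts '[...]' literals

  const { rows } = await pool.query(
    `SELECT url, title, content AS snippet
       FROM chunks
      ORDER BY embedding <=> $1::vector
      LIMIT $2`,
    [queryVector, topN]
  );
  return rows;
}
```

The `<=>` operator is pgvector's cosine-distance operator, so ordering by it returns the most semantically similar chunks first.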
- Frontend: Next.js App Router with TypeScript
- Database: PostgreSQL with pgvector extension for vector similarity search
- Embeddings: OpenAI API for generating semantic embeddings
- Crawling: Kernel browsers for direct web scraping and content vectorization
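For illustration, an embedding wrapper along the lines of the project's `embeddings.ts` might look roughly like this; the function name, batching, and model choice are assumptions rather than the actual code.

```ts
import OpenAI from "openai";

const openai = new OpenAI(); // uses OPENAI_API_KEY

// Embed a batch of content chunks in a single API call.
// The OpenAI API returns one embedding per input, in order.
export async function embedChunks(chunks: string[]): Promise<number[][]> {
  if (chunks.length === 0) return [];
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small", // model choice is an assumption
    input: chunks,
  });
  return response.data.map((item) => item.embedding);
}
```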
- Visit better404.dev
- Enter your domain (e.g., `example.com`)
- Add the DNS verification record:
  - Name: `_better404.example.com`
  - Type: `TXT`
  - Value: `[your-site-key]`
- Verify ownership by clicking "Check verification"
- Copy the snippet and paste it into your 404 page
- Done! Your site will be automatically crawled and indexed
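For reference, the ownership check behind "Check verification" could be as simple as resolving the TXT record and comparing it with the issued site key. A sketch, with the record name taken from the steps above and the helper itself being an assumption:

```ts
import { resolveTxt } from "node:dns/promises";

// True if _better404.<domain> publishes a TXT record equal to the issued site key.
// Illustrative helper; the real verification endpoint may differ.
export async function isDomainVerified(domain: string, siteKey: string): Promise<boolean> {
  try {
    const records = await resolveTxt(`_better404.${domain}`);
    // resolveTxt returns string[][]; long records arrive split into chunks.
    return records.some((chunks) => chunks.join("") === siteKey);
  } catch {
    // NXDOMAIN or other DNS errors: the record is not visible (yet).
    return false;
  }
}
```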
- Node.js 18+ and Bun
- PostgreSQL with pgvector extension
- OpenAI API key
- Kernel API key
- Kernel CLI installed (`brew install onkernel/tap/kernel`)
- Hosting platform (Vercel, Railway, etc.)
- Clone and install dependencies:

  ```bash
  git clone <repository-url>
  cd better404
  bun install
  ```
- Set up environment variables:

  ```bash
  cp .env.example .env.local
  ```
  Configure the following variables:

  ```bash
  DATABASE_URL="postgres://user:password@localhost:5432/better404"
  OPENAI_API_KEY="sk-..."
  KERNEL_API_KEY="..."
  APP_BASE_URL="https://your-domain.com"
  TOP_N_DEFAULT="5"
  ```
- Set up the database:

  ```bash
  # Run the migrations to create tables and enable pgvector
  psql $DATABASE_URL -f migrations/001_init.sql
  psql $DATABASE_URL -f migrations/002_add_last_scraped_at.sql
  ```
- Deploy the Kernel app (required for crawling):

  ```bash
  cd src/lib/kernel-app
  kernel deploy index.ts --env-file ../../../.env
  ```

  This deploys the web scraping and content vectorization service that crawls your website and creates embeddings for the recommendation engine.
- Deploy to your hosting platform:

  ```bash
  bun run build
  ```
- Start using: Navigate to your deployed URL and follow the same steps as the hosted service
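The environment variables configured above can be validated once at startup; a minimal sketch using Zod, which the project already lists for its schemas (the schema shape and default handling are assumptions):

```ts
import { z } from "zod";

// Fails fast at startup if any variable from .env.local is missing or malformed.
const envSchema = z.object({
  DATABASE_URL: z.string().url(),
  OPENAI_API_KEY: z.string().min(1),
  KERNEL_API_KEY: z.string().min(1),
  APP_BASE_URL: z.string().url(),
  TOP_N_DEFAULT: z.coerce.number().int().positive().default(5),
});

export const env = envSchema.parse(process.env);
```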
The service exposes the following API endpoints:

- `POST /api/v1/recommendations` - Get 404 page recommendations
- `GET /api/v1/status/[domain]` - Check domain indexing status
- `POST /api/v1/domains` - Register a new domain
- `POST /api/v1/domains/[id]/verify` - Verify domain ownership
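For example, the status endpoint can be polled from any client; a small TypeScript sketch (the response field names are assumed, not documented here):

```ts
// Poll indexing status for a domain; response field names are assumed.
async function checkStatus(domain: string) {
  const res = await fetch(`https://your-domain.com/api/v1/status/${domain}`);
  if (!res.ok) throw new Error(`Status check failed: ${res.status}`);
  return res.json(); // e.g. { domain, verified, pagesIndexed, lastScrapedAt } (assumed shape)
}

checkStatus("example.com").then(console.log).catch(console.error);
```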
- Visit better404.dev
- Enter your domain and get your site key
- Add DNS verification and verify ownership
- Copy the snippet and paste it into your 404 page
- Done! Crawling happens automatically
To register a domain directly via the API:

```bash
curl -X POST https://your-domain.com/api/v1/domains \
  -H "Content-Type: application/json" \
  -d '{"domain": "example.com"}'
```

HTML Version:

```html
<div id="better404"></div>
<script>
(function(){
const siteKey = "pk_live_xxx"; // Get this from your domain registration
const url = location.href;
const ref = document.referrer || null;
fetch("https://your-domain.com/api/v1/recommendations",{
method:"POST",
headers:{"Content-Type":"application/json"},
body:JSON.stringify({siteKey,url,referrer:ref,topN:5})
}).then(r=>r.json()).then(({results})=>{
const el=document.getElementById("better404");
if(!el||!Array.isArray(results)) return;
el.innerHTML=`
<div style="margin:16px 0">
<h2 style="margin:0 0 8px">Were you looking for one of these?</h2>
<ul style="list-style:none;padding:0;margin:0;display:grid;gap:8px">
${results.map(r=>`<li><a href="${r.url}">${r.title||r.url}</a><div style="opacity:.7">${r.snippet||""}</div></li>`).join("")}
</ul>
</div>`;
}).catch(()=>{});
})();
</script>
```

React Version:

```tsx
import { Better404 } from './Better404';
export function NotFoundPage() {
return (
<div>
<h1>Page Not Found</h1>
<Better404 siteKey="pk_live_xxx" />
</div>
);
}
```

To trigger a crawl manually, call the internal endpoint:

```bash
curl -X POST https://your-domain.com/api/internal/kernel/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "example.com",
    "sitemapUrl": "https://example.com/sitemap.xml"
  }'
```

The application uses PostgreSQL with the following key tables:
- `domains` - Registered domains and their settings
- `pages` - Crawled pages with metadata
- `chunks` - Text chunks with vector embeddings
- `recommendation_events` - Analytics for recommendations
- `blocklist_rules` - URL patterns to exclude
```
src/
  app/
    api/
      v1/              # Public API endpoints
      internal/        # Internal webhooks
  lib/
    kernel-app/        # Kernel app for crawling and vectorization
    db.ts              # Database client and helpers
    embeddings.ts      # Embedding provider wrapper
    urls.ts            # URL normalization
    validation.ts      # Zod schemas
```
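As a rough idea of what `db.ts` might provide (the real helpers may differ), a thin wrapper around a shared pg pool is enough for the queries sketched earlier:

```ts
import { Pool } from "pg";

// One shared connection pool, configured from DATABASE_URL.
export const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Thin helper so route handlers don't manage clients directly.
export async function query<T = Record<string, unknown>>(
  text: string,
  params: unknown[] = []
): Promise<T[]> {
  const result = await pool.query(text, params);
  return result.rows as T[];
}
```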
Run tests and build:

```bash
bun test
bun run build
```

- Origin Validation: Checks that requests come from verified domains
- API Authentication: Kernel API calls are authenticated using public site keys
- No PII Storage: Only stores public content and metadata
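As an illustration of the origin check (not the project's actual middleware), a handler could compare the request's Origin header with the verified domain registered for the site key; the lookup table below is hypothetical:

```ts
import { NextRequest, NextResponse } from "next/server";

// Hypothetical lookup: public site key -> verified domain (would come from the domains table).
const verifiedDomains = new Map<string, string>([["pk_live_xxx", "example.com"]]);

// Returns a 403 response if the Origin header doesn't match the verified domain,
// or null to let the request proceed.
export function validateOrigin(req: NextRequest, siteKey: string): NextResponse | null {
  const origin = req.headers.get("origin");
  const domain = verifiedDomains.get(siteKey);
  if (!origin || !domain) {
    return NextResponse.json({ error: "Origin not verified" }, { status: 403 });
  }
  const host = new URL(origin).hostname;
  // Accept the verified apex domain and its subdomains.
  if (host !== domain && !host.endsWith(`.${domain}`)) {
    return NextResponse.json({ error: "Origin not verified" }, { status: 403 });
  }
  return null;
}
```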
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
For questions and support, please open an issue on GitHub or contact the development team.