
AI-powered 404 page recommendations that help users find the right content instead of dead ends.


slashml/better404


Better404

A minimal Next.js application that provides intelligent 404 page recommendations by leveraging semantic search and vector embeddings. Instead of showing dead-end 404 pages, it suggests relevant on-site content to help users find what they're looking for.

Two Ways to Use Better404

🌐 Hosted Service

Visit better404.dev to get started instantly:

  1. Enter your domain
  2. Verify ownership with a DNS record
  3. Copy the snippet to your 404 page
  4. Done! Your site will be automatically crawled and indexed

Benefits: Zero setup, automatic updates, managed infrastructure, no maintenance

🛠️ Self-Hosted

Deploy your own instance for full control:

  • Requires: Kernel API key, OpenAI API key, PostgreSQL with pgvector, hosting platform

Features

  • Smart 404 Recommendations: Show relevant pages instead of dead-end 404s
  • Semantic Search: Uses vector embeddings for intelligent content matching
  • Easy Integration: Simple JavaScript snippet for any website
  • Direct Crawling: Uses Kernel browsers to crawl and vectorize sites directly
  • PostgreSQL + pgvector: Efficient vector similarity search
  • OpenAI Embeddings: High-quality semantic understanding

How It Works

  1. Content Ingestion: Your website content is crawled and vectorized using Kernel browsers
  2. Vector Storage: Content chunks and embeddings are stored in PostgreSQL with pgvector
  3. Smart Recommendations: When a 404 occurs, the system performs semantic search to find relevant pages
  4. User Experience: A simple snippet displays helpful suggestions instead of a dead-end page
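Step 3 can be illustrated with a minimal in-memory sketch of the ranking. It assumes each chunk carries the URL and title of its page plus an embedding vector. In Better404 itself the similarity search runs inside PostgreSQL via pgvector and the query vector comes from the OpenAI embeddings API. Here the vectors are passed in directly. queryTextFromUrl is a hypothetical helper that shows one way to derive query text from the missing URL before embedding it.

```typescript
// Hypothetical shape of a stored chunk: the page it came from plus its embedding.
interface Chunk {
  pageUrl: string;
  title: string;
  embedding: number[];
}

interface Recommendation {
  url: string;
  title: string;
  score: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score each page by its best-matching chunk, then return the top N distinct pages.
function recommend(query: number[], chunks: Chunk[], topN: number): Recommendation[] {
  const best = new Map<string, Recommendation>();
  for (const chunk of chunks) {
    const score = cosineSimilarity(query, chunk.embedding);
    const current = best.get(chunk.pageUrl);
    if (!current || score > current.score) {
      best.set(chunk.pageUrl, { url: chunk.pageUrl, title: chunk.title, score });
    }
  }
  return Array.from(best.values())
    .sort((x, y) => y.score - x.score)
    .slice(0, topN);
}

// Hypothetical: turn the missing URL's path into query text,
// e.g. "/docs/getting-started" -> "docs getting started".
function queryTextFromUrl(url: string): string {
  const { pathname } = new URL(url);
  let path = pathname;
  try {
    path = decodeURIComponent(pathname);
  } catch {
    // Keep the raw path if it contains a malformed percent-escape.
  }
  return path
    .split(/[\/\-_.]+/)
    .filter(Boolean)
    .join(" ");
}
```

Scoring a page by its best chunk, rather than averaging its chunks, keeps a long page with one highly relevant section from being diluted by its unrelated sections.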

Architecture

  • Frontend: Next.js App Router with TypeScript
  • Database: PostgreSQL with pgvector extension for vector similarity search
  • Embeddings: OpenAI API for generating semantic embeddings
  • Crawling: Kernel browsers for direct web scraping and content vectorization

Quick Start

Option 1: Use the Hosted Service (Easiest)

  1. Visit better404.dev
  2. Enter your domain (e.g., example.com)
  3. Add DNS verification record:
    Name:    _better404.example.com
    Type:    TXT
    Value:   [your-site-key]
    
  4. Verify ownership by clicking "Check verification"
  5. Copy the snippet and paste it into your 404 page
  6. Done! Your site will be automatically crawled and indexed
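Verification happens on the service side when you click "Check verification". If you want to confirm beforehand that the TXT record from step 3 is visible in DNS, here is a minimal sketch using Node's built-in dns/promises module. It assumes the record value is exactly your site key, as step 3 suggests. checkBetter404Txt and hasVerificationRecord are hypothetical helpers, not part of Better404.

```typescript
import { resolveTxt } from "node:dns/promises";

// resolveTxt returns one string[] per TXT record, because long records are split
// into chunks; join each record's chunks before comparing.
function hasVerificationRecord(records: string[][], siteKey: string): boolean {
  return records.some((chunks) => chunks.join("").trim() === siteKey);
}

// Look up _better404.<domain> and check for the expected site key.
async function checkBetter404Txt(domain: string, siteKey: string): Promise<boolean> {
  try {
    const records = await resolveTxt(`_better404.${domain}`);
    return hasVerificationRecord(records, siteKey);
  } catch {
    // Treat lookup failures (e.g. ENOTFOUND, ENODATA) as "not visible yet".
    return false;
  }
}
```

For example, checkBetter404Txt("example.com", "pk_live_xxx").then(console.log) prints true once the record has propagated. DNS caching means a freshly added record can take a while to show up.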

Option 2: Self-Hosted Deployment

Prerequisites

  • Node.js 18+ and Bun
  • PostgreSQL with pgvector extension
  • OpenAI API key
  • Kernel API key
  • Kernel CLI installed (brew install onkernel/tap/kernel)
  • Hosting platform (Vercel, Railway, etc.)

Installation

  1. Clone and install dependencies:

    git clone <repository-url>
    cd better404
    bun install
  2. Set up environment variables:

    cp .env.example .env.local

    Configure the following variables:

    DATABASE_URL="postgres://user:password@localhost:5432/better404"
    OPENAI_API_KEY="sk-..."
    KERNEL_API_KEY="..."
    APP_BASE_URL="https://your-domain.com"
    TOP_N_DEFAULT="5"
  3. Set up the database:

    # Run the migrations to create tables and enable pgvector
    # (DATABASE_URL must be set in your shell, e.g. exported from .env.local)
    psql "$DATABASE_URL" -f migrations/001_init.sql
    psql "$DATABASE_URL" -f migrations/002_add_last_scraped_at.sql
  4. Deploy the Kernel app (required for crawling):

    cd src/lib/kernel-app
    kernel deploy index.ts --env-file ../../../.env

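    This deploys the web scraping and content vectorization service that crawls your website and creates embeddings for the recommendation engine. Note that --env-file ../../../.env reads .env from the repository root, while step 2 created .env.local; point the flag at the file that actually holds your keys.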

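  5. Build the app and deploy it to your hosting platform:

    bun run build

    Then deploy the build using your platform's usual workflow (Vercel, Railway, etc.).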
  6. Start using: Navigate to your deployed URL and follow the same steps as the hosted service
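A missing or malformed variable from step 2 tends to surface later as a confusing runtime error, so it can help to validate them once at startup. This is a minimal sketch, assuming the variable names listed in step 2. loadConfig and the Config shape are hypothetical, not the project's own code (the project structure lists Zod schemas in validation.ts), and treating TOP_N_DEFAULT as optional with a fallback of 5 is an assumption.

```typescript
interface Config {
  databaseUrl: string;
  openaiApiKey: string;
  kernelApiKey: string;
  appBaseUrl: string;
  topNDefault: number;
}

type Env = Record<string, string | undefined>;

function isAbsoluteUrl(value: string): boolean {
  try {
    new URL(value); // throws a TypeError on relative or malformed input
    return true;
  } catch {
    return false;
  }
}

// Hypothetical loader: fail fast with a clear message instead of at first use.
function loadConfig(env: Env): Config {
  const required = ["DATABASE_URL", "OPENAI_API_KEY", "KERNEL_API_KEY", "APP_BASE_URL"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const appBaseUrl = env.APP_BASE_URL as string;
  if (!isAbsoluteUrl(appBaseUrl)) {
    throw new Error(`APP_BASE_URL must be an absolute URL, got "${appBaseUrl}"`);
  }
  // Assumption: TOP_N_DEFAULT is optional and falls back to 5, the value shown in step 2.
  const rawTopN = env.TOP_N_DEFAULT || "5";
  const topNDefault = Number(rawTopN);
  if (!Number.isInteger(topNDefault) || topNDefault < 1) {
    throw new Error(`TOP_N_DEFAULT must be a positive integer, got "${rawTopN}"`);
  }
  return {
    databaseUrl: env.DATABASE_URL as string,
    openaiApiKey: env.OPENAI_API_KEY as string,
    kernelApiKey: env.KERNEL_API_KEY as string,
    appBaseUrl,
    topNDefault,
  };
}
```

Call it once when the server starts, e.g. const config = loadConfig(process.env);.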

API Endpoints

Public API

  • POST /api/v1/recommendations - Get 404 page recommendations
  • GET /api/v1/status/[domain] - Check domain indexing status
  • POST /api/v1/domains - Register a new domain
  • POST /api/v1/domains/[id]/verify - Verify domain ownership
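The following is a thin, hypothetical request builder for these endpoints. Only the methods and paths come from the list above; the recommendations body mirrors the HTML snippet under Integration, and the domains body mirrors the curl registration example. Response shapes aren't documented in this README, so the sketch stops at building the request. createApi, ApiRequest and the string type for the domain id are assumptions.

```typescript
interface RecommendationRequest {
  siteKey: string;
  url: string;
  referrer: string | null;
  topN?: number;
}

// A request ready to hand to fetch(url, init).
interface ApiRequest {
  url: string;
  init: { method: "GET" | "POST"; headers?: Record<string, string>; body?: string };
}

// POST with a JSON body, or with no body at all when payload is omitted.
function post(url: string, payload?: unknown): ApiRequest {
  if (payload === undefined) return { url, init: { method: "POST" } };
  return {
    url,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}

function createApi(baseUrl: string) {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    recommendations: (req: RecommendationRequest) => post(`${base}/api/v1/recommendations`, req),
    status: (domain: string): ApiRequest => ({
      url: `${base}/api/v1/status/${encodeURIComponent(domain)}`,
      init: { method: "GET" },
    }),
    registerDomain: (domain: string) => post(`${base}/api/v1/domains`, { domain }),
    verifyDomain: (id: string) => post(`${base}/api/v1/domains/${encodeURIComponent(id)}/verify`),
  };
}
```

Usage: const { url, init } = createApi("https://your-domain.com").status("example.com"); then const res = await fetch(url, init);. Keeping request construction separate from fetch lets the paths and payloads be checked without a network.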

Integration

Using the Hosted Service

  1. Visit better404.dev
  2. Enter your domain and get your site key
  3. Add DNS verification and verify ownership
  4. Copy the snippet and paste it into your 404 page
  5. Done! Crawling happens automatically

Self-Hosted Integration

1. Register Your Domain

curl -X POST https://your-domain.com/api/v1/domains \
  -H "Content-Type: application/json" \
  -d '{"domain": "example.com"}'

2. Add the Snippet to Your 404 Page

HTML Version:

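<div id="better404"></div>
<script>
(function(){
  const siteKey = "pk_live_xxx"; // Get this from your domain registration
  fetch("https://your-domain.com/api/v1/recommendations", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      siteKey,
      url: location.href,
      referrer: document.referrer || null,
      topN: 5
    })
  })
    .then(r => (r.ok ? r.json() : Promise.reject(new Error("HTTP " + r.status))))
    .then(({ results }) => {
      const el = document.getElementById("better404");
      if (!el || !Array.isArray(results)) return;
      // Build DOM nodes instead of an HTML string so titles and snippets
      // from the API are rendered as text and cannot inject markup.
      const wrap = document.createElement("div");
      wrap.style.margin = "16px 0";
      const heading = document.createElement("h2");
      heading.style.margin = "0 0 8px";
      heading.textContent = "Were you looking for one of these?";
      const list = document.createElement("ul");
      list.style.cssText = "list-style:none;padding:0;margin:0;display:grid;gap:8px";
      for (const r of results) {
        const item = document.createElement("li");
        const link = document.createElement("a");
        link.href = r.url;
        link.textContent = r.title || r.url;
        const snippet = document.createElement("div");
        snippet.style.opacity = ".7";
        snippet.textContent = r.snippet || "";
        item.append(link, snippet);
        list.appendChild(item);
      }
      wrap.append(heading, list);
      el.replaceChildren(wrap);
    })
    .catch(() => {}); // Fail silently: the 404 page still works without suggestions
})();
</script>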

React Version:

import { Better404 } from './Better404';

export function NotFoundPage() {
  return (
    <div>
      <h1>Page Not Found</h1>
      <Better404 siteKey="pk_live_xxx" />
    </div>
  );
}
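The Better404 component imported above isn't shown in this README. However it renders, it has to turn the /api/v1/recommendations response into a list of links. The helper below is a hypothetical, framework-agnostic piece such a component could use. It assumes the response shape the HTML snippet relies on ({ results: [{ url, title?, snippet? }] }) and drops entries whose url is not an http(s) URL before they can reach an href.

```typescript
// Assumed response item shape, inferred from the HTML snippet above.
interface Suggestion {
  url: string;
  title: string;
  snippet: string;
}

function isHttpUrl(value: unknown): value is string {
  if (typeof value !== "string") return false;
  try {
    const { protocol } = new URL(value);
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false;
  }
}

// Validate an untrusted JSON body and return at most `limit` renderable suggestions.
function toSuggestions(body: unknown, limit = 5): Suggestion[] {
  const results = (body as { results?: unknown } | null | undefined)?.results;
  if (!Array.isArray(results)) return [];
  const out: Suggestion[] = [];
  for (const item of results) {
    if (typeof item !== "object" || item === null) continue;
    const { url, title, snippet } = item as Record<string, unknown>;
    if (!isHttpUrl(url)) continue; // skips javascript:, relative and missing URLs
    out.push({
      url,
      title: typeof title === "string" && title ? title : url,
      snippet: typeof snippet === "string" ? snippet : "",
    });
    if (out.length >= limit) break;
  }
  return out;
}
```

A React component would call toSuggestions on the parsed response and map the result to list items; React escapes text content by default, so title and snippet are safe to render as children.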

3. Start Content Crawling

curl -X POST https://your-domain.com/api/internal/kernel/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "example.com",
    "sitemapUrl": "https://example.com/sitemap.xml"
  }'
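The crawl request takes a sitemapUrl. This minimal sketch shows what consuming a sitemap involves, assuming the crawler starts from the <loc> entries of a flat <urlset> sitemap; it is not the Kernel app's actual code. It is regex-based, so it does not follow <sitemapindex> files (which list further sitemaps) or handle CDATA. extractSitemapLocs and sameSiteUrls are hypothetical names.

```typescript
// Decode the five predefined XML entities; &amp; goes last so "&amp;lt;" becomes "&lt;", not "<".
function decodeXmlEntities(s: string): string {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

// Collect the text of every <loc>...</loc> element, trimming surrounding whitespace.
function extractSitemapLocs(xml: string): string[] {
  const pattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  const locs: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(xml)) !== null) {
    locs.push(decodeXmlEntities(match[1]));
  }
  return locs;
}

// Keep only URLs on the registered domain (or its www. host), dropping duplicates in order.
function sameSiteUrls(locs: string[], domain: string): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const loc of locs) {
    let host: string;
    try {
      host = new URL(loc).hostname;
    } catch {
      continue; // skip malformed entries
    }
    if (host !== domain && host !== `www.${domain}`) continue;
    if (!seen.has(loc)) {
      seen.add(loc);
      out.push(loc);
    }
  }
  return out;
}
```

Filtering to the registered domain matters because sitemaps can list off-site URLs, and recommending those would send visitors away from the site.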

Database Schema

The application uses PostgreSQL with the following key tables:

  • domains - Registered domains and their settings
  • pages - Crawled pages with metadata
  • chunks - Text chunks with vector embeddings
  • recommendation_events - Analytics for recommendations
  • blocklist_rules - URL patterns to exclude
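The recommendation lookup boils down to a nearest-neighbour query over chunks joined to pages. The sketch below builds such a query as parameterized SQL. The column names (page_id, embedding, domain_id, url, title) are assumptions, not the project's actual schema. <=> is pgvector's cosine-distance operator, and pgvector accepts a vector as a text literal like [0.1,0.2,0.3], which is what toVectorLiteral produces.

```typescript
// Serialize an embedding into pgvector's text input format, e.g. "[0.1,0.2]".
function toVectorLiteral(embedding: number[]): string {
  if (embedding.some((x) => !Number.isFinite(x))) {
    throw new Error("embedding contains NaN or Infinity");
  }
  return `[${embedding.join(",")}]`;
}

interface Query {
  text: string;
  values: unknown[];
}

// Hypothetical query: closest chunks for one domain, collapsed to the best chunk per page.
// <=> is cosine distance, so smaller means more similar.
function nearestPagesQuery(domainId: string, embedding: number[], topN: number): Query {
  const candidateChunks = topN * 4; // over-fetch so duplicate pages still leave topN results
  return {
    text: `
      SELECT url, title, MIN(distance) AS distance
      FROM (
        SELECT p.url, p.title, c.embedding <=> $2::vector AS distance
        FROM chunks c
        JOIN pages p ON p.id = c.page_id
        WHERE p.domain_id = $1
        ORDER BY c.embedding <=> $2::vector
        LIMIT $4
      ) candidates
      GROUP BY url, title
      ORDER BY distance
      LIMIT $3`,
    values: [domainId, toVectorLiteral(embedding), topN, candidateChunks],
  };
}
```

The factor of 4 for candidate chunks is arbitrary. Several chunks from the same page can crowd the nearest results, so the inner query fetches more than topN before grouping by page.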

Development

Project Structure

src/
  app/
    api/
      v1/                    # Public API endpoints
      internal/              # Internal webhooks
  lib/
    kernel-app/             # Kernel app for crawling and vectorization
    db.ts                   # Database client and helpers
    embeddings.ts           # Embedding provider wrapper
    urls.ts                 # URL normalization
    validation.ts           # Zod schemas
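urls.ts is described only as "URL normalization". Normalizing matters because the same page can be reached under many URL spellings, and indexing or matching them separately would split results. The rules below are hypothetical and may differ from the project's: drop the fragment, strip a leading www., remove common tracking parameters, sort the query string, and trim a trailing slash except on the root path.

```typescript
// Hypothetical normalization rules; the project's urls.ts may differ.
const TRACKING_PARAM = /^(utm_[a-z]+|gclid|fbclid)$/i;

function normalizeUrl(input: string): string {
  const url = new URL(input); // also lowercases the host and drops default ports
  url.hash = ""; // fragments are never sent to the server
  url.hostname = url.hostname.replace(/^www\./, "");
  for (const key of Array.from(url.searchParams.keys())) {
    if (TRACKING_PARAM.test(key)) url.searchParams.delete(key);
  }
  url.searchParams.sort(); // stable order so ?a=1&b=2 and ?b=2&a=1 compare equal
  if (url.searchParams.toString() === "") url.search = ""; // no dangling "?"
  // Paths are case-sensitive, so only the trailing slash is touched.
  if (url.pathname.length > 1 && url.pathname.endsWith("/")) {
    url.pathname = url.pathname.replace(/\/+$/, "");
  }
  return url.toString();
}
```

Stripping www. assumes the www and apex hosts serve the same site, which is common but not guaranteed.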

Running Tests

bun test

Building for Production

bun run build

Security

  • Origin Validation: Checks that requests come from verified domains
  • API Authentication: Kernel API calls are authenticated using public site keys
  • No PII Storage: Only stores public content and metadata
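Origin validation can be sketched as a comparison between the request's Origin header and the verified domain. This is a minimal version, assuming subdomains of a verified domain are also allowed and only http(s) origins are accepted; isAllowedOrigin is a hypothetical name, not the project's code.

```typescript
// Hypothetical check: accept the request only if its Origin is the verified domain
// or a subdomain of it.
function isAllowedOrigin(originHeader: string | null, verifiedDomain: string): boolean {
  if (!originHeader) return false;
  let origin: URL;
  try {
    origin = new URL(originHeader); // "null" (sent by sandboxed/opaque origins) throws here
  } catch {
    return false;
  }
  if (origin.protocol !== "https:" && origin.protocol !== "http:") return false;
  const host = origin.hostname;
  const domain = verifiedDomain.toLowerCase();
  // Exact match or a true subdomain: "evilexample.com" must not match "example.com".
  return host === domain || host.endsWith(`.${domain}`);
}
```

Browsers set Origin themselves, but non-browser clients such as curl can send any value, so this check limits which sites can embed the widget rather than authenticating callers.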

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For questions and support, please open an issue on GitHub or contact the development team.
