Upwork → LinkedIn Matcher

A modular system for matching Upwork freelancer profiles to LinkedIn profiles using SERP APIs and optional LLM reranking. The system combines rule-based scoring with LLM-based semantic refinement, aiming for high-precision matches.

Quick Start

# 1. Setup
pip install --break-system-packages requests python-dotenv

# 2. Configure API keys in .env file
SERPER_API_KEY=your_serper_key
OPENAI_API_KEY=your_openai_key

# 3. Run basic matching
python3 upwork_to_linkedin_matcher.py \
  --input your_upwork_profiles.csv \
  --output results.csv \
  --provider serper

# 4. Add LLM selection (recommended)
python3 llm_select_existing.py \
  --upwork your_upwork_profiles.csv \
  --candidates results.csv \
  --output final_results.csv \
  --llm-model gpt-5-nano-2025-08-07 \
  --llm-keep-threshold 0.5

Performance Benchmarks

Based on testing with 51 profiles against a golden dataset:

SERP Coverage: 51% (correct profile present in search results for 26/51 profiles)
LLM Selection (threshold 0.5): 73.1% success rate when SERP found the correct profile (19/26)
End-to-End Success: 37.3% (19/51 correct matches)
LLM Precision: 59.4% when making selections (19 correct out of 32 selections)
False Positive Rate: 25.5% of all profiles (13 wrong selections out of 51)

Key Insight: The primary bottleneck is SERP coverage (finding the right LinkedIn profiles in search results), not LLM selection accuracy. Improving search queries and expanding candidate pools yields the highest gains.
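
The percentages above can be reproduced from the raw counts. The short sketch below does that arithmetic; the total of 32 LLM selections is inferred from 19 correct plus 13 wrong selections, and the dictionary keys are illustrative names, not fields produced by the analysis scripts.

```python
# Raw counts from the 51-profile benchmark described above.
total_profiles = 51
serp_found = 26   # correct LinkedIn profile appeared in the SERP results
llm_correct = 19  # LLM selected the correct profile
llm_wrong = 13    # LLM selected a wrong profile

# Inferred: every LLM selection is either correct or wrong.
llm_selections = llm_correct + llm_wrong


def pct(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


metrics = {
    "serp_coverage": pct(serp_found, total_profiles),
    "llm_selection_given_serp_hit": pct(llm_correct, serp_found),
    "end_to_end_success": pct(llm_correct, total_profiles),
    "llm_precision": pct(llm_correct, llm_selections),
    "false_positive_rate": pct(llm_wrong, total_profiles),
}

for name, value in metrics.items():
    print(f"{name}: {value}%")
```

Because end-to-end success is capped by SERP coverage, even a perfect selector would reach at most 26/51 on this dataset.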

System Architecture

The system is modularized into focused components:

upwork_to_linkedin_matcher.py - Main orchestration script (459 lines, down from 1833)
providers.py - SERP API integrations (Serper, SerpAPI)
models.py - Data structures and constants
features.py - Text processing and feature extraction
queries.py - Search query generation strategies
scoring.py - Candidate scoring and matching logic
llm.py - OpenAI integration for semantic reranking
utils.py - Common utilities
llm_select_existing.py - Standalone LLM selection script

How It Works

1. Feature Extraction

From each Upwork profile, the system extracts:

Name variants: Handles cultural naming patterns and permutations
Geographic signals: City, country, location aliases
Professional context: Title phrases, top skills, companies, schools
Descriptive phrases: Key phrases from descriptions

2. Query Generation

Generates targeted LinkedIn searches using patterns like:

site:linkedin.com/in "First Last" "City" -inurl:"/jobs/"
site:linkedin.com/in "First Name" "Top Skill" "Country"
"First Name" "Title Phrase" site:linkedin.com/in

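As a rough illustration, the three patterns above could be filled in from extracted features as follows. This is a minimal sketch, not the code in queries.py: the function name, its parameters, and the sample skill and title values are assumptions made for the example.

```python
from typing import List


def build_queries(full_name: str, first_name: str, city: str = "",
                  country: str = "", top_skill: str = "",
                  title_phrase: str = "", max_queries: int = 6) -> List[str]:
    """Fill the three query patterns, skipping any pattern with a missing field."""
    queries: List[str] = []
    if full_name and city:
        queries.append(
            f'site:linkedin.com/in "{full_name}" "{city}" -inurl:"/jobs/"')
    if first_name and top_skill and country:
        queries.append(
            f'site:linkedin.com/in "{first_name}" "{top_skill}" "{country}"')
    if first_name and title_phrase:
        queries.append(f'"{first_name}" "{title_phrase}" site:linkedin.com/in')
    # Mirror the per-profile cap controlled by --max-queries (default: 6).
    return queries[:max_queries]


# Hypothetical profile values, for illustration only.
example = build_queries("Amna M", "Amna", city="Lahore", country="Pakistan",
                        top_skill="Python", title_phrase="Data Analyst")
for query in example:
    print(query)
```

Skipping patterns with empty fields avoids sending queries that contain empty quoted terms, which would waste SERP credits.
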
3. Scoring System

Multi-signal scoring with hard guards:

Hard Rejections:

Missing first name in profile text
Last initial mismatch

Positive Signals (additive):

City match: +5 points
Country match: +3 points
Title phrase match: +4 points
Skill match: +3 points
Education/Company match: +3 points each
Last initial match: +5 points

Confidence Levels:

High: ≥10 points (accept threshold)
Medium: ≥7 points (review threshold)
Low: ≥3 points (minimum score)
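
The rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the logic in scoring.py: the function names are hypothetical, each signal is assumed to count once, the first-name guard is a plain substring check, and the last-initial guard only applies when the candidate's last initial is known.

```python
from typing import Iterable, Optional, Tuple

# Weights and thresholds as listed above.
SIGNAL_SCORES = {
    "city": 5,
    "country": 3,
    "title_phrase": 4,
    "skill": 3,
    "education": 3,
    "company": 3,
}
LAST_INITIAL_BONUS = 5
ACCEPT_THRESHOLD = 10
REVIEW_THRESHOLD = 7
MIN_SCORE = 3


def confidence_label(score: int) -> Optional[str]:
    """Map a score to High/Medium/Low, or None when below the minimum."""
    if score >= ACCEPT_THRESHOLD:
        return "High"
    if score >= REVIEW_THRESHOLD:
        return "Medium"
    if score >= MIN_SCORE:
        return "Low"
    return None


def score_candidate(first_name: str, last_initial: str, candidate_text: str,
                    matched_signals: Iterable[str],
                    candidate_last_initial: Optional[str] = None
                    ) -> Optional[Tuple[int, Optional[str]]]:
    """Return (score, confidence), or None if a hard guard rejects the candidate."""
    # Hard rejection 1: first name must appear in the profile text.
    if first_name.lower() not in candidate_text.lower():
        return None
    # Hard rejection 2: a known last initial must agree.
    if candidate_last_initial is not None:
        if candidate_last_initial.upper() != last_initial.upper():
            return None
    # Additive signals; duplicates are counted once (an assumption).
    score = sum(SIGNAL_SCORES.get(signal, 0) for signal in set(matched_signals))
    if candidate_last_initial is not None:
        score += LAST_INITIAL_BONUS
    return score, confidence_label(score)


print(score_candidate("Amna", "M", "Amna Malik | Developer | Lahore",
                      ["city", "skill"], candidate_last_initial="M"))
```

With these weights, a last-initial match plus a city match already reaches the accept threshold, so the name guard carries much of the decision.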

4. LLM Refinement (Recommended)

Uses GPT-5-nano with reasoning capabilities to semantically evaluate top candidates:

Decision Framework:

Requires name match (first name + last initial)
Looks for 1+ supporting signals: location, skills, role, company, education
Returns single best match with confidence score (0.0-1.0)
Provides detailed rationale for selection/rejection

Optimal Settings:

--llm-keep-threshold 0.5 (balanced precision/recall)
--llm-top-k 5 (candidates sent to LLM)
Model: gpt-5-nano-2025-08-07 (supports reasoning.effort parameter)
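
The interaction between --llm-top-k and --llm-keep-threshold can be sketched without calling any API. In the example below the judge is a stand-in callable; the function name, the verdict tuple shape, and the sample URLs are assumptions for illustration and do not describe the interface of llm.py.

```python
from typing import Callable, Dict, List, Optional, Tuple

Candidate = Dict[str, object]
# (index into the shortlist or None, confidence 0.0-1.0, rationale)
Verdict = Tuple[Optional[int], float, str]


def select_match(candidates: List[Candidate],
                 judge: Callable[[List[Candidate]], Verdict],
                 top_k: int = 5,
                 keep_threshold: float = 0.5) -> Optional[Candidate]:
    """Send the top_k rule-scored candidates to a judge and keep a confident pick."""
    shortlist = sorted(candidates, key=lambda c: c["match_score"],
                       reverse=True)[:top_k]
    if not shortlist:
        return None
    index, confidence, rationale = judge(shortlist)
    if index is None or not 0 <= index < len(shortlist):
        return None  # judge declined to pick anyone
    if confidence < keep_threshold:
        return None  # pick is below the keep threshold
    chosen = dict(shortlist[index])
    chosen.update(llm_selected="yes", llm_confidence=confidence,
                  llm_rationale=rationale)
    return chosen


def stub_judge(shortlist: List[Candidate]) -> Verdict:
    """Stand-in for the LLM call: always picks the first shortlisted candidate."""
    return 0, 0.8, "first name, last initial, and city agree"


sample = [
    {"linkedin_url": "https://www.linkedin.com/in/example-a", "match_score": 8},
    {"linkedin_url": "https://www.linkedin.com/in/example-b", "match_score": 12},
]
print(select_match(sample, stub_judge))
```

Keeping the judge as a plain callable makes the thresholding logic testable offline and independent of the model chosen.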

Input Requirements

Your CSV must contain these columns (case-sensitive):

Full Name: "Amna M." format (first name + last initial)
Title: Role/title information
Description: Free text description
Country: "Pakistan", "Serbia", etc.
City: "Lahore", "Belgrade", etc.
Skills: Comma/pipe/semicolon separated skills
Education: Educational background (schools extracted automatically)
Employment History: Work history (companies extracted automatically)

Optional columns: English Level, Certifications, Profile URL
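
Before spending SERP credits, it can help to check the input file against the column list above. The following is a minimal sketch using only the standard library; the helper names are hypothetical and the sample row is made up.

```python
import csv
import io
import re
from typing import List, TextIO

# Required column names, exactly as listed above (case-sensitive).
REQUIRED_COLUMNS = [
    "Full Name", "Title", "Description", "Country", "City",
    "Skills", "Education", "Employment History",
]


def missing_columns(handle: TextIO) -> List[str]:
    """Return the required columns that are absent from the CSV header."""
    header = csv.DictReader(handle).fieldnames or []
    return [column for column in REQUIRED_COLUMNS if column not in header]


def split_skills(raw: str) -> List[str]:
    """Split a comma-, pipe-, or semicolon-separated skills string."""
    return [skill.strip() for skill in re.split(r"[,|;]", raw) if skill.strip()]


sample = io.StringIO(
    "Full Name,Title,Country,City,Skills\n"
    "Amna M.,Developer,Pakistan,Lahore,Python|SQL; Excel\n"
)
print(missing_columns(sample))
print(split_skills("Python|SQL; Excel"))
```

For a real file, pass an open handle instead, for example `open("profiles.csv", newline="", encoding="utf-8")`.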

Configuration

Environment Variables (.env file)

# Required: Choose one SERP provider
SERP_PROVIDER=serper
SERPER_API_KEY=your_serper_key_here
# OR
SERPAPI_API_KEY=your_serpapi_key_here

# Required for LLM features
OPENAI_API_KEY=sk-proj-your_openai_key_here

# Optional OpenAI attribution
OPENAI_ORG_ID=org-your_org_id
OPENAI_PROJECT_ID=proj-your_project_id

# Optional alternative provider
FIRECRAWL_API_KEY=fc-your_firecrawl_key
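
A small check like the one below can catch a missing key before a run starts. It is a sketch only: the function names are hypothetical, and defaulting SERP_PROVIDER to serper when unset is an assumption, not documented behavior.

```python
from typing import Mapping, Tuple

# Which key variable each provider needs, per the settings above.
KEY_VARIABLES = {
    "serper": "SERPER_API_KEY",
    "serpapi": "SERPAPI_API_KEY",
}


def resolve_serp_credentials(env: Mapping[str, str]) -> Tuple[str, str]:
    """Return (provider, api_key) or raise ValueError with a clear message."""
    # Assumption: fall back to "serper" when SERP_PROVIDER is not set.
    provider = env.get("SERP_PROVIDER", "serper").strip().lower()
    if provider not in KEY_VARIABLES:
        raise ValueError(f"Unsupported SERP_PROVIDER: {provider!r}")
    variable = KEY_VARIABLES[provider]
    key = env.get(variable, "").strip()
    if not key:
        raise ValueError(f"{variable} must be set when SERP_PROVIDER={provider}")
    return provider, key


def llm_enabled(env: Mapping[str, str]) -> bool:
    """LLM features need OPENAI_API_KEY; the org and project IDs are optional."""
    return bool(env.get("OPENAI_API_KEY", "").strip())


example_env = {"SERP_PROVIDER": "serper", "SERPER_API_KEY": "your_serper_key"}
print(resolve_serp_credentials(example_env))
print(llm_enabled(example_env))
```

In practice the mapping would be `os.environ`, populated from the .env file by calling `load_dotenv()` from python-dotenv first.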

Key Command Line Options

Basic Matching:

--input path/to/input.csv        # Input Upwork profiles
--output path/to/output.csv      # Output results
--provider serper|serpapi        # SERP provider choice
--max-queries 10                 # Queries per profile (default: 6)
--min-score 1                    # Minimum score to include (default: 3)

Quality Controls:

--accept-threshold 10            # High confidence threshold
--review-threshold 7             # Medium confidence threshold
--no-score-filter               # Include all matches regardless of score
--no-require-role-signal        # Disable role/skill requirement

LLM Options:

--llm-model gpt-5-nano-2025-08-07    # Model to use (supports reasoning)
--llm-keep-threshold 0.5             # Minimum LLM confidence (recommended)
--llm-top-k 5                        # Max candidates to send to LLM (recommended)

Usage Examples

Basic High-Precision Run

python3 upwork_to_linkedin_matcher.py \
  --input profiles.csv \
  --output results.csv \
  --provider serper \
  --accept-threshold 11 \
  --min-score 3 \
  --max-queries 8

High-Coverage Run

python3 upwork_to_linkedin_matcher.py \
  --input profiles.csv \
  --output results.csv \
  --provider serper \
  --min-score 1 \
  --no-score-filter \
  --max-queries 10

LLM-Enhanced Workflow

# Step 1: Generate candidates
python3 upwork_to_linkedin_matcher.py \
  --input profiles.csv \
  --output candidates.csv \
  --min-score 1 \
  --max-queries 10

# Step 2: LLM selection
python3 llm_select_existing.py \
  --upwork profiles.csv \
  --candidates candidates.csv \
  --output final_results.csv \
  --llm-model gpt-5-nano-2025-08-07 \
  --llm-keep-threshold 0.5 \
  --llm-top-k 5

Analysis and Validation

# Compare results against golden dataset
python3 compare_all_results.py

# Analyze LLM selection performance
python3 analyze_llm_selection.py

Output Format

Each result row contains:

Upwork Context:

upwork_name, upwork_title, upwork_location, upwork_skills

LinkedIn Match:

linkedin_url, linkedin_title, linkedin_snippet

Scoring Details:

match_score (integer), confidence (High/Medium/Low)
matched_signals (comma-separated list)
query_used (the search query that found this match)

LLM Analysis (when used):

llm_selected (yes/secondary/blank)
llm_confidence (0.0-1.0)
llm_rationale (reasoning for selection/rejection)
llm_rank (1-based ranking)
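
Downstream code usually only needs the rows the LLM actually selected. The sketch below filters a results file on the llm_selected and llm_confidence columns described above; the function name and the sample rows are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO


def accepted_matches(handle: TextIO,
                     keep_threshold: float = 0.5) -> List[Dict[str, str]]:
    """Return rows marked llm_selected=yes with confidence at or above the threshold."""
    kept: List[Dict[str, str]] = []
    for row in csv.DictReader(handle):
        if (row.get("llm_selected") or "").strip().lower() != "yes":
            continue  # skips "secondary" and blank rows
        try:
            confidence = float(row.get("llm_confidence") or 0.0)
        except ValueError:
            continue  # unparseable confidence, treat as not accepted
        if confidence >= keep_threshold:
            kept.append(row)
    return kept


# Made-up rows using a subset of the output columns.
sample = io.StringIO(
    "upwork_name,linkedin_url,match_score,llm_selected,llm_confidence\n"
    "Amna M.,https://www.linkedin.com/in/example-a,13,yes,0.82\n"
    "Amna M.,https://www.linkedin.com/in/example-b,8,secondary,0.55\n"
    "Amna M.,https://www.linkedin.com/in/example-c,4,,\n"
)
for match in accepted_matches(sample):
    print(match["upwork_name"], match["linkedin_url"], match["llm_confidence"])
```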

Optimization Strategies

For Higher Precision

Increase scoring thresholds: --accept-threshold 12 --min-score 5
Keep role requirements: Don't use --no-require-role-signal
Use LLM selection: Add the LLM step with --llm-keep-threshold 0.5 or higher
Focus queries: Use fewer, more targeted queries

For Higher Recall

Lower thresholds: --min-score 1 or --no-score-filter
More queries: --max-queries 10 or more
Disable filters: --no-require-role-signal
Lower LLM threshold: --llm-keep-threshold 0.4 (below 0.5 increases false positives)

Name Handling Improvements

The system includes cultural name variations:

"Necip Eray D." → tries "Eray Necip", "Necip Eray Damar", "Eray Damar"
"Anastasiia G." → preserves unique spelling with conservative variations
General permutations for two-part names
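
A simplified version of this variant generation might look like the sketch below. It is not the code in features.py: the function name is hypothetical, and the full surname ("Damar" in the example) is passed in as a guess because this README does not say how the system obtains it.

```python
from typing import List, Optional


def name_variants(full_name: str,
                  surname_guess: Optional[str] = None) -> List[str]:
    """Generate search name variants from an 'Amna M.' style Upwork name."""
    parts = full_name.replace(".", " ").split()
    # A trailing single letter is treated as the last initial and dropped.
    if len(parts) > 1 and len(parts[-1]) == 1:
        given = parts[:-1]
    else:
        given = parts

    variants: List[str] = []

    def add(value: str) -> None:
        if value and value not in variants:
            variants.append(value)

    add(" ".join(given))
    # Two-part given names are also tried in reversed order.
    if len(given) == 2:
        add(f"{given[1]} {given[0]}")
    # With a surname guess (hypothetical input), try full and short forms.
    if surname_guess:
        add(" ".join(given + [surname_guess]))
        for name in given:
            add(f"{name} {surname_guess}")
    return variants


print(name_variants("Necip Eray D.", surname_guess="Damar"))
print(name_variants("Anastasiia G."))
```

Single given names are left untouched, which matches the conservative handling described for unique spellings.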

Troubleshooting

No Results Found

# Try lower thresholds
--min-score 1 --no-score-filter

# More queries per profile
--max-queries 10 --results-per-query 8

# Disable role requirement
--no-require-role-signal

Too Many False Positives

# Stricter scoring
--min-score 5 --accept-threshold 12

# Add LLM filtering
python3 llm_select_existing.py --llm-keep-threshold 0.6

LLM Not Working

Check API key: Ensure OPENAI_API_KEY is in .env
Install dotenv: pip install python-dotenv
Verify model: Use gpt-5-nano-2025-08-07 or gpt-3.5-turbo

Rate Limiting

# Add delays between requests
--sleep-min 1.5 --sleep-max 3.0

# Reduce query volume
--max-queries 5 --results-per-query 3
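
The effect of --sleep-min and --sleep-max can be pictured as a randomized pause between requests. The helper below is a sketch of that idea, not the tool's implementation; its name and the injectable sleeper argument are assumptions made so the example can run without actually waiting.

```python
import random
import time
from typing import Callable


def polite_sleep(sleep_min: float = 1.5, sleep_max: float = 3.0,
                 sleeper: Callable[[float], None] = time.sleep) -> float:
    """Pause for a random duration in [sleep_min, sleep_max] and return it."""
    if sleep_min < 0 or sleep_max < sleep_min:
        raise ValueError("require 0 <= sleep_min <= sleep_max")
    delay = random.uniform(sleep_min, sleep_max)
    sleeper(delay)
    return delay


# Demonstration with a no-op sleeper so nothing actually blocks.
waited = polite_sleep(1.5, 3.0, sleeper=lambda seconds: None)
print(f"would have slept {waited:.2f}s")
```

Randomizing the delay spreads requests out instead of hitting the provider at a fixed cadence.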

API Costs & Limits

SERP API Usage

Serper: ~$5 per 1000 queries
SerpAPI: ~$50 per 1000 queries
Estimate: 6-10 queries per profile

OpenAI Usage (LLM step)

GPT-5-nano: $0.05/1M input tokens, $0.40/1M output tokens
Estimate: ~$0.01-0.02 per profile for LLM selection
Only top 5 candidates sent to reduce costs
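
Putting the approximate prices above together gives a quick budget estimate. The function below is a back-of-the-envelope sketch with a hypothetical name; it uses only the rough figures quoted in this section, so actual bills will differ.

```python
from typing import Dict


def estimate_cost(profiles: int, queries_per_profile: int,
                  serp_cost_per_1000: float,
                  llm_cost_per_profile: float = 0.0) -> Dict[str, float]:
    """Estimate SERP, LLM, and total cost in USD for a batch of profiles."""
    serp = profiles * queries_per_profile * serp_cost_per_1000 / 1000
    llm = profiles * llm_cost_per_profile
    return {
        "serp": round(serp, 2),
        "llm": round(llm, 2),
        "total": round(serp + llm, 2),
    }


# 1000 profiles on Serper (~$5 per 1000 queries), low and high estimates.
print(estimate_cost(1000, 6, 5.0, llm_cost_per_profile=0.01))
print(estimate_cost(1000, 10, 5.0, llm_cost_per_profile=0.02))
# Same low-end volume on SerpAPI (~$50 per 1000 queries), no LLM step.
print(estimate_cost(1000, 6, 50.0))
```

At these rates the SERP step dominates the bill, so trimming --max-queries saves more than skipping the LLM step.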

Advanced Features

Query Logging

--query-log queries.jsonl --debug-serp

Logs all search queries and responses for debugging.

Batch Processing

# Process large files in chunks
python3 batch_processor.py --chunk-size 50 --input large_file.csv
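
The chunking idea behind --chunk-size is simple to express in Python. The generator below is a generic sketch with a hypothetical name; the internals of batch_processor.py are not shown in this README and may differ.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(rows: Sequence[T], chunk_size: int = 50) -> Iterator[List[T]]:
    """Yield consecutive slices of rows, each with at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(rows), chunk_size):
        yield list(rows[start:start + chunk_size])


# 120 placeholder rows split into chunks of 50.
profiles = [f"profile-{i}" for i in range(120)]
for number, chunk in enumerate(chunked(profiles, 50), start=1):
    print(f"chunk {number}: {len(chunk)} profiles")
```

Processing in chunks bounds how much work is lost if a run is interrupted by a rate limit or API error.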

Custom Scoring

Modify scoring.py to adjust signal weights:

SIGNAL_SCORES = {
    'city': 5,
    'country': 3,
    'title_phrase': 4,
    'skill': 3,
    # ... customize as needed
}

Limitations & Considerations

Technical Limitations

SERP dependency: Results quality depends on search engine snippets
Name ambiguity: Common names may produce false matches
Geographic bias: Search results may favor certain regions
Rate limits: API quotas may limit processing speed

Data Quality Factors

Incomplete profiles: Missing skills/location data reduces match quality
Transliteration: Non-English names may have spelling variations
Professional context: Generic titles/skills provide weak signals

Cost Considerations

SERP API costs scale with profile count and query volume
LLM costs are controlled but add overhead for large batches
Consider cost/accuracy tradeoffs for your use case

Contributing

The modular architecture makes it easy to:

Add new SERP providers in providers.py
Modify scoring logic in scoring.py
Enhance query strategies in queries.py
Customize feature extraction in features.py

License & Compliance

This tool uses public SERP APIs and respects:

Provider terms of service and rate limits
LinkedIn's robots.txt (no direct crawling)
Data privacy and applicable regulations

Ensure compliance with local laws and platform policies before use.
