Add more benchmark tasks (20 → 50 tasks) #9

@jackjin1997

Goal

Expand the benchmark from 20 to 50 tasks for more statistically meaningful results.

Proposed new tasks

Code (5 more)

  • Refactor a class hierarchy to use composition over inheritance
  • Add type hints to an untyped Python module
  • Fix a race condition in async code
  • Implement a caching decorator with TTL
  • Convert callbacks to async/await
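
To give a sense of the intended scope of a Code task, here is a minimal sketch of what a reference solution for the TTL caching decorator task might look like. The names `ttl_cache`, `ttl_seconds`, and `clock` are hypothetical and not taken from the benchmark. The injectable clock is an assumption, added so that a grader could test expiry without sleeping.

```python
import functools
import time


def ttl_cache(ttl_seconds, clock=time.monotonic):
    """Cache a function's results per argument set for ttl_seconds.

    Hypothetical reference solution; arguments must be hashable.
    """
    def decorator(func):
        store = {}  # key -> (expiry_time, value)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = (args, tuple(sorted(kwargs.items())))
            now = clock()
            hit = store.get(key)
            # Serve the cached value only while it has not expired.
            if hit is not None and hit[0] > now:
                return hit[1]
            value = func(*args, **kwargs)
            store[key] = (now + ttl_seconds, value)
            return value

        wrapper.cache_clear = store.clear
        return wrapper

    return decorator
```

A task built around this could check that repeated calls within the TTL hit the cache and that a call after expiry recomputes the value.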

Data (5 more)

  • Clean and merge two messy CSVs
  • Build a simple ML pipeline (train/eval/predict)
  • Visualize time series with anomaly highlights
  • Parse and analyze a web server access log
  • Generate a PDF report from structured data
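
As a rough illustration of the access log task, the sketch below parses lines in the Common Log Format and counts status codes and request paths. The log format, the function name `summarize`, and the shape of the returned summary are all assumptions; the actual task may specify a different format and output.

```python
import re
from collections import Counter

# Common Log Format, e.g.:
# 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def summarize(lines):
    """Count status codes and paths; lines that do not match are skipped."""
    statuses = Counter()
    paths = Counter()
    skipped = 0
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            skipped += 1
            continue
        statuses[match["status"]] += 1
        paths[match["path"]] += 1
    return {
        "statuses": dict(statuses),
        "top_paths": paths.most_common(3),
        "skipped": skipped,
    }
```

Counting skipped lines rather than raising keeps the analysis usable on messy logs while still surfacing how much input was ignored.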

Research (3 more)

  • Compare 3 database options for a specific use case
  • Write a threat model for a web application
  • Summarize and critique a technical RFC

Tool Use (4 more)

  • Set up a GitHub Actions workflow
  • Create and configure a Docker Compose stack
  • Automate a Slack notification pipeline
  • Build a simple MCP server

Multi-step (3 more)

  • Full PR workflow: branch → code → test → PR
  • Debug a production incident, from logs to fix
  • Migrate a project from JavaScript to TypeScript

How to contribute

See task authoring guide. Each task is a single YAML file.
