Live Competitions

Competition 3: INSTRUCT_8B

Goal

The goal of this competition is to train a SOTA instruct 8B model. This competition provides more freedom to miners than other competitions: there are no restrictions on the tokenizer used and miners are allowed to use a wider range of architectures.

Evaluation

The evaluation tasks are the same as those of the B7_MULTICHOICE competition.

Definitions

Code Link

Scheduled Competitions

Competition 4: DISTILLED_REASONING_3B

Goal

The goal of this competition is to train a 3.2-3.4B parameter model specialized in step-by-step reasoning. Submitted models must demonstrate strong capabilities in structured thinking, particularly for mathematical reasoning and code understanding tasks. This competition aims to produce efficient, smaller-scale models that retain reasoning quality comparable to that of larger models.

In this first iteration of the competition we will focus on optimizing the model's perplexity score on reasoning traces. This will then allow us to calibrate the next stages of our reasoning competition.

Evaluation

Models submitted to this competition are evaluated on the SYNTHETIC_1_SFT dataset, which contains verifiable and structured reasoning problems.

Our task evaluates the models on a dataset of verifiable math and code output prediction problems requiring step-by-step reasoning.

The evaluation uses the REFERENCE_LOSS method, which measures how well the model can generate accurate reasoning traces and answers for these problems.
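
As a rough illustration of a reference-loss style metric and its relation to perplexity, here is a minimal sketch. It assumes the loss is the mean negative log-likelihood the model assigns to the tokens of a reference trace; the actual REFERENCE_LOSS implementation may differ (for example in how prompt tokens are masked or how samples are averaged). The function names below are hypothetical and do not come from the repository.

```python
import math


def reference_loss(token_logprobs):
    """Mean negative log-likelihood over the reference tokens of one trace.

    token_logprobs: natural-log probabilities the model assigned to each
    reference token (assumed to be computed elsewhere).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)


def perplexity(token_logprobs):
    """Perplexity is the exponential of the mean negative log-likelihood."""
    return math.exp(reference_loss(token_logprobs))


def average_reference_loss(samples):
    """Average the per-trace loss over a batch of traces (an assumption;
    a per-token average over the whole batch is another common choice)."""
    losses = [reference_loss(logprobs) for logprobs in samples]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # A model that gives every reference token probability 0.5.
    trace = [math.log(0.5)] * 4
    print(reference_loss(trace), perplexity(trace))
```

Lower loss means the model finds the reference trace more likely, which is why optimizing perplexity on reasoning traces and minimizing this kind of loss point in the same direction.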

Definitions

Competition Constraints

Dataset Loader

Evaluation Method

Deprecated Competitions

Competition 1: SN9_MODEL

Competition 1 was the original competition for the finetuning subnet.

Goal

The purpose of this competition was to finetune the top models from the pretraining subnet to produce a chat bot.

Evaluation

Models submitted to this competition were evaluated using a synthetically generated Q&A dataset from the cortex subnet. Specifically, models were evaluated on the average loss of their generated answers.

Competition 2: B7_MULTICHOICE

Goal

The purpose of this competition is to finetune the top models from the pretraining subnet to produce a chat bot.

Evaluation

Models submitted to this competition are evaluated on a set of evaluation tasks, where each task is worth a portion of the overall score. The current evaluations are:

SYNTHENTIC_MMLU: In this task, the model is evaluated on a synthetic MMLU-like dataset from the Text Prompting subnet. The dataset consists of a large array of multiple choice questions spanning a range of topics and difficulty levels, akin to MMLU. Currently, the dataset is generated using Wikipedia as the source of truth, though this will be expanded over time to include more domain-focused sources.
WORD_SORTING: In this task, the model is given a list of words and is required to sort them alphabetically.
FINEWEB: In this task, the model's cross-entropy loss is computed on a small sample of the fineweb dataset: https://hf.rst.im/datasets/HuggingFaceFW/fineweb-edu-score-2
IF_EVAL: In this task, the model is evaluated on a synthetic version of the IFEval dataset (https://hf.rst.im/datasets/google/IFEval). The prompt contains a list of rules the response must follow. The full list of possible rules is given in rule.py.
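
To make the task list above more concrete, the sketch below shows one way a word-sorting answer could be checked and how per-task scores could be combined into an overall score. Everything here is an assumption for illustration: the function names, the comma-separated answer format, the case-insensitive ordering, and the weights are hypothetical and are not taken from the repository. It also assumes each task score has already been normalized to the range 0 to 1 with higher being better, which a raw loss such as the FINEWEB cross-entropy would not satisfy without conversion.

```python
def sorted_reference(words):
    """Reference answer: the words sorted alphabetically (case-insensitive,
    which is an assumption about the task)."""
    return sorted(words, key=str.lower)


def word_sort_correct(words, response):
    """Exact-match check of a comma-separated response against the reference."""
    answer = [w.strip() for w in response.split(",") if w.strip()]
    return answer == sorted_reference(words)


def overall_score(task_scores, task_weights):
    """Weighted sum of per-task scores; weights are normalized to sum to 1."""
    total = sum(task_weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(task_scores[name] * w / total for name, w in task_weights.items())


if __name__ == "__main__":
    words = ["pear", "apple", "fig"]
    print(word_sort_correct(words, "apple, fig, pear"))
    # Hypothetical weights, chosen only for the example.
    scores = {"WORD_SORTING": 1.0, "IF_EVAL": 0.0}
    weights = {"WORD_SORTING": 3, "IF_EVAL": 1}
    print(overall_score(scores, weights))
```

Normalizing the weights inside `overall_score` keeps the result on the same 0 to 1 scale as the individual task scores, regardless of how the weights are expressed.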

Definitions

Code Link
