Skip to content

Latest commit

 

History

History

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 

README.md

🪐 Weasel Project: Named Entity Recognition (WikiNER)

Simple example of downloading and converting source data and training a named entity recognition model. The example uses the WikiNER corpus, which was constructed semi-automatically. The main advantage of this corpus is that it's freely available, so the data can be downloaded as a project asset. The WikiNER corpus is distributed in IOB format, a fairly common text encoding for sequence data. The corpus subcommand splits the corpus into training, development and testing partitions, and uses spacy convert to convert them into spaCy's binary format. You can then edit the config to try out different settings, and trigger training with the train subcommand.

📋 project.yml

The project.yml defines the data assets required by the project, as well as the available commands and workflows. For details, see the Weasel documentation.

⏯ Commands

The following commands are defined by the project. They can be executed using weasel run [name]. Commands are only re-run if their inputs have changed.

Command Description
corpus Convert the data to spaCy's format
train Train the full pipeline
evaluate Evaluate on the test data and save the metrics
clean Remove intermediate files

⏭ Workflows

The following workflows are defined by the project. They can be executed using weasel run [name] and will run the specified commands in order. Commands are only re-run if their inputs have changed.

Workflow Steps
all corpustrainevaluate

🗂 Assets

The following assets are defined by the project. They can be fetched by running weasel assets in the project directory.

File Source Description
assets/aij-wikiner-en-wp2.bz2 URL

🚀 Accelerate

If you are interested in accelerating this pipeline, have a look at ner_wikiner_speedster pipeline.