Commit 398ca10

Add generate embeddings
1 parent 673fe2e commit 398ca10

6 files changed, +1050 -6 lines changed

README.md

+79
@@ -0,0 +1,79 @@
# MomConnect Intent Classifier

Model that classifies the intent of inbound messages.

## Development
This project uses [poetry](https://python-poetry.org/docs/#installation) for packaging and dependency management, so install that first.

Ensure you're also running at least Python 3.11 (check with `python --version`).

Then install the dependencies:
```bash
~ poetry install
```

To run a local worker, set the `NLU_USERNAME` and `NLU_PASSWORD` environment variables, then start the Flask worker:
```bash
~ poetry run flask --app src.application run
```

To run the autoformatting and linting, run:
```bash
~ ruff format && ruff check && mypy --install-types
```

For the test runner, we use [pytest](https://docs.pytest.org/):
```bash
~ pytest
```

## Regenerating the embeddings JSON file

1. Delete the JSON embeddings file in `src/data/`
1. Update `nlu.yaml` with your changes
1. Run the Flask app, which should regenerate the embeddings file: `poetry run flask --app src.application run`

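
The regeneration steps above imply a load-or-rebuild pattern: if the embeddings file is missing at startup, it is rebuilt from the training data and written back to disk. The following is a minimal sketch of that pattern only, and it assumes nothing about the real `IntentClassifier` internals. `load_or_build_embeddings`, the `Embeddings` alias, and the `build` callback are hypothetical names used for illustration; the callback stands in for whatever code parses `nlu.yaml` and computes the embeddings.

```python
import json
from pathlib import Path
from typing import Callable

# Hypothetical shape: intent name -> list of embedding vectors.
Embeddings = dict[str, list[list[float]]]


def load_or_build_embeddings(
    embeddings_path: Path, build: Callable[[], Embeddings]
) -> Embeddings:
    """Load cached embeddings, or build and cache them if the file is missing."""
    if embeddings_path.exists():
        with embeddings_path.open(encoding="utf-8") as f:
            return json.load(f)

    # The file was deleted (or never existed): regenerate it and write it back.
    embeddings = build()
    embeddings_path.parent.mkdir(parents=True, exist_ok=True)
    with embeddings_path.open("w", encoding="utf-8") as f:
        json.dump(embeddings, f)
    return embeddings
```

With a helper shaped like this, deleting the JSON file is what triggers the rebuild on the next start; editing `nlu.yaml` alone would leave the stale cache in place, which is consistent with the first step in the list.
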
## Editor configuration

If you'd like your editor to handle linting and/or formatting for you, here's how to set it up.

### Visual Studio Code

1. Install the Python and Ruff extensions
1. In settings, check the "Python > Linting: Mypy Enabled" box
1. In settings, set "Python > Formatting: Provider" to "black" ("ruff format" apparently isn't supported by the Python extension yet, and "black" is probably close enough)
1. If you want formatting to apply automatically, check the "Editor: Format On Save" checkbox in settings

Alternatively, add the following to your `settings.json`:
```json
{
    "python.linting.mypyEnabled": true,
    "python.formatting.provider": "black",
    "editor.formatOnSave": true
}
```

## Release process

To release a new version, follow these steps:

1. Make sure all relevant PRs are merged and that all necessary QA testing is complete
1. Make sure release notes are up to date and accurate
1. In one commit on the `main` branch:
   - Update the version number in `pyproject.toml` to the release version
   - Replace the UNRELEASED header in `CHANGELOG.md` with the release version and date
1. Tag the release commit with the release version (for example, `v0.2.1` for version `0.2.1`)
1. Push the release commit and tag
1. In one commit on the `main` branch:
   - Update the version number in `pyproject.toml` to the next pre-release version
   - Add a new UNRELEASED header in `CHANGELOG.md`
1. Push the post-release commit

## Running in Production
A [Docker image](https://github.com/praekeltfoundation/mc-intent-classifier/pkgs/container/mc-intent-classifier) is available for running this service. It is configured with the following environment variables:

| Variable | Description |
| -------- | ----------- |
| NLU_USERNAME | The username used for API requests |
| NLU_PASSWORD | The password used for API requests |
| SENTRY_DSN | Where to send exceptions to |
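
To illustrate how a service might consume the variables in the table, here is a small sketch that reads them once at startup. It is not taken from this repository: `Settings` and `load_settings` are hypothetical names, and treating `NLU_USERNAME` and `NLU_PASSWORD` as required while `SENTRY_DSN` is optional is an assumption, since the README does not say which variables may be omitted.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    nlu_username: str
    nlu_password: str
    sentry_dsn: Optional[str]  # Assumed optional; None disables error reporting.


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment, failing fast on missing values."""
    required = ("NLU_USERNAME", "NLU_PASSWORD")  # Assumed to be mandatory.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    return Settings(
        nlu_username=env["NLU_USERNAME"],
        nlu_password=env["NLU_PASSWORD"],
        sentry_dsn=env.get("SENTRY_DSN") or None,
    )
```

Failing at startup makes a misconfigured container visible immediately, rather than at the first authenticated request.
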

src/application.py

+5-2
@@ -13,7 +13,8 @@
 
 dirname = os.path.dirname(__file__)
 DATA_PATH = Path(f"{dirname}/data")
-OUTPUT_FILE_PATH = DATA_PATH / "new_test_intent_embeddings.json"
+NLU_FILE_PATH = DATA_PATH / "nlu.yaml"
+EMBEDDINGS_FILE_PATH = DATA_PATH / "intent_embeddings.json"
 
 app = Flask(__name__)
 app.config["BASIC_AUTH_USERNAME"] = os.environ.get("NLU_USERNAME")
@@ -31,7 +32,9 @@
 metrics = PrometheusMetrics(app)
 metrics.info("app_info", "Application info", version=version)
 
-classifier = IntentClassifier(json_path=OUTPUT_FILE_PATH)
+classifier = IntentClassifier(
+    embeddings_path=EMBEDDINGS_FILE_PATH, nlu_path=NLU_FILE_PATH
+)
 
 
 @app.route("/")
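
The hunks above derive both file paths from a `data` directory that sits next to `application.py`, building the base path from an f-string over `os.path.dirname(__file__)`. As a side note, the same two paths can be derived with `pathlib` alone. The sketch below is an alternative for illustration, not what the commit does; `data_paths` is a hypothetical helper, and the file names are the ones introduced in the diff.

```python
from pathlib import Path


def data_paths(module_file: str) -> tuple[Path, Path]:
    """Return (nlu_path, embeddings_path) in the data/ directory beside a module."""
    data_dir = Path(module_file).parent / "data"
    return data_dir / "nlu.yaml", data_dir / "intent_embeddings.json"


# Inside application.py this would be called as data_paths(__file__).
nlu_path, embeddings_path = data_paths("/srv/app/src/application.py")
```
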

src/data/intent_embeddings.json

+1-1
Large diffs are not rendered by default.

src/data/new_test_intent_embeddings.json

-1
This file was deleted.
