changelog
Thomzoy committed Oct 15, 2024
1 parent 406ff1d commit 5f31166
Showing 3 changed files with 31 additions and 25 deletions.
12 changes: 12 additions & 0 deletions changelog.md
@@ -1,5 +1,17 @@
# Changelog

+## Unreleased
+
+### Added
+
+- `EDS.Tokenizer` now handles `-\n` (found in text when splitting a long word with a linebreak) as a specific token, which can be discarded by the normalizer pipe.
+
+### Fixed
+
+- When using `ignore_space_tokens=True`, words separated only by linebreaks will be collected (via `get_text()`) with spaces in between
+- The `process` method of `Qualifiers` now accepts `Span` as input, and treats it as a `Doc` to avoid alignment issues
+- The `detailed_status_mapping` of disorder/behavior pipes is now a defaultdict to avoid `KeyError: None` that can occur when loading pre-annotated docs without instantiating pipes beforehand
+
## v0.13.1

### Added
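
Note on the `detailed_status_mapping` entry above: the following is a minimal sketch, using only the standard library, of why a `defaultdict` avoids a `KeyError: None` when an entity carries no status. The mapping contents, the labels, the `None` fallback and the helper name `detailed_status` are all assumptions made for illustration; they are not taken from the EDS-NLP source.

```python
from collections import defaultdict

# Hypothetical mapping from a numeric status to a readable label.
plain_mapping = {1: "PRESENT", 2: "SEVERE"}

# Assumption: unknown statuses (including None) fall back to None.
safe_mapping = defaultdict(lambda: None, plain_mapping)


def detailed_status(mapping, status):
    """Look up the human-readable label for a status value."""
    return mapping[status]


# A pre-annotated entity may carry no status at all (status is None).
try:
    detailed_status(plain_mapping, None)
    plain_lookup_failed = False
except KeyError:
    # A plain dict raises KeyError for the missing None key.
    plain_lookup_failed = True

# The defaultdict returns the fallback instead of raising.
fallback = detailed_status(safe_mapping, None)
```

One side effect worth knowing: looking up a missing key in a `defaultdict` also stores that key with the fallback value, so the mapping grows as unknown statuses are queried.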
37 changes: 18 additions & 19 deletions docs/pipes/ner/behaviors/index.md
@@ -9,25 +9,24 @@ Some general considerations about those components:
- The matched comorbidity is also available under the `ent.label_` of each match.
- Matches have an associated `_.status` attribute taking the value `1` or `2`. A corresponding `_.detailed_status` attribute stores the human-readable status, which can be component-dependent. See each component's documentation for more details.
- Some components add additional information to matches. For instance, the `tobacco` component adds, if relevant, the extracted *pack-year* (= *paquet-année*). This information is available under the `ent._.assigned` attribute.
-- Those components work on **normalized** documents. Please use the `eds.normalizer` pipeline with the following parameters:
-```{ .python .no-check }
-nlp.add_pipe(
-    eds.normalizer(
-        accents=True,
-        lowercase=True,
-        quotes=True,
-        spaces=True,
-        pollution=dict(
-            information=True,
-            bars=True,
-            biology=True,
-            doctors=True,
-            web=True,
-            coding=True,
-            footer=True,
-        ),
-    ),
-)
+- Those components work on **normalized** documents. Please use the `eds.normalizer` pipeline with these additional flags:
+
+```{ .python .no-check }
+import edsnlp, edsnlp.pipes as eds
+...
+nlp.add_pipe(
+    eds.normalizer(
+        accents=True,
+        lowercase=True,
+        quotes=True,
+        spaces=True,
+        pollution=dict(
+            biology=True,
+            coding=True,
+        ),
+    ),
+)
```

--8<-- "docs/pipes/ner/disorders/warning.md"
7 changes: 1 addition & 6 deletions docs/pipes/ner/disorders/index.md
@@ -12,7 +12,7 @@ Some general considerations about those components:
- The matched comorbidity is also available under the `ent.label_` of each match.
- Matches have an associated `_.status` attribute taking the value `1` or `2`. A corresponding `_.detailed_status` attribute stores the human-readable status, which can be component-dependent. See each component's documentation for more details.
- Some components add additional information to matches. For instance, the `tobacco` component adds, if relevant, the extracted *pack-year* (= *paquet-année*). This information is available under the `ent._.assigned` attribute.
-- Those components work on **normalized** documents. Please use the `eds.normalizer` pipeline with the following parameters:
+- Those components work on **normalized** documents. Please use the `eds.normalizer` pipeline with these additional flags:

```{ .python .no-check }
import edsnlp, edsnlp.pipes as eds
@@ -25,13 +25,8 @@ Some general considerations about those components:
         quotes=True,
         spaces=True,
         pollution=dict(
-            information=True,
-            bars=True,
             biology=True,
-            doctors=True,
-            web=True,
             coding=True,
-            footer=True,
         ),
     ),
 )