DEVELOPMENT: TODO
- OdinException
- specify label on named capture: (?<name:label>...)
- Chunks!
- Head word (from Mention)
- GraphPattern => DependencyPattern, ~SRLPattern
- How to handle Universal Dependencies?
- trigger constraints (must have path/must not have path) (proposed keywords: with/without, assert, constraint)
- lookarounds in DependencyPath (to add constraints to arguments)
- composable actions
- multi-sentence mentions
- add unit (word, lemma, etc)
- actions with priority
- EngineAdmin EngineConfig yaml
- add finalAction to engine
- ??
- binarize by argument
- keep most complete/contained per event/relation
- keep least complete (for symmetry)
- filter, map, flatMap
- copy constructor with overridable params
- multi-sentence mentions
Reduce the time spent matching mentions with this one simple trick!
If you take a look at pos-reg_template.yml, you will notice that the same trigger pattern is repeated in many rules. The same is true for neg-reg and simple-events. This can be optimized by introducing a single rule that matches the trigger and creates a mention for it (e.g., PosRegTrigger). The event rules can then use these mentions instead of a trigger pattern. Odin supports this with the following syntax:
- name: pos_reg_1
  label: PosReg
  pattern: |
    trigger:PosRegTrigger
    controlled:${controlledType} = ...
    controller:${controllerType} = ...
Note that the trigger has a label and no token pattern. This tells Odin to use an existing mention as the trigger instead of trying to match a token pattern. This way the token pattern is matched once instead of once per rule, which should improve Reach's runtime.
If we need to add constraints to the trigger, we can do it like this:
- name: pos_reg_2
  label: PosReg
  pattern: |
    trigger = @PosRegTrigger (?!protein) # trigger should not be followed by the token protein
    controlled:${controlledType} = ...
    controller:${controllerType} = ...
This still runs a token pattern, but it should be simpler than the one used to match the triggers, so it would still be an improvement.
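The saving can be illustrated with a small Python sketch. It is only an analogy: a character-level regex stands in for Odin's token patterns, and the trigger pattern, rule names, and example sentence are all hypothetical. It compares re-running the trigger pattern in every rule against matching it once and sharing the resulting mentions.

```python
import re

# Hypothetical trigger pattern and rule names, for illustration only.
TRIGGER = re.compile(r"\b(?:promot|activat|enhanc)\w*", re.IGNORECASE)
RULES = ["pos_reg_1", "pos_reg_2", "pos_reg_3"]


def match_per_rule(sentence):
    """Every rule re-runs the trigger pattern: one scan per rule."""
    scans = 0
    results = {}
    for rule in RULES:
        results[rule] = [m.span() for m in TRIGGER.finditer(sentence)]
        scans += 1
    return results, scans


def match_once(sentence):
    """One trigger rule creates the mentions; the event rules reuse them."""
    trigger_mentions = [m.span() for m in TRIGGER.finditer(sentence)]  # single scan
    results = {rule: trigger_mentions for rule in RULES}
    return results, 1


sentence = "MEK activates ERK and promotes its phosphorylation."
per_rule, per_rule_scans = match_per_rule(sentence)
shared, shared_scans = match_once(sentence)
```

Both strategies find the same trigger spans, but the shared version scans the sentence once regardless of how many event rules refer to the trigger.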
Shouldn't binding rules attempt to identify the result of a binding? Perhaps we could scaffold using binding event mentions as triggers (i.e., the Protein_with_site approach)?
BioProcessor (maybe all processors) should normalize Unicode characters. We have already had problems with all the different Unicode hyphens.
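A minimal Python sketch of what hyphen normalization could look like follows. The list of code points is an assumption made for this example, not something taken from BioProcessor, and `normalize_hyphens` is a hypothetical helper name. Each hyphen-like character is replaced by exactly one ASCII hyphen-minus, so the text length and character offsets are preserved.

```python
# Code points chosen for this sketch; not a list taken from BioProcessor.
HYPHEN_LIKE = [
    0x2010,  # HYPHEN
    0x2011,  # NON-BREAKING HYPHEN
    0x2012,  # FIGURE DASH
    0x2013,  # EN DASH
    0x2014,  # EM DASH
    0x2015,  # HORIZONTAL BAR
    0x2212,  # MINUS SIGN
    0xFE63,  # SMALL HYPHEN-MINUS
    0xFF0D,  # FULLWIDTH HYPHEN-MINUS
]

# str.translate accepts a mapping from code point to replacement string.
HYPHEN_TABLE = {code_point: "-" for code_point in HYPHEN_LIKE}


def normalize_hyphens(text: str) -> str:
    """Replace hyphen-like characters with ASCII hyphen-minus (U+002D)."""
    return text.translate(HYPHEN_TABLE)


raw = "IL\u20102 and TGF\u2013beta"
clean = normalize_hyphens(raw)
```

An explicit table is used here because the standard Unicode normalization forms do not, on their own, map characters such as U+2010 or U+2013 to the ASCII hyphen.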
Add ^ and $ operators to surface patterns for matching the start and end of a sentence.
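The intended semantics can be sketched in Python as zero-width assertions over token positions. This is a toy matcher written for this note, not Odin's implementation: `find_matches` is a hypothetical function, patterns are plain lists of literal tokens, and a literal "^" or "$" token cannot be expressed.

```python
def find_matches(pattern, tokens):
    """Return (start, end) token intervals where `pattern` matches `tokens`.

    `pattern` is a list of literal tokens that may begin with "^" and/or
    end with "$". The anchors consume no tokens; they only constrain where
    the match may start or end. `tokens` must be a list.
    """
    anchored_start = bool(pattern) and pattern[0] == "^"
    anchored_end = bool(pattern) and pattern[-1] == "$"
    body = pattern[int(anchored_start): len(pattern) - int(anchored_end)]

    # With "^" only position 0 is a candidate; otherwise try every position.
    starts = [0] if anchored_start else range(len(tokens) - len(body) + 1)
    matches = []
    for start in starts:
        end = start + len(body)
        if tokens[start:end] != body:
            continue
        if anchored_end and end != len(tokens):
            continue  # "$" requires the match to end at the sentence end
        matches.append((start, end))
    return matches


tokens = ["Phosphorylation", "of", "ERK", "by", "MEK"]
at_start = find_matches(["^", "Phosphorylation"], tokens)
at_end = find_matches(["MEK", "$"], tokens)
not_at_start = find_matches(["^", "ERK"], tokens)
```

Here `at_start` and `at_end` each contain one interval, while `not_at_start` is empty because ERK is not the first token of the sentence.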
Named captures only capture a single token interval. Ideally they should be able to capture a sequence of intervals.
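Python's `re` module has a comparable limitation, which makes for a convenient illustration: a named group inside a repetition keeps only its last match. The sketch below uses character spans as a stand-in for token intervals. The site strings are hypothetical, and the second half shows the desired result (one interval per occurrence) obtained by other means.

```python
import re

text = "S12, T45, Y78"

# The named group sits inside a repetition, so only its last match is kept.
repeated = re.compile(r"(?:(?P<site>[A-Z]\d+)(?:,\s*|$))+")
match = repeated.match(text)
last_only = match.span("site")

# Desired behaviour: a sequence of intervals, one per occurrence.
all_intervals = [m.span() for m in re.finditer(r"[A-Z]\d+", text)]
```

Here `last_only` covers only the final site, while `all_intervals` holds the three spans that a multi-interval capture would ideally return.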