Gustave Hahn-Powell edited this page Feb 2, 2016 · 8 revisions

TODOs

  • OdinException
  • specify label on named capture (?<name:label>...)
  • Chunks!
  • Head word (from Mention)
  • GraphPattern => DependencyPattern, ~SRLPattern
  • How to handle Universal Dependencies?
  • trigger constraints (must have path/must not have path) (proposed keywords: with/without, assert, constraint)
  • lookarounds in DependencyPath (to add constraints to arguments)
  • composable actions
  • multi-sentence mentions
  • add unit (word, lemma, etc.)
  • actions with priority
  • EngineAdmin EngineConfig yaml
  • add finalAction to engine

To be added to the Odin manual

  • ??

actions wish-list

  • binarize by argument
  • keep most complete/contained per event/relation
  • keep least complete (for symmetry)
  • filter, map, flatMap
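The wish-list actions above can be sketched as plain functions from a list of mentions to a list of mentions, which makes them composable by simple chaining. This is a Python sketch, not Odin's Scala API: `Mention`, `compose`, `binarize`, and `keep_most_complete` are hypothetical names, the mention is reduced to a label, a trigger string, and a role-to-arguments map, and "most complete" is interpreted here as "most arguments per (label, trigger)".

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass
class Mention:
    # Hypothetical, simplified mention; NOT Odin's Mention class.
    label: str
    trigger: str
    args: Dict[str, List[str]]  # argument role -> argument texts


Action = Callable[[List[Mention]], List[Mention]]


def compose(*actions: Action) -> Action:
    """Chain actions left to right; each receives the previous one's output."""
    def run(mentions: List[Mention]) -> List[Mention]:
        for action in actions:
            mentions = action(mentions)
        return mentions
    return run


def filter_action(pred: Callable[[Mention], bool]) -> Action:
    return lambda ms: [m for m in ms if pred(m)]


def map_action(f: Callable[[Mention], Mention]) -> Action:
    return lambda ms: [f(m) for m in ms]


def flat_map_action(f: Callable[[Mention], List[Mention]]) -> Action:
    return lambda ms: [out for m in ms for out in f(m)]


def binarize(role: str) -> Action:
    """Split a mention with several `role` arguments into one mention per argument."""
    def split(m: Mention) -> List[Mention]:
        values = m.args.get(role, [])
        if len(values) <= 1:
            return [m]
        # Build a fresh args dict per copy so the original mention is untouched.
        return [replace(m, args={**m.args, role: [v]}) for v in values]
    return flat_map_action(split)


def keep_most_complete(ms: List[Mention]) -> List[Mention]:
    """For each (label, trigger) pair, keep only the mentions with the most arguments."""
    def size(m: Mention) -> int:
        return sum(len(v) for v in m.args.values())

    best: Dict[tuple, int] = {}
    for m in ms:
        key = (m.label, m.trigger)
        best[key] = max(best.get(key, 0), size(m))
    return [m for m in ms if size(m) == best[(m.label, m.trigger)]]
```

For example, `compose(keep_most_complete, binarize("theme"))` first drops a `Binding` with one theme in favor of one with themes `A` and `B`, then splits the survivor into two single-theme mentions. Keeping every action at the same type is what makes the composition trivial.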

mentions wish-list

  • copy constructor with overridable params
  • multi-sentence mentions
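A copy constructor with overridable parameters is what Scala case classes provide automatically through `copy`; Python's closest analogue is `dataclasses.replace`. The sketch below shows the intended behavior using a hypothetical `TextBoundMention` whose field names are illustrative only, not Odin's.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class TextBoundMention:
    # Hypothetical stand-in; field names are illustrative, not Odin's.
    label: str
    start: int                 # token interval start
    end: int                   # token interval end (exclusive)
    sentence: int              # sentence index in the document
    found_by: Optional[str] = None

    def copy(self, **overrides) -> "TextBoundMention":
        """Return a new mention, overriding only the named fields.

        Fields that are not mentioned keep their current values; an unknown
        field name raises TypeError.
        """
        return replace(self, **overrides)
```

Calling `m.copy(label="Gene_or_gene_product")` yields a new mention with the same interval, sentence, and `found_by`, while `m` itself is unchanged.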

optimizations

Reduce the time spent matching mentions with this one simple trick!

If you look at pos-reg_template.yml, you will notice that the same trigger pattern is repeated in many rules. The same is true for neg-reg and simple-events. This can be optimized by introducing a single rule that matches the trigger and creates a mention for it (e.g., PosRegTrigger). The event rules can then use these mentions instead of a trigger pattern. Odin supports this with the following syntax:

- name: pos_reg_1
  label: PosReg
  pattern: |
    trigger:PosRegTrigger
    controlled:${controlledType} = ...
    controller:${controllerType} = ...

Note that trigger has a label and no token pattern. This tells Odin to use an existing mention as the trigger instead of trying to match a token pattern, so the token pattern is matched once instead of once per rule. That should improve Reach's runtime.
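The saving can be illustrated with a small Python simulation (not Odin code; `CountingMatcher`, `per_rule`, and `shared_trigger` are hypothetical). It counts how many times the trigger pattern is run over a sentence when every event rule matches it itself, versus when a single trigger rule runs once and the event rules reuse its mentions, as `trigger:PosRegTrigger` does.

```python
from typing import Iterable, List, Tuple

Span = Tuple[int, int]  # token interval (start, end), end exclusive


class CountingMatcher:
    """Toy trigger matcher (exact word lookup) that counts how often it runs."""

    def __init__(self, trigger_words: Iterable[str]):
        self.trigger_words = set(trigger_words)
        self.calls = 0

    def __call__(self, tokens: List[str]) -> List[Span]:
        self.calls += 1
        return [(i, i + 1) for i, t in enumerate(tokens) if t in self.trigger_words]


def per_rule(tokens: List[str], matcher: CountingMatcher, n_rules: int) -> List[List[Span]]:
    """Every event rule re-matches the trigger pattern itself."""
    return [matcher(tokens) for _ in range(n_rules)]


def shared_trigger(tokens: List[str], matcher: CountingMatcher, n_rules: int) -> List[List[Span]]:
    """One trigger rule runs once; every event rule reuses its mentions."""
    triggers = matcher(tokens)
    return [triggers for _ in range(n_rules)]
```

With ten rules, both strategies return the same trigger spans, but the matcher runs ten times in `per_rule` and once in `shared_trigger`. In a real grammar the gain depends on how expensive the trigger pattern is relative to the rest of each rule.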

If we need to add constraints to the trigger, we can do it like this:

- name: pos_reg_2
  label: PosReg
  pattern: |
    trigger = @PosRegTrigger (?!protein) # trigger should not be followed by the token protein
    controlled:${controlledType} = ...
    controller:${controllerType} = ...

This still runs a token pattern, but hopefully one simpler than the pattern used to match the triggers, so it should still be an improvement.
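A rough Python analogue of `@PosRegTrigger (?!protein)` starts from the trigger mentions found earlier and keeps only those whose next token is not `protein`. The helper `not_followed_by` is hypothetical, matches the exact word only, and is not a claim about how Odin evaluates lookaheads internally.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # token interval (start, end), end exclusive


def not_followed_by(triggers: List[Span], tokens: List[str], word: str) -> List[Span]:
    """Keep trigger spans whose following token (if any) is not `word`.

    Like a negative lookahead, the check inspects tokens[end] without
    extending the span; a trigger at the end of the sentence passes.
    """
    return [(s, e) for s, e in triggers if e >= len(tokens) or tokens[e] != word]
```

In this sketch only the cached trigger spans are examined, so the extra cost is one token comparison per trigger rather than a fresh pattern match over the whole sentence.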

Other stuff

Bindings

Shouldn't binding rules attempt to identify the result of a binding? Perhaps we could scaffold using binding event mentions as triggers (i.e., the Protein_with_site approach)?

Unicode

BioProcessor (and perhaps all processors) should normalize Unicode characters. We have already had problems with the many different Unicode hyphens.
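A minimal normalization step might apply canonical composition (NFC) and then map hyphen variants to ASCII `-`. This is a Python sketch, not BioProcessor code: `normalize_text` is a hypothetical name and the character table is an assumption to extend as needed. NFKC is deliberately avoided because it also rewrites compatibility characters such as superscripts (e.g., `Ca²⁺` would become `Ca2+`), which matters for biomedical text. Mapping the en dash is a judgment call, since it also appears in numeric ranges.

```python
import unicodedata

# Hyphen-like code points mapped to ASCII HYPHEN-MINUS (U+002D).
# This table is an assumption; extend it as new variants show up.
HYPHENS = {
    "\u00ad": "",   # SOFT HYPHEN: invisible, so drop it entirely
    "\u2010": "-",  # HYPHEN
    "\u2011": "-",  # NON-BREAKING HYPHEN
    "\u2012": "-",  # FIGURE DASH
    "\u2013": "-",  # EN DASH (also used for ranges; mapping it is a choice)
    "\u2212": "-",  # MINUS SIGN
    "\ufe63": "-",  # SMALL HYPHEN-MINUS
    "\uff0d": "-",  # FULLWIDTH HYPHEN-MINUS
}
_TABLE = str.maketrans(HYPHENS)


def normalize_text(text: str) -> str:
    """Compose characters canonically (NFC), then map hyphen variants to '-'."""
    return unicodedata.normalize("NFC", text).translate(_TABLE)
```

For example, `IL‐6` written with U+2010 and `IL-6` written with an ASCII hyphen normalize to the same string, so token patterns only need to handle `-`.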

operators

Add ^ and $ operators to surface patterns for matching the start and end of a sentence.
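The intended semantics can be sketched in Python with a hypothetical `find_matches` that supports exact-word patterns only (Odin's surface patterns are far richer). As in regular expressions, the anchors are zero-width: `^` only lets a match begin at token 0 and `$` only lets it end after the last token.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # token interval (start, end), end exclusive


def find_matches(tokens: List[str], pattern: List[str]) -> List[Span]:
    """Find spans where `pattern` (a list of exact words) matches `tokens`.

    A leading "^" anchors the match to the sentence start and a trailing "$"
    to the sentence end; the anchors match positions and consume no tokens.
    """
    anchored_start = bool(pattern) and pattern[0] == "^"
    anchored_end = bool(pattern) and pattern[-1] == "$"
    words = pattern[int(anchored_start): len(pattern) - int(anchored_end)]
    n = len(words)

    # With "^", the only candidate start is token 0.
    starts: Sequence[int] = [0] if anchored_start else range(len(tokens) - n + 1)
    matches: List[Span] = []
    for s in starts:
        e = s + n
        if e > len(tokens):
            continue
        # With "$", the match must end exactly at the sentence end.
        if anchored_end and e != len(tokens):
            continue
        if tokens[s:e] == words:
            matches.append((s, e))
    return matches
```

On `Ras activates Raf`, `^ Ras` matches `(0, 1)` and `Raf $` matches `(2, 3)`, while `^ Raf` and `Ras $` find nothing.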

Named captures

Named captures only capture a single token interval. Ideally they should be able to capture a sequence of intervals.
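Python's standard `re` module has the same limitation: a group inside a repetition reports only its last match (matching `(?:(?P<w>[A-Z]\w*)(?: and )?)+` against `Ras and Raf and MEK` gives `MEK` for `w`). The sketch below shows the desired behavior for one hard-coded pattern, `(?<arg> ARG) (and (?<arg> ARG))*`: each capture name maps to a list of intervals, and every repetition appends rather than overwrites. `match_coordinated` and its parameters are hypothetical, not Odin API.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # token interval (start, end), end exclusive


def match_coordinated(
    tokens: List[str],
    is_arg: Callable[[str], bool],
    name: str = "arg",
    conj: str = "and",
) -> Optional[Dict[str, List[Interval]]]:
    """Match `(?<name> ARG) (conj (?<name> ARG))*` starting at token 0.

    Returns every interval captured under `name`, or None if the pattern
    fails at token 0.
    """
    captures: Dict[str, List[Interval]] = defaultdict(list)
    if not tokens or not is_arg(tokens[0]):
        return None
    captures[name].append((0, 1))
    i = 1
    # Each successful repetition appends a new interval for the same name.
    while i + 1 < len(tokens) and tokens[i] == conj and is_arg(tokens[i + 1]):
        captures[name].append((i + 1, i + 2))
        i += 2
    return dict(captures)
```

With `is_arg` testing for an initial capital, `Ras and Raf and MEK bind` yields all three protein intervals under `arg` instead of only the last one.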