@@ -117,7 +117,7 @@ additional step to the process. Previously, the core steps for running linkage w
3. Comparisons (also known as evaluation)
4. Aggregation and prediction
- With the new approach, a "cleaning" step is added between steps 1 and 2. While the
+ With the new approach, "cleaning" steps are added between steps 1-2 and 3-4. While the
computational overhead of this additional step is minimal, the increased complexity is a
concern. Each added step makes the system more challenging to evolve and harder for both
users and developers to understand.
@@ -129,17 +129,20 @@ certain matches were made.
## Implementation Plan
For the purposes of this RFC, we will not be overly prescriptive about the implementation
- details. However, the work can be broadly divided into three tasks:
+ details. However, the work can be broadly divided into four tasks:
1. A new `NAME` feature will be created that will allow us to specify skip conditions
for the entirety of the name specified. (This likely won't be used for evaluation,
as it's still preferable to compare the first and last names separately, but users
will have that option.)
- 2. Modify the existing Algorithm schema to include the new `skip_values` attribute,
+ 2. Modify the existing Algorithm schema to include the new `skip_values` attribute,
along with parsing these values and storing the specified conditions.
3. Implement a new cleaning step that takes the incoming data payload and a list of skip
conditions, then returns a copy of the data payload with placeholder values removed.
This cleaned copy will be used for blocking, evaluation, and aggregation, while the
original incoming payload will be retained for persistence.
+ 4. Update the linking algorithm to clean the incoming data payload before blocking, **and**
+ clean the MPI patient records after blocking. It’s crucial to sanitize both incoming
+ and existing data, as unclean values on either side could result in invalid comparisons.
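
As a rough illustration of the cleaning step described in task 3, the following is a minimal sketch, not the RFC's actual implementation. The `SkipCondition` class, the `clean_payload` function, the `"*"` wildcard, the flat dictionary payload, and the field names are all hypothetical and chosen only for this example; the real `skip_values` schema may look quite different.

```python
import copy
from dataclasses import dataclass


@dataclass(frozen=True)
class SkipCondition:
    """Hypothetical skip condition: placeholder values for one feature."""

    feature: str  # field name, or "*" to apply to every field (assumption)
    values: frozenset  # placeholder values, stored in lowercase

    def matches(self, feature: str, value: object) -> bool:
        # The condition applies only to its own feature, or to all via "*".
        if self.feature not in ("*", feature):
            return False
        # Compare case-insensitively, ignoring surrounding whitespace.
        return isinstance(value, str) and value.strip().lower() in self.values


def clean_payload(payload: dict, conditions: list) -> dict:
    """Return a copy of ``payload`` with placeholder values removed.

    The original payload is left untouched so it can still be persisted.
    """
    cleaned = copy.deepcopy(payload)
    for feature, value in payload.items():
        if any(cond.matches(feature, value) for cond in conditions):
            del cleaned[feature]
    return cleaned


# Example usage with made-up field names and placeholder values.
conditions = [
    SkipCondition("LAST_NAME", frozenset({"doe", "unknown"})),
    SkipCondition("*", frozenset({"n/a"})),
]
incoming = {
    "FIRST_NAME": "John",
    "LAST_NAME": "Doe",
    "ZIP": "N/A",
    "BIRTHDATE": "1980-01-01",
}
cleaned = clean_payload(incoming, conditions)
```

Under these assumptions, the same function could be applied to the incoming payload before blocking and to each MPI patient record returned by blocking, so that both sides of a comparison are sanitized in the same way.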
## Unresolved Questions