feat(website): Column remapping when submitting metadata files #3478
base: main
I think the logic needs to be reversed: one row for each "Column in your file", for which you can choose the inputField to map to. This is because the uploaded file often has only a few columns from the selection, say 4.
ok!
What happens if you have a column that is not mentioned in the |
Files are generally not rejected. The column remapping, however, would force you to either map a column to a column that is specified in the input fields, or to just not include the column. You can also ignore column remapping altogether and still submit files with whatever structure you like to the backend.
Ah, so it's possible to skip the remapping? (Sorry, I haven't tried out the feature yet!)
yes. Remapping is only done if you click the button to add a column mapping. I also took care to not decompress TSV files if no column mapping is to be applied! (I still need to test if this actually works the way I implemented it)
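To make the behaviour described above concrete, here is a minimal sketch of an optional remapping step. It is not the PR's implementation: `ColumnMapping`, `remapTable`, and the sample column names `id`, `country`, and `extra` are hypothetical, and it assumes the file has already been parsed into rows of cells. Columns mapped to `null` (or not mentioned in the mapping) are left out, and a `null` mapping returns the table untouched.

```typescript
// Hypothetical types and names for illustration only.
// A source column maps to a target field name, or to null to drop it.
type ColumnMapping = Map<string, string | null>;

function remapTable(rows: string[][], mapping: ColumnMapping | null): string[][] {
    // No mapping requested: hand the rows back unchanged.
    if (mapping === null || rows.length === 0) {
        return rows;
    }
    const header = rows[0];
    // Remember which source columns survive and what they are renamed to.
    const kept: Array<{ index: number; name: string }> = [];
    header.forEach((column, index) => {
        const target = mapping.get(column);
        if (target !== undefined && target !== null) {
            kept.push({ index, name: target });
        }
    });
    const newHeader = kept.map((k) => k.name);
    const newBody = rows.slice(1).map((row) => kept.map((k) => row[k.index] ?? ''));
    return [newHeader, ...newBody];
}

const table: string[][] = [
    ['id', 'country', 'extra'],
    ['s1', 'USA', 'x'],
];
const mapping: ColumnMapping = new Map<string, string | null>([
    ['id', 'submissionId'],
    ['country', 'geoLocCountry'],
    ['extra', null],
]);
const remapped = remapTable(table, mapping);
```

Treating "no mapping" as a separate early return is what lets the caller skip any parsing or decompression work when the user never opened the mapping dialog.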
I just tried this on the preview. Overall, the flow seems sensible to me. But there were a few things I stumbled on.
Ideally, I think we should have required columns at the top of the list. And if easy to do, we could highlight columns that are already selected with a different background color or similar?
Beyond that, I think it would be good to merge this in asap after some testing.
I checked about "And if easy to do, we could highlight columns that are already selected with a different background color or similar?" -> Unfortunately the options of the native select element cannot be styled. I experimented with putting UTF-8 symbols next to the text but wasn't happy with the result. I think it's quite helpful, though, to see which ones are already mapped. A proper solution would, however, involve getting rid of the native select element. Probably not for this PR. EDIT: I ended up actually doing this and replaced the native select element. It is a big usability improvement, I think!
Co-authored-by: Chaoran Chen <[email protected]>
@anna-parker preview working again!
website/src/components/Submission/FileUpload/ColumnMappingModal.tsx
/* Apply this mapping to a TSV file, returning a new file with remapped columns. */
public async applyTo(tsvFile: ProcessedFile): Promise<File> {
    const text = await tsvFile.text();
    const inputRows = text.trim().split('\n');
Oh, you're not using a full TSV parser? This would cause problems if there is a multiline cell which, I think, is a reasonable use case for fields like version comment. The following file doesn't work:
submissionId versionComment geoLocCountry sampleCollectionDate
test1 "Paragraph 1...
...and paragraph 2" USA 2011-07-18
test2 "Paragraph 1...
...and paragraph 2" USA 2011-07-18
If we add column remapping, it will throw the error:
Without the remapping, it works fine.
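To illustrate why splitting on `'\n'` breaks on such a file, here is a small sketch that compares the naive split with a quote-aware row splitter. The helper `splitTsvRows` is hypothetical and deliberately simplified (it treats every double quote as toggling the quoted state and does not handle `\r\n`); it is not the library the PR ended up using, and a real TSV parser is the better choice.

```typescript
// Hypothetical helper for illustration: split TSV text into rows while
// keeping newlines that occur inside double-quoted cells.
function splitTsvRows(text: string): string[] {
    const rows: string[] = [];
    let current = '';
    let inQuotes = false;
    for (let i = 0; i < text.length; i++) {
        const ch = text[i];
        if (ch === '"') {
            // An escaped quote ("") toggles twice, so the state is preserved.
            inQuotes = !inQuotes;
            current += ch;
        } else if (ch === '\n' && !inQuotes) {
            rows.push(current);
            current = '';
        } else {
            current += ch;
        }
    }
    if (current !== '') {
        rows.push(current);
    }
    return rows;
}

const sample =
    'submissionId\tversionComment\tgeoLocCountry\n' +
    'test1\t"Paragraph 1...\n...and paragraph 2"\tUSA\n';

// The naive split cuts the quoted cell in half and yields three "rows".
const naiveRows = sample.trim().split('\n');
// The quote-aware split yields the header plus one data row.
const awareRows = splitTsvRows(sample);
```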
Good point! I didn't know that multi line cells were possible.
I'm now using a library and also adapted the test: d87c684
update: used a different lib.
Hey Felix, this is looking really great! I found one bug, if you could fix it. I was also wondering if you could add a couple more desired fields to the values.yaml to check this works correctly. This is also a place where there could be duplications: a field can be desired and also sample-specific, so I wanted to test that duplications are handled correctly there :-)
Update: you can see a list of desired fields here: https://pathoplexus.org/docs/concepts/metadataformat, I was also adding them here: https://github.com/pathoplexus/pathoplexus/pull/335/files#diff-21a8c5ff083727625bb435b82d73142d7b8bd0a857948a5743aaab1f96b29fd7
sourceColumns.forEach((sourceColumn) => {
    const bestMatch = this.getBestMatchingTargetColumn(sourceColumn, availableFields);
    mapping.set(sourceColumn, bestMatch);
    availableFields = availableFields.filter((field) => field.name !== bestMatch);
Hey! I found another issue here: you need to do exact matching first and only then fuzzy matching. For example, if my list of columns is [geoLocAdmin3, geoLocAdmin1], then geoLocAdmin3 gets mapped to geoLocAdmin1 and geoLocAdmin1 gets mapped to geoLocAdmin2. I reproduced this on the preview.
It might also make sense to mark fields that have not been mapped exactly (e.g. score under 1) with a different color, so that the user quickly sees that a fuzzy mapping happened. Otherwise they might miss something being mapped incorrectly.
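A rough sketch of the two-pass idea, for reference. This is not the code from the PR: `autoMap`, `prefixSimilarity`, and the `Similarity` type are hypothetical names, and the prefix-based score merely stands in for whatever fuzzy score the real implementation uses. Exact names are claimed first, so a fuzzy match can no longer take a field that another column matches exactly.

```typescript
// Hypothetical similarity signature: higher means more similar.
type Similarity = (a: string, b: string) => number;

// Stand-in score for the example: shared prefix length over the longer length.
const prefixSimilarity: Similarity = (a, b) => {
    let shared = 0;
    while (shared < a.length && shared < b.length && a[shared] === b[shared]) {
        shared++;
    }
    return shared / Math.max(a.length, b.length, 1);
};

function autoMap(
    sourceColumns: string[],
    targetFields: string[],
    similarity: Similarity,
): Map<string, string | null> {
    const mapping = new Map<string, string | null>();
    let available = targetFields.slice();

    // Pass 1: reserve exact name matches before any fuzzy matching happens.
    for (const source of sourceColumns) {
        if (available.indexOf(source) !== -1) {
            mapping.set(source, source);
            available = available.filter((field) => field !== source);
        }
    }

    // Pass 2: fuzzy-match the remaining columns against what is left.
    for (const source of sourceColumns) {
        if (mapping.has(source)) {
            continue;
        }
        let best: string | null = null;
        let bestScore = -Infinity;
        for (const field of available) {
            const score = similarity(source, field);
            if (score > bestScore) {
                best = field;
                bestScore = score;
            }
        }
        mapping.set(source, best);
        available = available.filter((field) => field !== best);
    }
    return mapping;
}

const result = autoMap(
    ['geoLocAdmin3', 'geoLocAdmin1'],
    ['geoLocAdmin1', 'geoLocAdmin2'],
    prefixSimilarity,
);
```

With the column order from the comment above, `geoLocAdmin1` keeps its exact match and `geoLocAdmin3` falls back to the remaining field. A caller could also compare each source name with its assigned target to flag non-exact matches in the UI.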
Ha! I had a murky feeling about the mapping, good thing you spotted it. I've fixed it now. Also, the non-exact matches are now in italics, does that work?
I also added all the 'desired' flags!
resolves #3432
preview URL: https://column-remapping.loculus.org/