WIP: generic midi parser #13
Data copied directly from the table that can be found online at: https://staff.aist.go.jp/m.goto/RWC-MDB/rwc-mdb-p.html
Implemented process_folder, create_jams, fill_annotation_metadata and fill_file_metadata
Calling all JAMSers for comments:
./midi_parser.py ~/midi_folder -o ~/jams_folder --curator_name "joe blogs" --curator_email "joe@blogs.com" --corpus "joes awesome midi file collection"
Is this the first generic parser, or do we have precedent? Also, this would only be useful for metadata that applies to the entire folder being processed, not individual files (so, for example, the user wouldn't be able to add the song name for each midi file). Alternatively, though this might be overkill, we could specify some type of CSV format such that the filename is used as an index, and then every column header needs to match a
Thoughts?
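To make the CSV idea above concrete, here is a minimal sketch of how per-file metadata could be layered over the folder-wide command-line values. Everything in it is an assumption for illustration: the helper names `load_metadata_csv` and `metadata_for`, the `filename` index column, and the example column names are hypothetical and are not part of JAMS or of this PR.

```python
import csv
import os


def load_metadata_csv(csv_path, index_column='filename'):
    """Map each file's basename to a dict of its remaining CSV columns.

    The 'filename' index column is an assumed convention, not a JAMS one.
    """
    metadata = {}
    with open(csv_path, newline='') as handle:
        for row in csv.DictReader(handle):
            key = os.path.basename(row.pop(index_column))
            metadata[key] = row
    return metadata


def metadata_for(midi_path, metadata, defaults=None):
    """Merge folder-wide defaults with per-file values; per-file wins.

    Empty CSV cells are skipped so they do not erase a folder-wide default.
    """
    merged = dict(defaults or {})
    per_file = metadata.get(os.path.basename(midi_path), {})
    merged.update({k: v for k, v in per_file.items() if v})
    return merged
```

With this layout the folder-wide flags (curator, corpus) would act as defaults, and a row in the CSV would only override or add fields for the file it names. Files without a row simply get the defaults.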
Anyone care to weigh in? (e.g. @bmcfee @ejhumphrey @urinieto @rabitt)
Great stuff! 😄 Wouldn't one expect a meta event to contain the song name typically? And in case it doesn't exist then maybe just use the filename as fallback? Also, I'm not precisely sure what you consider a curator, but there's a copyright field in midi too, right? Couldn't that be used?
# Collect all MIDI annotations.
midi_files = jams.util.find_with_extension(midi_dir, '.mid', depth=1)
midi_files.extend(jams.util.find_with_extension(midi_dir, '.MID', depth=1))
Instead of two lines, shouldn't this really be a case-insensitive search (e.g. song.MiD is actually a valid filename on some OSes)? Also, .midi is a very common extension (even Windows doesn't rely on FAT anymore).
This could be more elegant, yes; it was just the simplest way to use find_with_extension without modification. In the short term, iterating over a list of accepted file extensions (mid, MID, midi, MIDI) would do the trick, though perhaps building case insensitivity into find_with_extension would be a nice thing to have, @bmcfee?
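For reference, a case-insensitive match can be sketched with the standard library alone, without touching find_with_extension. The helper name `find_midi_files` and the `MIDI_EXTENSIONS` set are hypothetical, and the sketch only looks at files directly inside the folder, which is an assumption about what depth=1 is meant to cover.

```python
import os

# Hypothetical set of accepted extensions, compared in lower case.
MIDI_EXTENSIONS = {'.mid', '.midi'}


def find_midi_files(midi_dir, extensions=MIDI_EXTENSIONS):
    """Return sorted paths of files directly inside midi_dir whose
    extension matches one of `extensions`, ignoring case."""
    matches = []
    for name in os.listdir(midi_dir):
        path = os.path.join(midi_dir, name)
        ext = os.path.splitext(name)[1].lower()
        if os.path.isfile(path) and ext in extensions:
            matches.append(path)
    return sorted(matches)
```

Lower-casing the extension before the comparison covers .mid, .MID, .MiD, .midi and so on in a single pass, so the list of accepted extensions does not need one entry per capitalization.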
@carlthome the messiness and non-uniformity of midi files makes me fairly averse to any solution that tries to scoop metadata from the midi file itself. And that still wouldn't support filling in fields that should be common to all midi files in a collection, such as curator or dataset. When I have a moment I'll implement my idea of a metadata CSV; I think it's a nice solution that will extend beyond this specific converter to any JAMS converter.
Curator is the person who collected all the audio (or midi) files into a "collection" or "dataset", so the copyright field in the midi file is not exactly the information you're looking for. It's actually a good example of a field that will never exist inside the midi files themselves, since it's independent of the files.
In general, absolutely not. Text meta events are 100% optional so are sometimes not included at all, and when they are included there are tons of things which are included as text meta events (composer, copyright, random things about the person who transcribed the song, year, etc.). The song name/artist are only one rare thing that can appear, and they usually don't. The other meta events are also optional and somewhat rare, and are not really appropriate for storing the song name.
In general, no. In my own collection of ~180,000 unique MIDI files, filenames provided reasonable metadata about 10% of the time. Incidentally, these issues are what motivated my thesis work, so I've thought about it a bit. I discuss them briefly in my ISMIR 2015 paper, and in much more depth in my thesis and ISMIR 2016 submission; I'd be happy to send you any of those things.
There is, but it is optional and somewhat rarely used. When it is used, it generally does not indicate the person who transcribed the MIDI, and when it does, there is no standard for how the author is attributed. Here's a short random sample of copyright messages from my MIDI collection:
Ah, then it seems way more useful. 👍
I'd love to read up on that! 😄 cthome@kth.se
MIDI sure seems like quite the historical mess.
A generic converter that takes any folder of MIDI files and converts them to JAMS files.