
WIP: generic midi parser #13

Open
justinsalamon wants to merge 7 commits into master from midiparser

Conversation

@justinsalamon
Contributor

A generic converter that takes any folder of MIDI files and converts them to JAMS files.

@justinsalamon
Contributor Author

Calling all JAMSers for comments:
Since this is a generic parser, we don't have any metadata a priori. One option would be to add optional command-line arguments so that the user can provide some metadata, along these lines:

./midi_parser.py ~/midi_folder -o ~/jams_folder --curator_name "joe blogs" --curator_email "joe@blogs.com" --corpus "joes awesome midi file collection"

Is this the first generic parser, or do we have a precedent? Note that this would only be useful for metadata that applies to the entire folder being processed, not to individual files (for example, the user wouldn't be able to add the song name for each MIDI file).
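
A minimal sketch of how such folder-level options might be wired up with `argparse`. The flag names follow the example command above; the long form `--output_dir`, the function names, and the shape of the returned dictionary are assumptions made for illustration, not the parser's actual interface.

```python
import argparse


def build_parser():
    """Build a CLI parser for folder-level (collection-wide) metadata."""
    parser = argparse.ArgumentParser(
        description="Convert a folder of MIDI files to JAMS files.")
    parser.add_argument("midi_dir", help="folder containing the MIDI files")
    # "--output_dir" is a hypothetical long name for the -o flag.
    parser.add_argument("-o", "--output_dir", default=".",
                        help="folder where the JAMS files are written")
    # Optional metadata shared by every file in the folder.
    parser.add_argument("--curator_name", default="")
    parser.add_argument("--curator_email", default="")
    parser.add_argument("--corpus", default="")
    return parser


def collection_metadata(args):
    """Gather the metadata that applies to the whole folder."""
    return {
        "curator": {"name": args.curator_name, "email": args.curator_email},
        "corpus": args.corpus,
    }


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(collection_metadata(args))
```

Every option defaults to an empty string, so the converter would still run when no metadata is supplied.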

Alternatively, though this might be overkill, we could specify a CSV format in which the filename is used as an index and every column header has to match a file_metadata or annotation_metadata field name; each column would then populate that field for that file. Any column header that doesn't match one of the predefined field names could go in the sandbox.
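
A rough sketch of the CSV idea, using only the standard library. The two field-name sets below are illustrative placeholders rather than the authoritative JAMS schema, and the `filename` index column and the function name are likewise assumptions.

```python
import csv
import io

# Illustrative field names only; the real lists would come from the JAMS schema.
FILE_METADATA_FIELDS = {"title", "artist", "release", "duration"}
ANNOTATION_METADATA_FIELDS = {"corpus", "version", "data_source"}


def load_metadata_csv(handle, index_column="filename"):
    """Map each filename to its file_metadata, annotation_metadata and sandbox."""
    table = {}
    for row in csv.DictReader(handle):
        filename = row.pop(index_column)
        entry = {"file_metadata": {}, "annotation_metadata": {}, "sandbox": {}}
        for column, value in row.items():
            if column in FILE_METADATA_FIELDS:
                entry["file_metadata"][column] = value
            elif column in ANNOTATION_METADATA_FIELDS:
                entry["annotation_metadata"][column] = value
            else:
                # Unrecognized headers are kept in the sandbox, not dropped.
                entry["sandbox"][column] = value
        table[filename] = entry
    return table


if __name__ == "__main__":
    sample = io.StringIO(
        "filename,title,corpus,transcriber\n"
        "song.mid,My Song,joes collection,anon\n")
    print(load_metadata_csv(sample))
```

Because the lookup is keyed on filename, a converter could fall back to the folder-level metadata for any file that has no row in the CSV.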

Thoughts?

@justinsalamon
Contributor Author

Anyone care to weigh in? (e.g. @bmcfee @ejhumphrey @urinieto @rabitt)

@carlthome

carlthome commented May 7, 2016

Great stuff! 😄

Wouldn't one expect a meta event to contain the song name typically? And in case it doesn't exist then maybe just use the filename as fallback?

Also, I'm not precisely sure what you consider a curator, but there's a copyright field in midi too, right? Couldn't that be used?

Comment thread parsers/midi_parser.py

# Collect all MIDI annotations.
midi_files = jams.util.find_with_extension(midi_dir, '.mid', depth=1)
midi_files.extend(jams.util.find_with_extension(midi_dir, '.MID', depth=1))

Instead of two lines, shouldn't this really be a case-insensitive search (e.g. song.MiD is a valid filename on some OSes)? Also, .midi is a very common extension (even Windows doesn't rely on FAT anymore).

Contributor Author

This could be more elegant, yes; it was just the simplest way to use find_with_extension without modification. In the short term, iterating over a list of accepted file extensions (mid, MID, midi, MIDI) would do the trick, though perhaps building case insensitivity into find_with_extension would be a nice thing to have, @bmcfee?
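
For illustration, a case-insensitive lookup can be sketched with the standard library alone, without touching find_with_extension. The helper name `find_midi_files` is hypothetical, and the sketch assumes only the top level of the folder should be searched.

```python
import os

# Accepted extensions, compared in lower case.
MIDI_EXTENSIONS = (".mid", ".midi")


def find_midi_files(midi_dir, extensions=MIDI_EXTENSIONS):
    """Return sorted paths of files in midi_dir whose extension matches,
    ignoring case (song.mid, song.MID and song.MiD all match)."""
    matches = []
    for name in os.listdir(midi_dir):
        path = os.path.join(midi_dir, name)
        extension = os.path.splitext(name)[1].lower()
        if os.path.isfile(path) and extension in extensions:
            matches.append(path)
    return sorted(matches)
```

Lower-casing the extension before the comparison covers every capitalization with one check, so the list of accepted extensions only needs one entry per spelling.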

@justinsalamon
Contributor Author

> Wouldn't one expect a meta event to contain the song name typically? And in case it doesn't exist then maybe just use the filename as fallback?

@carlthome the messiness and non-uniformity of MIDI files make me fairly averse to any solution that tries to scoop metadata from the MIDI file itself. And that still wouldn't support filling in fields that should be common to all MIDI files in a collection, such as curator or dataset. When I have a moment I'll implement my idea of a metadata CSV; I think it's a nice solution that will extend beyond this specific converter to any JAMS converter.

> Also, I'm not precisely sure what you consider a curator, but there's a copyright field in midi too, right? Couldn't that be used?

The curator is the person who collected all the audio (or MIDI) files into a "collection" or "dataset", so the copyright field in the MIDI file is not exactly the information you're looking for. Curator is actually a good example of a field that will never exist inside the MIDI files themselves, since it's independent of the files.

@craffel

craffel commented May 9, 2016

> Wouldn't one expect a meta event to contain the song name typically?

In general, absolutely not. Text meta events are 100% optional, so they are sometimes not included at all, and when they are included, all sorts of things end up in them (composer, copyright, random things about the person who transcribed the song, year, etc.). The song name and artist are just one of the things that can appear, and they usually don't. The other meta events are also optional and somewhat rare, and are not really appropriate for storing the song name.

> And in case it doesn't exist then maybe just use the filename as fallback?

In general, no. In my own collection of ~180,000 unique MIDI files, filenames provided reasonable metadata about 10% of the time.

Incidentally, these issues are what motivated my thesis work, so I've thought about it a bit. I discuss them briefly in my ISMIR 2015 paper, and in much more depth in my thesis and ISMIR 2016 submission. I'd be happy to send you any of those.

> Also, I'm not precisely sure what you consider a curator, but there's a copyright field in midi too, right? Couldn't that be used?

There is, but it is optional and somewhat rarely used. When it is used, it generally does not indicate the person who transcribed the MIDI, and when it does, there is no standard for how the author is attributed. Here's a short random sample of copyright messages from my MIDI collection:

MIDI ©1995 Robert C. Goodyear
(C)1993 Backbeat Studio
From the LP "Hi-Fi In Focus" (RCA-1957)
TablEdited by Russ Jenkins
russ_jenkins@lineone.net
www.songgalaxy.com
This editon Copyright © 2000 by EDK
(C)1995 by MdB Software
Copyright © 2000 di fiorellaearmando@panet.it
Ichigo's Sheet Music - http://ichigos.com/
All Rights Reserved
All Rights Reserved
��� L��
Copyright © <Year> by <Name> All Rights Reserved
(c)Hal Leonard Publishing
© DJ CALM    2005               http://www.scootertrace.ru
Lyrics introduced by Canta Brasil (http://www.geocities.com/lucialeite)
Copyright TopList Team for FoxMusic
This Arrangment Copyright ©2000 (Dec. 19) by Benjamin Robert Tubb
YAMAHA 1995

@carlthome

> extend beyond this specific converter to any JAMS converter.

Ah, then it seems way more useful. 👍

> Incidentally, these issues are what motivated my thesis work, so I've thought about it a bit.

I'd love to read up on that! 😄 cthome@kth.se. MIDI sure seems like quite the historical mess.

3 participants