WIP: RWC POP melody parser#12
Conversation
|
Hey everyone, do you think we could have a folder inside parsers called "data", which contains a folder per parser (when needed) with any data files the parser requires? In my case the RWC Pop melody metadata lives online in a table, which I've converted to a csv file, so I'd like to keep this csv file in the repo. |
|
On a related note, this table also includes the track duration (resolution in seconds only though). @bmcfee do you think it'd make sense to populate the duration field of the JAMS file based on this table? I could do it from the audio, but RWC is distributed via CD's (at least it was when I got a copy), which means the exact duration of the audio I have might depend on whatever software I used to rip the CD's in the first place, and I have no way of recovering that information. I also don't have the original CD's at hand. So I was thinking the durations reported online might be my best in this case? |
Yeah, in this case, I think that's the best you can hope for. RWC is distributed on cds, but I thought they were data (not audio) cds? I may be mistaken though. |
I don't remember TBH, you may very well be right. Still, I don't remember by which process I obtained the WAV files that I have... so I'll stick to the times reported online for now. |
Data copied directly from the table that can be found online at: https://staff.aist.go.jp/m.goto/RWC-MDB/rwc-mdb-p.html
Implemented process_folder, create_jams, fill_annotation_metadata and fill_file_metadata
|
I think this is good to go, anyone care to CR it before I merge the PR? |
|
p.s. - I still need to get permission to include the converted jams files in the repo, working on that. |
| jam.file_metadata.title = metadata['Title'][n_track] | ||
|
|
||
| d_str = metadata['Length'][n_track] | ||
| jam.file_metadata.duration = (float(d_str.split(":")[0]) * 60 + |
There was a problem hiding this comment.
this is pretty dirty. Since we're already using pandas here, we can do something like this instead:
In [4]: pd.Timedelta('0:2:34').total_seconds()
Out[4]: 154.0(only problem is that you'll have to pad in hours, but i generally prefer that to reinventing second-parsing.)
|
@justinsalamon C has been R'ed. Mostly minor style issues, though the tempfile cleanup is a real problem. Otherwise LGTM. |
See #1 (comment)