mappify should set missing values (for rows shorter than header) to nil #65
Comments
Hi @ABeltramo. Thanks for bringing this up. This is actually how I personally expect this to behave, at least by default. Ultimately, I think it's a consequence of how the underlying CSV grammar library parses into vectors and doesn't remember what row dimensions it has seen. I'd consider a processing fn/option that produces the behavior you're after, but I don't think I'd make it the default. Does that seem reasonable? Thanks again.
Yeah, sure. I was mistakenly thinking that, given a header, all the rows would have the same number of elements no matter what the content of the CSV is.
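For illustration, here is a minimal sketch of what such a processing fn could look like. `pad-rows` is a hypothetical helper, not part of the library's API; it assumes the header and the data rows have already been separated, and it only pads short rows (longer rows are left untouched).

```clojure
;; Hypothetical helper, NOT part of the library: pads short rows with nil
;; so that every row has at least as many entries as the header.
(defn pad-rows
  [header rows]
  (let [n (count header)]
    (map (fn [row]
           ;; Append as many nils as the row is missing; never a negative count.
           (into (vec row) (repeat (max 0 (- n (count row))) nil)))
         rows)))

(def header [:a :b :c])
(def rows [["1"] ["2" "3"] ["4" "5" "6"]])

;; Pair each padded row with the header; every resulting map has all three keys.
(def padded-maps
  (map #(zipmap header %) (pad-rows header rows)))
```

Keeping this as a separate, opt-in step leaves the default output untouched, so callers who want the compact maps are not affected.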
Hey @ABeltramo. Sorry for letting this hang for so long. Is there a reason why just using Thanks!
Hi @metasoarous,
Given a large enough CSV file, you'll get an exception when creating a struct bigger than X. Unfortunately, I don't have a CSV file to reproduce this, and I'm not even sure this was the issue. I think you'd catch this by using spec generators instead of trivial examples in the unit tests. If I can find the time, I'll try something.
Thanks for this feedback @ABeltramo. I was just able to create a struct with 10k fields and didn't hit any issues. I think you're right, though, that some spec-based generative testing would be a good idea before taking this plunge. Thanks again.
Thanks again for submitting this @ABeltramo. I've decided that for now I'm going to close this issue. To my mind, nothing is broken, and this is actually how I'd expect the function to behave, at least by default. There are absolutely cases (triangular matrices, for example) where you wouldn't want to take up any more space than necessary in the dicts you parse. I am not categorically opposed to an option that would preserve kv pairs (v= Please feel free to reopen if you disagree or have more pertinent information here. Thanks again.
I found it weird and undocumented, but with the following malformed CSV (the first two lines have fewer commas than the header)
Using the `mappify` method will produce the following:
As you can see, some rows are smaller than others: the missing columns are totally absent from the mappified results. I was expecting all rows to have the same size, with `nil` values when something is missing from the CSV.
Bear in mind that using `{:structs true}` will produce the expected results:
I have some other issues using structs, but I will probably open another issue when I can get a reproducible environment.
I'll open a pull request with the fix I have made for this.
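The difference between the two modes described in this report can be reproduced with core Clojure alone. The sketch below is not the library's actual implementation; it assumes that the plain-map path pairs header and row positionally (as `clojure.core/zipmap` does) and that the `{:structs true}` path builds on `create-struct` and `struct`. The header and row values are made up for illustration.

```clojure
(def header [:a :b :c])
(def short-row ["1" "2"]) ; one value fewer than the header

;; Positional pairing stops at the shorter collection,
;; so the :c key is absent from the result.
(def as-map (zipmap header short-row))

;; A struct is created from a fixed basis of keys; values that are
;; not supplied default to nil, so :c is present with a nil value.
(def basis (apply create-struct header))
(def as-struct (apply struct basis short-row))
```

Here `as-map` has two entries, while `as-struct` has three, with `:c` mapped to `nil`.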