-
Notifications
You must be signed in to change notification settings - Fork 1.7k
escape special chars in csv and json output. #802
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
1f56de0
2cb120a
5db5aab
807edf4
f3ceed0
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
|
|
@@ -160,13 +160,19 @@ std::string StrFormat(const char* format, ...) { | |||||||
| return tmp; | ||||||||
| } | ||||||||
|
|
||||||||
| void ReplaceAll(std::string* str, const std::string& from, | ||||||||
| const std::string& to) { | ||||||||
| std::size_t start = 0; | ||||||||
| while ((start = str->find(from, start)) != std::string::npos) { | ||||||||
| str->replace(start, from.length(), to); | ||||||||
| start += to.length(); | ||||||||
| } | ||||||||
| std::string StrEscape(const std::string & s) { | ||||||||
| std::string tmp; | ||||||||
|
||||||||
| std::string tmp; | |
| std::string tmp; | |
| tmp.reserve(s.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you do realize that std::string is already designed for this kind of concatenation? this is trading 0-1-2 reallocs (depending on the length of the key) with a guaranteed two calls (strlen,realloc), not sure if there's a real win here, unless it's an issue of coding style?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We know the escaped string will be at least not shorter than the input string.
clang/gcc will not magically insert that pre-allocation.
Therefore, if we don't do it explicitly, we are relying on the sane behavior
of the std::string implementation in the C++ std library that is used.
- We will likely always have some allocs if the string after escaping does not fit within small-size-optimization.
- Without pre-allocation, if the source string does not fit within small-size-optimization,
and there won't be extra escape symbols, we will have allocations. - WITH pre-allocation, if the source string does not fit within small-size-optimization,
and there won't be extra escape symbols, we will NOT have allocations. WIN.
It is well-known that reserve() is good, and it is quite obvious it won't hurt here.
I do not understand your point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please link where it's documented what should be escaped in CSV?
I'm probably looking in wrong places, https://tools.ietf.org/html/rfc4180 only says to replace
"with""?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there even a standard for csv? at least \r and \n need to be escaped, \b and \f probably should be, \t is a matter of taste- the tests dont check for it in any case. and anyway, isn't csv support being deprecated #500 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not seeing that in a quick test with loading
into libreoffice:

And replacing
\nwith "\n":breaks it:

Thus yes, i'm curious as to motivation/documentation.
Yep
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right, \t and \ should proabaly not be escaped for csv. the other ones I think the "right thing" is to escape them. \b because if you're lookign at csv, you typically want to see it on a terminal, and any \b would end up hidden, and they're more often than not in there by mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that JSON changes are good.
The issue does not mention anything about CSV; are you affected by CSV side of things?
How about a middle ground solution then?
reserve()suggestion.