Skip to content
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 45 additions & 4 deletions include/scrimmage/common/CSV.h
Original file line number Diff line number Diff line change
Expand Up @@ -34,18 +34,54 @@
#define INCLUDE_SCRIMMAGE_COMMON_CSV_H_

#include <fstream>
#include <iomanip>
#include <iostream>
#include <list>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <variant>

#include "scrimmage/parse/ParseUtils.h"

namespace {
struct StringifyVisitor {

bool double_is_fixed = true;
bool double_is_scientific = true;
int double_precision = 13;

std::string operator()(int value) const { return std::to_string(value); }
std::string operator()(unsigned int value) const { return std::to_string(value); }
std::string operator()(size_t value) const { return std::to_string(value); }
std::string operator()(long value) const { return std::to_string(value); }
std::string operator()(const std::string& value) const { return value; }
std::string operator()(bool value) const { return value ? "True" : "False"; }
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd opt for "true" and "false" (not capitalized). There's not a huge reason for this, but there's also not a huge reason for capitalizing, so my "break the tie" reasoning" is that if people are ever typing it, that's one less thing to mess up. If it's read case-insensitive (which I would hope it is), then no reason to throw in a capital and make people wonder. Python is the only thing I know of that has case-sensitive "True" and "False". No reason to bring that into a CSV.

std::string operator()(double value) const {
// default precision values for double are not enough in many cases
std::ostringstream conv;
if (double_is_fixed) {
conv << std::fixed;
}
if (double_is_scientific) {
conv << std::scientific;
}
conv << std::setprecision(double_precision) << value;
return conv.str();
}
};
} // namespace

namespace scrimmage {

class CSV {
public:
typedef std::list<std::string> Headers;
typedef std::list<std::pair<std::string, double>> Pairs;
typedef std::variant<bool, size_t, unsigned int, long, int, double, std::string>
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given possible ripple effects, I'm not sure if what I'm about to say is feasible. Actually, if all you had was double before, then all these options should be new, so it wouldn't affect existing code any differently, so I'll go with that assumption. Try to erase unsized types like unsigned int, long, int, etc. from usage. Prefer sized types. What's an int? It has a minimum size, but you don't really know because those are implementation-defined. Same for long. I'm guessing for int you're expecting a 32-bit value, so use int32_t. I'm guessing long is expected to be a 64-bit value, so use int64_t.

And if these are for data files, you can probably get away without a size_t. Depends on what you're expecting to represent. A size_t isn't "how many things", but is specifically a value used to represent memory sizes, so it's tied to the size of a pointer. That's why they're used as offsets in arrays -- not because they're counting items, but because they're being used as memory offsets. So while you use it in code, I'm not sure if you'd want that kind of semantic exposed as an option in a data file.

PossibleVariantTypes;
typedef std::list<std::pair<std::string, PossibleVariantTypes>> Pairs;

~CSV();

Expand Down Expand Up @@ -75,7 +111,11 @@ class CSV {

size_t rows();

double at(int row, const std::string& header);
template <class T1>
T1 at(int row, const std::string& header) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see -- now this is where I'd use a size_t, because it's an index. But this ties into existing underlying definitions, so I wouldn't change that now. (Actually, taking a closer look... the table uses an int as a key into a map... wut? Why is it not a vector? And can you have negative column/row indexes? Weird... still -- not something for this update.)

const int column = column_headers_.at(header);
return convert<T1>(table_.at(row).at(column));
}

friend std::ostream& operator<<(std::ostream& os, const CSV& csv);

Expand All @@ -87,6 +127,8 @@ class CSV {
void set_double_scientific(bool is_scientific) { double_is_scientific_ = is_scientific; }

protected:
std::string get_csv_string(const PossibleVariantTypes& val) const;

std::list<std::string> get_csv_line_elements(const std::string& str);

void write_headers();
Expand All @@ -100,7 +142,7 @@ class CSV {
// Key 1 : Row Index
// Key 2 : Column Index
// Value : Cell Value
std::map<int, std::map<int, double>> table_;
std::map<int, std::map<int, std::string>> table_;
int next_row_ = 0;

std::ofstream file_out_;
Expand All @@ -110,7 +152,6 @@ class CSV {
int double_precision_ = 13;
bool double_is_fixed_ = true;
bool double_is_scientific_ = false;

std::string headers_to_string() const;
std::string rows_to_string() const;
std::string row_to_string(const int& i) const;
Expand Down
45 changes: 15 additions & 30 deletions src/common/CSV.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -30,16 +30,12 @@
*
*/

#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>
#include "scrimmage/common/CSV.h"

#include <vector>

#include <boost/algorithm/string.hpp>
#include <boost/tokenizer.hpp>
#include <scrimmage/parse/ParseUtils.h>
#include "scrimmage/common/CSV.h"

using std::cout;
using std::endl;
Expand Down Expand Up @@ -73,14 +69,23 @@ void CSV::set_column_headers(const std::string& headers, bool write) {
set_column_headers(headers_vec, write);
}

std::string CSV::get_csv_string(const PossibleVariantTypes& v) const {
return std::visit(
StringifyVisitor{
.double_is_fixed = double_is_fixed_,
.double_is_scientific = double_is_scientific_,
.double_precision = double_precision_},
v);
}

bool CSV::append(const Pairs& pairs, bool write, bool keep_in_memory) {

for (std::pair<std::string, double> pair : pairs) {
for (auto pair : pairs) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing code was making a copy of the pair every time, and the current code still does that. You want to iterate over the pairs by reference, so auto&, and if you're not changing anything, which you're not since pairs is const, then const auto&.

auto it = column_headers_.find(pair.first);
if (it == column_headers_.end()) {
cout << "Warning: column header doesn't exist: " << pair.first << endl;
}
table_[next_row_][it->second] = pair.second;
table_[next_row_][it->second] = get_csv_string(pair.second);
}

if (write) {
Expand Down Expand Up @@ -163,22 +168,7 @@ std::string CSV::row_to_string(const int& row) const {
std::vector<std::string> values(column_headers_.size(), no_value_str_);
auto it_row = table_.find(row);
for (auto& kv : it_row->second) {
if (static_cast<int64_t>(kv.second) == kv.second) {
values[kv.first] = std::to_string(static_cast<int64_t>(kv.second));
} else if (static_cast<double>(kv.second) == kv.second) {
// default precision values for double are not enough in many cases
std::ostringstream conv;
if (double_is_fixed_) {
conv << std::fixed;
}
if (double_is_scientific_) {
conv << std::scientific;
}
conv << std::setprecision(double_precision_) << kv.second;
values[kv.first] = conv.str();
} else {
values[kv.first] = std::to_string(kv.second);
}
values[kv.first] = kv.second;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will work, but check the fuller context. Originally, the logic was: 1) build a vector of converted strings, 2) build an aggregate string out of those. You already have the strings, though, so your values vector here is really just making a bunch of copies. You could get rid of the copies by making it a vector of string ptrs (or references, but it'd have to be of reference_wrappers because you can't make a vector of raw references, but ptrs would do fine in this case). But you can get rid of values altogether; it was never really needed in the first place since you can append to the final string as you iterate. No real need for an intermediate list. And continually appending to a string like this isn't the most efficient since it's having to re-allocate a bigger and bigger string every time. Make result a std::ostringstream of appending is more efficient to the stream, and then return result.str() so it's converted only once at the end.

}

// Append the rows to the resultant string
Expand Down Expand Up @@ -238,7 +228,7 @@ bool CSV::read_csv_from_string(const std::string& csv_str, const bool& contains_
std::vector<std::string> tokens;
boost::split(tokens, line, boost::is_any_of(","));
for (unsigned int i = 0; i < tokens.size(); i++) {
table_[row_num][i] = std::stod(tokens[i]);
table_[row_num][i] = tokens[i];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be the most annoying thing, and is going to tie into CSV::row_to_string()... what happens if somebody writes a string that contains a comma? Look at the specification section at https://en.wikipedia.org/wiki/Comma-separated_values, and it tells you how to handle that situation. Any time you decide to allow strings in data files, that opens up "fun" edge cases that you have to think about. What about newlines? Carriage returns? Maybe CSV::append() needs to have more failure cases.

Note that when you handle things like escaped double quotes, those need to be handled via linearly scanning through the string; you can't do global substring substitutions because that may not work the way you expect it to for some cases.

}

// If this is the first line and the file doesn't contain a header,
Expand Down Expand Up @@ -284,11 +274,6 @@ size_t CSV::rows() {
return table_.size();
}

double CSV::at(int row, const std::string& header) {
const int column = column_headers_.at(header);
return table_.at(row).at(column);
}

std::list<std::string> CSV::get_csv_line_elements(const std::string& str) {
std::list<std::string> elems;
std::vector<std::string> tokens;
Expand Down