Commit b2ce3e8

Merge pull request #121 from staadecker/plots
Paper figures, improved graphing code and other improvements
2 parents 78dc415 + 745fa05 commit b2ce3e8

41 files changed: 2625 additions and 769 deletions

docs/Pandas.md

Lines changed: 37 additions & 2 deletions
@@ -108,6 +108,39 @@ where the columns over which we are merging are `key_1` and `key_2`.
 
 - `Series.unique()`: Returns the unique values of the series as an array, in order of appearance.
 
+## Note on reading SWITCH files
+
+When reading SWITCH CSV files, we recommend passing the following arguments to `pd.read_csv()`.
+
+- `index_col=False`: This stops Pandas from automatically using the
+first column as an index, which ensures you are not using custom indexes
+(see the notes on custom indexes above).
+
+- `dtype={"GENERATION_PROJECT": str}`: If all the generation project IDs happen to be
+numbers, Pandas will automatically set the `GENERATION_PROJECT` column type
+to `int`. We don't want this, since it can cause issues when working with
+multiple dataframes, some of which have non-numeric IDs. (E.g. if you try merging
+a DataFrame where `GENERATION_PROJECT` is an `int` with another where it's a `str`, it
+won't work properly.)
+
+- `dtype=str`: An even safer option than `dtype={"GENERATION_PROJECT": str}` is `dtype=str`.
+This is particularly important when reading a file that will then be written back out with minimal changes.
+Without this option, there's a risk of floating-point values being slightly
+modified (see [here](https://github.com/pandas-dev/pandas/issues/16452)) or of integer columns
+containing NA values (`.`) being ["promoted"](https://pandas.pydata.org/pandas-docs/stable/user_guide/gotchas.html?highlight=nan#na-type-promotions)
+to floats. Note that with `dtype=str` all columns are strings, so to do mathematical
+computation on a column it must first be converted with `.astype()`.
+
+- `na_values="."`: SWITCH uses full stops to indicate an unspecified value. We want Pandas
+to interpret full stops as `NaN` rather than as the string `.` so that the column type is
+still properly inferred rather than being detected as a string.
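To make the effect of these arguments concrete, here is a minimal sketch that parses the same small CSV twice, once with the defaults and once with the recommended arguments. The file contents and the variable names `csv_text`, `default` and `recommended` are invented for illustration and are not taken from a real SWITCH input set.

```python
import io

import pandas as pd

# Hypothetical file contents, invented for illustration only.
csv_text = "GENERATION_PROJECT,gen_capacity_limit_mw\n1,100\n2,.\n"

# Default parsing: numeric-looking IDs become integers, and the "."
# causes the second column to be read as text instead of numbers.
default = pd.read_csv(io.StringIO(csv_text), index_col=False)

# Recommended parsing: IDs stay strings, and "." becomes NaN, so the
# second column is numeric (float, since NaN cannot be stored as an int).
recommended = pd.read_csv(
    io.StringIO(csv_text),
    index_col=False,
    dtype={"GENERATION_PROJECT": str},
    na_values=".",
)

print(default.dtypes)
print(recommended.dtypes)
```

Comparing the two `dtypes` printouts shows which columns changed type between the two calls.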
+
+Combining these parameters, here is an example of how to read a SWITCH file.
+
+```
+df = pd.read_csv("some_SWITCH_file.csv", index_col=False, dtype={"GENERATION_PROJECT": str}, na_values=".")
+```
+
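The `dtype=str` recommendation in the note matters most when a file is read and written back out. The sketch below illustrates this under assumptions: the CSV contents are invented, and writing with `to_csv(index=False, na_rep=".")` is one plausible way to restore the full stops, not necessarily how the SWITCH tooling does it.

```python
import io

import pandas as pd

# Hypothetical file contents, invented for illustration only.
csv_text = "GENERATION_PROJECT,gen_max_age\nA1,20\nB2,.\n"

# Inferred types: "." becomes NaN, so the integer column is promoted to float.
inferred = pd.read_csv(io.StringIO(csv_text), index_col=False, na_values=".")
inferred_out = inferred.to_csv(index=False, na_rep=".")

# dtype=str: every value is kept as the text that was in the file.
as_text = pd.read_csv(
    io.StringIO(csv_text), index_col=False, dtype=str, na_values="."
)
text_out = as_text.to_csv(index=False, na_rep=".")

print(inferred_out)  # the age is written back as 20.0
print(text_out)      # the age is written back as 20

# Convert explicitly when a column is needed for arithmetic.
ages = as_text["gen_max_age"].astype(float)
print(ages.mean())
```

With inferred types the age no longer matches the input text after the round trip, while with `dtype=str` the rows come back unchanged.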
 ## Example
 
 This example shows how we can use Pandas to generate a more useful view
@@ -117,9 +150,11 @@ of our generation plants from the SWITCH input files.
 import pandas as pd
 
 # READ
+# See note above on why we use these parameters
 kwargs = dict(
     index_col=False,
-    dtype={"GENERATION_PROJECT": str}, # This ensures that the project id column is read as a string not an int
+    dtype={"GENERATION_PROJECT": str},
+    na_values=".",
 )
 gen_projects = pd.read_csv("generation_projects_info.csv", **kwargs)
 costs = pd.read_csv("gen_build_costs.csv", **kwargs)
@@ -138,7 +173,7 @@ gen_projects = gen_projects.merge(
 )
 
 # FILTER
-# When uncommented will filter out all the projects that aren't wind.
+# When uncommented, this line will filter out all the projects that aren't wind.
 # gen_projects = gen_projects[gen_projects["gen_energy_source"] == "Wind"]
 
 # WRITE
