@@ -108,6 +108,39 @@ where the columns over which we are merging are `key_1` and `key_2`.
108
108
109
109
- `Series.unique()`: Returns the unique values of a series (as an array), with duplicates dropped.
110
110
111
+ ## Note on reading SWITCH files
112
+
113
+ When reading SWITCH CSV files, we recommend passing the following arguments to `pd.read_csv()`.
114
+
115
+ - `index_col=False`: This stops Pandas from automatically using the
116
+ first column as an index, so you do not end up with a custom index
117
+ (see the notes on custom indexes above).
118
+
119
+ - `dtype={"GENERATION_PROJECT": str}`: If all the generation project IDs happen to be
120
+ numbers, Pandas will automatically infer an integer type for the `GENERATION_PROJECT`
121
+ column. We don't want this, since it causes problems when working with
122
+ multiple dataframes, some of which have non-numeric IDs. (E.g. merging
123
+ a dataframe where `GENERATION_PROJECT` is an `int` with another where it's a `str`
124
+ will not work: recent Pandas versions raise a `ValueError` rather than matching the IDs.)
125
+
126
+ - `dtype=str`: An even safer option than `dtype={"GENERATION_PROJECT": str}` is `dtype=str`.
127
+ This is particularly important when reading a file that will then be written back out with minimal changes.
128
+ Without this option, there is a risk of floating point values being slightly
129
+ modified (see [this Pandas issue](https://github.com/pandas-dev/pandas/issues/16452)) or of integer columns
130
+ containing NA values (`.`) being ["promoted"](https://pandas.pydata.org/pandas-docs/stable/user_guide/gotchas.html?highlight=nan#na-type-promotions)
131
+ to floats. Note that with `dtype=str` all columns are strings, so to do mathematical
132
+ computation on a column you first need to convert it with `.astype()`.
133
+
134
+ - `na_values="."`: SWITCH uses full stops to indicate an unspecified value. We want Pandas
135
+ to interpret full stops as `NaN` rather than as the string `.`, so that the column type is
136
+ still inferred correctly rather than being detected as a string.
137
+
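To make the effect of the arguments above concrete, here is a minimal sketch that compares the default parsing with the recommended one. The CSV contents and the use of `io.StringIO` in place of a real file are assumptions made for illustration only.

```python
import io

import pandas as pd

# Hypothetical file contents (illustration only); real SWITCH inputs have more columns.
csv_text = "GENERATION_PROJECT,gen_capacity_limit_mw\n1001,50\n1002,.\n"

# Default parsing: the IDs are inferred as integers and "." is kept as text,
# so the capacity column is not read as a numeric column.
default = pd.read_csv(io.StringIO(csv_text), index_col=False)
print(default.dtypes)

# Recommended parsing: the IDs stay strings and "." becomes NaN,
# so the capacity column is read as a float column.
recommended = pd.read_csv(
    io.StringIO(csv_text),
    index_col=False,
    dtype={"GENERATION_PROJECT": str},
    na_values=".",
)
print(recommended.dtypes)
```

Comparing the two `dtypes` printouts shows which columns each option changes.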
138
+ Combining these parameters, here is an example of how to read a SWITCH file.
139
+
140
+ ```
141
+ df = pd.read_csv("some_SWITCH_file.csv", index_col=False, dtype={"GENERATION_PROJECT": str}, na_values=".")
142
+ ```
143
+
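For the `dtype=str` variant, the following sketch shows one way a file might be read, used in a calculation, and written back unchanged. The file contents, the column choice, and the use of `na_rep="."` when writing are assumptions for illustration, not a prescribed SWITCH workflow.

```python
import io

import pandas as pd

# Hypothetical file contents (illustration only).
csv_text = "GENERATION_PROJECT,gen_overnight_cost\n1001,1200.50\n1002,.\n"

# With dtype=str every non-missing cell is kept exactly as written in the file.
df = pd.read_csv(io.StringIO(csv_text), index_col=False, dtype=str, na_values=".")

# Convert a column explicitly before doing arithmetic on it.
cost = df["gen_overnight_cost"].astype(float)
print(cost * 2)

# Writing with na_rep="." puts the missing-value marker back into the output.
out = df.to_csv(index=False, na_rep=".")
print(out)
```

Because the values were never parsed as numbers, `1200.50` keeps its trailing zero in the output.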
111
144
## Example
112
145
113
146
This example shows how we can use Pandas to generate a more useful view
@@ -117,9 +150,11 @@ of our generation plants from the SWITCH input files.
117
150
import pandas as pd
118
151
119
152
# READ
153
+ # See note above on why we use these parameters
120
154
kwargs = dict(
121
155
    index_col=False,
122
-     dtype={"GENERATION_PROJECT": str},  # This ensures that the project id column is read as a string not an int
156
+     dtype={"GENERATION_PROJECT": str},
157
+     na_values=".",
123
158
)
124
159
gen_projects = pd.read_csv("generation_projects_info.csv", **kwargs)
125
160
costs = pd.read_csv("gen_build_costs.csv", **kwargs)
@@ -138,7 +173,7 @@ gen_projects = gen_projects.merge(
138
173
)
139
174
140
175
# FILTER
141
- # When uncommented will filter out all the projects that aren't wind.
176
+ # When uncommented, this line will filter out all the projects that aren't wind.
142
177
# gen_projects = gen_projects[gen_projects["gen_energy_source"] == "Wind"]
143
178
144
179
# WRITE