@@ -108,6 +108,39 @@ where the columns over which we are merging are `key_1` and `key_2`.
108108
109109- `Series.unique()`: Returns the unique values of a series as an array, with duplicates dropped.
110110
111+ ## Note on reading SWITCH files
112+
113+ When reading SWITCH CSV files, we recommend passing the following arguments to `pd.read_csv()`.
114+
115+ - `index_col=False`: This stops Pandas from automatically using the
116+ first column as the index, so that you do not end up with a custom index
117+ (see the notes on custom indexes above).
118+
119+ - `dtype={"GENERATION_PROJECT": str}`: If all the generation project IDs happen to be
120+ numbers, Pandas will automatically set the `GENERATION_PROJECT` column type
121+ to `int`. We don't want this, since it can cause problems when working with
122+ multiple DataFrames, some of which have non-numeric IDs. (For example, merging
123+ a DataFrame where `GENERATION_PROJECT` is an `int` with another where it's a `str`
124+ won't work properly.)
125+
126+ - `dtype=str`: An even safer option than `dtype={"GENERATION_PROJECT": str}` is `dtype=str`.
127+ This is particularly important when reading a file that will then be written back out with minimal changes.
128+ Without this option, there's a risk of floating-point values being slightly
129+ modified (see [this issue](https://github.com/pandas-dev/pandas/issues/16452)) or of integer columns
130+ containing NA values (`.`) being ["promoted"](https://pandas.pydata.org/pandas-docs/stable/user_guide/gotchas.html?highlight=nan#na-type-promotions)
131+ to floats. Note that with `dtype=str` every column is read as a string, so before doing any mathematical
132+ computation on a column you first need to convert it with `.astype()`.
133+
134+ - `na_values="."`: SWITCH uses full stops to indicate an unspecified value. We want Pandas
135+ to interpret full stops as `NaN` rather than as the string `.`, so that the column type is
136+ still inferred correctly rather than being detected as a string.
137+
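The effect of these arguments can be seen on a small in-memory file. The following is only a sketch: the CSV contents, the `gen_tech` column values and the variable names are made up for illustration and are not taken from a real SWITCH input set.

```python
import io

import pandas as pd

# Made-up stand-ins for two SWITCH files. One happens to have only numeric
# project IDs; the other has a non-numeric ID. The first also contains an
# unspecified value (".").
numeric_ids_csv = "GENERATION_PROJECT,gen_capacity_limit_mw\n1,100\n2,.\n"
mixed_ids_csv = "GENERATION_PROJECT,gen_tech\n1,Wind\nA2,Solar\n"

# With the defaults, the IDs are inferred as integers and the "." causes the
# capacity column to be read as text rather than as numbers.
default = pd.read_csv(io.StringIO(numeric_ids_csv), index_col=False)

# With the recommended arguments, IDs stay strings and "." becomes NaN, so the
# capacity column is numeric (float, since NaN cannot live in an int column).
recommended_kwargs = dict(
    index_col=False,
    dtype={"GENERATION_PROJECT": str},
    na_values=".",
)
tuned = pd.read_csv(io.StringIO(numeric_ids_csv), **recommended_kwargs)
other = pd.read_csv(io.StringIO(mixed_ids_csv), **recommended_kwargs)

# Merging works once both ID columns are strings.
merged = tuned.merge(other, on="GENERATION_PROJECT")

# Merging the integer-ID frame with the string-ID frame is the failure case;
# recent Pandas versions raise a ValueError instead of merging.
try:
    default.merge(other, on="GENERATION_PROJECT")
    mismatched_merge_failed = False
except ValueError:
    mismatched_merge_failed = True

print(default.dtypes, tuned.dtypes, sep="\n\n")
```

Printing the two `dtypes` listings side by side is a quick way to check whether a file was read the way you intended.
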
138+ Combining these parameters, here is an example of how to read a SWITCH file.
139+
140+ ```
141+ df = pd.read_csv("some_SWITCH_file.csv", index_col=False, dtype={"GENERATION_PROJECT": str}, na_values=".")
142+ ```
143+
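For the `dtype=str` variant, a minimal sketch of the read, re-write and convert workflow might look like the following. The file contents and variable names are hypothetical, and masking the `.` placeholders before calling `.astype()` is just one possible way to handle them.

```python
import io

import pandas as pd

# Made-up file contents; the cost value has a trailing zero that a float
# round trip would not preserve.
csv_text = "GENERATION_PROJECT,build_year,gen_overnight_cost\n1,2020,1500000.10\n2,.,.\n"

# dtype=str keeps every cell exactly as written, including the "." markers
# (note that na_values is deliberately not passed here).
df = pd.read_csv(io.StringIO(csv_text), index_col=False, dtype=str)

# Writing the frame back out reproduces the original text line for line.
round_trip = df.to_csv(index=False)
unchanged = round_trip.splitlines() == csv_text.splitlines()

# For arithmetic, convert a column explicitly: mask the "." placeholders
# (they become NaN) and then cast the remaining strings to float.
cost_text = df["gen_overnight_cost"]
cost = cost_text.where(cost_text != ".").astype(float)
total_cost = cost.sum()  # NaN entries are skipped by default
```

Converting only the columns you compute on keeps the rest of the file untouched when it is written back out.
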
111144## Example
112145
113146This example shows how we can use Pandas to generate a more useful view
@@ -117,9 +150,11 @@ of our generation plants from the SWITCH input files.
117150import pandas as pd
118151
119152# READ
153+ # See note above on why we use these parameters
120154kwargs = dict(
121155     index_col=False,
122-     dtype={"GENERATION_PROJECT": str},  # This ensures that the project id column is read as a string not an int
156+     dtype={"GENERATION_PROJECT": str},
157+     na_values=".",
123158)
124159gen_projects = pd.read_csv("generation_projects_info.csv", **kwargs)
125160costs = pd.read_csv("gen_build_costs.csv", **kwargs)
@@ -138,7 +173,7 @@ gen_projects = gen_projects.merge(
138173)
139174
140175# FILTER
141- # When uncommented will filter out all the projects that aren't wind.
176+ # When uncommented, this line will filter out all the projects that aren't wind.
142177# gen_projects = gen_projects[gen_projects["gen_energy_source"] == "Wind"]
143178
144179# WRITE