diff --git a/.gitignore b/.gitignore index df0878d..244100a 100644 --- a/.gitignore +++ b/.gitignore @@ -1,18 +1,10 @@ *~ config.yml -config_10*.yml -config_annual.yml run_da_*.sh output_logfile*.txt -output_logfile*.out -notes.txt -da_explore_variance.py -other_models.py -other_stuff/* +code_v1_archive/* +develop_method/* +saved_config/* more/* -development/* -process_output/* __pycache__/* -extra_config/* -compare_methods/* -saved_config/* + diff --git a/README.md b/README.md index e4dc903..06bbe4e 100644 --- a/README.md +++ b/README.md @@ -3,11 +3,11 @@ Written by: Michael P. Erb, Contact: michael.erb@nau.edu ## 1. Introduction -Data and models are two methods of exploring past climate. Data (such as proxy records) provide point data and models simulate climate changes and climate-system interactions. The goal of this Holocene Reconstruction project is to use data assimilation--a method of combining information from proxy data and model results--to reconstruct climate over the past 12,000 years. +Data and models are two methods of exploring past climate. Data (such as proxy records) provide point data and models simulate climate changes and climate-system interactions. The goal of this Holocene Reconstruction project is to use paleoclimate data assimilation--a method of combining information from proxy data and model results--to reconstruct climate over the past 12,000 years. This GitHub repository contains the Holocene Reconstruction code, and this readme file explains how to set it up and use it. -This code and readme are still under development. +This code and readme are still under development. To read about the Holocene reconstruction made using v1.0.0 of this code, see Erb et al., in press: "Reconstructing Holocene temperatures in time and space using paleoclimate data assimilation" ## 2. Getting started @@ -15,21 +15,23 @@ The Holocene Reconstruction code is written in Python 3. The instructions below ### 2.1. Getting the Holocene Reconstruction code -Clone the Github repository into your Linux environment with the command: +To get v1.0.0 of the code, search for "Holocene reconstruction code" on Zenodo. + +Alternately, you may be able to find a newer version of code on Github. Clone the Github repository into your Linux environment with the command: git clone https://github.com/Holocene-Reconstruction/Holocene-code.git ### 2.2. Getting the necessary data files -The code uses climate model output as well as proxy data files. To get this data, download the zip file from this link: [data on Google Drive](https://drive.google.com/file/d/1Iqfbpa4mhoIw_ccKYzfljkkTTz47HKzJ/view?usp=sharing) +The code uses climate model output as well as proxy data files. To get this data, download the zip file from this link: https://doi.org/10.5281/zenodo.7407116 -Put this fine in a convenient place and unzip it using `unzip holocene_da_data.zip`. It should contain the following subdirectories: +Put this file in a convenient place and unzip it using `unzip holocene_da_data.zip`. It should contain the following subdirectories: models/ Model output proxies/ Proxy files results/ Results of the data assimilation (initially empty) -The data set currently includes only the TraCE-21ka model output and Temp12k proxies. Other data may be added later. +The data set includes TraCE-21ka temperature output, HadCM3 temperature output, and Temp12k proxies. ### 2.3. Installing Python 3 and necessary packages @@ -37,15 +39,17 @@ Make sure that you have Python 3 installed. 
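Before going further, a short check like the sketch below can confirm both your Python version and that the unzipped data directory from section 2.2 has the expected layout (the `data_dir` path is a placeholder; point it at the directory containing models/, proxies/, and results/):

    # Minimal setup check (illustrative; adjust data_dir to your own location)
    import sys, os

    data_dir = '/path/to/holocene_da_data/'
    print('Python version:', sys.version.split()[0])   # the code requires Python 3
    for subdir in ['models', 'proxies', 'results']:
        found = os.path.isdir(os.path.join(data_dir, subdir))
        print(subdir + '/:', 'found' if found else 'MISSING')
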
If you don't, one option is the pac Most of the necessary packages should come with the standard Python 3 installation, installed using `conda create -n python3_da anaconda`. The only ones you should need to install are xarray, netCDF4, and lipd. Either install these yourself (note: LiPD can be installed with `pip install LiPD`) or go to your Holocene Reconstruction directory and use the commands below to create a new Python 3 environment with the necessary packages and switch to it: - conda env create -f environment_da.py + conda env create -f environment_da.yml conda activate python3_da ### 2.4. First-time setup Before running the Holocene Reconstruction code for the first time, do the following: - 1. Open config_default.py. + 1. Open config_default.yml. 2. Near the top, change the 'data_dir' directory to the location of the unzipped data from section 2.2 above (i.e., the directory which has the models/, proxies/, and results/ subdirectories.) - 3. Save config_default.py and copy it to config.py. You can set up new experiments in config.py while keeping the original file for reference. + 3. Save config_default.yml and copy it to config.yml. You can set up new experiments in config.yml while keeping the original file for reference. + +The "default" settings are used for the main experiment shown in Erb et al., in press "Reconstructing Holocene temperatures in time and space using paleoclimate data assimilation". ## 3. Running the code @@ -67,7 +71,7 @@ The file `config.yml` contains a variety of settings for running new experiments #### Age range to reconstruct -By default, the Holocene Reconstruction code reconstructs the past 12,000 years. This is set in the `age_range_to_reconstruct` variable (default: [0,12000]), but can be shortened or lengthened as requested. The Temp12k proxy database contains proxy data older than 12ka, but the quality of the reconstruction will decline as the number of available proxies diminish further back in time. Another relevant variable is `reference_period` (default: [0,5000]), which defines the reference period used for each proxy. Proxies without data in the selected reference period will not be used. +By default, the Holocene Reconstruction code reconstructs the past 12,000 years. This is set in the `age_range_to_reconstruct` variable (default: [0,12000]), but can be shortened or lengthened as requested. The Temp12k proxy database contains proxy data older than 12ka, but the quality of the reconstruction will decline as the number of available proxies diminish further back in time. Another relevant variable is `reference_period` (default: [3000,5000]), which defines the reference period used for each proxy. When doing a reconstruction relative to a reference period, proxies without data in the selected reference period will not be used. #### Proxies to assimilate @@ -75,7 +79,7 @@ The variable `proxy_datasets_to_assimilate` (default: ['temp12k']) lists the pro #### Models to use as the prior -The variable `models_for_prior` (default: ['trace_regrid']) defines the model simulation(s) to be used as the model prior. Only TraCE-21ka is included in the data set download in section 2.2 above, but HadCM3 will be added soon. +The variable `models_for_prior` (default: ['hadcm3_regrid','trace_regrid']) defines the model simulation(s) to be used as the model prior. 
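Because these settings are plain YAML, they can be inspected or tweaked from a script before launching an experiment. A minimal sketch using PyYAML, assuming `config.yml` sits in your Holocene Reconstruction directory:

    # Read a few of the settings discussed above (illustration only)
    import yaml

    with open('config.yml', 'r') as f:
        options = yaml.safe_load(f)

    print(options['age_range_to_reconstruct'])      # e.g. [0, 12000]
    print(options['reference_period'])              # e.g. [3000, 5000]
    print(options['proxy_datasets_to_assimilate'])  # e.g. ['temp12k']
    print(options['models_for_prior'])              # e.g. ['hadcm3_regrid', 'trace_regrid']
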
#### The size of the prior 'window' @@ -83,22 +87,26 @@ The model prior consists of a set of model states chosen from the simulation(s) #### Time resolutions -The Holocene Reconstruction uses a multi-timescale approach to data assimilation, which can assimilate proxies at a variety of timescales. To change the details of how this works, the variables `time_resolution` (default: 10) and `maximum_resolution` (default: 1000) can be changed. `time_resolution` defines the base temporal resolution, in years, used in the reconstruction. Proxy data will be preprocessed at this resolution and the final reconstruction will be output at this resolution. Few proxies in the Temp12k database have temporal resolution finer than 10 years, so little information should be lost when using this base resolution. Changing this number should also affect the speed of the data assimilation, but this has not been well-tested. `maximum_resolution` specifies the maximum temporal resolution, in years, that proxy data points will be treated as. This option does not affect which proxies are assimilated or not, but only affects the timescale which is used when translating proxy data to the larger climate system; proxy data points which appear to represent longer periods of time will be assumed to represent this amount of time instead. +The Holocene Reconstruction uses a multi-timescale approach to data assimilation, which can assimilate proxies at a variety of timescales. To change the details of how this works, the variables `time_resolution` (default: 10) and `maximum_resolution` (default: 1000) can be changed. `time_resolution` defines the base temporal resolution, in years, used in the reconstruction. Proxy data will be preprocessed at this resolution and the final reconstruction will be output at this resolution. Few proxies in the Temp12k database have temporal resolution finer than 10 years, so little information should be lost when using this base resolution. Changing this number should also affect the speed of the data assimilation, but this has not been well-tested. `maximum_resolution` specifies the maximum temporal resolution, in years, that proxy data points will be treated as. This option does not affect which proxies are assimilated or omitted, but only affects the timescale which is used when translating proxy data to the larger climate system; proxy data points which appear to represent longer periods of time will be assumed to represent this amount of time instead. #### Localization radius -If a localization radius is desired, set it in the `localization_radius` (default: None) variable, in units of meters. To use a localization radius, the variable `assimate_together` (default: True) should be set to False, or else the localization radius will not be used. Setting `assimilate_together` to False will significantly slow down the code, since it will assimilate proxies one at a time instead of all together. However, this must be done for a localization radius to work. +If a localization radius is desired, set it in the `localization_radius` (default: None) variable, in units of meters. To use a localization radius, the variable `assimate_together` (default: True) should be set to False, or else the localization radius will not be used. Setting `assimilate_together` to False will probably slow down the code, since it will assimilate proxies one at a time instead of all together. However, this must be done for a localization radius to work in this code. 
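For background, localization works by down-weighting the covariance between a proxy and state points far from it. One common choice of weighting function is the Gaspari-Cohn taper sketched below; this is an illustration of the idea only, not necessarily the exact function implemented in this code:

    # Gaspari-Cohn localization weights as a function of distance (illustrative sketch)
    import numpy as np

    def gaspari_cohn(dist_m, radius_m):
        # Weights in [0,1]; covariance is tapered to zero beyond the radius.
        r = np.abs(dist_m) / (radius_m / 2.0)
        w = np.zeros_like(r, dtype=float)
        inner = r <= 1.0
        outer = (r > 1.0) & (r <= 2.0)
        w[inner] = -0.25*r[inner]**5 + 0.5*r[inner]**4 + 0.625*r[inner]**3 - (5/3)*r[inner]**2 + 1.0
        w[outer] = ((1/12)*r[outer]**5 - 0.5*r[outer]**4 + 0.625*r[outer]**3
                    + (5/3)*r[outer]**2 - 5.0*r[outer] + 4.0 - (2/3)/r[outer])
        return w

    distances = np.array([0.0, 2.0e6, 8.0e6, 2.0e7])   # proxy-to-gridpoint distances, in meters
    print(gaspari_cohn(distances, radius_m=1.5e7))     # weights for a hypothetical 15,000 km radius
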
#### Assimilating only a portion of the proxy database -By default, all valid proxies are assimilated. To assimilate only a portion of the proxy database, set the variable `percent_to_assimilate` (default: 100) to a lower number. +To only assimilate some of the proxies in the database, modify the variables `assimilate_selected_seasons`, `assimilate_selected_archives`, `assimilate_selected_region`, and/or `assimilate_selected_resolution`. See the config file for more details. Note: the `assimilate_selected_resolution` option should be improved before use. + +Additionally, to assimilate only a portion of the proxy database, set the variable `percent_to_assimilate` (default: 100) to a lower number. #### Using pseudoproxies To generate/use pseudoproxies, the string in 'proxy_datasets_to_assimilate' should be given in the form ['pseudo_VAR1_using_VAR2_noise_VAR3'], where: - VAR1: proxy dataset [e.g. 'temp12k','basicgrid10','basicgrid5'] - VAR2: model [e.g. 'hadcm3','trace','famous'] - VAR3: noise [e.g. 'none','whitesnr05','whiteproxyrmse'] + + VAR1: The space/time structure of the pseudoproxy database (e.g. 'temp12k','basicgrid10','basicgrid5') + VAR2: The model that the pseudoproxies are generated from (e.g. 'hadcm3','trace','famous') + VAR3: The noise added to the pseudoproxies (e.g. 'none','whiteproxyrmse') + Example: ['pseudo_temp12k_using_hadcm3_noise_whiteproxyrmse'] @@ -114,39 +122,38 @@ For now, all reconstructed values and proxy inputs are temperature. #### The gridded Holocene Reconstruction - recon_mean [age,lat,lon] The mean of the Holocene Reconstruction - recon_median [age,lat,lon] The median of the Holocene Reconstruction - recon_ens [age,ens_selected,lat,lon] Selected ensemble members of the Holocene Reconstruction - recon_gmt [age,ens] The global-mean of the Holocene Reconstruction for all ensemble members - ages [age] Age (year BP) - lat [lat] Latitude - lon [lon] Longitude + recon_tas_mean [age,lat,lon] The ensemble-mean of the Holocene Reconstruction + recon_tas_ens [age,ens_selected,lat,lon] Selected ensemble members of the Holocene Reconstruction + recon_tas_global_mean [age,ens] The global-mean of the Holocene Reconstruction for all ensemble members + recon_tas_nh_mean [age,ens] The NH-mean of the Holocene Reconstruction for all ensemble members + recon_tas_sh_mean [age,ens] The SH-mean of the Holocene Reconstruction for all ensemble members + ages [age] Age (year BP) + lat [lat] Latitude + lon [lon] Longitude + +#### The prior + + prior_tas_mean [age,lat,lon] The ensemble-mean of the prior + prior_tas_global_mean [age,ens] The global-mean of the prior for all ensemble members #### The proxies - proxy_values [proxy,age] The proxy values, binned and/or nearest neighbor interpolated to the base temporal resolution - proxy_resolutions [proxy,age] The effective temporal resolution of the proxy values, in years + proxy_values [age,proxy] The proxy values, binned and/or nearest neighbor interpolated to the base temporal resolution + proxy_resolutions [age,proxy] The effective temporal resolution of the proxy values, in years proxy_uncertainty [proxy] The uncertainty values used for each proxy - proxy_metadata [proxy,metadata] Additional metadata about the proxies (datasetname,TSid,lat,lon,seasonality,seasonality_general,median_age_res,collection) + proxy_metadata [proxy,metadata] Additional metadata about the proxies (datasetname,TSid,lat,lon,seasonality,seasonality_general,median_age_res_calc*,collection) *The "median_age_res_calc" variable may not accurately represent the 
proxy resolution. You may want to load the proxy data and calculate this yourself. #### Reconstruction of the proxies (at proxy locations and seasonalities) proxyrecon_mean [age,proxy] The mean of the proxy reconstructions - proxyrecon_median [age,proxy] The median of the proxy reconstructions proxyrecon_ens [age,ens_selected,proxy] Selected ensemble members of the proxy reconstructions #### Additional outputs options [options] The values set in the config.yml file when running the reconstruction - prior_gmt [age,ens] The global-mean of the prior for all ensemble members - proxies_selected [proxy] The proxies which, in theory, were selected for assimilation according to 'percent_to_assimilate' + proxies_selected [proxy] The proxies which were selected for assimilation proxies_assimilated [age,proxy] The proxies which were actually assimilated at each time step. This may differ from 'proxies_selected' in some cases. - -### 5.2. Basic analysis - -Within your Holocene Reconstruction directory, the subdirectory 'analysis' contains a Python 3 script for making a simple analysis. Feel free to use this script as a starting point for more in-depth analysis. - -Before running the script, open it in a text editor and update the `recon_dir` and `recon_filename` variables to point to the netCDF output you want to analyze. + proxyprior_mean [age,proxy] The mean of the prior estimates of the proxies --- diff --git a/config_default.yml b/config_default.yml index 54a8f6d..604264d 100644 --- a/config_default.yml +++ b/config_default.yml @@ -8,30 +8,32 @@ data_dir: '/projects/pd_lab/data/data_assimilation/' # Primary options -exp_name: 'default' # The experiment name, which will be appended to the filename. Pick something short & memorable, using only letters, numbers, and underscores. -vars_to_reconstruct: ['tas'] # The variables to reconstruct {['tas']}. Note, only tas is fully supported at the moment. -time_resolution: 10 # The base temporal resolution, in years {10} -maximum_resolution: 1000 # The maximum temporal resolution to assimilate proxies at, in years {1000} -prior_window: 5010 # The size of the window for the time-varying prior, in years {'all',5010,1010} -prior_mean_always_0: False # Is the mean of the prior allowed to change through time? {True,False}. If true, the mean of the prior is always 0. -assimate_together: True # If False, assimile proxies one at a time (slower) {True,False} -localization_radius: None # Localization radius, in meters {None,15000} Only works if assimate_together=False -percent_of_prior: 100 # What percentage of the prior states to assimilate {100}. Note, a value of 8.7 -> 100 ens, if other settings are default -seed_for_prior: 0 # This seed is only used if 'percent_of_prior' is below 100 {0} +exp_name: 'default' # The experiment name, which will be appended to the filename. Pick something short & memorable, using only letters, numbers, and underscores. +vars_to_reconstruct: ['tas'] # The variables to reconstruct {['tas','precip']}. +season_to_reconstruct: 'annual' # The season to reconstruct {'annual','jja','djf'} +time_resolution: 10 # The base temporal resolution, in years {10} +maximum_resolution: 1000 # The maximum temporal resolution to assimilate proxies at, in years {1000} +prior_window: 5010 # The size of the window for the time-varying prior, in years {'all',5010,1010} +prior_mean_always_0: False # Is the mean of the prior allowed to change through time? {True,False}. If true, the mean of the prior is always 0. 
+assimate_together: True # If False, assimile proxies one at a time (slower) {True,False} +localization_radius: None # Localization radius, in meters {None,15000} Only works if assimate_together=False +percent_of_prior: 100 # What percentage of the prior states to assimilate {100}. Note, a value of 8.7 -> 100 ens, if other settings are default +seed_for_prior: 0 # This seed is only used if 'percent_of_prior' is below 100 {0} # Proxies to assimilate -assimilate_selected_seasons: ['annual','summerOnly','winterOnly'] # Assimilate proxies with the selected seasonality. Note, the reconstruction will still try to represent annual means. {False, ['annual','summerOnly','winterOnly','summer+','winter+']} +assimilate_selected_seasons: ['annual','summeronly','winteronly'] # Assimilate proxies with the selected seasonality. Note, the reconstruction will still try to represent annual means. {False, ['annual','summeronly','winteronly','summer+','winter+'], 'jja_preferred', 'djf_preferred'} assimilate_selected_archives: False # Provide a list of one or more archive types to assimilate {False, ['LakeSediment','MarineSediment','Peat','GlacierIce','Midden','Speleothem','Wood','Ice-other']} assimilate_selected_region: False # Provide bounds to assimilate only proxies in a given region {False, [lat_min,lat_max,lon_min,lon_max]} using lons between 0-360. -assimilate_selected_resolution: False # Assimilate proxies with median resolutions in a given band {False, [0,100]}. Bounds are [inclusive,exclusive]. +assimilate_selected_resolution: False # Assimilate proxies with median resolutions in a given band {False, [0,100]}. Bounds are [inclusive,exclusive]. Note: this option should be improved before use. percent_to_assimilate: 100 # What percentage of proxies to assimilate {100} seed_for_proxy_choice: 0 # This seed is only used if 'percent_to_assimilate' is below 100 {0} # Models for prior {['hadcm3_regrid','trace_regrid','famous_regrid']} -models_for_prior: ['trace_regrid'] +models_for_prior: ['hadcm3_regrid','trace_regrid'] -# Proxy datasets {['temp12k'],['pages2k'],['pseudo_temp12k_using_trace_noise_whiteproxyrmse']} +# Proxy datasets {['temp12k','hydro2k'],['pages2k'],['pseudo_temp12k_using_trace_noise_whiteproxyrmse']} proxy_datasets_to_assimilate: ['temp12k'] +version_temp12k: '1_0_2' # The version of temp12k to use, if it's included in 'proxy_datasets_to_assimilate' {'1_0_2','1_1_0'} # Age ranges age_range_to_reconstruct: [0,12000] @@ -39,6 +41,7 @@ reference_period: [3000,5000] # This is only used if 'reconstruction_ty age_range_model: [0,22000] # Experimental options (These options are less well-tested or may not be fully implemented. Edit with care.) +age_uncertain_method: False # If True, age uncertainty is incorporated in the proxy values and time-varying uncertainty; if False, the original method is used, with a single-value R value. This only works if "time_resolution" is set to 10 {True,False} reconstruction_type: 'relative' # Reconstruction absolute or relative values {'relative','absolute'}. If 'absolute', prior_mean_always_0 is ignored. 
model_processing: None # Processing the model prior {None,'linear_global','linear_spatial',None} assign_seasonality: False # If something other than false, all proxies are set to specific seasonalities {False,'annual','summer','winter','jja','djf'} diff --git a/config_kmat.yml b/config_kmat.yml new file mode 100644 index 0000000..3c07269 --- /dev/null +++ b/config_kmat.yml @@ -0,0 +1,54 @@ +#============================================== +# SETTINGS for running the DA code. +# Some options are listed in curly brackets. +# See the README.md file for more explaination. +#============================================== + +# Location of data and lipd directories +data_dir: '/projects/pd_lab/data/data_assimilation/' + +# Primary options +exp_name: 'arctic_raw_21k_200yr_locrad_5k_kmat_2model' # The experiment name, which will be appended to the filename. Pick something short & memorable, using only letters, numbers, and underscores. +vars_to_reconstruct: ['tas'] # The variables to reconstruct {['tas','precip']}. +season_to_reconstruct: 'annual' # The season to reconstruct {'annual','jja','djf'} +time_resolution: 200 # The base temporal resolution, in years {10} +maximum_resolution: 200 # The maximum temporal resolution to assimilate proxies at, in years {1000} +prior_window: 5200 # The size of the window for the time-varying prior, in years {'all',5010,1010} +prior_mean_always_0: False # Is the mean of the prior always set to 0 through time? {True,False}. If false, the mean of the prior is allowed to change through time. +assimate_together: False # If False, assimile proxies one at a time (slower) {True,False} +localization_radius: 5000 # Localization radius, in meters {None,15000} Only works if assimate_together=False +percent_of_prior: 100 # What percentage of the prior states to assimilate {100}. Note, a value of 8.7 -> 100 ens, if other settings are default +seed_for_prior: 0 # This seed is only used if 'percent_of_prior' is below 100 {0} + +# Proxies to assimilate +psms_to_use: ['calibrated_tas'] # Choose the PSMs to use {['get_tas','get_precip']}. Proxies that don't match the criteria for one of these PSMs won't be used. +assimilate_selected_seasons: ['annual','summer','winter'] # Assimilate proxies with the selected seasonality. Note, the reconstruction will still try to represent annual means. {False, ['annual','summeronly','winteronly','summer+','winter+'], 'jja_preferred', 'djf_preferred'} +assimilate_selected_archives: False # Provide a list of one or more archive types to assimilate {False, ['LakeSediment','MarineSediment','Peat','GlacierIce','Midden','Speleothem','Wood','Ice-other']} +assimilate_selected_region: False # Provide bounds to assimilate only proxies in a given region {False, [lat_min,lat_max,lon_min,lon_max]} using lons between 0-360. +assimilate_selected_resolution: False # Assimilate proxies with median resolutions in a given band {False, [0,100]}. Bounds are [inclusive,exclusive]. 
+percent_to_assimilate: 100 # What percentage of proxies to assimilate {100} +seed_for_proxy_choice: 0 # This seed is only used if 'percent_to_assimilate' is below 100 {0} + +# Models for prior {['hadcm3_regrid','trace_regrid','famous_regrid']} +models_for_prior: ['hadcm3_regrid','trace_regrid'] + +# Proxy datasets {['temp12k','hydro12k','raw'],['pages2k'],['pseudo_temp12k_using_trace_noise_whiteproxyrmse']} +proxy_datasets_to_assimilate: ['raw'] +version_temp12k: 'raw' # The version of temp12k to use, if it's included in 'proxy_datasets_to_assimilate' {'1_0_2','1_1_0','raw_positive_corr'} + +# Age ranges +age_range_to_reconstruct: [0,21000] +reference_period: [3000,5000] # This is only used if 'reconstruction_type' is set to 'relative' +age_range_model: [0,22000] + +# Experimental options (These options are less well-tested or may not be fully implemented. Edit with care.) +age_uncertain_method: False # If True, age uncertainty is incorporated in the proxy values and time-varying uncertainty; if False, the original method is used, with a single-value R value. This only works if "time_resolution" is set to 10 {True,False} +reconstruction_type: 'relative' # Reconstruction absolute or relative values {'relative','absolute'}. If 'absolute', prior_mean_always_0 is ignored. +model_processing: None # Processing the model prior {None,'linear_global','linear_spatial',None} +assign_seasonality: False # If something other than false, all proxies are set to specific seasonalities {False,'annual','summer','winter','jja','djf'} +change_uncertainty: False # Change uncertainty values in several ways {False,'mult_#','all_#','path/to/file.txt'} + # False Use values from metadata. + # 'mult_#' Multiply uncertainty values from metadata by a constant number. + # 'all_#' Set uncertainty values to a constant number. + # 'path/to/file.txt' Use uncertainty values from a text file. The file must contain TSids and new MSE values for each proxy. + diff --git a/config_kmat_hadcm3.yml b/config_kmat_hadcm3.yml new file mode 100644 index 0000000..454cace --- /dev/null +++ b/config_kmat_hadcm3.yml @@ -0,0 +1,54 @@ +#============================================== +# SETTINGS for running the DA code. +# Some options are listed in curly brackets. +# See the README.md file for more explaination. +#============================================== + +# Location of data and lipd directories +data_dir: '/projects/pd_lab/data/data_assimilation/' + +# Primary options +exp_name: 'arctic_raw_21k_200yr_locrad_5k_kmat_hadcm3' # The experiment name, which will be appended to the filename. Pick something short & memorable, using only letters, numbers, and underscores. +vars_to_reconstruct: ['tas'] # The variables to reconstruct {['tas','precip']}. +season_to_reconstruct: 'annual' # The season to reconstruct {'annual','jja','djf'} +time_resolution: 200 # The base temporal resolution, in years {10} +maximum_resolution: 200 # The maximum temporal resolution to assimilate proxies at, in years {1000} +prior_window: 5200 # The size of the window for the time-varying prior, in years {'all',5010,1010} +prior_mean_always_0: False # Is the mean of the prior always set to 0 through time? {True,False}. If false, the mean of the prior is allowed to change through time. +assimate_together: False # If False, assimile proxies one at a time (slower) {True,False} +localization_radius: 5000 # Localization radius, in meters {None,15000} Only works if assimate_together=False +percent_of_prior: 100 # What percentage of the prior states to assimilate {100}. 
Note, a value of 8.7 -> 100 ens, if other settings are default +seed_for_prior: 0 # This seed is only used if 'percent_of_prior' is below 100 {0} + +# Proxies to assimilate +psms_to_use: ['calibrated_tas'] # Choose the PSMs to use {['get_tas','get_precip']}. Proxies that don't match the criteria for one of these PSMs won't be used. +assimilate_selected_seasons: ['annual','summer','winter'] # Assimilate proxies with the selected seasonality. Note, the reconstruction will still try to represent annual means. {False, ['annual','summeronly','winteronly','summer+','winter+'], 'jja_preferred', 'djf_preferred'} +assimilate_selected_archives: False # Provide a list of one or more archive types to assimilate {False, ['LakeSediment','MarineSediment','Peat','GlacierIce','Midden','Speleothem','Wood','Ice-other']} +assimilate_selected_region: False # Provide bounds to assimilate only proxies in a given region {False, [lat_min,lat_max,lon_min,lon_max]} using lons between 0-360. +assimilate_selected_resolution: False # Assimilate proxies with median resolutions in a given band {False, [0,100]}. Bounds are [inclusive,exclusive]. +percent_to_assimilate: 100 # What percentage of proxies to assimilate {100} +seed_for_proxy_choice: 0 # This seed is only used if 'percent_to_assimilate' is below 100 {0} + +# Models for prior {['hadcm3_regrid','trace_regrid','famous_regrid']} +models_for_prior: ['hadcm3_regrid'] + +# Proxy datasets {['temp12k','hydro12k','raw'],['pages2k'],['pseudo_temp12k_using_trace_noise_whiteproxyrmse']} +proxy_datasets_to_assimilate: ['raw'] +version_temp12k: 'raw' # The version of temp12k to use, if it's included in 'proxy_datasets_to_assimilate' {'1_0_2','1_1_0','raw_positive_corr'} + +# Age ranges +age_range_to_reconstruct: [0,21000] +reference_period: [3000,5000] # This is only used if 'reconstruction_type' is set to 'relative' +age_range_model: [0,22000] + +# Experimental options (These options are less well-tested or may not be fully implemented. Edit with care.) +age_uncertain_method: False # If True, age uncertainty is incorporated in the proxy values and time-varying uncertainty; if False, the original method is used, with a single-value R value. This only works if "time_resolution" is set to 10 {True,False} +reconstruction_type: 'relative' # Reconstruction absolute or relative values {'relative','absolute'}. If 'absolute', prior_mean_always_0 is ignored. +model_processing: None # Processing the model prior {None,'linear_global','linear_spatial',None} +assign_seasonality: False # If something other than false, all proxies are set to specific seasonalities {False,'annual','summer','winter','jja','djf'} +change_uncertainty: False # Change uncertainty values in several ways {False,'mult_#','all_#','path/to/file.txt'} + # False Use values from metadata. + # 'mult_#' Multiply uncertainty values from metadata by a constant number. + # 'all_#' Set uncertainty values to a constant number. + # 'path/to/file.txt' Use uncertainty values from a text file. The file must contain TSids and new MSE values for each proxy. + diff --git a/config_kmat_trace.yml b/config_kmat_trace.yml new file mode 100644 index 0000000..b817296 --- /dev/null +++ b/config_kmat_trace.yml @@ -0,0 +1,54 @@ +#============================================== +# SETTINGS for running the DA code. +# Some options are listed in curly brackets. +# See the README.md file for more explaination. 
+#============================================== + +# Location of data and lipd directories +data_dir: '/projects/pd_lab/data/data_assimilation/' + +# Primary options +exp_name: 'arctic_raw_21k_200yr_locrad_5k_kmat_trace' # The experiment name, which will be appended to the filename. Pick something short & memorable, using only letters, numbers, and underscores. +vars_to_reconstruct: ['tas'] # The variables to reconstruct {['tas','precip']}. +season_to_reconstruct: 'annual' # The season to reconstruct {'annual','jja','djf'} +time_resolution: 200 # The base temporal resolution, in years {10} +maximum_resolution: 200 # The maximum temporal resolution to assimilate proxies at, in years {1000} +prior_window: 5200 # The size of the window for the time-varying prior, in years {'all',5010,1010} +prior_mean_always_0: False # Is the mean of the prior always set to 0 through time? {True,False}. If false, the mean of the prior is allowed to change through time. +assimate_together: False # If False, assimile proxies one at a time (slower) {True,False} +localization_radius: 5000 # Localization radius, in meters {None,15000} Only works if assimate_together=False +percent_of_prior: 100 # What percentage of the prior states to assimilate {100}. Note, a value of 8.7 -> 100 ens, if other settings are default +seed_for_prior: 0 # This seed is only used if 'percent_of_prior' is below 100 {0} + +# Proxies to assimilate +psms_to_use: ['calibrated_tas'] # Choose the PSMs to use {['get_tas','get_precip']}. Proxies that don't match the criteria for one of these PSMs won't be used. +assimilate_selected_seasons: ['annual','summer','winter'] # Assimilate proxies with the selected seasonality. Note, the reconstruction will still try to represent annual means. {False, ['annual','summeronly','winteronly','summer+','winter+'], 'jja_preferred', 'djf_preferred'} +assimilate_selected_archives: False # Provide a list of one or more archive types to assimilate {False, ['LakeSediment','MarineSediment','Peat','GlacierIce','Midden','Speleothem','Wood','Ice-other']} +assimilate_selected_region: False # Provide bounds to assimilate only proxies in a given region {False, [lat_min,lat_max,lon_min,lon_max]} using lons between 0-360. +assimilate_selected_resolution: False # Assimilate proxies with median resolutions in a given band {False, [0,100]}. Bounds are [inclusive,exclusive]. +percent_to_assimilate: 100 # What percentage of proxies to assimilate {100} +seed_for_proxy_choice: 0 # This seed is only used if 'percent_to_assimilate' is below 100 {0} + +# Models for prior {['hadcm3_regrid','trace_regrid','famous_regrid']} +models_for_prior: ['trace_regrid'] + +# Proxy datasets {['temp12k','hydro12k','raw'],['pages2k'],['pseudo_temp12k_using_trace_noise_whiteproxyrmse']} +proxy_datasets_to_assimilate: ['raw'] +version_temp12k: 'raw' # The version of temp12k to use, if it's included in 'proxy_datasets_to_assimilate' {'1_0_2','1_1_0','raw_positive_corr'} + +# Age ranges +age_range_to_reconstruct: [0,21000] +reference_period: [3000,5000] # This is only used if 'reconstruction_type' is set to 'relative' +age_range_model: [0,22000] + +# Experimental options (These options are less well-tested or may not be fully implemented. Edit with care.) +age_uncertain_method: False # If True, age uncertainty is incorporated in the proxy values and time-varying uncertainty; if False, the original method is used, with a single-value R value. 
This only works if "time_resolution" is set to 10 {True,False} +reconstruction_type: 'relative' # Reconstruction absolute or relative values {'relative','absolute'}. If 'absolute', prior_mean_always_0 is ignored. +model_processing: None # Processing the model prior {None,'linear_global','linear_spatial',None} +assign_seasonality: False # If something other than false, all proxies are set to specific seasonalities {False,'annual','summer','winter','jja','djf'} +change_uncertainty: False # Change uncertainty values in several ways {False,'mult_#','all_#','path/to/file.txt'} + # False Use values from metadata. + # 'mult_#' Multiply uncertainty values from metadata by a constant number. + # 'all_#' Set uncertainty values to a constant number. + # 'path/to/file.txt' Use uncertainty values from a text file. The file must contain TSids and new MSE values for each proxy. + diff --git a/config_replicate_default.yml b/config_replicate_default.yml new file mode 100644 index 0000000..b1aa10e --- /dev/null +++ b/config_replicate_default.yml @@ -0,0 +1,53 @@ +#============================================== +# SETTINGS for running the DA code. +# Some options are listed in curly brackets. +# See the README.md file for more explaination. +#============================================== + +# Location of data and lipd directories +data_dir: '/projects/pd_lab/data/data_assimilation/' + +# Primary options +exp_name: 'default_replicate_test' # The experiment name, which will be appended to the filename. Pick something short & memorable, using only letters, numbers, and underscores. +vars_to_reconstruct: ['tas'] # The variables to reconstruct {['tas','precip']}. +season_to_reconstruct: 'annual' # The season to reconstruct {'annual','jja','djf'} +time_resolution: 10 # The base temporal resolution, in years {10} +maximum_resolution: 1000 # The maximum temporal resolution to assimilate proxies at, in years {1000} +prior_window: 5010 # The size of the window for the time-varying prior, in years {'all',5010,1010} +prior_mean_always_0: False # Is the mean of the prior allowed to change through time? {True,False}. If true, the mean of the prior is always 0. +assimate_together: True # If False, assimile proxies one at a time (slower) {True,False} +localization_radius: None # Localization radius, in meters {None,15000} Only works if assimate_together=False +percent_of_prior: 100 # What percentage of the prior states to assimilate {100}. Note, a value of 8.7 -> 100 ens, if other settings are default +seed_for_prior: 0 # This seed is only used if 'percent_of_prior' is below 100 {0} + +# Proxies to assimilate +assimilate_selected_seasons: ['annual','summeronly','winteronly'] # Assimilate proxies with the selected seasonality. Note, the reconstruction will still try to represent annual means. {False, ['annual','summeronly','winteronly','summer+','winter+'], 'jja_preferred', 'djf_preferred'} +assimilate_selected_archives: False # Provide a list of one or more archive types to assimilate {False, ['LakeSediment','MarineSediment','Peat','GlacierIce','Midden','Speleothem','Wood','Ice-other']} +assimilate_selected_region: False # Provide bounds to assimilate only proxies in a given region {False, [lat_min,lat_max,lon_min,lon_max]} using lons between 0-360. +assimilate_selected_resolution: False # Assimilate proxies with median resolutions in a given band {False, [0,100]}. Bounds are [inclusive,exclusive]. 
+percent_to_assimilate: 100 # What percentage of proxies to assimilate {100} +seed_for_proxy_choice: 0 # This seed is only used if 'percent_to_assimilate' is below 100 {0} + +# Models for prior {['hadcm3_regrid','trace_regrid','famous_regrid']} +models_for_prior: ['hadcm3_regrid','trace_regrid'] + +# Proxy datasets {['temp12k','hydro12k'],['pages2k'],['pseudo_temp12k_using_trace_noise_whiteproxyrmse']} +proxy_datasets_to_assimilate: ['temp12k'] +version_temp12k: '1_0_2' # The version of temp12k to use, if it's included in 'proxy_datasets_to_assimilate' {'1_0_2','1_1_0'} + +# Age ranges +age_range_to_reconstruct: [0,12000] +reference_period: [3000,5000] # This is only used if 'reconstruction_type' is set to 'relative' +age_range_model: [0,22000] + +# Experimental options (These options are less well-tested or may not be fully implemented. Edit with care.) +age_uncertain_method: False # If True, age uncertainty is incorporated in the proxy values and time-varying uncertainty; if False, the original method is used, with a single-value R value. This only works if "time_resolution" is set to 10 {True,False} +reconstruction_type: 'relative' # Reconstruction absolute or relative values {'relative','absolute'}. If 'absolute', prior_mean_always_0 is ignored. +model_processing: None # Processing the model prior {None,'linear_global','linear_spatial',None} +assign_seasonality: False # If something other than false, all proxies are set to specific seasonalities {False,'annual','summer','winter','jja','djf'} +change_uncertainty: False # Change uncertainty values in several ways {False,'mult_#','all_#','path/to/file.txt'} + # False Use values from metadata. + # 'mult_#' Multiply uncertainty values from metadata by a constant number. + # 'all_#' Set uncertainty values to a constant number. + # 'path/to/file.txt' Use uncertainty values from a text file. The file must contain TSids and new MSE values for each proxy. + diff --git a/da_load_models.py b/da_load_models.py index cd81343..8c67950 100644 --- a/da_load_models.py +++ b/da_load_models.py @@ -48,11 +48,15 @@ def load_model_data(options): model_individual[var_name] = handle_model[var_name].values handle_model.close() # - # Compute annual means of the model data + # Compute annual, jja, and djf means of the model data n_lat = len(model_data['lat']) n_lon = len(model_data['lon']) + ind_jja = [5,6,7] + ind_djf = [11,0,1] time_ndays_model_latlon = np.repeat(np.repeat(model_individual['time_ndays'][:,:,None,None],n_lat,axis=2),n_lon,axis=3) model_individual[var_name+'_annual'] = np.average(model_individual[var_name],axis=1,weights=time_ndays_model_latlon) + model_individual[var_name+'_jja'] = np.average(model_individual[var_name][:,ind_jja,:,:],axis=1,weights=time_ndays_model_latlon[:,ind_jja,:,:]) #TODO: Check this. + model_individual[var_name+'_djf'] = np.average(model_individual[var_name][:,ind_djf,:,:],axis=1,weights=time_ndays_model_latlon[:,ind_djf,:,:]) #TODO: Check this. # # In each model, central values will not be selected within max_resolution/2 of the edges n_time = len(model_individual['age']) @@ -117,6 +121,11 @@ def detrend_model_data(model_data,options): n_lat = len(model_data['lat']) n_lon = len(model_data['lon']) # + # If summer or winter is to be reconstructed, pass #TODO: Update this? + if options['season_to_reconstruct'] in ['jja','djf']: + print('Model processing for JJA or DJF is not set up yet. 
Returning unchanged.') + return model_data + # # If desired, do a highpass filter on every location if options['model_processing'] == 'linear_global': # @@ -182,7 +191,8 @@ def process_models(model_name,var_name,time_resolution,age_range,output_dir,orig """ # # If the model name ends in "_regrid", remove that part of the model name. - if model_name[-7:] == '_regrid': model_name = model_name[:-7] + if model_name[-7:] == '_regrid': model_name = model_name[:-7] + #if model_name[-14:] == '_regrid_arctic': model_name = model_name[:-14] # # Set directories data_dir = {} diff --git a/da_load_proxies.py b/da_load_proxies.py index 3fafd39..35974fd 100644 --- a/da_load_proxies.py +++ b/da_load_proxies.py @@ -1,7 +1,7 @@ #============================================================================== # Functions for loading proxy data for the data assimilation project. # author: Michael P. Erb -# date : 3/16/2022 +# date : 10/12/2022 #============================================================================== import da_utils @@ -16,9 +16,10 @@ def load_proxies(options): # # Set the necessary directories - dir_proxies_temp12k = options['data_dir']+'proxies/temp12k/' - dir_proxies_pages2k = options['data_dir']+'proxies/pages2k/' - dir_proxies_pseudo = options['data_dir']+'proxies/pseudoproxies/' + dir_proxies_temp12k = options['data_dir']+'proxies/temp12k/' + dir_proxies_hydro12k = options['data_dir']+'proxies/hydro12k/' + dir_proxies_pages2k = options['data_dir']+'proxies/pages2k/' + dir_proxies_pseudo = options['data_dir']+'proxies/pseudoproxies/' collection_all = [] proxy_ts = [] # @@ -28,7 +29,7 @@ def load_proxies(options): if proxy_dataset == 'temp12k': # # Load the Temp12k proxy metadata - file_to_open = open(dir_proxies_temp12k+'Temp12k1_0_2.pkl','rb') + file_to_open = open(dir_proxies_temp12k+'Temp12k'+options['version_temp12k']+'.pkl','rb') proxies_all_12k = pickle.load(file_to_open)['D'] file_to_open.close() # @@ -56,6 +57,86 @@ def load_proxies(options): if proxy_ts[i]['paleoData_TSid'] == 'RXEc3JaUSUk': proxy_ts[i]['paleoData_temperature12kUncertainty'] = 2.1 # This record has an uncertainty value of "3; 2", but it should be 2.1. if proxy_ts[i]['paleoData_TSid'] == 'RWzl4NCma8r': proxy_ts[i]['paleoData_interpretation'][0]['seasonality'] = 'summer' # This record is lacking a seasonality field. 
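# A standalone cross-check of the day-weighted JJA/DJF means added in
# da_load_models.py above (toy data, not the project's function). Note that
# taking DJF as months [11,0,1] pairs each December with the January/February
# of the same model year rather than the previous year's December -- the
# simplification flagged by the TODO comments in that hunk.
import numpy as np

def seasonal_mean(monthly, ndays, month_indices):
    # monthly and ndays have shape [time, 12, lat, lon]; returns [time, lat, lon]
    return np.average(monthly[:, month_indices, :, :], axis=1,
                      weights=ndays[:, month_indices, :, :])

monthly = np.random.rand(3, 12, 2, 2)                        # toy monthly fields
month_lengths = np.array([31,28,31,30,31,30,31,31,30,31,30,31], dtype=float)
ndays = np.tile(month_lengths[None, :, None, None], (3, 1, 2, 2))
tas_jja = seasonal_mean(monthly, ndays, [5, 6, 7])           # June-July-August
tas_djf = seasonal_mean(monthly, ndays, [11, 0, 1])          # December-January-February
print(tas_jja.shape, tas_djf.shape)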
# + elif proxy_dataset == 'raw': + # + # Load the processed RAW proxy data + dir_proxies = '/projects/pd_lab/data/proxies/Rapid_Arctic_Warming/RAW_processed/' + if options['version_temp12k'] == 'raw_positive_corr': file_proxies = 'RAW_proxies_processed_positive_corr.pickle' + else: file_proxies = 'RAW_proxies_processed.pickle' + file_to_open = open(dir_proxies+file_proxies,'rb') + all_ts_raw = pickle.load(file_to_open) + file_to_open.close() + # + # Only use temperature records + variable_all = [] + n_proxies = len(all_ts_raw) + for i in range(n_proxies): + try: variable_all.append(all_ts_raw[i]['paleoData_interpretation'][0]['variable']) + except: variable_all.append('Not given') + # + #ind_temp = np.where((np.array(variable_all) == 'Temperature'))[0] + #proxy_ts_raw = [all_ts_raw[ind] for ind in ind_temp] + proxy_ts_raw = all_ts_raw + # + # Only use records in units of degC + #proxy_ts_raw = lipd.filterTs(all_ts_raw,'paleoData_units == degC') + if options['reconstruction_type'] == 'absolute': proxy_ts_raw = lipd.filterTs(proxy_ts_raw,'paleoData_datum == abs') + # + proxy_ts = proxy_ts + proxy_ts_raw + collection_all = collection_all + ([proxy_dataset] * len(proxy_ts_raw)) + # + elif proxy_dataset == 'hydro12k': + # + # Load the uncertainty file #TODO: Incorporate this into the code better + uncertainty_file = '/projects/pd_lab/data/data_assimilation/proxies/hydro12k/LegacyClimateTSidPassQC.csv' + proxy_info_from_file = np.genfromtxt(uncertainty_file,delimiter=',',dtype='str') + tsid_from_file = proxy_info_from_file[1:,2] + rmse_from_file = proxy_info_from_file[1:,9].astype(float) + tsid_from_file = np.array([entry[1:-1] for entry in tsid_from_file]) + rmse_selected_median = 114 + # + # Load the Temp12k proxy metadata + file_to_open = open(dir_proxies_hydro12k+'HoloceneHydroclimate0_6_0.pkl','rb') + proxies_all_hydro12k = pickle.load(file_to_open)['D'] + file_to_open.close() + # + # Extract the time series + all_ts_hydro12k = lipd.extractTs(proxies_all_hydro12k) + # + # Get all proxies in the compilation + hydro_selected_units = ['mm/a'] # Only select records with these units + hydro_selected_interp = ['P'] # Only select records with these interpretations + ind_hydro = [] + for i in range(len(all_ts_hydro12k)): + keys = list(all_ts_hydro12k[i].keys()) + if ('paleoData_inCompilationBeta' in keys) and ('age' in keys) and ('archiveType' in keys): + compilations,versions = [],[] + n_values = len(all_ts_hydro12k[i]['paleoData_inCompilationBeta']) + for j in range(n_values): + compilations.append(all_ts_hydro12k[i]['paleoData_inCompilationBeta'][j]['compilationName']) + versions.append(all_ts_hydro12k[i]['paleoData_inCompilationBeta'][j]['compilationVersion']) + # + if ('HoloceneHydroclimate' in compilations) and ('0_6_0' in str(versions)): + try: _ = all_ts_hydro12k[i]['paleoData_interpretation'][0]['seasonality'] #TODO: Update this later + except: print('"seasonality" key not found for index',i); continue + dataunits = all_ts_hydro12k[i]['paleoData_units'] + interp = all_ts_hydro12k[i]['paleoData_interpretation'][0]['variable'] + # + # Hydro12k data currently lack uncertainty values. Set them here. #TODO: Update this later. 
+ tsid = all_ts_hydro12k[i]['paleoData_TSid'] + ind_rmse = np.where(tsid_from_file == tsid)[0] + if len(ind_rmse) == 1: all_ts_hydro12k[i]['paleoData_temperature12kUncertainty'] = rmse_from_file[ind_rmse[0]] + else: all_ts_hydro12k[i]['paleoData_temperature12kUncertainty'] = rmse_selected_median + # + if (dataunits in hydro_selected_units) and (interp in hydro_selected_interp): ind_hydro.append(i) + # + proxy_ts_hydro12k = [all_ts_hydro12k[i] for i in ind_hydro] + if options['reconstruction_type'] == 'absolute': proxy_ts_hydro12k = lipd.filterTs(proxy_ts_hydro12k,'paleoData_datum == abs') + print('Number of hydro12k records selected:',len(proxy_ts_hydro12k)) + # + proxy_ts = proxy_ts + proxy_ts_hydro12k + collection_all = collection_all + ([proxy_dataset] * len(proxy_ts_hydro12k)) + # elif proxy_dataset == 'pages2k': # # Load the PAGES2k proxies @@ -113,22 +194,42 @@ def process_proxies(proxy_ts,collection_all,options): # # Set up arrays for the processed proxy data to be stored in proxy_data = {} - proxy_data['values_binned'] = np.zeros((n_proxies,n_ages)); proxy_data['values_binned'][:] = np.nan - proxy_data['resolution_binned'] = np.zeros((n_proxies,n_ages)); proxy_data['resolution_binned'][:] = np.nan - proxy_data['uncertainty'] = np.zeros((n_proxies)); proxy_data['uncertainty'][:] = np.nan - proxy_data['metadata'] = np.zeros((n_proxies,8),dtype=object); proxy_data['metadata'][:] = np.nan - proxy_data['lats'] = np.zeros((n_proxies)); proxy_data['lats'][:] = np.nan - proxy_data['lons'] = np.zeros((n_proxies)); proxy_data['lons'][:] = np.nan + proxy_data['values_binned'] = np.zeros((n_proxies,n_ages)); proxy_data['values_binned'][:] = np.nan + proxy_data['resolution_binned'] = np.zeros((n_proxies,n_ages)); proxy_data['resolution_binned'][:] = np.nan + proxy_data['metadata'] = np.zeros((n_proxies,11),dtype=object); proxy_data['metadata'][:] = np.nan + proxy_data['lats'] = np.zeros((n_proxies)); proxy_data['lats'][:] = np.nan + proxy_data['lons'] = np.zeros((n_proxies)); proxy_data['lons'][:] = np.nan + #proxy_data['uncertainty'] = np.zeros((n_proxies)); proxy_data['uncertainty'][:] = np.nan + proxy_data['uncertainty'] = [] proxy_data['archivetype'] = [] proxy_data['proxytype'] = [] proxy_data['units'] = [] + proxy_data['interp'] = [] + proxy_data['direction'] = [] proxy_data['seasonality_array'] = {} # + # If using data from age-uncertainty ensembles, load the data here + if options['age_uncertain_method']: + file_ageuncertain_median = options['data_dir']+'proxies/temp12k/ageuncertain_files/ageuncertainty_testing_medians_Temp12k'+options['version_temp12k']+'.csv' + file_ageuncertain_mse = options['data_dir']+'proxies/temp12k/ageuncertain_files/ageuncertainty_testing_mse_Temp12k'+options['version_temp12k']+'.csv' + # + ageuncertain_ages = np.genfromtxt(file_ageuncertain_median,delimiter=',',dtype='str')[1:,0].astype(float) + ageuncertain_tsids = np.genfromtxt(file_ageuncertain_median,delimiter=',',dtype='str')[0,1:] + ageuncertain_values = np.genfromtxt(file_ageuncertain_median,delimiter=',',dtype='str')[1:,1:].astype(float) + ageuncertain_ages2 = np.genfromtxt(file_ageuncertain_mse, delimiter=',',dtype='str')[1:,0].astype(float) + ageuncertain_tsids2 = np.genfromtxt(file_ageuncertain_mse, delimiter=',',dtype='str')[0,1:] + ageuncertain_uncertainties = np.genfromtxt(file_ageuncertain_mse, delimiter=',',dtype='str')[1:,1:].astype(float) + # + if not np.array_equal(ageuncertain_ages, ageuncertain_ages2): print('WARNING: COMPARING THE TWO AGE UNCERTAINTY FILES: DIFFERENT AGES') + if not 
np.array_equal(ageuncertain_tsids,ageuncertain_tsids2): print('WARNING: COMPARING THE TWO AGE UNCERTAINTY FILES: DIFFERENT TSIDS') + if not np.array_equal(ageuncertain_ages, age_centers): print('WARNING: COMPARING AGE_CENTERS TO UNCERTAINTY AGES: DIFFERENT AGES') + # # Loop through proxies, saving the necessary values to common variables. no_ref_data = 0; missing_uncertainty = 0 for i in range(n_proxies): # # Get proxy data + print('Processing proxies:',i) proxy_ages = np.array(proxy_ts[i]['age']).astype(float) proxy_values = np.array(proxy_ts[i]['paleoData_values']).astype(float) # @@ -140,6 +241,16 @@ def process_proxies(proxy_ts,collection_all,options): ind_sorted = np.argsort(proxy_ages) proxy_values = proxy_values[ind_sorted] proxy_ages = proxy_ages[ind_sorted] + # + # Update the units: + # - Precipitation: mm/yr -> mm/day + units = proxy_ts[i]['paleoData_units'] + if units == 'mm/yr': + proxy_ts[i]['paleoData_units'] = 'mm/day' + proxy_values = proxy_values/365.25 #TODO: Consider using a different value than 365.25 + if 'paleoData_temperature12kUncertainty' in list(proxy_ts[i].keys()): proxy_ts[i]['paleoData_temperature12kUncertainty'] = proxy_ts[i]['paleoData_temperature12kUncertainty']/365.25 + print('Proxy '+str(i)+': Updating units from mm/yr to mm/day') + # # INTERPOLATION # To interpolate the proxy data to the base resolution (by default: decadal): @@ -212,32 +323,52 @@ def process_proxies(proxy_ts,collection_all,options): plt.show() """ # - # If the reconstruction type is "relative," remove the mean of the reference period - if options['reconstruction_type'] == 'relative': - ind_ref = np.where((proxy_ages_interp >= options['reference_period'][0]) & (proxy_ages_interp < options['reference_period'][1]))[0] - proxy_values_12ka = proxy_values_12ka - np.nanmean(proxy_values_interp[ind_ref]) - if np.isnan(proxy_values_interp[ind_ref]).all(): print('No data in reference period, index: '+str(i)); no_ref_data += 1 + # Get uncertainty metadata + missing_uncertainty_value = np.nan + try: proxy_uncertainty = proxy_ts[i]['paleoData_temperature12kUncertainty'] + except: proxy_uncertainty = missing_uncertainty_value; missing_uncertainty += 1 + proxy_uncertainty = np.square(float(proxy_uncertainty)) # Proxy uncertainty was give as RMSE, but the code uses MSE + # + # If selected, get the age-uncertain proxy data + if options['age_uncertain_method']: + # + # Find the index of the proxy record in the age-uncertainty data. + if proxy_ts[i]['paleoData_TSid'] not in ageuncertain_tsids: print('WARNING: TSID NOT IN AGE-UNCERTAINTY FILE, INDEX='+str(i)) + ind_tsid = np.where(ageuncertain_tsids==proxy_ts[i]['paleoData_TSid'])[0][0] + # + # Assign the values and uncertainty data for the given proxy, in 1D arrays of length n_time. + proxy_values_12ka = ageuncertain_values[:,ind_tsid] + proxy_uncertainty = ageuncertain_uncertainties[:,ind_tsid] + #proxy_res_12ka = #TODO: Should proxy resolution be changed when this method is used? 
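# A minimal standalone illustration (made-up numbers) of two steps described
# above: referencing a record to the mean of the reference period for a
# 'relative' reconstruction, and squaring a reported RMSE uncertainty to get
# the error variance (MSE) used by the assimilation.
import numpy as np

age_centers = np.arange(5, 12000, 10)                      # decadal bin centers, yr BP
proxy_values = np.random.randn(age_centers.size) + 15.0    # toy proxy series, degC
reference_period = [3000, 5000]
reported_rmse = 1.5                                        # toy uncertainty, degC

ind_ref = (age_centers >= reference_period[0]) & (age_centers < reference_period[1])
proxy_anomalies = proxy_values - np.nanmean(proxy_values[ind_ref])   # relative to the 3-5 ka mean
proxy_error_variance = np.square(reported_rmse)                      # R = RMSE**2
print(proxy_anomalies[:3], proxy_error_variance)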
# # Set resolutions to a minimum of 1 and a maximum of max_res_value proxy_res_12ka[proxy_res_12ka < 1] = 1 proxy_res_12ka[proxy_res_12ka > max_res_value] = max_res_value # - # Save to common variables (y and ya) - proxy_data['values_binned'][i,:] = proxy_values_12ka - proxy_data['resolution_binned'][i,:] = proxy_res_12ka + # If the reconstruction type is "relative," remove the mean of the reference period + if options['reconstruction_type'] == 'relative': + if options['age_uncertain_method']: + ind_ref = np.where((age_centers >= options['reference_period'][0]) & (age_centers < options['reference_period'][1]))[0] + proxy_values_12ka = proxy_values_12ka - np.nanmean(proxy_values_12ka[ind_ref]) + if np.isnan(proxy_values_12ka[ind_ref]).all(): print('No data in reference period, index: '+str(i)); no_ref_data += 1 + else: + # The reference period is calculated using annualized data in this case. + ind_ref = np.where((proxy_ages_interp >= options['reference_period'][0]) & (proxy_ages_interp < options['reference_period'][1]))[0] + proxy_values_12ka = proxy_values_12ka - np.nanmean(proxy_values_interp[ind_ref]) + if np.isnan(proxy_values_interp[ind_ref]).all(): print('No data in reference period, index: '+str(i)); no_ref_data += 1 # # Get proxy metdata - missing_uncertainty_value = np.nan proxy_lat = proxy_ts[i]['geo_meanLat'] proxy_lon = proxy_ts[i]['geo_meanLon'] proxy_seasonality_txt = proxy_ts[i]['paleoData_interpretation'][0]['seasonality'] proxy_seasonality_general = proxy_ts[i]['paleoData_interpretation'][0]['seasonalityGeneral'] - try: proxy_uncertainty = proxy_ts[i]['paleoData_temperature12kUncertainty'] - except: proxy_uncertainty = missing_uncertainty_value; missing_uncertainty += 1 - proxy_uncertainty = float(proxy_uncertainty) proxy_data['archivetype'].append(proxy_ts[i]['archiveType']) proxy_data['proxytype'].append(proxy_ts[i]['paleoData_proxy']) proxy_data['units'].append(proxy_ts[i]['paleoData_units']) + proxy_data['interp'].append(proxy_ts[i]['paleoData_interpretation'][0]['variable']) + if 'direction' in list(proxy_ts[i]['paleoData_interpretation'][0].keys()): interp_direction = proxy_ts[i]['paleoData_interpretation'][0]['direction'] + else: interp_direction = 'Not given' + proxy_data['direction'].append(interp_direction) # # Convert seasonality to a list of months, with negative values corresponding to the previous year. 
if 'seasonality_array' in list(proxy_ts[i].keys()): @@ -262,17 +393,24 @@ def process_proxies(proxy_ts,collection_all,options): # proxy_data['seasonality_array'][i] = proxy_seasonality_array # - # Save some metadata to common variables + # Save to common variables (y and ya) + proxy_data['values_binned'][i,:] = proxy_values_12ka + proxy_data['resolution_binned'][i,:] = proxy_res_12ka + proxy_data['uncertainty'].append(proxy_uncertainty) + # + # Save some more metadata to a common variables if proxy_lon < 0: proxy_lon = proxy_lon+360 - proxy_data['uncertainty'][i] = np.square(proxy_uncertainty) # Proxy uncertainty was give as RMSE, but the code uses MSE proxy_data['metadata'][i,0] = proxy_ts[i]['dataSetName'] proxy_data['metadata'][i,1] = proxy_ts[i]['paleoData_TSid'] proxy_data['metadata'][i,2] = str(proxy_lat) proxy_data['metadata'][i,3] = str(proxy_lon) proxy_data['metadata'][i,4] = str(proxy_seasonality_array) proxy_data['metadata'][i,5] = proxy_seasonality_general - proxy_data['metadata'][i,6] = str(np.median(proxy_ages[1:]-proxy_ages[:-1])) + proxy_data['metadata'][i,6] = str(np.median(proxy_ages[1:]-proxy_ages[:-1])) #TODO: Consider calculating this a different way. proxy_data['metadata'][i,7] = collection_all[i] + proxy_data['metadata'][i,8] = proxy_ts[i]['paleoData_units'] + proxy_data['metadata'][i,9] = proxy_ts[i]['paleoData_interpretation'][0]['variable'] + proxy_data['metadata'][i,10] = interp_direction proxy_data['lats'][i] = proxy_lat proxy_data['lons'][i] = proxy_lon # @@ -280,6 +418,7 @@ def process_proxies(proxy_ts,collection_all,options): proxy_data['archivetype'] = np.array(proxy_data['archivetype']) proxy_data['proxytype'] = np.array(proxy_data['proxytype']) proxy_data['units'] = np.array(proxy_data['units']) + proxy_data['uncertainty'] = np.array(proxy_data['uncertainty']) # print('\n=== PROXY DATA LOADED ===') print('Proxy datasets loaded (n='+str(len(options['proxy_datasets_to_assimilate']))+'):'+str(options['proxy_datasets_to_assimilate'])) diff --git a/da_main_code.py b/da_main_code.py index 6822dc9..a1504d1 100644 --- a/da_main_code.py +++ b/da_main_code.py @@ -2,8 +2,8 @@ # This script contains the main code of the Holocene data assimilation. # Options are set in the config yml file. See README.txt for a more complete # explanation of the code and setup. -# author: Michael P. Erb -# date : 3/29/2022 +# author: Michael Erb +# date : 1/22/2025 #============================================================================== import sys @@ -19,13 +19,18 @@ import da_psms +#TODO: Update any instances of 'tas' to be for other variables. + + #%% SETTINGS starttime_total = time.time() # Start timer # Use a given config file. If not given, use config_default.yml. if len(sys.argv) > 1: config_file = sys.argv[1] -else: config_file = 'config_default.yml' +else: config_file = 'config.yml' +if len(sys.argv) > 2: seed_overwrite = sys.argv[2] +else: seed_overwrite = 'None' # Load the configuration options and print them to the screen. 
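# Example invocations of the command-line handling above: the optional first
# argument selects the config file and the optional second argument overwrites
# both 'seed_for_prior' and 'seed_for_proxy_choice', e.g.
#   python da_main_code.py                       -> uses config.yml with the seeds it specifies
#   python da_main_code.py config_kmat.yml 7     -> uses config_kmat.yml and sets both seeds to 7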
print('Using configuration file: '+config_file) @@ -39,6 +44,11 @@ #%% LOAD AND PROCESS DATA +# If the seed has been overwritten using sys.argv, update it here +if seed_overwrite != 'None': + options['seed_for_prior'] = int(seed_overwrite) + options['seed_for_proxy_choice'] = int(seed_overwrite) + # Load the chosen proxy data proxy_ts,collection_all = da_load_proxies.load_proxies(options) proxy_data = da_load_proxies.process_proxies(proxy_ts,collection_all,options) @@ -58,8 +68,11 @@ for i in range(n_models_in_prior): ind_for_model = (model_data['number'] == (i+1)) ind_ref = (model_data['age'] >= options['reference_period'][0]) & (model_data['age'] < options['reference_period'][1]) & ind_for_model - model_data['tas'][ind_for_model,:,:,:] = model_data['tas'][ind_for_model,:,:,:] - np.mean(model_data['tas'][ind_ref,:,:,:],axis=0) - model_data['tas_annual'][ind_for_model,:,:] = model_data['tas_annual'][ind_for_model,:,:] - np.mean(model_data['tas_annual'][ind_ref,:,:],axis=0) + for var in options['vars_to_reconstruct']: + model_data[var][ind_for_model,:,:,:] = model_data[var][ind_for_model,:,:,:] - np.mean(model_data[var][ind_ref,:,:,:],axis=0) + model_data[var+'_annual'][ind_for_model,:,:] = model_data[var+'_annual'][ind_for_model,:,:] - np.mean(model_data[var+'_annual'][ind_ref,:,:],axis=0) + model_data[var+'_jja'][ind_for_model,:,:] = model_data[var+'_jja'][ind_for_model,:,:] - np.mean(model_data[var+'_jja'][ind_ref,:,:],axis=0) + model_data[var+'_djf'][ind_for_model,:,:] = model_data[var+'_djf'][ind_for_model,:,:] - np.mean(model_data[var+'_djf'][ind_ref,:,:],axis=0) # If requested, alter the proxy uncertainty values. if options['change_uncertainty']: @@ -86,7 +99,21 @@ proxy_data['uncertainty'][i] = proxy_uncertainties_from_file[index_uncertainty,1].astype(float) # Use PSMs to get model-based proxy estimates -proxy_estimates_all,_ = da_psms.psm_main(model_data,proxy_data,options) +proxy_estimates_all,_,proxy_data = da_psms.psm_main(model_data,proxy_data,options) + + +#%% SET UNCERTAINTIES +""" +# Temperature proxies which are not in units of degC don't have uncertainty values. +# Here, set this to the standard deviation, as in Hancock et al., in press #TODO: Update this. 
+n_proxies = proxy_data['values_binned'].shape[0] +for i in range(n_proxies): + units = proxy_data['units'][i] + if units != 'degC': + print(i,units,proxy_data['uncertainty'][i]) + proxy_std = np.nanstd(proxy_data['values_binned'][i,:]) + proxy_data['uncertainty'][i] = proxy_std +""" #%% SET THINGS UP @@ -96,8 +123,8 @@ n_ages = proxy_data['values_binned'].shape[1] n_lat = len(model_data['lat']) n_lon = len(model_data['lon']) -n_latlonvars = n_lat*n_lon*n_vars -n_state = (n_latlonvars) + n_proxies +n_varslatlon = n_vars*n_lat*n_lon +n_state = (n_varslatlon) + n_proxies # Determine the total possible number of ensemble members n_ens_possible = len(da_load_models.get_indices_for_prior(options,model_data,0)) @@ -116,16 +143,17 @@ ind_to_save = np.sort(ind_to_save) # Set up arrays for reconstruction values and more outputs -recon_ens = np.zeros((n_state,n_ens_to_save,n_ages)); recon_ens[:] = np.nan -recon_mean = np.zeros((n_state,n_ages)); recon_mean[:] = np.nan -recon_global_all = np.zeros((n_ages,n_ens,n_vars)); recon_global_all[:] = np.nan -recon_nh_all = np.zeros((n_ages,n_ens,n_vars)); recon_nh_all[:] = np.nan -recon_sh_all = np.zeros((n_ages,n_ens,n_vars)); recon_sh_all[:] = np.nan -prior_ens = np.zeros((n_state,n_ens_to_save,n_ages)); prior_ens[:] = np.nan -prior_mean = np.zeros((n_state,n_ages)); prior_mean[:] = np.nan -prior_global_all = np.zeros((n_ages,n_ens,n_vars)); prior_global_all[:] = np.nan -prior_proxy_means = np.zeros((n_ages,n_proxies)); prior_proxy_means[:] = np.nan -proxies_to_assimilate_all = np.zeros((n_ages,n_proxies)); proxies_to_assimilate_all[:] = np.nan +recon_ens = np.zeros((n_state,n_ens_to_save,n_ages)); recon_ens[:] = np.nan +recon_mean = np.zeros((n_state,n_ages)); recon_mean[:] = np.nan +recon_global_all = np.zeros((n_ages,n_ens,n_vars)); recon_global_all[:] = np.nan +recon_nh_all = np.zeros((n_ages,n_ens,n_vars)); recon_nh_all[:] = np.nan +recon_sh_all = np.zeros((n_ages,n_ens,n_vars)); recon_sh_all[:] = np.nan +prior_ens = np.zeros((n_state,n_ens_to_save,n_ages)); prior_ens[:] = np.nan +prior_mean = np.zeros((n_state,n_ages)); prior_mean[:] = np.nan +prior_global_all = np.zeros((n_ages,n_ens,n_vars)); prior_global_all[:] = np.nan +prior_proxy_means = np.zeros((n_ages,n_proxies)); prior_proxy_means[:] = np.nan +prior_proxy_ens = np.zeros((n_ages,n_ens_to_save,n_proxies)); prior_proxy_ens[:] = np.nan +proxies_to_assimilate_all = np.zeros((n_ages,n_proxies)); proxies_to_assimilate_all[:] = np.nan #%% FIND PROXIES TO ASSIMILATE AND MORE @@ -138,12 +166,15 @@ # Find the proxies with uncertainty values proxy_ind_with_uncertainty = np.isfinite(proxy_data['uncertainty']) +if len(proxy_ind_with_uncertainty.shape) == 2: proxy_ind_with_uncertainty = proxy_ind_with_uncertainty.any(axis=1) print(' - Number of records with uncertainty values: '+str(sum(proxy_ind_with_uncertainty))) # If requested, select only proxies with certain seasonalities -if options['assimilate_selected_seasons']: +if options['assimilate_selected_seasons'] in ['jja_preferred','djf_preferred']: + proxy_ind_of_seasonality = da_utils.find_seasons(proxy_data,options) +elif options['assimilate_selected_seasons']: proxy_ind_of_seasonality = np.full((n_proxies),False,dtype=bool) - ind_seasons = [i for i, seasontype in enumerate(proxy_data['metadata'][:,5]) if seasontype in options['assimilate_selected_seasons']] + ind_seasons = [i for i, seasontype in enumerate(proxy_data['metadata'][:,5]) if seasontype.lower() in options['assimilate_selected_seasons']] proxy_ind_of_seasonality[ind_seasons] = True print(' 
- Number of records with seasonalities '+str(options['assimilate_selected_seasons'])+': '+str(sum(proxy_ind_of_seasonality))) else: @@ -208,7 +239,7 @@ # Loop through every age, doing the data assimilation with a time-varying prior print(' === Starting data assimilation === ') -#age_counter = 0; age = proxy_data['age_centers'][age_counter] +age_counter = 0; age = proxy_data['age_centers'][age_counter] for age_counter,age in enumerate(proxy_data['age_centers']): # starttime_loop = time.time() @@ -217,26 +248,34 @@ proxy_values_for_age = proxy_data['values_binned'][:,age_counter] proxy_resolution_for_age = proxy_data['resolution_binned'][:,age_counter] # + # Get the proxy uncertainty for the given age + if len(proxy_data['uncertainty'].shape) == 1: proxy_uncertainties_for_age = proxy_data['uncertainty'] + elif len(proxy_data['uncertainty'].shape) == 2: proxy_uncertainties_for_age = proxy_data['uncertainty'][:,age_counter] + else: print('Proxy uncertainties are an unknown shape: '+str(proxy_data['uncertainty'].shape)) + # # Get the indices of the prior which will be used for this data assimilation step indices_for_prior = da_load_models.get_indices_for_prior(options,model_data,age) model_number_for_prior = model_data['number'][indices_for_prior] if len(indices_for_prior) != n_ens_possible: print(' !!! Warning: number of prior ages selected does not match n_ens. Age='+str(age)) # + # Select the season to reconstruct in the prior + season_txt = options['season_to_reconstruct'] + # # Get the prior values for the variables to reconstruct for j,var_name in enumerate(options['vars_to_reconstruct']): - var_annual_for_prior = model_data[var_name+'_annual'][indices_for_prior,:,:][:,:,:,None] - if j == 0: vars_annual_for_prior_all = var_annual_for_prior - else: vars_annual_for_prior_all = np.concatenate((vars_annual_for_prior_all,var_annual_for_prior),axis=3) + var_season_for_prior = model_data[var_name+'_'+season_txt][indices_for_prior,:,:][:,None,:,:] + if j == 0: vars_season_for_prior_all = var_season_for_prior + else: vars_season_for_prior_all = np.concatenate((vars_season_for_prior_all,var_season_for_prior),axis=1) # # For each proxy, get the proxy estimates for the correct resolution model_estimates_for_age = np.zeros((n_ens_possible,n_proxies)); model_estimates_for_age[:] = np.nan for j in range(n_proxies): res = proxy_resolution_for_age[j] - if np.isnan(proxy_values_for_age[j]): continue + if np.isnan(proxy_values_for_age[j]) or np.isnan(res): continue model_estimates_for_age[:,j] = proxy_estimates_all[j][int(res)][indices_for_prior] # # Use only the randomly selected climate states in the prior - vars_annual_for_prior_all = vars_annual_for_prior_all[ind_to_use,:,:,:] + vars_season_for_prior_all = vars_season_for_prior_all[ind_to_use,:,:,:] model_estimates_for_age = model_estimates_for_age[ind_to_use,:] model_number_for_prior = model_number_for_prior[ind_to_use] # @@ -244,11 +283,11 @@ if ((options['reconstruction_type'] == 'relative') and (options['prior_mean_always_0'] == True)): for i in range(n_models_in_prior): ind_for_model = np.where(model_number_for_prior == (i+1))[0] - vars_annual_for_prior_all[ind_for_model,:,:,:] = vars_annual_for_prior_all[ind_for_model,:,:,:] - np.mean(vars_annual_for_prior_all[ind_for_model,:,:,:],axis=0) + vars_season_for_prior_all[ind_for_model,:,:,:] = vars_season_for_prior_all[ind_for_model,:,:,:] - np.mean(vars_season_for_prior_all[ind_for_model,:,:,:],axis=0) model_estimates_for_age[ind_for_model,:] = model_estimates_for_age[ind_for_model,:] - 
np.mean(model_estimates_for_age[ind_for_model,:],axis=0) # # Make the prior (Xb) - prior = np.reshape(vars_annual_for_prior_all,(n_ens,n_latlonvars)) + prior = np.reshape(vars_season_for_prior_all,(n_ens,n_varslatlon)) # # Append the proxy estimate to the prior, so that proxy estimates are reconstructed too prior = np.append(prior,model_estimates_for_age,axis=1) @@ -260,9 +299,10 @@ # # Save the prior estimates of proxies, for analysis later prior_proxy_means[age_counter,:] = np.mean(model_estimates_for_age,axis=0) + prior_proxy_ens[age_counter,:,:] = model_estimates_for_age[ind_to_save,:] # # Select only the proxies which meet the criteria - proxies_to_assimilate = proxy_ind_selected & np.isfinite(proxy_values_for_age) + proxies_to_assimilate = proxy_ind_selected & np.isfinite(proxy_values_for_age) & np.isfinite(proxy_uncertainties_for_age) & np.isfinite(proxy_resolution_for_age) & np.isfinite(prior_proxy_means[age_counter,:]) # # Keep a record of which proxies are assimilated proxies_to_assimilate_all[age_counter,:] = proxies_to_assimilate @@ -273,7 +313,7 @@ if n_proxies_at_age > 0: # proxy_values_selected = proxy_values_for_age[proxy_ind_to_assimilate] - proxy_uncertainty_selected = proxy_data['uncertainty'][proxy_ind_to_assimilate] + proxy_uncertainty_selected = proxy_uncertainties_for_age[proxy_ind_to_assimilate] model_estimates_selected = model_estimates_for_age[:,proxy_ind_to_assimilate] R_diagonal = np.diag(proxy_uncertainty_selected) # @@ -286,12 +326,12 @@ # Get values for proxy proxy_value = proxy_values_selected[proxy] proxy_uncertainty = proxy_uncertainty_selected[proxy] - tas_modelbased_estimates = Xb[n_latlonvars+proxy_ind_to_assimilate[proxy],:] - if options['localization_radius']: loc = proxy_localization_all[proxy_ind_to_assimilate[proxy],:] + proxy_modelbased_estimates = Xb[n_varslatlon+proxy_ind_to_assimilate[proxy],:] + if options['localization_radius'] != 'None': loc = proxy_localization_all[proxy_ind_to_assimilate[proxy],:] else: loc = None # # Do data assimilation - Xb = da_utils_lmr.enkf_update_array(Xb,proxy_value,tas_modelbased_estimates,proxy_uncertainty,loc=loc,inflate=None) + Xb = da_utils_lmr.enkf_update_array(Xb,proxy_value,proxy_modelbased_estimates,proxy_uncertainty,loc=loc,inflate=None) if np.isnan(Xb).all(): print(' !!! ERROR. ALL RECONSTRUCTION VALUES SET TO NAN. 
Age='+str(age)+', proxy number='+str(proxy)+' !!!') # # Set the final values @@ -302,17 +342,23 @@ Xa = Xb # # Compute the global-mean of the prior - prior_global = da_utils.global_mean(vars_annual_for_prior_all,model_data['lat'],1,2) + prior_global = da_utils.global_mean(vars_season_for_prior_all,model_data['lat'],2,3) prior_global_all[age_counter,:,:] = prior_global # # Compute the global and hemispheric means of the reconstruction - Xa_latlon = np.reshape(Xa[:n_latlonvars,:],(n_lat,n_lon,n_vars,n_ens)) - recon_global = da_utils.global_mean(Xa_latlon,model_data['lat'],0,1) - recon_nh = da_utils.spatial_mean(Xa_latlon,model_data['lat'],model_data['lon'], 0,90,0,360,0,1) - recon_sh = da_utils.spatial_mean(Xa_latlon,model_data['lat'],model_data['lon'],-90, 0,0,360,0,1) + Xa_latlon = np.reshape(Xa[:n_varslatlon,:],(n_vars,n_lat,n_lon,n_ens)) + recon_global = da_utils.global_mean(Xa_latlon,model_data['lat'],1,2) recon_global_all[age_counter,:,:] = np.transpose(recon_global) - recon_nh_all[age_counter,:,:] = np.transpose(recon_nh) - recon_sh_all[age_counter,:,:] = np.transpose(recon_sh) + try: + recon_nh = da_utils.spatial_mean(Xa_latlon,model_data['lat'],model_data['lon'],0,90,0,360,1,2) + recon_nh_all[age_counter,:,:] = np.transpose(recon_nh) + except: + print('Warning: Cannot calculate NH mean in prior. Leaving as NaN.') + try: + recon_sh = da_utils.spatial_mean(Xa_latlon,model_data['lat'],model_data['lon'],-90,0,0,360,1,2) + recon_sh_all[age_counter,:,:] = np.transpose(recon_sh) + except: + print('Warning: Cannot calculate SH mean in prior. Leaving as NaN.') # # Get the mean and selected ensemble values recon_mean[:,age_counter] = np.mean(Xa,axis=1) @@ -328,14 +374,14 @@ prior_ens = np.swapaxes(prior_ens,0,2) # Reshape the gridded reconstruction to a lat-lon grid -recon_mean_grid = np.reshape(recon_mean[:,:n_latlonvars], (n_ages, n_lat,n_lon,n_vars)) -recon_ens_grid = np.reshape(recon_ens[:,:,:n_latlonvars],(n_ages,n_ens_to_save,n_lat,n_lon,n_vars)) -prior_mean_grid = np.reshape(prior_mean[:,:n_latlonvars], (n_ages, n_lat,n_lon,n_vars)) -prior_ens_grid = np.reshape(prior_ens[:,:,:n_latlonvars],(n_ages,n_ens_to_save,n_lat,n_lon,n_vars)) +recon_mean_grid = np.reshape(recon_mean[:,:n_varslatlon], (n_ages, n_vars,n_lat,n_lon)) +recon_ens_grid = np.reshape(recon_ens[:,:,:n_varslatlon],(n_ages,n_ens_to_save,n_vars,n_lat,n_lon)) +prior_mean_grid = np.reshape(prior_mean[:,:n_varslatlon], (n_ages, n_vars,n_lat,n_lon)) +prior_ens_grid = np.reshape(prior_ens[:,:,:n_varslatlon],(n_ages,n_ens_to_save,n_vars,n_lat,n_lon)) # Put the proxy reconstructions into separate variables -recon_mean_proxies = recon_mean[:,n_latlonvars:] -recon_ens_proxies = recon_ens[:,:,n_latlonvars:] +recon_mean_proxies = recon_mean[:,n_varslatlon:] +recon_ens_proxies = recon_ens[:,:,n_varslatlon:] # Store the options into as list to save n_options = len(options.keys()) @@ -347,7 +393,9 @@ #%% SAVE THE OUTPUT time_str = str(datetime.datetime.now()).replace(' ','_') -output_filename = 'holocene_recon_'+time_str+'_'+str(options['exp_name']) +if seed_overwrite == 'None': extra_txt = '' +else: extra_txt = '_iter_'+str(int(seed_overwrite)).zfill(2) +output_filename = 'holocene_recon_'+time_str+'_'+season_txt+'_'+str(options['exp_name'])+extra_txt print('Saving the reconstruction as '+output_filename) # Save all data into a netCDF file @@ -370,18 +418,19 @@ output_recon_nh[var_name] = outputfile.createVariable('recon_'+var_name+'_nh_mean', 'f4',('ages','ens',)) output_recon_sh[var_name] = 
outputfile.createVariable('recon_'+var_name+'_sh_mean', 'f4',('ages','ens',)) output_prior_mean[var_name] = outputfile.createVariable('prior_'+var_name+'_mean', 'f4',('ages','lat','lon',)) - #output_prior_ens[var_name] = outputfile.createVariable('prior_'+var_name+'_ens', 'f4',('ages','ens_selected','lat','lon',)) + output_prior_ens[var_name] = outputfile.createVariable('prior_'+var_name+'_ens', 'f4',('ages','ens_selected','lat','lon',)) output_prior_global[var_name] = outputfile.createVariable('prior_'+var_name+'_global_mean','f4',('ages','ens',)) - output_recon_mean[var_name][:] = recon_mean_grid[:,:,:,i] - output_recon_ens[var_name][:] = recon_ens_grid[:,:,:,:,i] + output_recon_mean[var_name][:] = recon_mean_grid[:,i,:,:] + output_recon_ens[var_name][:] = recon_ens_grid[:,:,i,:,:] output_recon_global[var_name][:] = recon_global_all[:,:,i] output_recon_nh[var_name][:] = recon_nh_all[:,:,i] output_recon_sh[var_name][:] = recon_sh_all[:,:,i] - output_prior_mean[var_name][:] = prior_mean_grid[:,:,:,i] - #output_prior_ens[var_name][:] = prior_ens_grid[:,:,:,:,i] + output_prior_mean[var_name][:] = prior_mean_grid[:,i,:,:] + output_prior_ens[var_name][:] = prior_ens_grid[:,:,i,:,:] output_prior_global[var_name][:] = prior_global_all[:,:,i] output_proxyprior_mean = outputfile.createVariable('proxyprior_mean', 'f4',('ages','proxy',)) +output_proxyprior_ens = outputfile.createVariable('proxyprior_ens', 'f4',('ages','ens_selected','proxy',)) output_proxyrecon_mean = outputfile.createVariable('proxyrecon_mean', 'f4',('ages','proxy',)) output_proxyrecon_ens = outputfile.createVariable('proxyrecon_ens', 'f4',('ages','ens_selected','proxy',)) output_ages = outputfile.createVariable('ages', 'f4',('ages',)) @@ -389,13 +438,15 @@ output_lon = outputfile.createVariable('lon', 'f4',('lon',)) output_proxy_vals = outputfile.createVariable('proxy_values', 'f4',('ages','proxy',)) output_proxy_res = outputfile.createVariable('proxy_resolutions', 'f4',('ages','proxy',)) -output_proxy_uncer = outputfile.createVariable('proxy_uncertainty', 'f4',('proxy',)) +if len(proxy_data['uncertainty'].shape) == 1: output_proxy_uncer = outputfile.createVariable('proxy_uncertainty','f4',('proxy',)) +elif len(proxy_data['uncertainty'].shape) == 2: output_proxy_uncer = outputfile.createVariable('proxy_uncertainty','f4',('proxy','ages')) output_metadata = outputfile.createVariable('proxy_metadata', 'str',('proxy','metadata',)) output_options = outputfile.createVariable('options', 'str',('exp_options',)) output_proxies_selected = outputfile.createVariable('proxies_selected', 'i1',('proxy',)) output_proxies_assimilated = outputfile.createVariable('proxies_assimilated','i1',('ages','proxy',)) output_proxyprior_mean[:] = prior_proxy_means +output_proxyprior_ens[:] = prior_proxy_ens output_proxyrecon_mean[:] = recon_mean_proxies output_proxyrecon_ens[:] = recon_ens_proxies output_ages[:] = proxy_data['age_centers'] diff --git a/da_main_code_kmat.py b/da_main_code_kmat.py new file mode 100644 index 0000000..a671418 --- /dev/null +++ b/da_main_code_kmat.py @@ -0,0 +1,489 @@ +#============================================================================== +# This script contains the main code of the Holocene data assimilation. +# Options are set in the config yml file. See README.txt for a more complete +# explanation of the code and setup. 
+# author: Michael Erb +# date : 1/22/2025 +#============================================================================== + +import sys +import numpy as np +import yaml +import time +import datetime +import netCDF4 +import da_utils +import da_utils_lmr +import da_load_models +import da_load_proxies +import da_psms + + +#TODO: Update any instances of 'tas' to be for other variables. + + +#%% SETTINGS + +starttime_total = time.time() # Start timer + +# Use a given config file. If not given, use config_default.yml. +if len(sys.argv) > 1: config_file = sys.argv[1] +else: config_file = 'config_kmat.yml' +if len(sys.argv) > 2: seed_overwrite = sys.argv[2] +else: seed_overwrite = 'None' + +# Load the configuration options and print them to the screen. +print('Using configuration file: '+config_file) +with open(config_file,'r') as file: options = yaml.load(file,Loader=yaml.FullLoader) + +print('=== SETTINGS ===') +for key in options.keys(): + print('%30s: %-15s' % (key,str(options[key]))) +print('=== END SETTINGS ===') + + +#%% LOAD AND PROCESS DATA + +# If the seed has been overwritten using sys.argv, update it here +if seed_overwrite != 'None': + options['seed_for_prior'] = int(seed_overwrite) + options['seed_for_proxy_choice'] = int(seed_overwrite) + +# Load the chosen proxy data +proxy_ts,collection_all = da_load_proxies.load_proxies(options) +proxy_data = da_load_proxies.process_proxies(proxy_ts,collection_all,options) + +# Load the chosen model data +model_data = da_load_models.load_model_data(options) + +# Detrend the model data if selected +model_data = da_load_models.detrend_model_data(model_data,options) + +# Get some dimensions +n_models_in_prior = len(options['models_for_prior']) +n_proxies = proxy_data['values_binned'].shape[0] + +# If the prior is allowed to change through time, remove the mean of the reference period from each model. +if options['reconstruction_type'] == 'relative': + for i in range(n_models_in_prior): + ind_for_model = (model_data['number'] == (i+1)) + ind_ref = (model_data['age'] >= options['reference_period'][0]) & (model_data['age'] < options['reference_period'][1]) & ind_for_model + for var in options['vars_to_reconstruct']: + model_data[var][ind_for_model,:,:,:] = model_data[var][ind_for_model,:,:,:] - np.mean(model_data[var][ind_ref,:,:,:],axis=0) + model_data[var+'_annual'][ind_for_model,:,:] = model_data[var+'_annual'][ind_for_model,:,:] - np.mean(model_data[var+'_annual'][ind_ref,:,:],axis=0) + model_data[var+'_jja'][ind_for_model,:,:] = model_data[var+'_jja'][ind_for_model,:,:] - np.mean(model_data[var+'_jja'][ind_ref,:,:],axis=0) + model_data[var+'_djf'][ind_for_model,:,:] = model_data[var+'_djf'][ind_for_model,:,:] - np.mean(model_data[var+'_djf'][ind_ref,:,:],axis=0) + +# If requested, alter the proxy uncertainty values. 
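# Editor's annotation (not part of the original patch): the 'change_uncertainty'
# option handled below accepts three forms, per the parsing that follows:
#   'mult_2.0'    -> multiply every proxy uncertainty (MSE) by 2.0
#   'all_0.5'     -> set every proxy uncertainty to 0.5
#   '<file path>' -> read per-record values from a comma-separated file of "TSid,MSE" pairs
# The numeric values here are hypothetical examples, not recommended settings.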
+if options['change_uncertainty']: + if options['change_uncertainty'][0:5] == 'mult_': + uncertainty_multiplier = float(options['change_uncertainty'][5:]) + proxy_data['uncertainty'] = proxy_data['uncertainty']*uncertainty_multiplier + print(' --- Processing: All uncertainty values multiplied by '+str(uncertainty_multiplier)+' ---') + elif options['change_uncertainty'][0:4] == 'all_': + prescribed_uncertainty = float(options['change_uncertainty'][4:]) + proxy_data['uncertainty'][:] = prescribed_uncertainty + print(' --- Processing: All uncertainty values set to '+str(prescribed_uncertainty)+' ---') + else: + # If using this option, the text file below should contain TSids and MSE for every proxy record + print(' --- Processing: All uncertainty values set to values from the following file ---') + print(options['change_uncertainty']) + proxy_uncertainties_from_file = np.genfromtxt(options['change_uncertainty'],delimiter=',',dtype='str') + # + for i in range(n_proxies): + index_uncertainty = np.where(proxy_data['metadata'][i,1] == proxy_uncertainties_from_file[:,0])[0] + if len(index_uncertainty) == 0: + print('No prescribed error value in file for proxy '+str(i)+', TSid: '+str(proxy_data['metadata'][i,1])+'. Setting to NaN.') + proxy_data['uncertainty'][i] = np.nan + else: + proxy_data['uncertainty'][i] = proxy_uncertainties_from_file[index_uncertainty,1].astype(float) + +# Use PSMs to get model-based proxy estimates +proxy_estimates_all,_,proxy_data = da_psms.psm_main(model_data,proxy_data,options) + + +#%% SET UNCERTAINTIES +""" +# Temperature proxies which are not in units of degC don't have uncertainty values. +# Here, set this to the standard deviation, as in Hancock et al., in press #TODO: Update this. +n_proxies = proxy_data['values_binned'].shape[0] +for i in range(n_proxies): + units = proxy_data['units'][i] + if units != 'degC': + print(i,units,proxy_data['uncertainty'][i]) + proxy_std = np.nanstd(proxy_data['values_binned'][i,:]) + proxy_data['uncertainty'][i] = proxy_std +""" + + +#%% SET THINGS UP + +# Get more dimensions +n_vars = len(options['vars_to_reconstruct']) +n_ages = proxy_data['values_binned'].shape[1] +n_lat = len(model_data['lat']) +n_lon = len(model_data['lon']) +n_varslatlon = n_vars*n_lat*n_lon +n_state = (n_varslatlon) + n_proxies + +# Determine the total possible number of ensemble members +n_ens_possible = len(da_load_models.get_indices_for_prior(options,model_data,0)) + +# If using less than 100 percent for the ensemble members, randomly choose them here. 
+np.random.seed(seed=options['seed_for_prior']) +n_ens = int(round(n_ens_possible*(options['percent_of_prior']/100))) +ind_to_use = np.random.choice(n_ens_possible,n_ens,replace=False) +ind_to_use = np.sort(ind_to_use) +print(' --- Processing: Choosing '+str(options['percent_of_prior'])+'% of possible prior states, n_ens='+str(n_ens)+' ---') + +# Randomly select the ensemble members to save (max=100) to reduce output filesizes +np.random.seed(seed=0) +n_ens_to_save = min([n_ens,100]) +ind_to_save = np.random.choice(n_ens,n_ens_to_save,replace=False) +ind_to_save = np.sort(ind_to_save) + +# Set up arrays for reconstruction values and more outputs +recon_ens = np.zeros((n_state,n_ens_to_save,n_ages)); recon_ens[:] = np.nan +recon_mean = np.zeros((n_state,n_ages)); recon_mean[:] = np.nan +recon_global_all = np.zeros((n_ages,n_ens,n_vars)); recon_global_all[:] = np.nan +recon_nh_all = np.zeros((n_ages,n_ens,n_vars)); recon_nh_all[:] = np.nan +recon_sh_all = np.zeros((n_ages,n_ens,n_vars)); recon_sh_all[:] = np.nan +prior_ens = np.zeros((n_state,n_ens_to_save,n_ages)); prior_ens[:] = np.nan +prior_mean = np.zeros((n_state,n_ages)); prior_mean[:] = np.nan +prior_global_all = np.zeros((n_ages,n_ens,n_vars)); prior_global_all[:] = np.nan +prior_proxy_means = np.zeros((n_ages,n_proxies)); prior_proxy_means[:] = np.nan +prior_proxy_ens = np.zeros((n_ages,n_ens_to_save,n_proxies)); prior_proxy_ens[:] = np.nan +proxies_to_assimilate_all = np.zeros((n_ages,n_proxies)); proxies_to_assimilate_all[:] = np.nan +kmat_all = np.zeros((n_ages,n_state,n_proxies)); kmat_all[:] = np.nan +cov_over_var_all = np.zeros((n_ages,n_state,n_proxies)); cov_over_var_all[:] = np.nan +cov_over_var_with_locrad_all = np.zeros((n_ages,n_state,n_proxies)); cov_over_var_with_locrad_all[:] = np.nan +varye_all = np.zeros((n_ages,n_proxies)); varye_all[:] = np.nan + + +#%% FIND PROXIES TO ASSIMILATE AND MORE + +print(' === FINDING PROXIES TO ASSIMILATE BASED ON CHOSEN OPTIONS ===') + +# Find proxies with data in selected age range (and reference period, if doing a relative reconstruction) +proxy_ind_with_valid_values = np.isfinite(np.nanmean(proxy_data['values_binned'],axis=1)) +print(' - Number of records with valid values for the chosen experiment: '+str(sum(proxy_ind_with_valid_values))) + +# Find the proxies with uncertainty values +proxy_ind_with_uncertainty = np.isfinite(proxy_data['uncertainty']) +if len(proxy_ind_with_uncertainty.shape) == 2: proxy_ind_with_uncertainty = proxy_ind_with_uncertainty.any(axis=1) +print(' - Number of records with uncertainty values: '+str(sum(proxy_ind_with_uncertainty))) + +# If requested, select only proxies with certain seasonalities +if options['assimilate_selected_seasons'] in ['jja_preferred','djf_preferred']: + proxy_ind_of_seasonality = da_utils.find_seasons(proxy_data,options) +elif options['assimilate_selected_seasons']: + proxy_ind_of_seasonality = np.full((n_proxies),False,dtype=bool) + ind_seasons = [i for i, seasontype in enumerate(proxy_data['metadata'][:,5]) if seasontype.lower() in options['assimilate_selected_seasons']] + proxy_ind_of_seasonality[ind_seasons] = True + print(' - Number of records with seasonalities '+str(options['assimilate_selected_seasons'])+': '+str(sum(proxy_ind_of_seasonality))) +else: + proxy_ind_of_seasonality = np.full((n_proxies),True,dtype=bool) + +# If requested, select only certain archive types +if options['assimilate_selected_archives']: + proxy_ind_of_archive_type = np.full((n_proxies),False,dtype=bool) + ind_archives = [i for i, atype in 
enumerate(proxy_data['archivetype']) if atype in options['assimilate_selected_archives']] + proxy_ind_of_archive_type[ind_archives] = True + print(' - Number of records with archive types '+str(options['assimilate_selected_archives'])+': '+str(sum(proxy_ind_of_archive_type))) +else: + proxy_ind_of_archive_type = np.full((n_proxies),True,dtype=bool) + +# If requested, select the proxies within the specified region +if options['assimilate_selected_region']: + region_lat_min,region_lat_max,region_lon_min,region_lon_max = options['assimilate_selected_region'] + proxy_ind_in_region = (proxy_data['lats'] >= region_lat_min) & (proxy_data['lats'] <= region_lat_max) & (proxy_data['lons'] >= region_lon_min) & (proxy_data['lons'] <= region_lon_max) + print(' - Number of records in region '+str(options['assimilate_selected_region'])+': '+str(sum(proxy_ind_in_region))) +else: + proxy_ind_in_region = np.full((n_proxies),True,dtype=bool) + +# If requested, select the proxies with median resolution within a certain window +if options['assimilate_selected_resolution']: + proxy_med_res = proxy_data['metadata'][:,6].astype(float) + proxy_ind_in_resolution_band = (proxy_med_res >= options['assimilate_selected_resolution'][0]) & (proxy_med_res < options['assimilate_selected_resolution'][1]) + print(' - Number of records with median resolution in the range '+str(options['assimilate_selected_resolution'])+': '+str(sum(proxy_ind_in_resolution_band))) +else: + proxy_ind_in_resolution_band = np.full((n_proxies),True,dtype=bool) + +# Count the selected records so far +proxy_ind_chosen_criteria = proxy_ind_with_valid_values &\ + proxy_ind_with_uncertainty &\ + proxy_ind_of_seasonality &\ + proxy_ind_of_archive_type &\ + proxy_ind_in_region &\ + proxy_ind_in_resolution_band +ind_values_chosen_criteria = np.where(proxy_ind_chosen_criteria)[0] +n_proxies_meeting_criteria = sum(proxy_ind_chosen_criteria) +print(' --- Number of records meeting ALL of the above criteria: '+str(n_proxies_meeting_criteria)) + +# If requested, select the portion of the proxies which are to be assimilated +if options['percent_to_assimilate'] < 100: + print(' - Processing: Choosing only '+str(options['percent_to_assimilate'])+'% of possible proxies') + proxy_ind_selected = np.full((n_proxies),False,dtype=bool) + np.random.seed(seed=options['seed_for_proxy_choice']) + n_proxies_to_choose = int(round(n_proxies_meeting_criteria*(options['percent_to_assimilate']/100))) + proxy_ind_random = np.random.choice(n_proxies_meeting_criteria,n_proxies_to_choose,replace=False) + proxy_ind_random = np.sort(proxy_ind_random) + proxy_ind_selected[ind_values_chosen_criteria[proxy_ind_random]] = True +else: + proxy_ind_selected = proxy_ind_chosen_criteria + +print(' --- Final number of selected records: '+str(sum(proxy_ind_selected))) + +# Calculate the localization matrix (it may not be used) +if options['assimate_together'] == False: + proxy_localization_all = da_utils.loc_matrix(options,model_data,proxy_data) + + +#%% DO DATA ASSIMILATION + +# Loop through every age, doing the data assimilation with a time-varying prior +print(' === Starting data assimilation === ') +age_counter = 0; age = proxy_data['age_centers'][age_counter] +for age_counter,age in enumerate(proxy_data['age_centers']): + # + starttime_loop = time.time() + # + # Get all proxy values and resolutions for the current age + proxy_values_for_age = proxy_data['values_binned'][:,age_counter] + proxy_resolution_for_age = proxy_data['resolution_binned'][:,age_counter] + # + # Get the proxy 
uncertainty for the given age + if len(proxy_data['uncertainty'].shape) == 1: proxy_uncertainties_for_age = proxy_data['uncertainty'] + elif len(proxy_data['uncertainty'].shape) == 2: proxy_uncertainties_for_age = proxy_data['uncertainty'][:,age_counter] + else: print('Proxy uncertainties are an unknown shape: '+str(proxy_data['uncertainty'].shape)) + # + # Get the indices of the prior which will be used for this data assimilation step + indices_for_prior = da_load_models.get_indices_for_prior(options,model_data,age) + model_number_for_prior = model_data['number'][indices_for_prior] + if len(indices_for_prior) != n_ens_possible: print(' !!! Warning: number of prior ages selected does not match n_ens. Age='+str(age)) + # + # Select the season to reconstruct in the prior + season_txt = options['season_to_reconstruct'] + # + # Get the prior values for the variables to reconstruct + for j,var_name in enumerate(options['vars_to_reconstruct']): + var_season_for_prior = model_data[var_name+'_'+season_txt][indices_for_prior,:,:][:,None,:,:] + if j == 0: vars_season_for_prior_all = var_season_for_prior + else: vars_season_for_prior_all = np.concatenate((vars_season_for_prior_all,var_season_for_prior),axis=1) + # + # For each proxy, get the proxy estimates for the correct resolution + model_estimates_for_age = np.zeros((n_ens_possible,n_proxies)); model_estimates_for_age[:] = np.nan + for j in range(n_proxies): + res = proxy_resolution_for_age[j] + if np.isnan(proxy_values_for_age[j]) or np.isnan(res): continue + model_estimates_for_age[:,j] = proxy_estimates_all[j][int(res)][indices_for_prior] + # + # Use only the randomly selected climate states in the prior + vars_season_for_prior_all = vars_season_for_prior_all[ind_to_use,:,:,:] + model_estimates_for_age = model_estimates_for_age[ind_to_use,:] + model_number_for_prior = model_number_for_prior[ind_to_use] + # + # For a relative reconstruction, remove the means of each model separately + if ((options['reconstruction_type'] == 'relative') and (options['prior_mean_always_0'] == True)): + for i in range(n_models_in_prior): + ind_for_model = np.where(model_number_for_prior == (i+1))[0] + vars_season_for_prior_all[ind_for_model,:,:,:] = vars_season_for_prior_all[ind_for_model,:,:,:] - np.mean(vars_season_for_prior_all[ind_for_model,:,:,:],axis=0) + model_estimates_for_age[ind_for_model,:] = model_estimates_for_age[ind_for_model,:] - np.mean(model_estimates_for_age[ind_for_model,:],axis=0) + # + # Make the prior (Xb) + prior = np.reshape(vars_season_for_prior_all,(n_ens,n_varslatlon)) + # + # Append the proxy estimate to the prior, so that proxy estimates are reconstructed too + prior = np.append(prior,model_estimates_for_age,axis=1) + Xb = np.transpose(prior) + # + # Get the mean and selected ensemble values + prior_mean[:,age_counter] = np.mean(Xb,axis=1) + prior_ens[:,:,age_counter] = Xb[:,ind_to_save] + # + # Save the prior estimates of proxies, for analysis later + prior_proxy_means[age_counter,:] = np.mean(model_estimates_for_age,axis=0) + prior_proxy_ens[age_counter,:,:] = model_estimates_for_age[ind_to_save,:] + # + # Select only the proxies which meet the criteria + proxies_to_assimilate = proxy_ind_selected & np.isfinite(proxy_values_for_age) & np.isfinite(proxy_uncertainties_for_age) & np.isfinite(proxy_resolution_for_age) & np.isfinite(prior_proxy_means[age_counter,:]) + # + # Keep a record of which proxies are assimilated + proxies_to_assimilate_all[age_counter,:] = proxies_to_assimilate + # + # If valid proxies are present for this time
step, do the data assimilation + proxy_ind_to_assimilate = np.where(proxies_to_assimilate)[0] + n_proxies_at_age = proxy_ind_to_assimilate.shape[0] + if n_proxies_at_age > 0: + # + proxy_values_selected = proxy_values_for_age[proxy_ind_to_assimilate] + proxy_uncertainty_selected = proxy_uncertainties_for_age[proxy_ind_to_assimilate] + model_estimates_selected = model_estimates_for_age[:,proxy_ind_to_assimilate] + R_diagonal = np.diag(proxy_uncertainty_selected) + # + # Do the DA update, either together or one at a time. + if options['assimate_together']: + Xa,_,_ = da_utils.damup(Xb,np.transpose(model_estimates_selected),R_diagonal,proxy_values_selected) + else: + for proxy in range(n_proxies_at_age): + # + # Get values for proxy + proxy_value = proxy_values_selected[proxy] + proxy_uncertainty = proxy_uncertainty_selected[proxy] + proxy_modelbased_estimates = Xb[n_varslatlon+proxy_ind_to_assimilate[proxy],:] + if options['localization_radius'] != 'None': loc = proxy_localization_all[proxy_ind_to_assimilate[proxy],:] + else: loc = None + # + # Do data assimilation - to calculate the Kalman filter equally, the prior isn't actually updated. + _,kmat,cov_over_var,cov_over_var_with_locrad,varye = da_utils_lmr.enkf_update_array_explore(Xb,proxy_value,proxy_modelbased_estimates,proxy_uncertainty,loc=loc,inflate=None) + if np.isnan(Xb).all(): print(' !!! ERROR. ALL RECONSTRUCTION VALUES SET TO NAN. Age='+str(age)+', proxy number='+str(proxy)+' !!!') + kmat_all[age_counter,:,proxy_ind_to_assimilate[proxy]] = np.squeeze(kmat) + cov_over_var_all[age_counter,:,proxy_ind_to_assimilate[proxy]] = np.squeeze(cov_over_var) + cov_over_var_with_locrad_all[age_counter,:,proxy_ind_to_assimilate[proxy]] = np.squeeze(cov_over_var_with_locrad) + varye_all[age_counter,proxy_ind_to_assimilate[proxy]] = varye + # + # Set the final values + Xa = Xb + # + else: + # No proxies are assimilated + Xa = Xb + # + # Compute the global-mean of the prior + prior_global = da_utils.global_mean(vars_season_for_prior_all,model_data['lat'],2,3) + prior_global_all[age_counter,:,:] = prior_global + # + # Compute the global and hemispheric means of the reconstruction + Xa_latlon = np.reshape(Xa[:n_varslatlon,:],(n_vars,n_lat,n_lon,n_ens)) + recon_global = da_utils.global_mean(Xa_latlon,model_data['lat'],1,2) + recon_global_all[age_counter,:,:] = np.transpose(recon_global) + try: + recon_nh = da_utils.spatial_mean(Xa_latlon,model_data['lat'],model_data['lon'],0,90,0,360,1,2) + recon_nh_all[age_counter,:,:] = np.transpose(recon_nh) + except: + print('Warning: Cannot calculate NH mean in prior. Leaving as NaN.') + try: + recon_sh = da_utils.spatial_mean(Xa_latlon,model_data['lat'],model_data['lon'],-90,0,0,360,1,2) + recon_sh_all[age_counter,:,:] = np.transpose(recon_sh) + except: + print('Warning: Cannot calculate SH mean in prior. Leaving as NaN.') + # + # Get the mean and selected ensemble values + recon_mean[:,age_counter] = np.mean(Xa,axis=1) + recon_ens[:,:,age_counter] = Xa[:,ind_to_save] + # + # Note progression of the reconstruction + print('Time step '+str(age_counter)+'/'+str(len(proxy_data['age_centers'] ))+' complete. 
Time: '+str('%1.2f' % (time.time()-starttime_loop))+' sec') + +# Reshape the data arrays +recon_mean = np.transpose(recon_mean) +recon_ens = np.swapaxes(recon_ens,0,2) +prior_mean = np.transpose(prior_mean) +prior_ens = np.swapaxes(prior_ens,0,2) + +# Reshape the gridded reconstruction to a lat-lon grid +recon_mean_grid = np.reshape(recon_mean[:,:n_varslatlon], (n_ages, n_vars,n_lat,n_lon)) +recon_ens_grid = np.reshape(recon_ens[:,:,:n_varslatlon],(n_ages,n_ens_to_save,n_vars,n_lat,n_lon)) +prior_mean_grid = np.reshape(prior_mean[:,:n_varslatlon], (n_ages, n_vars,n_lat,n_lon)) +prior_ens_grid = np.reshape(prior_ens[:,:,:n_varslatlon],(n_ages,n_ens_to_save,n_vars,n_lat,n_lon)) +kmat_all_grid = np.reshape(kmat_all[:,:n_varslatlon,:], (n_ages,n_vars,n_lat,n_lon,n_proxies)) +cov_over_var_all_grid = np.reshape(cov_over_var_all[:,:n_varslatlon,:], (n_ages,n_vars,n_lat,n_lon,n_proxies)) +cov_over_var_with_locrad_all_grid = np.reshape(cov_over_var_with_locrad_all[:,:n_varslatlon,:],(n_ages,n_vars,n_lat,n_lon,n_proxies)) + +# Put the proxy reconstructions into separate variables +recon_mean_proxies = recon_mean[:,n_varslatlon:] +recon_ens_proxies = recon_ens[:,:,n_varslatlon:] + +# Store the options into a list to save +n_options = len(options.keys()) +options_list = [] +for key,value in options.items(): + options_list.append(key+':'+str(value)) + + +#%% SAVE THE OUTPUT + +time_str = str(datetime.datetime.now()).replace(' ','_') +if seed_overwrite == 'None': extra_txt = '' +else: extra_txt = '_iter_'+str(int(seed_overwrite)).zfill(2) +output_filename = 'holocene_recon_'+time_str+'_'+season_txt+'_'+str(options['exp_name'])+extra_txt +print('Saving the reconstruction as '+output_filename) + +# Save all data into a netCDF file +output_dir = options['data_dir']+'results/' +outputfile = netCDF4.Dataset(output_dir+output_filename+'.nc','w') +outputfile.createDimension('ages', n_ages) +outputfile.createDimension('ens', n_ens) +outputfile.createDimension('ens_selected',n_ens_to_save) +outputfile.createDimension('lat', n_lat) +outputfile.createDimension('lon', n_lon) +outputfile.createDimension('proxy', n_proxies) +outputfile.createDimension('metadata', proxy_data['metadata'].shape[1]) +outputfile.createDimension('exp_options', n_options) + +output_recon_mean,output_recon_ens,output_recon_global,output_recon_nh,output_recon_sh,output_prior_mean,output_prior_ens,output_prior_global = {},{},{},{},{},{},{},{} +output_kmat,output_cov_over_var,output_cov_over_var_with_locrad = {},{},{} +for i,var_name in enumerate(options['vars_to_reconstruct']): + output_recon_mean[var_name] = outputfile.createVariable('recon_'+var_name+'_mean', 'f4',('ages','lat','lon',)) + output_recon_ens[var_name] = outputfile.createVariable('recon_'+var_name+'_ens', 'f4',('ages','ens_selected','lat','lon',)) + output_recon_global[var_name] = outputfile.createVariable('recon_'+var_name+'_global_mean','f4',('ages','ens',)) + output_recon_nh[var_name] = outputfile.createVariable('recon_'+var_name+'_nh_mean', 'f4',('ages','ens',)) + output_recon_sh[var_name] = outputfile.createVariable('recon_'+var_name+'_sh_mean', 'f4',('ages','ens',)) + output_prior_mean[var_name] = outputfile.createVariable('prior_'+var_name+'_mean', 'f4',('ages','lat','lon',)) + output_prior_ens[var_name] = outputfile.createVariable('prior_'+var_name+'_ens', 'f4',('ages','ens_selected','lat','lon',)) + output_prior_global[var_name] = outputfile.createVariable('prior_'+var_name+'_global_mean','f4',('ages','ens',)) + output_kmat[var_name] =
outputfile.createVariable('kmat_'+var_name+'_all', 'f4',('ages','lat','lon','proxy',)) + output_cov_over_var[var_name] = outputfile.createVariable('cov_over_var_'+var_name+'_all', 'f4',('ages','lat','lon','proxy',)) + output_cov_over_var_with_locrad[var_name] = outputfile.createVariable('cov_over_var_with_locrad_'+var_name+'_all','f4',('ages','lat','lon','proxy',)) + output_recon_mean[var_name][:] = recon_mean_grid[:,i,:,:] + output_recon_ens[var_name][:] = recon_ens_grid[:,:,i,:,:] + output_recon_global[var_name][:] = recon_global_all[:,:,i] + output_recon_nh[var_name][:] = recon_nh_all[:,:,i] + output_recon_sh[var_name][:] = recon_sh_all[:,:,i] + output_prior_mean[var_name][:] = prior_mean_grid[:,i,:,:] + output_prior_ens[var_name][:] = prior_ens_grid[:,:,i,:,:] + output_prior_global[var_name][:] = prior_global_all[:,:,i] + output_kmat[var_name][:] = kmat_all_grid[:,i,:,:,:] + output_cov_over_var[var_name][:] = cov_over_var_all_grid[:,i,:,:,:] + output_cov_over_var_with_locrad[var_name][:] = cov_over_var_with_locrad_all_grid[:,i,:,:,:] + +output_proxyprior_mean = outputfile.createVariable('proxyprior_mean', 'f4',('ages','proxy',)) +output_proxyprior_ens = outputfile.createVariable('proxyprior_ens', 'f4',('ages','ens_selected','proxy',)) +output_proxyrecon_mean = outputfile.createVariable('proxyrecon_mean', 'f4',('ages','proxy',)) +output_proxyrecon_ens = outputfile.createVariable('proxyrecon_ens', 'f4',('ages','ens_selected','proxy',)) +output_ages = outputfile.createVariable('ages', 'f4',('ages',)) +output_lat = outputfile.createVariable('lat', 'f4',('lat',)) +output_lon = outputfile.createVariable('lon', 'f4',('lon',)) +output_proxy_vals = outputfile.createVariable('proxy_values', 'f4',('ages','proxy',)) +output_proxy_res = outputfile.createVariable('proxy_resolutions', 'f4',('ages','proxy',)) +if len(proxy_data['uncertainty'].shape) == 1: output_proxy_uncer = outputfile.createVariable('proxy_uncertainty','f4',('proxy',)) +elif len(proxy_data['uncertainty'].shape) == 2: output_proxy_uncer = outputfile.createVariable('proxy_uncertainty','f4',('proxy','ages')) +output_metadata = outputfile.createVariable('proxy_metadata', 'str',('proxy','metadata',)) +output_options = outputfile.createVariable('options', 'str',('exp_options',)) +output_proxies_selected = outputfile.createVariable('proxies_selected', 'i1',('proxy',)) +output_proxies_assimilated = outputfile.createVariable('proxies_assimilated','i1',('ages','proxy',)) +output_varye_all = outputfile.createVariable('proxy_variance', 'f4',('ages','proxy',)) + +output_proxyprior_mean[:] = prior_proxy_means +output_proxyprior_ens[:] = prior_proxy_ens +output_proxyrecon_mean[:] = recon_mean_proxies +output_proxyrecon_ens[:] = recon_ens_proxies +output_ages[:] = proxy_data['age_centers'] +output_lat[:] = model_data['lat'] +output_lon[:] = model_data['lon'] +output_proxy_vals[:] = np.transpose(proxy_data['values_binned']) +output_proxy_res[:] = np.transpose(proxy_data['resolution_binned']) +output_proxy_uncer[:] = proxy_data['uncertainty'] +output_metadata[:] = proxy_data['metadata'] +output_options[:] = np.array(options_list) +output_proxies_selected[:] = proxy_ind_selected.astype(int) +output_proxies_assimilated[:] = proxies_to_assimilate_all.astype(int) +output_varye_all[:] = varye_all + +outputfile.title = 'Holocene climate reconstruction' +outputfile.close() + +endtime_total = time.time() # End timer +print('Total time: '+str('%1.2f' % ((endtime_total-starttime_total)/60))+' minutes') +print(' === Reconstruction complete ===') + diff --git 
a/da_psms.py b/da_psms.py index a2dbfa6..f84bbe3 100644 --- a/da_psms.py +++ b/da_psms.py @@ -1,45 +1,64 @@ #============================================================================== # Different PSMs for use in the Holocene DA project. # author: Michael P. Erb -# date : 3/16/2022 +# date : 1/14/2025 #============================================================================== import numpy as np +from scipy.stats import rankdata # Use PSMs to get model-based proxy estimates def psm_main(model_data,proxy_data,options): # n_proxies = proxy_data['values_binned'].shape[0] proxy_estimates_all = np.array([dict() for k in range(n_proxies)]) # HXb + i = 210 for i in range(n_proxies): # # Set PSMs requirements psm_requirements = {} - psm_requirements['get_tas'] = {'units':'degC'} + psm_requirements['calibrated_tas'] = {'interp':['temperature'],'units':['degc']} + psm_requirements['calibrated_precip'] = {'interp':['p','precipitation'],'units':['mm/day']} # Other possible precip units: 'mm/a','mm/yr' + psm_requirements['rank_based_tas'] = {'interp':['temperature']} + #psm_requirements['get_p_e'] = {'units':'mm/a','interp':'P-E'} #TODO: Update this. # # Set the PSMs to use - psm_types_to_use = ['get_tas'] + #psms_to_use = ['calibrated_tas','calibrated_precip','rank_based_tas'] + #psms_to_use = ['calibrated_tas','calibrated_precip'] + psms_to_use = options['psms_to_use'] + psm_if_no_match = 'use_nans' # - # The code will use the first PSM in the list above that meets the requirements + # The code will use the first PSM in the list above that meets the requirements psm_selected = None - for psm_type in psm_types_to_use: + for psm_type in psms_to_use: psm_keys = list(psm_requirements[psm_type].keys()) psm_check = np.full(len(psm_keys),False,dtype=bool) for counter,psm_key in enumerate(psm_keys): - psm_check[counter] = (proxy_data[psm_key][i] == psm_requirements[psm_type][psm_key]) + #psm_check[counter] = (proxy_data[psm_key][i].lower() == psm_requirements[psm_type][psm_key]) + psm_check[counter] = (proxy_data[psm_key][i].lower() in psm_requirements[psm_type][psm_key]) # if psm_check.all() == True: psm_selected = psm_type; break # if psm_selected == None: - print('WARNING: No PSM found. Using NaNs.') - psm_selected = 'use_nans' + #print('WARNING: No PSM found. 
Using PSM:',psm_if_no_match) + psm_selected = psm_if_no_match # - print('Proxy',i,'PSM selected:',psm_selected,'|',proxy_data['archivetype'][i],proxy_data['proxytype'][i],proxy_data['units'][i]) + print('Proxy',i,'PSM selected:',psm_selected,'|',proxy_data['archivetype'][i],proxy_data['proxytype'][i],proxy_data['interp'][i],proxy_data['units'][i]) # # Calculate the model-based proxy estimate depending on the PSM (or variable to compare, it the proxy is already calibrated) - if psm_selected == 'get_tas': proxy_estimate = get_model_values(model_data,proxy_data,'tas',i) - elif psm_selected == 'use_nans': proxy_estimate = use_nans(model_data) - else: proxy_estimate = use_nans(model_data) + # Model values are in units of degree C (for tas) and mm/day (for precip) + if psm_selected == 'calibrated_tas': proxy_estimate = get_model_values(model_data,proxy_data,'tas',i) + elif psm_selected == 'calibrated_precip': proxy_estimate = get_model_values(model_data,proxy_data,'precip',i) + elif psm_selected == 'rank_based_tas': + proxy_estimate,proxy_update = rank_based(model_data,proxy_data,'tas',i,options) + proxy_data['values_binned'][i,:] = proxy_update + proxy_data['metadata'][i,8] = proxy_data['metadata'][i,8]+'_percentile' + proxy_data['units'][i] = proxy_data['units'][i]+'_percentile' + elif psm_selected == 'use_nans': proxy_estimate = use_nans(model_data) + else: proxy_estimate = use_nans(model_data) + # + # If the proxy units are mm/a, convert the model-based estimates from mm/day to mm/year + if proxy_data['units'][i] == 'mm/a': proxy_estimate = proxy_estimate*365.25 #TODO: Is there a better way to account for leap years in these decadal means? # # Find all time resolutions in the record proxy_res_12ka_unique = np.unique(proxy_data['resolution_binned'][i,:]) @@ -52,19 +71,55 @@ def psm_main(model_data,proxy_data,options): proxy_estimates_all[i][res] = proxy_estimate_nyear_mean # print('Finished preprocessing proxies and making model-based proxy estimates.') - return proxy_estimates_all,proxy_estimate + return proxy_estimates_all,proxy_estimate,proxy_data # A function to get the model values at the same location and seasonality as the proxy def get_model_values(model_data,proxy_data,var_name,i,verbose=False): # - tas_model = model_data[var_name] + var_model = model_data[var_name] + lat_model = model_data['lat'] + lon_model = model_data['lon'] + ndays_model = model_data['time_ndays'] + proxy_lat = proxy_data['lats'][i] + proxy_lon = proxy_data['lons'][i] + proxy_season = proxy_data['seasonality_array'][i] + # + # Find the model gridpoint closest to the proxy location + if proxy_lon < 0: proxy_lon = proxy_lon+360 + lon_model_wrapped = np.append(lon_model,lon_model[0]+360) + j_selected = np.argmin(np.abs(lat_model-proxy_lat)) + i_selected = np.argmin(np.abs(lon_model_wrapped-proxy_lon)) + if np.abs(proxy_lat-lat_model[j_selected]) > 2: print('WARNING: Too large of a lat difference. Proxy lat: '+str(proxy_lat)+', model lat: '+str(lat_model[j_selected])) + if np.abs(proxy_lon-lon_model_wrapped[i_selected]) > 2: print('WARNING: Too large of a lon difference. Proxy lon: '+str(proxy_lon)+', model lon: '+str(lon_model_wrapped[i_selected])) + if i_selected == len(lon_model_wrapped)-1: i_selected = 0 + if verbose: print('Proxy location vs. nearest model gridpoint. Lat: '+str(proxy_lat)+', '+str(lat_model[j_selected])+'. 
Lon: '+str(proxy_lon)+', '+str(lon_model[i_selected])) + var_model_location = var_model[:,:,j_selected,i_selected] + # + # Compute an average over months according to the proxy seasonality + # Note: months are always taken from the current year, not from the previous year + proxy_seasonality_indices = np.abs(proxy_season)-1 + proxy_seasonality_indices[proxy_seasonality_indices > 11] = proxy_seasonality_indices[proxy_seasonality_indices > 11] - 12 + var_model_location_season = np.average(var_model_location[:,proxy_seasonality_indices],weights=ndays_model[:,proxy_seasonality_indices],axis=1) + # + return var_model_location_season + + +# A function to do rank-based comparison with temperature data +#var_name,verbose = 'tas',False +def rank_based(model_data,proxy_data,var_name,i,options,verbose=False): + # + var_model = model_data[var_name] lat_model = model_data['lat'] lon_model = model_data['lon'] ndays_model = model_data['time_ndays'] + age_model = model_data['age'] proxy_lat = proxy_data['lats'][i] proxy_lon = proxy_data['lons'][i] + proxy_ages = proxy_data['age_centers'] proxy_season = proxy_data['seasonality_array'][i] + proxy_values = proxy_data['values_binned'][i] + proxy_direction = proxy_data['direction'][i] # # Find the model gridpoint closest to the proxy location if proxy_lon < 0: proxy_lon = proxy_lon+360 @@ -75,15 +130,57 @@ def get_model_values(model_data,proxy_data,var_name,i,verbose=False): if np.abs(proxy_lon-lon_model_wrapped[i_selected]) > 2: print('WARNING: Too large of a lon difference. Proxy lon: '+str(proxy_lon)+', model lon: '+str(lon_model_wrapped[i_selected])) if i_selected == len(lon_model_wrapped)-1: i_selected = 0 if verbose: print('Proxy location vs. nearest model gridpoint. Lat: '+str(proxy_lat)+', '+str(lat_model[j_selected])+'. Lon: '+str(proxy_lon)+', '+str(lon_model[i_selected])) - tas_model_location = tas_model[:,:,j_selected,i_selected] + var_model_location = var_model[:,:,j_selected,i_selected] # # Compute an average over months according to the proxy seasonality # Note: months are always taken from the current year, not from the previous year proxy_seasonality_indices = np.abs(proxy_season)-1 proxy_seasonality_indices[proxy_seasonality_indices > 11] = proxy_seasonality_indices[proxy_seasonality_indices > 11] - 12 - tas_model_location_season = np.average(tas_model_location[:,proxy_seasonality_indices],weights=ndays_model[:,proxy_seasonality_indices],axis=1) + var_model_location_season = np.average(var_model_location[:,proxy_seasonality_indices],weights=ndays_model[:,proxy_seasonality_indices],axis=1) + if proxy_direction.lower() == 'negative': var_model_location_season = -1*var_model_location_season # Note: If interpretation direction is not given, it is assumed to be positive. + # + #TODO: Conceptually, does this work if the proxy and prior resolutions are different? 
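Editor's aside (not part of the patch): a small runnable illustration of the rank-to-percentile transform that the rank_based PSM applies below, using made-up values and the same rankdata-based formula as the code.

import numpy as np
from scipy.stats import rankdata

proxy_values_valid = np.array([3.1, 2.4, 5.0, 4.2])   # hypothetical proxy values
ranks = rankdata(proxy_values_valid) - 1              # 0-based ranks: [1. 0. 3. 2.]
percentiles = (ranks / max(ranks)) * 100              # [ 33.3   0.  100.   66.7]
print(percentiles)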
+ # + # Get the model values corresponding to the valid data of the proxy + ind_proxy_valid = np.isfinite(proxy_values) + proxy_values_valid = proxy_values[ind_proxy_valid] + proxy_ages_valid = proxy_ages[ind_proxy_valid] + proxy_ages_valid_start = proxy_ages_valid[0] - (options['time_resolution']/2) + proxy_ages_valid_end = proxy_ages_valid[-1] + (options['time_resolution']/2) + ind_model_selected = np.where((age_model >= proxy_ages_valid_start) & (age_model <= proxy_ages_valid_end))[0] + #var_model_location_season_selected = var_model_location_season[ind_model_selected] + # + # Calculate ranks for the proxy and prior data + ranks_proxy_values = rankdata(proxy_values_valid) - 1 + ranks_prior_values = rankdata(var_model_location_season) - 1 + percentile_proxy_values = (ranks_proxy_values / max(ranks_proxy_values)) * 100 + percentile_prior_values = ((ranks_prior_values-min(ranks_prior_values[ind_model_selected])) / (max(ranks_prior_values[ind_model_selected])-min(ranks_prior_values[ind_model_selected]))) * 100 + # + # Put the proxy data into the same length array as it was originally + percentile_proxy_values_all = np.zeros((len(proxy_values))); percentile_proxy_values_all[:] = np.nan + percentile_proxy_values_all[ind_proxy_valid] = percentile_proxy_values + # + # Remove the reference period from the proxy and prior values #TODO: Check that this is a good solution. + ind_ref_proxy = np.where((proxy_ages >= options['reference_period'][0]) & (proxy_ages < options['reference_period'][1]))[0] + ind_ref_prior = np.where((age_model >= options['reference_period'][0]) & (age_model < options['reference_period'][1]))[0] + percentile_proxy_values_all = percentile_proxy_values_all - np.nanmean(percentile_proxy_values_all[ind_ref_proxy]) + percentile_prior_values = percentile_prior_values - np.mean(percentile_prior_values[ind_ref_prior]) + if np.isnan(percentile_proxy_values_all[ind_ref_proxy]).all(): print('No data in reference period, index: '+str(i)) + # + """ + import matplotlib.pyplot as plt + plt.plot(proxy_ages,proxy_values,'k-') + plt.plot(age_model,var_model_location_season,'b-') + plt.show() + # + plt.plot(proxy_ages,percentile_proxy_values_all,'k-') + plt.plot(age_model,percentile_prior_values,'b-') + plt.show() + # + plt.scatter(proxy_values_valid,ranks_proxy_values) + """ # - return tas_model_location_season + return percentile_prior_values,percentile_proxy_values_all # A function to get the NaNs with the same length as the model data diff --git a/da_utils.py b/da_utils.py index 7ca0e3a..b1fb805 100644 --- a/da_utils.py +++ b/da_utils.py @@ -30,7 +30,59 @@ def damup(Xb,HXb,R,y): # Xam = analysis mean [n_state] # # Number of ensemble members - nens = Xb.shape[1] + nens = Xb.shape[1] + # + # Decompose Xb and HXb into mean and perturbations (for Eqs. 4 & Sec 3) + Xbm = np.mean(Xb,axis=1) + Xbp = Xb - Xbm[:,None] + # + HXbm = np.mean(HXb,axis=1) + HXbp = HXb - HXbm[:,None] + # + # Kalman gain for mean and matrix covariances (Eq. 2) + PbHT = np.dot(Xbp, np.transpose(HXbp))/(nens-1) + HPbHTR = np.dot(HXbp,np.transpose(HXbp))/(nens-1)+R + K = np.dot(PbHT,np.linalg.inv(HPbHTR)) + # + # Kalman gain for the perturbations (Eq. 10) + sHPbHTR = sqrtm(HPbHTR) + sR = sqrtm(R) + Ktn = np.dot(PbHT,np.transpose(np.linalg.inv(sHPbHTR))) + Ktd = np.linalg.inv(sHPbHTR+sR) + Kt = np.dot(Ktn,Ktd) + # + # Update mean and perturbations (Eq. 
4 & Sec 3) + Xam = Xbm + np.dot(K,(y-HXbm)) + Xap = Xbp - np.dot(Kt,HXbp) + # + # Reconstitute the full ensemble state vector + Xa = Xap + Xam[:,None] + # + # Output the full ensemble, the ensemble mean, and the Kalman gain + return Xa,Xam,K + + +# A function to do the data assimilation. It is based on '2_darecon.jl', +# originally written by Nathan Steiger. +#Xb,HXb,R,y = Xb,np.transpose(model_estimates_selected),R_diagonal,proxy_values_selected +def damup_explore(Xb,HXb,R,y): + # + # Data assimilation matrix update step, assimilating all observations + # for a given time step at once. Variables with their dimensions are + # indicated by [dim1 dim2] given below. This set of update equations + # follow those from Whitaker and Hamill 2002: Eq. 2, 4, 10, & Sec 3. + # ARGUMENTS: + # Xb = background (prior) [n_state, n_ens] + # HXb = model estimate of observations H(Xb) [n_proxies_valid, n_ens] + # y = observation (with implied noise) [n_proxies_valid] + # R = diagonal observation error variance matrix [n_proxies_valid, n_proxies_valid] + # infl = inflation factor [scalar] **Note: modify code to include** + # RETURNS: + # Xa = analysis (posterior) [n_state, n_ens] + # Xam = analysis mean [n_state] + # + # Number of ensemble members + nens = Xb.shape[1] # # Decompose Xb and HXb into mean and perturbations (for Eqs. 4 & Sec 3) Xbm = np.mean(Xb,axis=1) @@ -72,6 +124,7 @@ def interpret_seasonality(seasonality_txt,lat,unknown_option): elif (str(seasonality_txt).lower() == 'tann; 2'): seasonality = '1 2 3 4 5 6 7 8 9 10 11 12' elif (str(seasonality_txt).lower() == 'year'): seasonality = '1 2 3 4 5 6 7 8 9 10 11 12' elif (str(seasonality_txt).lower() == '1 2 3 4 5 6 7 8 9 10 11 12'): seasonality = '1 2 3 4 5 6 7 8 9 10 11 12' + elif (str(seasonality_txt).lower() == '1,2,3,4,5,6,7,8,9,10,11,12'): seasonality = '1 2 3 4 5 6 7 8 9 10 11 12' elif (str(seasonality_txt).lower() == 'subannual'): seasonality = '1 2 3 4 5 6 7 8 9 10 11 12' elif (str(seasonality_txt).lower() == 'nan'): seasonality = '1 2 3 4 5 6 7 8 9 10 11 12' elif (str(seasonality_txt).lower() == 'not specified'): seasonality = '1 2 3 4 5 6 7 8 9 10 11 12' @@ -226,6 +279,52 @@ def interpret_seasonality(seasonality_txt,lat,unknown_option): return seasonality +# A function to find the proxy records which match the requested seasonality criteria +def find_seasons(proxy_data,options): + # + # Initialize an array as True everywhere + n_proxies = proxy_data['values_binned'].shape[0] + proxy_ind_of_seasonality = np.full((n_proxies),True,dtype=bool) + # + # Get indices of all proxies with summer+ or winter+ seasonality + ind_seasonplus = [i for i, seasontype in enumerate(proxy_data['metadata'][:,5]) if seasontype in ['summer+','winter+']] + # + # For all records with summer+ or winter+ seasonality, check which proxies share the same + # dataSetName, archive type, and proxy type. Mark the ones I don't want as False in the proxy_ind variable.
@@ -72,6 +124,7 @@ def interpret_seasonality(seasonality_txt,lat,unknown_option):
     elif (str(seasonality_txt).lower() == 'tann; 2'): seasonality = '1 2 3 4 5 6 7 8 9 10 11 12'
     elif (str(seasonality_txt).lower() == 'year'): seasonality = '1 2 3 4 5 6 7 8 9 10 11 12'
     elif (str(seasonality_txt).lower() == '1 2 3 4 5 6 7 8 9 10 11 12'): seasonality = '1 2 3 4 5 6 7 8 9 10 11 12'
+    elif (str(seasonality_txt).lower() == '1,2,3,4,5,6,7,8,9,10,11,12'): seasonality = '1 2 3 4 5 6 7 8 9 10 11 12'
     elif (str(seasonality_txt).lower() == 'subannual'): seasonality = '1 2 3 4 5 6 7 8 9 10 11 12'
     elif (str(seasonality_txt).lower() == 'nan'): seasonality = '1 2 3 4 5 6 7 8 9 10 11 12'
     elif (str(seasonality_txt).lower() == 'not specified'): seasonality = '1 2 3 4 5 6 7 8 9 10 11 12'
@@ -226,6 +279,52 @@ def interpret_seasonality(seasonality_txt,lat,unknown_option):
     return seasonality
 
 
+# A function to find the proxy records which match the requested seasonality criteria
+def find_seasons(proxy_data,options):
+    #
+    # Initialize an array as True everywhere
+    n_proxies = proxy_data['values_binned'].shape[0]
+    proxy_ind_of_seasonality = np.full((n_proxies),True,dtype=bool)
+    #
+    # Get indices of all proxies with summer+ or winter+ seasonality
+    ind_seasonplus = [i for i, seasontype in enumerate(proxy_data['metadata'][:,5]) if seasontype in ['summer+','winter+']]
+    #
+    # For all records with summer+ or winter+ seasonality, check which proxies share the same
+    # dataSetName, archive type, and proxy type. Mark the ones I don't want as False in the proxy_ind variable.
+    for proxy_ind in ind_seasonplus:
+        #
+        # Find proxies with matching dataSetName, archivetype, and proxytype
+        ind_matching = np.where((proxy_data['metadata'][:,0] == proxy_data['metadata'][proxy_ind,0]) &
+                                (proxy_data['archivetype'] == proxy_data['archivetype'][proxy_ind]) &
+                                (proxy_data['proxytype'] == proxy_data['proxytype'][proxy_ind]))[0]
+        #
+        # Get indices of summer+, annual, and winter+ records
+        ind_summerplus = [ind for ind in ind_matching if proxy_data['metadata'][ind,5] == 'summer+']
+        ind_annual = [ind for ind in ind_matching if proxy_data['metadata'][ind,5] == 'annual']
+        ind_winterplus = [ind for ind in ind_matching if proxy_data['metadata'][ind,5] == 'winter+']
+        #
+        # Of the proxies found above, select the one which best matches the requested seasonality
+        if (((options['assimilate_selected_seasons'] == 'jja_preferred') & (proxy_data['lats'][proxy_ind] >= 0)) or ((options['assimilate_selected_seasons'] == 'djf_preferred') & (proxy_data['lats'][proxy_ind] < 0))):
+            #
+            # Prioritize summer, then annual, then winter
+            if len(ind_summerplus) > 0:
+                proxy_ind_of_seasonality[ind_annual] = False
+                proxy_ind_of_seasonality[ind_winterplus] = False
+            elif len(ind_annual) > 0:
+                proxy_ind_of_seasonality[ind_winterplus] = False
+        #
+        elif (((options['assimilate_selected_seasons'] == 'djf_preferred') & (proxy_data['lats'][proxy_ind] >= 0)) or ((options['assimilate_selected_seasons'] == 'jja_preferred') & (proxy_data['lats'][proxy_ind] < 0))):
+            #
+            # Prioritize winter, then annual, then summer
+            if len(ind_winterplus) > 0:
+                proxy_ind_of_seasonality[ind_annual] = False
+                proxy_ind_of_seasonality[ind_summerplus] = False
+            elif len(ind_annual) > 0:
+                proxy_ind_of_seasonality[ind_summerplus] = False
+    #
+    return proxy_ind_of_seasonality
+
+
 # A function to regrid an age-month-lat-lon array to a standardized grid
 def regrid_model(var,lat,lon,age,regrid_method='conservative_normed',make_figures=False):
     #
diff --git a/da_utils_lmr.py b/da_utils_lmr.py
index 72bed0b..a3484b4 100644
--- a/da_utils_lmr.py
+++ b/da_utils_lmr.py
@@ -14,6 +14,82 @@ import numpy as np
 
 
+"""
+value1_2d = np.reshape(np.squeeze(cov_over_var)[:n_varslatlon], (n_lat,n_lon))
+value2_2d = np.reshape(np.squeeze(cov_over_var_with_locrad)[:n_varslatlon],(n_lat,n_lon))
+value3_2d = np.reshape(np.squeeze(kmat)[:n_varslatlon], (n_lat,n_lon))
+"""
+
+#Xb, obvalue, Ye, ob_err, inflate = Xb,proxy_value,proxy_modelbased_estimates,proxy_uncertainty,None
+def enkf_update_array_explore(Xb, obvalue, Ye, ob_err, loc=None, inflate=None):
+    # Get ensemble size from passed array: Xb has dims [state vect.,ens. members]
+    Nens = Xb.shape[1]
+
+    # ensemble mean background and perturbations
+    xbm = np.mean(Xb,axis=1)
+    Xbp = np.subtract(Xb,xbm[:,None])  # "None" means replicate in this dimension
+
+    # ensemble mean and variance of the background estimate of the proxy
+    mye = np.mean(Ye)
+    varye = np.var(Ye,ddof=1)
+
+    # lowercase ye has ensemble-mean removed
+    ye = np.subtract(Ye, mye)
+
+    # innovation
+    try:
+        innov = obvalue - mye
+    except:
+        print('innovation error. obvalue = ' + str(obvalue) + ' mye = ' + str(mye))
+        print('returning Xb unchanged...')
+        return Xb
+
+    # innovation variance (denominator of serial Kalman gain)
+    kdenom = (varye + ob_err)
+
+    # numerator of serial Kalman gain (cov(x,Hx))
+    kcov = np.dot(Xbp,np.transpose(ye)) / (Nens-1)
+
+    # Option to inflate the covariances by a certain factor
+    #if inflate is not None:
+    #    kcov = inflate * kcov  # This implementation is not correct. To be revised later.
+    cov_over_var = np.divide(kcov, varye)
+
+    # Option to localize the gain
+    if loc is not None:
+        kcov = np.multiply(kcov,loc)
+
+    # Kalman gain
+    kmat_to_save = np.divide(kcov, kdenom)
+    cov_over_var_with_locrad = np.divide(kcov, varye)
+
+    # update ensemble mean
+    xam = xbm + np.multiply(kmat_to_save,innov)
+
+    # update the ensemble members using the square-root approach
+    beta = 1./(1. + np.sqrt(ob_err/(varye+ob_err)))
+    kmat = np.multiply(beta,kmat_to_save)
+    ye = np.array(ye)[np.newaxis]
+    kmat = np.array(kmat)[np.newaxis]
+    Xap = Xbp - np.dot(kmat.T, ye)
+
+    # full state
+    Xa = np.add(xam[:,None], Xap)
+
+    # if masked array, making sure that fill_value = nan in the new array
+    if np.ma.isMaskedArray(Xa): np.ma.set_fill_value(Xa, np.nan)
+
+    # Reshape and check variables
+    cov_over_var = np.array(cov_over_var)[np.newaxis]
+    cov_over_var_with_locrad = np.array(cov_over_var_with_locrad)[np.newaxis]
+    kmat_to_save = np.array(kmat_to_save)[np.newaxis]
+    #print(Xa.shape,kmat.shape,cov_over_var.shape,cov_over_var_with_locrad.shape)
+
+    # Return the full state
+    return Xa,kmat_to_save,cov_over_var,cov_over_var_with_locrad,varye
+
+
 def enkf_update_array(Xb, obvalue, Ye, ob_err, loc=None, inflate=None):
     """
     Function to do the ensemble square-root filter (EnSRF) update
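The function above performs a serial (one-observation-at-a-time) square-root update. A hedged sketch of how it might be looped over proxies, with toy data and an assumed mapping from proxies to state rows; each call returns an updated ensemble that becomes the prior for the next observation:

    import numpy as np
    import da_utils_lmr

    n_state, n_ens = 100, 20
    Xa = np.random.randn(n_state, n_ens)     # start from the prior ensemble
    proxy_values = [0.4, -1.2]               # two toy proxy observations
    proxy_errors = [0.5, 0.5]                # their error variances
    proxy_rows   = [3, 57]                   # assumed state rows serving as estimates of each proxy

    for obvalue, ob_err, row in zip(proxy_values, proxy_errors, proxy_rows):
        Ye = Xa[row, :]                      # ensemble estimate of this proxy
        Xa, kmat, cov1, cov2, varye = da_utils_lmr.enkf_update_array_explore(Xa, obvalue, Ye, ob_err)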
diff --git a/process_output/make_combined_file_iters.py b/process_output/make_combined_file_iters.py
new file mode 100644
index 0000000..f1e6c30
--- /dev/null
+++ b/process_output/make_combined_file_iters.py
@@ -0,0 +1,41 @@
+#==============================================================================
+# Load the multiple iterations from the same DA experiment and combine the
+# results into a single file.
+# author: Michael Erb
+# date  : 1/15/2025
+#==============================================================================
+
+import sys
+import numpy as np
+import xarray as xr
+import glob
+
+exp_txt = sys.argv[1]
+
+
+#%% LOAD DATA
+
+recon_dir = '/projects/pd_lab/data/data_assimilation/results/'
+filenames_all = glob.glob(recon_dir+'*'+exp_txt+'_iter*')
+filenames_all = np.sort(filenames_all)
+n_iters = len(filenames_all)
+print('Processing '+exp_txt+'. files found:',n_iters)
+
+
+#%% LOAD AND CONCATENATE DATA
+
+# Load the data files
+data_xarray_all = []
+for i in range(n_iters):
+    data_xarray = xr.open_dataset(filenames_all[i])
+    data_xarray_all.append(data_xarray)
+
+data_xarray_combined = xr.concat(data_xarray_all,dim='iters')
+
+
+#%% SAVE OUTPUT
+
+iter_first_txt = filenames_all[0].split('_')[-1].split('.')[0]
+iter_last_txt = filenames_all[-1].split('_')[-1].split('.')[0]
+output_filename = '_'.join(filenames_all[-1].split('_')[:-2]) + '_combined_iters_'+iter_first_txt+'-'+iter_last_txt+'.nc'
+data_xarray_combined.to_netcdf(output_filename)
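The output filename in the script above is assembled by splitting the iteration filenames on underscores. A quick illustration with a hypothetical filename (the real names depend on how the experiments were saved):

    name = 'holocene_recon_locrad_5k_iter_07.nc'        # hypothetical example filename
    iter_txt = name.split('_')[-1].split('.')[0]        # '07'
    prefix = '_'.join(name.split('_')[:-2])             # 'holocene_recon_locrad_5k'
    print(prefix + '_combined_iters_00-' + iter_txt + '.nc')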
diff --git a/process_output/more/workinprogress_make_combined_file_iters.py b/process_output/more/workinprogress_make_combined_file_iters.py
new file mode 100644
index 0000000..5e7fdb0
--- /dev/null
+++ b/process_output/more/workinprogress_make_combined_file_iters.py
@@ -0,0 +1,67 @@
+#==============================================================================
+# Load the multiple iterations from the same DA experiment and combine the
+# results into a single file.
+# author: Michael Erb
+# date  : 1/14/2025
+#==============================================================================
+
+import sys
+import numpy as np
+import xarray as xr
+import glob
+
+#exp_txt = 'locrad_none'
+#exp_txt = 'locrad_10k'
+exp_txt = 'locrad_5k'
+#exp_txt = 'locrad_5k_plus_rankbased'
+#exp_txt = 'locrad_4k'
+#exp_txt = 'locrad_3k'
+#exp_txt = 'locrad_1k'
+#exp_txt = sys.argv[1]
+
+recon_dir = '/projects/pd_lab/data/data_assimilation/results/'
+filenames_all = glob.glob(recon_dir+'*'+exp_txt+'_iter*')
+filenames_all = np.sort(filenames_all)
+n_iters = len(filenames_all)
+print('Files found:',n_iters)
+
+
+#%% LOAD AND CONCATENATE DATA
+
+data_xarray_all = []
+
+# Load the Holocene reconstruction
+for i in range(n_iters):
+    data_xarray = xr.open_dataset(filenames_all[i])
+    data_xarray_all.append(data_xarray)
+
+data_xarray_combined = xr.concat(data_xarray_all,dim='iters')
+
+
+#%% LOAD THE FIRST FILE AND MAKE A FIRST DIMENSION TO HOLD ALL DATA
+
+# Load the Holocene reconstruction
+data_xarray_first = xr.open_dataset(filenames_all[0])
+data_xarray_all = data_xarray_first.expand_dims(dim={'iter':10},axis=0)
+var_list = list(data_xarray_first.keys())
+
+
+#%% CHECK THE DIMENSIONS
+
+for var_txt in var_list:
+    print(var_txt,data_xarray_first[var_txt].shape,data_xarray_all[var_txt].shape)
+
+
+#%% ADD DATA FROM OTHER FILES
+
+file_num = 1
+var_txt = 'recon_tas_mean'
+for file_num in range(1,n_iters):
+    print('Loading file',file_num)
+    data_xarray_new = xr.open_dataset(filenames_all[file_num])
+    for var_txt in var_list:
+        n_dim = len(data_xarray_new[var_txt].shape)
+        data_xarray_all[var_txt][file_num,:,:,:].values = data_xarray_new[var_txt].values  #TODO: Why isn't this working?
+        print(data_xarray_all[var_txt][file_num,:3,:3,:3].values)
+        print(data_xarray_new[var_txt][:3,:3,:3].values)
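Regarding the TODO in the work-in-progress script above: the assignment likely fails because expand_dims returns a read-only broadcast view, and because indexing a DataArray and then setting its .values does not write back into the parent Dataset. One possible fix, shown here as an untested sketch reusing the variables defined earlier in that script, is to deep-copy the expanded Dataset and assign into the underlying numpy arrays directly (the finished make_combined_file_iters.py sidesteps the issue by using xr.concat):

    # Make the expanded Dataset writable, then fill it in place (assumed fix, untested)
    data_xarray_all = data_xarray_first.expand_dims(dim={'iter': n_iters}, axis=0).copy(deep=True)
    for file_num in range(1, n_iters):
        data_xarray_new = xr.open_dataset(filenames_all[file_num])
        for var_txt in var_list:
            data_xarray_all[var_txt].values[file_num, ...] = data_xarray_new[var_txt].values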
diff --git a/process_output/output.txt b/process_output/output.txt
new file mode 100644
index 0000000..ba323f7
--- /dev/null
+++ b/process_output/output.txt
@@ -0,0 +1,3 @@
+Processing locrad_5k_temp_precip. files found: 10
+/projects/pd_lab/mpe32/conda/envs/python3_new/lib/python3.9/site-packages/xarray/core/concat.py:527: FutureWarning: unique with argument that is not not a Series, Index, ExtensionArray, or np.ndarray is deprecated and will raise in a future version.
+  common_dims = tuple(pd.unique([d for v in vars for d in v.dims]))
diff --git a/process_output/run_script.sh b/process_output/run_script.sh
new file mode 100644
index 0000000..ea1c481
--- /dev/null
+++ b/process_output/run_script.sh
@@ -0,0 +1,16 @@
+#!/bin/bash
+#SBATCH --job-name=da_process       # Name of the job
+#SBATCH --output=output.txt         # File for output and errors
+#SBATCH --time=10:00                # Maximum time for job to run
+#SBATCH --mem=20000                 # Memory (MB)
+
+# Run this with the command: sbatch run_script.sh
+#srun python -u make_combined_file_iters.py 'locrad_none'
+#srun python -u make_combined_file_iters.py 'locrad_10k'
+#srun python -u make_combined_file_iters.py 'locrad_5k'
+#srun python -u make_combined_file_iters.py 'locrad_5k_plus_rankbased'
+#srun python -u make_combined_file_iters.py 'locrad_4k'
+#srun python -u make_combined_file_iters.py 'locrad_3k'
+#srun python -u make_combined_file_iters.py 'locrad_1k'
+srun python -u make_combined_file_iters.py 'locrad_5k_temp_precip'
+
diff --git a/run_da.sh b/run_da.sh
index 135f103..e63761e 100644
--- a/run_da.sh
+++ b/run_da.sh
@@ -1,7 +1,7 @@
 #!/bin/bash
 #SBATCH --job-name=da_Holocene       # Name of the job
 #SBATCH --output=output_logfile.txt  # File for output and errors
-#SBATCH --time=6:00:00               # Maximum time for job to run
+#SBATCH --time=30:00                 # Maximum time for job to run
 #SBATCH --mem=50000                  # Memory (MB)
 #SBATCH --cpus-per-task=1            # Number of processors