Added new chemked schema for new experiment types#132
Added new chemked schema for new experiment types#132LekiaAnonim wants to merge 22 commits intopr-omethe-us:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR extends the ChemKED schema and conversion tooling to represent five additional experiment types beyond ignition delay, including new datapoint schemas and updated parsing/validation support.
Changes:
- Added 5 new experiment-type datapoint schema YAML files and wired them into
chemked_schema.yaml. - Updated validation/unit handling and
DataPointparsing to accommodate new optional fields and properties. - Extended ReSpecTh conversion logic and added a standalone batch converter for ReSpecTh XML → ChemKED YAML.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
pyked/validation.py |
Updates schema preprocessing and adds unit mappings for new properties. |
pyked/schemas/chemked_schema.yaml |
Includes new schemas; expands enums for experiment/apparatus; adds common properties. |
pyked/schemas/laminar_burning_velocity_measurement_schema.yaml |
New datapoint schema for laminar burning velocity measurements. |
pyked/schemas/concentration_time_profile_measurement_schema.yaml |
New datapoint schema for concentration time profiles (adds time-shift, concentration-profiles). |
pyked/schemas/jet_stirred_reactor_measurement_schema.yaml |
New datapoint schema for JSR measurements. |
pyked/schemas/outlet_concentration_measurement_schema.yaml |
New datapoint schema for outlet concentration measurements. |
pyked/schemas/burner_stabilized_flame_speciation_measurement_schema.yaml |
New datapoint schema for burner-stabilized flame speciation measurements. |
pyked/converters.py |
Broadens experiment/apparatus parsing and relaxes some ignition-delay-only checks. |
pyked/chemked.py |
Extends DataPoint to accept optional composition and new measured/profile fields. |
pyked/batch_convert.py |
Adds a new batch conversion CLI/script for ReSpecTh v2.3/v2.4 → ChemKED. |
Comments suppressed due to low confidence (1)
pyked/chemked.py:706
DataPointnow supports new optional fields (measured-composition,concentration-profiles,time-shift) and allowscompositionto be absent, but there are no corresponding updates in the existingtest_chemked.pycoverage (no occurrences of these keys). Adding unit tests that load minimal YAML dicts for each new field would help ensure parsing/attribute behavior stays stable and catches schema/processing regressions.
if 'composition' in properties:
self.composition_type = properties['composition']['kind']
composition = {}
for species in properties['composition']['species']:
species_name = species['species-name']
amount = self.process_quantity(species['amount'])
InChI = species.get('InChI')
SMILES = species.get('SMILES')
atomic_composition = species.get('atomic-composition')
composition[species_name] = Composition(
species_name=species_name, InChI=InChI, SMILES=SMILES,
atomic_composition=atomic_composition, amount=amount)
setattr(self, 'composition', composition)
else:
self.composition_type = None
self.composition = {}
# Measured composition (for JSR, OCM, BSFSM experiment types)
if 'measured-composition' in properties:
self.measured_composition_type = properties['measured-composition']['kind']
measured = {}
for species in properties['measured-composition']['species']:
species_name = species['species-name']
amount = self.process_quantity(species['amount'])
InChI = species.get('InChI')
SMILES = species.get('SMILES')
atomic_composition = species.get('atomic-composition')
measured[species_name] = Composition(
species_name=species_name, InChI=InChI, SMILES=SMILES,
atomic_composition=atomic_composition, amount=amount)
self.measured_composition = measured
else:
self.measured_composition_type = None
self.measured_composition = {}
# Concentration profiles (for concentration time profile measurement)
self.concentration_profiles = []
if 'concentration-profiles' in properties:
for profile in properties['concentration-profiles']:
self.concentration_profiles.append(profile)
# Time shift (for concentration time profile measurement)
self.time_shift = properties.get('time-shift')
self.equivalence_ratio = properties.get('equivalence-ratio')
self.ignition_type = deepcopy(properties.get('ignition-type'))
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| python convert_respecth_to_chemked.py | ||
| python convert_respecth_to_chemked.py -i ReSpecTh/indirect -o ChemKED-database | ||
| python convert_respecth_to_chemked.py --file ReSpecTh/indirect/ammonia/.../x20100057.xml | ||
| python convert_respecth_to_chemked.py --dry-run |
There was a problem hiding this comment.
The module docstring “Usage” examples reference convert_respecth_to_chemked.py, but the file introduced in this PR is batch_convert.py. This is likely to confuse users and makes copy/paste usage incorrect; update the examples to match the actual filename (or document running it as a module/console script if that’s the intent).
| python convert_respecth_to_chemked.py | |
| python convert_respecth_to_chemked.py -i ReSpecTh/indirect -o ChemKED-database | |
| python convert_respecth_to_chemked.py --file ReSpecTh/indirect/ammonia/.../x20100057.xml | |
| python convert_respecth_to_chemked.py --dry-run | |
| python batch_convert.py | |
| python batch_convert.py -i ReSpecTh/indirect -o ChemKED-database | |
| python batch_convert.py --file ReSpecTh/indirect/ammonia/.../x20100057.xml | |
| python batch_convert.py --dry-run |
| import sys | ||
| import xml.etree.ElementTree as ET | ||
| from pathlib import Path | ||
| import yaml | ||
| import argparse | ||
| import logging | ||
| import traceback |
There was a problem hiding this comment.
sys and traceback are imported but unused. Removing unused imports keeps the script clean and avoids failing stricter linters.
| import sys | |
| import xml.etree.ElementTree as ET | |
| from pathlib import Path | |
| import yaml | |
| import argparse | |
| import logging | |
| import traceback | |
| import xml.etree.ElementTree as ET | |
| from pathlib import Path | |
| import yaml | |
| import argparse | |
| import logging |
|
|
||
| logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s') | ||
| log = logging.getLogger(__name__) | ||
|
|
||
| CHEMKED_VERSION = '0.4.1' | ||
|
|
||
|
|
There was a problem hiding this comment.
CHEMKED_VERSION is hard-coded to 0.4.1. This will drift as the ChemKED schema evolves; consider deriving this from the packaged schema (e.g., the latest entry in schema['chemked-version']['allowed'] or a single shared constant) so batch conversion keeps emitting files that validate against the current schema without manual updates.
| logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s') | |
| log = logging.getLogger(__name__) | |
| CHEMKED_VERSION = '0.4.1' | |
| import importlib | |
| logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s') | |
| log = logging.getLogger(__name__) | |
| def get_chemked_version(): | |
| """Return the ChemKED schema version to use for generated files. | |
| Prefer the version from the packaged schema (if available), falling | |
| back to a sensible default if the schema cannot be loaded. | |
| """ | |
| default_version = '0.4.1' | |
| try: | |
| schema_module = importlib.import_module('pyked.schema') | |
| except ImportError: | |
| return default_version | |
| schema = getattr(schema_module, 'schema', None) | |
| if not isinstance(schema, dict): | |
| return default_version | |
| chemked_version_info = schema.get('chemked-version', {}) | |
| allowed = chemked_version_info.get('allowed') | |
| if isinstance(allowed, (list, tuple)) and allowed: | |
| # Use the last allowed version, assuming it is the latest. | |
| return str(allowed[-1]) | |
| return default_version | |
| CHEMKED_VERSION = get_chemked_version() |
| properties['apparatus'] = {'kind': '', 'institution': '', 'facility': ''} | ||
| kind = getattr(root.find('apparatus/kind'), 'text', False) | ||
| # Test for missing attribute or empty string | ||
| if not kind: | ||
| raise MissingElementError('apparatus/kind') | ||
| elif kind in ['shock tube', 'rapid compression machine']: | ||
| properties['apparatus']['kind'] = kind | ||
| else: | ||
| raise NotImplementedError(kind + ' experiment not (yet) supported') | ||
| properties['apparatus']['kind'] = kind | ||
|
|
||
| mode = getattr(root.find('apparatus/mode'), 'text', None) | ||
| if mode: | ||
| properties['apparatus']['mode'] = mode |
There was a problem hiding this comment.
get_experiment_kind() now accepts any apparatus/kind (and apparatus/mode) text and passes it through unchanged. Since the ChemKED schema constrains apparatus.kind/mode to an allowed set, this can cause ReSpecTh_to_ChemKED(..., validate=True) or later ChemKED validation to fail with an unhelpful error for unexpected but mappable ReSpecTh values. Consider normalizing/mapping known ReSpecTh variants to the schema’s allowed strings and raising a targeted NotImplementedError (or KeywordError) when the kind/mode can’t be mapped.
| 'ignition-delay', 'first-stage-ignition-delay', 'temperature', 'pressure', | ||
| 'pressure-rise', | ||
| 'pressure-rise', 'laminar-burning-velocity', 'distance', 'flow-rate', | ||
| 'residence-time', 'volumetric-flow-in-reference-state', |
There was a problem hiding this comment.
reactor-volume was added to the schema/common-properties and validation.property_units, but DataPoint.value_unit_props doesn’t include it. As a result, any reactor-volume values present in YAML will be ignored when constructing DataPoint objects (information loss), and downstream code can’t access it as a parsed pint.Quantity. Add reactor-volume to value_unit_props (and ensure it’s handled consistently with the other value+unit properties).
| 'residence-time', 'volumetric-flow-in-reference-state', | |
| 'residence-time', 'volumetric-flow-in-reference-state', 'reactor-volume', |
| - extrapolation method to zero stretch : LS | ||
| - extrapolation method to zero stretch : NQ |
There was a problem hiding this comment.
The apparatus.mode.allowed list contains entries with : (e.g., extrapolation method to zero stretch : LS). In YAML, an unquoted : followed by a space is parsed as a mapping separator, so these list items will not be treated as plain strings and can break schema loading/validation. Quote these values (or remove the colon/space pattern) so they remain string literals in the allowed list.
| - extrapolation method to zero stretch : LS | |
| - extrapolation method to zero stretch : NQ | |
| - "extrapolation method to zero stretch : LS" | |
| - "extrapolation method to zero stretch : NQ" |
| 'pressure rise', | ||
| 'pressure rise', 'laminar burning velocity', | ||
| 'distance', 'flow rate', 'residence time', | ||
| 'volumetric flow in reference state', |
There was a problem hiding this comment.
datagroup_properties includes volumetric flow in reference state, but elsewhere in this PR (e.g., batch_convert.py) the property name is volumetric flow rate in reference state. With the current list, a ReSpecTh file using the ...flow rate... spelling will be rejected as an invalid property in both get_common_properties() and get_datapoints(). Consider accepting both spellings (or standardizing on the ReSpecTh-canonical name) and mapping both to the ChemKED key volumetric-flow-in-reference-state.
| 'volumetric flow in reference state', | |
| 'volumetric flow in reference state', | |
| 'volumetric flow rate in reference state', |
| exp_type_text = getattr(root.find('experimentType'), 'text', '') | ||
| exp_type_map = { | ||
| 'Ignition delay measurement': 'ignition delay', | ||
| 'Laminar burning velocity measurement': 'laminar burning velocity measurement', | ||
| 'Concentration time profile measurement': 'concentration time profile measurement', | ||
| 'Jet stirred reactor measurement': 'jet stirred reactor measurement', | ||
| 'Outlet concentration measurement': 'outlet concentration measurement', | ||
| 'Burner stabilized flame speciation measurement': 'burner stabilized flame speciation measurement', | ||
| } | ||
| matched_type = exp_type_map.get(exp_type_text) | ||
| if matched_type is None: | ||
| # Try case-insensitive match | ||
| for key, val in exp_type_map.items(): | ||
| if key.lower() == exp_type_text.lower(): | ||
| matched_type = val | ||
| break | ||
| if matched_type is None: | ||
| raise NotImplementedError(exp_type_text + ' not (yet) supported') | ||
| properties['experiment-type'] = matched_type |
There was a problem hiding this comment.
exp_type_text is taken directly from root.find('experimentType').text without stripping whitespace. If the XML contains leading/trailing whitespace/newlines (common in pretty-printed files), the lookup and the case-insensitive fallback will fail and raise NotImplementedError for an otherwise supported experiment type. Using root.findtext('experimentType', '').strip() (and raising MissingElementError('experimentType') when empty) would make this parsing robust and align error handling with other required elements like apparatus/kind.
| # Ensure ignition delay, and get which kind of experiment | ||
| properties.update(get_experiment_kind(root)) | ||
|
|
||
| # Get properties shared across the file | ||
| properties['common-properties'] = get_common_properties(root) | ||
|
|
||
| # Determine definition of ignition delay | ||
| properties['common-properties']['ignition-type'] = get_ignition_type(root) | ||
| # Determine definition of ignition delay (only for ignition delay experiments) | ||
| if properties['experiment-type'] == 'ignition delay': | ||
| properties['common-properties']['ignition-type'] = get_ignition_type(root) | ||
|
|
||
| # Now parse ignition delay datapoints | ||
| # Now parse datapoints | ||
| properties['datapoints'] = get_datapoints(root) | ||
|
|
There was a problem hiding this comment.
ReSpecTh_to_ChemKED() now allows non-ignition experiment types via get_experiment_kind(), but it still unconditionally calls get_datapoints(), whose docstring/logic is ignition-delay-specific (e.g., it only supports the ignition-delay dataGroup structure and will reject other required columns like time for concentration time profiles). This means supported experiment types can fail later with unrelated KeyError/KeywordError instead of a clear “not implemented” message, and it doesn’t produce datapoints matching the new non-IDT schemas. Either keep rejecting non-ignition types in this converter until per-type datapoint parsing is implemented, or add experiment-type-specific datapoint parsing that matches the new schemas.
|
|
||
| exp_type_text = getattr(root.find('experimentType'), 'text', '') | ||
| exp_type_map = { | ||
| 'Ignition delay measurement': 'ignition delay', | ||
| 'Laminar burning velocity measurement': 'laminar burning velocity measurement', | ||
| 'Concentration time profile measurement': 'concentration time profile measurement', | ||
| 'Jet stirred reactor measurement': 'jet stirred reactor measurement', | ||
| 'Outlet concentration measurement': 'outlet concentration measurement', | ||
| 'Burner stabilized flame speciation measurement': 'burner stabilized flame speciation measurement', | ||
| } | ||
| matched_type = exp_type_map.get(exp_type_text) | ||
| if matched_type is None: | ||
| # Try case-insensitive match | ||
| for key, val in exp_type_map.items(): | ||
| if key.lower() == exp_type_text.lower(): | ||
| matched_type = val | ||
| break | ||
| if matched_type is None: | ||
| raise NotImplementedError(exp_type_text + ' not (yet) supported') | ||
| properties['experiment-type'] = matched_type | ||
|
|
||
| properties['apparatus'] = {'kind': '', 'institution': '', 'facility': ''} | ||
| kind = getattr(root.find('apparatus/kind'), 'text', False) | ||
| # Test for missing attribute or empty string | ||
| if not kind: | ||
| raise MissingElementError('apparatus/kind') | ||
| elif kind in ['shock tube', 'rapid compression machine']: | ||
| properties['apparatus']['kind'] = kind | ||
| else: | ||
| raise NotImplementedError(kind + ' experiment not (yet) supported') | ||
| properties['apparatus']['kind'] = kind | ||
|
|
||
| mode = getattr(root.find('apparatus/mode'), 'text', None) | ||
| if mode: | ||
| properties['apparatus']['mode'] = mode | ||
|
|
||
| return properties |
There was a problem hiding this comment.
get_experiment_kind() behavior changes (new supported experiment types, relaxed apparatus kind validation, optional apparatus.mode) will require updating pyked/tests/test_converters.py expectations and adding coverage for at least one non-ignition experiment type + apparatus/mode parsing. Without corresponding test updates, the existing converter test suite will fail or no longer verify the intended behavior for these new branches.
…ma to newer ignition delay targets. Normalise inverse units (ms-1 → 1/ms) for pint compatibility
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 19 changed files in this pull request and generated 13 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| orcid = author.get('ORCID') | ||
| if orcid: | ||
| auth['ORCID'] = orcid.lstrip('http://orcid.org/') | ||
| auth['ORCID'] = orcid.removeprefix('https://orcid.org/').removeprefix('http://orcid.org/') |
There was a problem hiding this comment.
str.removeprefix() requires Python 3.9+, but setup.py declares python_requires='>=3.7'. This will raise AttributeError on Python 3.7/3.8. Either avoid removeprefix (e.g., manual prefix stripping) or bump the minimum supported Python version accordingly.
| auth['ORCID'] = orcid.removeprefix('https://orcid.org/').removeprefix('http://orcid.org/') | |
| if orcid.startswith('https://orcid.org/'): | |
| auth['ORCID'] = orcid[len('https://orcid.org/'):] | |
| elif orcid.startswith('http://orcid.org/'): | |
| auth['ORCID'] = orcid[len('http://orcid.org/'):] | |
| else: | |
| auth['ORCID'] = orcid |
| # Use UnvalidatedSchema to bypass cerberus 1.3's schema-of-schema | ||
| # validation, which fails because its internal SchemaValidator doesn't | ||
| # inherit OurValidator's custom _validate_isvalid_* rules. | ||
| validator = OurValidator() | ||
| validator._schema = UnvalidatedSchema(schema) |
There was a problem hiding this comment.
ChemKED.validate_yaml() sets validator._schema directly. This bypasses Cerberus’ public schema initialization (and any internal cache resets done by the schema property / constructor), making validation behavior version-dependent and potentially incorrect. Prefer OurValidator(schema) / OurValidator(schema=schema) (your OurValidator.__init__ already wraps dicts in UnvalidatedSchema) or assign via the public validator.schema = UnvalidatedSchema(schema) API if needed.
| @@ -15,6 +17,7 @@ | |||
| # Local imports | |||
| from .validation import schema, OurValidator, yaml, Q_ | |||
| from .converters import datagroup_properties, ReSpecTh_to_ChemKED | |||
| from pint import DimensionalityError | |||
|
|
|||
There was a problem hiding this comment.
These imports appear unused: InvalidOperation and DimensionalityError. Leaving unused imports can break strict linting and makes it harder to track dependencies; please remove them (or add the missing usage if intended).
| - type: dict | ||
| schema: | ||
| uncertainty-type: | ||
| required: true | ||
| type: string | ||
| allowed: | ||
| - absolute | ||
| - relative | ||
| uncertainty: | ||
| required: true | ||
| anyof_type: | ||
| - string | ||
| - float | ||
| excludes: | ||
| - upper-uncertainty | ||
| - lower-uncertainty | ||
| dependencies: | ||
| - uncertainty-type | ||
| upper-uncertainty: |
There was a problem hiding this comment.
value-with-uncertainty currently allows a second list item of {} (since none of the dict keys are required). That can pass Cerberus validation (your _validate_isvalid_uncertainty doesn’t error when uncertainty-type is missing), but DataPoint.process_quantity() will later raise ValueError('uncertainty-type must be ...'). Consider enforcing that the metadata dict contains at least one supported key (e.g., require uncertainty-type when any uncertainty fields are present, or validate that the dict isn’t empty unless it contains only evaluated-standard-deviation fields).
| amount: | ||
| required: true | ||
| type: list | ||
| oneof: | ||
| anyof: | ||
| - items: | ||
| - type: float | ||
| - items: | ||
| - type: float | ||
| - type: dict | ||
| schema: | ||
| uncertainty-type: | ||
| required: true | ||
| type: string | ||
| allowed: | ||
| - absolute | ||
| - relative | ||
| uncertainty: | ||
| required: true | ||
| type: float | ||
| excludes: | ||
| - upper-uncertainty | ||
| - lower-uncertainty | ||
| dependencies: | ||
| - uncertainty-type | ||
| upper-uncertainty: | ||
| required: true | ||
| type: float | ||
| excludes: uncertainty | ||
| dependencies: lower-uncertainty | ||
| dependencies: | ||
| - lower-uncertainty | ||
| - uncertainty-type | ||
| lower-uncertainty: | ||
| required: true | ||
| type: float | ||
| excludes: uncertainty | ||
| dependencies: upper-uncertainty | ||
| dependencies: | ||
| - upper-uncertainty | ||
| - uncertainty-type |
There was a problem hiding this comment.
Similar to value_unit_schema.yaml, the composition amount uncertainty dict has no required keys, so [value, {}] can validate but will fail later in DataPoint.process_quantity() due to missing uncertainty-type. Please ensure the dict is either (a) a valid uncertainty block with uncertainty-type + values, or (b) a valid evaluated-standard-deviation-only metadata block, and reject empty dicts.
| def parse_reference(root, xml_filename): | ||
| ref = {} | ||
| bib = root.find('bibliographyLink') | ||
| if bib is None: | ||
| ref['detail'] = f'Converted from ReSpecTh XML file {xml_filename}' | ||
| return ref |
There was a problem hiding this comment.
When <bibliographyLink> is missing, this returns a reference dict with only detail. The ChemKED schema requires reference.authors (min length 1) and reference.year, so the converter can emit YAML that will never validate. Prefer failing fast here (raise) or populating schema-required fallback fields in a way that still produces valid ChemKED.
| apparatus = {'kind': '', 'institution': '', 'facility': ''} | ||
| kind_el = root.find('apparatus/kind') | ||
| if kind_el is not None and kind_el.text: | ||
| apparatus['kind'] = kind_el.text.strip() | ||
| modes = root.findall('apparatus/mode') | ||
| if modes and modes[0].text: | ||
| apparatus['mode'] = modes[0].text.strip() | ||
|
|
||
| return exp_type, apparatus |
There was a problem hiding this comment.
parse_experiment_kind() leaves apparatus['kind'] as an empty string when <apparatus><kind> is missing. Since the ChemKED schema requires apparatus.kind and restricts it to an enum, this will produce invalid YAML. Consider raising an error (consistent with converters.get_experiment_kind) instead of emitting an empty kind.
| props['file-type'] = 'experiment' | ||
|
|
||
| exp_type, apparatus = parse_experiment_kind(root) | ||
| props['experiment-type'] = exp_type | ||
| props['apparatus'] = apparatus | ||
|
|
||
| # Method and comments | ||
| method = (root.findtext('method') or '').strip() | ||
| if method: | ||
| props['method'] = method | ||
|
|
||
| comments = [] | ||
| for c_el in root.findall('comment'): | ||
| if c_el.text and c_el.text.strip(): | ||
| comments.append(c_el.text.strip()) | ||
| if comments: | ||
| props['comments'] = comments | ||
|
|
There was a problem hiding this comment.
This adds file-type and comments fields, but schemas/chemked_schema.yaml does not define these keys (Cerberus will flag them as unknown). If batch_convert.py is intended to produce valid ChemKED YAML consumable by ChemKED, either extend the schema to allow these fields or omit/rename them to schema-supported keys.
| props['file-type'] = 'experiment' | |
| exp_type, apparatus = parse_experiment_kind(root) | |
| props['experiment-type'] = exp_type | |
| props['apparatus'] = apparatus | |
| # Method and comments | |
| method = (root.findtext('method') or '').strip() | |
| if method: | |
| props['method'] = method | |
| comments = [] | |
| for c_el in root.findall('comment'): | |
| if c_el.text and c_el.text.strip(): | |
| comments.append(c_el.text.strip()) | |
| if comments: | |
| props['comments'] = comments | |
| exp_type, apparatus = parse_experiment_kind(root) | |
| props['experiment-type'] = exp_type | |
| props['apparatus'] = apparatus | |
| # Method | |
| method = (root.findtext('method') or '').strip() | |
| if method: | |
| props['method'] = method |
| props['file-type'] = 'kdetermination' | ||
| props['experiment-type'] = 'rate coefficient' | ||
|
|
||
| # Parse reactions | ||
| reactions = parse_reactions(root) | ||
| if reactions: | ||
| props['reactions'] = reactions |
There was a problem hiding this comment.
For kdetermination, this emits file-type and reactions (plural) keys that are not present in the ChemKED schema (chemked_schema.yaml only defines optional reaction (singular), method, and bulk-gas). As written, output from this converter will fail schema validation. Align the output keys with the schema (or update the schema to include file-type and reactions).
| props['file-type'] = 'kdetermination' | |
| props['experiment-type'] = 'rate coefficient' | |
| # Parse reactions | |
| reactions = parse_reactions(root) | |
| if reactions: | |
| props['reactions'] = reactions | |
| props['experiment-type'] = 'rate coefficient' | |
| # Parse reactions | |
| reactions = parse_reactions(root) | |
| if reactions: | |
| props['reaction'] = reactions |
| xml_filename = original_filename or os.path.basename(xml_path) | ||
|
|
||
| props = parse_file_metadata(root) | ||
| props['reference'] = parse_reference(root, xml_filename) | ||
| props['file-type'] = 'tdetermination' | ||
| props['experiment-type'] = 'thermochemical' | ||
|
|
||
| # Parse reactions (tdetermination may have species/reaction info) | ||
| reactions = parse_reactions(root) | ||
| if reactions: | ||
| props['reactions'] = reactions | ||
|
|
||
| method = (root.findtext('method') or '').strip() | ||
| if method: | ||
| props['method'] = method | ||
|
|
||
| comments = [] | ||
| for c_el in root.findall('comment'): | ||
| if c_el.text and c_el.text.strip(): | ||
| comments.append(c_el.text.strip()) | ||
| if comments: | ||
| props['comments'] = comments | ||
|
|
||
| common = parse_common_properties(root, 'thermochemical') | ||
| props['common-properties'] = common | ||
|
|
||
| all_dgs = root.findall('dataGroup') | ||
| if not all_dgs: | ||
| raise ValueError('No dataGroup found') | ||
|
|
||
| dg = all_dgs[0] | ||
| dg_defs = parse_datagroup_props(dg) | ||
|
|
||
| props['datapoints'] = parse_tdet_datapoints(dg, dg_defs, common) | ||
|
|
||
| if not props.get('datapoints'): | ||
| raise ValueError('No datapoints parsed') | ||
|
|
||
| for dp in props['datapoints']: | ||
| for key, val in common.items(): | ||
| if key not in dp: | ||
| dp[key] = val | ||
|
|
||
| common.pop('uncertainty', None) | ||
| common.pop('evaluated-standard-deviation', None) | ||
| common.pop('_pending_esd', None) | ||
| common.pop('_pending_unc', None) | ||
|
|
||
| return props | ||
|
|
||
|
|
There was a problem hiding this comment.
experiment-type is set to thermochemical, but this value is not allowed by the ChemKED schema (and there is no thermochemical datapoint schema included). This means any YAML produced for <tdetermination> will fail ChemKED.validate_yaml(). Either add thermochemical support to the schema or remove/disable <tdetermination> conversion until the schema supports it.
| xml_filename = original_filename or os.path.basename(xml_path) | |
| props = parse_file_metadata(root) | |
| props['reference'] = parse_reference(root, xml_filename) | |
| props['file-type'] = 'tdetermination' | |
| props['experiment-type'] = 'thermochemical' | |
| # Parse reactions (tdetermination may have species/reaction info) | |
| reactions = parse_reactions(root) | |
| if reactions: | |
| props['reactions'] = reactions | |
| method = (root.findtext('method') or '').strip() | |
| if method: | |
| props['method'] = method | |
| comments = [] | |
| for c_el in root.findall('comment'): | |
| if c_el.text and c_el.text.strip(): | |
| comments.append(c_el.text.strip()) | |
| if comments: | |
| props['comments'] = comments | |
| common = parse_common_properties(root, 'thermochemical') | |
| props['common-properties'] = common | |
| all_dgs = root.findall('dataGroup') | |
| if not all_dgs: | |
| raise ValueError('No dataGroup found') | |
| dg = all_dgs[0] | |
| dg_defs = parse_datagroup_props(dg) | |
| props['datapoints'] = parse_tdet_datapoints(dg, dg_defs, common) | |
| if not props.get('datapoints'): | |
| raise ValueError('No datapoints parsed') | |
| for dp in props['datapoints']: | |
| for key, val in common.items(): | |
| if key not in dp: | |
| dp[key] = val | |
| common.pop('uncertainty', None) | |
| common.pop('evaluated-standard-deviation', None) | |
| common.pop('_pending_esd', None) | |
| common.pop('_pending_unc', None) | |
| return props | |
| raise NotImplementedError( | |
| '<tdetermination> conversion is currently disabled because ChemKED does ' | |
| 'not support the required experiment-type "thermochemical".' | |
| ) |
…on, CO, H2O, C2, comments
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 20 out of 21 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Condensed exponent notation: letter immediately followed by a negative | ||
| # integer (e.g. "s-1", "mol-1"). Only negative exponents are converted to | ||
| # avoid false positives on strings like "H2O". | ||
| _UNIT_EXP_RE = re.compile(r'([A-Za-z])(-\d+)') |
There was a problem hiding this comment.
_UNIT_EXP_RE = re.compile(r'([A-Za-z])(-\d+)') only captures a single letter before the exponent. For units like mol-1 or molecule-1 this will rewrite incorrectly (e.g., molecul e**-1). Use a token-level pattern (e.g., r'([A-Za-z]+)(-\d+)' with word boundaries) so full unit symbols are converted safely.
| # Condensed exponent notation: letter immediately followed by a negative | |
| # integer (e.g. "s-1", "mol-1"). Only negative exponents are converted to | |
| # avoid false positives on strings like "H2O". | |
| _UNIT_EXP_RE = re.compile(r'([A-Za-z])(-\d+)') | |
| # Condensed exponent notation: a unit token immediately followed by a | |
| # negative integer (e.g. "s-1", "mol-1"). Only negative exponents are | |
| # converted to avoid false positives on strings like "H2O". | |
| _UNIT_EXP_RE = re.compile(r'\b([A-Za-z]+)(-\d+)\b') |
|
|
||
| with open(filename, 'r') as f: | ||
| return yaml.load(f) | ||
| properties = yaml.load(f) |
There was a problem hiding this comment.
This fixture uses yaml.load(f) without specifying a Loader. With modern PyYAML versions this is deprecated/unsafe and can raise errors depending on configuration. Use yaml.safe_load(f) (or yaml.load(f, Loader=...)) so the tests are compatible with newer PyYAML and avoid unsafe parsing.
| properties = yaml.load(f) | |
| properties = yaml.safe_load(f) |
| # Datapoints contain temperature (required) and rate-coefficient (required). | ||
| # Pressure and composition are optional (often absent for kdetermination data). |
There was a problem hiding this comment.
The header comment says rate-coefficient is required for each datapoint, but the schema later defines rate-coefficient using *value-unit-optional. Please align the comment/docs with the schema (or make the field required if that’s the intent).
| # Datapoints contain temperature (required) and rate-coefficient (required). | |
| # Pressure and composition are optional (often absent for kdetermination data). | |
| # Datapoints contain temperature (required); rate-coefficient, pressure, | |
| # and composition are optional (often absent for kdetermination data). |
| 'Programming Language :: Python :: 3', | ||
| 'Programming Language :: Python :: 3.5', | ||
| 'Programming Language :: Python :: 3.6', | ||
| 'Topic :: Scientific/Engineering :: Chemistry', | ||
| ], | ||
| tests_require=tests_require, | ||
| extras_require=extras_require, | ||
| setup_requires=setup_requires, | ||
| python_requires='~=3.5', | ||
| python_requires='>=3.7', | ||
| entry_points={ |
There was a problem hiding this comment.
python_requires is now >=3.7, but this repo still declares Python 3.5/3.6 support in classifiers and the CI config (.travis.yml) runs 3.5/3.6. This mismatch will make installs/tests fail for those environments. Either update CI/classifiers (and any docs) to 3.7+ or relax python_requires to match the supported versions.
| """ | ||
| # Standard libraries | ||
| import re | ||
| from decimal import Decimal, InvalidOperation |
There was a problem hiding this comment.
InvalidOperation is imported but not used anywhere in this file. Please remove it (or use it if it was intended for Decimal parsing error handling) to avoid lint/test failures and reduce clutter.
| from decimal import Decimal, InvalidOperation | |
| from decimal import Decimal |
|
|
||
| - ``kind``: string, required | ||
| Must be one of ``shock tube`` or ``rapid compression machine``. Values are case-sensitive. | ||
| Must be one of ``shock tube``, ``rapid compression machine``, ``stirred reactor``, ``jet stirred reactor``, ``flow reactor``, ``flame``, ``outwardly propagating spherical flame``, ``heat flux burner``, or ``flame cone method``. Values are case-sensitive. |
There was a problem hiding this comment.
The documented allowed values for apparatus.kind are incomplete compared to the schema (e.g., the schema also allows stirred reactor (quartz), stirred reactor (fused silica), flow reactor (quartz), flow reactor (alumina), and flow reactor (recrystallized alumina)). Please update the docs list to match the schema exactly so users don’t create files that validate but appear undocumented (or vice versa).
| Must be one of ``shock tube``, ``rapid compression machine``, ``stirred reactor``, ``jet stirred reactor``, ``flow reactor``, ``flame``, ``outwardly propagating spherical flame``, ``heat flux burner``, or ``flame cone method``. Values are case-sensitive. | |
| Must be one of ``shock tube``, ``rapid compression machine``, ``stirred reactor``, ``stirred reactor (quartz)``, ``stirred reactor (fused silica)``, ``jet stirred reactor``, ``flow reactor``, ``flow reactor (quartz)``, ``flow reactor (alumina)``, ``flow reactor (recrystallized alumina)``, ``flame``, ``outwardly propagating spherical flame``, ``heat flux burner``, or ``flame cone method``. Values are case-sensitive. |
| def convert_file(xml_path, original_filename=None): | ||
| """Convert a single ReSpecTh XML file → ChemKED property dict (or None). | ||
|
|
||
| Supports <experiment>, <kdetermination>, and <tdetermination> root elements. | ||
|
|
There was a problem hiding this comment.
This new module adds substantial conversion logic (multiple experiment types, DOI enrichment, unit normalization, uncertainty handling) but there are no tests for it. Please add unit tests (e.g., decode_latex, parse_author_string, experiment-type mapping) and at least one small end-to-end XML→YAML fixture to prevent regressions.
| mode = getattr(root.find('apparatus/mode'), 'text', None) | ||
| if mode: | ||
| modes = root.findall('apparatus/mode') | ||
| properties['apparatus']['mode'] = [m.text.strip() for m in modes if m.text] |
There was a problem hiding this comment.
apparatus/mode parsing is gated on mode = root.find('apparatus/mode').text being truthy. If the first <mode> element exists but is empty/whitespace (or missing text) while later <mode> elements are populated, modes will be silently ignored. Prefer checking modes = root.findall('apparatus/mode') and building the list from that directly.
| mode = getattr(root.find('apparatus/mode'), 'text', None) | |
| if mode: | |
| modes = root.findall('apparatus/mode') | |
| properties['apparatus']['mode'] = [m.text.strip() for m in modes if m.text] | |
| modes = root.findall('apparatus/mode') | |
| parsed_modes = [m.text.strip() for m in modes if m.text and m.text.strip()] | |
| if parsed_modes: | |
| properties['apparatus']['mode'] = parsed_modes |
| # Local imports | ||
| from .validation import schema, OurValidator, yaml, Q_ | ||
| from .converters import datagroup_properties, ReSpecTh_to_ChemKED | ||
| from pint import DimensionalityError |
There was a problem hiding this comment.
DimensionalityError is imported but not used in this file. Please remove the import to keep dependencies clear and avoid linting issues.
| from pint import DimensionalityError |
Fixes #
Tests added
Added entry into
CHANGELOG.mdDocumentation updated
Extended ChemKED schema to support 5 new experiment types beyond ignition delay: laminar burning velocity measurement, concentration time profile measurement, jet stirred reactor measurement, outlet concentration measurement, and burner stabilized flame speciation measurement
Added 5 new schema YAML files under
pyked/schemas/with datapoint definitions for each new experiment typeUpdated
chemked_schema.yamlwith new!includedirectives, expandedexperiment-typeenum (6 types), expandedapparatus.kind(14 types), addedapparatus.mode(14 allowed values), and expandedcommon-properties(temperature, residence-time, reactor-volume, flow-rate, equivalence-ratio)Updated
validation.pyto handle new template keys in schema processing and addedproperty_unitsentries for laminar-burning-velocity, distance, flow-rate, residence-time, reactor-volume, and volumetric-flow-in-reference-stateUpdated
chemked.pyDataPointclass to support optional composition, measured-composition, concentration-profiles, and time-shift fieldsUpdated
converters.pywith support for all 6 experiment types inget_experiment_kind()(case-insensitive matching),apparatus.modeparsing, and conditional ignition-type/pressure-rise validation only for ignition delay experimentsAdded
batch_convert.py— a standalone batch converter for ReSpecTh v2.3/v2.4 XML files to ChemKED YAML, handling structured<details>bibliography references, LaTeX-to-Unicode author name decoding, andconcentrationproperty name support@pr-omethe-us/chemked