Skip to content

Added new chemked schema for new experiment types#132

Open
LekiaAnonim wants to merge 22 commits intopr-omethe-us:mainfrom
LekiaAnonim:Prosper_WIP
Open

Added new chemked schema for new experiment types#132
LekiaAnonim wants to merge 22 commits intopr-omethe-us:mainfrom
LekiaAnonim:Prosper_WIP

Conversation

@LekiaAnonim
Copy link
Copy Markdown

@LekiaAnonim LekiaAnonim commented Mar 28, 2026

  • Fixes #

  • Tests added

  • Added entry into CHANGELOG.md

  • Documentation updated

  • Extended ChemKED schema to support 5 new experiment types beyond ignition delay: laminar burning velocity measurement, concentration time profile measurement, jet stirred reactor measurement, outlet concentration measurement, and burner stabilized flame speciation measurement

  • Added 5 new schema YAML files under pyked/schemas/ with datapoint definitions for each new experiment type

  • Updated chemked_schema.yaml with new !include directives, expanded experiment-type enum (6 types), expanded apparatus.kind (14 types), added apparatus.mode (14 allowed values), and expanded common-properties (temperature, residence-time, reactor-volume, flow-rate, equivalence-ratio)

  • Updated validation.py to handle new template keys in schema processing and added property_units entries for laminar-burning-velocity, distance, flow-rate, residence-time, reactor-volume, and volumetric-flow-in-reference-state

  • Updated chemked.py DataPoint class to support optional composition, measured-composition, concentration-profiles, and time-shift fields

  • Updated converters.py with support for all 6 experiment types in get_experiment_kind() (case-insensitive matching), apparatus.mode parsing, and conditional ignition-type/pressure-rise validation only for ignition delay experiments

  • Added batch_convert.py — a standalone batch converter for ReSpecTh v2.3/v2.4 XML files to ChemKED YAML, handling structured <details> bibliography references, LaTeX-to-Unicode author name decoding, and concentration property name support

@pr-omethe-us/chemked

Copilot AI review requested due to automatic review settings March 28, 2026 01:31
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the ChemKED schema and conversion tooling to represent five additional experiment types beyond ignition delay, including new datapoint schemas and updated parsing/validation support.

Changes:

  • Added 5 new experiment-type datapoint schema YAML files and wired them into chemked_schema.yaml.
  • Updated validation/unit handling and DataPoint parsing to accommodate new optional fields and properties.
  • Extended ReSpecTh conversion logic and added a standalone batch converter for ReSpecTh XML → ChemKED YAML.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
pyked/validation.py Updates schema preprocessing and adds unit mappings for new properties.
pyked/schemas/chemked_schema.yaml Includes new schemas; expands enums for experiment/apparatus; adds common properties.
pyked/schemas/laminar_burning_velocity_measurement_schema.yaml New datapoint schema for laminar burning velocity measurements.
pyked/schemas/concentration_time_profile_measurement_schema.yaml New datapoint schema for concentration time profiles (adds time-shift, concentration-profiles).
pyked/schemas/jet_stirred_reactor_measurement_schema.yaml New datapoint schema for JSR measurements.
pyked/schemas/outlet_concentration_measurement_schema.yaml New datapoint schema for outlet concentration measurements.
pyked/schemas/burner_stabilized_flame_speciation_measurement_schema.yaml New datapoint schema for burner-stabilized flame speciation measurements.
pyked/converters.py Broadens experiment/apparatus parsing and relaxes some ignition-delay-only checks.
pyked/chemked.py Extends DataPoint to accept optional composition and new measured/profile fields.
pyked/batch_convert.py Adds a new batch conversion CLI/script for ReSpecTh v2.3/v2.4 → ChemKED.
Comments suppressed due to low confidence (1)

pyked/chemked.py:706

  • DataPoint now supports new optional fields (measured-composition, concentration-profiles, time-shift) and allows composition to be absent, but there are no corresponding updates in the existing test_chemked.py coverage (no occurrences of these keys). Adding unit tests that load minimal YAML dicts for each new field would help ensure parsing/attribute behavior stays stable and catches schema/processing regressions.
        if 'composition' in properties:
            self.composition_type = properties['composition']['kind']
            composition = {}
            for species in properties['composition']['species']:
                species_name = species['species-name']
                amount = self.process_quantity(species['amount'])
                InChI = species.get('InChI')
                SMILES = species.get('SMILES')
                atomic_composition = species.get('atomic-composition')
                composition[species_name] = Composition(
                    species_name=species_name, InChI=InChI, SMILES=SMILES,
                    atomic_composition=atomic_composition, amount=amount)
            setattr(self, 'composition', composition)
        else:
            self.composition_type = None
            self.composition = {}

        # Measured composition (for JSR, OCM, BSFSM experiment types)
        if 'measured-composition' in properties:
            self.measured_composition_type = properties['measured-composition']['kind']
            measured = {}
            for species in properties['measured-composition']['species']:
                species_name = species['species-name']
                amount = self.process_quantity(species['amount'])
                InChI = species.get('InChI')
                SMILES = species.get('SMILES')
                atomic_composition = species.get('atomic-composition')
                measured[species_name] = Composition(
                    species_name=species_name, InChI=InChI, SMILES=SMILES,
                    atomic_composition=atomic_composition, amount=amount)
            self.measured_composition = measured
        else:
            self.measured_composition_type = None
            self.measured_composition = {}

        # Concentration profiles (for concentration time profile measurement)
        self.concentration_profiles = []
        if 'concentration-profiles' in properties:
            for profile in properties['concentration-profiles']:
                self.concentration_profiles.append(profile)

        # Time shift (for concentration time profile measurement)
        self.time_shift = properties.get('time-shift')

        self.equivalence_ratio = properties.get('equivalence-ratio')
        self.ignition_type = deepcopy(properties.get('ignition-type'))


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pyked/batch_convert.py Outdated
Comment on lines +8 to +11
python convert_respecth_to_chemked.py
python convert_respecth_to_chemked.py -i ReSpecTh/indirect -o ChemKED-database
python convert_respecth_to_chemked.py --file ReSpecTh/indirect/ammonia/.../x20100057.xml
python convert_respecth_to_chemked.py --dry-run
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The module docstring “Usage” examples reference convert_respecth_to_chemked.py, but the file introduced in this PR is batch_convert.py. This is likely to confuse users and makes copy/paste usage incorrect; update the examples to match the actual filename (or document running it as a module/console script if that’s the intent).

Suggested change
python convert_respecth_to_chemked.py
python convert_respecth_to_chemked.py -i ReSpecTh/indirect -o ChemKED-database
python convert_respecth_to_chemked.py --file ReSpecTh/indirect/ammonia/.../x20100057.xml
python convert_respecth_to_chemked.py --dry-run
python batch_convert.py
python batch_convert.py -i ReSpecTh/indirect -o ChemKED-database
python batch_convert.py --file ReSpecTh/indirect/ammonia/.../x20100057.xml
python batch_convert.py --dry-run

Copilot uses AI. Check for mistakes.
Comment thread pyked/batch_convert.py Outdated
Comment on lines +15 to +21
import sys
import xml.etree.ElementTree as ET
from pathlib import Path
import yaml
import argparse
import logging
import traceback
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sys and traceback are imported but unused. Removing unused imports keeps the script clean and avoids failing stricter linters.

Suggested change
import sys
import xml.etree.ElementTree as ET
from pathlib import Path
import yaml
import argparse
import logging
import traceback
import xml.etree.ElementTree as ET
from pathlib import Path
import yaml
import argparse
import logging

Copilot uses AI. Check for mistakes.
Comment thread pyked/batch_convert.py
Comment on lines +22 to +28

logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')
log = logging.getLogger(__name__)

CHEMKED_VERSION = '0.4.1'


Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CHEMKED_VERSION is hard-coded to 0.4.1. This will drift as the ChemKED schema evolves; consider deriving this from the packaged schema (e.g., the latest entry in schema['chemked-version']['allowed'] or a single shared constant) so batch conversion keeps emitting files that validate against the current schema without manual updates.

Suggested change
logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')
log = logging.getLogger(__name__)
CHEMKED_VERSION = '0.4.1'
import importlib
logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')
log = logging.getLogger(__name__)
def get_chemked_version():
"""Return the ChemKED schema version to use for generated files.
Prefer the version from the packaged schema (if available), falling
back to a sensible default if the schema cannot be loaded.
"""
default_version = '0.4.1'
try:
schema_module = importlib.import_module('pyked.schema')
except ImportError:
return default_version
schema = getattr(schema_module, 'schema', None)
if not isinstance(schema, dict):
return default_version
chemked_version_info = schema.get('chemked-version', {})
allowed = chemked_version_info.get('allowed')
if isinstance(allowed, (list, tuple)) and allowed:
# Use the last allowed version, assuming it is the latest.
return str(allowed[-1])
return default_version
CHEMKED_VERSION = get_chemked_version()

Copilot uses AI. Check for mistakes.
Comment thread pyked/converters.py Outdated
Comment on lines +185 to +195
properties['apparatus'] = {'kind': '', 'institution': '', 'facility': ''}
kind = getattr(root.find('apparatus/kind'), 'text', False)
# Test for missing attribute or empty string
if not kind:
raise MissingElementError('apparatus/kind')
elif kind in ['shock tube', 'rapid compression machine']:
properties['apparatus']['kind'] = kind
else:
raise NotImplementedError(kind + ' experiment not (yet) supported')
properties['apparatus']['kind'] = kind

mode = getattr(root.find('apparatus/mode'), 'text', None)
if mode:
properties['apparatus']['mode'] = mode
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_experiment_kind() now accepts any apparatus/kind (and apparatus/mode) text and passes it through unchanged. Since the ChemKED schema constrains apparatus.kind/mode to an allowed set, this can cause ReSpecTh_to_ChemKED(..., validate=True) or later ChemKED validation to fail with an unhelpful error for unexpected but mappable ReSpecTh values. Consider normalizing/mapping known ReSpecTh variants to the schema’s allowed strings and raising a targeted NotImplementedError (or KeywordError) when the kind/mode can’t be mapped.

Copilot uses AI. Check for mistakes.
Comment thread pyked/chemked.py Outdated
'ignition-delay', 'first-stage-ignition-delay', 'temperature', 'pressure',
'pressure-rise',
'pressure-rise', 'laminar-burning-velocity', 'distance', 'flow-rate',
'residence-time', 'volumetric-flow-in-reference-state',
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reactor-volume was added to the schema/common-properties and validation.property_units, but DataPoint.value_unit_props doesn’t include it. As a result, any reactor-volume values present in YAML will be ignored when constructing DataPoint objects (information loss), and downstream code can’t access it as a parsed pint.Quantity. Add reactor-volume to value_unit_props (and ensure it’s handled consistently with the other value+unit properties).

Suggested change
'residence-time', 'volumetric-flow-in-reference-state',
'residence-time', 'volumetric-flow-in-reference-state', 'reactor-volume',

Copilot uses AI. Check for mistakes.
Comment thread pyked/schemas/chemked_schema.yaml Outdated
Comment on lines +77 to +78
- extrapolation method to zero stretch : LS
- extrapolation method to zero stretch : NQ
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The apparatus.mode.allowed list contains entries with : (e.g., extrapolation method to zero stretch : LS). In YAML, an unquoted : followed by a space is parsed as a mapping separator, so these list items will not be treated as plain strings and can break schema loading/validation. Quote these values (or remove the colon/space pattern) so they remain string literals in the allowed list.

Suggested change
- extrapolation method to zero stretch : LS
- extrapolation method to zero stretch : NQ
- "extrapolation method to zero stretch : LS"
- "extrapolation method to zero stretch : NQ"

Copilot uses AI. Check for mistakes.
Comment thread pyked/converters.py Outdated
'pressure rise',
'pressure rise', 'laminar burning velocity',
'distance', 'flow rate', 'residence time',
'volumetric flow in reference state',
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datagroup_properties includes volumetric flow in reference state, but elsewhere in this PR (e.g., batch_convert.py) the property name is volumetric flow rate in reference state. With the current list, a ReSpecTh file using the ...flow rate... spelling will be rejected as an invalid property in both get_common_properties() and get_datapoints(). Consider accepting both spellings (or standardizing on the ReSpecTh-canonical name) and mapping both to the ChemKED key volumetric-flow-in-reference-state.

Suggested change
'volumetric flow in reference state',
'volumetric flow in reference state',
'volumetric flow rate in reference state',

Copilot uses AI. Check for mistakes.
Comment thread pyked/converters.py Outdated
Comment on lines +165 to +183
exp_type_text = getattr(root.find('experimentType'), 'text', '')
exp_type_map = {
'Ignition delay measurement': 'ignition delay',
'Laminar burning velocity measurement': 'laminar burning velocity measurement',
'Concentration time profile measurement': 'concentration time profile measurement',
'Jet stirred reactor measurement': 'jet stirred reactor measurement',
'Outlet concentration measurement': 'outlet concentration measurement',
'Burner stabilized flame speciation measurement': 'burner stabilized flame speciation measurement',
}
matched_type = exp_type_map.get(exp_type_text)
if matched_type is None:
# Try case-insensitive match
for key, val in exp_type_map.items():
if key.lower() == exp_type_text.lower():
matched_type = val
break
if matched_type is None:
raise NotImplementedError(exp_type_text + ' not (yet) supported')
properties['experiment-type'] = matched_type
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exp_type_text is taken directly from root.find('experimentType').text without stripping whitespace. If the XML contains leading/trailing whitespace/newlines (common in pretty-printed files), the lookup and the case-insensitive fallback will fail and raise NotImplementedError for an otherwise supported experiment type. Using root.findtext('experimentType', '').strip() (and raising MissingElementError('experimentType') when empty) would make this parsing robust and align error handling with other required elements like apparatus/kind.

Copilot uses AI. Check for mistakes.
Comment thread pyked/converters.py
Comment on lines 520 to 532
# Ensure ignition delay, and get which kind of experiment
properties.update(get_experiment_kind(root))

# Get properties shared across the file
properties['common-properties'] = get_common_properties(root)

# Determine definition of ignition delay
properties['common-properties']['ignition-type'] = get_ignition_type(root)
# Determine definition of ignition delay (only for ignition delay experiments)
if properties['experiment-type'] == 'ignition delay':
properties['common-properties']['ignition-type'] = get_ignition_type(root)

# Now parse ignition delay datapoints
# Now parse datapoints
properties['datapoints'] = get_datapoints(root)

Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReSpecTh_to_ChemKED() now allows non-ignition experiment types via get_experiment_kind(), but it still unconditionally calls get_datapoints(), whose docstring/logic is ignition-delay-specific (e.g., it only supports the ignition-delay dataGroup structure and will reject other required columns like time for concentration time profiles). This means supported experiment types can fail later with unrelated KeyError/KeywordError instead of a clear “not implemented” message, and it doesn’t produce datapoints matching the new non-IDT schemas. Either keep rejecting non-ignition types in this converter until per-type datapoint parsing is implemented, or add experiment-type-specific datapoint parsing that matches the new schemas.

Copilot uses AI. Check for mistakes.
Comment thread pyked/converters.py
Comment on lines +164 to 197

exp_type_text = getattr(root.find('experimentType'), 'text', '')
exp_type_map = {
'Ignition delay measurement': 'ignition delay',
'Laminar burning velocity measurement': 'laminar burning velocity measurement',
'Concentration time profile measurement': 'concentration time profile measurement',
'Jet stirred reactor measurement': 'jet stirred reactor measurement',
'Outlet concentration measurement': 'outlet concentration measurement',
'Burner stabilized flame speciation measurement': 'burner stabilized flame speciation measurement',
}
matched_type = exp_type_map.get(exp_type_text)
if matched_type is None:
# Try case-insensitive match
for key, val in exp_type_map.items():
if key.lower() == exp_type_text.lower():
matched_type = val
break
if matched_type is None:
raise NotImplementedError(exp_type_text + ' not (yet) supported')
properties['experiment-type'] = matched_type

properties['apparatus'] = {'kind': '', 'institution': '', 'facility': ''}
kind = getattr(root.find('apparatus/kind'), 'text', False)
# Test for missing attribute or empty string
if not kind:
raise MissingElementError('apparatus/kind')
elif kind in ['shock tube', 'rapid compression machine']:
properties['apparatus']['kind'] = kind
else:
raise NotImplementedError(kind + ' experiment not (yet) supported')
properties['apparatus']['kind'] = kind

mode = getattr(root.find('apparatus/mode'), 'text', None)
if mode:
properties['apparatus']['mode'] = mode

return properties
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_experiment_kind() behavior changes (new supported experiment types, relaxed apparatus kind validation, optional apparatus.mode) will require updating pyked/tests/test_converters.py expectations and adding coverage for at least one non-ignition experiment type + apparatus/mode parsing. Without corresponding test updates, the existing converter test suite will fail or no longer verify the intended behavior for these new branches.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 19 changed files in this pull request and generated 13 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pyked/converters.py
orcid = author.get('ORCID')
if orcid:
auth['ORCID'] = orcid.lstrip('http://orcid.org/')
auth['ORCID'] = orcid.removeprefix('https://orcid.org/').removeprefix('http://orcid.org/')
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str.removeprefix() requires Python 3.9+, but setup.py declares python_requires='>=3.7'. This will raise AttributeError on Python 3.7/3.8. Either avoid removeprefix (e.g., manual prefix stripping) or bump the minimum supported Python version accordingly.

Suggested change
auth['ORCID'] = orcid.removeprefix('https://orcid.org/').removeprefix('http://orcid.org/')
if orcid.startswith('https://orcid.org/'):
auth['ORCID'] = orcid[len('https://orcid.org/'):]
elif orcid.startswith('http://orcid.org/'):
auth['ORCID'] = orcid[len('http://orcid.org/'):]
else:
auth['ORCID'] = orcid

Copilot uses AI. Check for mistakes.
Comment thread pyked/chemked.py
Comment on lines +190 to +194
# Use UnvalidatedSchema to bypass cerberus 1.3's schema-of-schema
# validation, which fails because its internal SchemaValidator doesn't
# inherit OurValidator's custom _validate_isvalid_* rules.
validator = OurValidator()
validator._schema = UnvalidatedSchema(schema)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ChemKED.validate_yaml() sets validator._schema directly. This bypasses Cerberus’ public schema initialization (and any internal cache resets done by the schema property / constructor), making validation behavior version-dependent and potentially incorrect. Prefer OurValidator(schema) / OurValidator(schema=schema) (your OurValidator.__init__ already wraps dicts in UnvalidatedSchema) or assign via the public validator.schema = UnvalidatedSchema(schema) API if needed.

Copilot uses AI. Check for mistakes.
Comment thread pyked/chemked.py
Comment on lines 5 to 21
@@ -15,6 +17,7 @@
# Local imports
from .validation import schema, OurValidator, yaml, Q_
from .converters import datagroup_properties, ReSpecTh_to_ChemKED
from pint import DimensionalityError

Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These imports appear unused: InvalidOperation and DimensionalityError. Leaving unused imports can break strict linting and makes it harder to track dependencies; please remove them (or add the missing usage if intended).

Copilot uses AI. Check for mistakes.
Comment on lines 8 to 24
- type: dict
schema:
uncertainty-type:
required: true
type: string
allowed:
- absolute
- relative
uncertainty:
required: true
anyof_type:
- string
- float
excludes:
- upper-uncertainty
- lower-uncertainty
dependencies:
- uncertainty-type
upper-uncertainty:
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value-with-uncertainty currently allows a second list item of {} (since none of the dict keys are required). That can pass Cerberus validation (your _validate_isvalid_uncertainty doesn’t error when uncertainty-type is missing), but DataPoint.process_quantity() will later raise ValueError('uncertainty-type must be ...'). Consider enforcing that the metadata dict contains at least one supported key (e.g., require uncertainty-type when any uncertainty fields are present, or validate that the dict isn’t empty unless it contains only evaluated-standard-deviation fields).

Copilot uses AI. Check for mistakes.
Comment on lines 85 to +118
amount:
required: true
type: list
oneof:
anyof:
- items:
- type: float
- items:
- type: float
- type: dict
schema:
uncertainty-type:
required: true
type: string
allowed:
- absolute
- relative
uncertainty:
required: true
type: float
excludes:
- upper-uncertainty
- lower-uncertainty
dependencies:
- uncertainty-type
upper-uncertainty:
required: true
type: float
excludes: uncertainty
dependencies: lower-uncertainty
dependencies:
- lower-uncertainty
- uncertainty-type
lower-uncertainty:
required: true
type: float
excludes: uncertainty
dependencies: upper-uncertainty
dependencies:
- upper-uncertainty
- uncertainty-type
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to value_unit_schema.yaml, the composition amount uncertainty dict has no required keys, so [value, {}] can validate but will fail later in DataPoint.process_quantity() due to missing uncertainty-type. Please ensure the dict is either (a) a valid uncertainty block with uncertainty-type + values, or (b) a valid evaluated-standard-deviation-only metadata block, and reject empty dicts.

Copilot uses AI. Check for mistakes.
Comment thread pyked/batch_convert.py
Comment on lines +338 to +343
def parse_reference(root, xml_filename):
ref = {}
bib = root.find('bibliographyLink')
if bib is None:
ref['detail'] = f'Converted from ReSpecTh XML file {xml_filename}'
return ref
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When <bibliographyLink> is missing, this returns a reference dict with only detail. The ChemKED schema requires reference.authors (min length 1) and reference.year, so the converter can emit YAML that will never validate. Prefer failing fast here (raise) or populating schema-required fallback fields in a way that still produces valid ChemKED.

Copilot uses AI. Check for mistakes.
Comment thread pyked/batch_convert.py
Comment on lines +397 to +405
apparatus = {'kind': '', 'institution': '', 'facility': ''}
kind_el = root.find('apparatus/kind')
if kind_el is not None and kind_el.text:
apparatus['kind'] = kind_el.text.strip()
modes = root.findall('apparatus/mode')
if modes and modes[0].text:
apparatus['mode'] = modes[0].text.strip()

return exp_type, apparatus
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_experiment_kind() leaves apparatus['kind'] as an empty string when <apparatus><kind> is missing. Since the ChemKED schema requires apparatus.kind and restricts it to an enum, this will produce invalid YAML. Consider raising an error (consistent with converters.get_experiment_kind) instead of emitting an empty kind.

Copilot uses AI. Check for mistakes.
Comment thread pyked/batch_convert.py
Comment on lines +1359 to +1376
props['file-type'] = 'experiment'

exp_type, apparatus = parse_experiment_kind(root)
props['experiment-type'] = exp_type
props['apparatus'] = apparatus

# Method and comments
method = (root.findtext('method') or '').strip()
if method:
props['method'] = method

comments = []
for c_el in root.findall('comment'):
if c_el.text and c_el.text.strip():
comments.append(c_el.text.strip())
if comments:
props['comments'] = comments

Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds file-type and comments fields, but schemas/chemked_schema.yaml does not define these keys (Cerberus will flag them as unknown). If batch_convert.py is intended to produce valid ChemKED YAML consumable by ChemKED, either extend the schema to allow these fields or omit/rename them to schema-supported keys.

Suggested change
props['file-type'] = 'experiment'
exp_type, apparatus = parse_experiment_kind(root)
props['experiment-type'] = exp_type
props['apparatus'] = apparatus
# Method and comments
method = (root.findtext('method') or '').strip()
if method:
props['method'] = method
comments = []
for c_el in root.findall('comment'):
if c_el.text and c_el.text.strip():
comments.append(c_el.text.strip())
if comments:
props['comments'] = comments
exp_type, apparatus = parse_experiment_kind(root)
props['experiment-type'] = exp_type
props['apparatus'] = apparatus
# Method
method = (root.findtext('method') or '').strip()
if method:
props['method'] = method

Copilot uses AI. Check for mistakes.
Comment thread pyked/batch_convert.py Outdated
Comment on lines +1526 to +1532
props['file-type'] = 'kdetermination'
props['experiment-type'] = 'rate coefficient'

# Parse reactions
reactions = parse_reactions(root)
if reactions:
props['reactions'] = reactions
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For kdetermination, this emits file-type and reactions (plural) keys that are not present in the ChemKED schema (chemked_schema.yaml only defines optional reaction (singular), method, and bulk-gas). As written, output from this converter will fail schema validation. Align the output keys with the schema (or update the schema to include file-type and reactions).

Suggested change
props['file-type'] = 'kdetermination'
props['experiment-type'] = 'rate coefficient'
# Parse reactions
reactions = parse_reactions(root)
if reactions:
props['reactions'] = reactions
props['experiment-type'] = 'rate coefficient'
# Parse reactions
reactions = parse_reactions(root)
if reactions:
props['reaction'] = reactions

Copilot uses AI. Check for mistakes.
Comment thread pyked/batch_convert.py
Comment on lines +1626 to +1676
xml_filename = original_filename or os.path.basename(xml_path)

props = parse_file_metadata(root)
props['reference'] = parse_reference(root, xml_filename)
props['file-type'] = 'tdetermination'
props['experiment-type'] = 'thermochemical'

# Parse reactions (tdetermination may have species/reaction info)
reactions = parse_reactions(root)
if reactions:
props['reactions'] = reactions

method = (root.findtext('method') or '').strip()
if method:
props['method'] = method

comments = []
for c_el in root.findall('comment'):
if c_el.text and c_el.text.strip():
comments.append(c_el.text.strip())
if comments:
props['comments'] = comments

common = parse_common_properties(root, 'thermochemical')
props['common-properties'] = common

all_dgs = root.findall('dataGroup')
if not all_dgs:
raise ValueError('No dataGroup found')

dg = all_dgs[0]
dg_defs = parse_datagroup_props(dg)

props['datapoints'] = parse_tdet_datapoints(dg, dg_defs, common)

if not props.get('datapoints'):
raise ValueError('No datapoints parsed')

for dp in props['datapoints']:
for key, val in common.items():
if key not in dp:
dp[key] = val

common.pop('uncertainty', None)
common.pop('evaluated-standard-deviation', None)
common.pop('_pending_esd', None)
common.pop('_pending_unc', None)

return props


Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

experiment-type is set to thermochemical, but this value is not allowed by the ChemKED schema (and there is no thermochemical datapoint schema included). This means any YAML produced for <tdetermination> will fail ChemKED.validate_yaml(). Either add thermochemical support to the schema or remove/disable <tdetermination> conversion until the schema supports it.

Suggested change
xml_filename = original_filename or os.path.basename(xml_path)
props = parse_file_metadata(root)
props['reference'] = parse_reference(root, xml_filename)
props['file-type'] = 'tdetermination'
props['experiment-type'] = 'thermochemical'
# Parse reactions (tdetermination may have species/reaction info)
reactions = parse_reactions(root)
if reactions:
props['reactions'] = reactions
method = (root.findtext('method') or '').strip()
if method:
props['method'] = method
comments = []
for c_el in root.findall('comment'):
if c_el.text and c_el.text.strip():
comments.append(c_el.text.strip())
if comments:
props['comments'] = comments
common = parse_common_properties(root, 'thermochemical')
props['common-properties'] = common
all_dgs = root.findall('dataGroup')
if not all_dgs:
raise ValueError('No dataGroup found')
dg = all_dgs[0]
dg_defs = parse_datagroup_props(dg)
props['datapoints'] = parse_tdet_datapoints(dg, dg_defs, common)
if not props.get('datapoints'):
raise ValueError('No datapoints parsed')
for dp in props['datapoints']:
for key, val in common.items():
if key not in dp:
dp[key] = val
common.pop('uncertainty', None)
common.pop('evaluated-standard-deviation', None)
common.pop('_pending_esd', None)
common.pop('_pending_unc', None)
return props
raise NotImplementedError(
'<tdetermination> conversion is currently disabled because ChemKED does '
'not support the required experiment-type "thermochemical".'
)

Copilot uses AI. Check for mistakes.
@LekiaAnonim LekiaAnonim marked this pull request as draft April 29, 2026 13:55
@LekiaAnonim LekiaAnonim marked this pull request as ready for review April 29, 2026 13:56
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 21 changed files in this pull request and generated 9 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pyked/chemked.py
Comment on lines +790 to +793
# Condensed exponent notation: letter immediately followed by a negative
# integer (e.g. "s-1", "mol-1"). Only negative exponents are converted to
# avoid false positives on strings like "H2O".
_UNIT_EXP_RE = re.compile(r'([A-Za-z])(-\d+)')
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_UNIT_EXP_RE = re.compile(r'([A-Za-z])(-\d+)') only captures a single letter before the exponent. For units like mol-1 or molecule-1 this will rewrite incorrectly (e.g., molecul e**-1). Use a token-level pattern (e.g., r'([A-Za-z]+)(-\d+)' with word boundaries) so full unit symbols are converted safely.

Suggested change
# Condensed exponent notation: letter immediately followed by a negative
# integer (e.g. "s-1", "mol-1"). Only negative exponents are converted to
# avoid false positives on strings like "H2O".
_UNIT_EXP_RE = re.compile(r'([A-Za-z])(-\d+)')
# Condensed exponent notation: a unit token immediately followed by a
# negative integer (e.g. "s-1", "mol-1"). Only negative exponents are
# converted to avoid false positives on strings like "H2O".
_UNIT_EXP_RE = re.compile(r'\b([A-Za-z]+)(-\d+)\b')

Copilot uses AI. Check for mistakes.

with open(filename, 'r') as f:
return yaml.load(f)
properties = yaml.load(f)
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixture uses yaml.load(f) without specifying a Loader. With modern PyYAML versions this is deprecated/unsafe and can raise errors depending on configuration. Use yaml.safe_load(f) (or yaml.load(f, Loader=...)) so the tests are compatible with newer PyYAML and avoid unsafe parsing.

Suggested change
properties = yaml.load(f)
properties = yaml.safe_load(f)

Copilot uses AI. Check for mistakes.
Comment on lines +4 to +5
# Datapoints contain temperature (required) and rate-coefficient (required).
# Pressure and composition are optional (often absent for kdetermination data).
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The header comment says rate-coefficient is required for each datapoint, but the schema later defines rate-coefficient using *value-unit-optional. Please align the comment/docs with the schema (or make the field required if that’s the intent).

Suggested change
# Datapoints contain temperature (required) and rate-coefficient (required).
# Pressure and composition are optional (often absent for kdetermination data).
# Datapoints contain temperature (required); rate-coefficient, pressure,
# and composition are optional (often absent for kdetermination data).

Copilot uses AI. Check for mistakes.
Comment thread setup.py
Comment on lines 65 to 74
'Programming Language :: Python :: 3',
'Programming Language :: Python :: 3.5',
'Programming Language :: Python :: 3.6',
'Topic :: Scientific/Engineering :: Chemistry',
],
tests_require=tests_require,
extras_require=extras_require,
setup_requires=setup_requires,
python_requires='~=3.5',
python_requires='>=3.7',
entry_points={
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python_requires is now >=3.7, but this repo still declares Python 3.5/3.6 support in classifiers and the CI config (.travis.yml) runs 3.5/3.6. This mismatch will make installs/tests fail for those environments. Either update CI/classifiers (and any docs) to 3.7+ or relax python_requires to match the supported versions.

Copilot uses AI. Check for mistakes.
Comment thread pyked/chemked.py
"""
# Standard libraries
import re
from decimal import Decimal, InvalidOperation
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InvalidOperation is imported but not used anywhere in this file. Please remove it (or use it if it was intended for Decimal parsing error handling) to avoid lint/test failures and reduce clutter.

Suggested change
from decimal import Decimal, InvalidOperation
from decimal import Decimal

Copilot uses AI. Check for mistakes.
Comment thread docs/schema-docs.rst

- ``kind``: string, required
Must be one of ``shock tube`` or ``rapid compression machine``. Values are case-sensitive.
Must be one of ``shock tube``, ``rapid compression machine``, ``stirred reactor``, ``jet stirred reactor``, ``flow reactor``, ``flame``, ``outwardly propagating spherical flame``, ``heat flux burner``, or ``flame cone method``. Values are case-sensitive.
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documented allowed values for apparatus.kind are incomplete compared to the schema (e.g., the schema also allows stirred reactor (quartz), stirred reactor (fused silica), flow reactor (quartz), flow reactor (alumina), and flow reactor (recrystallized alumina)). Please update the docs list to match the schema exactly so users don’t create files that validate but appear undocumented (or vice versa).

Suggested change
Must be one of ``shock tube``, ``rapid compression machine``, ``stirred reactor``, ``jet stirred reactor``, ``flow reactor``, ``flame``, ``outwardly propagating spherical flame``, ``heat flux burner``, or ``flame cone method``. Values are case-sensitive.
Must be one of ``shock tube``, ``rapid compression machine``, ``stirred reactor``, ``stirred reactor (quartz)``, ``stirred reactor (fused silica)``, ``jet stirred reactor``, ``flow reactor``, ``flow reactor (quartz)``, ``flow reactor (alumina)``, ``flow reactor (recrystallized alumina)``, ``flame``, ``outwardly propagating spherical flame``, ``heat flux burner``, or ``flame cone method``. Values are case-sensitive.

Copilot uses AI. Check for mistakes.
Comment thread pyked/batch_convert.py
Comment on lines +1560 to +1564
def convert_file(xml_path, original_filename=None):
"""Convert a single ReSpecTh XML file → ChemKED property dict (or None).

Supports <experiment>, <kdetermination>, and <tdetermination> root elements.

Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new module adds substantial conversion logic (multiple experiment types, DOI enrichment, unit normalization, uncertainty handling) but there are no tests for it. Please add unit tests (e.g., decode_latex, parse_author_string, experiment-type mapping) and at least one small end-to-end XML→YAML fixture to prevent regressions.

Copilot uses AI. Check for mistakes.
Comment thread pyked/converters.py
Comment on lines +196 to +199
mode = getattr(root.find('apparatus/mode'), 'text', None)
if mode:
modes = root.findall('apparatus/mode')
properties['apparatus']['mode'] = [m.text.strip() for m in modes if m.text]
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apparatus/mode parsing is gated on mode = root.find('apparatus/mode').text being truthy. If the first <mode> element exists but is empty/whitespace (or missing text) while later <mode> elements are populated, modes will be silently ignored. Prefer checking modes = root.findall('apparatus/mode') and building the list from that directly.

Suggested change
mode = getattr(root.find('apparatus/mode'), 'text', None)
if mode:
modes = root.findall('apparatus/mode')
properties['apparatus']['mode'] = [m.text.strip() for m in modes if m.text]
modes = root.findall('apparatus/mode')
parsed_modes = [m.text.strip() for m in modes if m.text and m.text.strip()]
if parsed_modes:
properties['apparatus']['mode'] = parsed_modes

Copilot uses AI. Check for mistakes.
Comment thread pyked/chemked.py
# Local imports
from .validation import schema, OurValidator, yaml, Q_
from .converters import datagroup_properties, ReSpecTh_to_ChemKED
from pint import DimensionalityError
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DimensionalityError is imported but not used in this file. Please remove the import to keep dependencies clear and avoid linting issues.

Suggested change
from pint import DimensionalityError

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants