Added new chemked schema for new experiment types by LekiaAnonim · Pull Request #132 · pr-omethe-us/PyKED

LekiaAnonim · 2026-03-28T01:31:42Z

Fixes #
Tests added
Added entry into CHANGELOG.md
Documentation updated
Extended ChemKED schema to support 5 new experiment types beyond ignition delay: laminar burning velocity measurement, concentration time profile measurement, jet stirred reactor measurement, outlet concentration measurement, and burner stabilized flame speciation measurement
Added 5 new schema YAML files under pyked/schemas/ with datapoint definitions for each new experiment type
Updated chemked_schema.yaml with new !include directives, expanded experiment-type enum (6 types), expanded apparatus.kind (14 types), added apparatus.mode (14 allowed values), and expanded common-properties (temperature, residence-time, reactor-volume, flow-rate, equivalence-ratio)
Updated validation.py to handle new template keys in schema processing and added property_units entries for laminar-burning-velocity, distance, flow-rate, residence-time, reactor-volume, and volumetric-flow-in-reference-state
Updated chemked.py DataPoint class to support optional composition, measured-composition, concentration-profiles, and time-shift fields
Updated converters.py with support for all 6 experiment types in get_experiment_kind() (case-insensitive matching), apparatus.mode parsing, and conditional ignition-type/pressure-rise validation only for ignition delay experiments
Added batch_convert.py — a standalone batch converter for ReSpecTh v2.3/v2.4 XML files to ChemKED YAML, handling structured <details> bibliography references, LaTeX-to-Unicode author name decoding, and concentration property name support

@pr-omethe-us/chemked

Copilot

Pull request overview

This PR extends the ChemKED schema and conversion tooling to represent five additional experiment types beyond ignition delay, including new datapoint schemas and updated parsing/validation support.

Changes:

Added 5 new experiment-type datapoint schema YAML files and wired them into chemked_schema.yaml.
Updated validation/unit handling and DataPoint parsing to accommodate new optional fields and properties.
Extended ReSpecTh conversion logic and added a standalone batch converter for ReSpecTh XML → ChemKED YAML.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 12 comments.

Show a summary per file

File	Description
`pyked/validation.py`	Updates schema preprocessing and adds unit mappings for new properties.
`pyked/schemas/chemked_schema.yaml`	Includes new schemas; expands enums for experiment/apparatus; adds common properties.
`pyked/schemas/laminar_burning_velocity_measurement_schema.yaml`	New datapoint schema for laminar burning velocity measurements.
`pyked/schemas/concentration_time_profile_measurement_schema.yaml`	New datapoint schema for concentration time profiles (adds `time-shift`, `concentration-profiles`).
`pyked/schemas/jet_stirred_reactor_measurement_schema.yaml`	New datapoint schema for JSR measurements.
`pyked/schemas/outlet_concentration_measurement_schema.yaml`	New datapoint schema for outlet concentration measurements.
`pyked/schemas/burner_stabilized_flame_speciation_measurement_schema.yaml`	New datapoint schema for burner-stabilized flame speciation measurements.
`pyked/converters.py`	Broadens experiment/apparatus parsing and relaxes some ignition-delay-only checks.
`pyked/chemked.py`	Extends `DataPoint` to accept optional composition and new measured/profile fields.
`pyked/batch_convert.py`	Adds a new batch conversion CLI/script for ReSpecTh v2.3/v2.4 → ChemKED.

Comments suppressed due to low confidence (1)

pyked/chemked.py:706

DataPoint now supports new optional fields (measured-composition, concentration-profiles, time-shift) and allows composition to be absent, but there are no corresponding updates in the existing test_chemked.py coverage (no occurrences of these keys). Adding unit tests that load minimal YAML dicts for each new field would help ensure parsing/attribute behavior stays stable and catches schema/processing regressions.

        if 'composition' in properties:
            self.composition_type = properties['composition']['kind']
            composition = {}
            for species in properties['composition']['species']:
                species_name = species['species-name']
                amount = self.process_quantity(species['amount'])
                InChI = species.get('InChI')
                SMILES = species.get('SMILES')
                atomic_composition = species.get('atomic-composition')
                composition[species_name] = Composition(
                    species_name=species_name, InChI=InChI, SMILES=SMILES,
                    atomic_composition=atomic_composition, amount=amount)
            setattr(self, 'composition', composition)
        else:
            self.composition_type = None
            self.composition = {}

        # Measured composition (for JSR, OCM, BSFSM experiment types)
        if 'measured-composition' in properties:
            self.measured_composition_type = properties['measured-composition']['kind']
            measured = {}
            for species in properties['measured-composition']['species']:
                species_name = species['species-name']
                amount = self.process_quantity(species['amount'])
                InChI = species.get('InChI')
                SMILES = species.get('SMILES')
                atomic_composition = species.get('atomic-composition')
                measured[species_name] = Composition(
                    species_name=species_name, InChI=InChI, SMILES=SMILES,
                    atomic_composition=atomic_composition, amount=amount)
            self.measured_composition = measured
        else:
            self.measured_composition_type = None
            self.measured_composition = {}

        # Concentration profiles (for concentration time profile measurement)
        self.concentration_profiles = []
        if 'concentration-profiles' in properties:
            for profile in properties['concentration-profiles']:
                self.concentration_profiles.append(profile)

        # Time shift (for concentration time profile measurement)
        self.time_shift = properties.get('time-shift')

        self.equivalence_ratio = properties.get('equivalence-ratio')
        self.ignition_type = deepcopy(properties.get('ignition-type'))

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-28T01:42:02Z

+    python convert_respecth_to_chemked.py
+    python convert_respecth_to_chemked.py -i ReSpecTh/indirect -o ChemKED-database
+    python convert_respecth_to_chemked.py --file ReSpecTh/indirect/ammonia/.../x20100057.xml
+    python convert_respecth_to_chemked.py --dry-run


The module docstring “Usage” examples reference convert_respecth_to_chemked.py, but the file introduced in this PR is batch_convert.py. This is likely to confuse users and makes copy/paste usage incorrect; update the examples to match the actual filename (or document running it as a module/console script if that’s the intent).

Suggested change

python convert_respecth_to_chemked.py

python convert_respecth_to_chemked.py -i ReSpecTh/indirect -o ChemKED-database

python convert_respecth_to_chemked.py --file ReSpecTh/indirect/ammonia/.../x20100057.xml

python convert_respecth_to_chemked.py --dry-run

python batch_convert.py

python batch_convert.py -i ReSpecTh/indirect -o ChemKED-database

python batch_convert.py --file ReSpecTh/indirect/ammonia/.../x20100057.xml

python batch_convert.py --dry-run

Copilot · 2026-03-28T01:42:02Z

+import sys
+import xml.etree.ElementTree as ET
+from pathlib import Path
+import yaml
+import argparse
+import logging
+import traceback


sys and traceback are imported but unused. Removing unused imports keeps the script clean and avoids failing stricter linters.

Suggested change

import sys

import xml.etree.ElementTree as ET

from pathlib import Path

import yaml

import argparse

import logging

import traceback

import xml.etree.ElementTree as ET

from pathlib import Path

import yaml

import argparse

import logging

Copilot · 2026-03-28T01:42:03Z

+
+logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')
+log = logging.getLogger(__name__)
+
+CHEMKED_VERSION = '0.4.1'
+
+


CHEMKED_VERSION is hard-coded to 0.4.1. This will drift as the ChemKED schema evolves; consider deriving this from the packaged schema (e.g., the latest entry in schema['chemked-version']['allowed'] or a single shared constant) so batch conversion keeps emitting files that validate against the current schema without manual updates.

Suggested change

logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')

log = logging.getLogger(__name__)

CHEMKED_VERSION = '0.4.1'

import importlib

logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')

log = logging.getLogger(__name__)

def get_chemked_version():

"""Return the ChemKED schema version to use for generated files.

Prefer the version from the packaged schema (if available), falling

back to a sensible default if the schema cannot be loaded.

"""

default_version = '0.4.1'

try:

schema_module = importlib.import_module('pyked.schema')

except ImportError:

return default_version

schema = getattr(schema_module, 'schema', None)

if not isinstance(schema, dict):

return default_version

chemked_version_info = schema.get('chemked-version', {})

allowed = chemked_version_info.get('allowed')

if isinstance(allowed, (list, tuple)) and allowed:

# Use the last allowed version, assuming it is the latest.

return str(allowed[-1])

return default_version

CHEMKED_VERSION = get_chemked_version()

Copilot · 2026-03-28T01:42:03Z

    properties['apparatus'] = {'kind': '', 'institution': '', 'facility': ''}
    kind = getattr(root.find('apparatus/kind'), 'text', False)
    # Test for missing attribute or empty string
    if not kind:
        raise MissingElementError('apparatus/kind')
-    elif kind in ['shock tube', 'rapid compression machine']:
-        properties['apparatus']['kind'] = kind
    else:
-        raise NotImplementedError(kind + ' experiment not (yet) supported')
+        properties['apparatus']['kind'] = kind
+
+    mode = getattr(root.find('apparatus/mode'), 'text', None)
+    if mode:
+        properties['apparatus']['mode'] = mode


get_experiment_kind() now accepts any apparatus/kind (and apparatus/mode) text and passes it through unchanged. Since the ChemKED schema constrains apparatus.kind/mode to an allowed set, this can cause ReSpecTh_to_ChemKED(..., validate=True) or later ChemKED validation to fail with an unhelpful error for unexpected but mappable ReSpecTh values. Consider normalizing/mapping known ReSpecTh variants to the schema’s allowed strings and raising a targeted NotImplementedError (or KeywordError) when the kind/mode can’t be mapped.

Copilot · 2026-03-28T01:42:03Z

        'ignition-delay', 'first-stage-ignition-delay', 'temperature', 'pressure',
-        'pressure-rise',
+        'pressure-rise', 'laminar-burning-velocity', 'distance', 'flow-rate',
+        'residence-time', 'volumetric-flow-in-reference-state',


reactor-volume was added to the schema/common-properties and validation.property_units, but DataPoint.value_unit_props doesn’t include it. As a result, any reactor-volume values present in YAML will be ignored when constructing DataPoint objects (information loss), and downstream code can’t access it as a parsed pint.Quantity. Add reactor-volume to value_unit_props (and ensure it’s handled consistently with the other value+unit properties).

Suggested change

'residence-time', 'volumetric-flow-in-reference-state',

'residence-time', 'volumetric-flow-in-reference-state', 'reactor-volume',

Copilot · 2026-03-28T01:42:04Z

+        - extrapolation method to zero stretch : LS
+        - extrapolation method to zero stretch : NQ


The apparatus.mode.allowed list contains entries with : (e.g., extrapolation method to zero stretch : LS). In YAML, an unquoted : followed by a space is parsed as a mapping separator, so these list items will not be treated as plain strings and can break schema loading/validation. Quote these values (or remove the colon/space pattern) so they remain string literals in the allowed list.

Suggested change

- extrapolation method to zero stretch : LS

- extrapolation method to zero stretch : NQ

- "extrapolation method to zero stretch : LS"

- "extrapolation method to zero stretch : NQ"

Copilot · 2026-03-28T01:42:04Z

-                        'pressure rise',
+                        'pressure rise', 'laminar burning velocity',
+                        'distance', 'flow rate', 'residence time',
+                        'volumetric flow in reference state',


datagroup_properties includes volumetric flow in reference state, but elsewhere in this PR (e.g., batch_convert.py) the property name is volumetric flow rate in reference state. With the current list, a ReSpecTh file using the ...flow rate... spelling will be rejected as an invalid property in both get_common_properties() and get_datapoints(). Consider accepting both spellings (or standardizing on the ReSpecTh-canonical name) and mapping both to the ChemKED key volumetric-flow-in-reference-state.

Suggested change

'volumetric flow in reference state',

'volumetric flow in reference state',

'volumetric flow rate in reference state',

Copilot · 2026-03-28T01:42:04Z

+    exp_type_text = getattr(root.find('experimentType'), 'text', '')
+    exp_type_map = {
+        'Ignition delay measurement': 'ignition delay',
+        'Laminar burning velocity measurement': 'laminar burning velocity measurement',
+        'Concentration time profile measurement': 'concentration time profile measurement',
+        'Jet stirred reactor measurement': 'jet stirred reactor measurement',
+        'Outlet concentration measurement': 'outlet concentration measurement',
+        'Burner stabilized flame speciation measurement': 'burner stabilized flame speciation measurement',
+    }
+    matched_type = exp_type_map.get(exp_type_text)
+    if matched_type is None:
+        # Try case-insensitive match
+        for key, val in exp_type_map.items():
+            if key.lower() == exp_type_text.lower():
+                matched_type = val
+                break
+    if matched_type is None:
+        raise NotImplementedError(exp_type_text + ' not (yet) supported')
+    properties['experiment-type'] = matched_type


exp_type_text is taken directly from root.find('experimentType').text without stripping whitespace. If the XML contains leading/trailing whitespace/newlines (common in pretty-printed files), the lookup and the case-insensitive fallback will fail and raise NotImplementedError for an otherwise supported experiment type. Using root.findtext('experimentType', '').strip() (and raising MissingElementError('experimentType') when empty) would make this parsing robust and align error handling with other required elements like apparatus/kind.

Copilot · 2026-03-28T01:42:04Z

    # Ensure ignition delay, and get which kind of experiment
    properties.update(get_experiment_kind(root))

    # Get properties shared across the file
    properties['common-properties'] = get_common_properties(root)

-    # Determine definition of ignition delay
-    properties['common-properties']['ignition-type'] = get_ignition_type(root)
+    # Determine definition of ignition delay (only for ignition delay experiments)
+    if properties['experiment-type'] == 'ignition delay':
+        properties['common-properties']['ignition-type'] = get_ignition_type(root)

-    # Now parse ignition delay datapoints
+    # Now parse datapoints
    properties['datapoints'] = get_datapoints(root)



ReSpecTh_to_ChemKED() now allows non-ignition experiment types via get_experiment_kind(), but it still unconditionally calls get_datapoints(), whose docstring/logic is ignition-delay-specific (e.g., it only supports the ignition-delay dataGroup structure and will reject other required columns like time for concentration time profiles). This means supported experiment types can fail later with unrelated KeyError/KeywordError instead of a clear “not implemented” message, and it doesn’t produce datapoints matching the new non-IDT schemas. Either keep rejecting non-ignition types in this converter until per-type datapoint parsing is implemented, or add experiment-type-specific datapoint parsing that matches the new schemas.

Copilot · 2026-03-28T01:42:05Z

+
+    exp_type_text = getattr(root.find('experimentType'), 'text', '')
+    exp_type_map = {
+        'Ignition delay measurement': 'ignition delay',
+        'Laminar burning velocity measurement': 'laminar burning velocity measurement',
+        'Concentration time profile measurement': 'concentration time profile measurement',
+        'Jet stirred reactor measurement': 'jet stirred reactor measurement',
+        'Outlet concentration measurement': 'outlet concentration measurement',
+        'Burner stabilized flame speciation measurement': 'burner stabilized flame speciation measurement',
+    }
+    matched_type = exp_type_map.get(exp_type_text)
+    if matched_type is None:
+        # Try case-insensitive match
+        for key, val in exp_type_map.items():
+            if key.lower() == exp_type_text.lower():
+                matched_type = val
+                break
+    if matched_type is None:
+        raise NotImplementedError(exp_type_text + ' not (yet) supported')
+    properties['experiment-type'] = matched_type

    properties['apparatus'] = {'kind': '', 'institution': '', 'facility': ''}
    kind = getattr(root.find('apparatus/kind'), 'text', False)
    # Test for missing attribute or empty string
    if not kind:
        raise MissingElementError('apparatus/kind')
-    elif kind in ['shock tube', 'rapid compression machine']:
-        properties['apparatus']['kind'] = kind
    else:
-        raise NotImplementedError(kind + ' experiment not (yet) supported')
+        properties['apparatus']['kind'] = kind
+
+    mode = getattr(root.find('apparatus/mode'), 'text', None)
+    if mode:
+        properties['apparatus']['mode'] = mode

    return properties


get_experiment_kind() behavior changes (new supported experiment types, relaxed apparatus kind validation, optional apparatus.mode) will require updating pyked/tests/test_converters.py expectations and adding coverage for at least one non-ignition experiment type + apparatus/mode parsing. Without corresponding test updates, the existing converter test suite will fail or no longer verify the intended behavior for these new branches.

…roperties

…oring

…pace indent

…ssues

…nd converters

…ma to newer ignition delay targets. Normalise inverse units (ms-1 → 1/ms) for pint compatibility

Copilot

Pull request overview

Copilot reviewed 18 out of 19 changed files in this pull request and generated 13 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-02T21:26:08Z

                orcid = author.get('ORCID')
                if orcid:
-                    auth['ORCID'] = orcid.lstrip('http://orcid.org/')
+                    auth['ORCID'] = orcid.removeprefix('https://orcid.org/').removeprefix('http://orcid.org/')


str.removeprefix() requires Python 3.9+, but setup.py declares python_requires='>=3.7'. This will raise AttributeError on Python 3.7/3.8. Either avoid removeprefix (e.g., manual prefix stripping) or bump the minimum supported Python version accordingly.

Suggested change

auth['ORCID'] = orcid.removeprefix('https://orcid.org/').removeprefix('http://orcid.org/')

if orcid.startswith('https://orcid.org/'):

auth['ORCID'] = orcid[len('https://orcid.org/'):]

elif orcid.startswith('http://orcid.org/'):

auth['ORCID'] = orcid[len('http://orcid.org/'):]

else:

auth['ORCID'] = orcid

Copilot · 2026-04-02T21:26:09Z

+        # Use UnvalidatedSchema to bypass cerberus 1.3's schema-of-schema
+        # validation, which fails because its internal SchemaValidator doesn't
+        # inherit OurValidator's custom _validate_isvalid_* rules.
+        validator = OurValidator()
+        validator._schema = UnvalidatedSchema(schema)


ChemKED.validate_yaml() sets validator._schema directly. This bypasses Cerberus’ public schema initialization (and any internal cache resets done by the schema property / constructor), making validation behavior version-dependent and potentially incorrect. Prefer OurValidator(schema) / OurValidator(schema=schema) (your OurValidator.__init__ already wraps dicts in UnvalidatedSchema) or assign via the public validator.schema = UnvalidatedSchema(schema) API if needed.

Copilot · 2026-04-02T21:26:09Z

@@ -15,6 +17,7 @@
 # Local imports
 from .validation import schema, OurValidator, yaml, Q_
 from .converters import datagroup_properties, ReSpecTh_to_ChemKED
+from pint import DimensionalityError



These imports appear unused: InvalidOperation and DimensionalityError. Leaving unused imports can break strict linting and makes it harder to track dependencies; please remove them (or add the missing usage if intended).

Copilot · 2026-04-02T21:26:09Z

    - type: dict
      schema:
        uncertainty-type:
-          required: true
          type: string
          allowed:
            - absolute
            - relative
        uncertainty:
-          required: true
          anyof_type:
            - string
            - float
          excludes:
            - upper-uncertainty
            - lower-uncertainty
+          dependencies:
+            - uncertainty-type
        upper-uncertainty:


value-with-uncertainty currently allows a second list item of {} (since none of the dict keys are required). That can pass Cerberus validation (your _validate_isvalid_uncertainty doesn’t error when uncertainty-type is missing), but DataPoint.process_quantity() will later raise ValueError('uncertainty-type must be ...'). Consider enforcing that the metadata dict contains at least one supported key (e.g., require uncertainty-type when any uncertainty fields are present, or validate that the dict isn’t empty unless it contains only evaluated-standard-deviation fields).

Copilot · 2026-04-02T21:26:10Z

          amount:
            required: true
            type: list
-            oneof:
+            anyof:
              - items:
                - type: float
              - items:
                - type: float
                - type: dict
                  schema:
                    uncertainty-type:
-                      required: true
                      type: string
                      allowed:
                        - absolute
                        - relative
                    uncertainty:
-                      required: true
                      type: float
                      excludes:
                        - upper-uncertainty
                        - lower-uncertainty
+                      dependencies:
+                        - uncertainty-type
                    upper-uncertainty:
-                      required: true
                      type: float
                      excludes: uncertainty
-                      dependencies: lower-uncertainty
+                      dependencies:
+                        - lower-uncertainty
+                        - uncertainty-type
                    lower-uncertainty:
-                      required: true
                      type: float
                      excludes: uncertainty
-                      dependencies: upper-uncertainty
+                      dependencies:
+                        - upper-uncertainty
+                        - uncertainty-type


Similar to value_unit_schema.yaml, the composition amount uncertainty dict has no required keys, so [value, {}] can validate but will fail later in DataPoint.process_quantity() due to missing uncertainty-type. Please ensure the dict is either (a) a valid uncertainty block with uncertainty-type + values, or (b) a valid evaluated-standard-deviation-only metadata block, and reject empty dicts.

Copilot · 2026-04-02T21:26:10Z

+def parse_reference(root, xml_filename):
+    ref = {}
+    bib = root.find('bibliographyLink')
+    if bib is None:
+        ref['detail'] = f'Converted from ReSpecTh XML file {xml_filename}'
+        return ref


When <bibliographyLink> is missing, this returns a reference dict with only detail. The ChemKED schema requires reference.authors (min length 1) and reference.year, so the converter can emit YAML that will never validate. Prefer failing fast here (raise) or populating schema-required fallback fields in a way that still produces valid ChemKED.

Copilot · 2026-04-02T21:26:11Z

+    apparatus = {'kind': '', 'institution': '', 'facility': ''}
+    kind_el = root.find('apparatus/kind')
+    if kind_el is not None and kind_el.text:
+        apparatus['kind'] = kind_el.text.strip()
+    modes = root.findall('apparatus/mode')
+    if modes and modes[0].text:
+        apparatus['mode'] = modes[0].text.strip()
+
+    return exp_type, apparatus


parse_experiment_kind() leaves apparatus['kind'] as an empty string when <apparatus><kind> is missing. Since the ChemKED schema requires apparatus.kind and restricts it to an enum, this will produce invalid YAML. Consider raising an error (consistent with converters.get_experiment_kind) instead of emitting an empty kind.

Copilot · 2026-04-02T21:26:11Z

+    props['file-type'] = 'experiment'
+
+    exp_type, apparatus = parse_experiment_kind(root)
+    props['experiment-type'] = exp_type
+    props['apparatus'] = apparatus
+
+    # Method and comments
+    method = (root.findtext('method') or '').strip()
+    if method:
+        props['method'] = method
+
+    comments = []
+    for c_el in root.findall('comment'):
+        if c_el.text and c_el.text.strip():
+            comments.append(c_el.text.strip())
+    if comments:
+        props['comments'] = comments
+


This adds file-type and comments fields, but schemas/chemked_schema.yaml does not define these keys (Cerberus will flag them as unknown). If batch_convert.py is intended to produce valid ChemKED YAML consumable by ChemKED, either extend the schema to allow these fields or omit/rename them to schema-supported keys.

Suggested change

props['file-type'] = 'experiment'

exp_type, apparatus = parse_experiment_kind(root)

props['experiment-type'] = exp_type

props['apparatus'] = apparatus

# Method and comments

method = (root.findtext('method') or '').strip()

if method:

props['method'] = method

comments = []

for c_el in root.findall('comment'):

if c_el.text and c_el.text.strip():

comments.append(c_el.text.strip())

if comments:

props['comments'] = comments

exp_type, apparatus = parse_experiment_kind(root)

props['experiment-type'] = exp_type

props['apparatus'] = apparatus

# Method

method = (root.findtext('method') or '').strip()

if method:

props['method'] = method

Copilot · 2026-04-02T21:26:11Z

+    props['file-type'] = 'kdetermination'
+    props['experiment-type'] = 'rate coefficient'
+
+    # Parse reactions
+    reactions = parse_reactions(root)
+    if reactions:
+        props['reactions'] = reactions


For kdetermination, this emits file-type and reactions (plural) keys that are not present in the ChemKED schema (chemked_schema.yaml only defines optional reaction (singular), method, and bulk-gas). As written, output from this converter will fail schema validation. Align the output keys with the schema (or update the schema to include file-type and reactions).

Suggested change

props['file-type'] = 'kdetermination'

props['experiment-type'] = 'rate coefficient'

# Parse reactions

reactions = parse_reactions(root)

if reactions:

props['reactions'] = reactions

props['experiment-type'] = 'rate coefficient'

# Parse reactions

reactions = parse_reactions(root)

if reactions:

props['reaction'] = reactions

Copilot · 2026-04-02T21:26:11Z

+    xml_filename = original_filename or os.path.basename(xml_path)
+
+    props = parse_file_metadata(root)
+    props['reference'] = parse_reference(root, xml_filename)
+    props['file-type'] = 'tdetermination'
+    props['experiment-type'] = 'thermochemical'
+
+    # Parse reactions (tdetermination may have species/reaction info)
+    reactions = parse_reactions(root)
+    if reactions:
+        props['reactions'] = reactions
+
+    method = (root.findtext('method') or '').strip()
+    if method:
+        props['method'] = method
+
+    comments = []
+    for c_el in root.findall('comment'):
+        if c_el.text and c_el.text.strip():
+            comments.append(c_el.text.strip())
+    if comments:
+        props['comments'] = comments
+
+    common = parse_common_properties(root, 'thermochemical')
+    props['common-properties'] = common
+
+    all_dgs = root.findall('dataGroup')
+    if not all_dgs:
+        raise ValueError('No dataGroup found')
+
+    dg = all_dgs[0]
+    dg_defs = parse_datagroup_props(dg)
+
+    props['datapoints'] = parse_tdet_datapoints(dg, dg_defs, common)
+
+    if not props.get('datapoints'):
+        raise ValueError('No datapoints parsed')
+
+    for dp in props['datapoints']:
+        for key, val in common.items():
+            if key not in dp:
+                dp[key] = val
+
+    common.pop('uncertainty', None)
+    common.pop('evaluated-standard-deviation', None)
+    common.pop('_pending_esd', None)
+    common.pop('_pending_unc', None)
+
+    return props
+
+


experiment-type is set to thermochemical, but this value is not allowed by the ChemKED schema (and there is no thermochemical datapoint schema included). This means any YAML produced for <tdetermination> will fail ChemKED.validate_yaml(). Either add thermochemical support to the schema or remove/disable <tdetermination> conversion until the schema supports it.

Suggested change

xml_filename = original_filename or os.path.basename(xml_path)

props = parse_file_metadata(root)

props['reference'] = parse_reference(root, xml_filename)

props['file-type'] = 'tdetermination'

props['experiment-type'] = 'thermochemical'

# Parse reactions (tdetermination may have species/reaction info)

reactions = parse_reactions(root)

if reactions:

props['reactions'] = reactions

method = (root.findtext('method') or '').strip()

if method:

props['method'] = method

comments = []

for c_el in root.findall('comment'):

if c_el.text and c_el.text.strip():

comments.append(c_el.text.strip())

if comments:

props['comments'] = comments

common = parse_common_properties(root, 'thermochemical')

props['common-properties'] = common

all_dgs = root.findall('dataGroup')

if not all_dgs:

raise ValueError('No dataGroup found')

dg = all_dgs[0]

dg_defs = parse_datagroup_props(dg)

props['datapoints'] = parse_tdet_datapoints(dg, dg_defs, common)

if not props.get('datapoints'):

raise ValueError('No datapoints parsed')

for dp in props['datapoints']:

for key, val in common.items():

if key not in dp:

dp[key] = val

common.pop('uncertainty', None)

common.pop('evaluated-standard-deviation', None)

common.pop('_pending_esd', None)

common.pop('_pending_unc', None)

return props

raise NotImplementedError(

'<tdetermination> conversion is currently disabled because ChemKED does '

'not support the required experiment-type "thermochemical".'

)

…verters.py

…on, CO, H2O, C2, comments

Copilot

Pull request overview

Copilot reviewed 20 out of 21 changed files in this pull request and generated 9 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-29T14:04:31Z

+    # Condensed exponent notation: letter immediately followed by a negative
+    # integer (e.g. "s-1", "mol-1").  Only negative exponents are converted to
+    # avoid false positives on strings like "H2O".
+    _UNIT_EXP_RE = re.compile(r'([A-Za-z])(-\d+)')


_UNIT_EXP_RE = re.compile(r'([A-Za-z])(-\d+)') only captures a single letter before the exponent. For units like mol-1 or molecule-1 this will rewrite incorrectly (e.g., molecul e**-1). Use a token-level pattern (e.g., r'([A-Za-z]+)(-\d+)' with word boundaries) so full unit symbols are converted safely.

Suggested change

# Condensed exponent notation: letter immediately followed by a negative

# integer (e.g. "s-1", "mol-1"). Only negative exponents are converted to

# avoid false positives on strings like "H2O".

_UNIT_EXP_RE = re.compile(r'([A-Za-z])(-\d+)')

# Condensed exponent notation: a unit token immediately followed by a

# negative integer (e.g. "s-1", "mol-1"). Only negative exponents are

# converted to avoid false positives on strings like "H2O".

_UNIT_EXP_RE = re.compile(r'\b([A-Za-z]+)(-\d+)\b')

Copilot · 2026-04-29T14:04:31Z


        with open(filename, 'r') as f:
-            return yaml.load(f)
+            properties = yaml.load(f)


This fixture uses yaml.load(f) without specifying a Loader. With modern PyYAML versions this is deprecated/unsafe and can raise errors depending on configuration. Use yaml.safe_load(f) (or yaml.load(f, Loader=...)) so the tests are compatible with newer PyYAML and avoid unsafe parsing.

Suggested change

properties = yaml.load(f)

properties = yaml.safe_load(f)

Copilot · 2026-04-29T14:04:32Z

+# Datapoints contain temperature (required) and rate-coefficient (required).
+# Pressure and composition are optional (often absent for kdetermination data).


The header comment says rate-coefficient is required for each datapoint, but the schema later defines rate-coefficient using *value-unit-optional. Please align the comment/docs with the schema (or make the field required if that’s the intent).

Suggested change

# Datapoints contain temperature (required) and rate-coefficient (required).

# Pressure and composition are optional (often absent for kdetermination data).

# Datapoints contain temperature (required); rate-coefficient, pressure,

# and composition are optional (often absent for kdetermination data).

Copilot · 2026-04-29T14:04:32Z

        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.5',
        'Programming Language :: Python :: 3.6',
        'Topic :: Scientific/Engineering :: Chemistry',
    ],
    tests_require=tests_require,
    extras_require=extras_require,
    setup_requires=setup_requires,
-    python_requires='~=3.5',
+    python_requires='>=3.7',
    entry_points={


python_requires is now >=3.7, but this repo still declares Python 3.5/3.6 support in classifiers and the CI config (.travis.yml) runs 3.5/3.6. This mismatch will make installs/tests fail for those environments. Either update CI/classifiers (and any docs) to 3.7+ or relax python_requires to match the supported versions.

Copilot · 2026-04-29T14:04:32Z

 """
 # Standard libraries
+import re
+from decimal import Decimal, InvalidOperation


InvalidOperation is imported but not used anywhere in this file. Please remove it (or use it if it was intended for Decimal parsing error handling) to avoid lint/test failures and reduce clutter.

Suggested change

from decimal import Decimal, InvalidOperation

from decimal import Decimal

Copilot · 2026-04-29T14:04:33Z


    - ``kind``: string, required
-        Must be one of ``shock tube`` or ``rapid compression machine``. Values are case-sensitive.
+        Must be one of ``shock tube``, ``rapid compression machine``, ``stirred reactor``, ``jet stirred reactor``, ``flow reactor``, ``flame``, ``outwardly propagating spherical flame``, ``heat flux burner``, or ``flame cone method``. Values are case-sensitive.


The documented allowed values for apparatus.kind are incomplete compared to the schema (e.g., the schema also allows stirred reactor (quartz), stirred reactor (fused silica), flow reactor (quartz), flow reactor (alumina), and flow reactor (recrystallized alumina)). Please update the docs list to match the schema exactly so users don’t create files that validate but appear undocumented (or vice versa).

Suggested change

Must be one of ``shock tube``, ``rapid compression machine``, ``stirred reactor``, ``jet stirred reactor``, ``flow reactor``, ``flame``, ``outwardly propagating spherical flame``, ``heat flux burner``, or ``flame cone method``. Values are case-sensitive.

Must be one of ``shock tube``, ``rapid compression machine``, ``stirred reactor``, ``stirred reactor (quartz)``, ``stirred reactor (fused silica)``, ``jet stirred reactor``, ``flow reactor``, ``flow reactor (quartz)``, ``flow reactor (alumina)``, ``flow reactor (recrystallized alumina)``, ``flame``, ``outwardly propagating spherical flame``, ``heat flux burner``, or ``flame cone method``. Values are case-sensitive.

Copilot · 2026-04-29T14:04:33Z

+def convert_file(xml_path, original_filename=None):
+    """Convert a single ReSpecTh XML file → ChemKED property dict (or None).
+
+    Supports <experiment>, <kdetermination>, and <tdetermination> root elements.
+


This new module adds substantial conversion logic (multiple experiment types, DOI enrichment, unit normalization, uncertainty handling) but there are no tests for it. Please add unit tests (e.g., decode_latex, parse_author_string, experiment-type mapping) and at least one small end-to-end XML→YAML fixture to prevent regressions.

Copilot · 2026-04-29T14:04:33Z

+    mode = getattr(root.find('apparatus/mode'), 'text', None)
+    if mode:
+        modes = root.findall('apparatus/mode')
+        properties['apparatus']['mode'] = [m.text.strip() for m in modes if m.text]


apparatus/mode parsing is gated on mode = root.find('apparatus/mode').text being truthy. If the first <mode> element exists but is empty/whitespace (or missing text) while later <mode> elements are populated, modes will be silently ignored. Prefer checking modes = root.findall('apparatus/mode') and building the list from that directly.

Suggested change

mode = getattr(root.find('apparatus/mode'), 'text', None)

if mode:

modes = root.findall('apparatus/mode')

properties['apparatus']['mode'] = [m.text.strip() for m in modes if m.text]

modes = root.findall('apparatus/mode')

parsed_modes = [m.text.strip() for m in modes if m.text and m.text.strip()]

if parsed_modes:

properties['apparatus']['mode'] = parsed_modes

Copilot · 2026-04-29T14:04:33Z

 # Local imports
 from .validation import schema, OurValidator, yaml, Q_
 from .converters import datagroup_properties, ReSpecTh_to_ChemKED
+from pint import DimensionalityError


DimensionalityError is imported but not used in this file. Please remove the import to keep dependencies clear and avoid linting issues.

Suggested change

from pint import DimensionalityError

Added new chemked schema for new experiment types

f794cb7

Copilot AI review requested due to automatic review settings March 28, 2026 01:31

Copilot started reviewing on behalf of LekiaAnonim March 28, 2026 01:32 View session

Copilot AI reviewed Mar 28, 2026

View reviewed changes

LekiaAnonim added 11 commits March 27, 2026 22:38

fix: match PyKED convention for composition units in batch converter

d3c3807

fix: address PR review feedback

3b44a86

add uncertainty, evaluated-standard-deviation, and reactor geometry p…

c204570

…roperties

refactor: inline uncertainty on property and composition amount fields

c8b2542

docs: update uncertainty_schema.yaml comment to reflect inline refact…

b6e4383

…oring

fix: equivalence-ratio as value-unit-optional, flow-style values, 2-s…

cdd775a

…pace indent

Working on PyKED package upgrade to allow celebrus validation

0239951

feat: add rate coefficient support and fix cerberus/crossref compat i…

d327f44

…ssues

Modified pyked for direct measurement data such as rate coefficient

efae6c2

Added new ignition delay type d/dt min extrapolated to pyked schema a…

5080454

…nd converters

Fix batch_convert ReSpecTh conversion issues and extend ignition sche…

711d152

…ma to newer ignition delay targets. Normalise inverse units (ms-1 → 1/ms) for pint compatibility

LekiaAnonim requested a review from Copilot April 2, 2026 21:17

Copilot started reviewing on behalf of LekiaAnonim April 2, 2026 21:18 View session

Copilot AI reviewed Apr 2, 2026

View reviewed changes

LekiaAnonim added 10 commits April 2, 2026 17:32

Changed stirred reaction to stirred reactor in chemked schema

fdfd7a9

fix: reject empty uncertainty dicts; add missing property_units entries

55f8f2a

Removed volumetric flow in reference state from dataproperties in con…

45ff61f

…verters.py

fix: strip semicolons from ignition targets; add relative concentrati…

81f06af

…on, CO, H2O, C2, comments

feat: capture amount field for relative concentration ignition type

fc40b8c

Fix HTML entity escaping in CrossRef journal names

27c2b54

Expand schemas for ReSpecTh batch conversion compatibility

0ff6e79

Make apparatus mode to accept multiple values

6f5b483

Made all re import global

8fc4ac3

Include new experiment schema to PyKED docs

09abd79

LekiaAnonim requested a review from Copilot April 29, 2026 13:52

Copilot started reviewing on behalf of LekiaAnonim April 29, 2026 13:53 View session

LekiaAnonim marked this pull request as draft April 29, 2026 13:55

LekiaAnonim marked this pull request as ready for review April 29, 2026 13:56

Copilot AI reviewed Apr 29, 2026

View reviewed changes

-logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')
-log = logging.getLogger(__name__)
-CHEMKED_VERSION = '0.4.1'
+import importlib
+logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')
+log = logging.getLogger(__name__)
+def get_chemked_version():
+    """Return the ChemKED schema version to use for generated files.
+    Prefer the version from the packaged schema (if available), falling
+    back to a sensible default if the schema cannot be loaded.
+    """
+    default_version = '0.4.1'
+    try:
+        schema_module = importlib.import_module('pyked.schema')
+    except ImportError:
+        return default_version
+    schema = getattr(schema_module, 'schema', None)
+    if not isinstance(schema, dict):
+        return default_version
+    chemked_version_info = schema.get('chemked-version', {})
+    allowed = chemked_version_info.get('allowed')
+    if isinstance(allowed, (list, tuple)) and allowed:
+        # Use the last allowed version, assuming it is the latest.
+        return str(allowed[-1])
+    return default_version
+CHEMKED_VERSION = get_chemked_version()

	'residence-time', 'volumetric-flow-in-reference-state',
	'residence-time', 'volumetric-flow-in-reference-state', 'reactor-volume',

		- extrapolation method to zero stretch : LS
		- extrapolation method to zero stretch : NQ

	'volumetric flow in reference state',
	'volumetric flow in reference state',
	'volumetric flow rate in reference state',

-                    auth['ORCID'] = orcid.removeprefix('https://orcid.org/').removeprefix('http://orcid.org/')
+                    if orcid.startswith('https://orcid.org/'):
+                        auth['ORCID'] = orcid[len('https://orcid.org/'):]
+                    elif orcid.startswith('http://orcid.org/'):
+                        auth['ORCID'] = orcid[len('http://orcid.org/'):]
+                    else:
+                        auth['ORCID'] = orcid

		# Datapoints contain temperature (required) and rate-coefficient (required).
		# Pressure and composition are optional (often absent for kdetermination data).

	from decimal import Decimal, InvalidOperation
	from decimal import Decimal

Conversation

LekiaAnonim commented Mar 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

LekiaAnonim commented Mar 28, 2026 •

edited

Loading