-
Notifications
You must be signed in to change notification settings - Fork 6
Myst frontmatter support #899
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
WalkthroughA new Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant MystTransformer
participant ResearchObject
User->>MystTransformer: importObject(mystMarkdownString)
MystTransformer->>MystTransformer: extractFrontmatter(mystMarkdownString)
MystTransformer->>MystTransformer: parseAuthors, parseLicense, etc.
MystTransformer->>ResearchObject: Construct ResearchObjectV1
MystTransformer-->>User: Return ResearchObject
User->>MystTransformer: exportObject(ResearchObject)
MystTransformer->>MystTransformer: generateFrontmatter(ResearchObject)
MystTransformer-->>User: Return MyST Markdown string
Poem
Tip ⚡️ Faster reviews with caching
Enjoy the performance boost—your workflow just got faster. ✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (6)
desci-models/src/transformers/index.ts (1)
4-4
: Remember to keep barrel exports alphabetically sortedMinor nit: keeping the export list ordered helps avoid merge-conflicts and speeds up scanning.
export * from './BaseTransformer'; export * from './MystTransformer'; export * from './RdfTransformer'; export * from './RoCrateTransformer';desci-models/tests/transformers/MystTransformer.test.ts (2)
3-5
: Path to generated TI file may break on Windows / ts-node resolution
ResearchObject-ti
is emitted with a.ts
extension. Importing with no extension relies on Node’s module-resolution plusts-node
’s--transpileOnly
hook; this can be brittle across tooling or ESM.
Consider importing with the explicit path and extension, or re-export the checker fromsrc/index.ts
so the test can justimport { … } from '../../src'
.
71-101
: Front-matter string assertions are order-sensitiveThe tests assert the literal presence of lines like
keywords: [a, b]
which depends on array ordering and whitespace generated byMystTransformer
.
IfgenerateFrontmatter
ever sorts or pretty-prints differently, these tests will fail even though semantics are identical. Prefer parsing the emitted YAML (e.g. withjs-yaml
) and comparing objects.desci-models/src/transformers/MystTransformer.ts (3)
96-101
: Use optional chaining to silence complexity lint & avoid undefined accessThe current guard misses the case where
mainComponent.payload
is present butcontent
isundefined
. Optional chaining is cleaner and flagged by Biome.- const mainComponent = researchObject.components.find((c) => c.id === 'content'); - if (mainComponent && mainComponent.payload) { - content = mainComponent.payload.content || ''; - } + const mainComponent = researchObject.components.find((c) => c.id === 'content'); + content = mainComponent?.payload?.content ?? '';🧰 Tools
🪛 Biome (1.9.4)
[error] 98-98: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
270-272
:generateId
brings non-determinism into tests & serialisationRandom IDs make round-trip diffs noisy and unit tests brittle.
Prefercrypto.randomUUID()
(Node ≥ 14.17) or pass in a deterministic UUID generator that tests can stub.
280-310
: Author parsing silently drops unknown fieldsAny additional author keys (e.g.,
corresponding
,address
) are discarded, even though they’re captured on import.
Either:
- Preserve the raw key/values under an
extra
field, or- Extend
ResearchObjectV1Author
to include them.Otherwise information is lost on round-trip.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
desci-models/package.json
(1 hunks)desci-models/src/transformers/MystTransformer.ts
(1 hunks)desci-models/src/transformers/index.ts
(1 hunks)desci-models/tests/transformers/MystTransformer.test.ts
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
desci-models/src/transformers/MystTransformer.ts (1)
desci-models/src/ResearchObject.ts (3)
ResearchObject
(1-3)ResearchObjectV1
(18-53)ResearchObjectV1Component
(77-94)
🪛 Biome (1.9.4)
desci-models/src/transformers/MystTransformer.ts
[error] 98-98: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build-and-test
🔇 Additional comments (2)
desci-models/package.json (1)
3-3
: Version bump looks consistent with new public APIThe patch-level increment to
0.2.26
is appropriate given that you’re introducing a new, backwards-compatible transformer and tests without altering existing behaviour.desci-models/src/transformers/MystTransformer.ts (1)
246-264
:splitKeyValue
mis-parses values containing “:” that are not URLsCurrent special-case only preserves URLs but still splits on the first colon, so
subtitle: Part I: Introduction
becomessubtitle = 'Part I'
.
Using a YAML parser will fix this; if you keep the helper, split on the first:
only when not inside quotes.
let frontmatter = '---\n'; | ||
if (title) frontmatter += `title: ${title}\n`; | ||
if (description) frontmatter += `description: ${description}\n`; | ||
if (license) frontmatter += `license: ${license}\n`; | ||
|
||
if (keywords && keywords.length > 0) { | ||
frontmatter += `keywords: [${keywords.join(', ')}]\n`; | ||
} | ||
|
||
if (tags && tags.length > 0) { | ||
frontmatter += `tags: [${tags.join(', ')}]\n`; | ||
} | ||
|
||
if (authors && authors.length > 0) { | ||
frontmatter += 'authors:\n'; | ||
for (const author of authors) { | ||
frontmatter += ` - name: ${author.name}\n`; | ||
if (author.orcid) { | ||
frontmatter += ` orcid: ${author.orcid}\n`; | ||
} | ||
if (author.role) { | ||
frontmatter += ` role: ${author.role}\n`; | ||
} | ||
if (author.organizations && author.organizations.length > 0) { | ||
frontmatter += ' affiliations:\n'; | ||
for (const org of author.organizations) { | ||
frontmatter += ` - ${org.name}\n`; | ||
} | ||
} | ||
} | ||
} | ||
|
||
frontmatter += '---\n\n'; | ||
return frontmatter; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Front-matter generation doesn’t escape YAML & loses rich metadata
Direct string concatenation will break when titles/descriptions contain:
:
or#
characters- leading
@
,*
, etc. - newlines
and drops optional fields like subtitle
, date
, biblio
, math
, etc. already imported.
Recommend serialising an object and letting the YAML library handle escaping:
-let frontmatter = '---\n';
-/* manual concat */
-frontmatter += '---\n\n';
-return frontmatter;
+import * as yaml from 'yaml';
+const fmObj: any = { title, description, license };
+if (keywords.length) fmObj.keywords = keywords;
+if (tags.length) fmObj.tags = tags;
+if (authors.length) fmObj.authors = authors.map(/* transform to yaml-safe */);
+return `---\n${yaml.stringify(fmObj)}---\n\n`;
Committable suggestion skipped: line range outside the PR's diff.
for (const line of lines) { | ||
// Skip empty lines | ||
if (!line.trim()) continue; | ||
|
||
const indent = line.search(/\S/); | ||
const trimmedLine = line.trim(); | ||
|
||
// Check if we're starting a new list item | ||
if (trimmedLine.startsWith('-')) { | ||
if (indent > listIndent && inList) { | ||
// This is a nested list item | ||
if (!inNestedList) { | ||
inNestedList = true; | ||
nestedListIndent = indent; | ||
if (!currentListItem.organizations) { | ||
currentListItem.organizations = []; | ||
} | ||
} | ||
const nestedItemContent = trimmedLine.slice(1).trim(); | ||
if (nestedItemContent.includes(':')) { | ||
const [key, value] = this.splitKeyValue(nestedItemContent); | ||
currentListItem.organizations.push({ | ||
id: this.generateId(), | ||
name: value, | ||
}); | ||
} else { | ||
currentListItem.organizations.push({ | ||
id: this.generateId(), | ||
name: nestedItemContent, | ||
}); | ||
} | ||
continue; | ||
} | ||
|
||
// If we're not in a list yet, start a new one | ||
if (!inList) { | ||
inList = true; | ||
currentList = []; | ||
listIndent = indent; | ||
} else if (indent === listIndent) { | ||
// Save previous list item if it exists | ||
if (Object.keys(currentListItem).length > 0) { | ||
currentList.push({ ...currentListItem }); | ||
currentListItem = {}; | ||
} | ||
inNestedList = false; | ||
} | ||
|
||
// Parse the list item | ||
const itemContent = trimmedLine.slice(1).trim(); | ||
if (itemContent.includes(':')) { | ||
const [key, value] = this.splitKeyValue(itemContent); | ||
currentListItem[key] = value; | ||
} else { | ||
currentListItem = { name: itemContent }; | ||
} | ||
continue; | ||
} | ||
|
||
// Handle nested properties in list items | ||
if (inList && indent > listIndent && !inNestedList) { | ||
const [key, value] = this.splitKeyValue(trimmedLine); | ||
if (key && value) { | ||
currentListItem[key] = value; | ||
} | ||
continue; | ||
} | ||
|
||
// If we're in a list but this line isn't indented enough, end the list | ||
if (inList && indent <= listIndent) { | ||
// Save the last list item if it exists | ||
if (Object.keys(currentListItem).length > 0) { | ||
currentList.push({ ...currentListItem }); | ||
} | ||
frontmatter[currentKey] = [...currentList]; | ||
inList = false; | ||
inNestedList = false; | ||
currentList = []; | ||
currentListItem = {}; | ||
} | ||
|
||
// Parse key-value pairs | ||
const keyValueMatch = trimmedLine.match(/^([^:]+):\s*(.*)$/); | ||
if (keyValueMatch) { | ||
const key = keyValueMatch[1].trim(); | ||
const value = keyValueMatch[2].trim(); | ||
|
||
// Handle arrays in square brackets | ||
if (value.startsWith('[') && value.endsWith(']')) { | ||
frontmatter[key] = value | ||
.slice(1, -1) | ||
.split(',') | ||
.map((item) => item.trim()); | ||
} else { | ||
frontmatter[key] = value; | ||
currentKey = key; | ||
} | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Hand-rolled YAML parser is fragile – use js-yaml
or yaml
instead
This 100-line loop attempts to replicate YAML parsing (lists, nested lists, scalars, URLs, etc.).
Edge-cases it currently misses:
- quoted strings with colons/comma
- multi-line scalars (
|
or>
) - arbitrary nesting beyond 2 levels
- boolean / number coercion
- duplicate keys
Maintaining this logic will be error-prone and slows the project.
Suggestion:
-import * as yaml from 'yaml'; // at top
-// ...
-const parsed = yaml.parse(frontmatterYaml) ?? {};
-return { frontmatter: parsed, content };
+import * as yaml from 'yaml';
+// ...
+let parsed: any = {};
+try {
+ parsed = yaml.parse(frontmatterYaml) ?? {};
+} catch (err) {
+ throw new Error(`Failed to parse MyST front-matter: ${(err as Error).message}`);
+}
+return { frontmatter: parsed, content };
Committable suggestion skipped: line range outside the PR's diff.
private extractFrontmatter(input: string): { frontmatter: any; content: string } { | ||
const frontmatterRegex = /^---\s*\n([\s\S]*?)\n---\s*\n([\s\S]*)$/; | ||
const match = input.match(frontmatterRegex); | ||
|
||
if (!match) { | ||
return { frontmatter: {}, content: input }; | ||
} | ||
|
||
const frontmatterYaml = match[1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regex fails on Windows line-endings or missing trailing newline
/^---\s*\n([\s\S]*?)\n---\s*\n([\s\S]*)$/
\n
only matches LF, not CRLF.- Front-matter terminator
---
is valid at EOF without a following newline.
Consider a more tolerant expression:
-const frontmatterRegex = /^---\s*\n([\s\S]*?)\n---\s*\n?([\s\S]*)$/m;
+const frontmatterRegex = /^---\s*\r?\n([\s\S]*?)\r?\n---\s*\r?\n?([\s\S]*)$/m;
Or better, delegate parsing to a YAML/front-matter library to avoid hand-rolled edge cases.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
private extractFrontmatter(input: string): { frontmatter: any; content: string } { | |
const frontmatterRegex = /^---\s*\n([\s\S]*?)\n---\s*\n([\s\S]*)$/; | |
const match = input.match(frontmatterRegex); | |
if (!match) { | |
return { frontmatter: {}, content: input }; | |
} | |
const frontmatterYaml = match[1]; | |
private extractFrontmatter(input: string): { frontmatter: any; content: string } { | |
- const frontmatterRegex = /^---\s*\n([\s\S]*?)\n---\s*\n([\s\S]*)$/; | |
+ const frontmatterRegex = /^---\s*\r?\n([\s\S]*?)\r?\n---\s*\r?\n?([\s\S]*)$/m; | |
const match = input.match(frontmatterRegex); | |
if (!match) { | |
return { frontmatter: {}, content: input }; | |
} | |
const frontmatterYaml = match[1]; |
Closes #<GH_issue_number>
Description of the Problem / Feature
add support to render ResearchObject metadata as MyST yaml
Summary by CodeRabbit
New Features
Tests
Chores