Skip to content

PSS: Add parsing of .tfmigrate.hcl files to define state migration operations#38526

Open
SarahFrench wants to merge 12 commits intomainfrom
pss/TF-36919-parse-tfmigrate-hcl-files
Open

PSS: Add parsing of .tfmigrate.hcl files to define state migration operations#38526
SarahFrench wants to merge 12 commits intomainfrom
pss/TF-36919-parse-tfmigrate-hcl-files

Conversation

@SarahFrench
Copy link
Copy Markdown
Member

@SarahFrench SarahFrench commented May 5, 2026

Closes https://hashicorp.atlassian.net/browse/TF-36919

This PR enables parsing .tfmigrate.hcl files and new blocks:

  • state_store_provider (like required_providers but enforces a single provider and the constraint has to be a specific value)
  • migrate_from_state_store (like a state_store block)
  • migrate_from_backend (like a backend block)

This PR also implements validation in the parser like:

  • No duplicate blocks allowed - and enforcing this across 1+ files.
  • The migrate_from_state_store and migrate_from_backend blocks are mutually exclusive.
    • Similarly, migrate_from_backend and state_store_provider are mutually exclusive.
  • If using migrate_from_state_store, state_store_provider must also be present.
  • Providers described in migrate_from_state_store and state_store_provider must match.
  • A state_store_provider block may only describe a single version (pin to a version, no range of versions) of a single provider (only one entry).

The state migrate command will eventually use this logic, and this PR hasn't connected those parts of the codebase yet. When that does happen I expect that calling code would populate the ProviderSupplyMode data.

Target Release

1.16.x

Rollback Plan

  • If a change needs to be reverted, we will roll out an update to the code within 7 days.

Changes to Security Controls

Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.

CHANGELOG entry

  • This change is user-facing and I added a changelog entry.
  • This change is not user-facing.

@SarahFrench SarahFrench added the no-changelog-needed Add this to your PR if the change does not require a changelog entry label May 5, 2026
@SarahFrench SarahFrench force-pushed the pss/TF-36919-parse-tfmigrate-hcl-files branch 3 times, most recently from e6b00a6 to 7105ee7 Compare May 6, 2026 10:11
…les including `migrate_from_backend` blocks.
…er` blocks across multiple .tfmigrate.hcl files
…s missing its counterpart `migrate_from_state_store` block
…migrate.hcl files. Add tests for error paths and happy path.
…cross migrate_from_state_store and state_store_provider blocks. When they match, copy provider data across to description of the state store.

Copying that data across is similar to how we use methods called `resolveStateStoreProviderType` when paring config (via NewModule or FinalizeConfig functions).
@SarahFrench SarahFrench force-pushed the pss/TF-36919-parse-tfmigrate-hcl-files branch from 7105ee7 to 657a47f Compare May 6, 2026 10:40
…single provider. Also, add early returns to avoid repeated or incorrect error diagnostics.

Incorrect errors can occur if an error prevents a block being processed, and then downstream code reacts to the lack of that data with another error. That would be misleading (e.g. an error in a state_store_provider block meaning that later an incorrect error is raised reporting a missing state_store_provider block, when it is incorrect not missing).
Copy link
Copy Markdown
Member Author

@SarahFrench SarahFrench left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some self-review/contextual comments to help others with review.

Comment on lines +146 to +149
} else {
// They match, so copy across relevant data.
ss.ProviderAddr = ssp.Type
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This serves a similar purpose as this code in NewModule:

if mod.StateStore != nil {
diags = append(diags, mod.resolveStateStoreProviderType()...)
}

return file, diags
}

func decodeStateStoreProviderBlock(block *hcl.Block) (*RequiredProvider, hcl.Diagnostics) {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic in this method was adapted from the decodeRequiredProvidersBlock function:

func decodeRequiredProvidersBlock(block *hcl.Block) (*RequiredProviders, hcl.Diagnostics) {
attrs, diags := block.Body.JustAttributes()
if diags.HasErrors() {
return nil, diags
}
ret := &RequiredProviders{
RequiredProviders: make(map[string]*RequiredProvider),
DeclRange: block.DefRange,
}
for name, attr := range attrs {
rp := &RequiredProvider{
Name: name,
DeclRange: attr.Expr.Range(),
}

In decodeStateStoreProviderBlock we don't have multiple attributes (i.e don't have multiple local names with paired objects). We return early if there isn't 1 entry. After that there's no need for looping but the code looks very similar to that original code.

Comment on lines +232 to +257
// Assert we have a single constraint, for a specific version
if len(constraints) != 1 {
diags = append(diags, &hcl.Diagnostic{
Severity: hcl.DiagError,
Summary: "Invalid version constraint",
Detail: "The version attribute inside the state_store_provider block must specify a single, specific version (e.g. \"= 1.0.0\").",
Subject: kv.Value.Range().Ptr(),
})
return nil, diags
}

// A constraint to use v1.2.3 could have an = operator or no operator at all.
constraintStr = strings.TrimPrefix(constraintStr, "=") // Remove a preceding `=`, if it exists.
constraintStr = strings.TrimSpace(constraintStr) // There might have been whitespace between the operator and the version.

_, err = versions.ParseVersion(constraintStr)
if err != nil {
// Errors indicate that the constraint wasn't a specific version.
diags = append(diags, &hcl.Diagnostic{
Severity: hcl.DiagError,
Summary: `Non-specific version constraint in "state_store_provider" configuration block`,
Detail: "The version constraint defined in a state_store_provider block must specify a single, specific version (e.g. \"= 1.0.0\", or \"1.0.0\").",
Subject: kv.Value.Range().Ptr(),
})
return nil, diags
}
Copy link
Copy Markdown
Member Author

@SarahFrench SarahFrench May 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I figured it was worth reusing existing types instead of creating a new type that used Version instead of VersionConstraint, so this method returns a RequiredProvider pointer.

Due to this we still use a version constraint, but I opted for validation that asserts the constraint is pinning to an exact value. The hashicorp/go-version representation of the version constraint doesn't allow calling code to inspect data and ask 'is this constraint using an = operator (implied or explicit)?'. Because of that, I landed on this approach where I check for an = operator (implied or explicit) and then attempt to parse the string as a version, and use successful parsing as a sign that the constraint was pinning a single version.

If there are other/better approaches to this I'm keen to get feedback. An alternative is updating hashicorp/go-version but I think that decision shouldn't be taken lightly.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the need to reuse the interface, i.e. pass a constraint downstream but I think that it doesn't mean we need to leak that all the way through to the user?

I mean, if we eventually parse a valid go-version.Version then we could construct an artificial = constraint from that, could we not?

I don't think we need to be allowing an explicit = from the end user?

@SarahFrench SarahFrench marked this pull request as ready for review May 6, 2026 12:17
@SarahFrench SarahFrench requested a review from a team as a code owner May 6, 2026 12:17
@radeksimko radeksimko changed the title PSS: Add parsing of .tfmigrate.hcl files to define state migration operations PSS: Add parsing of .tfmigrate.hcl files to define state migration operations May 7, 2026
var diags hcl.Diagnostics

if file == nil {
panic("appendStateMigrationFile receives a nil *StateMigrationFile pointer")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
panic("appendStateMigrationFile receives a nil *StateMigrationFile pointer")
panic("appendStateMigrationFile received a nil *StateMigrationFile pointer")

diags = append(diags, &hcl.Diagnostic{
Severity: hcl.DiagError,
Summary: `Duplicate "state_store_provider" configuration block`,
Detail: fmt.Sprintf(`A "state_store_provider" block was already declared at %s. Only one of these blocks can be include in a module's state migration files.`, m.StateMigrationInstructions.StateStoreProvider.DeclRange),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Detail: fmt.Sprintf(`A "state_store_provider" block was already declared at %s. Only one of these blocks can be include in a module's state migration files.`, m.StateMigrationInstructions.StateStoreProvider.DeclRange),
Detail: fmt.Sprintf(`A "state_store_provider" block was already declared at %s. Only one of these blocks can be included in a module's state migration files.`, m.StateMigrationInstructions.StateStoreProvider.DeclRange),

Comment on lines +666 to +677
backendConflictStateStoreProviderDiag := &hcl.Diagnostic{
Severity: hcl.DiagError,
Summary: `Invalid combination of "migrate_from_backend" and "state_store_provider"`,
Detail: `The "state_store_provider" block can only be used in combination with "migrate_from_state_store" blocks. Either remove the unused "state_store_provider" block, or replace the "migrate_from_backend" block with a "migrate_from_state_store" block.`,
// Sourceless because we don't know which block isn't needed.
}
backendConflictMigrateFromStateStoreDiag := &hcl.Diagnostic{
Severity: hcl.DiagError,
Summary: `Invalid combination of "migrate_from_backend" and "migrate_from_state_store"`,
Detail: `A configuration cannot include both "migrate_from_backend" and "migrate_from_state_store" blocks. Remove one of these blocks, and the remaining block should describe where your existing state should be migrated from.`,
// Sourceless because we don't know which block isn't needed.
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought, not a hard suggestion 😅

I see in this function we're loading each file individually, doing validation that would prevent/conflict with loading a file (i.e. checking for duplicates), and then progressively validating invalid combinations as we load each file.

Since we already have some "post-load" validation in LoadConfigDir, can we make this function easier to read by just moving the "invalid combination" checks into that function. Then you have "file loading validation" (checking for duplicates) and "entire module validation" separated.

return mod, diags
}

if mod != nil {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for module to be nil here? If so I'd imagine that line 90 would be at risk for a panic 😅

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell by reading through NewModule it is not possible:

func NewModule(primaryFiles, overrideFiles []*File) (*Module, hcl.Diagnostics) {
var diags hcl.Diagnostics
mod := &Module{
ProviderConfigs: map[string]*Provider{},
ProviderLocalNames: map[addrs.Provider]string{},
Variables: map[string]*Variable{},
Locals: map[string]*Local{},
Outputs: map[string]*Output{},
ModuleCalls: map[string]*ModuleCall{},
ManagedResources: map[string]*Resource{},
EphemeralResources: map[string]*Resource{},
DataResources: map[string]*Resource{},
ListResources: map[string]*Resource{},
Checks: map[string]*Check{},
ProviderMetas: map[addrs.Provider]*ProviderMeta{},
Tests: map[string]*TestFile{},
Actions: map[string]*Action{},
}
// Process the required_providers blocks first, to ensure that all
// resources have access to the correct provider FQNs
for _, file := range primaryFiles {
for _, r := range file.RequiredProviders {
if mod.ProviderRequirements != nil {
diags = append(diags, &hcl.Diagnostic{
Severity: hcl.DiagError,
Summary: "Duplicate required providers configuration",
Detail: fmt.Sprintf("A module may have only one required providers configuration. The required providers were previously configured at %s.", mod.ProviderRequirements.DeclRange),
Subject: &r.DeclRange,
})
continue
}
mod.ProviderRequirements = r
}
}
// If no required_providers block is configured, create a useful empty
// state to reduce nil checks elsewhere
if mod.ProviderRequirements == nil {
mod.ProviderRequirements = &RequiredProviders{
RequiredProviders: make(map[string]*RequiredProvider),
}
}
// Any required_providers blocks in override files replace the entire
// block for each provider
for _, file := range overrideFiles {
for _, override := range file.RequiredProviders {
for name, rp := range override.RequiredProviders {
mod.ProviderRequirements.RequiredProviders[name] = rp
}
}
}
for _, file := range primaryFiles {
fileDiags := mod.appendFile(file)
diags = append(diags, fileDiags...)
}
for _, file := range overrideFiles {
fileDiags := mod.mergeFile(file)
diags = append(diags, fileDiags...)
}
diags = append(diags, checkModuleExperiments(mod)...)
// Generate the FQN -> LocalProviderName map
mod.gatherProviderLocalNames()
if mod.StateStore != nil {
diags = append(diags, mod.resolveStateStoreProviderType()...)
}
return mod, diags
}

So I think that nil check can be removed from here.

AFAICT this applies to the same check above for test files and query files.

// If there are errors they may be duplicated below, so return early.
// We return an incomplete module representation.
if diags.HasErrors() {
mod.SourceDir = path
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this is here because of the logic at the bottom of this function:

if mod != nil {
	mod.SourceDir = path
}

Can these repeated lines be removed when returning diagnostics and we just move the piece of code above earlier in LoadConfigDir?

Copy link
Copy Markdown
Member

@radeksimko radeksimko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm tempted to suggest a few more changes to the schema to keep the new language closer to the existing one in *.tf

required_provider {
  // ...
}

migrate_from {
  state_store {
    // ...
  }

  backend {
    // ...
  }
}

or

required_provider {
  // ...
}

from {
  state_store {
    // ...
  }

  backend {
    // ...
  }
}

I think this could make it a bit easier to understand for users who're already familiar with the language? 🤔 Also it would reflect the reality where we parse both state_store and backend using all the existing decoder logic, i.e. we expect the same structure there.

var diags hcl.Diagnostics

if file == nil {
panic("appendStateMigrationFile receives a nil *StateMigrationFile pointer")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the explicit panic here is better than letting it panic further down on its own. Usually the native Go tracelog and messaging should be good enough that we don't need to replace it with our own?

return mod, diags
}

if mod != nil {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell by reading through NewModule it is not possible:

func NewModule(primaryFiles, overrideFiles []*File) (*Module, hcl.Diagnostics) {
var diags hcl.Diagnostics
mod := &Module{
ProviderConfigs: map[string]*Provider{},
ProviderLocalNames: map[addrs.Provider]string{},
Variables: map[string]*Variable{},
Locals: map[string]*Local{},
Outputs: map[string]*Output{},
ModuleCalls: map[string]*ModuleCall{},
ManagedResources: map[string]*Resource{},
EphemeralResources: map[string]*Resource{},
DataResources: map[string]*Resource{},
ListResources: map[string]*Resource{},
Checks: map[string]*Check{},
ProviderMetas: map[addrs.Provider]*ProviderMeta{},
Tests: map[string]*TestFile{},
Actions: map[string]*Action{},
}
// Process the required_providers blocks first, to ensure that all
// resources have access to the correct provider FQNs
for _, file := range primaryFiles {
for _, r := range file.RequiredProviders {
if mod.ProviderRequirements != nil {
diags = append(diags, &hcl.Diagnostic{
Severity: hcl.DiagError,
Summary: "Duplicate required providers configuration",
Detail: fmt.Sprintf("A module may have only one required providers configuration. The required providers were previously configured at %s.", mod.ProviderRequirements.DeclRange),
Subject: &r.DeclRange,
})
continue
}
mod.ProviderRequirements = r
}
}
// If no required_providers block is configured, create a useful empty
// state to reduce nil checks elsewhere
if mod.ProviderRequirements == nil {
mod.ProviderRequirements = &RequiredProviders{
RequiredProviders: make(map[string]*RequiredProvider),
}
}
// Any required_providers blocks in override files replace the entire
// block for each provider
for _, file := range overrideFiles {
for _, override := range file.RequiredProviders {
for name, rp := range override.RequiredProviders {
mod.ProviderRequirements.RequiredProviders[name] = rp
}
}
}
for _, file := range primaryFiles {
fileDiags := mod.appendFile(file)
diags = append(diags, fileDiags...)
}
for _, file := range overrideFiles {
fileDiags := mod.mergeFile(file)
diags = append(diags, fileDiags...)
}
diags = append(diags, checkModuleExperiments(mod)...)
// Generate the FQN -> LocalProviderName map
mod.gatherProviderLocalNames()
if mod.StateStore != nil {
diags = append(diags, mod.resolveStateStoreProviderType()...)
}
return mod, diags
}

So I think that nil check can be removed from here.

AFAICT this applies to the same check above for test files and query files.

Comment on lines +41 to +42
MigrateFromStateStore *StateStore
MigrateFromBackend *Backend
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
MigrateFromStateStore *StateStore
MigrateFromBackend *Backend
FromStateStore *StateStore
FromBackend *Backend

Considering we're already inside state_migrate_file and StateMigrationFile I think we could drop the Migrate here.

Comment on lines +218 to +246
constraintStr := constraint.AsString()
constraints, err := version.NewConstraint(constraintStr)
if err != nil {
// NewConstraint doesn't return user-friendly errors, so we'll just
// ignore the provided error and produce our own generic one.
diags = append(diags, &hcl.Diagnostic{
Severity: hcl.DiagError,
Summary: "Invalid version constraint",
Detail: "This string does not use correct version constraint syntax.",
Subject: kv.Value.Range().Ptr(),
})
return nil, diags
}

// Assert we have a single constraint, for a specific version
if len(constraints) != 1 {
diags = append(diags, &hcl.Diagnostic{
Severity: hcl.DiagError,
Summary: "Invalid version constraint",
Detail: "The version attribute inside the state_store_provider block must specify a single, specific version (e.g. \"= 1.0.0\").",
Subject: kv.Value.Range().Ptr(),
})
return nil, diags
}

// A constraint to use v1.2.3 could have an = operator or no operator at all.
constraintStr = strings.TrimPrefix(constraintStr, "=") // Remove a preceding `=`, if it exists.
constraintStr = strings.TrimSpace(constraintStr) // There might have been whitespace between the operator and the version.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we parsing it as a constraint at all if we're expecting an exact version? Could we not just parse it via go-version NewVersion()?

Comment on lines +232 to +257
// Assert we have a single constraint, for a specific version
if len(constraints) != 1 {
diags = append(diags, &hcl.Diagnostic{
Severity: hcl.DiagError,
Summary: "Invalid version constraint",
Detail: "The version attribute inside the state_store_provider block must specify a single, specific version (e.g. \"= 1.0.0\").",
Subject: kv.Value.Range().Ptr(),
})
return nil, diags
}

// A constraint to use v1.2.3 could have an = operator or no operator at all.
constraintStr = strings.TrimPrefix(constraintStr, "=") // Remove a preceding `=`, if it exists.
constraintStr = strings.TrimSpace(constraintStr) // There might have been whitespace between the operator and the version.

_, err = versions.ParseVersion(constraintStr)
if err != nil {
// Errors indicate that the constraint wasn't a specific version.
diags = append(diags, &hcl.Diagnostic{
Severity: hcl.DiagError,
Summary: `Non-specific version constraint in "state_store_provider" configuration block`,
Detail: "The version constraint defined in a state_store_provider block must specify a single, specific version (e.g. \"= 1.0.0\", or \"1.0.0\").",
Subject: kv.Value.Range().Ptr(),
})
return nil, diags
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the need to reuse the interface, i.e. pass a constraint downstream but I think that it doesn't mean we need to leak that all the way through to the user?

I mean, if we eventually parse a valid go-version.Version then we could construct an artificial = constraint from that, could we not?

I don't think we need to be allowing an explicit = from the end user?

Comment on lines +309 to +317
func decodeMigrateFromStateStoreBlock(block *hcl.Block) (*StateStore, hcl.Diagnostics) {
// migrate_from_state_store blocks are essentially the same as state_store blocks, so reuse logic.
return decodeStateStoreBlock(block)
}

func decodeMigrateFromBackendBlock(block *hcl.Block) (*Backend, hcl.Diagnostics) {
// migrate_from_backend blocks are essentially the same as backend blocks, so reuse logic.
return decodeBackendBlock(block)
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we need these "proxy" functions and can't just call the existing ones directly?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

no-changelog-needed Add this to your PR if the change does not require a changelog entry

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants