terraform modules source: variable support in source for git username #23948

Open
gscuderi opened this issue Jan 24, 2020 · 15 comments

Comments

@gscuderi

Hello Terraform team, while working on a project I realized there is a feature that might be very useful for module sources: support for variables in Git sources.

I know this has been discussed in the past and is not currently supported. I went through the various threads, but none of them mentioned the use case I'm going to describe, which is why I decided to open this feature request anyway.

Let's imagine I have a module on a Gerrit server, or on any other Git service where you need to specify your user account in the source URL.

To use such a module, I will need to do something like:

module "my_module" {
  source = "git::ssh://myuser@gerrit.server:29418/repo.git//modules/mymodule?ref=v1.0.0"

  variable = "..."
  # ...
}

Having to specify myuser in the source URL up front is what creates the problem here, since the username is different for each user and cannot be generalized.

Since I have no way to override the source URL, when I develop the scripts I need to put in my username, my colleagues have to change it to theirs, and because we use Jenkins for the automation we ALL need to remember to change it back to the Jenkins user before submitting the code.

Ideally I should be able to use an override.tf file to specify my own username (or even the entire URL), so that we do not risk forgetting to change it back to the Gerrit CI user after working on the code (which happens way TOO often!).

So, in the end, being able to do something like:

module "my_module" {
  source = "git::ssh://${var.gerrit_user}@gerrit.server:29418/repo.git//modules/mymodule?ref=v1.0.0"

  variable = "..."
  # ...
}

Or maybe:

module "my_module" {
  source = "${var.my_module_git_source_url}"

  variable = "..."
  # ...
}

Is what I'm looking for.

Any other way to achieve the same objective is perfectly fine; I just need to stop changing it manually, since this is far too fragile and prone to human error. To be honest, this is exactly what I'm trying to prevent by using IaC and automation!

Thank you for your kind consideration & help,
Giordano

@apparentlymart
Contributor

Hi @gscuderi! Thanks for sharing this use-case.

It might interest you to know that Git itself has a feature that addresses a variant of this use-case: turning references to unauthenticated URLs that might appear in locations like Terraform configuration, npm modules, Go modules, etc., into authenticated ones with a username of your choice.

For example, in my .gitconfig I have the following setting:

[url "git@github.com:"]
	insteadOf = https://github.com/

This tells Git that whenever I (or some other software, such as Terraform, on my behalf) run git clone https://github.com/... it should instead use git@github.com:... as the remote address. This means that I can use Terraform modules, Go modules, npm modules, etc. that contain unpersonalized GitHub repository references like https://github.com/example/foo and make authenticated requests to those over SSH instead.

Perhaps, in order to smooth your current workflow, you could standardize on a particular placeholder user to commit in your configurations -- the "gerrit CI user" you mentioned, maybe -- and then each developer can add a rule like the one above to tell Git to use their own username instead:

[url "ssh://<your-username>@gerrit.server:29418/"]
	insteadOf = ssh://<ci-user>@gerrit.server:29418/

I believe that would then allow you to work with your Terraform configurations without any direct modification, and let Git itself do the translation to a more appropriate username on your development systems.
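Under that approach, the module block committed to the repository would name only the shared placeholder account. A minimal sketch, assuming a hypothetical CI account called ci-user (the host, path and ref are taken from the example in the original request):

```hcl
module "my_module" {
  # Committed for everyone, including Jenkins: the shared CI account
  # (hypothetical name "ci-user"). A developer's local git insteadOf rule
  # can rewrite the "ssh://ci-user@gerrit.server:29418/" prefix to their
  # own username when Terraform runs git clone.
  source = "git::ssh://ci-user@gerrit.server:29418/repo.git//modules/mymodule?ref=v1.0.0"
}
```

Jenkins would clone as ci-user exactly as written, while a developer whose gitconfig maps that prefix to their own username would clone as themselves, with no edit to the .tf file.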

@gscuderi
Author

I appreciate the workaround; unfortunately it won't work very well in my case...

I'm using on-demand Cloud Jenkins slaves, which are configured through a script when they are needed and then destroyed when unused. A server-wide setup would require hardcoding the CI user in the auto-provisioning script, which is not good.

Doing it per repository is even worse, as it requires a setting in the Jenkins declarative pipeline that exposes the CI user in every single project repository.
And imagine that tomorrow I need to change the CI user: I'll have to ask every project to make the change in their repository, and previous versions will no longer work, which is a bad thing!

Frankly speaking, it would be much better to have this feature in Terraform. I'm sure you'll find many other use cases in which a custom setup on the Git repository won't work very well, especially since you always combine multiple tools to achieve full automation.

@apparentlymart
Contributor

Hi @gscuderi! Thanks for sharing that additional information.

In the interests of gathering as much context as possible about this problem so we can weigh various options, I have a further question:

Terraform currently follows the same practice as several other language ecosystems, such as the ones I mentioned in my earlier comment (Go and npm), of allowing literal Git URLs for dependencies without any means to override or customize them. I'm curious to know whether the Gerrit server you mentioned here is used exclusively for Terraform, or whether you are using it with other ecosystems that also support direct Git URLs for dependencies, and, if so, whether any of those systems have a good solution to the problem of swapping out usernames that we could take inspiration from in Terraform.

In the ecosystems I'm aware of, it's a common constraint that dependencies are expressed entirely statically because, as with Terraform today, dependency resolution and installation is a separate subsystem (or possibly even a separate system) that runs prior to "real" execution of the program. So I'd love to hear about any ecosystems you know of that you think have done a good job of supporting your use-case here, without relying on the Git feature I described in my previous comment.

If you don't have any such examples in mind, then no worries! I just think it's good to learn from prior art if possible, so we have a few different options to weigh.

@mgrotheer

I'm struggling right now with passing specific credentials to a Terraform module source (a private repo) in our GitLab environment. One thing we have looked at is leveraging a GitLab deploy token, but I'm not sure how we could do this, since we wouldn't want to hardcode the credentials. Every other way I've tried results in an "access denied" error. Being able to pass variables into the module source would be helpful.

@rlove

rlove commented Nov 22, 2020

I am fighting this as well. We have several private modules with references to other modules. The private modules are stored in GitHub.

We use GitHub Workflow Actions to run terraform.

I check out my module with actions/checkout@v2 and a PAT (Personal Access Token) that has access to all the other repositories containing the referenced modules. The Git config for that specific repository is changed to allow future operations over HTTPS; it will even rewrite Git submodule references from SSH to HTTPS.

When I run terraform init and have references to a module via the HTTPS Git protocol, I get the following message:

Could not download module "tags" (tags.tf:1) source code from
"github.com/orgname/example-tags?ref=v0.2": error downloading
'https://github.com/orgname/example-tags.git?ref=v0.2': /usr/bin/git
exited with 128: Cloning into '.terraform/modules/tags'...
fatal: could not read Username for 'https://github.com': No such device or
address 

It gets even more interesting when a referenced module itself references other modules over both the SSH and HTTPS Git protocols, and those modules are sometimes out of your direct control.

None of this is typically noticed locally, because I have both SSH keys and a credential helper configured for HTTPS with Git.
That is not an option on a self-hosted runner, as it would allow other builds to use these credentials when they should not have the rights to do so.

So what I need is the ability to optionally pass a PAT on the terraform CLI (or through a similar mechanism), which would then be used when checking out any GitHub references that use HTTPS.

A workaround is to never use HTTPS and only use SSH.
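For the module in the error above, the SSH-only workaround means writing the source as an explicit ssh:// Git URL. A sketch, reusing the orgname/example-tags repository and v0.2 ref from that log, and assuming the runner has an SSH key with read access to the repository:

```hcl
module "tags" {
  # SSH form of "github.com/orgname/example-tags?ref=v0.2": authentication
  # comes from the runner's SSH key instead of an HTTPS username prompt.
  source = "git::ssh://git@github.com/orgname/example-tags.git?ref=v0.2"
}
```

The trade-off is the one described above: every nested module reference also has to use SSH, which is not always under your control.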

Another option would be the ability to set custom headers on HTTPS URLs, so the module could be downloaded with the token from a release page, or from another secure website protected by header tokens.

@GaTechThomas

Same need here, but for Azure DevOps.

@daveth

daveth commented Jul 28, 2021

Hit a similar use case here too, but with a GCS bucket used as the module source. Our CI environment owns such a bucket, and is parameterised and able to be deployed to a bunch of independent environments, but all other infrastructure that needs the TF modules in one of those registry buckets ends up with the GCS location hardcoded, since we can't have variables in module sources.
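For comparison, this is roughly how a GCS-hosted module has to be referenced today, using the gcs:: source form from Terraform's module source documentation; the bucket and object names below are hypothetical:

```hcl
module "blah" {
  # The bucket path is a literal: module sources are resolved during
  # `terraform init`, before any input variables are available.
  source = "gcs::https://www.googleapis.com/storage/v1/some-bucket/sub-dir/some-path/blah.zip"
}
```

Every consumer of a given registry bucket repeats a literal like this, which is the hardcoding described above.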

Could a registry block work for this? You could define it in the same place as a backend, tag any modules that need it with a registry attribute referring to the one you just defined, and when terraform init runs it goes and grabs the modules from the appropriate registries.

Pseudo-HCL:

terraform {
  registry "gcs" {
    name = "my-gcs-registry"
    storage_location = "gs://some-bucket/sub-dir"
  }

  backend "gcs" {
    // ...
  }
}

module "blah" {
  registry = my-gcs-registry    // sorta like specifying a provider to use?
  source = "some-path/blah"     // for a gcs registry, would just append the path (gs://some-bucket/sub-dir/some-path/blah)
  
  // ...
}

@ncdmr

ncdmr commented Dec 13, 2021

Same need here: we'd like to have our GitLab URI as a variable so we have more flexibility in case of domain changes.

@Eugene-Trufanov

Agreed, this would be very useful for many purposes. We currently have to use Terragrunt or sed in buildspec files.
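As an illustration of the Terragrunt route: a terragrunt.hcl file's terraform block selects the root module Terragrunt downloads, and Terragrunt evaluates locals and functions such as get_env there, which plain Terraform module sources do not allow. The sketch below assumes a hypothetical GERRIT_USER environment variable with a hypothetical ci-user default; it only covers the root module Terragrunt fetches, not nested module blocks inside it:

```hcl
# terragrunt.hcl
locals {
  # Developers export GERRIT_USER; CI falls back to the shared account.
  gerrit_user = get_env("GERRIT_USER", "ci-user")
}

terraform {
  source = "git::ssh://${local.gerrit_user}@gerrit.server:29418/repo.git//modules/mymodule?ref=v1.0.0"
}
```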

@apparentlymart
Contributor

Hi all,

The current status of this issue is that we're looking for examples of other language ecosystems that have solved this problem in a different way than Terraform has and thus can better meet the use-case. Currently Terraform is consistent with various other language ecosystems we know of which support installation directly from Git repositories, and so the git configuration approach I shared above is one that is typically recommended for other similar systems like npm in the NodeJS ecosystem.

We understand that there is friction here but in order to make further progress we need to understand what makes Terraform different than the other systems with the same design (that is: dependencies are specified statically rather than dynamically, and are installed prior to runtime), why the git configuration solution can work for those ecosystems but not for Terraform, and ideally examples of other ecosystems which have a different solution to this problem.

We don't expect to implement something in Terraform that is entirely different from any other programming language ecosystem, because we aim to be consistent with other languages so that as much as possible the same processes and practices that work for other languages can work with Terraform too.

@rlisnoff

Hey all, I wanna add a +1 here and my current reasoning for wanting this feature.

Our Terraform modules are stored in S3, but in order to meet some compliance standards our system has to tolerate a region outage in AWS. Though S3's namespace is global, the actual data is stored regionally, so we have a replicated bucket in another region that also contains our Terraform modules. In the event of a disaster, we want the Terraform files that consume these modules to be able to deploy into the disaster recovery region, but since we can't reference variables in the source argument, we are stuck creating a duplicate module call with the source pointing to the other S3 bucket and coalescing the values later. It'd be a heck of a lot more DRY to have one module defined that pulls its source in a disaster-resilient way.

If there are alternate solutions here I'm interested in hearing them, we've just been unable to come up with any that fit our needs.

@geoffo-dev

Apologies @apparentlymart - I only just saw your response when issue #30546 was closed! Apologies...! Thank you for taking the time to reply!

I think the approach you suggested will not work for our use case sadly - that said I am also not sure how best to attack it when you compare it to other languages.

So I have been trying to wrap my head around the issue, as I didn't really understand why it couldn't just be a string... but I forgot that as part of the initial validation/init, Terraform needs to properly resolve these, which I guess it has to do before any variable resolution.

The only ones I am familiar with would resolve these initially and then use those for the build... Which I guess is what terraform is doing! Unless we could specify dependencies/sources in different files/maps.

@apottere

apottere commented Apr 5, 2022

@apparentlymart I know in this quote you're specifically talking about how terraform handles git authentication and not all variables in the source, but per your comment on #30546 I was redirected here and wanted to highlight how this doesn't hold for all use cases:

The current status of this issue is that we're looking for examples of other language ecosystems that have solved this problem in a different way than Terraform has and thus can better meet the use-case. Currently Terraform is consistent with various other language ecosystems we know of which support installation directly from Git repositories, and so the git configuration approach I shared above is one that is typically recommended for other similar systems like npm in the NodeJS ecosystem.

A huge point of friction for my current org and my past org is that there's no way to specify a module dependency for an entire project/module, and if we're using git refs as a module version it needs to be copied into every single module.source we write. We have a monorepo for all of our shared terraform modules that we tag with semver, so this version gets updated pretty frequently.
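To make the duplication concrete, here is a hypothetical pair of calls into a shared monorepo tagged v1.4.0 (the repository URL and module paths are illustrative, not from the original):

```hcl
# Every call into the shared monorepo repeats the same ref; releasing
# v1.5.0 means editing each of these strings by hand.
module "network" {
  source = "git::https://example.com/org/terraform-modules.git//network?ref=v1.4.0"
}

module "database" {
  source = "git::https://example.com/org/terraform-modules.git//database?ref=v1.4.0"
}
```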

For our use-case, Terraform differs significantly from other languages. Take, for example, a simple NodeJS project: versions are declared once, in package.json, and the dependencies can then be referenced without a version later (import { ... } from '@scope/pkg/subpkg'). Imagine if you had to declare the dependency version in each import in each file (import { ... } from '@scope/pkg/subpkg@x.y.z'); it would make maintaining a NodeJS project with dependencies a nightmare.

Edit: Note that I'm not suggesting that variables in the source are the only solution to this problem, but it would be one of the solutions.

@apparentlymart
Contributor

apparentlymart commented Apr 5, 2022

Thanks for sharing that difference, @apottere.

My understanding is that in the NodeJS ecosystem each package has one package.json file which specifies in a single location which version of each dependency to use. In that model, each package can specify only a single version constraint for each other package it depends on. Furthermore, in the case of dependencies that are not published in the registry the package.json file also serves to create a local mapping table from registry-like names to other sources such as Git URLs.

Terraform intentionally allows a single module to call multiple versions of the same other module, and maintainers make use of that capability when they want to roll out a new version over multiple steps: add a new module block using the new version while keeping the old one, run terraform apply to temporarily use both, then remove the old module block and run terraform apply again to remove the old one.
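A minimal sketch of that two-step rollout, assuming a hypothetical module repository at example.com with tags v1.0.0 and v2.0.0:

```hcl
# Step 1: keep the existing call and add one for the new version, then run
# `terraform apply` so both versions are in use temporarily.
module "app_v1" {
  source = "git::https://example.com/org/app-module.git?ref=v1.0.0"
}

module "app_v2" {
  source = "git::https://example.com/org/app-module.git?ref=v2.0.0"
}

# Step 2 (later): delete the "app_v1" block and run `terraform apply`
# again, which destroys the objects that call was managing.
```

A single project-wide version setting would have to leave room for this pattern, since the two calls intentionally pin different refs.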

From this NodeJS example I think we can learn two main things:

  • It is convenient to have a way to centrally specify a default version of some external module package to use when a module block doesn't specify a version constraint of its own, which would then simplify the common case of a module calling the same module multiple times with the intent of using many instances of the same module.
  • It is convenient to be able to declare registry-module-style "aliases" for remote sources such as Git URLs, similarly to centralize the actual URL in a single place in a module and use it only by the declared symbolic name elsewhere in that module.

Terraform currently has no direct analog to package.json; as you observed, each module block is totally self-contained today and does not rely on any other information declared in the module.

There are some things that NodeJS and Terraform seem to have in common, though:

  • There is a registry protocol which allows adding some indirection between the dependency declarations and their physical locations. In Terraform's case, that's the module registry protocol, which can hide the implementation detail of where exactly a module package is stored by keeping that location in a remote index.
  • When a dependency isn't published in a registry, the author must specify an exact location for it in the dependency metadata, which is then used only during the dependency installation process.
  • There is no facility for dynamically selecting a version at runtime; version constraints must be specified as constant values. The version selections and installations happen in a separate step prior to runtime. (npm install or similar for NodeJS, terraform init for Terraform.)
  • The package dependencies declared by one package don't affect the declarations made by another package in the same program. Each package must declare its own dependencies. (NodeJS "package" corresponds with Terraform "module" for the sake of this comparison.)
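As a sketch of the registry indirection described in the first point, a module call against a private registry might look like the following; the hostname registry.example.com and the example-org/vpc/aws address are hypothetical:

```hcl
module "vpc" {
  # Registry address: <hostname>/<namespace>/<name>/<provider>. The registry,
  # not this configuration, records where the package is physically stored.
  source = "registry.example.com/example-org/vpc/aws"

  # Version constraints are accepted only for registry sources; for Git
  # sources the revision has to go in the URL's ref argument instead.
  version = "~> 1.4"
}
```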

Thanks for sharing this example!

@ayashjorden

ayashjorden commented Sep 21, 2022

Hi @apparentlymart ,
Similar to @rlisnoff , our platform is distributed and we're evaluating different solutions.
Here is my comment on another issue:

Hi all,
In my use-case, I want to pull modules from a configurable location, most likely same-region, to avoid cross-region traffic.

So it makes sense to me that source = "s3::https://s3-${var.region}.amazonaws.com/artifacts-${var.region}-dev/common-aws.1.0.0.tar.xz" should be supported.

Can anyone link to the areas in the code:

  • Where can I specify input arguments? (I guess that's in the main TF binary, not a provider; I'd like to experiment.)
  • Where does the init functionality happen, so I can try to support -var or -var-file?

My logic tells me that input variables or var-files would, in most cases, be similar if not identical to the input for the rest of the configuration.

Should that not be fruitful:

  • Use the registry protocol to reply with a header containing the region-local S3 URL
  • We really don't like it, but we might resort to bundling all dependencies into a code bundle

In any case, even if not, experimenting with that would support the discussion...
Best,
Jordan
