Skip to content

Implement configurable symlink handling in lakectl #9212

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 26 commits into from
Jul 24, 2025

Conversation

nopcoder
Copy link
Contributor

@nopcoder nopcoder commented Jun 22, 2025

Close #9234

Symlink support for lakectl.
Enabled in configuration, the default is the current behaviour.

lakectl config

local:
    symlink_support: true

When enabled it will upload and download symlinks.

  • A local symlink is represented by "::lakefs::symlink-target" metadata and 0 size object.
  • The default behaviour kept as lakectl works today - no support in symlinks.
  • lakectl Download/Upload/Local commands will create local/remote symlinks and support the local SkipNonRegularFiles flag.

@nopcoder nopcoder self-assigned this Jun 22, 2025
@nopcoder nopcoder added include-changelog PR description should be included in next release changelog lakectl-local labels Jun 24, 2025
Copy link

github-actions bot commented Jun 24, 2025

📚 Documentation preview at https://pr-9212.docs-lakefs-preview.io/

(Updated: 7/24/2025, 11:35:28 AM - Commit: 29f6398)

@nopcoder nopcoder force-pushed the task/lakectl-symlink branch from ded79df to 15b6698 Compare June 26, 2025 04:57
@nopcoder nopcoder marked this pull request as ready for review June 26, 2025 11:14
@nopcoder nopcoder requested review from a team, N-o-Z and Copilot June 26, 2025 11:14
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request implements configurable symlink support for lakectl by adding new configuration options, updating the sync logic for uploads/downloads, and extending relevant tests.

  • Introduces a new configuration flag for symlink support (default disabled)
  • Updates sync, diff, and CLI commands to conditionally handle symlinks
  • Adds extensive tests (unit and integration) to verify correct behavior with and without symlink support

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.

Show a summary per file
File Description
pkg/local/sync_test.go Adds tests for downloading and uploading symlinks based on configuration
pkg/local/sync.go Implements symlink handling in download and upload functions
pkg/local/diff_test.go Includes new test cases for symlink-related diff scenarios
pkg/local/diff.go Updates file diff logic to include/exclude symlinks as needed
pkg/local/config.go Adds the "SymlinkSupport" configuration option and updated comments
pkg/api/helpers/download.go Extends the downloader to optionally create symlinks on download
pkg/api/apiutil/constant.go Introduces new constants for symlink metadata keys
esti/lakectl_local_test.go Adds integration tests for local commands when using symlink support
cmd/lakectl/cmd/* Updates root, local, fs_upload, and fs_download commands to propagate the symlink configuration

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THANKS! I think this implements a very dangerous requirement. I would really like to scope it to reduce the risk.

PreSign bool
HTTPClient *http.Client
PartSize int64
SymlinkSupport bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document this field: what does it do? (I know it "adds support for symlinks if true", I just don't know what "support for symlinks" means...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

part of pkg/local/config.go

tracker.UpdateTotal(0)
tracker.MarkAsDone()
}
return os.Symlink(symlinkTarget, dst)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incredibly dangerous, and I am blocking. If this is the user requirement, then we need to refine that requirement.

Sample attack

I add somewhere deep in the repository a symlink from .../write/target/_delta_log/_last_checkpoint to /Users/nopcoder/.profile. Now I convince you to copy some deltalake table over into .../write/target. You can no longer log in.

I can play this game whenever I can predict the name of a file. The destination file may or may not exist, depending on the local program I expect you to run.

What do other programs do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true for all symlinks, we create them like any other application (example: tar) and we do not follow them.
I understand that creating such a symlink can help an attacker, which is also true if I will upload a virus to lakefs and ask you to sync and run it.

As the user needs to opt-in, we can warn about it in the docs.
We can enable "safe_symlinks" where we verify that the links will not cross the root directory used to download/sync. This should reduce the risk for users who like to enable this feature.
But I don't think we need to do it now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course any program that creates symlinks is dangerous. lakectl local is much more dangerous, because an expected and encouraged usage pattern is for the user to edit the files they receive and reupload them.
This is rarely done by users of tar, for instance.
If we do this at all, we need to document safety properties from day 1. We also need to be very careful about what lakectl local will actually do in various "edge case" scenarios. I think we need a careful design, that lists these scenarios - I can think of multiple nasty results.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do this at all, we need to document safety properties from day 1. We also need to be very careful about what lakectl local will actually do in various "edge case" scenarios. I think we need a careful design, that lists these scenarios - I can think of multiple nasty results.

@nopcoder did we address this comment?

Copy link
Member

@N-o-Z N-o-Z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, overall code looks good!
Add some minor comments.
In general I think we need to have better definition of how we want to support symlinks in lakeFS and what that means in terms of boundaries.
While this makes more sense in operations such as fs download/upload, it raises some concerns in local commands.
For example - what does it mean to support a symlink in a synced directory in lakectl local?
Today local is scoped to the local directory cloning a remote path in lakeFS. What does it mean to have a symlink to a path outside this local path? How does it make sense when this lakeFS path is cloned under different systems??
This is of course, in addition to @arielshaqed's worrying security concerns
Requesting changes since I believe these should be addressed before we introduce these changes

Comment on lines 14 to 15
"github.com/treeverse/lakefs/pkg/api/apiutil"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"github.com/treeverse/lakefs/pkg/api/apiutil"
"github.com/treeverse/lakefs/pkg/api/apiutil"

Comment on lines 9 to 10
SymlinkMetadataKey = LakeFSMetadataPrefix + "symlink-target"
ClientMtimeMetadataKey = LakeFSMetadataPrefix + "client-mtime"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why we moved it into the api package. It's not being used here in any way.
Also, if we moved it here why didn't we move the permissions metadata key as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pkg/api/helpes/download.go using this constant and it was part of local package which included the helpers, so it was cyclic import.

will move the permission metadata key

PreSign bool
HTTPClient *http.Client
PartSize int64
SkipNonRegularFiles bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what are the benefits of taking this flag under consideration in fs download.
The only "non-regular" file we have in lakeFS is a symlink so the symlink support flag should be enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When SkipNonRegularFiles: true and SymlinkSupport: true I didn't want to download a file that represent symlink.
I didn't want to skip objects that represents symlinks by default.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are introducing 2 new flags to fs download were not considered previously. Both are under the local block in the lakectl config - I find that confusing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanted to support upload/download with the same capabilities. The skip non regular files was already part of local.

The alternatives will be not supporting symlink in upload / download operations or update the configuration structure to have skip_non_regular_files side by side with symlink_support on a different level.
Any suggestions?

@nopcoder nopcoder requested a review from a team June 28, 2025 16:20
@nopcoder
Copy link
Contributor Author

@treeverse/product can you review the change in the docs? howto/local-checkouts/

@nopcoder nopcoder requested review from N-o-Z and arielshaqed June 28, 2025 16:22
Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still worried about symlink attacks. Worried about the widening scope of my review here; I would prefer a design review for this feature, it is less trivial that it appears. (Historically all sync'ing programs, including cpio, tar, pax, git, rsync, rclone, and friends, have had multiple bugs and security bugs around symlink support. There's a lot for us to read.

@@ -25,4 +22,7 @@ type Config struct {
IncludePerm bool
IncludeUID bool
IncludeGID bool
// SymlinkSupport - When enabled, symlinks are uploaded and downloaded as symlinks. When disabled (default), symlinks are treated as regular files.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure: how do you treat symlinks "as a regular files"? Frankly, I would expect a refusal to sync them, and I think this is what we do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I'll fix the comment.

lakectl check the file type, for symlink it will fail unles skip non regular files are enabled. with symlink support on, and without skipping non-regular files we will read the symlink target.

tracker.UpdateTotal(0)
tracker.MarkAsDone()
}
return os.Symlink(symlinkTarget, dst)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course any program that creates symlinks is dangerous. lakectl local is much more dangerous, because an expected and encouraged usage pattern is for the user to edit the files they receive and reupload them.
This is rarely done by users of tar, for instance.
If we do this at all, we need to document safety properties from day 1. We also need to be very careful about what lakectl local will actually do in various "edge case" scenarios. I think we need a careful design, that lists these scenarios - I can think of multiple nasty results.

@@ -126,6 +127,9 @@ Use "lakectl local checkout..." to sync with the remote or run "lakectl local cl
}

func warnOnCaseInsensitiveDirectory(path string) {
if _, found := os.LookupEnv("LAKECTL_SUPPRESS_CASE_INSENSITIVE_WARNING"); found {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this part of this PR?

It is particularly worrying for symlinks, of course, because e.g. Windows behaves very nastily on these (also thinking about running lakectl local under WSL and/or Cygnus derivatives, for the complete compatibility nightmare).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I needed this one to run esti without getting this message as it comes out for all macOS users that use the recommended filesystem by apple. In order to record and compare golden files without this message I've added an option to hide it. can move to a different pr if you think if needed.
  • Windows users will not turn on symlink support and I don't see them using this feature. You need to opt in to use this one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed this one to run esti without getting this message as it comes out for all macOS users that use the recommended filesystem by apple. In order to record and compare golden files without this message I've added an option to hide it. can move to a different pr if you think if needed.

Ouch, I forgot about case-sensitive filesystems! There may be any number of attacks hiding on MacOS. Unless we block the feature on all case-insensitive filesystems then this means we need a careful secure design before we can deliver this feature. (I understand this means few users on MacOS will be able to use it; we still need to avoid adding security holes.)

Windows users will not turn on symlink support and I don't see them using this feature. You need to opt in to use this one.

But it will kinda-work and then fail. If it is not supported, let's explicitly error out.

// Create symlink instead of downloading file content
return os.Symlink(symlinkTarget, dst)
}
// fallthrough to download the object
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think my attack from above works even better, I don't need to "convince" you to do anything, because of an this line may introduce. Could you please test this scenario? (Well, maybe don't symlink to $HOME/.profile, in case it does work...).

  1. Enable symlink support for lakectl local.
  2. Sync to local.
  3. Create a symlink: ln -s $HOME/dot-profile evil.
  4. Commit it (to repo).
  5. Upload a regular object to the same path: echo bang | lakectl fs upload -s - lakefs://evil-repo/main/evil.
  6. Sync to the same local directory. Unless our 2 download* funcs are really carefully written, with strange behaviours, this overwrites your $HOME/dot-profile.

I think that on Linux you would need at least to change the os.Create call to os.OpenFile(name, os.O_RDWR|os.O_CREATE|os.O_TRUNC|os.O_NOFOLLOW, 0666). It's POSIX, so who knows - it might work on Darwin, possibly even in some cases on Windows.

But see how to attack this fix in the next comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry there is no os.O_NOFOLLOW - but I understand which flag you are talking about. It is OS specific.

The ln -s $HOME/dot-profile evil. This is true also without the symlink feature. When we write to evil we will override it. Not related to the fact that we also know how to create a symlink.
Saying that to diff between the feature and the danger that already exists and we can document and open issues to solve.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is entirely related. There is a huge difference if you typed

ln -s $HOME/.profile repo/evil
cd local
lakectl local sync

or if you typed just

lakectl local sync  # sync ariels' commit which creates a symlink evil -> $HOME/.profile
sleep 3600
lakectl local sync  # sync ariels' commit which now overwrites $HOME/.profile

In the second you didn't type anything dangerous.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About O_NOFOLLOW: you're right, it doesn't exist on POSIX. It does exist on both Linux and MacOS, so we might still win.

But we could still win with just POSIX! The spec says:

the open() function, when called with O_CREAT and O_EXCL, is required to fail with [EEXIST] if path names an existing symbolic link, even if the symbolic link refers to a nonexistent file. This behavior is required so that privileged applications can create a new file in a known location without the possibility that a symbolic link might cause the file to be created in a different location.

So if we always go through open(..., O_CREAT | O_EXCL) then we might be safe. We will still need to avoid concurrency along the path (i.e., get an in-process read lock on all paths to directories as we traverse them, and a write lock on path to every object to create it). And we would be unsafe for two concurrent lakectl local sync runs - but safe for a single run. We could even use POSIX advisory locks to do this - and then be safe for multiple concurrent runs.

Definitely something for a security design review.

// download object
var err error
if d.PreSign {
if d.PreSign && objectStat.SizeBytes != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay... so let's say you fixed the above issue. Naturally, O_NOFOLLOW only relates to whether the path is a symlink. That is, the last component on the path is a symlink. It does not block writing through to the symlink target when there is a symlink on the path (any previous component).

So let's also try this scenario once we add O_NOFOLLOW!

  1. Enable symlink support for lakectl local.
  2. Sync to local.
  3. Create a symlink: ln -s $HOME/ evil.
  4. Commit it (to repo).
  5. Upload a regular object under the same path: echo bang | lakectl fs upload -s - lakefs://evil-repo/main/evil/dot-profile. (to attack you I will use .../evil/.profile, of course).
  6. Sync to the same local directory. This should overwrite your $HOME/dot-profile.

I can think of multiple fixes for this bug, including not creating symlinks to directories (which can still be dangerous in a corporate environment, where things which were files can be remotely transformed into directories) and verifying that no component on the path is a symlink. But right now they all feel racy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, how is the above example related to symlink creation. It is another example of how evil can symlink can be and the above works without any of the changes in the PR.

Copy link
Member

@N-o-Z N-o-Z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still having stomach pains about this. I don't feel like any of my concerns were addressed. Many open questions regarding behavior of local commands with symlinks.
I believe we should have a requirement document in place.

nopcoder added 2 commits July 6, 2025 11:37
- check we don't download under symlink
- remove RemoveFile with os.Remove
@nopcoder
Copy link
Contributor Author

nopcoder commented Jul 6, 2025

Wanted to share some insights:

The following is using gsutil but it is the same for aws s3:

# the content of my bucket
$ gsutil ls -R gs://lakefs-barak/sync
gs://lakefs-barak/sync/:
gs://lakefs-barak/sync/
gs://lakefs-barak/sync/redirect.html.tmpl

gs://lakefs-barak/sync/ttt/:
gs://lakefs-barak/sync/ttt/
gs://lakefs-barak/sync/ttt/image.png

# the content of local directory (/Users/barak/tmp/sync/test)
$ ls -l
total 0
lrwxr-xr-x  1 barak  staff  3 Jul  6 22:14 ttt@ -> ../

# copy the bucket into local
$ gsutil cp -r 'gs://lakefs-barak/sync/*' .
Copying gs://lakefs-barak/sync/redirect.html.tmpl...
Copying gs://lakefs-barak/sync/ttt/image.png...
- [2 files][  3.6 MiB/  3.6 MiB]
Operation completed over 2 objects/3.6 MiB.

# content of local folder
$ ls -l
total 8
-rw-r--r--  1 barak  staff  385 Jul  6 22:18 redirect.html.tmpl
lrwxr-xr-x  1 barak  staff    3 Jul  6 22:14 ttt -> ../

# the image under ttt is found on the upper folder

@nopcoder
Copy link
Contributor Author

nopcoder commented Jul 6, 2025

Updated the code to make it safer based on the above comments:

  • before we write content, delete the directory - there will be no truncate so we will not write into link to a file
  • before we download and check the destination directory, in case the directory is symlink, fail the download. this means this make 'lakectl local' never write to a symlink

@nopcoder
Copy link
Contributor Author

nopcoder commented Jul 6, 2025

@N-o-Z @arielshaqed
While testing lakectl local with symlinks and worried I an issue with my changes, but there is a general issue with how we sync.

The tldr is that we stream a list of changes to a pool of workers, so even if the changes comes in logical order it doesn't matter and we may fail.

In my tests I clone a repo twice each time I make a change on one folder, commit and pull on the other folder.
The flow will probably explain this one will be:

  1. Create a folder called ttt and add 5 files inside
  2. Commit and pull on the other folder
  3. Delete the folder ttt and create a file with the same name
  4. Commit and pull on the other folder

Now it all depends on your luck. if you have enough parallel and compute power.
You may see that it will delete all the files under ttt and add ttt as a file.
You may see it fail as it will try to add ttt while ttt is still a folder trying to clean from all the files.
As we list the diff, usually I see the delete comes first - I assume it is because of the merge algorithm scanning all the files under ttt/ which no longer exists and this is why we are getting the delete operations first. but at the end we the changes in the channel are distributed between the workers, so it doesn't help.

I had the same issue while trying to switch between real folder and symlink - this is how I got to the above.

@nopcoder nopcoder requested review from N-o-Z and arielshaqed July 9, 2025 06:28
@nopcoder
Copy link
Contributor Author

nopcoder commented Jul 9, 2025

I would like your input about how the new code writes the data and avoids writings to symlinks.
when the feature is enabled we capture the state and try (unless I missed something) cause issues to the user using symlinks. in some cases more than what aws s3 does.

@arielshaqed
Copy link
Contributor

I don't understand the scope of this comment. It sounds to me like there is a race during sync (I understand from the words "luck" and "parallel and compute power). Let's open an issue to fix that.

Does it affect this PR?

  • I hope it is completely unrelated - then we can pull this and fix that in whatever order.
  • It could be blocking for this PR - then we fix that on another PR and then we can continue with this PR.
  • I might simply be completely wrong here - then I will have to understand what it means for this PR.

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I continue to block. I do not understand how this feature, as implemented, can be safe. It seems likely that each of us has a different threat model. Unfortunately it seems we are stuck. I suggest we agree on a design for this feature with explicit security.

@@ -126,6 +127,9 @@ Use "lakectl local checkout..." to sync with the remote or run "lakectl local cl
}

func warnOnCaseInsensitiveDirectory(path string) {
if _, found := os.LookupEnv("LAKECTL_SUPPRESS_CASE_INSENSITIVE_WARNING"); found {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed this one to run esti without getting this message as it comes out for all macOS users that use the recommended filesystem by apple. In order to record and compare golden files without this message I've added an option to hide it. can move to a different pr if you think if needed.

Ouch, I forgot about case-sensitive filesystems! There may be any number of attacks hiding on MacOS. Unless we block the feature on all case-insensitive filesystems then this means we need a careful secure design before we can deliver this feature. (I understand this means few users on MacOS will be able to use it; we still need to avoid adding security holes.)

Windows users will not turn on symlink support and I don't see them using this feature. You need to opt in to use this one.

But it will kinda-work and then fail. If it is not supported, let's explicitly error out.

Comment on lines +510 to +514
if strings.HasPrefix(k, apiutil.LakeFSMetadataPrefix) {
metaKey = apiutil.LakeFSHeaderInternalPrefix + k[len(apiutil.LakeFSMetadataPrefix):]
} else {
metaKey = apiutil.LakeFSHeaderMetadataPrefix + k
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand. This encodes a potential solution for #9089, but it seems different from the proposal to resolve it. Even if this turns out to be the same as that proposal, we should make sure we accept that one before we proceed with implementing it here.

If that issue is blocking, we should resolve #9089 and then implement what we decided. The current code looks like it will introduce another incompatibility around object metadata.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment remains open. Since I believe this PR does not depend on this fix, consider separating it and resolving each issue separately. That way neither issue holds up the other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arielshaqed sorry I missed this one.

The current lakectl upload implementation use the following code, which I didn't modified and use to upload object with metadata:

resp, err := client.UploadObjectWithBodyWithResponse(ctx, repoID, branchID, &apigen.UploadObjectParams{
Path: objPath,
}, mpContentType, pr, func(ctx context.Context, req *http.Request) error {
var metaKey string
for k, v := range metadata {
lowerKey := strings.ToLower(k)
if strings.HasPrefix(lowerKey, apiutil.LakeFSMetadataPrefix) {
metaKey = apiutil.LakeFSHeaderInternalPrefix + lowerKey[len(apiutil.LakeFSMetadataPrefix):]
} else {
metaKey = apiutil.LakeFSHeaderMetadataPrefix + lowerKey
}
req.Header.Set(metaKey, v)
}
return nil
})

This comment is on the esti integration test where I try to mimic the current implementation in order to have the same result when we upload object and set user metadata.

// Create symlink instead of downloading file content
return os.Symlink(symlinkTarget, dst)
}
// fallthrough to download the object
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is entirely related. There is a huge difference if you typed

ln -s $HOME/.profile repo/evil
cd local
lakectl local sync

or if you typed just

lakectl local sync  # sync ariels' commit which creates a symlink evil -> $HOME/.profile
sleep 3600
lakectl local sync  # sync ariels' commit which now overwrites $HOME/.profile

In the second you didn't type anything dangerous.

// Create symlink instead of downloading file content
return os.Symlink(symlinkTarget, dst)
}
// fallthrough to download the object
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About O_NOFOLLOW: you're right, it doesn't exist on POSIX. It does exist on both Linux and MacOS, so we might still win.

But we could still win with just POSIX! The spec says:

the open() function, when called with O_CREAT and O_EXCL, is required to fail with [EEXIST] if path names an existing symbolic link, even if the symbolic link refers to a nonexistent file. This behavior is required so that privileged applications can create a new file in a known location without the possibility that a symbolic link might cause the file to be created in a different location.

So if we always go through open(..., O_CREAT | O_EXCL) then we might be safe. We will still need to avoid concurrency along the path (i.e., get an in-process read lock on all paths to directories as we traverse them, and a write lock on path to every object to create it). And we would be unsafe for two concurrent lakectl local sync runs - but safe for a single run. We could even use POSIX advisory locks to do this - and then be safe for multiple concurrent runs.

Definitely something for a security design review.

@nopcoder
Copy link
Contributor Author

Created and assigned both of you to the design that explains the above implementation - https://github.com/treeverse/dev/pull/489

@nopcoder
Copy link
Contributor Author

@arielshaqed @N-o-Z after the design document merged can you review the above - the latest commits I've addressed the destination path that include symlink.

@nopcoder nopcoder requested a review from arielshaqed July 21, 2025 16:54
Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THANKS!

Some comments remain, nothing blocking.

Comment on lines +510 to +514
if strings.HasPrefix(k, apiutil.LakeFSMetadataPrefix) {
metaKey = apiutil.LakeFSHeaderInternalPrefix + k[len(apiutil.LakeFSMetadataPrefix):]
} else {
metaKey = apiutil.LakeFSHeaderMetadataPrefix + k
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment remains open. Since I believe this PR does not depend on this fix, consider separating it and resolving each issue separately. That way neither issue holds up the other.

@@ -214,3 +214,68 @@ func TestPruneEmptyDirectories(t *testing.T) {
})
}
}

func TestVerifyNoSymlinksInPath(t *testing.T) {
root := t.TempDir()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add another test for a path which is not under root. The result is a bit odd: it can discover that is is safe or unsafe (depending on where t.TempDir() lives; for instance /tmp/ is sometimes a symlink...). What it must not do is go into an infinite loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

created something like /a/b/c where b->/d

@@ -134,6 +134,7 @@ func NewProgressPool() *ProgressPool {

type readerAtSeeker interface {
io.Seeker
io.Reader
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change here (in this PR)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a commit to revert the change.

And I'll explain - While this PR was open this one was created and merged:
#9283

Because in the code I wanted to provide empty content or the file itself wanted to point to a single content element, before the PR, the type was io.ReadSeeker this PR created custom type type that support ReaderAt and Seeker - so I no longer cloud take a single content type and apply it on the wrapper 'reader' and 'file' members (no longer have common ground).

but now it is just a nice story - Ive assigned the actual type to both members, reader and file and later I wrap the reader with the progress bar.

@@ -218,14 +219,19 @@ func (s *SyncManager) download(ctx context.Context, rootPath string, remote *uri
return err
}

// In all of the below lines of code, we purposefully do not use the Join methods in order to avoid the path cleaning they perform
// In all the below lines of code, we purposefully do not use the Join methods in order to avoid the path cleaning they perform
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is pretty much the opposite of what the added l.231 does; so why do we still do it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment I've updated reference the fact that

	destination := filepath.ToSlash(fmt.Sprintf("%s%c%s", rootPath, filepath.Separator, p))

At line 231 I'm using the destinationDirectory as the rest of the code.

	if err := fileutil.VerifyNoSymlinksInPath(destinationDirectory, rootPath); err != nil {

I'm not sure what do you mean the opposite as the previous code already used the destinationDirectory is is a cleaned path - if that what you mean?

@nopcoder
Copy link
Contributor Author

@arielshaqed updated code/comments with on the latest review

Copy link
Member

@N-o-Z N-o-Z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Sorry it got delayed

@nopcoder nopcoder merged commit 1d347d0 into master Jul 24, 2025
42 checks passed
@nopcoder nopcoder deleted the task/lakectl-symlink branch July 24, 2025 11:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
include-changelog PR description should be included in next release changelog lakectl-local
Projects
None yet
Development

Successfully merging this pull request may close these issues.

lakectl support in symlinks
3 participants