-
Notifications
You must be signed in to change notification settings - Fork 1.2k
feat: ignore files not in remote when push is false #10749
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: ignore files not in remote when push is false #10749
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #10749 +/- ##
==========================================
+ Coverage 90.68% 91.10% +0.41%
==========================================
Files 504 504
Lines 39795 39997 +202
Branches 3141 3160 +19
==========================================
+ Hits 36087 36438 +351
+ Misses 3042 2934 -108
+ Partials 666 625 -41 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
a3fe1e0
to
4537674
Compare
4537674
to
b5a6a58
Compare
@skshetry, have you had time to look at this? This feature would be really great for our team! |
data_status_parser.add_argument( | ||
"--respect-no-push", | ||
action="store_true", | ||
default=False, | ||
help="Respect the `push: false` flag in the DVC stage's outs.", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can enable this by default. dvc status
does the same.
See:
not_in_remote = uncommitted_diff.pop("not_in_remote", []) | ||
|
||
if respect_no_push: | ||
logger.debug("Filtering out paths that are not pushable") | ||
not_in_remote = _filter_out_push_false_outs(repo, not_in_remote) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have to extract out the "not_in_remote" checks from above _diff()
, as it has nothing to do with diff between worktree and committed changes (index). We need to calculate only for the given index (commited changes).
That is what repo.index
is. It's an index of committed changes. You can filter that index to a view, using worktree_view
:
Line 63 in 549821e
def worktree_view( |
if not_in_remote:
view = worktree_view(repo.index, push=True)
# ... existing logic
push=True
gives us
Line 73 in 549821e
push: Whether the view should be restricted to pushable data only. |
You can get access to DataIndex
using view.data["repo"]
. And then use index.iteritems(shallow=not granular)
on it.
Let me know if you need help. I can take over the PR if you prefer that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can extract the above not_in_remote
stuff. It is using Change
type, but you can make minor modification to it to instead use Entry
type. change.old
and change.new
are just Entry
type.
Something like follows, maybe:
data_index = view.data["repo"]
for key, entry in data_index.iteritems(shallow=not granular):
if not (entry and entry.hash_info):
continue
k = (*key, "") if entry.meta and entry.meta.isdir else key
try:
if not index.storage_map.remote_exists(entry, refresh=remote_refresh):
yield os.path.sep.join(k)
except StorageError:
pass
@@ -62,6 +62,7 @@ def fill_stage_outputs(stage, **kwargs): | |||
"plots_persist_no_cache", | |||
"outs_no_cache", | |||
"outs", | |||
"outs_no_push", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not add these please. These are also used in CLIs.
I'll suggest a different way to create a stage in tests below with push=False
.
dvc.stage.add( | ||
name="create-foo", cmd="echo foo > foo", deps=["fixed"], outs_no_push=["foo"] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit verbose, but it's better than adding outs_no_push
arg to stage.add()
function.
dvc.stage.add( | |
name="create-foo", cmd="echo foo > foo", deps=["fixed"], outs_no_push=["foo"] | |
) | |
stage = dvc.stage.create( | |
name="create-foo", cmd="echo foo > foo", deps=["fixed"], outs=["foo"] | |
) | |
stage.outs[0].can_push = False | |
stage.dump() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, you don't have to create a pipeline stage, which can be slower.
dvc.stage.add( | |
name="create-foo", cmd="echo foo > foo", deps=["fixed"], outs_no_push=["foo"] | |
) | |
stage = dvc.stage.create(single_stage=True, outs=["foo"]) | |
stage.outs[0].can_push = False | |
stage.dump() |
assert set( | ||
dvc.data_status(remote_refresh=True, not_in_remote=True)["not_in_remote"] | ||
) == {"foo", "bar"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move all of these steps into the test itself. If we get rid of --respect-no-push
, we'd only need one single test.
Fixes #10317
Enables opt-in to remove
push: false
stage outputs fromnot_in_remote
data status results.Notable changes:
outs_no_push
todvc.stage.utils.fill_stage_outputs
keys, to facilitate making outputs withpush: false
.status
, when flag enabled, filter through files reported asnot_in_remote
, and remove them if notcan_push
.--respect-no-push
flag to CLIOpen to suggestions on how to make the flag names more intuitive!
Corresponding PR for the docs: iterative/dvc.org#5373
❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible. 🙏