Closed
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,13 @@
# Changelog

## [Unreleased]

- Enhanced _Poll files_ node with configurable detection mode
- **Name + metadata mode** (default): Detects files as new when name, lastModified, size, or etag changes. Use for files that are overwritten/replaced with same name (e.g., daily status files).
- **Name only mode**: Only detects truly new filenames (original behavior)
- Added third output for deleted files
- Updated documentation with examples for both detection modes

## [1.4.1] - 2025-11-16

- Bugfix: Don't use absolute paths for internal API calls for config setup
92 changes: 74 additions & 18 deletions docs/seqera_nodes/poll_files.md
@@ -1,8 +1,8 @@
# Poll files

**Periodically list a Seqera Data Explorer Data Link and emit messages when new objects appear.**
**Periodically list a Seqera Data Explorer Data Link and emit messages when new or modified objects appear.**

This node automatically monitors a Data Link for _changes_, making it perfect for event-driven workflows that trigger when new files are uploaded.
This node automatically monitors a Data Link for _changes_, making it perfect for event-driven workflows that trigger when new files are uploaded or existing files are modified.

!!! note

@@ -21,26 +21,32 @@ This node automatically monitors a Data Link for _changes_, making it perfect fo

- **Seqera config**: Reference to the seqera-config node containing API credentials and default workspace settings.
- **Node name**: Optional custom name for the node in the editor.
- **Poll frequency** (default **15 min**): Interval between polls.
- **Detection mode** (default **Name + metadata**): How to detect new files:
- **Name + metadata (detect changes)**: Detects files as new when name, lastModified, size, or etag changes. Use this when files are overwritten or replaced with the same name (e.g., daily status files).
- **Name only (detect new files)**: Only detects truly new filenames. Files with the same name are ignored after first detection (original behavior).
- **Data Link name** (required): Display name of the Data Link. Supports autocomplete.
- **Base path**: Path within the Data Link to start from.
- **Prefix**: Prefix filter applied to both files and folders.
- **Pattern**: Regular-expression filter applied to files after the prefix filter.
- **Return type** (default **files**): `files`, `folders` or `all`.
- **Max results** (default **100**): Maximum number of objects to return per poll.
- **Depth** (default **0**): Folder recursion depth.
- **Poll frequency** (default **15 min**): Interval between polls.
- **Workspace ID**: Override the workspace ID from the Config node.

All properties work the same as the [list files](list_files.md) node, plus automatic polling.

## Outputs (two)
## Outputs (three)

The node has two outputs that fire at different times:
The node has three outputs that fire at different times:

1. **All results** – Emitted every poll with the full, filtered list of files.
2. **New results** – Emitted only when one or more _new_ objects are detected since the last poll.
2. **New or modified** – Emitted only when one or more _new or changed_ objects are detected since the last poll. Behavior depends on the detection mode:
- **Name + metadata mode**: Fires when files are new OR when existing files have changed metadata (lastModified, size, or etag).
- **Name only mode**: Fires only for truly new filenames that haven't been seen before.
3. **Deleted** – Emitted only when files that were present in the previous poll are no longer found in the current poll.

Both messages include the same properties:
All messages include the same properties:

- `msg.payload.files` – Array of file objects from the API.
- `msg.payload.resourceType`, `msg.payload.resourceRef`, `msg.payload.provider` – Data Link metadata.
@@ -49,18 +55,45 @@ Both messages include the same properties:

## How new files are detected

The node tracks seen files in its context storage. On each poll:
The node tracks seen files in its internal state. On each poll:

1. Fetch the current list of files from the Data Link
2. Compare against the list from the previous poll
3. If new files are found, emit them on output 2
4. Update the stored list for the next comparison
3. If new, modified, or deleted files are found, emit them on the appropriate outputs
4. Update the stored state for the next comparison

The comparison behavior depends on the **detection mode**:

### Name + metadata mode (default)

Files are identified by a combination of name, lastModified timestamp, size, and etag. A file is considered "new or modified" if:

- It has a filename that wasn't seen before, OR
- It has the same filename but different lastModified, size, or etag values

**Use case**: Daily status files, completion markers, or any files that are overwritten/replaced with the same name but should trigger on each update.

**Example**: A file named `RTAComplete.txt` is deposited every night with a new timestamp. Each deposit will trigger output 2.

### Name only mode

Files are identified by name only. A file is considered "new" only if:

- It has a filename that wasn't seen before

The comparison is based on the full file path. Files that are deleted and re-uploaded will be detected as "new".
Files with the same name are ignored after the first detection, regardless of any metadata changes.

**Use case**: Monitoring for truly new files where you don't care about modifications to existing files.

**Example**: Monitoring an upload directory where each sequencing run has a unique filename.

### Deletion detection (both modes)

Files are considered "deleted" when they were present in the previous poll but are not found in the current poll. This happens in both detection modes and fires on output 3.
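
The comparison described in this section can be sketched in plain JavaScript. This is a minimal illustration rather than the node's source: `identify` and `diffPolls` are hypothetical names, the sample file objects are invented, and the sketch assumes each file object carries a `name` plus optional `lastModified`, `size`, and `etag` fields.

```javascript
// Hypothetical sketch of the poll-to-poll comparison; not the node's actual code.
function identify(item, mode) {
  if (mode === "name") return item.name;
  // "metadata" mode: name plus whichever metadata fields are present
  const parts = [item.name];
  if (item.lastModified) parts.push(item.lastModified); // skipped when absent
  if (item.size != null) parts.push(String(item.size));
  if (item.etag) parts.push(item.etag);
  return parts.join("|");
}

function diffPolls(previous, current, mode = "metadata") {
  // First poll: there is nothing to compare against, so report nothing
  if (previous === null) return { newOrModified: [], deleted: [] };
  const seen = new Set(previous.map((it) => identify(it, mode)));
  const currentNames = new Set(current.map((it) => it.name));
  return {
    newOrModified: current.filter((it) => !seen.has(identify(it, mode))),
    // Deletion is judged by name alone, in both modes
    deleted: previous.filter((it) => !currentNames.has(it.name)),
  };
}

// Invented sample data for two consecutive polls
const poll1 = [
  { name: "RTAComplete.txt", size: 10, etag: "a1" },
  { name: "old.bam", size: 500, etag: "b1" },
];
const poll2 = [
  { name: "RTAComplete.txt", size: 12, etag: "a2" }, // overwritten in place
  { name: "new.bam", size: 700, etag: "c1" }, // added; old.bam is gone
];

const byMetadata = diffPolls(poll1, poll2, "metadata");
const byName = diffPolls(poll1, poll2, "name");
console.log(byMetadata.newOrModified.map((it) => it.name)); // [ 'RTAComplete.txt', 'new.bam' ]
console.log(byName.newOrModified.map((it) => it.name)); // [ 'new.bam' ]
console.log(byMetadata.deleted.map((it) => it.name)); // [ 'old.bam' ]
```

In this sketch the overwritten `RTAComplete.txt` is reported only in metadata mode, while the missing `old.bam` is reported as deleted regardless of mode.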

!!! info

The very first poll after the node is created sees everything as new and is handled as a special case. It does not output new results.
The very first poll after the node is created sees everything as new and is handled as a special case. It does not output new results on output 2.
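
The first-poll special case can be pictured with a small stateful sketch. It is an assumption-laden illustration, not the node's implementation: `createTracker` is a hypothetical helper, it tracks plain file names only, and it returns a three-slot array in the same order as the node's outputs, where `null` stands for "send nothing on this output" (the usual Node-RED convention for multi-output message arrays).

```javascript
// Hypothetical helper: models outputs 1-3 as a three-slot array.
function createTracker() {
  let previousNames = null; // stays null until the first poll has run
  return function poll(names) {
    const current = new Set(names);
    let added = null;
    let deleted = null;
    if (previousNames !== null) {
      const addedNames = names.filter((n) => !previousNames.has(n));
      const deletedNames = [...previousNames].filter((n) => !current.has(n));
      if (addedNames.length) added = addedNames;
      if (deletedNames.length) deleted = deletedNames;
    }
    previousNames = current; // becomes the baseline for the next poll
    return [names, added, deleted];
  };
}

const poll = createTracker();
const first = poll(["a.txt", "b.txt"]);
const second = poll(["b.txt", "c.txt"]);
console.log(first); // [ [ 'a.txt', 'b.txt' ], null, null ]
console.log(second); // [ [ 'b.txt', 'c.txt' ], [ 'c.txt' ], [ 'a.txt' ] ]
```

Because the tracker state lives in a closure variable, creating a new tracker (the analogue of a restart or redeploy) starts again from an empty baseline.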

## Required permissions

@@ -74,22 +107,45 @@ See the [configuration documentation](configuration.md#required-token-permission

1. Add a **poll-files** node and configure the Data Link
2. Set **pollFrequency** to your desired interval (e.g., `5:00` for 5 minutes)
3. Connect output 2 (New results) to a **workflow-launch** node
4. Configure the launch node to use the file paths from `msg.files`
5. Deploy
3. Set **Detection mode** based on your use case:
- Use **Name + metadata** if files can be overwritten/replaced
- Use **Name only** if all files have unique names
4. Connect output 2 (New or modified) to a **workflow-launch** node
5. Configure the launch node to use the file paths from `msg.files`
6. Deploy

Now every time a new file appears in the Data Link, a workflow will automatically launch.
Now every time a new or modified file appears in the Data Link, a workflow will automatically launch.

### Trigger only on specific file types

1. Set **pattern**: `.*\.bam$` to only detect BAM files
2. Connect output 2 to your processing logic
3. The node will only emit when new BAM files appear
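
To check what the BAM pattern above would match before deploying, it can be tried against a few names in plain JavaScript. This sketch assumes the node applies the pattern with standard JavaScript `RegExp` semantics to each file name; the file names are invented for illustration.

```javascript
// Invented file names; the pattern is the one from step 1 above.
const candidateNames = ["sample.bam", "sample.bam.bai", "sample_bam", "reads.fastq.gz"];

// In a JavaScript string literal the backslash itself must be escaped
const bamPattern = new RegExp(".*\\.bam$");

const bamMatches = candidateNames.filter((n) => bamPattern.test(n));
console.log(bamMatches); // [ 'sample.bam' ]
```

The trailing `$` is what excludes index files such as `sample.bam.bai`, and the escaped dot is what excludes `sample_bam`.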

### Monitor for status file updates

Use this pattern when a completion marker file is deposited with the same name but new timestamp:

1. Set **Detection mode** to **Name + metadata**
2. Set **Pattern**: `RTAComplete\.txt$` to match only the status file
3. Connect output 2 to your workflow trigger
4. Each time the file is deposited (even with the same name), the workflow will trigger

Comment on lines +125 to +133
Member

I think that this one is a bit obvious

Suggested change
### Monitor for status file updates
Use this pattern when a completion marker file is deposited with the same name but new timestamp:
1. Set **Detection mode** to **Name + metadata**
2. Set **Pattern**: `RTAComplete\.txt$` to match only the status file
3. Connect output 2 to your workflow trigger
4. Each time the file is deposited (even with the same name), the workflow will trigger

### Clean up on file deletion

Monitor for files being removed from a Data Link:

1. Configure the poll node as normal
2. Connect output 3 (Deleted) to a notification or cleanup workflow
3. When files disappear from the Data Link, the third output fires with the list of deleted files
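
One way to act on output 3 is a function node that turns the deleted-file message into a short notice. The sketch below is hypothetical: `buildDeletionNotice` is not part of this package, the example message values are invented, and it relies only on the `msg.files` and `msg.payload.resourceRef` properties described above.

```javascript
// Hypothetical helper for a function node wired to output 3 (Deleted).
function buildDeletionNotice(msg) {
  const paths = Array.isArray(msg.files) ? msg.files : [];
  return {
    ...msg, // keep custom properties such as msg._context
    payload: {
      count: paths.length,
      text: `${paths.length} file(s) removed from ${msg.payload.resourceRef}:\n${paths.join("\n")}`,
    },
  };
}

// Invented example of a message arriving from output 3
const exampleMsg = {
  _context: { run: "demo" },
  payload: {
    files: [{ name: "old.bam" }],
    resourceType: "bucket",
    resourceRef: "s3://example-bucket",
    provider: "aws",
  },
  files: ["s3://example-bucket/old.bam"],
};

const notice = buildDeletionNotice(exampleMsg);
console.log(notice.payload.text);
```

Inside a function node, the body would reduce to `return buildDeletionNotice(msg);`, with the result wired to an email, chat, or cleanup flow.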

## Notes

- The first poll after deployment/restart does **not** emit to the "New results" output (it initializes the tracking state)
- The tracking is reset on each Node-RED restart or flow redeployment
- The first poll after deployment/restart does **not** emit to the "New or modified" output (output 2) – it initializes the tracking state
- The tracking state is reset on each Node-RED restart or flow redeployment
- Choose the appropriate **detection mode** for your use case:
- Use **Name + metadata** when files can be overwritten or replaced (e.g., status files, daily reports)
- Use **Name only** when all files have unique names and you don't care about modifications
- Very frequent polling (< 30 seconds) may impact API rate limits
- Custom message properties are preserved in outputs (e.g., `msg._context`)
- Large Data Links with deep recursion may take time to process on each poll
33 changes: 25 additions & 8 deletions nodes/datalink-poll.html
@@ -20,6 +20,13 @@
<option value="days">Days</option>
</select>
</div>
<div class="form-row">
<label for="node-input-detectionMode"><i class="fa fa-search"></i> Detection mode</label>
<select id="node-input-detectionMode">
<option value="metadata">Name + metadata (detect changes)</option>
<option value="name">Name only (detect new files)</option>
</select>
</div>

<!-- Data link parameters (same as list node) -->
<div class="form-row">
@@ -67,6 +74,7 @@
### Inputs

: pollFrequency (string) : Poll frequency (default `15 minutes`). Can be configured in seconds, minutes, hours, or days.
: detectionMode (string) : Detection mode for new files (default `Name + metadata`). **Name + metadata** detects files as new when name, lastModified, size, or etag changes (use for files that are overwritten/replaced with same name). **Name only** detects only truly new filenames (original behavior).
: dataLinkName (string) : The name of the data explorer link.
: basePath (string) : Path within the data link to start browsing. Leave blank for the root.
: prefix (string) : Optional prefix filter for results (applies to folders and files)
@@ -79,15 +87,21 @@

### Outputs

The node has two outputs:
The node has three outputs:

1. All results on every poll.
2. New objects since the previous poll (nothing sent if no new objects).
1. **All results** - Fires on every poll with all current files.
2. **New or modified** - Behavior depends on detection mode:
- **Name + metadata**: Fires when files are new OR when existing files have changed metadata (lastModified, size, or etag). Files with the same name but different timestamps are detected as new.
- **Name only**: Fires only for truly new filenames that haven't been seen before.
3. **Deleted** - Fires only when files that were present in the previous poll are no longer found.

Both outputs have the following properties:
All outputs have the following properties:

: payload (array) : Fle information aggregated from the API (array of objects).
: files (array) : File names (array of strings).
: payload.files (array) : File information aggregated from the API (array of objects).
: payload.resourceType (string) : Type of the Data Link resource.
: payload.resourceRef (string) : Resource reference path.
: payload.provider (string) : Cloud provider name.
: files (array) : File paths as strings (array).

All typed-input fields are identical to the _List files_ node with the addition of **poll frequency**.
</script>
@@ -97,14 +111,14 @@
category: "seqera",
color: "#A9A1C6",
inputs: 0,
outputs: 2,
outputs: 3,
icon: "icons/data-explorer.svg",
align: "left",
paletteLabel: "Poll files",
label: function () {
return this.name || "Poll files";
},
outputLabels: ["All objects", "Only new objects"],
outputLabels: ["All objects", "New or modified", "Deleted"],
defaults: {
name: { value: "" },
seqera: { value: "", type: "seqera-config" },
@@ -128,6 +142,7 @@
pollFrequency: { value: "15" },
pollUnits: { value: "minutes" },
returnType: { value: "files" },
detectionMode: { value: "metadata" },
},
oneditprepare: function () {
function ti(id, val, type, def = "str") {
@@ -149,6 +164,7 @@
$("#node-input-pollFrequency").val(this.pollFrequency || "15");
$("#node-input-pollUnits").val(this.pollUnits || "minutes");

$("#node-input-detectionMode").val(this.detectionMode || "metadata");
$("#node-input-returnType").val(this.returnType || "files");

// Add auto-complete for datalink name when type is "str"
@@ -260,6 +276,7 @@
this.pollFrequency = $("#node-input-pollFrequency").val();
this.pollUnits = $("#node-input-pollUnits").val();

this.detectionMode = $("#node-input-detectionMode").val();
this.returnType = $("#node-input-returnType").val();
},
});
71 changes: 60 additions & 11 deletions nodes/datalink-poll.js
@@ -30,6 +30,7 @@ module.exports = function (RED) {
node.depthProp = config.depth;
node.depthPropType = config.depthType;
node.returnType = config.returnType || "files"; // files|folders|all
node.detectionMode = config.detectionMode || "metadata"; // name|metadata

// Poll frequency configuration
const unitMultipliers = {
@@ -54,8 +55,26 @@ module.exports = function (RED) {
return `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())} ${d.toLocaleTimeString()}`;
};

// Internal cache of previously seen object names
let previousNamesSet = null;
// Helper to create unique identifier for a file
// Mode "name": Only uses filename (original behavior - only detect truly new files)
// Mode "metadata": Uses name + lastModified/size/etag (detect changes to existing files)
const getFileIdentifier = (item) => {
if (node.detectionMode === "name") {
return item.name;
}
// metadata mode (default)
const parts = [item.name];
if (item.lastModified) parts.push(item.lastModified);
Member

We think that item.lastModified was probably hallucinated.

if (item.size != null) parts.push(String(item.size));
if (item.etag) parts.push(item.etag);
return parts.join("|");
};

// Internal cache of previously seen objects
// Store both identifiers (for change detection) and a map by name (for deletion detection)
let previousIdentifiersSet = null;
let previousItemsMap = null;
let intervalId = null;

// Polling function
const executePoll = async () => {
@@ -78,28 +97,57 @@ module.exports = function (RED) {
files: result.files.map((it) => `${result.resourceRef}/${it}`),
};

// Second output: only new items since previous poll
// Build current state for comparison
const currentIdentifiers = new Set(result.items.map(getFileIdentifier));
const currentNameToItem = new Map(result.items.map((it) => [it.name, it]));

// Second output: new or modified items since previous poll
let msgNew = null;
if (previousNamesSet) {
const newItems = result.items.filter((it) => !previousNamesSet.has(it.name));
if (newItems.length) {
if (previousIdentifiersSet) {
const newOrModified = result.items.filter((it) => !previousIdentifiersSet.has(getFileIdentifier(it)));
if (newOrModified.length) {
msgNew = {
...pollMsg,
payload: {
files: newOrModified,
resourceType: result.resourceType,
resourceRef: result.resourceRef,
provider: result.provider,
},
files: newOrModified.map((it) => `${result.resourceRef}/${it.name}`),
};
}
}

// Third output: deleted items (present in previous poll but not current)
let msgDeleted = null;
if (previousItemsMap) {
const deletedItems = [];
for (const [name, item] of previousItemsMap.entries()) {
if (!currentNameToItem.has(name)) {
deletedItems.push(item);
}
}
if (deletedItems.length) {
msgDeleted = {
...pollMsg,
payload: {
files: newItems,
files: deletedItems,
resourceType: result.resourceType,
resourceRef: result.resourceRef,
provider: result.provider,
},
files: newItems.map((it) => `${result.resourceRef}/${it.name}`),
files: deletedItems.map((it) => `${result.resourceRef}/${it.name}`),
};
}
}

// Update cache
previousNamesSet = new Set(result.items.map((it) => it.name));
previousIdentifiersSet = currentIdentifiers;
previousItemsMap = currentNameToItem;

node.status({ fill: "green", shape: "dot", text: `${result.items.length} items: ${formatDateTime()}` });
node.send([msgAll, msgNew]);
node.send([msgAll, msgNew, msgDeleted]);
} catch (err) {
node.error(`Seqera datalink poll failed: ${err.message}`);
node.status({ fill: "red", shape: "dot", text: `error: ${formatDateTime()}` });
@@ -109,7 +157,7 @@ module.exports = function (RED) {
// Start the polling interval
if (node.seqeraConfig && config.dataLinkName && config.dataLinkName.trim() !== "") {
const intervalMs = node.pollFrequencySec * 1000;
const intervalId = setInterval(executePoll, intervalMs);
intervalId = setInterval(executePoll, intervalMs);
// run once immediately
executePoll();
}
@@ -146,6 +194,7 @@ module.exports = function (RED) {
returnType: { value: "files" },
// poll specific
pollFrequency: { value: "15" },
detectionMode: { value: "metadata" },
},
});
};