diff --git a/CHANGELOG.md b/CHANGELOG.md
index bccbb5b..0590ab5 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,13 @@
 # Changelog
 
+## [Unreleased]
+
+- Enhanced _Poll files_ node with configurable detection mode
+  - **Name + metadata mode** (default): Detects files as new when name, lastModified, size, or etag changes. Use for files that are overwritten/replaced with same name (e.g., daily status files).
+  - **Name only mode**: Only detects truly new filenames (original behavior)
+  - Added third output for deleted files
+  - Updated documentation with examples for both detection modes
+
 ## [1.4.1] - 2025-11-16
 
 - Bugfix: Don't use absolute paths for internal API calls for config setup
diff --git a/docs/seqera_nodes/poll_files.md b/docs/seqera_nodes/poll_files.md
index be7c93c..02fc233 100644
--- a/docs/seqera_nodes/poll_files.md
+++ b/docs/seqera_nodes/poll_files.md
@@ -1,8 +1,8 @@
 # Poll files
 
-**Periodically list a Seqera Data Explorer Data Link and emit messages when new objects appear.**
+**Periodically list a Seqera Data Explorer Data Link and emit messages when new or modified objects appear.**
 
-This node automatically monitors a Data Link for _changes_, making it perfect for event-driven workflows that trigger when new files are uploaded.
+This node automatically monitors a Data Link for _changes_, making it perfect for event-driven workflows that trigger when new files are uploaded or existing files are modified.
 
 !!! note
 
@@ -21,6 +21,10 @@ This node automatically monitors a Data Link for _changes_, making it perfect fo
 
 - **Seqera config**: Reference to the seqera-config node containing API credentials and default workspace settings.
 - **Node name**: Optional custom name for the node in the editor.
+- **Poll frequency** (default **15 min**): Interval between polls.
+- **Detection mode** (default **Name + metadata**): How to detect new files:
+  - **Name + metadata (detect changes)**: Detects files as new when name, lastModified, size, or etag changes.
Use this when files are overwritten or replaced with the same name (e.g., daily status files).
+  - **Name only (detect new files)**: Only detects truly new filenames. Files with the same name are ignored after first detection (original behavior).
 - **Data Link name** (required): Display name of the Data Link. Supports autocomplete.
 - **Base path**: Path within the Data Link to start from.
 - **Prefix**: Prefix filter applied to both files and folders.
@@ -28,19 +32,21 @@ This node automatically monitors a Data Link for _changes_, making it perfect fo
 - **Return type** (default **files**): `files`, `folders` or `all`.
 - **Max results** (default **100**): Maximum number of objects to return per poll.
 - **Depth** (default **0**): Folder recursion depth.
-- **Poll frequency** (default **15 min**): Interval between polls.
 - **Workspace ID**: Override the workspace ID from the Config node.
 
 All properties work the same as the [list files](list_files.md) node, plus automatic polling.
 
-## Outputs (two)
+## Outputs (three)
 
-The node has two outputs that fire at different times:
+The node has three outputs that fire at different times:
 
 1. **All results** – Emitted every poll with the full, filtered list of files.
-2. **New results** – Emitted only when one or more _new_ objects are detected since the last poll.
+2. **New or modified** – Emitted only when one or more _new or changed_ objects are detected since the last poll. Behavior depends on the detection mode:
+   - **Name + metadata mode**: Fires when files are new OR when existing files have changed metadata (lastModified, size, or etag).
+   - **Name only mode**: Fires only for truly new filenames that haven't been seen before.
+3. **Deleted** – Emitted only when files that were present in the previous poll are no longer found in the current poll.
 
-Both messages include the same properties:
+All messages include the same properties:
 
 - `msg.payload.files` – Array of file objects from the API.
 - `msg.payload.resourceType`, `msg.payload.resourceRef`, `msg.payload.provider` – Data Link metadata.
@@ -49,18 +55,45 @@ Both messages include the same properties:
 
 ## How new files are detected
 
-The node tracks seen files in its context storage. On each poll:
+The node tracks seen files in its internal state. On each poll:
 
 1. Fetch the current list of files from the Data Link
 2. Compare against the list from the previous poll
-3. If new files are found, emit them on output 2
-4. Update the stored list for the next comparison
+3. If new, modified, or deleted files are found, emit them on the appropriate outputs
+4. Update the stored state for the next comparison
+
+The comparison behavior depends on the **detection mode**:
+
+### Name + metadata mode (default)
+
+Files are identified by a combination of name, lastModified timestamp, size, and etag. A file is considered "new or modified" if:
+
+- It has a filename that wasn't seen before, OR
+- It has the same filename but different lastModified, size, or etag values
+
+**Use case**: Daily status files, completion markers, or any files that are overwritten/replaced with the same name but should trigger on each update.
+
+**Example**: A file named `RTAComplete.txt` is deposited every night with a new timestamp. Each deposit will trigger output 2.
+
+### Name only mode
+
+Files are identified by name only. A file is considered "new" only if:
+
+- It has a filename that wasn't seen before
 
-The comparison is based on the full file path. Files that are deleted and re-uploaded will be detected as "new".
+Files with the same name are ignored after the first detection, regardless of any metadata changes.
+
+**Use case**: Monitoring for truly new files where you don't care about modifications to existing files.
+
+**Example**: Monitoring an upload directory where each sequencing run has a unique filename.
+
+### Deletion detection (both modes)
+
+Files are considered "deleted" when they were present in the previous poll but are not found in the current poll. This happens in both detection modes and fires on output 3.
 
 !!! info
 
-    The very first poll after the node is created sees everything as new and is handled as a special case. It does not output new results.
+    The very first poll after the node is created sees everything as new and is handled as a special case. It does not output new results on output 2.
 
 ## Required permissions
 
@@ -74,11 +107,14 @@ See the [configuration documentation](configuration.md#required-token-permission
 
 1. Add a **poll-files** node and configure the Data Link
 2. Set **pollFrequency** to your desired interval (e.g., `5:00` for 5 minutes)
-3. Connect output 2 (New results) to a **workflow-launch** node
-4. Configure the launch node to use the file paths from `msg.files`
-5. Deploy
+3. Set **Detection mode** based on your use case:
+   - Use **Name + metadata** if files can be overwritten/replaced
+   - Use **Name only** if all files have unique names
+4. Connect output 2 (New or modified) to a **workflow-launch** node
+5. Configure the launch node to use the file paths from `msg.files`
+6. Deploy
 
-Now every time a new file appears in the Data Link, a workflow will automatically launch.
+Now every time a new or modified file appears in the Data Link, a workflow will automatically launch.
 
 ### Trigger only on specific file types
 
@@ -86,10 +122,30 @@ Now every time a new file appears in the Data Link, a workflow will automaticall
 2. Connect output 2 to your processing logic
 3. The node will only emit when new BAM files appear
 
+### Monitor for status file updates
+
+Use this pattern when a completion marker file is deposited with the same name but new timestamp:
+
+1. Set **Detection mode** to **Name + metadata**
+2. Set **Pattern**: `RTAComplete\.txt$` to match only the status file
+3. Connect output 2 to your workflow trigger
+4.
Each time the file is deposited (even with the same name), the workflow will trigger
+
+### Clean up on file deletion
+
+Monitor for files being removed from a Data Link:
+
+1. Configure the poll node as normal
+2. Connect output 3 (Deleted) to a notification or cleanup workflow
+3. When files disappear from the Data Link, the third output fires with the list of deleted files
+
 ## Notes
 
-- The first poll after deployment/restart does **not** emit to the "New results" output (it initializes the tracking state)
-- The tracking is reset on each Node-RED restart or flow redeployment
+- The first poll after deployment/restart does **not** emit to the "New or modified" output (output 2) – it initializes the tracking state
+- The tracking state is reset on each Node-RED restart or flow redeployment
+- Choose the appropriate **detection mode** for your use case:
+  - Use **Name + metadata** when files can be overwritten or replaced (e.g., status files, daily reports)
+  - Use **Name only** when all files have unique names and you don't care about modifications
 - Very frequent polling (< 30 seconds) may impact API rate limits
 - Custom message properties are preserved in outputs (e.g., `msg._context`)
 - Large Data Links with deep recursion may take time to process on each poll
diff --git a/nodes/datalink-poll.html b/nodes/datalink-poll.html
index c2b7075..7189a36 100644
--- a/nodes/datalink-poll.html
+++ b/nodes/datalink-poll.html
@@ -20,6 +20,13 @@
+
+ + +
@@ -67,6 +74,7 @@
 ### Inputs
 
 : pollFrequency (string) : Poll frequency (default `15 minutes`). Can be configured in seconds, minutes, hours, or days.
+: detectionMode (string) : Detection mode for new files (default `Name + metadata`). **Name + metadata** detects files as new when name, lastModified, size, or etag changes (use for files that are overwritten/replaced with same name). **Name only** detects only truly new filenames (original behavior).
 : dataLinkName (string) : The name of the data explorer link.
 : basePath (string) : Path within the data link to start browsing. Leave blank for the root.
 : prefix (string) : Optional prefix filter for results (applies to folders and files)
@@ -79,15 +87,21 @@
 
 ### Outputs
 
-The node has two outputs:
+The node has three outputs:
 
-1. All results on every poll.
-2. New objects since the previous poll (nothing sent if no new objects).
+1. **All results** - Fires on every poll with all current files.
+2. **New or modified** - Behavior depends on detection mode:
+   - **Name + metadata**: Fires when files are new OR when existing files have changed metadata (lastModified, size, or etag). Files with the same name but different timestamps are detected as new.
+   - **Name only**: Fires only for truly new filenames that haven't been seen before.
+3. **Deleted** - Fires only when files that were present in the previous poll are no longer found.
 
-Both outputs have the following properties:
+All outputs have the following properties:
 
-: payload (array) : Fle information aggregated from the API (array of objects).
-: files (array) : File names (array of strings).
+: payload.files (array) : File information aggregated from the API (array of objects).
+: payload.resourceType (string) : Type of the Data Link resource.
+: payload.resourceRef (string) : Resource reference path.
+: payload.provider (string) : Cloud provider name.
+: files (array) : File paths as strings (array).
 
 All typed-input fields are identical to the _List files_ node with the addition of **poll frequency**.
 
@@ -97,14 +111,14 @@
     category: "seqera",
     color: "#A9A1C6",
     inputs: 0,
-    outputs: 2,
+    outputs: 3,
     icon: "icons/data-explorer.svg",
     align: "left",
     paletteLabel: "Poll files",
     label: function () {
       return this.name || "Poll files";
     },
-    outputLabels: ["All objects", "Only new objects"],
+    outputLabels: ["All objects", "New or modified", "Deleted"],
     defaults: {
       name: { value: "" },
       seqera: { value: "", type: "seqera-config" },
@@ -128,6 +142,7 @@
       pollFrequency: { value: "15" },
       pollUnits: { value: "minutes" },
       returnType: { value: "files" },
+      detectionMode: { value: "metadata" },
     },
     oneditprepare: function () {
       function ti(id, val, type, def = "str") {
@@ -149,6 +164,7 @@
 
       $("#node-input-pollFrequency").val(this.pollFrequency || "15");
       $("#node-input-pollUnits").val(this.pollUnits || "minutes");
+      $("#node-input-detectionMode").val(this.detectionMode || "metadata");
       $("#node-input-returnType").val(this.returnType || "files");
 
       // Add auto-complete for datalink name when type is "str"
@@ -260,6 +276,7 @@
 
       this.pollFrequency = $("#node-input-pollFrequency").val();
       this.pollUnits = $("#node-input-pollUnits").val();
+      this.detectionMode = $("#node-input-detectionMode").val();
       this.returnType = $("#node-input-returnType").val();
     },
   });
diff --git a/nodes/datalink-poll.js b/nodes/datalink-poll.js
index 75775d7..b7319d6 100644
--- a/nodes/datalink-poll.js
+++ b/nodes/datalink-poll.js
@@ -30,6 +30,7 @@ module.exports = function (RED) {
     node.depthProp = config.depth;
     node.depthPropType = config.depthType;
     node.returnType = config.returnType || "files"; // files|folders|all
+    node.detectionMode = config.detectionMode || "metadata"; // name|metadata
 
     // Poll frequency configuration
     const unitMultipliers = {
@@ -54,8 +55,26 @@ module.exports = function (RED) {
       return `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())} ${d.toLocaleTimeString()}`;
     };
 
-    // Internal cache
of previously seen object names
-    let previousNamesSet = null;
+    // Helper to create unique identifier for a file
+    // Mode "name": Only uses filename (original behavior - only detect truly new files)
+    // Mode "metadata": Uses name + lastModified/size/etag (detect changes to existing files)
+    const getFileIdentifier = (item) => {
+      if (node.detectionMode === "name") {
+        return item.name;
+      }
+      // metadata mode (default)
+      const parts = [item.name];
+      if (item.lastModified) parts.push(item.lastModified);
+      if (item.size != null) parts.push(String(item.size));
+      if (item.etag) parts.push(item.etag);
+      return parts.join("|");
+    };
+
+    // Internal cache of previously seen objects
+    // Store both identifiers (for change detection) and a map by name (for deletion detection)
+    let previousIdentifiersSet = null;
+    let previousItemsMap = null;
+    let intervalId = null;
 
     // Polling function
     const executePoll = async () => {
@@ -78,28 +97,57 @@ module.exports = function (RED) {
           files: result.files.map((it) => `${result.resourceRef}/${it}`),
         };
 
-        // Second output: only new items since previous poll
+        // Build current state for comparison
+        const currentIdentifiers = new Set(result.items.map(getFileIdentifier));
+        const currentNameToItem = new Map(result.items.map((it) => [it.name, it]));
+
+        // Second output: new or modified items since previous poll
         let msgNew = null;
-        if (previousNamesSet) {
-          const newItems = result.items.filter((it) => !previousNamesSet.has(it.name));
-          if (newItems.length) {
+        if (previousIdentifiersSet) {
+          const newOrModified = result.items.filter((it) => !previousIdentifiersSet.has(getFileIdentifier(it)));
+          if (newOrModified.length) {
             msgNew = {
+              ...pollMsg,
+              payload: {
+                files: newOrModified,
+                resourceType: result.resourceType,
+                resourceRef: result.resourceRef,
+                provider: result.provider,
+              },
+              files: newOrModified.map((it) => `${result.resourceRef}/${it.name}`),
+            };
+          }
+        }
+
+        // Third output: deleted items (present in previous poll but not
current)
+        let msgDeleted = null;
+        if (previousItemsMap) {
+          const deletedItems = [];
+          for (const [name, item] of previousItemsMap.entries()) {
+            if (!currentNameToItem.has(name)) {
+              deletedItems.push(item);
+            }
+          }
+          if (deletedItems.length) {
+            msgDeleted = {
+              ...pollMsg,
               payload: {
-                files: newItems,
+                files: deletedItems,
                 resourceType: result.resourceType,
                 resourceRef: result.resourceRef,
                 provider: result.provider,
               },
-              files: newItems.map((it) => `${result.resourceRef}/${it.name}`),
+              files: deletedItems.map((it) => `${result.resourceRef}/${it.name}`),
             };
           }
         }
 
         // Update cache
-        previousNamesSet = new Set(result.items.map((it) => it.name));
+        previousIdentifiersSet = currentIdentifiers;
+        previousItemsMap = currentNameToItem;
 
         node.status({ fill: "green", shape: "dot", text: `${result.items.length} items: ${formatDateTime()}` });
-        node.send([msgAll, msgNew]);
+        node.send([msgAll, msgNew, msgDeleted]);
       } catch (err) {
         node.error(`Seqera datalink poll failed: ${err.message}`);
         node.status({ fill: "red", shape: "dot", text: `error: ${formatDateTime()}` });
@@ -109,7 +157,7 @@ module.exports = function (RED) {
     // Start the polling interval
     if (node.seqeraConfig && config.dataLinkName && config.dataLinkName.trim() !== "") {
       const intervalMs = node.pollFrequencySec * 1000;
-      const intervalId = setInterval(executePoll, intervalMs);
+      intervalId = setInterval(executePoll, intervalMs);
       // run once immediately
       executePoll();
     }
@@ -146,6 +194,7 @@ module.exports = function (RED) {
       returnType: { value: "files" },
       // poll specific
       pollFrequency: { value: "15" },
+      detectionMode: { value: "metadata" },
     },
   });
 };
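
The comparison added to `executePoll` in this diff can be tried outside Node-RED. The following is a minimal sketch and not the node's actual code: `makeIdentifier` and `diffPolls` are hypothetical helper names, the sample items are invented, and it compares plain arrays where the diff keeps a `Set` of identifiers and a `Map` keyed by name. It is meant only to show how the two detection modes and the deletion check differ for the same pair of polls.

```javascript
// Hypothetical helper mirroring getFileIdentifier from the diff:
// "name" mode keys on the filename only, "metadata" mode appends
// lastModified, size and etag when they are present.
function makeIdentifier(item, detectionMode) {
  if (detectionMode === "name") return item.name;
  const parts = [item.name];
  if (item.lastModified) parts.push(item.lastModified);
  if (item.size != null) parts.push(String(item.size));
  if (item.etag) parts.push(item.etag);
  return parts.join("|");
}

// Hypothetical helper: compare two polls and report what would go to
// output 2 (new or modified) and output 3 (deleted).
function diffPolls(previousItems, currentItems, detectionMode = "metadata") {
  // First poll: there is no previous state, so nothing is reported.
  if (previousItems === null) return { newOrModified: [], deleted: [] };

  const previousIds = new Set(previousItems.map((it) => makeIdentifier(it, detectionMode)));
  const currentNames = new Set(currentItems.map((it) => it.name));

  return {
    // An item is new or modified when its identifier was not seen last time.
    newOrModified: currentItems.filter((it) => !previousIds.has(makeIdentifier(it, detectionMode))),
    // Deletion is always decided by name, regardless of detection mode.
    deleted: previousItems.filter((it) => !currentNames.has(it.name)),
  };
}

// Invented sample data: one overwritten status file, one removed file, one new file.
const previous = [
  { name: "RTAComplete.txt", lastModified: "2025-11-16T02:00:00Z", size: 42, etag: "aaa" },
  { name: "old.bam", size: 10 },
];
const current = [
  { name: "RTAComplete.txt", lastModified: "2025-11-17T02:00:00Z", size: 42, etag: "bbb" },
  { name: "new.bam", size: 20 },
];

const byMetadata = diffPolls(previous, current, "metadata");
const byName = diffPolls(previous, current, "name");
const firstPoll = diffPolls(null, current, "metadata");

console.log(byMetadata.newOrModified.map((it) => it.name));
console.log(byName.newOrModified.map((it) => it.name));
console.log(byMetadata.deleted.map((it) => it.name));
```

In this sketch the overwritten `RTAComplete.txt` is reported only in `metadata` mode, while `old.bam` is reported as deleted in both modes, which matches the behavior the documentation changes describe.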