11 changes: 9 additions & 2 deletions backend/src/routes/upload.js
@@ -8,6 +8,13 @@ import { asyncHandler, ApiError } from '../middleware/errorHandler.js';
import { validateUpload } from '../middleware/uploadValidator.js';

const router = express.Router();
+const parseWithTimeout = (buffer, ms = 8000) =>
+  Promise.race([
+    pdfParse(buffer),
+    new Promise((_, reject) =>
+      setTimeout(() => reject(new Error('PDF parsing timed out')), ms)
+    )
+  ]);
Comment on lines +11 to +17

Suggestion: The timeout wrapper does not actually stop pdfParse; Promise.race only rejects early, but the parser continues running in the background. If parsing is CPU-bound (the exact DoS case described), this still blocks the event loop and the timeout cannot preempt it, so the DoS mitigation is ineffective. Run parsing in an isolated worker/process and terminate that worker on timeout (or use a parser that supports abort/cancellation). [incomplete implementation]

Severity Level: Critical 🚨
- ❌ Malicious PDF can still block /api/upload handlers.
- ❌ Event loop blocked preventing all other API responses.
- ⚠️ DoS risk remains despite added parse timeout wrapper.
Steps of Reproduction ✅
1. Start the backend server so that `backend/src/index.js` mounts the upload router at
`app.use('/api/upload', uploadRoutes);` (lines 78–81), which exposes `POST /api/upload/`
and `POST /api/upload/extract-text` via `backend/src/routes/upload.js`.

2. From a client, send a crafted, pathologically complex PDF (e.g., deeply nested objects
/ huge page count) to `POST /api/upload/` so the request is handled by the route at
`backend/src/routes/upload.js:19-55`, which reads the file buffer at line 25 and calls
`const pdfData = await parseWithTimeout(fileBuffer);` at line 26.

3. Inside `parseWithTimeout` at `backend/src/routes/upload.js:11-17`, `pdfParse(buffer)`
(line 13) begins heavy, CPU-bound parsing in the main Node.js process; while this
synchronous work is executing, the event loop is occupied and the competing timeout
created by `setTimeout(..., ms)` at lines 14-15 cannot actually run until parsing yields
back to the event loop.

4. Under these conditions, observe that other requests (e.g., `GET /health` implemented in
`backend/src/index.js:68-73`) hang and the process CPU remains saturated even after the
intended 8-second timeout, demonstrating that the `Promise.race` wrapper does not preempt
or cancel the underlying `pdfParse` work and the DoS risk from worst-case PDFs remains.
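
The behaviour described in steps 3 and 4 can be reproduced without a PDF at all. The sketch below is a minimal, hypothetical model: `busyParse` is a stand-in that blocks the event loop with a spin loop (it is not pdf-parse, and a real parser may yield between chunks of work), and `raceAgainstTimer` races it against a timer the same way `parseWithTimeout` does. It uses `require` so that it runs as a standalone script.

```javascript
// Hypothetical stand-in for a CPU-bound parser: once it starts, it spins
// synchronously for `ms` milliseconds without yielding to the event loop.
function busyParse(ms) {
  return new Promise((resolve) => {
    setImmediate(() => {
      const end = Date.now() + ms;
      while (Date.now() < end) {
        // busy-wait: timer callbacks cannot run while this loop is executing
      }
      resolve('parsed');
    });
  });
}

// Same shape as the parseWithTimeout wrapper under review.
function raceAgainstTimer(work, ms) {
  return Promise.race([
    work,
    new Promise((_, reject) =>
      setTimeout(() => reject(new Error('timed out')), ms)
    ),
  ]);
}

// Races 200 ms of blocking work against a 50 ms timeout and reports
// which side won and how long the caller actually waited.
async function demo() {
  const start = Date.now();
  let outcome;
  try {
    outcome = await raceAgainstTimer(busyParse(200), 50);
  } catch (err) {
    outcome = err.message;
  }
  return { outcome, elapsedMs: Date.now() - start };
}

demo().then(({ outcome, elapsedMs }) => {
  console.log(`${outcome} after ${elapsedMs} ms`);
});
```

Under these assumptions the race resolves with `'parsed'` after at least 200 ms instead of rejecting at 50 ms, because the timer callback only gets a turn once the blocking work has returned.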

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** backend/src/routes/upload.js
**Line:** 11:17
**Comment:**
	Incomplete Implementation: The timeout wrapper does not actually stop `pdfParse`; `Promise.race` only rejects early, but the parser continues running in the background. If parsing is CPU-bound (the exact DoS case described), this still blocks the event loop and the timeout cannot preempt it, so the DoS mitigation is ineffective. Run parsing in an isolated worker/process and terminate that worker on timeout (or use a parser that supports abort/cancellation).

Validate the correctness of the flagged issue. If it is correct, how can I resolve it? If you propose a fix, implement it and keep it concise.
Once the fix is implemented, also check the other comments on the same PR and ask the user whether they want the remaining comments fixed as well. If they say yes, fetch all the comments, validate their correctness, and implement a minimal fix for each.

Comment on lines +11 to +17


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clear the setTimeout timer to prevent minor resource leak.

When pdfParse completes before the timeout, the setTimeout timer stays pending until it fires (up to 8 seconds later), at which point it rejects the timeout promise, which has no effect because the race has already settled. The timer is released once it fires, but explicitly clearing it is cleaner and prevents unnecessary timer callbacks from piling up under high load.

⏱️ Proposed fix to clear the timer
-const parseWithTimeout = (buffer, ms = 8000) =>
-  Promise.race([
-    pdfParse(buffer),
-    new Promise((_, reject) =>
-      setTimeout(() => reject(new Error('PDF parsing timed out')), ms)
-    )
-  ]);
+const parseWithTimeout = (buffer, ms = 8000) => {
+  let timeoutId;
+  const timeoutPromise = new Promise((_, reject) => {
+    timeoutId = setTimeout(() => reject(new Error('PDF parsing timed out')), ms);
+  });
+  
+  return Promise.race([
+    pdfParse(buffer).finally(() => clearTimeout(timeoutId)),
+    timeoutPromise
+  ]);
+};
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/src/routes/upload.js` around lines 11 - 17, The parseWithTimeout
function has a resource leak where the setTimeout timer continues running even
after pdfParse completes successfully, causing unnecessary timer callbacks to
accumulate under high load. Store the timeout ID returned by setTimeout, then
use Promise.then() or a finally block to explicitly clear the timer with
clearTimeout once the Promise.race settles (regardless of whether pdfParse
succeeds or the timeout fires), ensuring the timer callback is never executed
after the race is resolved.
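
The prompt above asks for the timer to be cleared once the race settles, whichever side wins. A minimal sketch of that variant, written as a generic helper, might look like the following. `withTimeout` and `delay` are hypothetical names used for illustration only; they are not part of the PR.

```javascript
// Hypothetical generic helper: rejects if `promise` has not settled within
// `ms`, and always clears the timer once the race has been decided.
function withTimeout(promise, ms, message = 'operation timed out') {
  let timeoutId;
  const timeout = new Promise((_, reject) => {
    timeoutId = setTimeout(() => reject(new Error(message)), ms);
  });
  return Promise.race([promise, timeout]).finally(() =>
    clearTimeout(timeoutId)
  );
}

// Small helper for the usage lines below: resolves with `value` after `ms`.
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(resolve, ms, value));

// The fast operation wins the race; the slow one is rejected by the timeout.
withTimeout(delay(10, 'fast'), 1000).then(console.log);
withTimeout(delay(500, 'slow'), 20).catch((err) => console.log(err.message));
```

This only releases the timer. Like the original wrapper, it does nothing to stop the work behind `promise`.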

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check for existing worker thread usage and pdf-parse internals

# Check if worker_threads are used anywhere
echo "=== Checking for worker_threads usage ==="
rg -n "worker_threads|Worker\(" --type=js

# Check pdf-parse package details and dependencies
echo -e "\n=== Checking pdf-parse in package.json ==="
fd package.json --exec cat {} \; | jq '.dependencies["pdf-parse"], .devDependencies["pdf-parse"]'

# Look for any existing timeout or worker patterns in other routes
echo -e "\n=== Checking for similar timeout patterns ==="
rg -n "Promise\.race|setTimeout.*reject|parseWith" --type=js -g '!node_modules'

Repository: anurag3407/career-pilot

Length of output: 1477


🏁 Script executed:

#!/bin/bash

# Check pdf-parse library details
echo "=== Checking package.json for queue/worker dependencies ==="
cat backend/package.json | jq '.dependencies | keys[] | select(. | contains("bull") or contains("queue") or contains("worker"))'

# Look at error handling in the upload routes to verify claim about generic ApiError(400)
echo -e "\n=== Checking error handling in upload.js ==="
sed -n '43,45p;73,75p' backend/src/routes/upload.js

# Check asyncHandler implementation
echo -e "\n=== Checking asyncHandler wrapper ==="
rg -A 5 "const asyncHandler" backend/src --type=js

# Look for any HTTP status codes used for timeout errors elsewhere
echo -e "\n=== Checking HTTP status code patterns for timeout ==="
rg "408|422|timeout" backend/src/routes --type=js -i

Repository: anurag3407/career-pilot

Length of output: 1341


🏁 Script executed:

#!/bin/bash

# Check what pdfParse actually is and how it's imported
echo "=== Checking pdfParse import and usage ==="
head -20 backend/src/routes/upload.js | grep -A 2 "pdfParse"

# Look for any cleanup logic or resource management in finally blocks
echo -e "\n=== Checking cleanup in finally blocks (lines 46-54, 76-84) ==="
sed -n '46,54p;76,84p' backend/src/routes/upload.js

# Check if there are any comments about pdf-parse behavior or performance
echo -e "\n=== Searching for pdf-parse related comments ==="
rg -B 2 -A 2 "pdf-parse|pdfParse" backend/src/routes/upload.js --type=js

# Check actual Promise.race implementation to see timer cleanup
echo -e "\n=== Checking Promise.race implementation (lines 11-17) ==="
sed -n '11,17p' backend/src/routes/upload.js

Repository: anurag3407/career-pilot

Length of output: 1486


🏁 Script executed:

#!/bin/bash

# Check pdf-parse npm package info to understand its actual behavior
echo "=== Checking pdf-parse implementation details ==="
rg "pdfParse" backend/src/routes/upload.js -B 3 -A 8

# Look for any other files that might handle PDF processing differently
echo -e "\n=== Searching for other PDF handling in codebase ==="
fd -e js -e ts | xargs rg "pdf-parse|pdfParse" --type=js 2>/dev/null

# Check if there are any documented performance issues or TODOs
echo -e "\n=== Checking for performance TODOs or comments ==="
rg "TODO|FIXME|NOTE|BUG|PERF" backend/src/routes/upload.js --type=js

# Verify actual file structure and test if parseWithTimeout is used
echo -e "\n=== Verifying complete upload.js structure ==="
wc -l backend/src/routes/upload.js

Repository: anurag3407/career-pilot

Length of output: 1358


🏁 Script executed:

#!/bin/bash

# Check if there's any documentation or comments about pdf-parse's async behavior
# Also check how pdfParse is used in the input.controller.js
echo "=== Checking pdf-parse usage in input.controller.js ==="
sed -n '1,50p' backend/src/controllers/input.controller.js | grep -A 10 "pdfParse"

# Search for any performance tests or metrics
echo -e "\n=== Checking test files for pdf-parse behavior ==="
fd -e test.js -e spec.js -e .test.js | xargs rg "pdf-parse|pdfParse" --type=js 2>/dev/null | head -20

# Let's verify what the actual API error status codes are in the errorHandler
echo -e "\n=== Checking ApiError implementation ==="
rg -A 15 "class ApiError|export.*ApiError" backend/src/middleware/errorHandler.js --type=js

Repository: anurag3407/career-pilot

Length of output: 1156


🏁 Script executed:

#!/bin/bash

# Check git history or PR/issue context for status code requirements
echo "=== Checking for PR objectives or issue context ==="
git log --all --oneline --grep="408\|422\|timeout\|status" | head -20

# Look for any existing timeout error handling patterns
echo -e "\n=== Checking how other timeouts are handled ==="
rg "new Error.*timeout|timeout.*Error" backend/src --type=js -i -B 2 -A 2

# Check if there's a pattern for setting specific HTTP status codes for timeouts
echo -e "\n=== Checking status codes in error handler ==="
rg "408|422|\.statusCode" backend/src/middleware/errorHandler.js --type=js

# Verify the exact behavior - does the timer callback fire after resolution?
echo -e "\n=== Checking Node.js documentation comment or any mitigation ==="
rg -B 5 "Promise.race" backend/src/routes/upload.js --type=js

Repository: anurag3407/career-pilot

Length of output: 4174


Promise.race only partially mitigates the DoS vulnerability, and the timer is never cleared.

After the timeout fires and rejects, the pdfParse(buffer) operation continues executing in the background because Promise.race cannot cancel the losing promise. An attacker can therefore still degrade server performance by uploading multiple crafted PDFs that trigger timeouts: each parse runs to completion in the background and blocks the event loop while it does.

Additionally, the setTimeout is never cleared when pdfParse completes first, so the timer still fires later and rejects the timeout promise after the race has already settled (a minor resource leak). Compare this to the timeout pattern used in backend/src/config/redis.js, which explicitly clears the timeout on success.

The current implementation improves user experience by returning an error within 8 seconds rather than hanging indefinitely, but it does not prevent resource exhaustion. Consider offloading PDF parsing to a background job queue (similar to the existing BullMQ pattern used elsewhere in the codebase) or, at minimum, implement explicit timer cleanup.
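
Another way to get a timeout that really stops the work, as the first comment on these lines suggests, is to run the parse in a worker thread and terminate the thread when the timer fires. The following is a rough sketch under stated assumptions: `runInWorkerWithTimeout` is a hypothetical helper, and the two worker bodies are stand-ins (one replies immediately, one spins forever) rather than real pdf-parse calls. It uses `require` so that it runs as a standalone script.

```javascript
const { Worker } = require('node:worker_threads');

// Hypothetical helper (not part of the PR): runs `workerSource` in a worker
// thread and terminates that thread if no result arrives within `ms`.
function runInWorkerWithTimeout(workerSource, input, ms = 8000) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: input });
    let settled = false;
    let timer;

    // Settle exactly once, then release the timer and the thread.
    const settle = (fn, value) => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      worker.terminate(); // also stops a worker stuck in a tight loop
      fn(value);
    };

    timer = setTimeout(
      () => settle(reject, new Error('PDF parsing timed out')),
      ms
    );
    worker.once('message', (value) => settle(resolve, value));
    worker.once('error', (err) => settle(reject, err));
    worker.once('exit', (code) =>
      settle(reject, new Error(`worker exited early (code ${code})`))
    );
  });
}

// Stand-in worker bodies. A real one would load the parser and post back
// the extracted text; that wiring is assumed here, not shown.
const quickWorker = `
  const { parentPort, workerData } = require('node:worker_threads');
  parentPort.postMessage(workerData.length);
`;
const stuckWorker = 'while (true) {}';

runInWorkerWithTimeout(quickWorker, 'abc', 2000).then(console.log);
runInWorkerWithTimeout(stuckWorker, 'abc', 100).catch((err) =>
  console.log(err.message)
);
```

Because the spin loop runs on a separate thread, the main event loop stays free to fire the timer, and `worker.terminate()` ends the runaway parse. Starting a thread per upload has a cost, so a small worker pool or the job queue mentioned above may be a better fit under load.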

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/src/routes/upload.js` around lines 11 - 17, The parseWithTimeout
function does not clean up the setTimeout when pdfParse completes successfully,
and it does not cancel the pdfParse operation when the timeout fires, both of
which cause resource leaks. Modify parseWithTimeout to explicitly clear the
timeout when pdfParse resolves first, following the cleanup pattern used in
backend/src/config/redis.js. Additionally, investigate if pdfParse supports
cancellation (such as through AbortSignal) and pass an abort signal to stop the
operation when the timeout fires. If cancellation is not supported by the PDF
parsing library, consider offloading PDF parsing to a background job queue using
the existing BullMQ pattern elsewhere in the codebase to prevent background
operations from blocking the event loop and degrading server performance during
attacks.


// Upload and extract text from PDF
router.post('/', verifyToken, handleUpload, validateUpload, asyncHandler(async (req, res) => {
@@ -16,7 +23,7 @@ router.post('/', verifyToken, handleUpload, validateUpload, asyncHandler(async (
}
try {
const fileBuffer = await fs.readFile(req.file.path);
-const pdfData = await pdfParse(fileBuffer);
+const pdfData = await parseWithTimeout(fileBuffer);
const resumeId = uuidv4();

res.json({
@@ -54,7 +61,7 @@ router.post('/extract-text', verifyToken, handleUpload, validateUpload, asyncHan
}
try {
const fileBuffer = await fs.readFile(req.file.path);
-const pdfData = await pdfParse(fileBuffer);
+const pdfData = await parseWithTimeout(fileBuffer);

res.json({
success: true,