-
Notifications
You must be signed in to change notification settings - Fork 763
fix: wrap pdfParse in timeout to prevent event loop DoS #4138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -8,6 +8,13 @@ import { asyncHandler, ApiError } from '../middleware/errorHandler.js'; | |||||||||||||||||||||||||||||||||||||
| import { validateUpload } from '../middleware/uploadValidator.js'; | ||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||
| const router = express.Router(); | ||||||||||||||||||||||||||||||||||||||
| const parseWithTimeout = (buffer, ms = 8000) => | ||||||||||||||||||||||||||||||||||||||
| Promise.race([ | ||||||||||||||||||||||||||||||||||||||
| pdfParse(buffer), | ||||||||||||||||||||||||||||||||||||||
| new Promise((_, reject) => | ||||||||||||||||||||||||||||||||||||||
| setTimeout(() => reject(new Error('PDF parsing timed out')), ms) | ||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||
| ]); | ||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+11
to
+17
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Clear the setTimeout timer to prevent minor resource leak. When ⏱️ Proposed fix to clear the timer-const parseWithTimeout = (buffer, ms = 8000) =>
- Promise.race([
- pdfParse(buffer),
- new Promise((_, reject) =>
- setTimeout(() => reject(new Error('PDF parsing timed out')), ms)
- )
- ]);
+const parseWithTimeout = (buffer, ms = 8000) => {
+ let timeoutId;
+ const timeoutPromise = new Promise((_, reject) => {
+ timeoutId = setTimeout(() => reject(new Error('PDF parsing timed out')), ms);
+ });
+
+ return Promise.race([
+ pdfParse(buffer).finally(() => clearTimeout(timeoutId)),
+ timeoutPromise
+ ]);
+};📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents🧩 Analysis chain🏁 Script executed: #!/bin/bash
# Description: Check for existing worker thread usage and pdf-parse internals
# Check if worker_threads are used anywhere
echo "=== Checking for worker_threads usage ==="
rg -n "worker_threads|Worker\(" --type=js
# Check pdf-parse package details and dependencies
echo -e "\n=== Checking pdf-parse in package.json ==="
fd package.json --exec cat {} \; | jq '.dependencies["pdf-parse"], .devDependencies["pdf-parse"]'
# Look for any existing timeout or worker patterns in other routes
echo -e "\n=== Checking for similar timeout patterns ==="
rg -n "Promise\.race|setTimeout.*reject|parseWith" --type=js -g '!node_modules'Repository: anurag3407/career-pilot Length of output: 1477 🏁 Script executed: #!/bin/bash
# Check pdf-parse library details
echo "=== Checking package.json for queue/worker dependencies ==="
cat backend/package.json | jq '.dependencies | keys[] | select(. | contains("bull") or contains("queue") or contains("worker"))'
# Look at error handling in the upload routes to verify claim about generic ApiError(400)
echo -e "\n=== Checking error handling in upload.js ==="
sed -n '43,45p;73,75p' backend/src/routes/upload.js
# Check asyncHandler implementation
echo -e "\n=== Checking asyncHandler wrapper ==="
rg -A 5 "const asyncHandler" backend/src --type=js
# Look for any HTTP status codes used for timeout errors elsewhere
echo -e "\n=== Checking HTTP status code patterns for timeout ==="
rg "408|422|timeout" backend/src/routes --type=js -iRepository: anurag3407/career-pilot Length of output: 1341 🏁 Script executed: #!/bin/bash
# Check what pdfParse actually is and how it's imported
echo "=== Checking pdfParse import and usage ==="
head -20 backend/src/routes/upload.js | grep -A 2 "pdfParse"
# Look for any cleanup logic or resource management in finally blocks
echo -e "\n=== Checking cleanup in finally blocks (lines 46-54, 76-84) ==="
sed -n '46,54p;76,84p' backend/src/routes/upload.js
# Check if there are any comments about pdf-parse behavior or performance
echo -e "\n=== Searching for pdf-parse related comments ==="
rg -B 2 -A 2 "pdf-parse|pdfParse" backend/src/routes/upload.js --type=js
# Check actual Promise.race implementation to see timer cleanup
echo -e "\n=== Checking Promise.race implementation (lines 11-17) ==="
sed -n '11,17p' backend/src/routes/upload.jsRepository: anurag3407/career-pilot Length of output: 1486 🏁 Script executed: #!/bin/bash
# Check pdf-parse npm package info to understand its actual behavior
echo "=== Checking pdf-parse implementation details ==="
rg "pdfParse" backend/src/routes/upload.js -B 3 -A 8
# Look for any other files that might handle PDF processing differently
echo -e "\n=== Searching for other PDF handling in codebase ==="
fd -e js -e ts | xargs rg "pdf-parse|pdfParse" --type=js 2>/dev/null
# Check if there are any documented performance issues or TODOs
echo -e "\n=== Checking for performance TODOs or comments ==="
rg "TODO|FIXME|NOTE|BUG|PERF" backend/src/routes/upload.js --type=js
# Verify actual file structure and test if parseWithTimeout is used
echo -e "\n=== Verifying complete upload.js structure ==="
wc -l backend/src/routes/upload.jsRepository: anurag3407/career-pilot Length of output: 1358 🏁 Script executed: #!/bin/bash
# Check if there's any documentation or comments about pdf-parse's async behavior
# Also check how pdfParse is used in the input.controller.js
echo "=== Checking pdf-parse usage in input.controller.js ==="
sed -n '1,50p' backend/src/controllers/input.controller.js | grep -A 10 "pdfParse"
# Search for any performance tests or metrics
echo -e "\n=== Checking test files for pdf-parse behavior ==="
fd -e test.js -e spec.js -e .test.js | xargs rg "pdf-parse|pdfParse" --type=js 2>/dev/null | head -20
# Let's verify what the actual API error status codes are in the errorHandler
echo -e "\n=== Checking ApiError implementation ==="
rg -A 15 "class ApiError|export.*ApiError" backend/src/middleware/errorHandler.js --type=jsRepository: anurag3407/career-pilot Length of output: 1156 🏁 Script executed: #!/bin/bash
# Check git history or PR/issue context for status code requirements
echo "=== Checking for PR objectives or issue context ==="
git log --all --oneline --grep="408\|422\|timeout\|status" | head -20
# Look for any existing timeout error handling patterns
echo -e "\n=== Checking how other timeouts are handled ==="
rg "new Error.*timeout|timeout.*Error" backend/src --type=js -i -B 2 -A 2
# Check if there's a pattern for setting specific HTTP status codes for timeouts
echo -e "\n=== Checking status codes in error handler ==="
rg "408|422|\.statusCode" backend/src/middleware/errorHandler.js --type=js
# Verify the exact behavior - does the timer callback fire after resolution?
echo -e "\n=== Checking Node.js documentation comment or any mitigation ==="
rg -B 5 "Promise.race" backend/src/routes/upload.js --type=jsRepository: anurag3407/career-pilot Length of output: 4174 Promise.race only partially mitigates the DoS vulnerability; timer cleanup also leaks. After the timeout fires and rejects, the Additionally, the The current implementation improves user experience by returning an error within 8 seconds rather than hanging indefinitely, but it does not prevent resource exhaustion. Consider offloading PDF parsing to a background job queue (similar to the existing BullMQ pattern used elsewhere in the codebase) or implement explicit timer cleanup at minimum. 🤖 Prompt for AI Agents |
||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||
| // Upload and extract text from PDF | ||||||||||||||||||||||||||||||||||||||
| router.post('/', verifyToken, handleUpload, validateUpload, asyncHandler(async (req, res) => { | ||||||||||||||||||||||||||||||||||||||
|
|
@@ -16,7 +23,7 @@ router.post('/', verifyToken, handleUpload, validateUpload, asyncHandler(async ( | |||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||
| try { | ||||||||||||||||||||||||||||||||||||||
| const fileBuffer = await fs.readFile(req.file.path); | ||||||||||||||||||||||||||||||||||||||
| const pdfData = await pdfParse(fileBuffer); | ||||||||||||||||||||||||||||||||||||||
| const pdfData = await parseWithTimeout(fileBuffer); | ||||||||||||||||||||||||||||||||||||||
| const resumeId = uuidv4(); | ||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||
| res.json({ | ||||||||||||||||||||||||||||||||||||||
|
|
@@ -54,7 +61,7 @@ router.post('/extract-text', verifyToken, handleUpload, validateUpload, asyncHan | |||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||
| try { | ||||||||||||||||||||||||||||||||||||||
| const fileBuffer = await fs.readFile(req.file.path); | ||||||||||||||||||||||||||||||||||||||
| const pdfData = await pdfParse(fileBuffer); | ||||||||||||||||||||||||||||||||||||||
| const pdfData = await parseWithTimeout(fileBuffer); | ||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||
| res.json({ | ||||||||||||||||||||||||||||||||||||||
| success: true, | ||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: The timeout wrapper does not actually stop
pdfParse;Promise.raceonly rejects early, but the parser continues running in the background. If parsing is CPU-bound (the exact DoS case described), this still blocks the event loop and the timeout cannot preempt it, so the DoS mitigation is ineffective. Run parsing in an isolated worker/process and terminate that worker on timeout (or use a parser that supports abort/cancellation). [incomplete implementation]Severity Level: Critical 🚨
Steps of Reproduction ✅
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖