[Bug] Web Scraper Hangs on Large Documents - Memory Leak in HTML Parser #134

## Description

The scraper hangs indefinitely when parsing large HTML documents (>50 MB). Memory usage grows without bound, eventually causing an out-of-memory crash. No timeout or resource limits are implemented.

## Steps to Reproduce

1. Scrape a website that serves a large HTML page (50 MB+).
2. The parser loads the entire document into memory.
3. Memory usage grows to 2-4 GB.
4. The process crashes with an out-of-memory error.
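To reproduce without depending on a third-party site, a local fixture can be generated. The sketch below is hypothetical (`writeLargeHtmlFixture` and `measureReadFileSync` are illustrative names, not part of the codebase): it streams a synthetic page to disk, then loads it with `fs.readFileSync` the way `parser.js` currently does and reports the change in resident memory. The reported delta covers only the raw read; building a DOM with Cheerio/jsdom adds more on top.

```javascript
const fs = require('node:fs');
const { once } = require('node:events');
const { finished } = require('node:stream/promises');

// Hypothetical helper: writes a synthetic HTML file of at least `targetBytes`
// by streaming a repeated row, so the generator itself stays memory-light.
async function writeLargeHtmlFixture(path, targetBytes) {
  const row = '<div class="item"><a href="/page">link</a><p>lorem ipsum dolor sit amet</p></div>\n';
  const out = fs.createWriteStream(path);
  out.write('<!DOCTYPE html><html><body>\n');
  let written = 0;
  while (written < targetBytes) {
    // Respect backpressure: wait for 'drain' when the write buffer is full.
    if (!out.write(row)) await once(out, 'drain');
    written += row.length; // ASCII only, so characters === bytes
  }
  out.end('</body></html>\n');
  await finished(out); // resolves once all data has been written
  return fs.statSync(path).size;
}

// Mirrors the current behaviour: load the whole file at once and report the RSS change.
function measureReadFileSync(path) {
  const before = process.memoryUsage().rss;
  const html = fs.readFileSync(path, 'utf8');
  const after = process.memoryUsage().rss;
  return { length: html.length, rssDeltaBytes: after - before };
}
```

For example, `writeLargeHtmlFixture('large.html', 60 * 1024 * 1024)` produces a file above the 50 MB threshold used in the steps above.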

## Environment Information

- Runtime: Node.js
- Parser: Cheerio/jsdom
- Memory limit: none (unbounded)
- Application version: current `main` branch

## Expected Behavior

- The parser uses streaming for large documents (memory-efficient, chunked parsing).
- A maximum document size of 10 MB is enforced.
- Requests time out after 30 s.
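A minimal sketch of the size cap and timeout, assuming the document arrives as a Node.js readable stream (for example a file stream or an HTTP response body) and Node.js 18 or later. `createSizeLimit` and `streamWithLimits` are hypothetical names; `sink` stands for whatever streaming parser the fix ends up using.

```javascript
const { Transform } = require('node:stream');
const { pipeline } = require('node:stream/promises');

const MAX_BYTES = 10 * 1024 * 1024; // 10 MB cap (assumed from Expected Behavior)
const TIMEOUT_MS = 30_000;          // 30 s timeout (assumed from Expected Behavior)

// Pass-through stream that fails as soon as more than `maxBytes` have flowed through it.
function createSizeLimit(maxBytes) {
  let seen = 0;
  return new Transform({
    transform(chunk, _encoding, callback) {
      seen += chunk.length;
      if (seen > maxBytes) {
        callback(new Error(`Document exceeds ${maxBytes} bytes`));
        return;
      }
      callback(null, chunk);
    },
  });
}

// Pipes `source` through the size limit into `sink`, aborting after `timeoutMs`.
// pipeline() destroys every stream if any stage errors or the signal fires.
function streamWithLimits(source, sink, { maxBytes = MAX_BYTES, timeoutMs = TIMEOUT_MS } = {}) {
  return pipeline(source, createSizeLimit(maxBytes), sink, {
    signal: AbortSignal.timeout(timeoutMs),
  });
}
```

Counting bytes as they flow means an oversized document is rejected after roughly `maxBytes` have been read, instead of after the whole body has been buffered. A `Content-Length` header, when present, allows an earlier rejection but cannot be relied on by itself.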

## Actual Behavior

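File: `src/scrapers/parser.js`
Reads the entire HTML file into memory at once with `fs.readFileSync(large_file)`, so memory use grows with document size, no limit is applied, and the event loop is blocked while the file is read.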
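As a stop-gap until streaming is in place, the existing read could at least refuse oversized files. A minimal sketch, assuming the 10 MB limit from Expected Behavior; `readHtmlWithLimit` and `MAX_FILE_BYTES` are hypothetical names. It also switches to the asynchronous `fs/promises` API so the event loop is not blocked during the read.

```javascript
const fs = require('node:fs/promises');

const MAX_FILE_BYTES = 10 * 1024 * 1024; // 10 MB limit (assumed from Expected Behavior)

// Stop-gap: check the size before loading instead of reading the file whole unconditionally.
// This bounds memory for the existing read path but is not a substitute for streaming.
async function readHtmlWithLimit(path, maxBytes = MAX_FILE_BYTES) {
  const { size } = await fs.stat(path);
  if (size > maxBytes) {
    throw new RangeError(`${path} is ${size} bytes; limit is ${maxBytes}`);
  }
  return fs.readFile(path, 'utf8');
}
```

The size check runs before the read, so a file that grows between `stat` and `readFile` can still slip past it; only a limit enforced while reading, as streaming allows, closes that gap.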

## Code Reference

File: `src/scrapers/parser.js`
Missing: stream processing, memory limits, and a request timeout.

## Additional Context

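Process the document as a stream instead. `stream.pipeline()` forwards errors from every stage (`.pipe()` does not propagate errors from the source stream), and an `AbortSignal` enforces the timeout:
```javascript
const fs = require('node:fs');
const { pipeline } = require('node:stream/promises');
// parseStream: a streaming (incremental) HTML parser stream, still to be implemented.
pipeline(fs.createReadStream('large.html'), parseStream, { signal: AbortSignal.timeout(30_000) })
  .catch((err) => console.error('Parsing failed or timed out:', err));
```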
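The `parseStream` placeholder in the snippet above has to process each chunk as it arrives and keep only bounded state. Cheerio and jsdom both build a complete in-memory DOM tree, so they cannot do this regardless of how the input is delivered; a real fix would likely need an incremental parser such as htmlparser2, which accepts input through repeated `write()` calls. As a toy illustration using only Node.js built-ins, the hypothetical `createTagCounter` below counts opening tags across chunk boundaries while holding at most one chunk plus a short tail in memory.

```javascript
const { Writable } = require('node:stream');
const { StringDecoder } = require('node:string_decoder');

// Toy chunked "parser" (hypothetical): counts opening tags such as <a href=...>.
// It does not understand comments or <script> contents the way a real HTML parser does.
function createTagCounter(tagName) {
  const decoder = new StringDecoder('utf8'); // keeps multi-byte characters split across chunks intact
  const pattern = new RegExp(`<${tagName}[\\s>/]`, 'gi'); // tagName assumed to be a plain name like "a"
  const tailLen = tagName.length + 1; // a match is tagName.length + 2 chars long
  let carry = '';

  const counter = new Writable({
    write(chunk, _encoding, callback) {
      scan(decoder.write(chunk), false);
      callback();
    },
    final(callback) {
      scan(decoder.end(), true);
      callback();
    },
  });
  counter.count = 0;

  function scan(text, isLast) {
    const data = carry + text;
    // Matches starting before safeEnd are complete; later positions are re-examined with the next chunk.
    const safeEnd = isLast ? data.length : Math.max(0, data.length - tailLen);
    for (const match of data.matchAll(pattern)) {
      if (match.index < safeEnd) counter.count += 1;
    }
    carry = data.slice(safeEnd); // only this short tail is kept between chunks
  }

  return counter;
}
```

Used as `pipeline(fs.createReadStream('large.html'), createTagCounter('a'))`, memory stays roughly constant however large the file is, because only the current chunk and `carry` are ever held.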

**GSSoC Points Estimate:** Level 2 (Performance/Memory)

## Suggested Labels

- gssoc:approved
- type:bug
- severity:high
- area:performance
