Skip to content

Check downlodable link with get header#258

Open
dd-jy wants to merge 1 commit intomainfrom
dn_err
Open

Check downlodable link with get header#258
dd-jy wants to merge 1 commit intomainfrom
dn_err

Conversation

@dd-jy
Copy link
Contributor

@dd-jy dd-jy commented Mar 20, 2026

Description

check downlodable link with get header not a header

Summary by CodeRabbit

Bug Fixes

  • Improved accuracy of link downloadability detection by enhancing HTTP status code validation and filtering out HTML pages. Enhanced diagnostic warning messages for better troubleshooting.

Signed-off-by: 석지영/책임연구원/SW공학(연)Open Source TP <[email protected]>
@dd-jy dd-jy self-assigned this Mar 20, 2026
@dd-jy dd-jy added the bug fix [PR] Fix the bug label Mar 20, 2026
@coderabbitai
Copy link

coderabbitai bot commented Mar 20, 2026

📝 Walkthrough

Walkthrough

The is_downloadable() function now validates downloadability using a streamed GET request instead of a HEAD request, checks for HTTP error status codes (≥400), inspects Content-Type headers, and explicitly rejects HTML content with warning logs. The previous content-disposition attachment indicator check has been removed.

Changes

Cohort / File(s) Summary
HTTP Downloadability Check
src/fosslight_util/download.py
Modified is_downloadable() to use GET requests with redirects and timeout instead of HEAD requests, reject responses with status >= 400, filter text/html content types with warning logs, and remove content-disposition attachment matching logic.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Check downlodable link with get header' is partially related to the changeset. It mentions using GET headers instead of HEAD, which aligns with the main technical change, but contains a typo ('downlodable' instead of 'downloadable') and is slightly vague about the specific behavioral change.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dn_err
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

Flake8 can be used to improve the quality of Python code reviews.

Flake8 is a Python linter that wraps PyFlakes, pycodestyle and Ned Batchelder's McCabe script.

To configure Flake8, add a '.flake8' or 'setup.cfg' file to your project root.

See Flake8 Documentation for more details.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/fosslight_util/download.py (1)

61-73: LGTM! Clean implementation using GET with streaming.

The change from HEAD to GET with stream=True is a good approach—some servers return different headers for HEAD requests or don't support it well. Key observations:

  • Context manager ensures proper connection cleanup even with stream=True
  • Only headers are fetched since the body isn't accessed
  • Status code check and HTML content-type rejection are sensible safeguards
💡 Optional: Consider adding a brief inline comment

A short comment explaining the rationale could help future maintainers understand the design choice:

 def is_downloadable(url):
     try:
+        # Use GET with stream=True instead of HEAD for better server compatibility
         with requests.get(url, stream=True, allow_redirects=True, timeout=10) as r:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/fosslight_util/download.py` around lines 61 - 73, Add a short inline
comment inside the is_downloadable function explaining why GET with stream=True
is used instead of HEAD (some servers mis-handle HEAD or return different
headers) and noting that response content is not consumed so the connection is
cleaned up by the context manager; place the comment near the requests.get(...)
call and keep it concise so future maintainers understand the design choice
without changing behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/fosslight_util/download.py`:
- Around line 61-73: Add a short inline comment inside the is_downloadable
function explaining why GET with stream=True is used instead of HEAD (some
servers mis-handle HEAD or return different headers) and noting that response
content is not consumed so the connection is cleaned up by the context manager;
place the comment near the requests.get(...) call and keep it concise so future
maintainers understand the design choice without changing behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ed3eead0-ef8d-48d1-8956-19878cc3d7d0

📥 Commits

Reviewing files that changed from the base of the PR and between fd65ca1 and f032fc3.

📒 Files selected for processing (1)
  • src/fosslight_util/download.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug fix [PR] Fix the bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant