fix: sources_count + MCP v1.26 compat#157
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses two critical issues within the NotebookLM Model Context Protocol (MCP) server: ensuring compatibility with recent Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
Glad I could help with this, it is an awesome, awesome thing and I appreciate all of you. |
There was a problem hiding this comment.
Code Review
This pull request primarily addresses two bug fixes: correctly calculating sources_count for notebooks and ensuring compatibility with MCP v1.26.0 by updating a parameter name. It also introduces a new mcp_server.py file to expose the NotebookLM API as MCP tools, granting full programmatic access to notebooks, sources, chat, and artifacts. Critically, this new MCP server introduces several path traversal vulnerabilities in its artifact download tools. These tools accept an output_path from the LLM and write files to the local filesystem without sanitization, which could be exploited via prompt injection to overwrite sensitive user files. Beyond this, the mcp_server.py file could benefit from improvements in client lifecycle management, removal of direct sys.path manipulation, refactoring to reduce code duplication, and ensuring proper serialization of all data types to enhance its overall maintainability and robustness.
| async def download_audio( | ||
| notebook_id: str, output_path: str, artifact_id: str | None = None | ||
| ) -> str: | ||
| """Download an Audio Overview to a local file. | ||
|
|
||
| Args: | ||
| notebook_id: The notebook ID. | ||
| output_path: Local file path to save the audio (e.g., "podcast.mp3"). | ||
| artifact_id: Specific artifact ID. If None, downloads the first completed audio. | ||
|
|
||
| Returns the output file path. | ||
| """ | ||
| client = await get_client() | ||
| path = await client.artifacts.download_audio(notebook_id, output_path, artifact_id=artifact_id) | ||
| return json.dumps({"downloaded": str(path)}) | ||
|
|
There was a problem hiding this comment.
The download_audio tool accepts an output_path argument without any sanitization. This allows an attacker to perform a path traversal attack via prompt injection, potentially overwriting sensitive files on the user's machine (e.g., ~/.bashrc, ~/.ssh/authorized_keys). Please sanitize the output_path to ensure it remains within a safe directory and does not contain directory traversal sequences like ...
There was a problem hiding this comment.
Fixed in 127c76d — added
_sanitize_output_path()
to all 8 download tools. It rejects .. traversal, strips absolute paths/drive letters, confines writes to a configurable ALLOWED_DOWNLOAD_DIR (default ./downloads/), and blocks sensitive directories (.ssh, .gnupg, System32, etc.). Thanks for catching this — it was a critical oversight.
| async def download_video( | ||
| notebook_id: str, output_path: str, artifact_id: str | None = None | ||
| ) -> str: | ||
| """Download a Video Overview to a local file. | ||
|
|
||
| Args: | ||
| notebook_id: The notebook ID. | ||
| output_path: Local file path to save the video (e.g., "overview.mp4"). | ||
| artifact_id: Specific artifact ID. If None, downloads the first completed video. | ||
|
|
||
| Returns the output file path. | ||
| """ | ||
| client = await get_client() | ||
| path = await client.artifacts.download_video(notebook_id, output_path, artifact_id=artifact_id) | ||
| return json.dumps({"downloaded": str(path)}) | ||
|
|
There was a problem hiding this comment.
The download_video tool accepts an output_path argument without any sanitization. This allows an attacker to perform a path traversal attack via prompt injection, potentially overwriting sensitive files on the user's machine. Please sanitize the output_path to ensure it remains within a safe directory.
There was a problem hiding this comment.
Addressed together with all download tools in 127c76d.
| async def download_quiz( | ||
| notebook_id: str, | ||
| output_path: str, | ||
| output_format: str = "json", | ||
| artifact_id: str | None = None, | ||
| ) -> str: | ||
| """Download a quiz to a local file. | ||
|
|
||
| Args: | ||
| notebook_id: The notebook ID. | ||
| output_path: Local file path to save the quiz. | ||
| output_format: Format — "json", "markdown", or "html". | ||
| artifact_id: Specific artifact ID. If None, downloads the first quiz. | ||
|
|
||
| Returns the output file path. | ||
| """ | ||
| client = await get_client() | ||
| path = await client.artifacts.download_quiz( | ||
| notebook_id, output_path, output_format=output_format, artifact_id=artifact_id | ||
| ) | ||
| return json.dumps({"downloaded": str(path)}) | ||
|
|
||
|
|
There was a problem hiding this comment.
The download_quiz tool accepts an output_path argument without any sanitization. This allows an attacker to perform a path traversal attack via prompt injection, potentially overwriting sensitive files on the user's machine. Please sanitize the output_path to ensure it remains within a safe directory.
There was a problem hiding this comment.
Addressed together with all download tools in 127c76d.
| async def download_flashcards( | ||
| notebook_id: str, | ||
| output_path: str, | ||
| output_format: str = "json", | ||
| artifact_id: str | None = None, | ||
| ) -> str: | ||
| """Download flashcards to a local file. | ||
|
|
||
| Args: | ||
| notebook_id: The notebook ID. | ||
| output_path: Local file path to save the flashcards. | ||
| output_format: Format — "json", "markdown", or "html". | ||
| artifact_id: Specific artifact ID. If None, downloads the first flashcards. | ||
|
|
||
| Returns the output file path. | ||
| """ | ||
| client = await get_client() | ||
| path = await client.artifacts.download_flashcards( | ||
| notebook_id, output_path, output_format=output_format, artifact_id=artifact_id | ||
| ) | ||
| return json.dumps({"downloaded": str(path)}) | ||
|
|
||
|
|
There was a problem hiding this comment.
The download_flashcards tool accepts an output_path argument without any sanitization. This allows an attacker to perform a path traversal attack via prompt injection, potentially overwriting sensitive files on the user's machine. Please sanitize the output_path to ensure it remains within a safe directory.
There was a problem hiding this comment.
Addressed together with all download tools in 127c76d.
| async def download_slide_deck( | ||
| notebook_id: str, output_path: str, artifact_id: str | None = None | ||
| ) -> str: | ||
| """Download a slide deck to a local file (PDF or PPTX). | ||
|
|
||
| Args: | ||
| notebook_id: The notebook ID. | ||
| output_path: Local file path to save the slides (e.g., "slides.pdf" or "slides.pptx"). | ||
| artifact_id: Specific artifact ID. If None, downloads the first slide deck. | ||
|
|
||
| Returns the output file path. | ||
| """ | ||
| client = await get_client() | ||
| path = await client.artifacts.download_slide_deck( | ||
| notebook_id, output_path, artifact_id=artifact_id | ||
| ) | ||
| return json.dumps({"downloaded": str(path)}) | ||
|
|
||
|
|
There was a problem hiding this comment.
The download_slide_deck tool accepts an output_path argument without any sanitization. This allows an attacker to perform a path traversal attack via prompt injection, potentially overwriting sensitive files on the user's machine. Please sanitize the output_path to ensure it remains within a safe directory.
There was a problem hiding this comment.
Addressed together with all download tools in 127c76d.
| # Add src to path so we can import notebooklm | ||
| sys.path.insert(0, str(__import__("pathlib").Path(__file__).parent / "src")) |
| async def get_client() -> NotebookLMClient: | ||
| """Get or create the NotebookLM client singleton.""" | ||
| global _client | ||
| if _client is None or not _client.is_connected: | ||
| _client = await NotebookLMClient.from_storage() | ||
| await _client.__aenter__() | ||
| return _client |
There was a problem hiding this comment.
The get_client function calls _client.__aenter__() to open the client connection but there is no corresponding call to _client.__aexit__() to close it. This can lead to resource leaks, such as open network connections not being properly terminated when the server shuts down. A better approach is to manage the client's lifecycle along with the server's lifecycle.
Many modern async frameworks (like FastAPI, which FastMCP may be based on) support a lifespan context manager. Consider using one to manage the client's connection.
from contextlib import asynccontextmanager
from mcp.server.fastmcp import FastMCP
@asynccontextmanager
async def lifespan(app: FastMCP):
"""Manage the NotebookLM client lifecycle."""
global _client
_client = await NotebookLMClient.from_storage()
await _client.__aenter__()
yield
if _client:
await _client.__aexit__(None, None, None)
# When creating the mcp server
mcp = FastMCP(
"NotebookLM",
instructions="...",
lifespan=lifespan,
)
# And simplify get_client
async def get_client() -> NotebookLMClient:
"""Get the initialized NotebookLM client singleton."""
global _client
if _client is None:
# This should not happen if lifespan is used correctly
raise RuntimeError("NotebookLMClient not initialized.")
return _clientIf FastMCP doesn't support a lifespan argument, you should find another way to ensure _client.__aexit__ is called on server shutdown.
| def _serialize(obj: Any) -> Any: | ||
| """Serialize dataclass/object to JSON-safe dict.""" | ||
| if obj is None: | ||
| return None | ||
| if isinstance(obj, (str, int, float, bool)): | ||
| return obj | ||
| if isinstance(obj, list): | ||
| return [_serialize(item) for item in obj] | ||
| if isinstance(obj, dict): | ||
| return {k: _serialize(v) for k, v in obj.items()} | ||
| if hasattr(obj, "__dataclass_fields__"): | ||
| return {k: _serialize(getattr(obj, k)) for k in obj.__dataclass_fields__} | ||
| if hasattr(obj, "__dict__"): | ||
| return {k: _serialize(v) for k, v in obj.__dict__.items() if not k.startswith("_")} | ||
| return str(obj) |
There was a problem hiding this comment.
The _serialize function does not explicitly handle datetime objects. It will fall back to str(obj), which produces a string representation that is not a standard format like ISO 8601. This can make parsing the JSON on the client side more difficult. It's better to add specific handling for datetime objects to serialize them to ISO 8601 format.
| def _serialize(obj: Any) -> Any: | |
| """Serialize dataclass/object to JSON-safe dict.""" | |
| if obj is None: | |
| return None | |
| if isinstance(obj, (str, int, float, bool)): | |
| return obj | |
| if isinstance(obj, list): | |
| return [_serialize(item) for item in obj] | |
| if isinstance(obj, dict): | |
| return {k: _serialize(v) for k, v in obj.items()} | |
| if hasattr(obj, "__dataclass_fields__"): | |
| return {k: _serialize(getattr(obj, k)) for k in obj.__dataclass_fields__} | |
| if hasattr(obj, "__dict__"): | |
| return {k: _serialize(v) for k, v in obj.__dict__.items() if not k.startswith("_")} | |
| return str(obj) | |
| def _serialize(obj: Any) -> Any: | |
| """Serialize dataclass/object to JSON-safe dict.""" | |
| if obj is None: | |
| return None | |
| if isinstance(obj, (str, int, float, bool)): | |
| return obj | |
| if isinstance(obj, __import__("datetime").datetime): | |
| return obj.isoformat() | |
| if isinstance(obj, list): | |
| return [_serialize(item) for item in obj] | |
| if isinstance(obj, dict): | |
| return {k: _serialize(v) for k, v in obj.items()} | |
| if hasattr(obj, "__dataclass_fields__"): | |
| return {k: _serialize(getattr(obj, k)) for k in obj.__dataclass_fields__} | |
| if hasattr(obj, "__dict__"): | |
| return {k: _serialize(v) for k, v in obj.__dict__.items() if not k.startswith("_")} | |
| return str(obj) |
There was a problem hiding this comment.
"Good catch — that's a valid improvement. I kept this PR scoped to the two bug fixes (sources_count + MCP v1.26 compat) to keep it clean, but happy to address this in a follow-up PR if the maintainer wants."
|
|
||
| Returns success status. | ||
| """ | ||
| from notebooklm import ChatGoal, ChatResponseLength |
There was a problem hiding this comment.
Placing imports inside functions is generally discouraged. It can lead to repeated import overhead and makes it harder to see all of the file's dependencies at a glance. These imports should be moved to the top of the file. This applies to several other functions in this file as well (e.g., list_artifacts, generate_audio).
| qty = None | ||
| if quantity: | ||
| qty_map = { | ||
| "less": QuizQuantity.LESS, | ||
| "default": QuizQuantity.DEFAULT, | ||
| "more": QuizQuantity.MORE, | ||
| } | ||
| qty = qty_map.get(quantity.lower()) | ||
|
|
||
| diff = None | ||
| if difficulty: | ||
| diff_map = { | ||
| "easy": QuizDifficulty.EASY, | ||
| "medium": QuizDifficulty.MEDIUM, | ||
| "hard": QuizDifficulty.HARD, | ||
| } | ||
| diff = diff_map.get(difficulty.lower()) |
There was a problem hiding this comment.
The logic for mapping quantity and difficulty strings to their corresponding enums is duplicated in generate_quiz and generate_flashcards. This duplicated code can be extracted into a helper function to improve maintainability.
For example:
def _parse_quiz_options(quantity: str | None, difficulty: str | None) -> tuple[QuizQuantity | None, QuizDifficulty | None]:
"""Parse quantity and difficulty strings into enums."""
from notebooklm import QuizDifficulty, QuizQuantity
qty = None
if quantity:
qty_map = {
"less": QuizQuantity.LESS,
"default": QuizQuantity.DEFAULT,
"more": QuizQuantity.MORE,
}
qty = qty_map.get(quantity.lower())
diff = None
if difficulty:
diff_map = {
"easy": QuizDifficulty.EASY,
"medium": QuizDifficulty.MEDIUM,
"hard": QuizDifficulty.HARD,
}
diff = diff_map.get(difficulty.lower())
return qty, diffYou can then call this helper from both generate_quiz and generate_flashcards.
Addresses critical CWE-22 path traversal vulnerability identified by Gemini Code Assist in PR review. All 8 download_* functions now pass output_path through _sanitize_output_path() which: - Rejects '..' traversal sequences - Strips absolute paths (drive letters, leading slashes) - Confines writes to ALLOWED_DOWNLOAD_DIR (default: ./downloads/) - Blocks writes to sensitive directories (.ssh, .gnupg, System32, etc.) - Configurable via NOTEBOOKLM_DOWNLOAD_DIR env var
Summary
Two bug fixes for the NotebookLM MCP server:
1. Fix
sources_countalways returning 0 inNotebook.from_api_responseThe list_notebooks response includes the full sources array at
data[1], butNotebook.from_api_response()never extracted it. This causedsources_countto always default to0, even when notebooks had sources.Fix: Added extraction of
len(data[1])to populatesources_countcorrectly.2. Fix MCP server startup on
mcpv1.26.0+The
FastMCPconstructor inmcpv1.26.0 renamed the description parameter toinstructions, causing aTypeErroron server startup.Fix: Updated mcp_server.py to use
instructionsinstead of description.Testing