Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolving Standard Input Encoding Issues: Wrapping sys.stdin with UTF-8 #112

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

blackwhite084
Copy link

This change ensures that the standard input stream (sys.stdin) is read with UTF-8 encoding by re-wrapping it using io.TextIOWrapper. This addresses potential encoding issues where the default system encoding might not be UTF-8 (e.g., GBK on some systems), leading to incorrect character interpretation.

Motivation and Context

In certain environments, the default encoding for sys.stdin might be something other than UTF-8 (like GBK). When the application expects UTF-8 encoded input, this discrepancy can lead to UnicodeDecodeError or incorrect interpretation of characters. This change ensures that regardless of the system's default locale, the input stream is treated as UTF-8, which is a more universal and recommended encoding for modern applications. This fixes a potential bug where the application might fail or behave unexpectedly when receiving non-ASCII characters through standard input in such environments.

How Has This Been Tested?

This change has been tested by:

  • Manually testing with input containing non-ASCII characters (e.g., 中文) in an environment where the default locale is set to GBK.
  • Verifying that the application correctly reads and processes these characters without encoding errors.
  • Confirming that the change does not negatively impact environments where the default locale is already UTF-8.

Ideally, more comprehensive testing would involve setting up CI jobs with different locales to ensure consistent behavior across various environments.

Breaking Changes

No, this is a non-breaking change. It addresses a potential issue with encoding and makes the application more robust. Users do not need to update their code or configurations.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

The decision to re-wrap sys.stdin with io.TextIOWrapper was made to ensure consistent UTF-8 encoding without modifying the underlying file descriptor or relying on environment variables. This approach is generally considered a safe and effective way to handle encoding issues with standard input in Python. It's important to note that the input source should ideally be sending UTF-8 encoded data for this fix to be fully effective. This change ensures that the application interprets the input as UTF-8.

fix encoding errr in some case
@blackwhite084 blackwhite084 changed the title Update stdio.py Resolving Standard Input Encoding Issues: Wrapping sys.stdin with UTF-8 Dec 24, 2024
@dsp-ant dsp-ant changed the base branch from main to v1.1.x January 2, 2025 08:52
@dsp-ant dsp-ant changed the base branch from v1.1.x to main January 2, 2025 08:52
@dsp-ant
Copy link
Member

dsp-ant commented Jan 2, 2025

This looks good. Thank you. I am curious if we can add a test for this. Also do you mind rebasing it on top of the v1.1.x branch since this is a bugfix?

In addition I am wondering if we need to also make sure that we write from the client side.

@dsp-ant dsp-ant self-requested a review January 2, 2025 08:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants