Unicode support for non-latin language #107

Open
Sivan22 opened this issue Dec 18, 2024 · 4 comments
Labels: bug (Something isn't working), question (Further information is requested)

Comments

@Sivan22
Sivan22 commented Dec 18, 2024

Describe the bug
When sending or receiving non-Latin text, the text becomes gibberish (nonsense characters or ?????).

To Reproduce
Send a request containing non-Latin characters to one of the tools.

Expected behavior
It should handle all languages that Unicode supports.

Desktop (please complete the following information):

  • OS: Windows 11 Home

@dsp-ant
Member

dsp-ant commented Jan 3, 2025

Do you have a repro? I tried this and it works for me. What is your system encoding? Maybe it's related to #123.

image

@dsp-ant dsp-ant added bug Something isn't working question Further information is requested labels Jan 3, 2025
@dsp-ant dsp-ant self-assigned this Jan 3, 2025
@Richard-Weiss
Richard-Weiss commented Jan 4, 2025

TL;DR:
I use

from sys import stdin, stdout

stdin.reconfigure(encoding='utf-8')
stdout.reconfigure(encoding='utf-8')

at the top of my server, and pass encoding='utf-8' explicitly elsewhere, for example in file operations.
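As a rough illustration of the TL;DR, here is a minimal sketch of a guarded version of the same idea. The helper names `force_utf8` and `force_utf8_stdio` are made up for this example and are not part of the SDK; the only real API used is `io.TextIOWrapper.reconfigure`, available since Python 3.7. It assumes the helper runs at the very top of the server, before anything has been read from stdin.

```python
import io
import sys


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure().

    Hypothetical helper for illustration. Streams that were replaced by
    objects without reconfigure() are returned unchanged.
    """
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is not None:
        reconfigure(encoding="utf-8")
    return stream


def force_utf8_stdio():
    """Apply force_utf8 to stdin and stdout.

    Call this before any data has been read from stdin, because the
    encoding of a read stream cannot be changed after reading started.
    """
    force_utf8(sys.stdin)
    force_utf8(sys.stdout)


# Demonstration on an in-memory stream that starts out as cp1252,
# a legacy Windows code page that cannot encode Hebrew.
buffer = io.BytesIO()
legacy_stream = io.TextIOWrapper(buffer, encoding="cp1252")
force_utf8(legacy_stream)
legacy_stream.write("שלום")
legacy_stream.flush()
written_bytes = buffer.getvalue()
```

Without the `force_utf8` call, the `write` above would raise `UnicodeEncodeError`, since cp1252 has no mapping for those characters.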


I have a similar issue in my Python MCP server with other Unicode characters. Adding this line at the top of my server.py fixes it:

stdout.reconfigure(encoding='utf-8')

With mitigation suggested in #112:
image

With my mitigation:
image

There's a different issue when the content is Unicode only: the message doesn't even reach my own server before it errors out, from inside the SDK I believe:
image

Code is just this inside the tool:

path = arguments.path
content = arguments.content

await PathValidator.validate_file_path(path)
file_path = await PathValidator.resolve_absolute_path(path)

logger.debug(f"Writing content to {file_path}")
logger.debug(f"Content: {content}")
async with aiofiles.open(file_path, 'w') as f:
    await f.write(content)
logger.debug(f"Wrote content to {file_path}")

The mitigation in #112 fixes that particular issue, but then the other encoding problem comes back, even with my own mitigation.
What did work for me was using both mitigations and also passing encoding='utf-8' to file operations. The debug log in the inspector still doesn't decode the text correctly, but at least it works:
image
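To make the file-operation part concrete, here is a small sketch using the built-in `open`. The helper names `write_text_utf8` and `read_text_utf8` are invented for this example. `aiofiles.open` accepts the same `encoding` keyword, so the same change should carry over to the tool code above, though that is not exercised here.

```python
import os
import tempfile


def write_text_utf8(path, content):
    """Write text with an explicit encoding instead of the locale default."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)


def read_text_utf8(path):
    """Read text back with the same explicit encoding."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Round-trip a non-Latin string through a temporary file.
sample = "שלום עולם, こんにちは"
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt")
    write_text_utf8(sample_path, sample)
    round_tripped = read_text_utf8(sample_path)
    with open(sample_path, "rb") as raw:
        raw_bytes = raw.read()
```

Omitting `encoding` makes `open` fall back to the locale's preferred encoding, which on Windows is often a legacy code page rather than UTF-8.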

I'm on Windows 10 with SDK version 1.2.0, using the low-level server.

@Sivan22
Author

Sivan22 commented Jan 4, 2025

Thank you for that solution.
I've found another workaround: setting the environment variable PYTHONIOENCODING=utf-8,

or setting it in the MCP config:

{
  "mcpServers": {
    "mcp_server": {
      "command": "uv",
      "args": [
        "--directory",
        "your/path/to/directory",
        "run",
        "mcp_server"
      ],
      "env": {
        "PYTHONIOENCODING": "utf-8"
      }
    }
  }
}

This solution comes from modelcontextprotocol/servers#57.
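For anyone who wants to check the effect locally, here is a minimal sketch that starts a child Python interpreter with and without the variable and reports which encoding it picked for stdout. The function name `child_stdout_encoding` is made up for this example. It only shows that the variable is read when the interpreter starts, which is why it has to come from the launcher's `env` block rather than from the server code.

```python
import os
import subprocess
import sys


def child_stdout_encoding(extra_env):
    """Start a child interpreter and return its sys.stdout.encoding.

    Hypothetical helper for illustration. extra_env is merged over the
    current environment, the way an "env" block in a launcher config
    would be.
    """
    env = {**os.environ, **extra_env}
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# With the variable set, the child reports UTF-8 for its piped stdout.
with_override = child_stdout_encoding({"PYTHONIOENCODING": "utf-8"})

# Without it, the result depends on the platform and locale settings.
platform_default = child_stdout_encoding({})
```

On a Windows machine with a legacy code page, `platform_default` would typically differ from `with_override`, which matches the symptoms described in this issue.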

@Richard-Weiss

Richard-Weiss commented Jan 4, 2025

Yeah, that's smart too. Claude suggested that, but I was unsure where to put the environment variable, because setting it in the code would be too late for it to take effect.
You still need to include encoding='utf-8' when reading a file, though.

I just noticed that it's actually the better option, because the MCP inspector output is then correct too.

3 participants