Add an example to show how to handle a non-UTF-8 page #39385

Open
wants to merge 3 commits into
base: main

Conversation

def00111
Contributor

@def00111 def00111 commented May 4, 2025

Description

The example shows two things:

  1. How to read the charset from the page's meta tag (by decoding the start of the page with a standard UTF-8 TextDecoder)
  2. How to re-encode the modified page with an external library, because TextEncoder only supports UTF-8

Motivation

All other examples are for a UTF-8 page.

Additional details

Related issues and pull requests

@def00111 def00111 requested a review from a team as a code owner May 4, 2025 14:11
@def00111 def00111 requested review from dotproto and removed request for a team May 4, 2025 14:11
@github-actions github-actions bot added Content:WebExt WebExtensions docs size/m [PR only] 51-500 LoC changed labels May 4, 2025
// Get the charset from the page meta tag
const part = decoder.decode(combinedArray.slice(0, 1000));
const charset = part.match(/<meta charset="(.+?)">/)[1];
// Creates a new TextDecoder object with the label "shift_jis"
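
The excerpt above indexes the result of `match()` directly, which throws when the page has no matching meta tag. A minimal sketch of a more defensive lookup follows; the helper name `sniffCharset`, the fallback value, and the looser pattern are assumptions for illustration and are not part of the PR.

```javascript
// Hypothetical helper (not from the PR): read the charset label from the
// first bytes of a page and fall back to a default when no tag is found.
function sniffCharset(bytes, fallback = "utf-8") {
  // Decode only the head of the document. Bytes that are invalid UTF-8
  // become U+FFFD, but the ASCII meta tag stays readable.
  const head = new TextDecoder("utf-8").decode(bytes.subarray(0, 1000));
  // Accept single quotes, double quotes, or no quotes around the label.
  const match = head.match(/<meta\s+charset\s*=\s*["']?([\w-]+)/i);
  return match ? match[1].toLowerCase() : fallback;
}

// Usage: pass the returned label to TextDecoder to decode the full page.
const sample = new TextEncoder().encode('<head><meta charset="Shift_JIS"></head>');
const label = sniffCharset(sample);
```

The sketch only handles the `<meta charset=...>` form; a page that declares its encoding solely through an `http-equiv` meta tag or the `Content-Type` response header would need extra handling.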

[mdn-linter] reported by reviewdog 🐶

Suggested change
// Creates a new TextDecoder object with the label "shift_jis"
// Creates a new TextDecoder object with the label "shift_jis"

Contributor

github-actions bot commented May 4, 2025

Preview URLs

(comment last updated: 2025-05-10 12:14:21)

@dotproto
Collaborator

This is the first I've heard of fast-sjis-encoder, and I'm extremely uncomfortable with including a third-party library in example documentation.

You indirectly mentioned that TextEncoder only supports encoding content in UTF-8. Rather than try to encode the modified document back into shift_jis, have you considered updating the charset of the document to utf-8 and using TextEncoder to re-encode the data stream?
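
A rough sketch of that alternative, assuming the page declares its encoding with a `<meta charset>` tag: decode with the detected label, rewrite the tag to `utf-8`, and let TextEncoder produce the output bytes. The helper names `rewriteCharsetToUtf8` and `reencodeAsUtf8` are hypothetical and do not come from the PR.

```javascript
// Hypothetical helper: point the first <meta charset> declaration at UTF-8.
function rewriteCharsetToUtf8(html) {
  return html.replace(/(<meta\s+charset\s*=\s*["']?)[\w-]+/i, "$1utf-8");
}

// Hypothetical helper: decode bytes with the page's own charset, then
// emit UTF-8 bytes so that no third-party encoder is needed.
function reencodeAsUtf8(bytes, sourceCharset) {
  // TextDecoder throws a RangeError if the runtime does not know the label.
  const text = new TextDecoder(sourceCharset).decode(bytes);
  // TextEncoder always encodes to UTF-8.
  return new TextEncoder().encode(rewriteCharsetToUtf8(text));
}
```

If the server also sends a charset in the `Content-Type` response header, that header would presumably need to be updated as well, since it takes precedence over the meta tag.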

Also tagging @rebloor and @pepelsbey in case there are MDN editorial considerations that I'm not aware of here.

Labels
Content:WebExt WebExtensions docs size/m [PR only] 51-500 LoC changed

2 participants