Modified the examples for mb_convert_kana, mb_detect_encoding, mb_encode_numericentity #4510

Open
wants to merge 3 commits into
base: master
25 changes: 15 additions & 10 deletions reference/mbstring/functions/mb-convert-kana.xml
@@ -200,23 +200,28 @@

 <refsect1 role="examples">
 &reftitle.examples;
-<para>
-<example>
-<title><function>mb_convert_kana</function> example</title>
-<programlisting role="php">
+<example>
+<title><function>mb_convert_kana</function> example</title>
+<programlisting role="php">
 <![CDATA[
 <?php
-/* Convert all "kana" to "zen-kaku" "kata-kana" */
-$str = mb_convert_kana($str, "KVC");
+/* Convert all "han-kaku" "kata-kana" to "zen-kaku" "hira-gana" */
+echo mb_convert_kana('ﾔﾏﾀﾞ ﾊﾅｺ', "HV") . "\n";
 
 /* Convert "han-kaku" "kata-kana" to "zen-kaku" "kata-kana"
 and "zen-kaku" alphanumeric to "han-kaku" */
-$str = mb_convert_kana($str, "KVa");
+echo mb_convert_kana('ｺｳｻﾞﾊﾞﾝｺﾞｳ ０１２３４５６', "KVa") . "\n";
 ?>
 ]]>
-</programlisting>
-</example>
-</para>
+</programlisting>
+&example.outputs;
+<screen>
+<![CDATA[
+やまだ はなこ
+コウザバンゴウ 0123456
+]]>
+</screen>
+</example>
 </refsect1>
 
 </refentry>
31 changes: 27 additions & 4 deletions reference/mbstring/functions/mb-detect-encoding.xml
@@ -29,6 +29,17 @@
 bytes form a valid string. If the input string contains such a sequence, that
 encoding will be rejected, and the next encoding checked.
 </para>
+
+<warning>
+<title>The result is not accurate</title>
+<para>
+The name of this function is misleading; it performs "guessing" rather than "detection".
+</para>
+<para>
+The guesses are far from accurate, so this function cannot be relied on to
+detect the correct character encoding.
+</para>
+</warning>
 </refsect1>
 
 <refsect1 role="parameters">
@@ -121,25 +132,37 @@
 <programlisting role="php">
 <![CDATA[
 <?php
+
+$str = "\x95\xB6\x8E\x9A\x83\x52\x81\x5B\x83\x68";
+
 // Detect character encoding with current detect_order
-echo mb_detect_encoding($str);
+var_dump(mb_detect_encoding($str));
 
 // "auto" is expanded according to mbstring.language
-echo mb_detect_encoding($str, "auto");
+var_dump(mb_detect_encoding($str, "auto"));
 
 // Specify "encodings" parameter by list separated by comma
-echo mb_detect_encoding($str, "JIS, eucjp-win, sjis-win");
+var_dump(mb_detect_encoding($str, "JIS, eucjp-win, sjis-win"));
 
 // Use array to specify "encodings" parameter
 $encodings = [
 "ASCII",
 "JIS",
 "EUC-JP"
 ];
-echo mb_detect_encoding($str, $encodings);
+var_dump(mb_detect_encoding($str, $encodings));
 ?>
 ]]>
 </programlisting>
+&example.outputs;
+<screen>
+<![CDATA[
+string(5) "ASCII"
+string(5) "ASCII"
+string(8) "SJIS-win"
+string(5) "ASCII"
+]]>
+</screen>
Contributor

Hmm... I think including example output here could be confusing.

The result of mb_detect_encoding differs between PHP versions. Ref: https://3v4l.org/rf0FG
This is because the function relies on heuristics that keep being tuned, so its behavior may change again.

@alexdowad Do you have any opinion?

Member

I would really like us to add a prominent warning that this function is badly named and far from accurate.

Member Author

@Girgias
Added in da4f37f :)

Contributor

@SakiTakamachi @Girgias Thanks. Looks good.
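
As a side note on the accuracy concern discussed in this thread, here is a minimal sketch (not part of this PR) of one way to avoid depending on the guess: validate the bytes against each candidate encoding with `mb_check_encoding`. The helper name `valid_encodings` and the candidate list are assumptions made purely for illustration.

```php
<?php
// Shift_JIS bytes, the same sample string the mb_detect_encoding example uses.
$str = "\x95\xB6\x8E\x9A\x83\x52\x81\x5B\x83\x68";

// Hypothetical helper (not part of mbstring): returns every candidate
// encoding in which $bytes is a valid byte sequence.
function valid_encodings(string $bytes, array $candidates): array
{
    return array_values(array_filter(
        $candidates,
        fn (string $enc): bool => mb_check_encoding($bytes, $enc)
    ));
}

var_dump(valid_encodings($str, ["ASCII", "UTF-8", "SJIS-win"]));
```

A validity check only says which encodings the bytes could be; when several candidates pass, the caller still has to decide, which is the same ambiguity the warning describes.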

 </example>
 </para>
 <para>
36 changes: 19 additions & 17 deletions reference/mbstring/functions/mb-encode-numericentity.xml
@@ -130,27 +130,29 @@ $convmap = array (
 <programlisting role="php">
 <![CDATA[
 <?php
-/* Convert Left side of ISO-8859-1 to HTML numeric character reference */
-$convmap = array(0x80, 0xff, 0, 0xff);
-$str = mb_encode_numericentity($str, $convmap, "ISO-8859-1");
 
-/* Convert user defined SJIS-win code in block 95-104 to numeric
-string reference */
-$convmap = array(
-0xe000, 0xe03e, 0x1040, 0xffff,
-0xe03f, 0xe0bb, 0x1041, 0xffff,
-0xe0bc, 0xe0fa, 0x1084, 0xffff,
-0xe0fb, 0xe177, 0x1085, 0xffff,
-0xe178, 0xe1b6, 0x10c8, 0xffff,
-0xe1b7, 0xe233, 0x10c9, 0xffff,
-0xe234, 0xe272, 0x110c, 0xffff,
-0xe273, 0xe2ef, 0x110d, 0xffff,
-0xe2f0, 0xe32e, 0x1150, 0xffff,
-0xe32f, 0xe3ab, 0x1151, 0xffff );
-$str = mb_encode_numericentity($str, $convmap, "sjis-win");
+$str = "aAæÆあア𩸽";
+
+/* Convert all UTF8 characters up to 4 bytes to HTML numeric character reference */
+$convmap = [0, 0x1FFFFF, 0, 0x1FFFFF];
+var_dump(mb_encode_numericentity($str, $convmap, "utf8"));
+
+/* Converts only 2-byte and 4-byte UTF8 characters to HTML numeric character reference */
+$convmap = [
+0x80, 0x7FF, 0, 0x1FFFFF,
+0x10000, 0x1FFFFF, 0, 0x1FFFFF,
+];
+var_dump(mb_encode_numericentity($str, $convmap, "utf8"));
 ?>
 ]]>
 </programlisting>
+&example.outputs;
+<screen>
+<![CDATA[
+string(47) "&#97;&#65;&#230;&#198;&#12354;&#12450;&#171581;"
+string(29) "aA&#230;&#198;あア&#171581;"
+]]>
+</screen>
 </example>
 </para>
 </refsect1>