|
@animalize, thanks for your PR! By analyzing the history of the files in this pull request, we identified @hyeshik, @bitdancer and @Yhg1s to be potential reviewers. |
There was a problem hiding this comment.
Need an entry in Misc/NEWS.
| c4 = INBYTE4; | ||
| if (c < 0x81 || c3 < 0x81 || c4 < 0x30 || c4 > 0x39) | ||
| if (c < 0x81 || c > 0xFE || c3 < 0x81 || c3 > 0xFE || | ||
| c4 < 0x30 || c4 > 0x39) |
There was a problem hiding this comment.
In the issue I saw discussions about different standards (GB18030-2000 and GB18030-2005).
In order to avoid ambiguity, I would add at the beginning of the file a comment with a link to the standard implemented by this code.
Additional links to sections that describes specific exceptions/algorithms/ranges (such as the one being changed by this patch) are also useful.
There was a problem hiding this comment.
The two different standards are compatible (ignore the draft one, it's not a standard) and doesn't make any difference here.
And about links, I think the only one worth considering here is the wikipedia one, others are all non-authoritative. The truly authoritative one is reserved by the company and you have to pay for it :-(. But if anyone is interested in it, google GB18030 will lead you to wikipedia.
| (b"abc\x81\x30\x81\x30def", "strict", 'abc\x80def'), | ||
| (b"abc\x86\x30\x81\x30def", "replace", 'abc\ufffd0\ufffd0def'), | ||
| # issue29990 | ||
| (b"\x81\x30\xFF\x30", "strict", None), |
There was a problem hiding this comment.
I would add a few more tests, in particular:
\xFF;"replace" error handler.|
I will reply later, have a good weekend :) |
I found this doesn't exist, when the first byte is It can't pass the test So I removed
I have tested, our GB18030-2000 codec is follow this file. This page also mentioned the issue of 0x80: I investigated our GB2312/GBK/GB18030 codecs two years ago, I'm glad to share my summary:
Make a long story short: no (big) problem in GB2312/GBK/GB18030 codecs.
IMO Wikipedia is not authoritative enough because everyone can edit it, but it's a very good referrence.
Libray is a choice, I searched, "National library of China" has a copy of GB18030-2000 standard, but it seems we don't need to look up it. |
|
Why close? |
recreate for CLA check.
http://bugs.python.org/issue29990