Issue3811
Created on 2008-09-09 05:37 by loewis, last changed 2008-09-11 06:05 by loewis. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| ucd51.diff.bz2 | loewis, 2008-09-09 05:37 | |
| Messages (11) | |||
|---|---|---|---|
| msg72821 | Author: Martin v. Löwis (loewis) * | Date: 2008-09-09 05:37 | |
This is a patch to update the Unicode database to version 5.1. It is mostly imported data, but there were two code changes: (1) 5.1 changes the "mirrored" property of a character (U+0F3A), and the delta-to-3.2 code did not support that kind of change, so I added a field to change_record to support it; (2) 5.1 also adds a character (U+1D79) whose upper-case version is far away (U+A77D), which triggered a complaint that the delta cannot be represented in 16 bits, so I fixed that by adding a flag to the ctype record indicating that deltas aren't used for that record. Fredrik, can you please review these changes? |
|||
| msg72941 | Author: Martin v. Löwis (loewis) * | Date: 2008-09-10 04:51 | |
Guido, would you like to review? |
|||
| msg72946 | Author: Fredrik Lundh (effbot) * | Date: 2008-09-10 07:06 | |
The patch looks fine to me (assuming that I didn't miss something critical hidden among the large table diffs). (I'd probably have named the "NODELTA" flag after what it is rather than what it isn't, but I can't think of a short replacement right now, so let's leave it as it is.) |
|||
| msg72950 | Author: Marc-Andre Lemburg (lemburg) * | Date: 2008-09-10 09:34 | |
Reviewed the patch: looks fine to me. One nit: the unicodedata module doc-string must be updated to 5.1.0 as well. Ditto for the documentation. |
|||
| msg72962 | Author: Martin v. Löwis (loewis) * | Date: 2008-09-10 14:11 | |
I have now committed the change as r66362 (including the missing documentation updates), and ported it to 3.0 as r66363 (where I had to change the flag value and regenerate the data, as the flag 0x100 was already taken). |
|||
| msg72973 | Author: Guido van Rossum (gvanrossum) * | Date: 2008-09-10 16:11 | |
Quoting Martin v. Löwis (2008/9/10): "I have now committed the change as r66362 (including the missing documentation updates), and ported it to 3.0 as r66363 (where I had to change the flag value and regenerate the data, as the flag 0x100 was already taken)." That's unfortunate -- perhaps the 2.6 flag and data can be brought in line, to make future merges easier? |
|||
| msg72979 | Author: Martin v. Löwis (loewis) * | Date: 2008-09-10 18:09 | |
Quoting Guido van Rossum: "That's unfortunate -- perhaps the 2.6 flag and data can be brought in line, to make future merges easier?" I thought of that; however, merging the databases themselves would still not be possible: the 3.0 database has the flags set in many records, which causes merge conflicts (as the 2.x database has different flag values). So regenerating the database is necessary anyway. For future changes, it might be useful to give new flags the same values in both branches, so that such patches can be merged without conflicts in the generator. |
|||
| msg72987 | Author: Daniel Diniz (ajaksu2) | Date: 2008-09-10 21:31 | |
r66363 breaks test_unicode and test_format on 3.0. |
|||
| msg72997 | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * | Date: 2008-09-10 23:54 | |
Code point 0x0370 is now a printable character. r66381 fixed the failures by simply changing it to 0x0378, which will do until the next unicodedata upgrade... I wonder if there is a value that is guaranteed to stay non-printable. |
|||
| msg73000 | Author: Guido van Rossum (gvanrossum) * | Date: 2008-09-11 01:08 | |
Quoting Amaury Forgeot d'Arc (2008/9/10): "Code point 0x0370 is now a printable character. r66381 corrected the failures by simply changing it to 0x0378, until the next unicodedata upgrade... I wonder if there is a value that is guaranteed to stay non-printable." The control characters? |
|||
| msg73005 | Author: Martin v. Löwis (loewis) * | Date: 2008-09-11 06:05 | |
Quoting Guido van Rossum: "The control characters?" Indeed, and also the private-use characters. test_unicode explicitly comments that the test is about unassigned characters, although I don't understand the purpose of that test (it then also tests a surrogate character, which is also guaranteed to remain unprintable). One of the characters that is guaranteed to remain unassigned is U+FFFE (and its counterparts in the other planes, e.g. U+1FFFE, ...). This guarantee exists to support the BOM. Along with U+FFFF, these are noncharacters. #765036 once suggested that Python should refuse to represent them at all, but that proposal was rejected. |
|||
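
The two code changes described in msg72821 can be illustrated in Python. This is a minimal sketch, not the patch itself: `mirrored_changed_since_3_2`, `encode_upper`, `decode_upper`, and `NODELTA_FLAG` are hypothetical names, the flag bit is arbitrary, and the record layout is assumed rather than taken from CPython's generator. The first part uses the real `unicodedata.ucd_3_2_0` object, which answers queries with Unicode 3.2 data, to expose the kind of property difference the delta-to-3.2 code has to record.

```python
import unicodedata


# Part 1: a property change relative to Unicode 3.2.
def mirrored_changed_since_3_2(ch: str) -> bool:
    """Hypothetical helper: True if the 'mirrored' property differs from 3.2."""
    return unicodedata.mirrored(ch) != unicodedata.ucd_3_2_0.mirrored(ch)


# Part 2: storing upper-case mappings as 16-bit deltas, with an escape hatch.
# Illustrative bit value only; the real flag is defined by CPython's generator.
NODELTA_FLAG = 0x1
INT16_MIN, INT16_MAX = -(2 ** 15), 2 ** 15 - 1


def encode_upper(code_point: int, upper: int) -> tuple[int, int]:
    """Return (stored_value, flags) for a hypothetical ctype record.

    Normally the record stores the delta upper - code_point, which must fit in
    a signed 16-bit range. When it does not, this sketch stores the absolute
    code point instead (assumed to fit the record's field) and sets a flag.
    """
    delta = upper - code_point
    if INT16_MIN <= delta <= INT16_MAX:
        return delta, 0
    return upper, NODELTA_FLAG


def decode_upper(code_point: int, stored: int, flags: int) -> int:
    """Invert encode_upper: apply the delta unless the flag says not to."""
    if flags & NODELTA_FLAG:
        return stored
    return code_point + stored


if __name__ == "__main__":
    print(mirrored_changed_since_3_2("\u0f3a"))
    # U+1D79 -> U+A77D is a delta of 0xA77D - 0x1D79 = 35332, above 32767.
    for cp in (0x61, 0x1D79):
        upper = ord(chr(cp).upper())
        stored, flags = encode_upper(cp, upper)
        print(hex(cp), hex(upper), stored, bool(flags & NODELTA_FLAG))
```

The flag keeps the common case compact (small deltas such as -32 for ASCII letters) while still allowing an outlier like U+1D79 to be represented.
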
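
msg73005 names character classes that cannot become printable in a later Unicode version: control characters, private-use characters, surrogates, and noncharacters such as U+FFFE and U+FFFF with their counterparts at the end of every plane. The sketch below checks one sample from each class with the real `str.isprintable()` and `unicodedata.category()`; the helper names and sample code points are hypothetical choices for illustration, not what test_unicode actually uses. It also shows why U+FFFE supports the BOM: the byte-order mark U+FEFF, read with the wrong byte order, decodes as U+FFFE, which is never assigned, so a reader can detect the swap.

```python
import unicodedata

# Samples from classes described as permanently non-printable (hypothetical data).
STABLE_NONPRINTABLE_SAMPLES = {
    "control": 0x0007,       # BEL, general category Cc
    "private use": 0xE000,   # first BMP private-use code point, category Co
    "surrogate": 0xD800,     # first high surrogate, category Cs
    "noncharacter": 0xFFFE,  # never assigned, category Cn
}


def plane_end_noncharacters() -> list[int]:
    """Hypothetical helper: U+FFFE/U+FFFF and their counterparts in planes 1-16."""
    return [plane * 0x10000 + low for plane in range(17) for low in (0xFFFE, 0xFFFF)]


def utf16_byte_order(data: bytes) -> str:
    """Hypothetical helper: guess UTF-16 byte order from a leading BOM.

    Reading the first two bytes as big-endian yields U+FEFF for big-endian
    data and U+FFFE for little-endian data; U+FFFE is a noncharacter, so the
    two cases cannot be confused with real text.
    """
    if len(data) < 2:
        raise ValueError("data is too short to contain a UTF-16 BOM")
    first = data[:2].decode("utf-16-be")
    if first == "\ufeff":
        return "big"
    if first == "\ufffe":
        return "little"
    raise ValueError("data does not start with a UTF-16 BOM")


if __name__ == "__main__":
    for name, cp in STABLE_NONPRINTABLE_SAMPLES.items():
        ch = chr(cp)
        print(f"U+{cp:04X} {name}: category={unicodedata.category(ch)}, "
              f"printable={ch.isprintable()}")
    print(all(not chr(cp).isprintable() for cp in plane_end_noncharacters()))
    print(utf16_byte_order("\ufeffhi".encode("utf-16-le")))
```

By contrast, an unassigned code point such as the 0x0378 chosen in r66381 carries no such guarantee, just as 0x0370 became printable with Unicode 5.1.
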
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2008-09-11 06:05:22 | loewis | set | messages: + msg73005 |
| 2008-09-11 01:09:44 | gvanrossum | set | files: - unnamed |
| 2008-09-11 01:08:53 | gvanrossum | set | files: + unnamed messages: + msg73000 |
| 2008-09-10 23:54:54 | amaury.forgeotdarc | set | nosy: + amaury.forgeotdarc messages: + msg72997 |
| 2008-09-10 21:31:01 | ajaksu2 | set | nosy: + ajaksu2 messages: + msg72987 versions: + Python 3.0 |
| 2008-09-10 18:09:23 | loewis | set | messages: + msg72979 |
| 2008-09-10 16:18:10 | gvanrossum | set | files: - unnamed |
| 2008-09-10 16:11:42 | gvanrossum | set | files: + unnamed messages: + msg72973 |
| 2008-09-10 14:11:27 | loewis | set | status: open -> closed resolution: accepted messages: + msg72962 |
| 2008-09-10 09:34:27 | lemburg | set | nosy: + lemburg messages: + msg72950 |
| 2008-09-10 07:06:13 | effbot | set | messages: + msg72946 |
| 2008-09-10 04:51:42 | loewis | set | assignee: effbot -> gvanrossum messages: + msg72941 nosy: + gvanrossum |
| 2008-09-09 05:39:59 | loewis | set | keywords: + needs review |
| 2008-09-09 05:39:54 | loewis | set | keywords: + patch, - needs review |
| 2008-09-09 05:37:53 | loewis | create | |