Issue31690
Created on 2017-10-04 14:02 by serhiy.storchaka, last changed 2017-10-24 20:33 by serhiy.storchaka. This issue is now closed.

Pull Requests

| URL | Status | Linked |
|---|---|---|
| PR 3872 | | methane, 2017-10-04 14:11 |
| PR 3885 | merged | serhiy.storchaka, 2017-10-04 16:38 |

Messages (4)

msg303693 - Author: Serhiy Storchaka (serhiy.storchaka) - Date: 2017-10-04 14:02

Currently re supports local inline flags: 'a(?i:b)' matches 'a' case-sensitively, but 'b' case-insensitively. But the flags 'a' and 'L' can't be scoped to a subpattern. The 'u' flag is currently just redundant: it has no effect in string patterns and is not allowed in bytes patterns. These flags can be applied only to the whole pattern. I think it would be nice to make them local. An example of a problem this can solve is issue31672. Currently '[a-z]' in Unicode case-insensitive mode matches not only the Latin letters from 'a' to 'z' and from 'A' to 'Z', but also the characters 'İ' (U+0130) and 'ı' (U+0131), which are equivalent to 'i', 'ſ' (U+017F), which is equivalent to 's', and 'K' (U+212A KELVIN SIGN), which is equivalent to 'k'. With local 'a' and 'u' flags you can use ASCII and Unicode ranges in the same pattern. I'm working on a patch.
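
A minimal sketch of the behaviour described above, assuming Python 3.7 or later (the first release to accept 'a', 'L' and 'u' as group flags). The characters are written as escapes. The names `extras`, `unicode_ci`, `ascii_ci` and `mixed` are illustrative, and the `key=value` pattern is a hypothetical use case, not one taken from the report.

```python
import re

# Existing local flag: 'a' is case-sensitive, only 'b' ignores case.
assert re.fullmatch(r"a(?i:b)", "aB")
assert re.fullmatch(r"a(?i:b)", "AB") is None

# U+0130 'İ', U+0131 'ı', U+017F 'ſ' and U+212A KELVIN SIGN.
extras = ["\u0130", "\u0131", "\u017f", "\u212a"]

# Unicode case-insensitive mode: [a-z] also matches all four characters.
unicode_ci = re.compile(r"(?i)[a-z]")
assert all(unicode_ci.fullmatch(ch) for ch in extras)

# ASCII case-insensitive mode: [a-z] matches none of them.
ascii_ci = re.compile(r"(?ia)[a-z]")
assert not any(ascii_ci.fullmatch(ch) for ch in extras)

# Hypothetical key=value pattern: 'a' is scoped to the key group only,
# so \w+ in the value keeps its Unicode meaning.
mixed = re.compile(r"(?i)(?a:[a-z]+)=(\w+)")
value = "\u017fch\u00f6n"  # 'ſchön'
assert mixed.fullmatch("Kelvin=" + value).group(1) == value
# A KELVIN SIGN is rejected in the ASCII-scoped key...
assert mixed.fullmatch("\u212aelvin=x") is None
# ...but accepted when the whole pattern is in Unicode mode.
assert re.fullmatch(r"(?i)[a-z]+=(\w+)", "\u212aelvin=x")
```

If the script runs silently, every assertion held.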

msg303712 - Author: Serhiy Storchaka (serhiy.storchaka) - Date: 2017-10-04 16:54

PR 3885 is a preliminary but working implementation. It still needs new tests and documentation.

```pycon
>>> import re
>>> re.findall('(?i:[a-z]+)', ''.join(map(chr, range(0x10000))))
['ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz', 'İı', 'ſ', 'K']
>>> re.findall('(?ia:[a-z]+)', ''.join(map(chr, range(0x10000))))
['ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz']
```

The engine now uses separate opcodes for case-insensitive matching in ASCII, Unicode and locale modes. This may slightly speed up matching, but slow down compiling.
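
The message gives no numbers for the matching/compiling trade-off. One rough way to look at it is to time the two phases separately. This is a sketch under stated assumptions. It targets Python 3.7 or later. It calls `re.purge()` to empty the module-level pattern cache so that each `re.compile()` call really compiles. `time_pattern` is a hypothetical helper, and the repeat counts are arbitrary. It only compares Unicode and ASCII mode within one interpreter. Comparing against the engine before PR 3885 would mean running the same script under two builds.

```python
import re
import timeit

# Same input as the interpreter session: every code point in the BMP.
TEXT = "".join(map(chr, range(0x10000)))

def time_pattern(pattern, compiles=200, matches=20):
    """Return (seconds per compile, seconds per findall) for one pattern."""
    def compile_fresh():
        re.purge()  # empty re's cache so every call really recompiles
        return re.compile(pattern)

    compiled = re.compile(pattern)
    per_compile = timeit.timeit(compile_fresh, number=compiles) / compiles
    per_match = timeit.timeit(lambda: compiled.findall(TEXT), number=matches) / matches
    return per_compile, per_match

# Unicode-mode versus ASCII-mode case-insensitive group.
results = {p: time_pattern(p) for p in (r"(?i:[a-z]+)", r"(?ia:[a-z]+)")}
for pattern, (per_compile, per_match) in results.items():
    print(f"{pattern:14} compile {per_compile * 1e6:9.1f} us  findall {per_match * 1e3:7.2f} ms")
```

The absolute figures depend on the machine and the Python version. Only the relative sizes within a single run are meaningful.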

msg303759 - Author: Serhiy Storchaka (serhiy.storchaka) - Date: 2017-10-05 12:16

Added tests and documentation.

msg304939 - Author: Serhiy Storchaka (serhiy.storchaka) - Date: 2017-10-24 20:31

New changeset 3557b05c5a7dfd7d97ddfd3b79aefd53d25e5132 by Serhiy Storchaka in branch 'master':

bpo-31690: Allow the inline flags "a", "L", and "u" to be used as group flags for RE. (#3885)

https://github.com/python/cpython/commit/3557b05c5a7dfd7d97ddfd3b79aefd53d25e5132
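
A minimal sketch of which group flags the merged change accepts, assuming Python 3.7 or later. `compiles` is a hypothetical helper. The sketch only checks whether `re.compile()` rejects a pattern, not the exact error message. The rejected cases reflect my understanding of the parser's checks rather than anything spelled out in this issue, except the 'u'-in-bytes case stated in the first message. Those cases are a wrong pattern type, conflicting type flags, and turning a type flag off.

```python
import re

def compiles(pattern):
    """Return True if re accepts the pattern, False if it rejects it."""
    try:
        re.compile(pattern)
    except (re.error, ValueError):  # bad inline flags are reported as re.error
        return False
    return True

# Accepted: each type flag scoped to a group of a matching pattern type.
assert compiles(r"(?a:\w+)\w+")    # ASCII group inside a str pattern
assert compiles(r"(?u:\w+)")       # 'u' is allowed (but redundant) for str
assert compiles(rb"(?L:\w+)\w+")   # locale group inside a bytes pattern
assert compiles(rb"(?a:\w+)")      # 'a' is allowed for bytes

# Rejected: wrong pattern type, conflicting type flags, or turning one off.
assert not compiles(r"(?L:\w+)")   # 'L' needs a bytes pattern
assert not compiles(rb"(?u:\w+)")  # 'u' is not allowed in bytes patterns
assert not compiles(r"(?au:\w+)")  # 'a' and 'u' are mutually exclusive
assert not compiles(r"(?-a:\w+)")  # type flags cannot be turned off
```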

History

| Date | User | Action | Args |
|---|---|---|---|
| 2017-10-24 20:33:52 | serhiy.storchaka | set | status: open -> closed; resolution: fixed; stage: patch review -> resolved |
| 2017-10-24 20:31:44 | serhiy.storchaka | set | messages: + msg304939 |
| 2017-10-05 12:16:42 | serhiy.storchaka | set | messages: + msg303759 |
| 2017-10-04 16:54:06 | serhiy.storchaka | set | messages: + msg303712 |
| 2017-10-04 16:38:23 | serhiy.storchaka | set | pull_requests: + pull_request3860 |
| 2017-10-04 14:11:55 | methane | set | keywords: + patch; stage: needs patch -> patch review; pull_requests: + pull_request3858 |
| 2017-10-04 14:03:47 | barry | set | nosy: + barry |
| 2017-10-04 14:02:56 | serhiy.storchaka | create | |