Issue20998
Created on 2014-03-20 18:40 by Lucretiel, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Files | |
|---|---|
| File name | Uploaded |
| sre_fullmatch_repeated_ignorecase.patch | serhiy.storchaka, 2014-03-20 20:26 |
| issue20998.patch | mrabarnett, 2014-03-20 21:37 |
| issue20998_2.patch | serhiy.storchaka, 2014-04-13 15:28 |
| Messages (10) | |||
|---|---|---|---|
| msg214257 - (view) | Author: Nathan West (Lucretiel) * | Date: 2014-03-20 18:40 | |
I have the following regular expression:
In [2]: regex = re.compile("ME IS \w+", re.I)
For some reason, when using `fullmatch`, it fails to match whenever the '\w+' part is longer than one character:
In [3]: regex.fullmatch("ME IS L")
Out[3]: <_sre.SRE_Match object; span=(0, 7), match='ME IS L'>
In [4]: regex.fullmatch("me is l")
Out[4]: <_sre.SRE_Match object; span=(0, 7), match='me is l'>
In [5]: regex.fullmatch("ME IS Lucretiel")
In [6]: regex.fullmatch("me is lucretiel")
I have no idea why this is happening. Using `match` works fine:
In [7]: regex.match("ME IS L")
Out[7]: <_sre.SRE_Match object; span=(0, 7), match='ME IS L'>
In [8]: regex.match("ME IS Lucretiel")
Out[8]: <_sre.SRE_Match object; span=(0, 15), match='ME IS Lucretiel'>
In [9]: regex.match("me is lucretiel")
Out[9]: <_sre.SRE_Match object; span=(0, 15), match='me is lucretiel'>
Additionally, using `fullmatch` WITHOUT using the `re.I` flag causes it to work:
In [10]: regex = re.compile("ME IS \w+")
In [11]: regex.fullmatch("ME IS L")
Out[11]: <_sre.SRE_Match object; span=(0, 7), match='ME IS L'>
In [12]: regex.fullmatch("ME IS Lucretiel")
Out[12]: <_sre.SRE_Match object; span=(0, 15), match='ME IS Lucretiel'>
My platform is Ubuntu 12.04, using Python 3.4 installed from Felix Krull's deadsnakes PPA (https://launchpad.net/~fkrull/+archive/deadsnakes).
|
|||
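The session in the report can be condensed into a small script that runs `match` and `fullmatch` side by side on the same inputs. This is only a sketch: the helper names `span_of` and `compare` are illustrative and not part of the `re` module, and a raw string is used for the pattern. Because the pattern consumes every reported input completely, the two methods would be expected to report identical spans; any disagreement reproduces the problem described.

```python
import re

# Inputs copied from the report; the helper and variable names are illustrative.
REPORT_INPUTS = ["ME IS L", "me is l", "ME IS Lucretiel", "me is lucretiel"]


def span_of(method_name, pattern, text, flags=0):
    """Call match() or fullmatch() on a compiled pattern; return the span or None."""
    compiled = re.compile(pattern, flags)
    m = getattr(compiled, method_name)(text)
    return m.span() if m is not None else None


def compare(pattern=r"ME IS \w+", flags=re.I):
    """Map each reported input to a (match span, fullmatch span) pair."""
    return {
        text: (
            span_of("match", pattern, text, flags),
            span_of("fullmatch", pattern, text, flags),
        )
        for text in REPORT_INPUTS
    }


if __name__ == "__main__":
    for text, (m_span, fm_span) in compare().items():
        note = "" if m_span == fm_span else "  <-- match() and fullmatch() disagree"
        print(f"{text!r}: match={m_span} fullmatch={fm_span}{note}")
```

Calling `compare(flags=0)` repeats the comparison without `re.I`, which corresponds to the case the report says works.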
| msg214272 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2014-03-20 20:26 | |
Here is a patch. |
|||
| msg214287 - (view) | Author: Matthew Barnett (mrabarnett) * | Date: 2014-03-20 21:37 | |
FWIW, here's my own attempt at a patch. |
|||
| msg215546 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2014-04-04 18:22 | |
Both patches are almost equivalent (my patch is much simpler, but perhaps
Matthew's approach is more correct in the long run).
Unfortunately Rietveld doesn't work with Matthew's patch, so I have added my
comments here.
> - (!ctx->match_all || ctx->ptr == state->end)) {
> + ctx->ptr == state->end) {
Why is this check not needed anymore?
> - status = SRE(match)(state, pattern + 2*prefix_skip);
> + status = SRE(match)(state, pattern + 2*prefix_skip,
state->match_all);
> - status = SRE(match)(state, pattern + 2);
> + status = SRE(match)(state, pattern + 2, state->match_all);
state->match_all is used but it is never initialized.
|
|||
| msg215549 - (view) | Author: Matthew Barnett (mrabarnett) * | Date: 2014-04-04 18:49 | |
> > - (!ctx->match_all || ctx->ptr == state->end)) {
> > + ctx->ptr == state->end) {
>
> Why is this check not needed anymore?
>
After stepping through the code for that regex that fails, I concluded
that the condition shouldn't depend on ctx->match_all at that point
after all.
> > - status = SRE(match)(state, pattern + 2*prefix_skip);
> > + status = SRE(match)(state, pattern + 2*prefix_skip,
> state->match_all);
>
> > - status = SRE(match)(state, pattern + 2);
> > + status = SRE(match)(state, pattern + 2, state->match_all);
>
> state->match_all is used but it is never initialized.
I thought I'd initialised it in all the places it's used.
I admit that I find the code a little hard to follow at times... :-(
|
|||
| msg215667 - (view) | Author: Gareth Gouldstone (Gareth.Gouldstone) | Date: 2014-04-06 20:32 | |
fullmatch() is not yet implemented on the regex scanner object SRE_Scanner (issue 21002). Is it possible to adapt this patch to fix this omission? |
|||
| msg216019 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2014-04-13 15:28 | |
> After stepping through the code for that regex that fails, I concluded
> that the condition shouldn't depend on ctx->match_all at that point
> after all.
The tests pass without this check, but I'm not sure that it is not needed. At least without this check the code is not equivalent to the code before support for fullmatch() was added. So I prefer to leave it as is.
> I thought I'd initialised it in all the places it's used.
>
> I admit that I find the code a little hard to follow at times... :-(
Indeed, it is initialized in Modules/_sre.c, and it is always 0. Perhaps it would be more consistent to get rid of the match_all field in the SRE_STATE structure and pass it as an argument. |
|||
| msg216022 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2014-04-13 15:50 | |
Gareth, this is an unrelated issue. |
|||
| msg218566 - (view) | Author: Roundup Robot (python-dev) | Date: 2014-05-14 18:52 | |
New changeset 6267428afbdb by Serhiy Storchaka in branch '3.4':
Issue #20998: Fixed re.fullmatch() of repeated single character pattern
http://hg.python.org/cpython/rev/6267428afbdb
New changeset bcf64c1c92f6 by Serhiy Storchaka in branch 'default':
Issue #20998: Fixed re.fullmatch() of repeated single character pattern
http://hg.python.org/cpython/rev/bcf64c1c92f6 |
|||
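A regression check for the behaviour named in the changeset message (a repeated single-character pattern under `fullmatch()`, with case-insensitive matching) might look roughly like the following. This is a hedged sketch, not the test that was committed: the class name, test names, and the `run_suite` helper are hypothetical, and the sample patterns other than the reported one are chosen only for illustration.

```python
import re
import unittest


class FullmatchRepeatTests(unittest.TestCase):
    """Hypothetical regression tests; not taken from the CPython test suite."""

    def test_repeat_with_ignorecase_covers_whole_string(self):
        cases = [
            (r"a+", "AAA"),
            (r"\w+", "Lucretiel"),
            (r"ME IS \w+", "me is lucretiel"),  # pattern from the report
        ]
        for pattern, text in cases:
            with self.subTest(pattern=pattern, text=text):
                m = re.fullmatch(pattern, text, re.I)
                self.assertIsNotNone(m)
                self.assertEqual(m.span(), (0, len(text)))

    def test_trailing_text_is_still_rejected(self):
        # fullmatch() must keep refusing strings with unmatched trailing text.
        self.assertIsNone(re.fullmatch(r"a+", "aaab", re.I))
        self.assertIsNone(re.fullmatch(r"ME IS \w+", "ME IS Lucretiel!", re.I))


def run_suite():
    """Run the tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FullmatchRepeatTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

The second test is there because a fix that only loosened the end-of-string condition could make `fullmatch()` accept partial matches, so both directions are worth checking.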
| msg218567 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2014-05-14 18:57 | |
Thank you Matthew for your contribution. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:00 | admin | set | github: 65197 |
| 2014-05-14 18:57:45 | serhiy.storchaka | set | status: open -> closed; resolution: fixed; messages: + msg218567; stage: patch review -> resolved |
| 2014-05-14 18:52:07 | python-dev | set | nosy: + python-dev; messages: + msg218566 |
| 2014-04-13 17:57:17 | serhiy.storchaka | set | assignee: serhiy.storchaka |
| 2014-04-13 15:50:27 | serhiy.storchaka | set | messages: + msg216022 |
| 2014-04-13 15:28:32 | serhiy.storchaka | set | files: + issue20998_2.patch; messages: + msg216019 |
| 2014-04-06 20:32:44 | Gareth.Gouldstone | set | nosy: + Gareth.Gouldstone; messages: + msg215667 |
| 2014-04-04 18:49:34 | mrabarnett | set | messages: + msg215549 |
| 2014-04-04 18:22:59 | serhiy.storchaka | set | messages: + msg215546 |
| 2014-03-20 21:37:52 | mrabarnett | set | files: + issue20998.patch; messages: + msg214287 |
| 2014-03-20 20:26:25 | serhiy.storchaka | set | files: + sre_fullmatch_repeated_ignorecase.patch; keywords: + patch; messages: + msg214272; stage: needs patch -> patch review |
| 2014-03-20 18:57:45 | serhiy.storchaka | set | nosy: + serhiy.storchaka; stage: needs patch; versions: + Python 3.5 |
| 2014-03-20 18:43:09 | Lucretiel | set | type: behavior |
| 2014-03-20 18:40:40 | Lucretiel | create | |