Member

methane commented Oct 3, 2017

sre_compile performs bit tests (e.g. `flags & SRE_FLAG_IGNORECASE`) in a loop.
`IntFlag.__and__` and `IntFlag.__new__` make these tests slower.

So convert the flags to a plain int before passing them to sre_compile.

https://bugs.python.org/issue31671
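
A minimal sketch of the cost described above, assuming only the public `re` flags and the standard `timeit` module. `IGNORECASE_MASK` and `time_bit_test` are hypothetical names used for illustration; they are not taken from sre_compile.

```python
import re
import timeit

# Hypothetical stand-in for the mask tested in sre_compile
# (SRE_FLAG_IGNORECASE has the value 2).
IGNORECASE_MASK = 2

enum_flags = re.IGNORECASE | re.MULTILINE  # a re.RegexFlag (IntFlag) value
int_flags = int(enum_flags)                # the same bits as a plain int

# The enum path goes through IntFlag.__and__ and yields another enum member;
# the int path is a plain integer AND.
enum_result = enum_flags & IGNORECASE_MASK
int_result = int_flags & IGNORECASE_MASK


def time_bit_test(flags, number=100_000):
    """Time `flags & IGNORECASE_MASK` repeated `number` times."""
    return timeit.timeit(lambda: flags & IGNORECASE_MASK, number=number)


if __name__ == "__main__":
    print("RegexFlag:", time_bit_test(enum_flags))
    print("int:      ", time_bit_test(int_flags))
```

The absolute timings depend on the interpreter version and machine, so the script prints them rather than asserting a ratio.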

Lib/re.py Outdated
_MAXCACHE = 512
def _compile(pattern, flags):
    # internal: compile pattern
    flags = int(flags)
Member

The drawback of this change is that flags=1.0 and flags="0x1f" are accepted now.
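
To make the concern concrete, here is a small sketch of what the `int()` builtin accepts on its own, independent of the `re` module. `int_accepts` is a hypothetical helper written for this illustration. Note that with the default base a decimal string such as `"31"` is parsed, while a hex-prefixed string such as `"0x1f"` raises `ValueError`.

```python
def int_accepts(value):
    """Return True if int(value) succeeds with the default base."""
    try:
        int(value)
    except (TypeError, ValueError):
        return False
    return True


# Floats are truncated silently and decimal strings are parsed,
# so both would pass through an unconditional int(flags).
float_ok = int_accepts(1.0)
decimal_string_ok = int_accepts("31")

# A hex-prefixed string is rejected unless a base is given explicitly.
hex_string_ok = int_accepts("0x1f")

if __name__ == "__main__":
    print(float_ok, decimal_string_ok, hex_string_ok)
```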

Member

Maybe use the .value attribute?

# convert RegexFlag enum to integer for performance
try: flags = flags.value
except AttributeError: pass

Would it be expensive to add an isinstance(flags, RegexFlag) check?

if isinstance(flags, RegexFlag): flags = flags.value

Member

Check which method is the fastest.
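
One way to run that comparison is sketched below, assuming the standard `timeit` module. The three wrapper functions and `benchmark` are hypothetical names that mirror the variants discussed in this thread; they are not code from Lib/re.py.

```python
import re
import timeit


def via_int(flags):
    """Unconditional conversion; also accepts floats and decimal strings."""
    return int(flags)


def via_value(flags):
    """Use the enum's .value if present, otherwise leave flags untouched."""
    try:
        return flags.value
    except AttributeError:
        return flags


def via_isinstance(flags):
    """Convert only genuine RegexFlag members."""
    if isinstance(flags, re.RegexFlag):
        return flags.value
    return flags


CANDIDATES = (via_int, via_value, via_isinstance)


def benchmark(flags, number=200_000):
    """Return {function name: seconds} for converting `flags` repeatedly."""
    return {
        func.__name__: timeit.timeit(lambda: func(flags), number=number)
        for func in CANDIDATES
    }


if __name__ == "__main__":
    for name, seconds in benchmark(re.IGNORECASE | re.MULTILINE).items():
        print(f"{name}: {seconds:.4f}s")
```

Unlike `via_int`, the two `.value` variants pass non-enum inputs through unchanged, so a float or string would still reach the later type checks instead of being coerced.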

Member

vstinner left a comment

LGTM.

I confirm that it's a real optimization: https://bugs.python.org/issue31671#msg303679

You might want to document it as an optimization in What's New in Python 3.7.

methane removed the skip news label Oct 4, 2017
Member

Will the contents of the cache leak back out to the user? It will be frustrating to use IntFlag constants going in, but end up with plain, non-descriptive ints coming out.

Member Author

methane commented Oct 4, 2017

> Will the contents of the cache leak back out to the user? It will be frustrating to use IntFlag constants going in, but end up with plain, non-descriptive ints coming out.

The last stage of re.compile() is implemented in C,
so the RegexFlag is eventually converted to an int regardless of this pull request.
See this Python 3.6 example:

$ python3
Python 3.6.2 (default, Jul 18 2017, 05:47:42) 
[GCC 6.3.0 20170406] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> p = re.compile("foo", re.A | re.I)
>>> p.flags
258
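
If the descriptive form is wanted back, the plain integer can be wrapped again on the caller's side. This is a small sketch assuming the public `re.RegexFlag` enum (available since Python 3.6); the variable names are illustrative only.

```python
import re

p = re.compile("foo", re.A | re.I)

raw = p.flags                # plain int stored by the C pattern object
named = re.RegexFlag(raw)    # wrap it back into the enum for a readable repr

if __name__ == "__main__":
    print(raw)
    print(repr(named))
```

The exact repr of `named` differs between Python versions, so it is printed here rather than shown.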

methane merged commit c1c47c1 into python:master Oct 5, 2017
methane deleted the re-enum-int branch October 5, 2017 08:19