Restore re performance to pre-PEP393 level #62885
Before PEP-393 the regex functions scanned an array of char or Py_UNICODE, and character testing was cheap. After PEP-393 they check the kind of the unicode string for every tested character, and processing of unicode strings became slower. _sre.c already generates two sets of functions from one source -- for byte and unicode strings. The proposed patch uses the same technique to generate three sets of functions -- for byte/UCS1, UCS2 and UCS4 strings. This simplifies the code (it is now more similar to the pre-PEP393 version) and makes character testing faster. Benchmark example:

Python 3.2:
Python 3.4.0a1+ unpatched:
Python 3.4.0a1+ patched:

For simplicity I also propose to extract the template part of _sre.c into a separate file (i.e. srelib.h) and to get rid of recursion. |
It appears that in your tests Python 3.2 is faster with Unicode than bytestrings and that unpatched Python 3.4 is a lot slower. I get somewhat different results (Windows XP Pro, 32-bit):

C:\Python32\python.exe -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
C:\Python32\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
C:\Python32\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
C:\Python34\python.exe -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
C:\Python34\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
C:\Python34\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"

For comparison, in the regex module I don't duplicate whole sections of code, but instead have a pointer to one of 3 functions (for UCS1, UCS2 and UCS4) that gets the codepoint, except for some tight loops. Doing that might be too much of a change for re. However, the speed appears to be a lot more consistent:

C:\Python32\python.exe -m timeit -s "import regex; f = regex.compile(b'abc').search; x = b'x'*100000" "f(x)"
C:\Python32\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = 'x'*100000" "f(x)"
C:\Python32\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = '\u20ac'*100000" "f(x)"
C:\Python34\python.exe -m timeit -s "import regex; f = regex.compile(b'abc').search; x = b'x'*100000" "f(x)"
C:\Python34\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = 'x'*100000" "f(x)"
C:\Python34\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = '\u20ac'*100000" "f(x)" |
I get the same kind of results as Serhiy:

$ python3.2 -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
10000 loops, best of 3: 81.7 usec per loop
$ python3.2 -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
10000 loops, best of 3: 31.1 usec per loop
$ python3.2 -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
10000 loops, best of 3: 31.1 usec per loop

Unpatched 3.4:

$ ./python -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
10000 loops, best of 3: 81.6 usec per loop
$ ./python -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
10000 loops, best of 3: 163 usec per loop
$ ./python -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
10000 loops, best of 3: 190 usec per loop

Patched 3.4:

$ ./python -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
10000 loops, best of 3: 54.4 usec per loop
$ ./python -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
10000 loops, best of 3: 54.2 usec per loop
$ ./python -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
10000 loops, best of 3: 54.5 usec per loop |
@antoine: Are you on the same OS as Serhiy? IIRC, wasn't the performance regression that wxjmfauth complained about in Python 3.3 apparent on Windows, but not on Linux? |
I don't know, I'm under Linux with gcc (on a x86-64 machine) :-)
I don't know, and I'm not willing to give any attention to that complaint. However, if you are under Windows and can give it a try, it would be appreciated. |
With the patch the results are:

C:\Python34\python.exe -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
C:\Python34\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
C:\Python34\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"

I'm most impressed! :-) |
I'm under 32-bit Linux with gcc 4.6.3. The above test is only one example, the one for which I expect the largest difference. I suppose other tests will show a gain too. |
Ok, here are some results from the benchmarks suite:

Report on Linux fsol 3.8.0-27-generic #40-Ubuntu SMP Tue Jul 9 00:17:05 UTC 2013 x86_64 x86_64

### regex_effbot ###

### regex_v8 ###

The following not significant results are hidden, use -v to show them: |
Using #include "_sre.c" in _sre.c looks weird. Instead of huge sections delimited by "#ifdef SRE_RECURSIVE", I would prefer something similar to the stringlib: ".h" template files included more than once. I also expect shorter files: _sre.c is close to 4000 lines of C code :-( If you move code from _sre.c to a new file, you should use "hg cp" to keep the history. For the review, it's maybe better to continue with your SRE_RECURSIVE hack :)

--

#define SRE_CHAR Py_UCS1
#define SIZEOF_SRE_CHAR 1
..
#define SRE_CHAR Py_UCS2
#define SIZEOF_SRE_CHAR 1
...
#define SRE_CHAR Py_UCS4
#define SIZEOF_SRE_CHAR 1

The value of SIZEOF_SRE_CHAR looks suspicious. Does test_re have some non-ASCII tests? If not, we should probably start by adding such tests! |
Agreed, but such a patch would be larger and harder to synchronize and to review in Rietveld. I'm going to solve other issues first (bpo-18647, bpo-18672) before creating the large final patch.
Good catch. Actually this macro is only used to skip some checks for UCS4. It should not affect correctness, only possibly performance.
There is a small number (about 10) of tests for non-ASCII data. |
Rebased patch to tip and added non-ASCII tests for main re functions. |
Please review this patch. I will extract template part into separated file in separated commit. |
Posted review on Rietveld. See also issue bpo-19387. |
Updated patch addresses Antoine's comments. |
Looks good to me. |
New changeset 66e2dfbb1d70 by Serhiy Storchaka in branch 'default': |
Sorry, I was busy with my tracemalloc PEP and didn't have time to review your patch. I'm happy that you restored Python 3.2 performance! Thanks Serhiy. |
New changeset 00e61cb3b11c by Serhiy Storchaka in branch 'default': |
Didn't you forget to add sre_lib.h? |
Ah, sorry, no. I was fooled by the commit e-mail. |
Yes, the commit e-mail looks queer. |
I suppose this issue can be fixed then. Thanks for doing this! |
Thank you for your review Antoine and Victor. |