Restore re performance to pre-PEP393 level #62885
Before PEP-393 the regex functions scanned an array of char or Py_UNICODE, and character testing was cheap. After PEP-393 they check the kind of the unicode string for every tested character, and processing of unicode strings became slower. _sre.c already generates two sets of functions from one source -- for byte and unicode strings. The proposed patch uses the same technique to generate three sets of functions -- for byte/UCS1, UCS2 and UCS4 strings. This simplifies the code (it is now more similar to the pre-PEP393 version) and makes character testing faster. Benchmark example:

Python 3.2:
Python 3.4.0a1+ unpatched:
Python 3.4.0a1+ patched:

For simplicity I also propose to extract the template part of _sre.c into a separate file (i.e. srelib.h) and to get rid of recursion. |
It appears that in your tests Python 3.2 is faster with Unicode than bytestrings and that unpatched Python 3.4 is a lot slower. I get somewhat different results (Windows XP Pro, 32-bit):

C:\Python32\python.exe -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
C:\Python32\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
C:\Python32\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
C:\Python34\python.exe -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
C:\Python34\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
C:\Python34\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"

For comparison, in the regex module I don't duplicate whole sections of code, but instead have a pointer to one of 3 functions (for UCS1, UCS2 and UCS4) that gets the codepoint, except for some tight loops. Doing that might be too much of a change for re. However, the speed appears to be a lot more consistent:

C:\Python32\python.exe -m timeit -s "import regex; f = regex.compile(b'abc').search; x = b'x'*100000" "f(x)"
C:\Python32\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = 'x'*100000" "f(x)"
C:\Python32\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = '\u20ac'*100000" "f(x)"
C:\Python34\python.exe -m timeit -s "import regex; f = regex.compile(b'abc').search; x = b'x'*100000" "f(x)"
C:\Python34\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = 'x'*100000" "f(x)"
C:\Python34\python.exe -m timeit -s "import regex; f = regex.compile('abc').search; x = '\u20ac'*100000" "f(x)" |
I get the same kind of results as Serhiy:

$ python3.2 -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
10000 loops, best of 3: 81.7 usec per loop
$ python3.2 -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
10000 loops, best of 3: 31.1 usec per loop
$ python3.2 -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
10000 loops, best of 3: 31.1 usec per loop

Unpatched 3.4:

$ ./python -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
10000 loops, best of 3: 81.6 usec per loop
$ ./python -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
10000 loops, best of 3: 163 usec per loop
$ ./python -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
10000 loops, best of 3: 190 usec per loop

Patched 3.4:

$ ./python -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
10000 loops, best of 3: 54.4 usec per loop
$ ./python -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
10000 loops, best of 3: 54.2 usec per loop
$ ./python -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"
10000 loops, best of 3: 54.5 usec per loop |
@antoine: Are you on the same OS as Serhiy? IIRC, wasn't the performance regression that wxjmfauth complained about in Python 3.3 apparent on Windows, but not on Linux? |
I don't know, I'm under Linux with gcc (on a x86-64 machine) :-)
I don't know, and I'm not willing to give any attention to that complaint. However, if you are under Windows and can give it a try, it would be appreciated. |
With the patch the results are:

C:\Python34\python.exe -m timeit -s "import re; f = re.compile(b'abc').search; x = b'x'*100000" "f(x)"
C:\Python34\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = 'x'*100000" "f(x)"
C:\Python34\python.exe -m timeit -s "import re; f = re.compile('abc').search; x = '\u20ac'*100000" "f(x)"

I'm most impressed! :-) |
I'm under 32-bit Linux with gcc 4.6.3. The above test is only one example, the one for which I expect the largest difference. I suppose other tests will show a gain too. |
Ok, here are some results from the benchmarks suite:

Report on Linux fsol 3.8.0-27-generic #40-Ubuntu SMP Tue Jul 9 00:17:05 UTC 2013 x86_64 x86_64

### regex_effbot ###

### regex_v8 ###

The following not significant results are hidden, use -v to show them: |
Using #include "_sre.c" in _sre.c looks weird. Instead of huge sections delimited by "#ifdef SRE_RECURSIVE", I would prefer something similar to the stringlib: ".h" template files included more than once. I also expect shorter files: _sre.c is close to 4000 lines of C code :-( If you move code from _sre.c to a new file, you should use "hg cp" to keep the history. For the review, it's maybe better to continue with your SRE_RECURSIVE hack :)

--

#define SRE_CHAR Py_UCS1
#define SIZEOF_SRE_CHAR 1
..
#define SRE_CHAR Py_UCS2
#define SIZEOF_SRE_CHAR 1
...
#define SRE_CHAR Py_UCS4
#define SIZEOF_SRE_CHAR 1

The value of SIZEOF_SRE_CHAR looks suspicious. Does test_re have some non-ASCII tests? If not, we should probably start by adding such tests! |
Agreed, but such a patch would be larger and harder to synchronize and to review in Rietveld. I'm going to solve other issues first (bpo-18647, bpo-18672) before creating the large final patch.
Good catch. Actually this macro is only used to skip some checks for UCS4. It should not affect correctness, only possibly performance.
There is a small number (about 10) of tests for non-ASCII data. |
Rebased patch to tip and added non-ASCII tests for main re functions. |
Please review this patch. I will extract template part into separated file in separated commit. |
Posted review on Rietveld. See also issue bpo-19387. |
Updated patch addresses Antoine's comments. |
Looks good to me. |
New changeset 66e2dfbb1d70 by Serhiy Storchaka in branch 'default': |
Sorry, I was busy with my tracemalloc PEP and didn't have time to review your patch. I'm happy that you restored Python 3.2 performance! Thanks Serhiy. |
New changeset 00e61cb3b11c by Serhiy Storchaka in branch 'default': |
Didn't you forget to add sre_lib.h? |
Ah, sorry, no. I was fooled by the commit e-mail. |
Yes, the commit e-mail looks queer. |
I suppose this issue can be fixed then. Thanks for doing this! |
Thank you for your review Antoine and Victor. |