Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Significant Configparser Performance Regression #128641

Closed
2trvl opened this issue Jan 8, 2025 · 17 comments
Closed

Significant Configparser Performance Regression #128641

2trvl opened this issue Jan 8, 2025 · 17 comments
Labels
3.13 bugs and security fixes 3.14 new features, bugs and security fixes performance Performance or resource usage stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@2trvl
Copy link
Contributor

2trvl commented Jan 8, 2025

Bug report

Bug description:

Hello, @jaraco! The following commit 019143f slows down ConfigParser.read() from 2 to 6 times.

_Line is created many times when reading a file and the same regular expression is compiled for each object _strip_inline. This amounts to a 60% speed loss. The simplest solution would be to add a __call__ method and preferably create a _Line(object) when initializing RawConfigParser with an empty string value. Or abandon the _Line object altogether.

Another 40% of the performance loss comes from using cached_property for _Line.clean (10%), writing to _ReadState attributes instead of local variables (15%), and breaking up the previous giant loop into new _handle functions (15%).

I discovered this circumstance when writing an update to webbrowser. I needed to parse hundreds of small .desktop files. At first I didn't understand the reason for the increase in execution time between different distributions, so I developed 3 versions of the program:
(M) Multiprocessing
(O) Original Routine
(T) Threading

And measured their performance using timeit:

Linux arch 6.10.6-arch1-1
Parsing 186 files

Python 3.11.11
(M) 5 loops, best of 5: 62.4 msec per loop
(O) 2 loops, best of 5: 110 msec per loop
(T) 2 loops, best of 5: 130 msec per loop

Python 3.12.8
(M) 5 loops, best of 5: 66.5 msec per loop
(O) 2 loops, best of 5: 118 msec per loop
(T) 2 loops, best of 5: 140 msec per loop

Python 3.13.1
(M) 2 loops, best of 5: 125 msec per loop
(O) 1 loop, best of 5: 222 msec per loop
(T) 1 loop, best of 5: 248 msec per loop

Python 3.13.1 Free-threaded
(M) 1 loop, best of 5: 331 msec per loop
(O) 1 loop, best of 5: 648 msec per loop
(T) 1 loop, best of 5: 340 msec per loop

As you can see, performance regression is 2-6 times between 3.11 and 3.13. Isolated comparison of the new and old configparser, which verifies the slowdown of free-threading by 6 times:

Python 3.13t (Old)
(M) 10 loops, best of 5: 26.7 msec per loop
(O) 5 loops, best of 5: 59.4 msec per loop
(T) 10 loops, best of 5: 26 msec per loop

Python 3.13t (New)
(M) 2 loops, best of 5: 137 msec per loop
(O) 1 loop, best of 5: 361 msec per loop
(T) 2 loops, best of 5: 125 msec per loop

I also attach a small reproducible test, just a module calling read():

import configparser

files = [
    "python3.13.desktop",
    "python3.10.desktop",
    "htop.desktop",
    "byobu.desktop",
    "com.gexperts.Tilix.desktop",
    "snap-handle-link.desktop",
    "io.snapcraft.SessionAgent.desktop",
    "remote-viewer.desktop",
    "python3.12.desktop",
    "google-chrome.desktop",
    "vim.desktop",
    "python3.11.desktop",
    "virt-manager.desktop",
    "info.desktop",
    "ubuntu-desktop-installer_ubuntu-desktop-installer.desktop",
    "firefox_firefox.desktop",
]

def main() -> None:
    parser = configparser.ConfigParser(interpolation=None)

    for shortcut in files:
        try:
            parser.clear()
            if not parser.read(shortcut, encoding="utf-8"):
                continue
        except (UnicodeDecodeError, configparser.Error):
            continue

if __name__ == "__main__":
    main()

Archive with the above mentioned .desktop files:
shortcuts.zip

And a program for generating your own .desktop paths on Linux/BSD:

import glob
import os

XDG_DATA_HOME = os.environ.get(
    "XDG_DATA_HOME", os.path.expanduser("~/.local/share")
)
XDG_DATA_DIRS = os.environ.get(
    "XDG_DATA_DIRS", "/usr/local/share/:/usr/share/"
)
XDG_DATA_DIRS = XDG_DATA_DIRS.split(os.pathsep)

def main() -> list[str]:
    files = []
    for appdata in (XDG_DATA_HOME, *XDG_DATA_DIRS):
        shortcuts = os.path.join(appdata, "applications")
        if not os.path.isdir(shortcuts):
            continue
        shortcuts = os.path.join(shortcuts, "**", "*.desktop")
        files.extend(glob.iglob(shortcuts, recursive=True))
    return files

if __name__ == "__main__":
    print(main())

Just run this example with different interpreters and you will see the difference:

$ python3.11 -m timeit -s "import nixconfig" "nixconfig.main()"
50 loops, best of 5: 5.27 msec per loop
$ python3.12 -m timeit -s "import nixconfig" "nixconfig.main()"
50 loops, best of 5: 5.33 msec per loop
$ python3.13 -m timeit -s "import nixconfig" "nixconfig.main()"
20 loops, best of 5: 11.2 msec per loop

At this point I leave the solution to this problem to you, as I have no architectural vision of configparser. However, I am willing to offer my help in proposing solutions for Pull Request.

CPython versions tested on:

3.11, 3.12, 3.13

Operating systems tested on:

Linux

Linked PRs

@2trvl 2trvl added the type-bug An unexpected behavior, bug, or error label Jan 8, 2025
@sobolevn sobolevn added the stdlib Python modules in the Lib dir label Jan 8, 2025
@eendebakpt
Copy link
Contributor

@2trvl Thanks for the detailed description and analysis. Adding a cache to the _Line object improves performance significantly. Could you try out this branch: main...eendebakpt:cpython:configparser and report how much of the original performance this restores on your system?

@2trvl
Copy link
Contributor Author

2trvl commented Jan 9, 2025

@eendebakpt This method doesn't help much, the performance gain is 8.7%. I'll attach a patch made in a hurry, which gives 33%. Although ideally it would be to remove _Line and _ReadState.
configparser1.patch

Note that the free-threading version is slowed down a lot by using regular expressions to search for comments and cached_property. Here's how it was in the previous version:

cpython/Lib/configparser.py

Lines 987 to 1005 in 54f7e14

for lineno, line in enumerate(fp, start=1):
comment_start = sys.maxsize
# strip inline comments
inline_prefixes = {p: -1 for p in self._inline_comment_prefixes}
while comment_start == sys.maxsize and inline_prefixes:
next_prefixes = {}
for prefix, index in inline_prefixes.items():
index = line.find(prefix, index+1)
if index == -1:
continue
next_prefixes[prefix] = index
if index == 0 or (index > 0 and line[index-1].isspace()):
comment_start = min(comment_start, index)
inline_prefixes = next_prefixes
# strip full line comments
for prefix in self._comment_prefixes:
if line.strip().startswith(prefix):
comment_start = 0
break

@picnixz picnixz added performance Performance or resource usage 3.13 bugs and security fixes 3.14 new features, bugs and security fixes labels Jan 10, 2025
@picnixz
Copy link
Member

picnixz commented Jan 12, 2025

Note that the free-threading version is slowed down a lot by using regular expressions to search for comments and cached_property. Here's how it was in the previous version:

Do you mean the issue could arise from cached_property and regexes on the free-threaded build and not from configparser itself? since free-threading is still experimental (I think?), the performance drop might be due to something else. However, the x2 slowdown is a bit annoying, I agree.

At this point I leave the solution to this problem to you, as I have no architectural vision of configparser.

The purpose of the PR was to reduce cyclotomic complexity (essentially have multiple small steps instead of a huge loop block).

However, I am willing to offer my help in proposing solutions for Pull Request.

I looked at your patch and I think we can perhaps improve it a bit more by computing first self.value.strip() in the any(map(...)) calls. Also the prefixes are not changed line-by-line, so we can avoid storing them in a _Line object itself. I haven't checked how much we'll gain though.

@eendebakpt
Copy link
Contributor

The patch looks good, except for the part in def __call__(self, value): where you modify the object in-place which is a construction I do not really like. A variation on this is configparser_v2 where I also get 30-40% performance increase.

@2trvl Would you like to make a PR based on the ideas here?

@2trvl
Copy link
Contributor Author

2trvl commented Jan 13, 2025

Do you mean the issue could arise from cached_property and regexes on the free-threaded build and not from configparser itself? since free-threading is still experimental (I think?), the performance drop might be due to something else. However, the x2 slowdown is a bit annoying, I agree.

Free-threading works slower in single-thread, I suppose that a large number of sequential instructions can affect it. functools.cached_property is nothing more than a descriptor, it works with the value from __dict__ of the _Line object.

cpython/Lib/functools.py

Lines 1032 to 1038 in 6116e1b

try:
cache = instance.__dict__
except AttributeError: # not all objects have __dict__ (e.g. class defines slots)
msg = (
f"No '__dict__' attribute on {type(instance).__name__!r} "
f"instance to cache {self.attrname!r} property."
)

Apparently, fast creation of immutable objects and writing to their __dict__ is a bad case for the current implementation.
I ran a test where I replaced the cache from the current configparser with a variable like in configparser1.patch:

$ python3.13t
1 loop, best of 5: 667 msec per loop
$ python3.13t
1 loop, best of 5: 392 msec per loop

GIL Python is not affected by this:

$ python3.13
1 loop, best of 5: 343 msec per loop
$ python3.13
1 loop, best of 5: 340 msec per loop

I looked at your patch and I think we can perhaps improve it a bit more by computing first self.value.strip() in the any(map(...)) calls. Also the prefixes are not changed line-by-line, so we can avoid storing them in a _Line object itself. I haven't checked how much we'll gain though.

@2trvl Would you like to make a PR based on the ideas here?

Ok, I'll try to write a decent patch in the next week and maybe create a pull request.

@2trvl
Copy link
Contributor Author

2trvl commented Jan 26, 2025

Hello, I am happy to announce that I have finished developing webbrowser browser finders for Windows and Unix. Today I will start working on a patch for ConfigParser.

@2trvl
Copy link
Contributor Author

2trvl commented Jan 31, 2025

@eendebakpt

Best b1bc375 On 3.13 : 4.91
before unnamed section and refactoring

Main 914c232 On 3.13 : 10.4, which is 211% of Best
the extreme value of the regression is 234%

Patch 2trvl@a3a696b On 3.13 : 6.16 which is 125% of Best, and 41% faster than Main

On 3.13t:
Best : 7.75
Main : 58.8
Patch : 10.6

The difference of 25% from the reference time is context switching and writing to _ReadState attributes. We won't change it anyway.

My new patch replaces _Line with _LineParser, which uses the value property. This also improves code readability. Regex matching nothing removed. What do you think?

@jaraco
Copy link
Member

jaraco commented Feb 2, 2025

Thanks everyone for the respectful conversation and proposed approaches. I hadn't considered performance implications in the previous change other than to take an approach that I expected would have only modest effects. While the old implementation was faster, it was also prone to errors due to its complexity and duplication.

I'm happy to see some progress toward restoring some or much of the performance without compromising readability (and maybe improving it). I haven't looked into the specific implementations, but the summaries sound viable. I'll be happy to review a specific PR.

@eendebakpt
Copy link
Contributor

@2trvl I have a few minor suggestions for the patch, the approach looks good. Can you open a PR with the patch?

@2trvl
Copy link
Contributor Author

2trvl commented Feb 2, 2025

@eendebakpt @jaraco Here is PR #129596

@jaraco
Copy link
Member

jaraco commented Feb 9, 2025

The PR is great and provides an excellent demonstration of the concerns.

I'd like to validate those changes myself and test other options. I see there is a reproducer described in the original bug. It's not the kind of reproducer I can run without setting up some state, so let me work on setting something up that automates that setup.

@jaraco
Copy link
Member

jaraco commented Feb 9, 2025

Here's the nixconfig.py file (for integration).

@jaraco
Copy link
Member

jaraco commented Feb 9, 2025

I created this Dockerfile as a one-file reproducer:

FROM ubuntu:noble

RUN apt update
RUN apt upgrade -y
RUN apt install -y software-properties-common
RUN apt-add-repository -y ppa:deadsnakes
RUN apt update
RUN apt install -y wget libarchive-tools

# Install Pythons
RUN apt install -y python3.12 python3.12-dev python3.12-venv
RUN apt install -y python3.13 python3.13-dev python3.13-venv

# Install Python launcher
RUN wget https://github.com/brettcannon/python-launcher/releases/download/v1.0.0/python_launcher-1.0.0-$(uname -p)-unknown-linux-gnu.tar.xz -O - | tar xJ --directory /usr/local --strip-components 1

run wget https://github.com/user-attachments/files/18351245/shortcuts.zip -O - | bsdtar xz
run wget https://gist.githubusercontent.com/jaraco/b94f5314064d4dbb5fa615fd8b31672e/raw/35f8d8e3508fa50df6058e83a859f9bce2d86bc4/nixconfig.py
CMD py -3.13 -m timeit -s 'import nixconfig' 'nixconfig.main()'

But when I run it, I'm not getting anywhere close to the 2x degradation reported:

 draft 🐚 docker run -it @$(docker build -q .) py -3.13 -m timeit -s 'import nixconfig' 'nixconfig.main'
50000000 loops, best of 5: 5.15 nsec per loop
 draft 🐚 docker run -it @$(docker build -q .) py -3.12 -m timeit -s 'import nixconfig' 'nixconfig.main'
50000000 loops, best of 5: 4.54 nsec per loop

That's a 13% increase. How am I failing to reproduce the degradation?

Also, it seems the impact is on the order of 10s of nanoseconds. How was this degradation noticed (what are the practical implications)?

@2trvl
Copy link
Contributor Author

2trvl commented Feb 9, 2025

@jaraco

4.54 nsec per loop

The nanosecond speed tells me that the files were not read. In your test, read() just excepts OSError because the file was not found.

shortcuts.zip stores all files in a subfolder called shortcuts. Change line to run wget https://github.com/user-attachments/files/18351245/shortcuts.zip -O - | bsdtar -x --strip-components=1

I confirmed this regression on WSL, Void Linux, and another Arch Linux machine.

@jaraco
Copy link
Member

jaraco commented Feb 9, 2025

Change line to [use strip-components].

Yep, rookie mistake. Thanks for that tip.

I've applied the suggestion and also stripped down the repro script and added a couple of assertions to ensure the desktop files are present and getting read.

FROM ubuntu:noble

RUN apt update
RUN apt upgrade -y
RUN apt install -y software-properties-common
RUN apt-add-repository -y ppa:deadsnakes
RUN apt update
RUN apt install -y wget libarchive-tools

# Install Pythons
RUN apt install -y python3.12 python3.12-dev python3.12-venv
RUN apt install -y python3.13 python3.13-dev python3.13-venv

# Install Python launcher
RUN wget https://github.com/brettcannon/python-launcher/releases/download/v1.0.0/python_launcher-1.0.0-$(uname -p)-unknown-linux-gnu.tar.xz -O - | tar xJ --directory /usr/local --strip-components 1

run wget https://github.com/user-attachments/files/18351245/shortcuts.zip -O - | bsdtar xz --strip-components 1
run wget https://gist.githubusercontent.com/jaraco/b94f5314064d4dbb5fa615fd8b31672e/raw/6e598789ca08ab0cf83ef8febd9e951c88151af8/nixconfig.py
CMD py -3.13 -m timeit -s 'import nixconfig' 'nixconfig.main()'

But I'm still seeing execution times in the low nanoseconds.

 draft 🐚 docker run -it @$(docker build -q .) py -3.13 -m timeit -s 'import nixconfig' 'nixconfig.main'
50000000 loops, best of 5: 7.16 nsec per loop
 draft 🐚 docker run -it @$(docker build -q .) py -3.12 -m timeit -s 'import nixconfig' 'nixconfig.main'
50000000 loops, best of 5: 6.95 nsec per loop

Aha! I now see my second mistake - in the CLI invocations, I've failed to actually invoke the function. Correcting that, I'm now successfully replicating the issue:

 draft 🐚 docker run -it @$(docker build -q .) py -3.12 -m timeit -s 'import nixconfig' 'nixconfig.main()'
200 loops, best of 5: 1.81 msec per loop
 draft 🐚 docker run -it @$(docker build -q .) py -3.13 -m timeit -s 'import nixconfig' 'nixconfig.main()'
100 loops, best of 5: 3.38 msec per loop

@jaraco
Copy link
Member

jaraco commented Feb 15, 2025

I'm working on this issue today. Feel free to reach out in Discord if you wish to interact in real time (I'm not sure which server is best; ping me if you don't have a good one to use).

I've worked out how to run the benchmark against the PR or main. I update the Dockerfile to copy the local Lib/configparser.py to the working directory and use that for the test. Demo:

 cpython main 🐚 gh pr checkout 129596
Switched to branch 'configparser_perf'
 cpython configparser_perf 🐚 docker run -it @$(docker build -q .) py -3.13 -m timeit -s 'import nixconfig' 'nixconfig.main()'
100 loops, best of 5: 2.15 msec per loop
 cpython configparser_perf 🐚 git checkout main
Switched to branch 'main'
Your branch is up to date with 'upstream/main'.
 cpython main 🐚 docker run -it @$(docker build -q .) py -3.13 -m timeit -s 'import nixconfig' 'nixconfig.main()'
100 loops, best of 5: 3.51 msec per loop

@jaraco
Copy link
Member

jaraco commented Feb 15, 2025

In the PR, I've backed out some of the changes but managed to retain the bulk of the performance gains. Please take a look and let me know if the compromise if not acceptable (and why). Thanks.

jaraco added a commit that referenced this issue Feb 24, 2025

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
---------

Co-authored-by: Jason R. Coombs <[email protected]>
Co-authored-by: Bénédikt Tran <[email protected]>
@2trvl 2trvl closed this as completed Feb 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes 3.14 new features, bugs and security fixes performance Performance or resource usage stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

5 participants