Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizing version check to happen once per run #597

Merged
merged 2 commits into from
Dec 20, 2023

Conversation

eli-kir
Copy link
Contributor

@eli-kir eli-kir commented Dec 18, 2023

Description

When running the python profiler on this library, I noticed a pretty intense amount of calls and total time spent in Version.py. Upon investigation, I traced the call to a check that happened once per loop in fetch_one. Taking a clue from #363 (which is more than a year old and sitting), I refactored what had initially been a @lru_cache change on my end to just be a static call once and then replaced all if trees that used it with the variable. I also changed as little as possible in an attempt to get this through. Below are the cProfiler stats before and after the change. Tested when parsing through a remote binary log, limited both to 3000 events from the stream reader just to save time, there is a nearly 50% speedup and the calls to Version.py is no longer dead at the top of the cProfiler for time spent.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Other (please specify below)

Checklist

  • I have read and understood the CONTRIBUTING.md document.
  • I have checked there aren't any other open Pull Requests for the desired changes.
  • This PR resolves an issue #[Issue Number Here].

Tests

  • All tests have passed.
  • New tests have been added to cover the changes. (describe below if applicable).

Additional Information

As stated above, there IS another PR that makes this change, but likely due to the fact that there were both multiple changes being thrown in and an unresponsive PR, I'm hoping this can make it through as it provides an immense speedup to performance of this library.

Before

$ python3 -m cProfile -s tottime parse_bin_log.py

         13139022 function calls (13114132 primitive calls) in 9.540 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   219826    1.270    0.000    2.836    0.000 version.py:186(__init__)
   109935    0.613    0.000    1.765    0.000 connections.py:730(_read_packet)
     3001    0.584    0.000    7.856    0.003 binlogstream.py:579(fetchone)
   109912    0.424    0.000    2.484    0.000 packet.py:60(__init__)
   220004    0.383    0.000    0.383    0.000 {method 'search' of 're.Pattern' objects}
   219826    0.363    0.000    0.396    0.000 version.py:503(_cmpkey)
    36819    0.352    0.000    0.352    0.000 {method 'recv_into' of '_socket.socket' objects}
  2198868    0.300    0.000    0.300    0.000 {method 'group' of 're.Match' objects}
   429139    0.280    0.000    0.316    0.000 protocol.py:62(read)
   319199    0.259    0.000    0.519    0.000 packet.py:133(read)
   219896    0.247    0.000    0.247    0.000 {method 'settimeout' of '_socket.socket' objects}
220847/220599    0.235    0.000    0.672    0.000 {method 'read' of '_io.BufferedReader' objects}
   219870    0.225    0.000    1.054    0.000 connections.py:775(_read_bytes)
    21986    0.206    0.000    1.611    0.000 row_event.py:740(__init__)
   769391    0.182    0.000    0.182    0.000 version.py:205(<genexpr>)
    55463    0.158    0.000    0.388    0.000 column.py:74(data)
   634077    0.150    0.000    0.223    0.000 column.py:76(<genexpr>)
   435833    0.135    0.000    0.135    0.000 {built-in method _struct.unpack}
1225328/1221520    0.127    0.000    0.128    0.000 {built-in method builtins.len}
    53928    0.100    0.000    0.408    0.000 row_event.py:163(__read_values_name)
      715    0.091    0.000    0.091    0.000 {built-in method marshal.loads}
    43975    0.090    0.000    0.118    0.000 event.py:16(__init__)
   659478    0.086    0.000    0.086    0.000 version.py:452(_parse_letter_version)
    21986    0.079    0.000    0.233    0.000 row_event.py:19(__init__)
   616407    0.079    0.000    0.079    0.000 {method 'startswith' of 'str' objects}
     2996    0.078    0.000    0.532    0.000 row_event.py:103(_read_column_data)
   109913    0.068    0.000    0.085    0.000 version.py:76(__lt__)
    85643    0.066    0.000    0.066    0.000 {method 'decode' of 'bytes' objects}
    21976    0.063    0.000    0.301    0.000 row_event.py:652(__init__)
    .
    .
    .

After

$ python3 -m cProfile -s tottime parse_bin_log_with_version_change.py

         7750856 function calls (7725829 primitive calls) in 5.872 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   109935    0.567    0.000    1.543    0.000 connections.py:730(_read_packet)
     3001    0.368    0.000    4.254    0.001 binlogstream.py:584(fetchone)
   109912    0.356    0.000    2.261    0.000 packet.py:60(__init__)
    36536    0.289    0.000    0.289    0.000 {method 'recv_into' of '_socket.socket' objects}
   429139    0.255    0.000    0.289    0.000 protocol.py:62(read)
   319199    0.240    0.000    0.485    0.000 packet.py:133(read)
   219892    0.220    0.000    0.220    0.000 {method 'settimeout' of '_socket.socket' objects}
   219870    0.201    0.000    0.897    0.000 connections.py:775(_read_bytes)
    21986    0.191    0.000    1.521    0.000 row_event.py:739(__init__)
   634077    0.142    0.000    0.212    0.000 column.py:76(<genexpr>)
    55463    0.128    0.000    0.347    0.000 column.py:74(data)
   435778    0.114    0.000    0.114    0.000 {built-in method _struct.unpack}
1225228/1221420    0.113    0.000    0.115    0.000 {built-in method builtins.len}
220846/220598    0.102    0.000    0.470    0.000 {method 'read' of '_io.BufferedReader' objects}
    53928    0.094    0.000    0.395    0.000 row_event.py:162(__read_values_name)
      714    0.081    0.000    0.081    0.000 {built-in method marshal.loads}
     2996    0.076    0.000    0.516    0.000 row_event.py:102(_read_column_data)
   616483    0.076    0.000    0.076    0.000 {method 'startswith' of 'str' objects}
    43975    0.074    0.000    0.097    0.000 event.py:16(__init__)
    21986    0.069    0.000    0.200    0.000 row_event.py:19(__init__)
    85641    0.065    0.000    0.065    0.000 {method 'decode' of 'bytes' objects}
    21976    0.055    0.000    0.259    0.000 row_event.py:651(__init__)
   .
   .
   .

Currently, the code has multiple instances where the pymysql version
is checked inline with a vall to Version.py.  This is unlikely to change
during the duration of the script running so we can optimize by
storing this in a variable once.
@sean-k1
Copy link
Collaborator

sean-k1 commented Dec 20, 2023

@eli-kir
hello eli-kir
Thanks for the good contribution
I didn't realize it took so long to check the version.

@sean-k1 sean-k1 merged commit 243656d into julien-duponchelle:main Dec 20, 2023
@eli-kir
Copy link
Contributor Author

eli-kir commented Dec 20, 2023

Thanks! Happy to have helped, this library does a lot of heavy lifting for me

@sean-k1 sean-k1 mentioned this pull request Jan 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants