
Commit 5f83f3a

[IMP] snippets.convert_html_columns: a batch processing story
TLDR: RTFM

Once upon a time, in a countryside farm in Belgium...

At first, the upgrade of databases was straightforward. But, as time passed, the size of the databases grew, and some CPU-intensive computations took so much time that a solution needed to be found. Fortunately, the Python standard library has the perfect module for this task: `concurrent.futures`.

Then, Python 3.10 appeared, and `ProcessPoolExecutor` started to hang occasionally for no apparent reason. Soon, our hero found out he wasn't the only one to suffer from this issue[^1]. Unfortunately, the proposed solution looked like overkill. Still, it revealed that the issue had already been known[^2] for a few years. Although an official patch wasn't ready to be committed, the discussion about its legitimacy[^3] led our hero to a nicer solution.

By default, `ProcessPoolExecutor.map` submits elements one by one to the pool. This is pretty inefficient when there are a lot of elements to process. This can be changed by passing a large value for the *chunksize* argument, so that elements are sent to the workers in batches.

Who would have thought that a bigger chunk size would solve a performance issue? As always, the answer was in the documentation[^4].

[^1]: https://stackoverflow.com/questions/74633896/processpoolexecutor-using-map-hang-on-large-load
[^2]: python/cpython#74028
[^3]: python/cpython#114975 (review)
[^4]: https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor.map

closes #94

Signed-off-by: Nicolas Seinlet (nse) <[email protected]>
1 parent 6a7f050 commit 5f83f3a

1 file changed, +1 -1 lines changed

src/util/snippets.py

@@ -279,7 +279,7 @@ def convert_html_columns(cr, table, columns, converter_callback, where_column="I
         convert = Convertor(converters, converter_callback)
         for query in util.log_progress(split_queries, logger=_logger, qualifier=f"{table} updates"):
             cr.execute(query)
-            for data in executor.map(convert, cr.fetchall()):
+            for data in executor.map(convert, cr.fetchall(), chunksize=1000):
                 if "id" in data:
                     cr.execute(update_query, data)
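To illustrate the *chunksize* argument discussed in the commit message, here is a minimal, self-contained sketch. It is not the code from `snippets.py`: `convert`, `task_count` and `convert_rows` are hypothetical names introduced for this example, and the stand-in converter only uppercases a string. It assumes, as the `Executor.map` documentation describes, that `ProcessPoolExecutor` chops the input into chunks and submits each chunk to the pool as a single task.

```python
import math
from concurrent.futures import Executor, ProcessPoolExecutor


def convert(row):
    # Hypothetical stand-in for the CPU-bound converter callback.
    # It must live at module level so worker processes can unpickle it.
    row_id, html = row
    return {"id": row_id, "html": html.upper()}


def task_count(n_rows, chunksize):
    # Number of tasks submitted to the pool when the input is split
    # into chunks of at most `chunksize` elements.
    return math.ceil(n_rows / chunksize)


def convert_rows(executor: Executor, rows, chunksize=1000):
    # Results are yielded in input order, whatever the chunk size.
    return list(executor.map(convert, rows, chunksize=chunksize))


if __name__ == "__main__":
    rows = [(i, f"<p>row {i}</p>") for i in range(10_000)]
    with ProcessPoolExecutor() as executor:
        results = convert_rows(executor, rows)
    print("rows converted:", len(results))
    print("tasks with chunksize=1:", task_count(len(rows), 1))
    print("tasks with chunksize=1000:", task_count(len(rows), 1000))
```

With 10,000 rows, the default `chunksize=1` means 10,000 separate round trips between the parent and the workers, while `chunksize=1000` needs only 10. Whether 1000 is the best value for a given workload is a judgment call; the commit does not claim to have tuned it.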
