Memory leak with emit dataframe #902

Closed
tomuben opened this issue May 20, 2024 · 0 comments · Fixed by exasol/script-languages#414

tomuben commented May 20, 2024

Steps to reproduce

  1. Create UDF
--/
CREATE OR REPLACE PYTHON3 SET SCRIPT generate_data(
  "id" INTEGER, 
  "batch_count" INTEGER, 
  "batch_size" INTEGER
) EMITS (
  SAMPLE01 DOUBLE, 
  SAMPLE02 DOUBLE, 
  SAMPLE03 DOUBLE, 
  SAMPLE04 DOUBLE, 
  SAMPLE05 DOUBLE, 
  SAMPLE06 DOUBLE, 
  SAMPLE07 DOUBLE, 
  SAMPLE08 DOUBLE, 
  SAMPLE09 DOUBLE, 
  SAMPLE10 DOUBLE,
  SAMPLE11 DOUBLE, 
  SAMPLE12 DOUBLE, 
  SAMPLE13 DOUBLE, 
  SAMPLE14 DOUBLE, 
  SAMPLE15 DOUBLE, 
  SAMPLE16 DOUBLE, 
  SAMPLE17 DOUBLE, 
  SAMPLE18 DOUBLE, 
  SAMPLE19 DOUBLE, 
  SAMPLE20 DOUBLE,
  TARGET01 DOUBLE, 
  TARGET02 DOUBLE, 
  TARGET03 DOUBLE, 
  TARGET04 DOUBLE, 
  TARGET05 DOUBLE
) AS
import numpy as np
import pandas as pd
n_features = 20
n_targets = 5
def run(ctx):
    for i in range(ctx.batch_count):
        features = np.ones([ctx.batch_size, n_features+n_targets])
        df_result = pd.DataFrame(features)
        ctx.emit(df_result)
/
  2. Run UDF:
SELECT generate_data("id", 100000, 1000)
FROM VALUES BETWEEN 1 AND 2 "id"("id")
GROUP BY "id"

=> Throws an OOM error:

2024-05-20	FAILED	SELECT	102.246	(null)	0	[Code: 0, SQL State: 40020]  Connection lost after system running out of memory. SessionID:1799586813378101248
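For scale, here is a rough estimate of the raw payload the reproduction produces. It assumes 8 bytes per DOUBLE value (the size of the `float64` values that `np.ones` creates by default) and ignores all pandas and interpreter overhead. The helper name `emitted_bytes` is made up for this illustration. If emitted batches were never released, memory could grow toward this order of magnitude.

```python
BYTES_PER_DOUBLE = 8  # np.ones creates float64 values by default


def emitted_bytes(batch_count, batch_size, n_columns, groups=1):
    """Raw size of all emitted values, ignoring any object overhead."""
    return groups * batch_count * batch_size * n_columns * BYTES_PER_DOUBLE


# Values taken from the reproduction: 20 feature + 5 target columns,
# 100000 batches of 1000 rows, and two groups ("id" 1 and 2).
per_batch = emitted_bytes(1, 1000, 25)
per_group = emitted_bytes(100_000, 1000, 25)
total = emitted_bytes(100_000, 1000, 25, groups=2)

print(f"per batch: {per_batch / 1e6:.1f} MB")
print(f"per group: {per_group / 1e9:.1f} GB")
print(f"total:     {total / 1e9:.1f} GB")
```

A single batch is small, so a correct implementation should run in roughly constant memory; only accumulation across batches reaches gigabytes.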

Hints:

Try https://docs.python.org/3/library/gc.html#gc.set_debug or dump objects with https://docs.python.org/3/library/gc.html#gc.get_objects
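A minimal sketch of the `gc.get_objects` hint: count the objects tracked by the garbage collector per type before and after a suspicious operation, then look at which types grew. The class `Leaky` and the helper `count_tracked_objects` are hypothetical names for this example, not part of the UDF framework. Note that `gc.get_objects` only reports objects the collector tracks, so some extension types may not show up.

```python
import gc
from collections import Counter


def count_tracked_objects():
    """Count the objects currently tracked by the garbage collector, per type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


class Leaky:
    """Hypothetical stand-in for an object that is never released."""


before = count_tracked_objects()
leaked = [Leaky() for _ in range(500)]  # simulate objects that stay alive
after = count_tracked_objects()

# Types whose instance count grew between the two snapshots.
growth = {name: after[name] - before[name]
          for name in after if after[name] > before[name]}
print(growth.get("Leaky"))
```

Inside a UDF, the same two snapshots could be taken around a batch of `ctx.emit` calls, assuming the script has a way to report the result.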

Possible root cause: Missing decrement of reference counter of a Python object.
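To illustrate what a missing decrement looks like from the Python side, the sketch below compares an object's reference count before and after a call using `sys.getrefcount`. A missing decrement in native code cannot be reproduced in pure Python, so a list that keeps the object alive stands in for it here. `Batch` and `reference_delta` are hypothetical names.

```python
import sys


class Batch:
    """Hypothetical stand-in for an emitted object."""


def reference_delta(obj, action):
    """Return how many references to obj remain after action(obj) has returned."""
    before = sys.getrefcount(obj)
    action(obj)
    return sys.getrefcount(obj) - before


holder = []  # plays the role of a consumer that never releases its reference
batch = Batch()

clean = reference_delta(batch, lambda o: None)   # consumer releases everything
leaky = reference_delta(batch, holder.append)    # consumer keeps one reference

print(clean, leaky)
```

A positive delta after a call that should not retain its argument means the object cannot be freed when the caller drops it.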

@tomuben tomuben added the bug Unwanted / harmful behavior label May 20, 2024
@tomuben tomuben self-assigned this May 20, 2024
tomuben added a commit that referenced this issue Jun 7, 2024
@tomuben tomuben reopened this Jun 7, 2024
tomuben added a commit that referenced this issue Jun 10, 2024

fixes #902
@tomuben tomuben closed this as completed Jun 10, 2024
tomuben added a commit that referenced this issue Jun 11, 2024

Changelist:
- #904: Ignored Kernel CVE (#905) 
- #906: Updated APT package (#907) 
- #908: Pinned conda package (#909) 
- #910: Updated APT package (#911) 
- #892: Filtered out Linux Kernel related CVEs (#912) 
- #895: Fixed GH Action 'Publish Docker Test Container' (#913) 
- #856: Changed mirror for installing R packages (#914) 
- #917: Updated Ubuntu JDK package (#918) 
- #915: Updated Python (#916) 
- #902: Fixed memory-related bugs with emit dataframe (#920) 
- #921: Use exasol-python-test-framework 0.5.0 (#922) 
- #923: Use exasol-python-test-framework 0.5.1 (#924)