
pytorch GC #592

Closed
x66ccff opened this issue Feb 4, 2025 · 8 comments
x66ccff commented Feb 4, 2025

I'm using torch through PythonCall. When I create tensors repeatedly, I don't observe any decrease in GPU memory usage, even after reassigning the same variable or setting it to `nothing`. This persists even after running the GC.

```julia
using PythonCall
torch = pyimport("torch")
torch.cuda.is_available()
n = 20000

a = torch.randn((1, n*n), device=torch.device("cuda"))  # VRAM increases here
a = torch.randn((1, n*n), device=torch.device("cuda"))  # VRAM also increases here
a = torch.randn((1, n*n), device=torch.device("cuda"))  # VRAM also increases here
a = nothing  # useless

PythonCall.GC.gc()        # useless
torch.cuda.empty_cache()  # useless
```

Can anyone help?

Julia version: 1.11.3

```julia
julia> torch.__version__
Python: '2.6.0+cu126'
```


x66ccff commented Feb 4, 2025

@cjdoris @MilesCranmer Any thoughts on this issue? Thanks!


x66ccff commented Feb 4, 2025

Alright, `PythonCall.pydel!(a)` solves this:

```julia
using PythonCall
torch = pyimport("torch")
torch.cuda.is_available()
n = 20000

a = torch.randn((1, n*n), device=torch.device("cuda"))  # VRAM increases here
a = torch.randn((1, n*n), device=torch.device("cuda"))  # VRAM also increases here
a = torch.randn((1, n*n), device=torch.device("cuda"))  # VRAM also increases here

PythonCall.GC.gc()        # useless on its own
torch.cuda.empty_cache()  # useless on its own

PythonCall.pydel!(a)      # drop the Python reference explicitly

PythonCall.GC.gc()
torch.cuda.empty_cache()  # Released!
```

(Note that the `a = nothing` line from the previous snippet must be dropped here, since `pydel!` needs the live reference.)

x66ccff closed this as completed Feb 4, 2025

x66ccff commented Feb 22, 2025

x66ccff/SymbolicRegressionGPU.jl#22

There is still a problem here.

x66ccff reopened this Feb 22, 2025

x66ccff commented Feb 22, 2025

When using PythonCall and PyTorch together, any tensor created in Julia code (including temporary tensors) for which Julia keeps no handle to release via `pydel!()` cannot be freed through the GC or `torch.cuda.empty_cache()`. I wrote a specific example to demonstrate:

```julia
using PythonCall
torch = pyimport("torch")
torch.cuda.is_available()
n = 20000

a = torch.randn((1, n*n), device=torch.device("cuda"))  # VRAM increases here

f(x) = begin
    1 + 1
    3 * 1
    x + x         # ✅ can be released
end

g = f(a)

PythonCall.pydel!(a)
PythonCall.pydel!(g)

println(torch.cuda.memory_summary())
```

Compare with this version, where `f` creates intermediate tensors:

```julia
using PythonCall
torch = pyimport("torch")
torch.cuda.is_available()
n = 20000

a = torch.randn((1, n*n), device=torch.device("cuda"))  # VRAM increases here

f(x) = begin
    x + 1         # ❌ cannot be released anymore
    x * 1         # ❌ cannot be released anymore
    x + x         # ✅ can be released
end

g = f(a)

PythonCall.pydel!(a)
PythonCall.pydel!(g)

println(torch.cuda.memory_summary())
```


cjdoris commented Feb 22, 2025

After `f(x)` finishes, the result of `x + 1` is unreachable, so Julia will finalize it at some point in the future; that deletes the Python object and frees the memory backing the tensor. However, Julia provides no guarantees about when it will GC (which is what runs finalizers), and in general it waits until there is too much memory pressure on your system. Explicitly calling `GC.gc()` at some point after `f(x)` should free that memory.
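
To spell this out, here is a minimal sketch (assuming a CUDA device and the same setup as the snippets earlier in this thread; `f` is illustrative, not from the reports above):

```julia
using PythonCall
torch = pyimport("torch")

# t is an intermediate tensor: it becomes unreachable once f returns,
# but its finalizer only runs when Julia's GC actually sweeps.
f(x) = begin
    t = x + 1
    t * 1
end

a = torch.randn((1, 10^6), device=torch.device("cuda"))
g = f(a)

PythonCall.pydel!(a)
PythonCall.pydel!(g)

GC.gc()                   # a full Julia GC runs the finalizer for t ...
torch.cuda.empty_cache()  # ... after which PyTorch can return the cached blocks
```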


x66ccff commented Feb 22, 2025

Wow, thanks! That works! 😂 I forgot to try `GC.gc()`; I kept trying `PythonCall.pydel!`, `PythonCall.GC.gc()`, and `torch.cuda.empty_cache()`.


x66ccff commented Feb 22, 2025

@cjdoris However, calling `GC.gc()` frequently makes things too slow. Is there a method that garbage-collects only the PyTorch tensors?


cjdoris commented Feb 23, 2025

No, unless you pydel! every intermediate tensor.
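
For example, a sketch of that pattern (the function and variable names here are illustrative, not a library API): each intermediate gets a named handle and is deleted eagerly inside the function, so no Julia GC pass is needed.

```julia
using PythonCall
torch = pyimport("torch")

function f(x)
    t1 = x + 1                # intermediate tensor
    t2 = t1 * 1               # another intermediate
    out = t2 + t2
    PythonCall.pydel!(t1)     # free each intermediate immediately,
    PythonCall.pydel!(t2)     # without waiting for Julia's GC
    return out
end

a = torch.randn((1, 10^6), device=torch.device("cuda"))
g = f(a)

PythonCall.pydel!(a)
PythonCall.pydel!(g)
torch.cuda.empty_cache()      # VRAM returned without a full GC.gc()
```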
