[BUG] OSError: /usr/local/lib/python3.10/dist-packages/fbgemm_gpu/fbgemm_gpu_config.so: undefined symbol against PyTorch 2.7 #3302

Open
AlannaBurke opened this issue Mar 19, 2025 · 1 comment
Labels: 2.7, bug, rl (Issues related to reinforcement learning tutorial, DQN, and so on)

Comments

AlannaBurke (Contributor) commented Mar 19, 2025

Add Link

https://pytorch.org/tutorials/intermediate/pinmem_nonblock.html

Describe the bug

The tutorial fails with the following error:

Unexpected failing examples:
/var/lib/workspace/intermediate_source/pinmem_nonblock.py failed leaving traceback:
Traceback (most recent call last):
  File "/var/lib/workspace/intermediate_source/pinmem_nonblock.py", line 642, in <module>
    from tensordict import TensorDict
  File "/usr/local/lib/python3.10/dist-packages/tensordict/__init__.py", line 6, in <module>
    import tensordict._reductions
  File "/usr/local/lib/python3.10/dist-packages/tensordict/_reductions.py", line 11, in <module>
    from tensordict._lazy import LazyStackedTensorDict
  File "/usr/local/lib/python3.10/dist-packages/tensordict/_lazy.py", line 37, in <module>
    from tensordict.memmap import MemoryMappedTensor
  File "/usr/local/lib/python3.10/dist-packages/tensordict/memmap.py", line 22, in <module>
    from tensordict.utils import _shape, implement_for, IndexType, NESTED_TENSOR_ERR
  File "/usr/local/lib/python3.10/dist-packages/tensordict/utils.py", line 94, in <module>
    from torchrec import KeyedJaggedTensor
  File "/usr/local/lib/python3.10/dist-packages/torchrec/__init__.py", line 10, in <module>
    import torchrec.distributed  # noqa
  File "/usr/local/lib/python3.10/dist-packages/torchrec/distributed/__init__.py", line 38, in <module>
    from torchrec.distributed.model_parallel import DistributedModelParallel  # noqa
  File "/usr/local/lib/python3.10/dist-packages/torchrec/distributed/model_parallel.py", line 26, in <module>
    from torchrec.distributed.planner import EmbeddingShardingPlanner, Topology
  File "/usr/local/lib/python3.10/dist-packages/torchrec/distributed/planner/__init__.py", line 24, in <module>
    from torchrec.distributed.planner.planners import EmbeddingShardingPlanner  # noqa
  File "/usr/local/lib/python3.10/dist-packages/torchrec/distributed/planner/planners.py", line 21, in <module>
    from torchrec.distributed.planner.constants import BATCH_SIZE, MAX_SIZE
  File "/usr/local/lib/python3.10/dist-packages/torchrec/distributed/planner/constants.py", line 12, in <module>
    from torchrec.distributed.embedding_types import EmbeddingComputeKernel
  File "/usr/local/lib/python3.10/dist-packages/torchrec/distributed/embedding_types.py", line 16, in <module>
    from fbgemm_gpu.split_table_batched_embeddings_ops_training import EmbeddingLocation
  File "/usr/local/lib/python3.10/dist-packages/fbgemm_gpu/__init__.py", line 71, in <module>
    _load_library(f"{library}.so")
  File "/usr/local/lib/python3.10/dist-packages/fbgemm_gpu/__init__.py", line 21, in _load_library
    raise error
  File "/usr/local/lib/python3.10/dist-packages/fbgemm_gpu/__init__.py", line 17, in _load_library
    torch.ops.load_library(os.path.join(os.path.dirname(__file__), filename))
  File "/var/lib/ci-user/.local/lib/python3.10/site-packages/torch/_ops.py", line 1392, in load_library
    ctypes.CDLL(path)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/local/lib/python3.10/dist-packages/fbgemm_gpu/fbgemm_gpu_config.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKSsb
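
The missing symbol in the last line of the traceback can be read for a rough hint about the cause. The following is a minimal sketch, assuming the GCC/Itanium C++ name mangling scheme, in which `Ss` abbreviates the pre-C++11 `std::string` and the C++11 string ABI appears under a `__cxx11` namespace. The helper name `string_abi_hint` is hypothetical, and the substring check is a heuristic, not a real demangler.

```python
def string_abi_hint(mangled: str) -> str:
    """Heuristically guess which libstdc++ std::string ABI a mangled symbol uses.

    Assumption: GCC/Itanium mangling, where `Ss` abbreviates the pre-C++11
    std::string and the C++11 ABI spells the type out under `__cxx11`.
    A plain substring test can misfire on identifiers that happen to
    contain these characters, so treat the result as a hint only.
    """
    if "__cxx11" in mangled:
        return "new (C++11) string ABI"
    if "Ss" in mangled:
        return "old (pre-C++11) string ABI"
    return "no std::string found"


# Symbol copied from the OSError above.
symbol = "_ZN5torch3jit17parseSchemaOrNameERKSsb"
print(string_abi_hint(symbol))
```

If the hint is right, the installed fbgemm_gpu binary expects a `torch::jit::parseSchemaOrName` overload that takes an old-ABI `std::string`. That would be consistent with an ABI or version mismatch between fbgemm_gpu and the installed torch build, but it does not prove one. `torch.compiled_with_cxx11_abi()` reports which ABI the installed torch was built with.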

build log

Please submit fixes against the 2.7-RC-TEST branch and enable the tutorial in .jenkins/validate_tutorials_built.py.

Describe your environment

CUDA: 12.6
PyTorch: 2.7
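
Since the traceback passes through tensordict, torchrec, and fbgemm_gpu, their versions may also be relevant. Below is a minimal sketch for collecting them with the standard library. The helper name `installed_versions` is hypothetical, and the distribution names in the list are assumptions: nightly or CUDA-specific wheels may be published under different names.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


# Assumed distribution names; adjust to match the wheels actually installed.
names = ["torch", "tensordict", "torchrec", "fbgemm-gpu"]
for name, version in installed_versions(names).items():
    print(f"{name}: {version or 'not installed'}")
```

Comparing this output between a passing and a failing CI run could help narrow down which package was built against a different torch version.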

cc @vmoens @nairbv

svekars added the rl and 2.7 labels Mar 19, 2025
vmoens (Contributor) commented Mar 19, 2025

@svekars should confirm, but #3300 should solve this.
