Updates to extracting positions from sdf files #4654

Merged: 2 commits into pyg-team:master, May 16, 2022

arunppsg
Contributor

This PR updates how atom positions are extracted from the QM9 dataset. Previously, the SDF file was read as plain text and the positions were extracted by matching patterns in that text. Now, RDKit's utilities are used directly to read the atom positions of each molecule from the SDF file.
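
For context, the text-based approach described above can be sketched roughly as follows. This is a hypothetical stand-alone helper (`positions_from_molblock` is an illustrative name), not the code this PR removed. It assumes a single V2000 record in which three header lines and the counts line precede the atom block, and it splits coordinates on whitespace.

```python
# Hypothetical helper for illustration only; not the code removed by this PR.
def positions_from_molblock(block):
    """Parse x, y, z coordinates from the atom block of one V2000 record."""
    lines = block.split("\n")
    # Lines 1-3 are the header (title, program, comment); line 4 is the counts line.
    # Its first 3-character field holds the number of atoms.
    num_atoms = int(lines[3][:3])
    atom_lines = lines[4:4 + num_atoms]
    # Assumption: coordinates are separated by whitespace, as in the record below.
    return [[float(v) for v in line.split()[:3]] for line in atom_lines]


# A methane record, built line by line so the fixed-width layout is preserved.
METHANE = "\n".join([
    "atom1",
    "",
    "",
    "  5  4  0     0  0  0  0  0  0999 V2000",
    "   -0.0127    1.0858    0.0080 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.0022   -0.0060    0.0020 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.0117    1.4638    0.0003 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.5408    1.4475   -0.8766 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.5238    1.4379    0.9064 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "  1  3  1  0  0  0  0",
    "  1  4  1  0  0  0  0",
    "  1  5  1  0  0  0  0",
    "M  END",
])

positions = positions_from_molblock(METHANE)
print(len(positions), positions[0])
```

A parser like this depends on the exact line layout of each record, which is the kind of fragility that reading coordinates through RDKit is meant to avoid.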

@codecov
codecov bot commented May 16, 2022

Codecov Report

Merging #4654 (6408d04) into master (f045451) will decrease coverage by 0.11%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #4654      +/-   ##
==========================================
- Coverage   83.09%   82.97%   -0.12%     
==========================================
  Files         316      316              
  Lines       16787    16784       -3     
==========================================
- Hits        13949    13927      -22     
- Misses       2838     2857      +19     

Impacted Files                                 Coverage Δ
torch_geometric/nn/conv/utils/typing.py        81.25% <0.00%> (-17.50%) ⬇️
torch_geometric/io/tu.py                       93.58% <0.00%> (-2.57%) ⬇️
torch_geometric/nn/models/mlp.py               98.52% <0.00%> (-1.48%) ⬇️
torch_geometric/transforms/gdc.py              78.17% <0.00%> (-1.02%) ⬇️
torch_geometric/data/dataset.py                96.80% <0.00%> (-0.80%) ⬇️
torch_geometric/nn/conv/rgat_conv.py           83.76% <0.00%> (-0.53%) ⬇️
torch_geometric/data/download.py               100.00% <0.00%> (ø)
torch_geometric/graphgym/utils/comp_budget.py  15.51% <0.00%> (+0.51%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Member

@rusty1s rusty1s left a comment

Cool, does that result in the same atom positions?

@arunppsg
Contributor Author

Yes, here is a minimal example using temp.sdf with the following contents:

atom1


  5  4  0     0  0  0  0  0  0999 V2000
   -0.0127    1.0858    0.0080 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0022   -0.0060    0.0020 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.0117    1.4638    0.0003 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.5408    1.4475   -0.8766 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.5238    1.4379    0.9064 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  3  1  0  0  0  0
  1  4  1  0  0  0  0
  1  5  1  0  0  0  0
M  END
$$$$

Extracting positions:

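from rdkit import Chem

# Keep hydrogens so every atom in the file is returned; sanitization is skipped.
suppl = Chem.SDMolSupplier('temp.sdf', removeHs=False, sanitize=False)

for i, mol in enumerate(suppl):
    N = mol.GetNumAtoms()

    # Old approach: slice the atom block out of the raw record text
    # (three header lines and the counts line come first).
    pos = suppl.GetItemText(i).split('\n')[4:4 + N]
    pos_via_text = [[float(x) for x in line.split()[:3]] for line in pos]

    # New approach: read the coordinates from the RDKit conformer.
    conf = mol.GetConformer()
    pos_via_rdkit = conf.GetPositions()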

Verifying that both approaches give the same positions:

>>> print (pos_via_text)
[[-0.0127, 1.0858, 0.008],
 [0.0022, -0.006, 0.002],
 [1.0117, 1.4638, 0.0003],
 [-0.5408, 1.4475, -0.8766],
 [-0.5238, 1.4379, 0.9064]]

>>> pos_via_rdkit
array([[-1.2700e-02,  1.0858e+00,  8.0000e-03],
       [ 2.2000e-03, -6.0000e-03,  2.0000e-03],
       [ 1.0117e+00,  1.4638e+00,  3.0000e-04],
       [-5.4080e-01,  1.4475e+00, -8.7660e-01],
       [-5.2380e-01,  1.4379e+00,  9.0640e-01]])
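
Rather than comparing the two printouts by eye, the check can also be done programmatically. The following is a minimal sketch, assuming the values shown above are copied into plain Python lists (it does not call RDKit); `positions_match` and its `abs_tol` default are illustrative choices, not part of this PR.

```python
import math


# Hypothetical helper for illustration only; not part of this PR.
def positions_match(a, b, abs_tol=1e-6):
    """Return True if two N x 3 coordinate lists agree within abs_tol."""
    if len(a) != len(b):
        return False
    return all(
        len(row_a) == len(row_b)
        and all(math.isclose(x, y, abs_tol=abs_tol) for x, y in zip(row_a, row_b))
        for row_a, row_b in zip(a, b)
    )


# Values copied from the text-based output above.
pos_via_text = [
    [-0.0127, 1.0858, 0.008],
    [0.0022, -0.006, 0.002],
    [1.0117, 1.4638, 0.0003],
    [-0.5408, 1.4475, -0.8766],
    [-0.5238, 1.4379, 0.9064],
]

# Values copied from the RDKit output above, written as a nested list.
pos_via_rdkit = [
    [-1.2700e-02, 1.0858e+00, 8.0000e-03],
    [2.2000e-03, -6.0000e-03, 2.0000e-03],
    [1.0117e+00, 1.4638e+00, 3.0000e-04],
    [-5.4080e-01, 1.4475e+00, -8.7660e-01],
    [-5.2380e-01, 1.4379e+00, 9.0640e-01],
]

print(positions_match(pos_via_text, pos_via_rdkit))  # True
```

When the RDKit result is kept as a NumPy array, `numpy.allclose` would be the more usual way to make the same comparison.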

@rusty1s
Member

rusty1s commented May 16, 2022

Thanks for confirming:)

@rusty1s rusty1s merged commit da78713 into pyg-team:master May 16, 2022