
CrossEntropyLoss fails to run with GPU #2400

Closed
celsofranssa opened this issue Jun 28, 2020 · 6 comments
Labels
bug (Something isn't working) · help wanted (Open to be worked on)

Comments

@celsofranssa

🐛 Bug

The following training_step method, which uses an nn.CrossEntropyLoss() loss function:

    def training_step(self, batch, batch_idx):
        x1, x2 = batch["x1"], batch["x2"]
        predict = self(x1, x2)
        target = torch.arange(x1.size()[0])
        loss = self.loss_fn(predict, target)
        return {'loss': loss}

fails when run on a GPU, throwing the following error:

RuntimeError: Expected object of device type cuda but got device type cpu for argument #2 'target' in call to _thnn_nll_loss_forward

The function self.loss_fn is shown below:

import torch
from pytorch_lightning import LightningModule
from torch import nn


class NPairsLoss(LightningModule):
    """
    The N-Pairs Loss.
    It measures the loss given predicted tensors x1, x2, both with shape [batch_size, hidden_size],
    and a target tensor y of class indices with shape [batch_size] (one index per diagonal entry).
    """

    def __init__(self, alpha=100):
        super(NPairsLoss, self).__init__()
        self.ce = nn.CrossEntropyLoss()
        self.alpha = alpha

    def similarities(self, x1, x2):
        """
        Calculates the cosine similarity matrix for every pair (i, j),
        where i is an embedding from x1 and j is another embedding from x2.

        :param x1: a tensor with shape [batch_size, hidden_size].
        :param x2: a tensor with shape [batch_size, hidden_size].
        :return: the cosine similarity matrix, scaled by alpha, with shape [batch_size, batch_size].
        """
        x1 = x1 / torch.norm(x1, p=2, dim=1, keepdim=True)
        x2 = x2 / torch.norm(x2, p=2, dim=1, keepdim=True)
        return self.alpha * torch.matmul(x1, x2.t())

    def forward(self, predict, target):
        """
        Computes the N-Pairs Loss between the target and predictions.
        :param predict: the prediction of the model, a tuple containing
        the batches x1 (image embeddings) and x2 (description embeddings).
        :param target: the class indices of the matching pairs, with shape [batch_size].
        :return: N-Pairs Loss value.
        """
        x1, x2 = predict
        predict = self.similarities(x1, x2)
        # By construction, the probability distribution must be concentrated
        # on the diagonal of the similarity matrix, so Cross Entropy can be
        # used to measure the loss.
        return self.ce(predict, target)
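
For reference, a minimal sketch of exercising this loss on CPU (the shapes and the arange target are assumptions taken from the training_step above):

    import torch

    batch_size, hidden_size = 8, 32
    x1 = torch.randn(batch_size, hidden_size)  # e.g. image embeddings
    x2 = torch.randn(batch_size, hidden_size)  # e.g. description embeddings

    loss_fn = NPairsLoss(alpha=100)
    # nn.CrossEntropyLoss expects class indices, so the target is
    # arange(batch_size): row i should match column i (the diagonal).
    target = torch.arange(batch_size)
    loss = loss_fn((x1, x2), target)  # a scalar tensor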

Is target = torch.arange(x1.size()[0]) not being created on the GPU?

Expected behavior

The target tensor (target = torch.arange(x1.size()[0])) should be created on the GPU.

Environment

  • CUDA:
    • GPU:
      • GeForce RTX 2080
    • available: True
    • version: 10.2
  • Packages:
    • numpy: 1.19.0
    • pyTorch_debug: False
    • pyTorch_version: 1.5.1
    • pytorch-lightning: 0.8.1
    • tensorboard: 2.2.2
    • tqdm: 4.46.1
  • System:
    • OS: Linux
    • architecture:
      • 64bit
      • ELF
    • processor: x86_64
    • python: 3.7.3
    • version: #41-Ubuntu SMP Tue Dec 3 00:27:35 UTC 2019
@celsofranssa added the bug and help wanted labels on Jun 28, 2020
@rohitgr7
Contributor

rohitgr7 commented Jun 28, 2020

No, you have to move target = torch.arange(x1.size()[0]) to the GPU (or any other device you want) yourself, because it's not present in the batch coming from the dataloader.
You can use target = torch.arange(x1.size()[0]).to(x1.get_device()).
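
Put together, a sketch of the corrected training_step (assuming x1 arrives on the target device from the dataloader; device=x1.device is used here instead of the integer index returned by get_device()):

    def training_step(self, batch, batch_idx):
        x1, x2 = batch["x1"], batch["x2"]
        predict = self(x1, x2)
        # Build the target directly on the same device as the batch,
        # since it is created here rather than by the dataloader.
        target = torch.arange(x1.size(0), device=x1.device)
        loss = self.loss_fn(predict, target)
        return {'loss': loss}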

@celsofranssa
Author

Ok, thanks @rohitgr7.

@williamFalcon
Contributor

You can also use:

target = torch.arange(x1.size()[0]).to(self.device)

the PL module knows what device it is on.

@taylorchu

@williamFalcon is there a reason why this is not managed by Lightning?

@rohitgr7
Contributor

rohitgr7 commented Jul 4, 2020

@taylorchu If you pass that tensor from the DataLoader (or Dataset) itself, it will be handled automatically; but if a tensor is created in the middle of the procedure by the user, one has to move it to the device manually, the PyTorch way.
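
For illustration, a sketch of that approach (PairDataset and pair_collate are hypothetical names; the arange target is built in a collate_fn because it is positional within the batch, so it cannot come from __getitem__ per sample):

    import torch
    from torch.utils.data import DataLoader, Dataset

    class PairDataset(Dataset):
        """Hypothetical dataset yielding the x1/x2 pairs used above."""
        def __init__(self, x1, x2):
            self.x1, self.x2 = x1, x2

        def __len__(self):
            return len(self.x1)

        def __getitem__(self, idx):
            return {"x1": self.x1[idx], "x2": self.x2[idx]}

    def pair_collate(samples):
        # target is added at collate time; Lightning then moves the whole
        # batch (including target) to the device before training_step runs.
        return {
            "x1": torch.stack([s["x1"] for s in samples]),
            "x2": torch.stack([s["x2"] for s in samples]),
            "target": torch.arange(len(samples)),
        }

    loader = DataLoader(PairDataset(torch.randn(100, 32), torch.randn(100, 32)),
                        batch_size=8, collate_fn=pair_collate)

training_step would then read target = batch["target"] and need no manual .to() call.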

@williamFalcon
Contributor

yup... no way around it as mentioned above
