DATA_DIR not respected #7

Pseudomanifold · 2022-06-17T06:12:52Z

Hi Bastian!
I tried installing without poetry and running your code.
Everything worked, but I cannot figure out how to set DATA_DIR: the code is looking for the data in the wrong directory.
Here is the output that I get:

(togl) mohit@user-Default-string:~/TOGL$ python topognn/train_model.py --model TopoGNN --dataset DD --batch_size 20 --lr 0.0007
Using backend: pytorch
/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:68: UserWarning: No correct seed found, seed set to 3526443079
  warnings.warn(*args, **kwargs)
Global seed set to 3526443079
Traceback (most recent call last):
  File "topognn/train_model.py", line 150, in <module>
    main(model_cls, dataset_cls, args)
  File "topognn/train_model.py", line 59, in main
    dataset.prepare_data()
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/core/datamodule.py", line 92, in wrapped_fn
    return fn(*args, **kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py", line 48, in wrapped_fn
    return fn(*args, **kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/topognn/data_utils.py", line 549, in prepare_data
    with open(os.path.join(DATA_DIR, 'Benchmark_idx', self.name+"_"+section+'.index'), 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/topognn/../data/Benchmark_idx/DD_train.index'

Originally posted by @mohit-kumar-27 in #6 (comment)

Pseudomanifold · 2022-06-17T06:19:16Z

The simplest fix I'd recommend is setting DATA_DIR yourself in TOGL/topognn/__init__.py. You can point it to any directory that you want to use.

As a fix from our side, we could use an env variable or refer to another path. What do you think @edebrouwer, @ExpectationMax, @mi92?
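
The environment-variable idea could look roughly like the sketch below. It is only an illustration, not what TOGL ships: the variable name `TOGL_DATA_DIR` and the helper `resolve_data_dir` are hypothetical, and the fallback mirrors the existing default of `<package>/../data`.

```python
import os

# Hypothetical variable name, chosen for this sketch; TOGL does not define it.
ENV_VAR = "TOGL_DATA_DIR"


def resolve_data_dir(package_file, environ=None):
    """Return the directory from ENV_VAR if set, else `<package dir>/../data`."""
    environ = os.environ if environ is None else environ
    override = environ.get(ENV_VAR)
    if override:
        # Allow "~" in the override; other paths are returned unchanged.
        return os.path.expanduser(override)
    return os.path.join(os.path.dirname(package_file), "..", "data")


# Demo with explicit environments so the result does not depend on the shell.
pkg_file = "/site-packages/topognn/__init__.py"  # illustrative path
default_dir = resolve_data_dir(pkg_file, environ={})
custom_dir = resolve_data_dir(pkg_file, environ={ENV_VAR: "/home/mohit/TOGL/data"})
print(default_dir)
print(custom_dir)
```

Inside `topognn/__init__.py` the assignment would then become `DATA_DIR = resolve_data_dir(__file__)`, so an installed copy in site-packages can still be pointed at a data directory elsewhere.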

Pseudomanifold · 2022-06-18T09:40:05Z

@mohit-kumar-27 any updates on this? Does the proposed workaround solve your problem?

mohit-kumar-27 · 2022-06-22T07:13:43Z

Hello Bastian,
I have not checked this yet; I am stuck with some urgent work. I will try running it again this weekend and update you, possibly on Sunday or Monday.

mohit-kumar-27 · 2022-06-22T18:10:41Z

This is how I modified TOGL/topognn/__init__.py:

import os.path
from enum import Enum, auto
DATA_DIR='/home/mohit/TOGL/data/'
#DATA_DIR = os.path.join(os.path.dirname(__file__), '..', 'data')

class Tasks(Enum):
    """Valid tasks."""

    GRAPH_CLASSIFICATION = auto()
    NODE_CLASSIFICATION = auto()
    NODE_CLASSIFICATION_WEIGHTED = auto()

The code still searches in the wrong directory and gives the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/topognn/../data/Benchmark_idx/DD_train.index'

Pseudomanifold · 2022-06-23T06:18:47Z

This is the right way; I think you need to install TOGL again afterwards to refresh the file in your virtual environment.
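
One way to confirm which copy of the package Python actually imports is the small check below. It is a sketch using only the standard library; `module_location` is a helper name made up for this example, and `json` is included only so that something is printed where TOGL is not installed.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# "topognn" is the package from this thread; "json" is a stdlib stand-in.
for name in ("topognn", "json"):
    print(name, "->", module_location(name))
```

If the path printed for `topognn` lies under `site-packages`, as in the traceback above, edits to the checkout in `~/TOGL` only take effect after reinstalling.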

mohit-kumar-27 · 2022-06-24T04:47:36Z

Hi Bastian,

I reinstalled the project and ran the code again. The DATA_DIR error is resolved, but now I get the following error:
raise CommError("Permission denied, ask the project owner to grant you access")
wandb.errors.CommError: Permission denied, ask the project owner to grant you access
wandb: ERROR Internal wandb error: file data was not synced

I created a new wandb account and entered the API key when the program asked for it; then I got this error.

This is the full output:

wandb: Currently logged in as: mohitk2 (use wandb login --relogin to force relogin)
wandb: wandb version 0.12.19 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: ERROR Error while calling W&B API: project not found (<Response [404]>)
Thread SenderThread:
Traceback (most recent call last):
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/lib/retry.py", line 102, in __call__
result = self._call_fn(*args, **kwargs)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/internal_api.py", line 133, in execute
six.reraise(*sys.exc_info())
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/six.py", line 719, in reraise
raise value
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/internal_api.py", line 127, in execute
return self.client.execute(*args, **kwargs)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/vendor/gql-0.2.0/gql/client.py", line 52, in execute
result = self._get_result(document, *args, **kwargs)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/vendor/gql-0.2.0/gql/client.py", line 60, in _get_result
return self.transport.execute(document, *args, **kwargs)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/vendor/gql-0.2.0/gql/transport/requests.py", line 39, in execute
request.raise_for_status()
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/requests/models.py", line 960, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://api.wandb.ai/graphql

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/apis/normalize.py", line 24, in wrapper
return func(*args, **kwargs)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/internal_api.py", line 922, in upsert_run
response = self.gql(mutation, variable_values=variable_values, **kwargs)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/lib/retry.py", line 118, in __call__
if not check_retry_fn(e):
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/util.py", line 727, in no_retry_auth
raise CommError("Permission denied, ask the project owner to grant you access")
wandb.errors.CommError: Permission denied, ask the project owner to grant you access

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/internal_util.py", line 55, in run
self._run()
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/internal_util.py", line 105, in _run
self._process(record)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/internal.py", line 292, in _process
self._sm.send(record)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/sender.py", line 181, in send
send_handler(record)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/sender.py", line 604, in send_run
self._init_run(run, config_value_dict)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/sender.py", line 626, in _init_run
server_run, inserted = self._api.upsert_run(
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/apis/normalize.py", line 62, in wrapper
six.reraise(CommError, CommError(message, err), sys.exc_info()[2])
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/six.py", line 718, in reraise
raise value.with_traceback(tb)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/apis/normalize.py", line 24, in wrapper
return func(*args, **kwargs)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/internal/internal_api.py", line 922, in upsert_run
response = self.gql(mutation, variable_values=variable_values, **kwargs)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/lib/retry.py", line 118, in __call__
if not check_retry_fn(e):
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/util.py", line 727, in no_retry_auth
raise CommError("Permission denied, ask the project owner to grant you access")
wandb.errors.CommError: Permission denied, ask the project owner to grant you access
wandb: ERROR Internal wandb error: file data was not synced
Problem at: /scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/loggers/wandb.py 155 experiment
Traceback (most recent call last):
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 761, in init
run = wi.init()
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 520, in init
backend.cleanup()
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/backend/backend.py", line 167, in cleanup
self.interface.join()
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 836, in join
_ = self._communicate(record)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 545, in _communicate
return self._communicate_async(rec, local=local).get(timeout=timeout)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 550, in _communicate_async
raise Exception("The wandb backend process has shutdown")
Exception: The wandb backend process has shutdown
wandb: ERROR Abnormal program exit
Traceback (most recent call last):
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 761, in init
run = wi.init()
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 520, in init
backend.cleanup()
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/backend/backend.py", line 167, in cleanup
self.interface.join()
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 836, in join
_ = self._communicate(record)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 545, in _communicate
return self._communicate_async(rec, local=local).get(timeout=timeout)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/interface/interface.py", line 550, in _communicate_async
raise Exception("The wandb backend process has shutdown")
Exception: The wandb backend process has shutdown

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "topognn/train_model.py", line 150, in <module>
main(model_cls, dataset_cls, args)
File "topognn/train_model.py", line 82, in main
dirpath=wandb_logger.experiment.dir,
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/loggers/base.py", line 41, in experiment
return get_experiment() or DummyExperiment()
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py", line 48, in wrapped_fn
return fn(*args, **kwargs)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/loggers/base.py", line 39, in get_experiment
return fn(self)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/loggers/wandb.py", line 155, in experiment
self._experiment = wandb.init(
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/wandb/sdk/wandb_init.py", line 798, in init
six.raise_from(Exception("problem"), error_seen)
File "<string>", line 3, in raise_from
Exception: problem

Pseudomanifold · 2022-06-24T09:33:59Z

You can start train_model.py with WANDB_MODE=disabled or WANDB_MODE=offline, i.e.:

$ WANDB_MODE=offline poetry run python train_model.py
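
The same setting can be applied from inside a Python script, as long as it happens before W&B is initialised. The sketch below is not part of TOGL; `set_wandb_mode` is a hypothetical helper, while `WANDB_MODE` and the values `online`, `offline`, and `disabled` are the ones W&B itself recognises.

```python
import os

# "offline" and "disabled" are the modes named above; "online" is the default.
VALID_MODES = ("online", "offline", "disabled")


def set_wandb_mode(mode, environ=None):
    """Store `mode` in WANDB_MODE, rejecting values W&B would not understand."""
    environ = os.environ if environ is None else environ
    if mode not in VALID_MODES:
        raise ValueError(f"unknown W&B mode: {mode!r}")
    environ["WANDB_MODE"] = mode
    return environ["WANDB_MODE"]


# Demo against a plain dict; a real script would call set_wandb_mode("offline")
# with no second argument, before the W&B logger is created.
demo_env = {}
print(set_wandb_mode("offline", demo_env))
```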

@ExpectationMax @edebrouwer: should we solve this more generically and remove the team name from the WandB logger? Or potentially default to a tensorboard logger?

mohit-kumar-27 · 2022-07-01T07:27:58Z

I ran the following from my terminal:
(togl) mohit@user-Default-string:~/TOGL$ wandb offline

(togl) mohit@user-Default-string:~/TOGL$ python topognn/train_model.py --model TopoGNN --dataset DD --batch_size 20 --lr 0.0007

I get the following error:

Traceback (most recent call last):
File "topognn/train_model.py", line 152, in <module>
main(model_cls, dataset_cls, args)
File "topognn/train_model.py", line 98, in main
trainer.fit(model, datamodule=dataset)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 499, in fit
self.dispatch()
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 546, in dispatch
self.accelerator.start_training(self)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 73, in start_training
self.training_type_plugin.start_training(trainer)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp_spawn.py", line 107, in start_training
mp.spawn(self.new_process, **self.mp_spawn_kwargs)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 189, in start_processes
process.start()
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 58, in _launch
self.pid = util.spawnv_passfds(spawn.get_executable(),
File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/multiprocessing/util.py", line 452, in spawnv_passfds
return _posixsubprocess.fork_exec(
ValueError: bad value(s) in fds_to_keep

wandb: Waiting for W&B process to finish, PID 11662
wandb: Program failed with code 1.

Could you suggest what needs to be done here?

Pseudomanifold · 2022-07-01T07:58:10Z

Seems to be a problem with wandb; please try WANDB_MODE=disabled.

PS: Please read and follow these instructions for formatting your messages.

mohit-kumar-27 · 2022-07-01T09:08:21Z

I tried
(mohit_f) mohit@user-Default-string:~/TOGL$ WANDB_MODE=disabled python topognn/train_model.py --model TopoGNN --dataset DD --batch_size 20 --lr 0.0007
I am still getting the same error.

The issue seems to be with pytorch_lightning and multiprocessing.

Pseudomanifold · 2022-07-01T11:52:58Z

Hmm, might be better to open a separate issue with pytorch-lightning. You could also check whether you can change the Trainer class (use a different strategy for training, as described in the documentation). See also PyTorch issue 538.

Closing this issue for now since the original problem has been resolved. Please feel free to open another issue for anything else related to TOGL.

Pseudomanifold closed this as completed Jul 1, 2022
