dvc pull from an s3 bucket fails with __init__() got an unexpected keyword argument 'ssl_verify' #40

Closed
JenniferHem opened this issue Jan 19, 2023 · 11 comments

@JenniferHem

Bug Report

Description

My remote is an S3-like bucket that needs SSL verification. However, running dvc pull with a config containing ssl_verify=/path/to/cert results in the above-mentioned error.

Expected

dvc pulls the data

Environment information

Output of dvc doctor:

$ dvc doctor
Platform: Python 3.8.12 on Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.17
Subprojects:
        dvc_data = 0.29.0
        dvc_objects = 0.14.1
        dvc_render = 0.0.17
        dvc_task = 0.1.9
        dvclive = 1.3.2
        scmrepo = 0.1.5
Supports:
        http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2022.11.0, boto3 = 1.26.52),
        webdav (webdav4 = 0.9.8),
        webdavs (webdav4 = 0.9.8)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sdb
Caches: local
Remotes: webdavs
Workspace directory: ext4 on /dev/sdb
Repo: dvc, git

Additional Information (if any):

@efiop
Contributor

efiop commented Jan 20, 2023

Hi @JenniferHem ,

So what error are you getting? Please post the full dvc pull -v output.

@JenniferHem
Author

JenniferHem commented Jan 20, 2023

The full output is (redacted to not contain the user and urls):

2023-01-20 08:19:52,599 DEBUG: Preparing to transfer data from 'URL' to '/repo/.dvc/cache'
2023-01-20 08:19:52,600 DEBUG: Preparing to collect status from '/repo/.dvc/cache'
2023-01-20 08:19:52,600 DEBUG: Collecting status from '/repo/.dvc/cache'
2023-01-20 08:19:52,601 DEBUG: Preparing to collect status from 'URL'
2023-01-20 08:19:52,601 DEBUG: Collecting status from 'URL'
2023-01-20 08:19:52,602 DEBUG: Querying 1 oids via object_exists
2023-01-20 08:19:52,657 ERROR: unexpected error - __init__() got an unexpected keyword argument 'ssl_verify'
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/dvc/cli/__init__.py", line 185, in main
    ret = cmd.do_run()
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/dvc/cli/command.py", line 22, in do_run
    return self.run()
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/dvc/commands/data_sync.py", line 31, in run
    stats = self.repo.pull(
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/dvc/repo/__init__.py", line 48, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/dvc/repo/pull.py", line 34, in pull
    processed_files_count = self.fetch(
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/dvc/repo/__init__.py", line 48, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/dvc/repo/fetch.py", line 86, in fetch
    d, f = _fetch(
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/dvc/repo/fetch.py", line 166, in _fetch
    d, f = repo.cloud.pull(
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/dvc/data_cloud.py", line 170, in pull
    return self.transfer(
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/dvc/data_cloud.py", line 124, in transfer
    return transfer(src_odb, dest_odb, objs, **kwargs)
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/dvc_data/hashfile/transfer.py", line 190, in transfer
    status = compare_status(
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/dvc_data/hashfile/status.py", line 185, in compare_status
    src_exists, src_missing = status(
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/dvc_data/hashfile/status.py", line 151, in status
    odb.oids_exist(hashes, jobs=jobs, progress=pbar.callback)
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/dvc_objects/db.py", line 362, in oids_exist
    return list(wrap_iter(remote_oids, callback))
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/dvc_objects/db.py", line 25, in wrap_iter
    for index, item in enumerate(iterable, start=1):
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/dvc_objects/db.py", line 309, in list_oids_exists
    yield from itertools.compress(oids, in_remote)
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/dvc_objects/fs/base.py", line 308, in exists
    return self.fs.exists(path)
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/fsspec/asyn.py", line 113, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/fsspec/asyn.py", line 98, in sync
    raise return_result
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/fsspec/asyn.py", line 53, in _runner
    result[0] = await coro
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/fsspec/implementations/http.py", line 305, in _exists
    session = await self.set_session()
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/fsspec/implementations/http.py", line 128, in set_session
    self._session = await self.get_client(loop=self.loop, **self.client_kwargs)
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/dvc_http/__init__.py", line 131, in get_client
    return ReadOnlyRetryClient(**kwargs)
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/dvc_http/retry.py", line 8, in __init__
    super().__init__(*args, **kwargs)
  File "/home/myuser/miniconda3/envs/ML/lib/python3.8/site-packages/aiohttp_retry/client.py", line 193, in __init__
    client = ClientSession(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'ssl_verify'
------------------------------------------------------------
2023-01-20 08:19:52,794 DEBUG: Version info for developers:
DVC version: 2.41.1 (pip)
---------------------------------
Platform: Python 3.8.12 on Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.17
Subprojects:
        dvc_data = 0.29.0
        dvc_objects = 0.14.1
        dvc_render = 0.0.17
        dvc_task = 0.1.9
        dvclive = 1.3.2
        scmrepo = 0.1.5
Supports:
        http (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.3, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2022.11.0, boto3 = 1.26.52),
        webdav (webdav4 = 0.9.8),
        webdavs (webdav4 = 0.9.8)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: https, s3
Workspace directory: ext4 on /dev/sdb
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2023-01-20 08:19:52,796 DEBUG: Analytics is disabled.

@JenniferHem
Author

JenniferHem commented Jan 20, 2023

@efiop If it helps: we also have a WebDAV backend which works without any issues. I have only seen this with s3.

@JenniferHem
Author

Also, pulling data works with dvc version 2.8.3 (which was suggested by a colleague).

@efiop
Contributor

efiop commented Jan 20, 2023

@JenniferHem The error is clearly coming from the http remote and not from the s3 remote. Could you show dvc config --list, please? Make sure you edit out sensitive creds/urls if there are any.
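
The last frames of the traceback show client keyword arguments being forwarded unchanged until they reach a constructor that does not accept ssl_verify. The following is only a sketch of that failure mode using stand-in classes: FakeSession, FakeRetryClient, make_client and make_client_filtered are hypothetical names, not the real aiohttp, aiohttp-retry or dvc-http code. The filtered variant shows one generic way to avoid this class of error, which is not necessarily what the eventual fix does.

```python
class FakeSession:
    """Stand-in for an HTTP session class that accepts a fixed set of options."""

    def __init__(self, *, timeout=None, headers=None):
        self.timeout = timeout
        self.headers = headers


class FakeRetryClient:
    """Stand-in for a wrapper that forwards all keyword arguments unchanged."""

    def __init__(self, **kwargs):
        self.session = FakeSession(**kwargs)


def make_client(**client_kwargs):
    """Forward every option as-is, like the last frames of the traceback."""
    return FakeRetryClient(**client_kwargs)


def make_client_filtered(**client_kwargs):
    """Remove the option the session does not understand before forwarding."""
    client_kwargs = dict(client_kwargs)
    ssl_verify = client_kwargs.pop("ssl_verify", True)
    client = FakeRetryClient(**client_kwargs)
    # Kept separately here; a real client would use it to configure TLS.
    client.ssl_verify = ssl_verify
    return client


error_message = None
try:
    make_client(timeout=10, ssl_verify="/path/to/cert.pem")
except TypeError as exc:
    # The unknown keyword is rejected by the innermost constructor.
    error_message = str(exc)

filtered = make_client_filtered(timeout=10, ssl_verify="/path/to/cert.pem")
```

The point of the sketch is that the option name in the config is irrelevant to the storage backend: any key that is passed through untranslated fails at whichever constructor finally receives it.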

@JenniferHem
Author

remote.myremote.url=https://myremote.net:8080/myremote
remote.myremote.custom_auth_header=Token
remote.myremote.verify=False
remote.myremote.ssl_verify=/path/to/cert.pem
core.check_update=false
core.analytics=false
core.autostage=true
core.remote=myremote
remote.myremote.password=password

@efiop
Contributor

efiop commented Jan 20, 2023

@JenniferHem Thanks. Btw, when you say s3-like bucket, are you using minio or something?

Regarding the error itself, could you also run pip check and show pip freeze | grep dvc-http, please?

@JenniferHem
Author

pip check
aiobotocore 2.4.2 has requirement botocore<1.27.60,>=1.27.59, but you have botocore 1.29.52.

pip freeze
dvc-http==2.30.0

Sorry, I cannot really tell you what underlying technology we are using, just that it's not an AWS bucket.

@themaikelman

We have the same issue. We work with an http remote and set the ssl_verify option to False.

https://dvc.org/doc/command-reference/remote/modify#http
ssl_verify - whether or not to verify SSL certificates, or a path to a custom CA package to do so (true by default).

DVC VERSION: 2.41.1

Checked with DVC 2.38.1 and works fine!

This is the .dvc/config

[core]
    remote = http-remote
    check_update = false
    analytics = false
['remote "http-remote"']
    url = https://my-remote:443/remote?remote=123456
    auth = custom
    custom_auth_header = X-Token
    ssl_verify = False
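
The documentation quoted above describes ssl_verify as either a boolean or a path to a custom CA package. As a rough illustration of what that dual meaning implies, here is a sketch that maps such a value onto a standard-library SSLContext. The helper names parse_ssl_verify and make_ssl_context are hypothetical, and the assumption that only "true" and "false" count as booleans is ours, not DVC's actual parsing rule.

```python
import ssl


def parse_ssl_verify(value):
    """Map a config value to True, False, or a CA bundle path (str)."""
    if isinstance(value, bool):
        return value
    text = str(value).strip()
    # Assumption: only "true"/"false" (any case) are treated as booleans.
    if text.lower() == "true":
        return True
    if text.lower() == "false":
        return False
    return text  # anything else is taken to be a path to a CA bundle


def make_ssl_context(value):
    """Build an SSLContext that matches the ssl_verify setting."""
    verify = parse_ssl_verify(value)
    if verify is True:
        return ssl.create_default_context()
    if verify is False:
        ctx = ssl.create_default_context()
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
        return ctx
    # A path: trust the certificates in that file (the file must exist).
    return ssl.create_default_context(cafile=verify)
```

With this sketch, the value from the config above (False) yields a context with verification turned off, while a value such as /path/to/cert.pem would be loaded as a CA file.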

@efiop
Contributor

efiop commented Jan 23, 2023

Seems to be related to https://github.com/iterative/dvc-http/pull/28/files; I see that we've changed some logic there. Moving to dvc-http.

@efiop efiop transferred this issue from iterative/dvc Jan 23, 2023
@efiop
Contributor

efiop commented Jan 23, 2023

Fixed by #32. Will be part of the next dvc release later today.

@efiop efiop closed this as completed Jan 23, 2023