Skip to content

mitogen.core.StreamError: invalid message: cannot unpickle Context: bad input #689

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
madrover opened this issue Feb 18, 2020 · 8 comments

Comments

@madrover
Copy link

Hi
When trying to execute any ansible command, either adhoc or on playbook, against some hosts it fais with the following error:

[mux  28801] 23:45:24.171948 D mitogen.[fork.12844]: Recovered sys.executable: '/usr/bin/python'
[mux  28801] 23:45:24.172174 D ansible_mitogen.target.[ssh.server1.sudo.root]: Selected temp directory: u'/root/.ansible/tmp' (from [u'/root/.ansible/tmp', u'/var/tmp', u'/tmp', '/tmp', '/var/tmp', '/usr/tmp', '/home/centos'])
[mux  28801] 23:45:24.172361 D mitogen.[ssh.server1.sudo.root]: Dispatcher: Message(4, 0, 0, 101, 1002, '\x80\x02(NX\x16\x00\x00\x00ansible_mitogen.targetq\x00NX\n\x00\x00\x00init_childq'..186) -> {u'fork_context': Context(1003, u'fork.12844'), u'home_dir': u'/root', u'good_temp_dir': u'/root/.ansible/tmp'}
ERROR! [task 28836] 23:45:24.173411 E mitogen: raw pickle was: b'\x80\x02}q\x00(X\x07\x00\x00\x00contextq\x01cmitogen.core\n_unpickle_context\nq\x02K\x04Xi\x00\x00\x00ssh.server1.sudo.rootq\x03\x86q\x04Rq\x05X\x03\x00\x00\x00viaq\x06h\x02K\x02X_\x00\x00\x00ssh.server1q\x07\x86q\x08Rq\tX\x11\x00\x00\x00init_child_resultq\n}q\x0b(X\x0c\x00\x00\x00fork_contextq\x0ch\x02M\xeb\x03X\n\x00\x00\x00fork.12844q\r\x86q\x0eRq\x0fX\x08\x00\x00\x00home_dirq\x10X\x05\x00\x00\x00/rootq\x11X\r\x00\x00\x00good_temp_dirq\x12X\x12\x00\x00\x00/root/.ansible/tmpq\x13uX\x03\x00\x00\x00msgq\x14Nu.'
[task 28836] 23:45:24.174361 D ansible_mitogen.mixins: _remove_tmp_path(None)
[task 28836] 23:45:24.178439 D mitogen: MitogenProtocol(unix_listener.28801): disconnecting
[task 28836] 23:45:24.179229 D mitogen: Waker(fd=11/12): disconnecting
[mux  28801] 23:45:24.179270 D mitogen: <Side of unix_client.28836 fd 76>: empty read, disconnecting
[task 28836] 23:45:24.180078 D mitogen: Router(Broker(4780)): stats: 0 module requests in 0 ms, 0 sent (0 ms minify time), 0 negative responses. Sent 0.0 kb total, 0.0 kb avg.
[mux  28801] 23:45:24.180053 D mitogen: MitogenProtocol(unix_client.28836): disconnecting
The full traceback is:
Traceback (most recent call last):
  File "/home/user/ansible/.venv/lib/python3.6/site-packages/mitogen/core.py", line 952, in unpickle
    obj = unpickler.load()
  File "/home/user/ansible/.venv/lib/python3.6/site-packages/mitogen/core.py", line 822, in _unpickle_context
    return _unpickle_context(context_id, name, router=self.router)
  File "/home/user/ansible/.venv/lib/python3.6/site-packages/mitogen/core.py", line 2307, in _unpickle_context
    raise TypeError('cannot unpickle Context: bad input')
TypeError: cannot unpickle Context: bad input

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/ansible/.venv/lib/python3.6/site-packages/ansible/executor/task_executor.py", line 146, in run
    res = self._execute()
  File "/home/user/ansible/.venv/lib/python3.6/site-packages/ansible/executor/task_executor.py", line 645, in _execute
    result = self._handler.run(task_vars=variables)
  File "/home/user/ansible/.venv/lib/python3.6/site-packages/ansible_mitogen/mixins.py", line 121, in run
    return super(ActionModuleMixin, self).run(tmp, task_vars)
  File "/home/user/ansible/.venv/lib/python3.6/site-packages/ansible/plugins/action/gather_facts.py", line 79, in run
    res = self._execute_module(module_name=fact_module, module_args=mod_args, task_vars=task_vars, wrap_async=False)
  File "/home/user/ansible/.venv/lib/python3.6/site-packages/ansible_mitogen/mixins.py", line 351, in _execute_module
    self._set_temp_file_args(module_args, wrap_async)
  File "/home/user/ansible/.venv/lib/python3.6/site-packages/ansible_mitogen/mixins.py", line 330, in _set_temp_file_args
    self._connection.get_good_temp_dir()
  File "/home/user/ansible/.venv/lib/python3.6/site-packages/ansible_mitogen/connection.py", line 775, in get_good_temp_dir
    self._connect()
  File "/home/user/ansible/.venv/lib/python3.6/site-packages/ansible_mitogen/connection.py", line 797, in _connect
    self._connect_stack(stack)
  File "/home/user/ansible/.venv/lib/python3.6/site-packages/ansible_mitogen/connection.py", line 748, in _connect_stack
    stack=mitogen.utils.cast(list(stack)),
  File "/home/user/ansible/.venv/lib/python3.6/site-packages/mitogen/service.py", line 120, in call
    return call_context.call_service(service_name, method_name, **kwargs)
  File "/home/user/ansible/.venv/lib/python3.6/site-packages/mitogen/core.py", line 2276, in call_service
    return recv.get().unpickle()
  File "/home/user/ansible/.venv/lib/python3.6/site-packages/mitogen/core.py", line 959, in unpickle
    raise StreamError('invalid message: %s', e)
mitogen.core.StreamError: invalid message: cannot unpickle Context: bad input
fatal: [server1]: FAILED! => {
    "msg": "Unexpected failure during module execution.",
    "stdout": ""
}

This error is reproducible only when using become, otherwise it seems to work.

Thanks!

  • Which version of Ansible are you running? 2.9.5
  • Is your version of Ansible patched in any way? No
  • Are you running with any custom modules, or module**utils loaded? No
  • Have you tried the latest master version from Git? No, we're on Mitogen 0.2.9
  • Do you have some idea of what the underlying problem may be? It seems to be sudo related
  • Mention your host and target OS and versions: Host: CentOS 7.4.1708 Target: Centos 6.10
  • Mention your host and target Python versions: Host: 3.6.8 Target: 2.6.6
  • If reporting a performance issue, mention the number of targets and a rough
    description of your workload (lots of copies, lots of tiny file edits, etc.): N/A
  • If reporting a crash or hang in Ansible, please rerun with -vvv and include
    200 lines of output around the point of the error, along with a full copy of
    any traceback or error text in the log. Beware "-vvv" may include secret
    data! Edit as necessary before posting.
    cannot_unpickle.log
  • If reporting any kind of problem with Ansible, please include the Ansible
    version along with output of "ansible-config dump --only-changed".
@madrover
Copy link
Author

Can I have some guidance on this one, please?

@s1113950
Copy link
Collaborator

Please try Ansible 2.8.8; Ansible 2.9 isn't fully supported yet. Also if you want you can try my patch that fixes some things as well: #658 It mainly fixes ansible_python_interpreter discovery but touches some other stuff too. After my patch lands I hope to spend some time getting Ansible 2.9 supported fully.

@madrover
Copy link
Author

The misbehavior can be reproduced in Ansible 2.8.8. However, I have determined what's causing the issue; we have a role which includes a library which implements a task that we use to validate a license that we use to deploy a component.

The module is called this way:

- name: Test that license has not expired
license_check:
  license_file: "{{ license_file }}"
delegate_to: localhost

And the library content is:

#!/usr/bin/python

from ansible.module_utils.basic import AnsibleModule
from ansible.utils.display import Display
from datetime import datetime
from datetime import timedelta
import re
import os

def main():

  result = dict(
    changed = False,
    warnings = [],
    renewal_date = ''
  )

  module = AnsibleModule(
    argument_spec = dict(
      license_file = dict(required=True),
      warning_weeks = dict(required=False, type='int', default=12)
    ),
    supports_check_mode=True
  )

  pattern = re.compile("^key.renewal=(.*)")

  try:
    for line in open(module.params['license_file']):
      for match in re.finditer(pattern, str(line)):
        expiry_date = datetime.strptime(match.group(1), '%Y%m%d')
        result['renewal_date'] = expiry_date.strftime('%Y-%m-%d')
        expiry_window = expiry_date - timedelta(weeks=module.params['warning_weeks'])
        now = datetime.now()

        if expiry_date < now:
          fail_msg = 'The license expired on ' + expiry_date.strftime('%d %b %Y') + '.'
          module.fail_json(msg = fail_msg)

        elif expiry_window < now:
          fail_msg = 'The license is due to expire on ' + expiry_date.strftime('%d %b %Y') + '.'
          result['warnings'].append(fail_msg)

  except IOError:
    module.fail_json(msg = 'Could not open the requested file: ' + module.params['license_file'])

  module.exit_json(**result)

if __name__ == '__main__':
  main();

If we remove the call to that module then the playbook executes without problems.

@s1113950
Copy link
Collaborator

Thanks for the info 👍
I should have time in the next few days to repro on my end with your custom module and see if I can fix your issue

@s1113950
Copy link
Collaborator

s1113950 commented Mar 25, 2020

I'm working on this here: s1113950/mitogen-test@6029980 , running with make custom-lib-unpickle-test. I get a different error than you do though, possibly because I'm not targeting a centos6 machine yet. Don't have time to work on this anymore today, but plan on tackling it more soon.

@s1113950
Copy link
Collaborator

Do you have a specific warning_weeks value or license_file I should use to repro?

@s1113950
Copy link
Collaborator

@madrover
Try removing this import from your license_file:

from ansible.utils.display import Display

It works for me targeting a centos6 docker host from my Mac (closest setup I could make):

TASK [run_test : Create license file on docker host] *************************************************************************************************************
task path: pathHere
changed: [localhost -> testMitogen] => {"ansible_facts": {}, "changed": true, "cmd": "echo \"LICENSE\" > /tmp/license_file.txt", "delta": "0:00:00.003377", "end":
 "2020-03-26 07:09:30.639485", "rc": 0, "start": "2020-03-26 07:09:30.636108", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

TASK [run_test : Test that license has not expired] **************************************************************************************************************
task path: pathHere
ok: [localhost -> testMitogen] => {"ansible_facts": {}, "changed": false, "renewal_date": ""}

via

- name: Create license file on docker host
  shell: echo "LICENSE" > /tmp/license_file.txt

- name: Test that license has not expired
  license_check:
    license_file: /tmp/license_file.txt

with a delegate_to pointing to my centos6 container.

I tested with a license file that just contained the word LICENSE in it since I'm not sure what kind of license file you're using.

If that doesn't work for you then I'll need a sample license_file and warning_weeks value to get a more precise test :)

@Tristan971
Copy link

Tristan971 commented May 14, 2024

While I'm unsure if it's the same issue, the same message was raised for me on Python 3.11.6 and fixed with minor version bumps, so if someone else runs into it:

  • broken on Ansible core 2.14.14 (with 7.6) + Mitogen 0.3.4
  • fixed on Ansible core 2.14.16 (with 7.6) + Mitogen 0.3.7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants