switch-to-configuration not interpreted using perl #53672

Closed
eadwu opened this issue Jan 8, 2019 · 52 comments
Labels
0.kind: regression Something that worked before working no longer 6.topic: kernel The Linux kernel
@eadwu
Member

eadwu commented Jan 8, 2019

Issue description

Running switch-to-configuration as an executable returns a bad interpreter error, while running it explicitly with perl works fine. Running it with bash produces the following errors.

/nix/store/v0ba874xw18j47f3hisk4130qll3wfrx-nixos-system-nixos-19.03.git.824a927/bin/switch-to-configuration: line 3: use: command not found
/nix/store/v0ba874xw18j47f3hisk4130qll3wfrx-nixos-system-nixos-19.03.git.824a927/bin/switch-to-configuration: line 4: use: command not found
/nix/store/v0ba874xw18j47f3hisk4130qll3wfrx-nixos-system-nixos-19.03.git.824a927/bin/switch-to-configuration: line 5: use: command not found
/nix/store/v0ba874xw18j47f3hisk4130qll3wfrx-nixos-system-nixos-19.03.git.824a927/bin/switch-to-configuration: line 6: use: command not found
/nix/store/v0ba874xw18j47f3hisk4130qll3wfrx-nixos-system-nixos-19.03.git.824a927/bin/switch-to-configuration: line 7: use: command not found
/nix/store/v0ba874xw18j47f3hisk4130qll3wfrx-nixos-system-nixos-19.03.git.824a927/bin/switch-to-configuration: line 8: syntax error near unexpected token `('
/nix/store/v0ba874xw18j47f3hisk4130qll3wfrx-nixos-system-nixos-19.03.git.824a927/bin/switch-to-configuration: line 8: `use Sys::Syslog qw(:standard :macros);'

Steps to reproduce

nixos-rebuild switch

Technical details

Please run nix-shell -p nix-info --run "nix-info -m" and paste the
results.

 - system: `"x86_64-linux"`
 - host os: `Linux 5.0.0-rc1, NixOS, 19.03.git.824a927 (Koi)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.2pre6526_9f99d624`
 - channels(root): `"nixos-19.03pre165037.eebd1a92637"`
 - nixpkgs: `$HOME/Downloads/nixpkgs`
@dtzWill
Member

dtzWill commented Jan 9, 2019

I encountered the same when trying out the new 5.0.0-rc1 (!!) kernel, and it appears to have been resolved when reverting.

Does that work for you? If so, then we need to sort out why it's acting this way (config? dunno!)....

@eadwu
Member Author

eadwu commented Jan 9, 2019

Seems like it is a problem with 5.0.0-rc1; I reverted to 4.20.1 and the error doesn't show anymore.

@vcunat vcunat added 6.topic: kernel The Linux kernel 0.kind: regression Something that worked before working no longer labels Jan 10, 2019
@matthewbauer
Member

matthewbauer commented Jan 14, 2019

Please report this upstream! I think https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/binfmt_script.c?id=c2315c187fa0d3ab363fdebe22718170b40473e3 is the culprit but don't know enough about the kernel to fix it.

If someone is able to, verify that these cases work:

#!/usr/bin/env perl
use constant;
print 'ok';

and

#! /usr/bin/env perl
use constant;
print 'ok'

@eadwu
Member Author

eadwu commented Jan 14, 2019

Neither works if you run them through bash; running them through perl works, though. I'll see if I can compile 5.0-rc2 fast enough before I go to bed.

Using chmod +x beforehand also seems to make it work.

~/Downloads/tests
➜ ls
test1  test2

~/Downloads/tests
➜ cat test1
#!/usr/bin/env perl
use constant;
print 'ok';

~/Downloads/tests
➜ cat test2
#! /usr/bin/env perl
use constant;
print 'ok'

~/Downloads/tests
➜ bash test1
test1: line 2: use: command not found
test1: line 3: print: command not found

~/Downloads/tests
➜ bash test2
test2: line 2: use: command not found
test2: line 3: print: command not found

~/Downloads/tests
➜ perl test1
ok%

~/Downloads/tests
➜ perl test2
ok%

~/Downloads/tests
➜ chmod +x test1 test2

~/Downloads/tests
➜ ./test1
ok%

~/Downloads/tests
➜ ./test2
ok%

~/Downloads/tests
➜ uname -a
Linux nixos 5.0.0-rc1 #1-NixOS SMP Mon Jan 7 01:08:20 UTC 2019 x86_64 GNU/Linux

Still seems to be a problem on 5.0.0-rc1, using nixos-rebuild switch.

Still occurring on 5.0.0-rc3.

@L-as
Member

L-as commented Jan 16, 2019

This problem also applies to the command-not-found script. Also, for me running the scripts directly from fish (i.e. /path/to/script and not fish /path/to/script), it will give a more accurate error about the operating system not being able to execute the files. I will post the exact error I get once I can.

@L-as
Member

L-as commented Jan 17, 2019

This is what happens if I try to run the file directly from a fish shell:

Failed to execute process '/nix/store/8n8cvy1kbqzs13i5b8vnl55q50bjvr4h-nixos-system-las-19.03pre166308.626233eee6e/bin/switch-to-configuration'. Reason:
exec: Exec format error
The file '/nix/store/8n8cvy1kbqzs13i5b8vnl55q50bjvr4h-nixos-system-las-19.03pre166308.626233eee6e/bin/switch-to-configuration' is marked as an executable but could not be run by the operating system.

@dorianr666

dorianr666 commented Feb 2, 2019

I confirm this bug as well, running linux 5.0.0-rc4.

@dorianr666

dorianr666 commented Feb 2, 2019

So I noticed that the switch-to-configuration script has a very long hashbang line and did some testing. I found out that kernels 5.0.0-rc1 through 5.0.0-rc4 (the newest at this moment) fail to execute a script if its hashbang line is longer than exactly 128 bytes (newline included).

Reproduction:
Run any kernel in range 5.0.0-rc1 - 5.0.0-rc4.
Create file script containing:
#!/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa (newline included)
Execute:
$ chmod a+x script
$ ./script
You'll get:

Failed to execute process './script'. Reason:
exec: Exec format error
The file './script' is marked as an executable but could not be run by the operating system.

Now remove one a from the hashbang line and run it again; you'll get the expected:

Failed to execute process './script'. Reason:
The file './script' specified the interpreter '/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', which is not an executable command.

Running both variants of the script on kernel 4.20.5 throws only the second, expected, error. So this looks like a userspace-breaking kernel regression, since it affects essential NixOS scripts. Not sure which kernel dev to contact, so I'll leave that up to someone else.

PS: I'm using fish shell, error messages in bash may differ but the problem remains.
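
The 128-byte observation above can be checked against arbitrary files with a short script. What follows is a minimal Python sketch, not something posted in the thread: the helper names (`shebang_line_length`, `has_oversized_shebang`, `make_script`) are hypothetical, and the rule "fails when the line, newline included, is longer than 128 bytes" is taken at face value from the report above.

```python
import os
import tempfile

# Assumption: 128 is the limit reported in this thread (newline included).
BINPRM_BUF_SIZE = 128


def shebang_line_length(path):
    """Return the byte length of the first line (newline included),
    or None if the file does not start with '#!'."""
    with open(path, "rb") as f:
        # Bound the read so a huge file without newlines is not slurped whole.
        first_line = f.readline(4096)
    if not first_line.startswith(b"#!"):
        return None
    return len(first_line)


def has_oversized_shebang(path, limit=BINPRM_BUF_SIZE):
    """True if the shebang line (newline included) is longer than `limit` bytes."""
    length = shebang_line_length(path)
    return length is not None and length > limit


def make_script(directory, name, interpreter_length):
    """Write a file whose shebang is '#!/' followed by `interpreter_length` 'a's,
    mirroring the reproduction above."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(b"#!/" + b"a" * interpreter_length + b"\n")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for n in (124, 125):
            p = make_script(d, f"script{n}", n)
            print(n, shebang_line_length(p), has_oversized_shebang(p))
```

Pointing `has_oversized_shebang` at files such as `bin/switch-to-configuration` in a system closure would show whether they fall on the wrong side of the reported limit; it does not execute anything, so it is safe to run on any kernel.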

@dtzWill
Member

dtzWill commented Feb 2, 2019

Looks like they're enforcing the limit that was supposed to already be applied?

pypa/pip#1773 (comment)

And indeed looking at our headers I'm seeing:

/* sizeof(linux_binprm->buf) */
#define BINPRM_BUF_SIZE 128

Scrolling to the end it looks like this is getting attention elsewhere recently as well,
likely others encountering same?

I think the commit linked previously is indeed responsible-- or certainly rather relevant (see the commit message!).

Not sure where this leaves us.... :/

@samueldr samueldr added the 9.needs: upstream fix This PR needs upstream to change something label Feb 2, 2019
@samueldr
Member

samueldr commented Feb 2, 2019

Has anyone contacted upstream yet? Is there a report (maybe from another project) that is relevant to follow?

Fixing it in the nixos kernels is probably not enough; kernel 5.x from other distros may still have issues with Nixpkgs stuff.

@L-as
Member

L-as commented Feb 2, 2019

@dtzWill: it doesn't matter that it was a bug; kernel policy is that nothing breaks userspace. @samueldr I'll try contacting upstream.

@L-as
Member

L-as commented Feb 2, 2019

Well I made a bug report on bugzilla, hopefully I CC'ed the right guy.

@aszlig
Member

aszlig commented Feb 2, 2019

I've bisected this using this VM test and the commit that causes the regression is torvalds/linux@8099b04.

@aszlig
Member

aszlig commented Feb 2, 2019

I also think this could break more than just our use case, especially because Perl seems to expect that the hashbang is truncated by BINPRM_BUF_SIZE. It parses the hashbang and execve()s itself, so that multiple arguments are possible and the length limitation doesn't affect it, so I guess it's expected behavior even on other Unices.

So even if our bug report doesn't get noticed, I'd suspect that this change will get reverted eventually, hopefully before 5.0 gets its final release.
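
The difference between truncating and rejecting an over-long hashbang can be modelled roughly as follows. This is a simplified Python sketch for illustration only, not the kernel's C code: the function names are hypothetical, the cut at `BINPRM_BUF_SIZE - 1` bytes in the truncating model is an assumption about the older behaviour, and raising `ENOEXEC` in the strict model mirrors the "Exec format error" reported earlier in this thread.

```python
import errno
import os
import re

BINPRM_BUF_SIZE = 128  # value quoted from the kernel headers in this thread


def split_interpreter(line):
    """Split the text after '#!' into (interpreter, single optional argument)."""
    line = line.strip(b" \t")
    if not line:
        return None
    parts = re.split(rb"[ \t]+", line, maxsplit=1)
    interpreter = parts[0]
    argument = parts[1] if len(parts) > 1 else None
    return interpreter, argument


def parse_truncating(head, buf_size=BINPRM_BUF_SIZE):
    """Model of the older behaviour: look at buf_size - 1 bytes and silently
    cut the line there if no newline was seen (assumption, see lead-in)."""
    if not head.startswith(b"#!"):
        return None
    buf = head[:buf_size - 1]
    line = buf[2:].split(b"\n", 1)[0]
    return split_interpreter(line)


def parse_strict(head, buf_size=BINPRM_BUF_SIZE):
    """Model of the stricter behaviour: refuse the script when the newline
    is not found inside the first buf_size bytes."""
    if not head.startswith(b"#!"):
        return None
    buf = head[:buf_size]
    if b"\n" not in buf and len(buf) == buf_size:
        raise OSError(errno.ENOEXEC, os.strerror(errno.ENOEXEC))
    line = buf[2:].split(b"\n", 1)[0]
    return split_interpreter(line)
```

In the truncating model, a long Perl hashbang still yields the right interpreter plus a cut-off argument string, which Perl can ignore because it re-reads the full first line from the file itself, as described above. In the strict model the same script never reaches Perl at all.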

@dtzWill
Member

dtzWill commented Feb 13, 2019 via email

@samueldr
Member

This might be affecting the current LTS (4.19.21) in addition to 4.20.8 (as reported by @dtzWill) and 5.0-rc*.

Assuming the commit tracked down by @aszlig is the right one:

Found also in

@grahamc
Member

grahamc commented Feb 13, 2019

Based on the documentation at https://www.kernel.org/doc/linux/REPORTING-BUGS and some extra courage from @mmlb, we've decided @samueldr will write a message to LKML and Linus. I'll send Oleg (author of that patch) an email now, though.

@samueldr
Member

Mailing list thread:

@NeQuissimus
Member

@dtzWill Could we have a PR with the patch? Even if only as a temporary mitigation?

@L-as
Member

L-as commented Feb 13, 2019

@samueldr I think you may want to CC Andrew Morton [email protected], since he signed off on the commit. I'd like to CC him myself, but then I'd have to figure out how to send e-mails to the mailing list first.

@L-as
Member

L-as commented Feb 13, 2019

Actually, one can just send him an e-mail regarding the matter now that I think about it.
Edit: Well I sent him an e-mail.

dtzWill added a commit to dtzWill/nixpkgs that referenced this issue Feb 13, 2019
Doesn't fix the problem but fixes my/our machines for now.

I'm running 4.20.8 with this presently,
other kernels "should" patch okay but I haven't checked.

And I wasn't sure if hardening wanted this or not? Dunno.

NixOS#53672 (comment)
@grahamc
Member

grahamc commented Feb 14, 2019 via email

@NeQuissimus
Member

NeQuissimus commented Feb 14, 2019

Only the small channels have seen a bump (just looking at http://howoldis.herokuapp.com/)

@grahamc
Member

grahamc commented Feb 14, 2019

The nixos-unstable-small and nixos-18.09-small channel URLs are being rolled back.

@NeQuissimus
Member

I can now confirm that #55763 fixes the immediate issue. Built 4.20.8-hardened and booted into it. nixos-rebuild switch --upgrade -I nixpkgs=... still runs fine.

@NeQuissimus
Member

Does anybody feel strongly about which fix we apply as long as we have something for the moment?

@grahamc
Member

grahamc commented Feb 14, 2019

I think we should do this multi-pronged:

  1. fix our perl shebangs to not be massive: not everybody runs nix on nixos
  2. apply what looks like the most likely upstream patch

@L-as
Member

L-as commented Feb 14, 2019

I think the safest choice should be taken, although a proper fix, i.e. this patch is preferable in the end.

@L-as
Member

L-as commented Feb 14, 2019

good point @grahamc, but how would you make the shebangs smaller?

@grahamc
Member

grahamc commented Feb 14, 2019

Hydra already does this with buildEnv. Nice perk of this is we get a much faster perl startup time: https://github.com/NixOS/hydra/blob/master/release.nix#L47
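
To get a feel for why these shebangs grow so large and why moving the module paths out of them helps, here is a rough back-of-the-envelope sketch in Python. It rests on assumptions the thread does not spell out: that the long shebangs come from one `-I` flag per Perl dependency, and that an environment-based setup lets the shebang carry only the interpreter path. Every store path, hash and module name below is made up.

```python
# Every store path here is made up for illustration; real hashes and
# package names will differ.
FAKE_HASH = "0" * 32
BINPRM_BUF_SIZE = 128  # value quoted from the kernel headers in this thread


def store_path(name):
    """Build a fake /nix/store path with a 32-character placeholder hash."""
    return f"/nix/store/{FAKE_HASH}-{name}"


def perl_shebang(include_dirs):
    """Assemble a shebang line with one -I flag per include directory."""
    flags = "".join(f" -I{d}" for d in include_dirs)
    return f"#!{store_path('perl')}/bin/perl{flags}\n"


def shebang_bytes(include_dirs):
    """Length of the shebang line in bytes, newline included."""
    return len(perl_shebang(include_dirs).encode("utf-8"))


# Hypothetical dependencies, each living in its own store path.
modules = ["perl-module-a", "perl-module-b", "perl-module-c"]
per_module_dirs = [f"{store_path(m)}/lib/perl5/site_perl" for m in modules]

with_flags = shebang_bytes(per_module_dirs)
without_flags = shebang_bytes([])

if __name__ == "__main__":
    print("one -I per module:", with_flags, "bytes")
    print("interpreter only: ", without_flags, "bytes")
    print("limit:            ", BINPRM_BUF_SIZE, "bytes")
```

Because each store path alone takes more than 40 bytes, even a couple of `-I` flags push the line past the limit, while an interpreter-only shebang stays well under it regardless of how many modules the script needs.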

@NeQuissimus
Member

Somebody make an executive decision or I merge #55763 :D
I don't think we should just sit on this for too long...

@grahamc
Member

grahamc commented Feb 14, 2019

I'd like for at least one other person to take a look at #55763 -- kernel upstream or otherwise, but it is my preferred patch.

@grahamc
Member

grahamc commented Feb 15, 2019

Should we leave this one open and pinned for a bit, until channels advance?

@grahamc
Member

grahamc commented Feb 15, 2019

FYI: Our revert patch can be deleted, as upstream has released new, fixed kernels:

:)

@NeQuissimus
Member

I am on it!

@samueldr
Member

samueldr commented Feb 15, 2019

Should we leave this one open and pinned for a bit, until channels advance?

I didn't know for sure, and when I saw the title of the issue, once pinned, I got cold feet about rewriting the title and editing the main post to make it more useful for external users. Though, I'm thinking it could be a good idea if done. It doesn't necessarily need to be kept open, though, only visible?

@NeQuissimus
Member

master: 50f518c 7954ec0 8c14948 8c14948

release-18.09: f0ce0f3 6e90746 6da8b83 0ba800c

@samueldr samueldr removed the 9.needs: upstream fix This PR needs upstream to change something label Feb 15, 2019
@vcunat
Member

vcunat commented Feb 23, 2019

Let me link the upstream summary of this: https://lwn.net/SubscriberLink/779997/11de2bdc8dbc0d69/ (taken from weekly.nixos.org)

@domenkozar
Member

I've seen this also with NixOS 20.09 and kernel 5.4.77.

@samueldr
Member

samueldr commented Jan 4, 2021

Many issues could end up causing similar behaviour. It might not be that the kernel again decided to stop interpreting the shebang in its entirety.

I guess you'll need to have a minimal repro, so we can see what's going on. Shouldn't it fail on Hydra too, if it is a reproducible issue? I don't remember if the issue eluded our tests at the time. Though it's likely to be a new issue, rather than linked to a now years-old issue. Or else we'll have to have strong words with the kernel, again. Do check with the most up-to-date kernel, though: I'm running 5.4 past .77 and I haven't had this issue.

@domenkozar
Member

Repro is to deploy a new Hetzner machine with NixOS 20.09.

I have only helped a friend get things going, so I didn't spend much time investigating what's going on this time.
