std::net::TcpStream::connect() and .to_socket_addrs() segfault when address is "localhost:8080", build is static and /etc/hosts is empty on arch linux. #100711

Open
alkeryn opened this issue Aug 18, 2022 · 11 comments
Labels
A-linkage Area: linking into static, shared libraries and binaries C-bug Category: This is a bug. O-linux-gnu Operating system: Linux with glibc (i.e. a bug that can't happen with musl)

Comments

@alkeryn

alkeryn commented Aug 18, 2022

Both of these functions segfault when trying to resolve localhost on any port if the following conditions are met:

  • the build is static (compiled with rustc -C target-feature=+crt-static main.rs)
  • the address used is "localhost:<port>" (any port)
  • the OS is the latest Arch Linux (it may work on other distros/OSes)
  • the file /etc/hosts is empty

I tried this code:

use std::net::ToSocketAddrs;

pub fn main() {
    println!("before");
    let _ = "localhost:8080".to_socket_addrs(); // will segfault
    std::net::TcpStream::connect("localhost:8080").unwrap(); // will also segfault
    println!("hello world");
}

I expected to see this happen: the address is resolved

Instead, this happened: the program segfaults

rustc --version --verbose:

rustc 1.65.0-nightly (9c20b2a8c 2022-08-17)
binary: rustc
commit-hash: 9c20b2a8cc7588decb6de25ac6a7912dcef24d65
commit-date: 2022-08-17
host: x86_64-unknown-linux-gnu
release: 1.65.0-nightly
LLVM version: 15.0.0

uname -a: (This is the latest Arch Linux. I could not reproduce the bug on another distro, but it still shouldn't segfault.)

Linux Alkeryn-PC 5.19.1-arch2-1 #1 SMP PREEMPT_DYNAMIC Thu, 11 Aug 2022 16:06:13 +0000 x86_64 GNU/Linux

For the backtrace, RUST_BACKTRACE=1 did not work and gave the following output:

RUST_BACKTRACE=1 ./main
before
zsh: segmentation fault (core dumped)  RUST_BACKTRACE=1 ./main

So here is a backtrace made with gdb (don't mind the gef plugin being installed):

Backtrace

[ Legend: Modified register | Code | Heap | Stack | String ]
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── registers ────
$rax   : 0x0               
$rbx   : 0x3               
$rcx   : 0x007ffff7cf838e  →  0x310b77fffff0003d ("="?)
$rdx   : 0x007ffff7ff8490  →  <_dl_static_dtv+16> add BYTE PTR [rax], al
$rsp   : 0x007fffffffcaf0  →  "/proc/sys/net/ipv6/conf/all/disable_ipv6"
$rbp   : 0x007fffffffcc20  →  0x007fffffffcce0  →  0x0000000000000010
$rsi   : 0x007ffff7d99dd5  →  0x6225206125000200
$rdi   : 0x007ffff79c1c88  →  0x0000000000000005
$rip   : 0x007ffff79a5196  →   mov r12, QWORD PTR [rax+0x8]
$r8    : 0x0               
$r9    : 0x0               
$r10   : 0x1000            
$r11   : 0x206             
$r12   : 0x0               
$r13   : 0x007fffffffcaf0  →  "/proc/sys/net/ipv6/conf/all/disable_ipv6"
$r14   : 0x007fffffffccf0  →  0x0000000000000000
$r15   : 0x007fffffffcca0  →  0xe1efbb33d283a048
$eflags: [ZERO carry PARITY adjust sign trap INTERRUPT direction overflow RESUME virtualx86 identification]
$cs: 0x33 $ss: 0x2b $ds: 0x00 $es: 0x00 $fs: 0x00 $gs: 0x00 
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── stack ────
0x007fffffffcaf0│+0x0000: "/proc/sys/net/ipv6/conf/all/disable_ipv6"	 ← $rsp, $r13
0x007fffffffcaf8│+0x0008: "s/net/ipv6/conf/all/disable_ipv6"
0x007fffffffcb00│+0x0010: "v6/conf/all/disable_ipv6"
0x007fffffffcb08│+0x0018: "all/disable_ipv6"
0x007fffffffcb10│+0x0020: "ble_ipv6"
0x007fffffffcb18│+0x0028: 0xffffffffffffff00
0x007fffffffcb20│+0x0030: 0x0000000000000000
0x007fffffffcb28│+0x0038: 0x007ffff79a4e33  →   lea rdx, [rax+0xb]
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
   0x7ffff79a5187                  je     0x7ffff79a51d0
   0x7ffff79a5189                  lea    rdi, [rip+0x1caf8]        # 0x7ffff79c1c88
   0x7ffff79a5190                  call   QWORD PTR [rip+0x1cc82]        # 0x7ffff79c1e18
 → 0x7ffff79a5196                  mov    r12, QWORD PTR [rax+0x8]
   0x7ffff79a519d                  mov    r13, rax
   0x7ffff79a51a0                  test   r12, r12
   0x7ffff79a51a3                  je     0x7ffff79a51e5
   0x7ffff79a51a5                  sub    r12, 0x1
   0x7ffff79a51a9                  mov    eax, 0x3ffffe
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── threads ────
[#0] Id 1, Name: "main", stopped 0x7ffff79a5196 in ?? (), reason: SIGSEGV
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── trace ────
[#0] 0x7ffff79a5196 → mov r12, QWORD PTR [rax+0x8]
[#1] 0x7ffff79ad6b1 → jmp 0x7ffff79ad518
[#2] 0x7ffff79a1045 → mov rbx, QWORD PTR [rsp]
[#3] 0x7ffff79aa1a6 → _nss_myhostname_gethostbyname4_r()
[#4] 0x7ffff7f248ae → getaddrinfo()
[#5] 0x7ffff7ed5cf6 → std::sys_common::net::{impl#6}::try_from()
[#6] 0x7ffff7ece64c → core::convert::{impl#6}::try_into<(&str, u16), std::sys_common::net::LookupHost>()
[#7] 0x7ffff7ece64c → std::sys_common::net::{impl#5}::try_from()
[#8] 0x7ffff7ece64c → core::convert::{impl#6}::try_into<&str, std::sys_common::net::LookupHost>()
[#9] 0x7ffff7ece64c → std::net::addr::{impl#30}::to_socket_addrs()
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
gef➤  bt
#0  0x00007ffff79a5196 in ?? () from /usr/lib/libnss_myhostname.so.2
#1  0x00007ffff79ad6b1 in ?? () from /usr/lib/libnss_myhostname.so.2
#2  0x00007ffff79a1045 in ?? () from /usr/lib/libnss_myhostname.so.2
#3  0x00007ffff79aa1a6 in _nss_myhostname_gethostbyname4_r () from /usr/lib/libnss_myhostname.so.2
#4  0x00007ffff7f248ae in getaddrinfo ()
#5  0x00007ffff7ed5cf6 in std::sys_common::net::{impl#6}::try_from () at library/std/src/sys_common/net.rs:205
#6  0x00007ffff7ece64c in core::convert::{impl#6}::try_into<(&str, u16), std::sys_common::net::LookupHost> () at library/core/src/convert/mod.rs:590
#7  std::sys_common::net::{impl#5}::try_from () at library/std/src/sys_common/net.rs:190
#8  core::convert::{impl#6}::try_into<&str, std::sys_common::net::LookupHost> () at library/core/src/convert/mod.rs:590
#9  std::net::addr::{impl#30}::to_socket_addrs () at library/std/src/net/addr.rs:961
#10 0x00007ffff7eb91eb in main::main ()
#11 0x00007ffff7eb9ef3 in core::ops::function::FnOnce::call_once ()
#12 0x00007ffff7eb9159 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#13 0x00007ffff7eb8fc9 in std::rt::lang_start::{{closure}} ()
#14 0x00007ffff7ecb7bf in core::ops::function::impls::{impl#2}::call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> () at library/core/src/ops/function.rs:280
#15 std::panicking::try::do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> () at library/std/src/panicking.rs:492
#16 std::panicking::try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> () at library/std/src/panicking.rs:456
#17 std::panic::catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> () at library/std/src/panic.rs:137
#18 std::rt::lang_start_internal::{closure#2} () at library/std/src/rt.rs:128
#19 std::panicking::try::do_call<std::rt::lang_start_internal::{closure_env#2}, isize> () at library/std/src/panicking.rs:492
#20 std::panicking::try<isize, std::rt::lang_start_internal::{closure_env#2}> () at library/std/src/panicking.rs:456
#21 std::panic::catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> () at library/std/src/panic.rs:137
#22 std::rt::lang_start_internal () at library/std/src/rt.rs:128
#23 0x00007ffff7eb8fb1 in std::rt::lang_start ()
#24 0x00007ffff7eb9273 in main ()
gef➤  

@alkeryn alkeryn added the C-bug Category: This is a bug. label Aug 18, 2022
@Urgau
Member

Urgau commented Aug 18, 2022

I'm unable to reproduce the issue. I tried stable, beta, nightly, with/without /etc/hosts, ...

I can also see from your gdb backtrace that the crash seems to be inside /usr/lib/libnss_myhostname.so.2, which is a systemd library. This doesn't prove that the Rust code isn't responsible for the crash, but the call passes through the glibc getaddrinfo function, which should have rejected invalid inputs.

Which versions of the systemd-libs and glibc packages do you currently have installed? Is your system fully updated?

@alkeryn
Author

alkeryn commented Aug 18, 2022

Hey, yea, i was unable to reproduce it on my debian based vps.

on the system where it does occur:

systemd-libs version is 251.4-1
glibc version is 2.36-2

yes, i updated it yesterday.
isn't it odd that it tries to use that library knowing it is a static build however ?

@alkeryn
Author

alkeryn commented Aug 18, 2022

@Urgau oh wait, i did put part of my original issue in a comment block, the condition were missing.
you need to compile with rustc -C target-feature=+crt-static main.rs
sorry, i missed that it was commented out.

@Urgau
Member

Urgau commented Aug 18, 2022

Okay, thanks for the info.

isn't it odd that it tries to use that library knowing it is a static build however ?

Well, yes but mostly no. Generally, a static build includes most or all of the libraries it would otherwise link dynamically, but some libraries aren't linked in at link time and are instead located at run time. That's the case here for domain resolution: there are many different ways it could be done, and including all of them isn't possible.
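As a rough illustration of where the run-time resolver comes into play, here is a minimal sketch. It assumes that parsing a literal IP address and port into a SocketAddr is done purely in Rust, without calling getaddrinfo, whereas a host name such as "localhost" cannot be parsed that way and needs a lookup. The helper name parse_without_lookup is hypothetical and only used for illustration.

```rust
use std::net::{IpAddr, Ipv4Addr, SocketAddr};

/// Hypothetical helper: accepts only literal "IP:port" strings.
/// Parsing is done locally, so no name-resolution library is involved.
fn parse_without_lookup(addr: &str) -> Option<SocketAddr> {
    addr.parse::<SocketAddr>().ok()
}

fn main() {
    // A literal address parses without any name resolution.
    let literal = parse_without_lookup("127.0.0.1:8080");
    let expected = SocketAddr::new(IpAddr::V4(Ipv4Addr::LOCALHOST), 8080);
    println!("literal parsed: {}", literal == Some(expected));

    // A host name is not a literal address, so resolving it would need
    // a lookup (for example through ToSocketAddrs and getaddrinfo).
    let name = parse_without_lookup("localhost:8080");
    println!("host name needs a lookup: {}", name.is_none());
}
```

This is not a fix for the crash. It only shows that a program which can use literal addresses does not have to go through the resolver at all.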

@Urgau oh wait, i did put part of my original issue in a comment block, the condition were missing.
you need to compile with rustc -C target-feature=+crt-static main.rs
sorry, i missed that it was commented out.

Thanks I was about to ask.


I'm now able to reproduce the crash and I'm almost 100% sure it's a glibc bug. Unfortunately, glibc advises against static linking, so I'm not sure reporting the crash to them will help.

I would, however, advise you to use musl, a glibc replacement that is known to work with static linking and is natively supported by the Rust compiler. Just install the target with rustup +nightly target install x86_64-unknown-linux-musl and build for it with rustc -C target-feature=+crt-static --target=x86_64-unknown-linux-musl main.rs

@alkeryn
Author

alkeryn commented Aug 18, 2022

@Urgau thanks !
i do wonder why i can't reproduce it on a debian server, but not that important.

i see, still i wouldn't have expected to segfault a rust program without using unsafe, even though it segfault from glibc, couldn't rust handle it gracefully in one way or another ?

anyway, thanks for the tips !

@Urgau
Member

Urgau commented Aug 18, 2022

i do wonder why i can't reproduce it on a debian server, but not that important.

I also tested on a Debian-based system and couldn't reproduce the crash. The problem probably comes from the recent glibc upgrade in Arch Linux. This may be a recent regression in glibc, but as I said, glibc advises against static linking, so I don't know if they will do something about it.

i see, still i wouldn't have expected to segfault a rust program without using unsafe, even though it segfault from glibc, couldn't rust handle it gracefully in one way or another ?

The segfault is not in the Rust code; it's in the systemd library, probably because glibc passed some invalid values (speculation). There is nothing the Rust runtime can do in this situation: we don't have control over glibc, systemd, or whatever else.

SIGSEGV means invalid memory access: some piece of code tried to access a place in memory that it doesn't have permission to access. This could leave some state invalid, corrupting other state and maybe even more. The only sensible thing to do in this situation is to abort.
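To make the "abort" outcome concrete, here is a minimal sketch of how a separate parent process on a Unix system can observe that a child was killed by a signal rather than exiting normally. It is only an illustration under assumptions: the child is a stand-in (a shell that sends SIGSEGV to itself, assuming sh is available), and the helpers describe_exit and run_crashing_child are hypothetical names. The crashed process itself still cannot recover.

```rust
use std::os::unix::process::ExitStatusExt;
use std::process::{Command, ExitStatus};

/// Hypothetical helper: describe how a child process ended.
fn describe_exit(status: ExitStatus) -> String {
    match (status.code(), status.signal()) {
        (Some(code), _) => format!("exited with code {}", code),
        (None, Some(sig)) => format!("killed by signal {}", sig),
        (None, None) => "ended for an unknown reason".to_string(),
    }
}

/// Stand-in for a crashing program: a shell that sends SIGSEGV to itself.
fn run_crashing_child() -> std::io::Result<ExitStatus> {
    Command::new("sh").args(["-c", "kill -s SEGV $$"]).status()
}

fn main() -> std::io::Result<()> {
    let status = run_crashing_child()?;
    // The parent keeps running and can report what happened to the child.
    println!("child {}", describe_exit(status));
    Ok(())
}
```

On Linux, SIGSEGV is signal number 11, so the parent sees a signal rather than an exit code. Isolating risky work in a child process is a design choice with its own costs, not something the Rust runtime does for you.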

@alkeryn
Author

alkeryn commented Aug 19, 2022

Well thank you for all the details ! :)
should we close the issue or report it to glibc devs ?

@pymongo
Contributor

pymongo commented Aug 22, 2022

Reproduced on Manjaro (Linux ww 5.10.136-1-MANJARO) with glibc 2.36, same backtrace

@workingjubilee workingjubilee added A-linkage Area: linking into static, shared libraries and binaries A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. labels Mar 4, 2023
andreaslongo added a commit to andreaslongo/testnc that referenced this issue Aug 22, 2023
@saethlin
Member

saethlin commented Aug 23, 2024

It's probably worth reporting this upstream.

The segfault here is a null pointer dereference on this line: https://github.com/systemd/systemd/blob/b45730389ba025489ec8d445bc91534fef515c28/src/basic/memory-util.c#L12

I suspect that the problem is that thread-locals aren't initialized. Whether that's caused by our unsupported linkage, or it's some other kind of bug in rustc or glibc/systemd is unclear. But I'm a C novice, so that's not saying much.

@saethlin saethlin added the O-linux Operating system: Linux label Aug 23, 2024
@Noratrieb Noratrieb added O-linux-gnu Operating system: Linux with glibc (i.e. a bug that can't happen with musl) and removed A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. labels Aug 23, 2024
@Noratrieb
Member

Noratrieb commented Aug 23, 2024

This is exactly why you should not link glibc statically. Your glibc dlopened the systemd library which probably depends on glibc too and thus brought in a second glibc. That is guaranteed to cause issues.

You should either stop linking glibc statically or switch to a musl target, which supports static linking (and even does so by default today).
I don't think upstream glibc would treat this as a bug.

@Noratrieb
Member

I think it would make sense to print a warning when trying to link glibc statically.
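As a rough sketch of what such a check could look like from inside a program (not how rustc would implement a warning), the snippet below uses the target_env and target_feature = "crt-static" configuration options to detect a statically linked glibc build. The helper is_static_glibc is a hypothetical name, and the warning text is made up for illustration.

```rust
/// Hypothetical helper: true when the C library is glibc and the
/// C runtime is linked statically, the combination discussed above.
fn is_static_glibc(target_env: &str, crt_static: bool) -> bool {
    target_env == "gnu" && crt_static
}

fn main() {
    // These cfg! checks are evaluated at compile time for the current target.
    let target_env = if cfg!(target_env = "gnu") {
        "gnu"
    } else if cfg!(target_env = "musl") {
        "musl"
    } else {
        "other"
    };
    let crt_static = cfg!(target_feature = "crt-static");

    if is_static_glibc(target_env, crt_static) {
        eprintln!("warning: glibc is linked statically; name resolution may load shared libraries at run time");
    } else {
        println!("target_env={}, crt-static={}", target_env, crt_static);
    }
}
```

Built with rustc -C target-feature=+crt-static on a glibc target, this would take the warning branch; on a musl target or a default dynamic build it would not.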
