Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use capi for syscalls that break under musl's handling of 64-bit time_t #252

Merged
merged 3 commits into from
Dec 5, 2022

Conversation

redneb
Copy link
Contributor

@redneb redneb commented Oct 2, 2022

I maintain a repo with builds of GHC for the musl C standard library and I encountered a subtle bug of this library that affects GHC under 32-bit architectures with musl.

In 32-bit architectures, there was a transition where the C type time_t was redefined to be 64-bit in order to address the Y2038 problem. The way this was handled by musl was by introducing new versions of all affected syscalls that work with 64-bit time_t (e.g. utimensat_time64 is like utimensat but with 64-bit time_t). In addition, a redirect was introduced to create an alias of the new versions of these functions under the old name (e.g. here's the redirection for utimensat).

The problem is that this redirection is defined as a C macro in a C header file and as such, it does not get picked by GHC when the foreign function import is done via ccall, but works just fine when capi is used. So this PR changes all affected syscalls to be imported with capi, which should be a fairly uncontroversial change.

@Bodigrim
Copy link
Contributor

Bodigrim commented Oct 3, 2022

@TerrorJack could you please advise on wasm32 job? I remember there were some changes wrt dlfcn.h.

@redneb
Copy link
Contributor Author

redneb commented Oct 10, 2022

It appears that in wasm32-wasi there is no dlfcn.h, but the following line does compile successfully:

foreign import ccall unsafe "dlopen" c_dlopen :: CString -> CInt -> IO (Ptr ())

which is what we currently use in this package. So I think the solution here is to add an #ifdef in order to use ccall for wasm32-wasi (since ccall doesn't require a header file to be provided) and use capi elsewhere.

I pushed a commit that does that and the CI tests are passing everywhere now.

Copy link

@vdukhovni vdukhovni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, modulo perhaps the making dlopen be safe???

@@ -84,10 +85,17 @@ data RTLDFlags
| RTLD_LOCAL
deriving (Show, Read)

#if defined(HAVE_DLFCN_H)
foreign import capi unsafe "dlfcn.h dlopen" c_dlopen :: CString -> CInt -> IO (Ptr ())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dlopen call can be comparatively expensive. One can reasonably argue that this should be safe without a significant performance penalty. Once the object is loaded, the other calls should be quick.

Of course the same would then apply to the ccall version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that dlopen should be safe and I would be happy to push another commit to do that. But I think this is part of a wider problem with this package (see #34) and I don't think we should tackle the problem in a piecemeal fashion. So I would prefer to not do this as part of this PR.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that dlopen should be safe and I would be happy to push another commit to do that. But I think this is part of a wider problem with this package (see #34) and I don't think we should tackle the problem in a piecemeal fashion. So I would prefer to not do this as part of this PR.

I can accept that I guess, though I tend to take a more opportunistic approach. Big bang fixes can be difficult, but spot improvements while looking at adjacent code are sometimes a good way to make progress. I'll leave the judgement call to you and others.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "this is bad elsewhere too" isn't a good argument against fixing.

Rather fix it locally here and open a new ticket to fix the rest as well, unless this is controversial.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vdukhovni @hasufell I pushed added an additional commit that switches the imports of dlopen & dlclose to safe. The reason that dlopen needs safe is obvious; it interacts with the local file system. But dlclose also has to be safe because it might trigger the execution of destructors defined in the library.

@hasufell
Copy link
Member

How do we know this is the only issue with header redirection? This frequently happens on darwin system too. I'd be fine with making all foreign imports use capi. Any opinions, @vdukhovni ?

@vdukhovni
Copy link

How do we know this is the only issue with header redirection? This frequently happens on darwin system too. I'd be fine with making all foreign imports use capi. Any opinions, @vdukhovni ?

My take is that whenever possible we should use capi.

@redneb
Copy link
Contributor Author

redneb commented Nov 13, 2022

Here's how I came up with the changes included in the PR: I found the commit in the musl repo where the redirect was introduced and examined all changes there. I also searched for the _REDIR_TIME64 CPP symbol in all C headers. I compared all that against all of GHC's source tree (including the libraries directory) and found all the affected places. There were all in 3 libraries: directory, time, and here (the other PRs have been merged already).

I'd be fine with making all foreign imports use capi. Any opinions, @vdukhovni ?

Agreed, capi should be the default policy for this repo.

@hasufell
Copy link
Member

Agreed, capi should be the default policy for this repo.

So then we can just remove the CPP ifdef and keep capi?

@redneb
Copy link
Contributor Author

redneb commented Nov 13, 2022

So then we can just remove the CPP ifdef and keep capi?

Unfortunately not. As I explained in a previous post here, it's need for wasm32-wasi.

@liskin
Copy link
Contributor

liskin commented Apr 27, 2024

I'm wondering whether (and if so, why?) this fix is enough—I believe that the instance Storable CTimeVal implementation will still peek/poke both members (tv_sec and tv_usec) as 32-bit values, as they're both declared as CLong. This might silently work if the memory happens to be zeroed (and xmonad/X11#100 (comment) suggests it often is in practice) but one thing is certain—CLong is not Y2038-safe on 32-bit platforms.

Can someone please check if my thinking is correct? Thanks!

@liskin
Copy link
Contributor

liskin commented Apr 28, 2024

Can someone please check if my thinking is correct? Thanks!

I asked the Debian folks as well and it appears that CTimeVal isn't really being used in practice (detailed explanation). Might not apply to other platforms but Linux/Debian should be all right then.

@hasufell
Copy link
Member

@liskin

From the debian discussion

My understanding is that we are good with this patch in Debian, but this should definitely be fixed in the unix library if they want to support other systems / OSes.

Can you submit a PR?

@liskin
Copy link
Contributor

liskin commented Apr 30, 2024

From the debian discussion

My understanding is that we are good with this patch in Debian, but this should definitely be fixed in the unix library if they want to support other systems / OSes.

Can you submit a PR?

Suppose I can try but right now I have absolutely no idea what platform to use which would be affected. Turns out Debian i386 isn't adopting 64-bit time_t, only armhf, armel and a few others. And even there, the code in question isn't being used (as we established earlier). Suppose I could just fix CTimeVal in isolation and test in a Debian armhf VM… but it's not exactly the most important thing I could be doing today :-D

@hasufell
Copy link
Member

@liskin
Copy link
Contributor

liskin commented Apr 30, 2024

We have armv7 CI: https://github.com/haskell/unix/actions/runs/8825026384/job/24228587469

Yeah, but it'll take months/years before Debian's t64 stuff trickles down to Ubuntu, and even then CTimeVal won't be used at all because of some #ifdefs. So that CI won't help me, and also that CI won't even start failing in a couple months time (possibly never) because the issue doesn't affect glibc/musl at all because utimensat is being used which takes CTimeSpec instead.

@hasufell
Copy link
Member

Even in absence of a reproducer, I am assuming there is a theoretical fix. What is it? Can you provide a PR for it?

liskin added a commit to liskin/unix that referenced this pull request Apr 30, 2024
One such platform is Debian unstable armhf, which is in the process of
transitioning to 64-bit time_t: https://wiki.debian.org/ReleaseGoals/64bit-time

On that platform (as well as any other glibc/musl platform), however,
CTimeVal isn't used for anything at all because there are #ifdefs that
prefer using `utimensat` which takes CTimeSpec instead. This fix is
therefore quite theoretical, as it is unknown whether there are any
platforms actually affected.

Related: haskell#252
@liskin
Copy link
Contributor

liskin commented Apr 30, 2024

Even in absence of a reproducer, I am assuming there is a theoretical fix. What is it? Can you provide a PR for it?

Yes: #318
I mean, I don't really know what I'm doing (as in, not as much as I'd like to) but the CI won't run in my fork (arm needs custom runners and MacOS uses a different major version ans fails) so PR it is.

liskin added a commit to liskin/unix that referenced this pull request May 2, 2024
One such platform is Debian unstable armhf, which is in the process of
transitioning to 64-bit time_t: https://wiki.debian.org/ReleaseGoals/64bit-time

On that platform (as well as any other glibc/musl platform), however,
CTimeVal isn't used for anything at all because there are #ifdefs that
prefer using `utimensat` which takes CTimeSpec instead. This fix is
therefore quite theoretical, as it is unknown whether there are any
platforms actually affected.

Related: haskell#252
Bodigrim pushed a commit that referenced this pull request Jun 17, 2024
One such platform is Debian unstable armhf, which is in the process of
transitioning to 64-bit time_t: https://wiki.debian.org/ReleaseGoals/64bit-time

On that platform (as well as any other glibc/musl platform), however,
CTimeVal isn't used for anything at all because there are #ifdefs that
prefer using `utimensat` which takes CTimeSpec instead. This fix is
therefore quite theoretical, as it is unknown whether there are any
platforms actually affected.

Related: #252
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants