usecase: memory-bound design using fixed-size allocators #518

Closed
thejoshwolfe opened this issue Oct 2, 2017 · 6 comments

@thejoshwolfe
Contributor

I'm currently trying to write a Zig program that does some file IO (the program doesn't currently compile, for reasons unrelated to this discussion). As an experiment, I'm trying to avoid mmap allocation, instead favoring allocators that wrap static/stack buffers. I'm running into some hassle with this approach that seems like a deficiency in the os API.

Below I'm discussing the hassle, but an important assumption that we may want to discuss here is: Is my approach here even supposed to be supported? Is the Zig standard library appropriate for running memory-bound applications? Does it even make sense to try to write memory-bound applications outside of an embedded OS, or something similar?

Just to clarify, I'm trying to write a program that only uses static memory (global variables) and the stack (local variables), and the worst-case stack size is a compile-time constant. There is no "heap" in my design. I use std.mem.Allocator objects, but they just return slices of the above two types of memory, static and stack memory.
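To make the design concrete, here is a minimal sketch in C of the kind of allocator described above: it only hands out slices of one static array and fails once that budget is used up. The names `FixedAllocator`, `fixed_alloc`, `fixed_reset`, and `g_pool`, and the 4096-byte pool size, are assumptions made for illustration; this is not the `std.mem.Allocator` interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-buffer allocator: hands out slices of one caller-owned array. */
typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} FixedAllocator;

/* The entire memory budget, known at compile time (size is an assumption). */
static unsigned char g_pool[4096];

static void fixed_init(FixedAllocator *a, unsigned char *buf, size_t cap) {
    a->base = buf;
    a->capacity = cap;
    a->used = 0;
}

/* Returns NULL when the budget is exhausted; never calls malloc or mmap. */
static void *fixed_alloc(FixedAllocator *a, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    uintptr_t start = (uintptr_t)a->base + a->used;
    size_t pad = (size_t)(-start & (align - 1));      /* bytes needed to align */
    size_t remaining = a->capacity - a->used;
    if (pad > remaining || size > remaining - pad) return NULL;
    void *p = a->base + a->used + pad;
    a->used += pad + size;
    return p;
}

/* Releases everything at once; individual frees are not supported. */
static void fixed_reset(FixedAllocator *a) {
    a->used = 0;
}
```

Because the only failure mode is running out of the fixed pool, any library routine that silently asks for a page-sized buffer shows up immediately as an allocation failure.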

Posix paths are 0-terminated

A frequent source of pain is the 0-terminated string expectation of the posix API. In order to delete a file, for example, you might have to copy the path to add a 0-byte to terminate the string. (This implementation of std.os.deleteFile actually always copies the path.) Sometimes, there's a magic path length called max_noalloc_path_len that determines whether the copy happens into a stack buffer, or otherwise the passed-in allocator is used to allocate a buffer long enough to put the path and a 0-terminator.

We need some way to avoid this allocation, especially when there's already a 0-terminator in the string, such as in my program, which uses the Buffer class.
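The two cases discussed in this section can be sketched in C as follows. One entry point takes a (pointer, length) slice and copies it into a stack buffer to add the terminator, failing instead of allocating when the path is too long; the other takes a path that is already 0-terminated and does no copy at all. The function names and the value 1024 for `MAX_NOALLOC_PATH_LEN` are assumptions for illustration, not the actual `std.os` API or its constant.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Assumed limit; the real max_noalloc_path_len value may differ. */
enum { MAX_NOALLOC_PATH_LEN = 1024 };

/* Deletes the file named by path[0..len). The slice need not be 0-terminated.
 * Returns 0 on success, -1 with errno set on failure. Never allocates. */
static int delete_file_slice(const char *path, size_t len) {
    char buf[MAX_NOALLOC_PATH_LEN + 1];
    assert(path != NULL);
    if (len > MAX_NOALLOC_PATH_LEN) {
        errno = ENAMETOOLONG; /* report the real problem instead of "no memory" */
        return -1;
    }
    memcpy(buf, path, len);
    buf[len] = '\0';
    return unlink(buf);
}

/* For callers that already hold a 0-terminated path: no copy, no buffer. */
static int delete_file_z(const char *path_z) {
    assert(path_z != NULL);
    return unlink(path_z);
}
```

A caller that keeps its paths in a buffer type that always maintains a trailing 0 byte would use the second form and pay nothing for the terminator.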

std.os.Dir uses an allocator

It seems unnecessary for the Dir class to use an allocator. First, an allocator is used to open the dir, which is a case of the above problem about adding a 0-terminator. Then the same allocator is used to allocate a page-sized buffer to read the directory contents into. This way of using an allocator is hostile to a memory-bound design.

I worked around this multi-use of an allocator by resetting the allocator (effectively equivalent to swapping out the .allocator field of the Dir) after calling open, before calling next for the first time. This is the kind of trick that is not allowed by the class's API without explicit permission, which means it's a hack.

The next issue is that the buffer size that the Dir object uses is not configurable. So if I only wanted to use 0x400 bytes for this buffer, I'd get a NoMem error for not supporting a buffer of size 0x1000. This buffer size is not documented in the API, so I had to read the source to find out how much memory I needed to give my fixed-size allocator before the Dir tried to use it. Then, I might still get a NoMem, but then it means that a directory entry didn't fit into the buffer, which is actually more of a NameTooLong error than a NoMem error.

All of this nonsense could be avoided by the Dir object accepting a fixed-size memory buffer to use, instead of using a dynamically growing buffer from an allocator. A fixed-size buffer would solve my usecase, but would be suboptimal for usecases that allow mmap-style allocators that expand as needed. We need the ability to expand on demand, but is it possible to have this feature without being hostile to a memory-bound usecase like I have?
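As a rough illustration of the fixed-size-buffer variant, here is a Linux-only sketch in C of a directory iterator that reads entries into memory supplied by the caller, using the `getdents64` system call directly. The names `DirIter`, `dir_open`, `dir_next`, and `dir_close` are hypothetical and this is not how `std.os.Dir` is implemented. The sketch assumes that `EINVAL` from `getdents64` means the buffer is too small for the next entry, and reports it as `ENAMETOOLONG`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Layout of one record written by the Linux getdents64 system call. */
struct dirent64_rec {
    uint64_t d_ino;
    int64_t d_off;
    unsigned short d_reclen;
    unsigned char d_type;
    char d_name[];
};

/* Hypothetical iterator: all storage is owned by the caller. */
typedef struct {
    int fd;
    char *buf;
    size_t cap;
    size_t pos, end;
} DirIter;

static int dir_open(DirIter *it, const char *path_z, char *buf, size_t cap) {
    assert(buf != NULL && cap > 0);
    it->fd = open(path_z, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    it->buf = buf;
    it->cap = cap;
    it->pos = it->end = 0;
    return it->fd < 0 ? -1 : 0;
}

/* Returns 1 and sets *name, 0 at end of directory, -1 on error (errno set). */
static int dir_next(DirIter *it, const char **name) {
    if (it->pos >= it->end) {
        long n = syscall(SYS_getdents64, it->fd, it->buf, it->cap);
        if (n < 0) {
            /* Assumption: EINVAL here means the next entry does not fit. */
            if (errno == EINVAL) errno = ENAMETOOLONG;
            return -1;
        }
        if (n == 0) return 0;
        it->pos = 0;
        it->end = (size_t)n;
    }
    char *rec = it->buf + it->pos;
    unsigned short reclen;
    /* memcpy avoids assuming the caller's buffer is suitably aligned. */
    memcpy(&reclen, rec + offsetof(struct dirent64_rec, d_reclen), sizeof reclen);
    *name = rec + offsetof(struct dirent64_rec, d_name);
    it->pos += reclen;
    return 1;
}

static void dir_close(DirIter *it) {
    close(it->fd);
}
```

With this shape the caller chooses the buffer size (0x400 bytes, for example), and a version that grows on demand could be layered on top by retrying with a larger buffer from an allocator.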

Is Posix/Windows/etc even appropriate for memory-bound applications?

One big reason to write a memory-bound application is if you're trying to write deterministic embedded software. But if you're targeting Linux, Windows, etc., you can't get determinism, so why even try to write a memory-bound application? Perhaps there are parts of the Zig standard library that are appropriate for memory-bound design, and maybe the os API is not one of them.

I've been saying "memory-bound" as though I can make some compile-time guarantee about the maximum amount of memory my program will require. But maybe I actually don't have as much control as I think over what kind of system resources Linux is allocating on my behalf. Without any real measurements, I don't even know if the concept of "memory-bound" design has any meaning.

@thejoshwolfe thejoshwolfe added the question No questions on the issue tracker, please. label Oct 2, 2017
@PavelVozenilek

Windows: depending on exactly what you want, you can make a program quite deterministic:

  1. Memory can be reliably preallocated by VirtualAlloc( MEM_COMMIT | MEM_RESERVE ), even at specific location.
  2. Executable could be loaded at specific address.
  3. Third-party allocators support using an existing buffer as the heap, with a hard maximum limit. Doug Lea's malloc, for example.

@andrewrk
Member

andrewrk commented Oct 2, 2017

Should we re-open #265?

@andrewrk
Member

andrewrk commented Oct 2, 2017

The max_noalloc_path_len pattern is good I think and should be used more. Feel free to bend the standard library to your will (as long as it's an API improvement) and make a PR with your changes.

@andrewrk
Member

andrewrk commented Oct 2, 2017

Is my approach here even supposed to be supported? Is the Zig standard library appropriate for running memory-bound applications? Does it even make sense to try to write memory-bound applications outside of an embedded OS, or something similar?

Yes to all this. If you don't call mmap on linux and you have a static binary, the OS does not allocate any memory on your behalf.

@kyle-github

I made a bit of a comment on this topic in another issue, #157, that may relate.

Perhaps there could be a way (using comptime tests?) of having the standard library switch its internal performance vs. size tradeoff? Not only should you avoid allocation or use a constrained allocator, but you should also trigger things like using packed structs and using different algorithms.

For instance when getting DNS responses to a hostname, the standard C library builds a linked list in memory. Instead, the Zig standard library could keep the original UDP response and unpack it on demand when it was optimized for size.

@andrewrk andrewrk added this to the 0.2.0 milestone Oct 2, 2017
@andrewrk andrewrk modified the milestones: 0.2.0, 0.3.0 Oct 19, 2017
@andrewrk andrewrk modified the milestones: 0.3.0, 0.4.0 Feb 28, 2018
@andrewrk andrewrk closed this as completed Feb 5, 2019
@andrewrk
Member

andrewrk commented Feb 5, 2019

I consider this a valid use case of Zig, and if you run into trouble trying to do it, that represents an API issue with the standard library or whatever third party library you're trying to use.
