add another way of passing non-copyable things as parameters #733
Makes a lot of sense. The original syntax takes a problem that should be dealt with by the optimiser (namely, that data should be passed by reference for performance reasons) and places it in the hands of the developer, someone who is, in most cases, woefully unequipped to make such optimisation decisions. The new syntax leaves it up to the optimiser what it should or shouldn't pass by reference. At the end of the day, the optimiser knows best. There's no loss of semantic expressiveness either, since a |
This is a reversal of #103. The example from there is:
var global_array: [1024]i32 = undefined;
fn dont_modify_param(param: [1024]i32) i32 {
    const y = param[0];
    global_array[0] = 0;
    const x = param[0];
    return x + y;
}
test "uh oh" {
    dont_modify_param(global_array);
}
How we fix this is: every write through a global pointer will have the safety check. So on the line if you wanted to be able to do this, you'd have to make
var global_array: [1024]i32 = undefined;
fn dont_modify_param(param: &const [1024]i32) i32 {
    const y = (*param)[0];
    global_array[0] = 0;
    const x = (*param)[0];
    return x + y;
}
test "uh oh" {
    dont_modify_param(&global_array);
}
Now this will trigger a similar safety check: panic "write invalidates alias rules". By default, writes to global variables are not allowed to modify any data directly pointed to by parameters. To fix it:
fn dont_modify_param(alias param: &const [1024]i32) i32 {
Now param can alias anything, including global variables and other parameters, so writing through a global variable or another parameter could change the bytes param points to. This is all for optimization. Gotta go fast. This is better done after some pointer reform changes I plan to make. I'm about to open that issue. |
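For comparison, a minimal sketch in current Zig syntax (assumed 0.12 or later, not the 2018 syntax used in this thread) of the behaviour the proposed alias qualifier would opt into. A plain *const pointer parameter is allowed to alias a global, so the read through param after the store to global_array sees that store. The function name readAroundWrite is hypothetical.

```zig
const std = @import("std");

var global_array: [1024]i32 = undefined;

// param is not marked noalias, so it may point at global_array; the compiler
// must therefore reload param[0] after the store to the global.
fn readAroundWrite(param: *const [1024]i32) i32 {
    const y = param[0];
    global_array[0] = 0;
    const x = param[0];
    return x + y;
}

test "pointer parameter observes the write through the global" {
    global_array[0] = 1;
    // y == 1 (read before the store), x == 0 (read after it)
    try std.testing.expect(readAroundWrite(&global_array) == 1);
}
```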
Once this is done, I'd like to change create to this:
fn create(self: &Allocator, init: var) !&@typeOf(init) {
    const T = @typeOf(init);
    if (@sizeOf(T) == 0) return &{};
    const slice = try self.alloc(T, 1);
    const ptr = &slice[0];
    *ptr = init;
    return ptr;
}
So you would use it like this:
const defer_node = try arena.create(ast.Node.Defer {
.base = ast.Node {
.id = ast.Node.Id.Defer,
.comments = comments,
},
.defer_token = token,
.kind = switch (token.id) {
Token.Id.Keyword_defer => ast.Node.Defer.Kind.Unconditional,
Token.Id.Keyword_errdefer => ast.Node.Defer.Kind.Error,
else => unreachable,
},
.expr = undefined,
});
If you wanted to leave the value undefined:
const ptr = try allocator.create(Foo(undefined));
This more closely matches variable declaration, where you explicitly have to use |
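In current Zig (assumed 0.12 or later), std.mem.Allocator.create(T) takes only a type and returns a pointer to undefined memory, which already covers the "leave it undefined" case. A helper in the spirit of the create(init) signature proposed above might look like the sketch below; createInit is a hypothetical name, not part of std, and it assumes the caller passes a value with a concrete type (an anonymous literal such as .{ .a = 1 } would produce an anonymous struct type).

```zig
const std = @import("std");

// Hypothetical helper: allocate a single item and initialize it from `init`,
// so the pointee is never observable as undefined.
fn createInit(allocator: std.mem.Allocator, init: anytype) !*@TypeOf(init) {
    const ptr = try allocator.create(@TypeOf(init));
    ptr.* = init;
    return ptr;
}

const Foo = struct {
    a: i32,
    b: bool,
};

test "createInit allocates and initializes" {
    const allocator = std.testing.allocator;
    const p = try createInit(allocator, Foo{ .a = 7, .b = true });
    defer allocator.destroy(p);
    try std.testing.expect(p.a == 7 and p.b);
}
```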
These aliasing safety checks are tricky. Here's an example of code that runtime safety should catch:
const Point = struct {
x: i32,
y: i32,
};
const Indirect = struct {
p: *Point,
};
test "alias" {
var pt = Point {
.x = 1,
.y = 2,
};
var indirect = Indirect {
.p = &pt,
};
foo(&pt.x, &indirect);
}
fn foo(a: *i32, b: *Indirect) void {
const v1 = a.*;
b.p.x += 1;
const v2 = a.*;
// aliasing rules assert that v1 == v2
// however v1 == 1 and v2 == 2
}
The expected crash message would be:
I think this example means that we would have to put the runtime safety checks before every store instruction. So the performance penalty for this safety in every function is O(number_of_pointer_params * number_of_store_instructions). Might be costly, but worth a try. |
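To make the cost concrete, here is a hand-instrumented version of foo in current Zig syntax (assumed 0.12 or later): before the store through b.p, it checks whether the bytes being written overlap the bytes a points to. This is only a sketch of what generated safety code might do; fooChecked, overlaps and error.AliasViolation are hypothetical names, and the check returns an error instead of panicking so it can be tested. A real implementation would need one such comparison per pointer parameter before every store, which is where the O(params * stores) estimate comes from.

```zig
const std = @import("std");

const Point = struct {
    x: i32,
    y: i32,
};

const Indirect = struct {
    p: *Point,
};

// True if the byte ranges [a, a + a_len) and [b, b + b_len) overlap.
fn overlaps(a: usize, a_len: usize, b: usize, b_len: usize) bool {
    return a < b + b_len and b < a + a_len;
}

// foo with a safety check inserted before the store through b.
fn fooChecked(a: *i32, b: *Indirect) error{AliasViolation}!void {
    const v1 = a.*;
    const target = &b.p.x;
    if (overlaps(@intFromPtr(a), @sizeOf(i32), @intFromPtr(target), @sizeOf(i32)))
        return error.AliasViolation;
    target.* += 1;
    const v2 = a.*;
    // Once the aliasing store is rejected, the rule v1 == v2 holds.
    std.debug.assert(v1 == v2);
}

test "store through Indirect that aliases a is caught" {
    var pt = Point{ .x = 1, .y = 2 };
    var indirect = Indirect{ .p = &pt };
    try std.testing.expectError(error.AliasViolation, fooChecked(&pt.x, &indirect));
    try std.testing.expect(pt.x == 1); // the store never happened
}
```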
Would |
Yes, and add |
Here's an adversarial example against the current proposal:
var global_array: [1024]i32 = undefined;
fn dont_modify_param(param: [1024]i32) i32 {
const y = param[0];
innocent_function();
const x = param[0];
// we would expect x == y
return x + y;
}
fn innocent_function() void {
global_array[0] = 2;
}
test "uh oh" {
global_array[0] = 1;
dont_modify_param(global_array);
}
Here's the proposed debug safety:
var global_array: [1024]i32 = undefined;
var __zig_safety_global_array: usize = 0; // generated safety global
fn dont_modify_param(param: [1024]i32) i32 {
const y = param[0];
innocent_function();
const x = param[0];
// we would expect x == y
return x + y;
}
fn innocent_function() void {
{
// generated safety code for modifying global variables
const prev = @atomicRmw(usize, &__zig_safety_global_array, builtin.AtomicRmwOp.Xchg, @maxValue(usize));
if (prev != 0)
@panic("data race on global variable 'global_array'");
}
global_array[0] = 2;
{
// generated safety code
const prev = @atomicRmw(usize, &__zig_safety_global_array, builtin.AtomicRmwOp.Xchg, 0);
if (prev != @maxValue(usize))
@panic("data race on global variable 'global_array'");
}
}
test "uh oh" {
global_array[0] = 1;
{
// generated safety code, because we passed a mutable global variable to a
// parameter value that isn't supposed to be mutated
const prev = @atomicRmw(usize, &__zig_safety_global_array, builtin.AtomicRmwOp.Add, 1);
if (prev == @maxValue(usize))
@panic("data race on global variable 'global_array'");
}
dont_modify_param(global_array);
{
// generated safety code
    const prev = @atomicRmw(usize, &__zig_safety_global_array, builtin.AtomicRmwOp.Sub, 1);
if (prev == @maxValue(usize))
@panic("data race on global variable 'global_array'");
}
}
This is neat because, in addition to solving the above example, it also detects data races on globals. OK, but then here's an adversarial example against this:
var global_array: [1024]i32 = undefined;
fn dont_modify_param(param: [1024]i32) i32 {
const y = param[0];
innocent_function();
const x = param[0];
// we would expect x == y
return x + y;
}
fn innocent_function() void {
modify(&global_array[0]);
}
fn modify(p: *i32) void {
p.* = 2;
}
test "uh oh" {
global_array[0] = 1;
dont_modify_param(global_array);
}
So, debug safety is yet to be solved for this. And there are more adversarial examples to come up with, for example ones that use pointer references in structs rather than global variables. |
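The counter scheme from the proposed debug safety can be sketched as a small state machine in current Zig syntax (assumed 0.12 or later, where atomic orderings are spelled .seq_cst). Guard and error.DataRace are hypothetical names standing in for the generated __zig_safety_global_array variable and its panics: 0 means idle, maxInt(usize) means a write is in progress, and any other value counts outstanding by-value borrows. The tests run on one thread, so they only exercise the state transitions, not a real race; and as the second adversarial example shows, a write that reaches the global through a plain pointer never runs this code at all.

```zig
const std = @import("std");

const Guard = struct {
    // 0: idle; `writing`: a write is in progress; otherwise: number of borrows.
    state: usize = 0,

    const writing: usize = std.math.maxInt(usize);

    // Generated before a direct write to the global.
    fn beginWrite(g: *Guard) error{DataRace}!void {
        const prev = @atomicRmw(usize, &g.state, .Xchg, writing, .seq_cst);
        if (prev != 0) return error.DataRace;
    }

    // Generated after the write.
    fn endWrite(g: *Guard) error{DataRace}!void {
        const prev = @atomicRmw(usize, &g.state, .Xchg, 0, .seq_cst);
        if (prev != writing) return error.DataRace;
    }

    // Generated before passing the global as a non-copying value.
    fn borrow(g: *Guard) error{DataRace}!void {
        const prev = @atomicRmw(usize, &g.state, .Add, 1, .seq_cst);
        if (prev == writing) return error.DataRace;
    }

    // Generated after that call returns.
    fn release(g: *Guard) error{DataRace}!void {
        const prev = @atomicRmw(usize, &g.state, .Sub, 1, .seq_cst);
        if (prev == writing) return error.DataRace;
    }
};

test "a write during a borrow is reported" {
    var g = Guard{};
    try g.beginWrite();
    try g.endWrite();
    try g.borrow();
    try std.testing.expectError(error.DataRace, g.beginWrite());
}
```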
I created #1108 for the rather ambitious "noalias on everything" ideas here. What's left in this issue is the ability to "pass by non-copying value", which is when the parameter type is an aggregate type such as a struct or array. In this case, the semantics are:
For async functions the semantics are the same. Async functions have to copy information into their coroutine frame, so a "pass by non-copying value" usually does a copy, making the data available for the lifetime of the async function. However, if Zig can prove that the lifetime of the coroutine frame is eclipsed by the lifetime of the calling function or async function, then Zig may pass a reference. And with that, there is no safety to add to Zig; instead there are new semantics for Zig programmers to be aware of. However, these semantics are designed to match what C programmers already expect from most of the code they write. So this issue is reasonably easy to solve now. |
This sounds a bit confusing to me. |
@monouser7dig |
@alexnask that makes sense, thanks |
Just want to clarify my thoughts and understanding. Please correct me if I'm overlooking something.
This is equivalent to C, except for the fact that On when to use An example in std where we would want copy semantics would be in the current An example where we don't want this is in the Maybe a loose rule like the following is useful in the common case? Don't mix |
The way I think about values and references, it would make sense if it actually behaved just like a value in C: you can modify it, but the modification is only local. But detecting whether the value is modified in the function was one of the problems to begin with, right? So maybe this is not possible. |
Added Your understanding is correct.
There's one more thing I want to do with this, and that's the well defined copy elision (#287) but in this case probably solved with named return value. It would look something like this:
At the callsite if you did this: Idea here being that zig never does
I'm thinking that the process would look like: prefer Regarding
var a = Complex(f32).new(5, 3); // fn new() directly initializes a
var b = Complex(f32).new(2, 7); // fn new() directly initializes b
var c = a.add(b); // fn add() directly initializes c
b = b.add(c); // no copies! add() directly reads and writes b
b = b.add(c).add(c); // hidden stack variable for the intermediate expression, but the final add() writes directly to b
As for big int, can you elaborate more on what you mean? Currently in the big int code, mutable pointers are used for the return values, but it is my vision to achieve the above semantics with return value copy elision. So there would not really be any
The result of this being that we could have similar code to the above:
var a = try Int.initSet(al, -3);
defer a.deinit();
var b = try Int.initSet(al, -5);
defer b.deinit();
var q, var r = try Int.divFloor(a, b);
defer q.deinit();
defer r.deinit();
Here you would want to avoid intermediate expressions since they would produce resource leaks. Anyway, this depends on two features we don't have yet. So what Big Int would really look like is just the
What would be the goal of the loose rule you proposed? |
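The Complex(f32) usage from the earlier comment can be sketched in current Zig syntax (assumed 0.12 or later). Complex, new and add here are hypothetical (std ships its own std.math.Complex); the sketch only pins down the API shape, and whether copies are actually elided is left to the compiler. add computes both fields before building its result, so b = b.add(c) stays correct even if the result location and an argument end up sharing memory.

```zig
const std = @import("std");

fn Complex(comptime T: type) type {
    return struct {
        re: T,
        im: T,

        const Self = @This();

        pub fn new(re: T, im: T) Self {
            return .{ .re = re, .im = im };
        }

        pub fn add(a: Self, b: Self) Self {
            // Read both inputs fully before writing the result.
            const re = a.re + b.re;
            const im = a.im + b.im;
            return .{ .re = re, .im = im };
        }
    };
}

test "Complex usage from the thread" {
    const C = Complex(f32);
    const a = C.new(5, 3);
    var b = C.new(2, 7);
    const c = a.add(b);
    b = b.add(c);
    try std.testing.expect(c.re == 7 and c.im == 10);
    try std.testing.expect(b.re == 9 and b.im == 17);
}
```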
One more note: @Hejsil points out in #670 that the problem with automatically passing structs as |
In std/mem.zig on line 42, there is a comment referencing this issue:
@andrewrk, has this been solved? what is the next step here? |
Removing the copy here actually doesn't achieve anything I think.
It might actually be better to make a copy and treat the assigned This is really only an issue for large structs, which you probably shouldn't be returning from a function anyway if you don't want them to be copied, imho. It's also not as if it can copy large chunks of memory as easily as in C++, where you have hidden copy operations in constructors and assignment operators. And when the variable being assigned to is in some struct (or is a global), it'll (at least in the case of Complex) have to read all the values from memory, do some operations and write them back anyway, so it doesn't matter a whole lot if it makes a 'copy'. Oh, and maybe it matters when you pass structs by reference (as an optimization), but then you shouldn't be modifying the input when creating the result. So all in all, I fail to see the use of copy elision in Zig, to be honest. |
Right now you have to pass structs and other non-copyable things as a pointer. As a consolation prize, we allow implicitly casting T to &const T. This causes problems in a number of ways. One example is with generics, where it seems like you could take a T parameter but really you would want a &const T, and then if you try to look at the type, it's a pointer. Or if you use a var parameter, Zig has to automatically pass a &const T instead, which is counter-intuitive.
This proposal is to allow a function like this:
For lack of a better name, I'm going to call this "passing arguments by const reference".
To the callee, foo looks like a value, the same as if you did const foo = Foo {.x = 1, .y = 2}; on the first line of the body of the function. However, it is not a by-value parameter, because the caller does not necessarily make a copy. Zig would be free to pass the parameter by value, perhaps if it is smaller than some number of bytes, or pass it by reference. The caller guarantees that the bytes of foo will not change for the lifetime of the function.
This allows Zig to use the "noalias" optimization on the const reference pointer.
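This is close to how current Zig treats aggregate parameters: the language reference states that parameters are immutable and that Zig may pass structs, unions and arrays either by value or by reference, whichever it decides is faster. A minimal sketch in current syntax (assumed 0.12 or later; Foo and sum are illustrative names):

```zig
const std = @import("std");

const Foo = struct {
    x: i32,
    y: i32,
};

// To the callee, foo is a value; whether it was copied or passed by
// reference under the hood is the compiler's choice.
fn sum(foo: Foo) i32 {
    // foo.x += 1; // would not compile: parameters are immutable
    return foo.x + foo.y;
}

test "aggregate parameter reads like a value" {
    const foo = Foo{ .x = 1, .y = 2 };
    try std.testing.expect(sum(foo) == 3);
}
```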
Zig could figure out that this should be a compile error:
Zig knows that arg 1 will be passed as a pointer under the hood, and it knows that arg 2 is the same pointer. So another.x += 1 violates the noalias rules.
However, this could be obfuscated enough that Zig could not figure out that it causes undefined behavior, so runtime safety is in order.
What this looks like: in a function with args passed this way, we note the pointer ranges of the const references and noalias arguments, that is, from the argument's pointer up to pointer + sizeOf(the_struct_or_arg).
When a mutable pointer is indexed, we look at the address of the indexed element and check whether it falls inside any of the noalias ranges. If it does, that's a runtime safety panic, because there shouldn't be a mutable pointer to that address.
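The bookkeeping described above can be sketched as a small helper in current Zig syntax (assumed 0.12 or later). NoAliasRanges, add, checkWrite and error.AliasViolation are hypothetical names; the fixed capacity of 8 ranges and returning an error instead of panicking are simplifications to keep the sketch self-contained and testable. A write of len bytes at addr is rejected if it overlaps any recorded [start, start + size) range.

```zig
const std = @import("std");

const NoAliasRanges = struct {
    const max_ranges = 8;

    starts: [max_ranges]usize = undefined,
    lens: [max_ranges]usize = undefined,
    count: usize = 0,

    // Record [ptr, ptr + @sizeOf(pointee)) for one const-reference argument.
    fn add(self: *NoAliasRanges, ptr: anytype) void {
        const T = @TypeOf(ptr.*);
        std.debug.assert(self.count < max_ranges);
        self.starts[self.count] = @intFromPtr(ptr);
        self.lens[self.count] = @sizeOf(T);
        self.count += 1;
    }

    // Check a store of `len` bytes at `addr` against every recorded range.
    fn checkWrite(self: *const NoAliasRanges, addr: usize, len: usize) error{AliasViolation}!void {
        for (self.starts[0..self.count], self.lens[0..self.count]) |start, n| {
            if (addr < start + n and start < addr + len) return error.AliasViolation;
        }
    }
};

const Foo = struct { x: i32, y: i32 };

test "write into a const-reference argument is rejected" {
    var foo = Foo{ .x = 1, .y = 2 };
    var other: i32 = 0;
    var ranges = NoAliasRanges{};
    ranges.add(&foo);
    try std.testing.expectError(error.AliasViolation, ranges.checkWrite(@intFromPtr(&foo.y), @sizeOf(i32)));
    try ranges.checkWrite(@intFromPtr(&other), @sizeOf(i32));
}
```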
With this proposal, I think we should remove the implicit cast of T to &const T. Almost every place that currently uses &const T should be changed to use this new arg passing convention.
Related: #670