-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Current __gc logic for userdata is unsound #19
Comments
There's no need to use extern crate rlua;
use rlua::*;
struct Userdata {
id: u8,
}
impl Drop for Userdata {
fn drop(&mut self) {
println!("dropping {}", self.id);
}
}
impl LuaUserDataType for Userdata {
fn add_methods(methods: &mut LuaUserDataMethods<Self>) {
methods.add_method("access", |lua, this, args| {
println!("accessing userdata {}", this.id);
lua.pack(())
});
}
}
fn main() {
let lua = Lua::new();
{
let globals = lua.globals();
globals.set("userdata", Userdata { id: 123 }).unwrap();
}
lua.eval::<()>(r#"
local tbl = setmetatable({
userdata = userdata
}, { __gc = function(self)
-- resurrect userdata
hatch = self.userdata
end })
print("userdata = ", userdata)
print("hatch = ", hatch)
print "collecting..."
tbl = nil
userdata = nil -- make table and userdata collectable
collectgarbage("collect")
print("userdata = ", userdata)
print("hatch = ", hatch)
hatch:access()
"#, None).unwrap();
} This demonstrates not a double-drop, but a use-after-free in
|
So, I apparently had a bad understanding of how __gc metamethods worked from lua circa 5.1, because I was unaware that somewhere between 5.1 and 5.3, tables got the ability to have __gc methods that could be assigned by scripts. Apparently, ANY userdata may be resurrected, and may have its __gc methods called twice, correct? I now need to.. I guess store all userdata in a form where it can be nulled, and panic on access if it is nulled, and allow it to be nulled more than once safely. Something like, I suppose storing Option, and simply setting it to None on gc instead of dropping it, then I can check to make sure it is not None before returning a pointer. I can do that, but what it means is it may be.. very hard or impossible to prevent a lua script from forcing a panic. I suppose practically that is not a HUGE deal, since preventing a lua script from causing a DOS attack, even with execution limits, is probably already extremely extremely difficult. |
I believe this may be fixed by 36134e6. I don't necessarily like the fact that some userdata now may have multiple levels of Option, and may be able to be collapsed somewhat. |
Actually no, there's something I didn't think of. Since Lua 5.3 allows __gc methods on lua tables, that means that any function that may trigger a gc metamethod may longjmp, which makes my entire longjmp story unsound. I was relying on only methods marked as 'e' in the reference manual causing longjmp, but this means that any method marked 'm' may ALSO cause longjmp, which is more or less everything. It's noted in the reference manual that 'm' methods may trigger a __gc metamethod, but I was relying on only userdata having __gc methods, and not being defined in scripts.
Okay, so probably just extremely hard not impossible. I think the sanest way to do it would be to, as your first step, write an entire C API that guarantees never to longjmp past the caller. |
I completely agree that it's not well documented. It's not well documented because it has until this point been in dramatic flux. When we finally (FINALLY) stop finding fundamental flaws in my view of the universe, I may rewrite it to be consistent and actually document it. This is not the first time that I have discovered that a lua bindings system I work on is fundamentally unsound, and it probably won't be the last. Does ANYBODY get this right, if it's even possible to GET right? I'm legitimately considering writing my own 5.3 compatible interpreter at this point, it would be faster than wrapping everything in the universe to make sure it doesn't longjmp. |
I did briefly consider a more low-level wrapper around Lua's C API, which would handle this and abstract away some Lua version differences, too. But I don't yet know how such a library would have to look to be useful, reasonably fast but still safe.
I was interested in this for other reasons (and found a way to implement a pretty slow GC in safe Rust), but it is unlikely that such an implementation wouldn't have some bugs, too. Maybe no safety issues, but bugs in the language itself, which isn't really pleasant to debug, either. This library is at least the third attempt at safe Lua bindings for Rust, at some point it has to work, even if just because of the infinite monkey theorem ;) |
Sometimes I think if I do end up writing something that works, it is probably only due to the infinite monkey theorem. |
A reasonably easy way to "fix" the remaining issue would be to replace |
That's not a bad idea! |
The documentation for
But, if you go look at the documentation for the debug library, it says, for debug.setmetatable:
So, I don't know if debug.setmetatable can be used to set the metatable on something other than a table, but if it can, that is also a soundness hole. |
I could wrap both setmetatable and debug.setmetatable, I could only wrap setmetatable, or I could wrap setmetatable and not load the debug library. I feel like other things in the debug library could be used to cause unsoundness as well.. somehow. I mean, you can do things like set hooks with things that cause 'error' or replace arbitrary upvalues. I know it's drastic but I'm leaning towards removing the debug library somehow. |
I think you can ignore the debug library, it can be used to crash any Lua program and manipulates VM state. Just disable it when running untrusted code. |
Okay, I took kind of a heavyhanded approach. Currently, with the patch for handling resurrected userdata, Lua CAN cause aborts by simply resurrecting a garbage collected userdata and then attempting to call a method on it. Because the cat's already out of the bag there, 9c34d4b simply wraps user provided __gc methods with an abort on error. Also, for now, I simply disabled the debug library, which I realized is quite heavy handed, but I want to address that when I address configurable startup environments. I will document all this in the README when 0.9 is released. |
Also, I believe (though am not certain) that there is one actual soundness issue left (I'm not counting the theoretical "all longjmps are unsafe" issue as a soundness issue). That issue is that, IF the system allocator returns null, I believe lua longjmps with an OOM error, and that causes the same issues as __gc obviously. I think.. the only time most system allocators return null is when provided an un-allocatable size of memory in the first place, and most modern environments of course just overcommit memory, but I'm sure not all platforms where this may build and run will be like that. I believe the solution is to provide a custom allocator to lua that aborts on OOM, but it would currently require nightly to use the rust allocator? I'm not sure on that actually. |
That's easy to fix: Use |
Yeah, I knew that part, but what do you use to allocate? Are the rust raw allocators available in stable, or.. I guess you can use Box to allocate a char array or something? |
Perhaps |
Right, so don't use the rust allocator, directly use malloc, but abort instead of returning nullptr. That would work, but I'd much rather use the rust allocator once it's available. I don't really mind a dependency on libc though, everything depends on libc, so I'll do that in the meantime. |
Okay, I believe 2553623 fixes the OOM unsoundness, and that is the last tangential issue that this raised. I think I'm going to release 0.9 now. |
Looks great! I'm really looking forward to the release, a lot of new stuff, lots of fixes, much more documentation. Maybe announce this release on the users forum or the subreddit? |
Yeah, me too. I reallllly could not have done this without your help though, thank you so much! |
No problem! I think it's about time Rust gets a safe and usable Lua wrapper. |
How about just unsetting the metatable in diff --git a/src/util.rs b/src/util.rs
index 41c76fc..dc68574 100644
--- a/src/util.rs
+++ b/src/util.rs
@@ -335,6 +335,8 @@ pub unsafe fn get_userdata<T>(state: *mut ffi::lua_State, index: c_int) -> *mut
pub unsafe extern "C" fn userdata_destructor<T>(state: *mut ffi::lua_State) -> c_int {
match catch_unwind(|| {
*(ffi::lua_touserdata(state, 1) as *mut Option<T>) = None;
+ ffi::lua_newtable(state);
+ ffi::lua_setmetatable(state, -2);
0
}) {
Ok(r) => r, |
For bonus points: Instead of unsetting the metatable, have a special metatable that marks objects as "this was already finalized" so that you end up with better error messages. (Yup, I was always too lazy to implement this) |
I haven’t looked into it in detail yet, but I’m gonna reopen this in light of the comments on #38. |
It might be sound again when 0.10 is released.. ..I hope. |
This should definitely be fixed now, in fact it was fixed with 0.10. |
@kyren Feel free to close, I've kept this open so #19 (comment) doesn't get lost, which seems like it could simplify rluas internals a bit, but I can open another issue for that. |
So, to summarize the suggested change, I think IF AND ONLY IF we wrap the calls to newtable and setmetatable in a If so, I'm happy to make that change. Though, eventually something will have to be done about relying on |
I actually now believe that this change solves a bug. First, a question.. If you have a Lua table with a finalizer, and a Lua userdata with a finalizer, and the Lua table's finalizer accesses the Lua userdata, is the Lua table finalizer guaranteed to run before the Lua userdata finalizer if both objects are garbage collected at the same time? I'm guessing that it's not. If that's the case, there are several panics in util.rs related to |
That sounds correct, I think.
Nope, don't think so. I think it currently depends on the order in which the objects are created, but Lua doesn't make any guarantees. |
Okay, hopefully 77eb73a correctly implements that change. |
NOPE, it's all wrong, I used mem::replace instead of ptr::write, give me a moment. Edit: okay, 6382baa hopefully is correct. |
I'm going to go ahead and close this, since both the original soundness issue was fixed and also the proposed simplification was implemented. |
The manual states:
This means that a destructor assigned to the
__gc
metamethod might be run multiple times on the same userdata. Insidefn destructor
, rlua currently just does this:If one figures out a way to resurrect the userdata object from withindrop
, this leads to a double free. However, I am not sure if this is possible, since all userdata must outlive'static
.If this is by design, it would be helpful to document that. In general,rlua
s internals are barely documented which makes it hard to understand and debug stuff like #18.EDIT: See comment below for use-after-free demonstration
The text was updated successfully, but these errors were encountered: