GGUF endianness cannot be determined from GGUF itself #3957
Well. The future is now 😄
I suggest storing the endianness as a single byte that can be read without ambiguity on both little- and big-endian hosts. Reading a 16-, 32- or 64-bit number is often confusing when you don't know the endianness of the file itself. It quickly leads to very confusing code where you must first guess the host byte order, decide whether you're reading the file using native, little-endian, or big-endian byte order, and somehow combine those using pure magic. We could use a single int8 (an ASCII marker character, say). It will be a breaking change, but failure because of hidden endian issues is also a breaking change.
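For illustration, a minimal sketch (Python, purely hypothetical layout - GGUF has no such marker byte today, and the names are made up) of how a reader could branch on a single byte without ever consulting the host byte order:

```python
import struct

def read_header(f):
    # Hypothetical layout: 4-byte magic, then a 1-byte endianness marker,
    # then the rest of the header in the byte order the marker names.
    magic = f.read(4)                      # b"GGUF"
    marker = f.read(1)                     # e.g. b"L" or b"B" (illustrative values)
    byte_order = "<" if marker == b"L" else ">"
    version = struct.unpack(byte_order + "I", f.read(4))[0]
    return byte_order, version
```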
The only argument I see for making the file magic endian-dependent would be to make incompatible readers bail fast - but hey, if they ignore the version field, all bets are off anyway.
Hmm, does anything actually need to change other than just storing the version field in the respective endianness? Until there's GGUF version 20,000 or something, every big-endian version is going to be invalid for little-endian and every little-endian version is going to be invalid for big-endian. (Except 0, which is the same for both, but we're already past that.) edit: If you don't like the idea of just relying on the version being invalid if the file was created for the wrong endianness, an alternative (also backward compatible except for existing BE files) is to say the version field is still 4 bytes but the value is a
Hm - yes, you're both right, I hadn't noticed the version was written with the correct endianness. This is then not as much of an issue as it seemed - big-endian files that end up out there are unlikely to be loadable by existing executors if they're correctly version-checking. That being said, I'd still prefer to make it explicit instead of having it be an implicit property of the version number. I can't imagine it'll ever reach a version number in which there's a realistic possibility of confusion, but it's easier for readers if they can determine it directly without having to check if they support the endian-swapped version number.
Interesting and clever idea. This could work and wouldn't require a strictly-breaking change to the format. The immediate concerns I have are that:
Ultimately I don't think either of these really matter, it's just not quite as elegant as a more dedicated detection mechanism. I'd be fine with this if we wanted to maintain strict backwards compatibility. With that being said, a cursory search indicates that nobody's uploaded big-endian models to HF yet, so I'm pretty comfortable with breaking them specifically as long as LE's fine. I can think of two potential solutions here:
I'm leaning towards the right-aligned uint16 solution because it maintains LE compatibility, is still obviously a GGUF file, and should work for us until sometime in the next century. I could be convinced to use one of the three other solutions (magic number solution, using only one bit, or BE-backwards-compatible moving uint16), but this seems like the best set of tradeoffs.
If anyone wants to test with non-native endian files, I added a conversion script in #3981 - you can just do something like:
Warning: This does not produce a new file, it modifies the input. It currently can only convert
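As a rough idea of what such a conversion involves, here is a conceptual sketch in Python/NumPy - this is not the actual script from #3981, which rewrites the file in place:

```python
import numpy as np

def swap_tensor_endianness(arr: np.ndarray) -> np.ndarray:
    # byteswap() flips the raw bytes; the dtype view then labels the data as
    # the opposite byte order, so the numeric values stay the same.
    return arr.byteswap().view(arr.dtype.newbyteorder())
```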
I don't mind leaving the problem of dealing with GGUF version 65536 to the people in the year 4,000. My solution was actually a little overcomplicated though. You can just read the field as normal then bitwise AND with 0xFFFF.
For anyone interested, I have implemented v3 byte order detection - without needing to know the host byte order - using the last byte of the version field as a big-endian marker. It works as intended. It's a bit ugly, but it works.
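A minimal sketch of that check in Python, assuming the usual GGUF layout of a 4-byte magic followed by a uint32 version:

```python
def detect_byte_order(header: bytes) -> str:
    # Version 3 is stored as 03 00 00 00 in a little-endian file and as
    # 00 00 00 03 in a big-endian one, so the last version byte is only
    # nonzero for big-endian files (valid while the version stays < 256).
    version_bytes = header[4:8]   # the 4 bytes right after the b"GGUF" magic
    return "big" if version_bytes[3] != 0 else "little"
```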
Until version 256. :) That still may never happen in our lifetimes, but still. I think
Yeah I think I'd be fine with
I don't think you will be. :) It's gotta be
Yes, of course - my bad 😅
Thanks @ggerganov for pointing me to this nice convo, it should enable us to support visualization of big-endian gguf files in HF:
The important snippet is:

```ts
const [littleEndian, version] = (() => {
  /// ggml-org/llama.cpp#3957
  /// Assume this code is always running on little-endian
  /// but wants to be able to parse both endianness
  const version = r.view.getUint32(4, true);
  if (version & 65535) {
    return [true, version];
  } else {
    return [false, r.view.getUint32(4, false)];
  }
})();
```

from ggml-org/llama.cpp#3957 and thanks to @ggerganov's [comment](https://github.com/huggingface/huggingface.js/pull/540/files#r1521103912)
This issue was closed because it has been inactive for 14 days since being marked as stale.
Just noticed this was marked as stale - as far as I know, this is still an issue: the only indicator that a GGUF is big-endian is that its version is stored big-endian, which is not a strong signal. Given that big-endian models are the minority by far, I'd continue to suggest reversing the magic number to make it explicit, and forcing a re-encode of the BE models.
Re-encode "GGUF" -> "FUGG", is this the proposal? |
Yes, pretty much. I think that should be sufficient? |
Seems to work well enough in existing implementations, though (just my 2 cents) |
It works as a weak signal ("this file can't be loaded"), but you have to use a heuristic to determine if it's a big-endian file ("could I support this if the version was endian-swapped?"). A reader could determine if a file is big-endian and provide a more informative error in that case; another reader could also offer to swap the bytes around for the user. This could also be useful for online viewers, which could definitively tell you what type of model it is before you download it. In general, I think it's a good idea to make things like this as upfront as possible in the reading process so that a reader knows what it's working with without having to intuit it with a heuristic.
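As a sketch of what that could look like for a little-endian reader, reusing the version heuristic discussed above but surfacing it as an explicit error (Python, illustrative only):

```python
import struct

def read_version_or_explain(path: str) -> int:
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        raw = f.read(4)
    version_le = struct.unpack("<I", raw)[0]
    if version_le & 0xFFFF:
        return version_le  # plausible little-endian version
    version_be = struct.unpack(">I", raw)[0]
    raise ValueError(
        f"this looks like a big-endian GGUF file (version {version_be}); "
        "convert it or use a big-endian-aware reader"
    )
```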
As of the time of writing, the big-endian support that was added in #3552 doesn't encode the endianness within the file itself:
https://github.com/ggerganov/llama.cpp/blob/3d48f42efcd05381221654376e9f6f69d76af739/gguf-py/gguf/gguf.py#L689-L698
This means that there is no way to distinguish a big-endian GGUF file from a little-endian file, which may cause some degree of consternation in the future if these files get shared around 😅
The cleanest solution would be to add the endianness to the header - ideally it would go in the metadata, but reading the metadata itself depends on the endianness - though that would be a breaking change.
Given that, my suggestion would be to use `FUGG` as the header for big-endian files, so that a little-endian executor won't attempt to read it at all unless it knows how to deal with it. The same goes the other way as well: a big-endian executor won't attempt to read a little-endian file.
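A sketch of what magic-based detection would look like if this proposal were adopted - note that `FUGG` is not part of the current spec, it's just the suggestion above:

```python
def endianness_from_magic(magic: bytes) -> str:
    # The proposal: little-endian files keep the b"GGUF" magic,
    # big-endian files write the reversed bytes b"FUGG".
    if magic == b"GGUF":
        return "little"
    if magic == b"FUGG":
        return "big"
    raise ValueError("not a GGUF file")
```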