Return `&'static TimeZone` from `jiff-tzdb` #284
I'm not sure I see how this would work. I also don't understand the relevance of …
Possibly relevant: one of the things I'm planning to do for the next release is to add a …
The way I understand this is that you gave …

Yeah, …
Nothing of consequence can live in …
I'm not going to do this because of my dependency philosophy. I have no interest (at this point; I may in the more distant future) in maintaining a semver boundary that exposes TZif parsing (and likely POSIX time zones, and possibly other things depending on how the API is designed). I think the …
Oh, I was under the impression that …

If the UNIX-tzdb parsing logic lives in …
This is not for my specific use case, but for the default user. As long as …
I don't know how to do it. The requirements are: …
Jiff's crate structure to support this is an idiom (AFAIK, although I'm having trouble recalling other examples of it) for achieving this sort of target-specific and optional behavior. If you're fine depending on Jiff, then …
IMO, these costs are not generally significant. And if they are significant, I'm fine with telling users they should use the proc macro. The proc macro also potentially permits greater savings by constructing a …

If there was a way to avoid these costs easily without doing something else that I perceive as costly, then I'm open to it. But I don't really see a way (with maybe 90% confidence). The proc macro squares this circle by doing code generation in a context where …

If I were open to pushing the TZif parser (and everything it requires) out into a new crate (say, …

I have a history of splitting crates apart into semver boundaries, so I wouldn't be surprised if that eventually happens with Jiff. But I am definitely not going to do it before Jiff 1.0, and I'll avoid it altogether if I can. But once …

It's also important to acknowledge that some of the difficulty in the design space here is a direct consequence of Jiff's API being a closed system, where with …
I'm trying very hard to understand why the crate split is necessary.
The platform-specific behavior is in …

I'll noodle on this to see if there is another way.
I may be forgetting a requirement.
I have been able to remove both …

master...robertbastian:jiff:tzdb

The other user of …
I remember now why I split it out into a separate crate. I did it so that people downloading …

With that said, I'm not sure how much that's worth it. According to crates.io, …

As I said above, I generally don't consider the costs you've highlighted to be significant. So it's hard to choose between eliminating said costs versus the costs of extra bandwidth for every download of …
I don't think that's how it works? If I …
My PR reduced the blob size to 200kB, so the crates.io size cannot be correct. That said, the data is being downloaded anyway; the trade-off is actually between the TZif blob and what I'm proposing, because as pre-parsed Rust code this will be bigger.
Inlining the crates also means a reduced API surface to stabilise; these optimisations can then be punted down the road.
A crate being in the lock file doesn't imply it is being downloaded, AFAIK. Certainly worth double-checking. The crates.io size is likely compressed. Depending on the compression, the deduplication may not help it much.
And even if …
OK, let's try it out. Starting with a clean project and no deps:
Now let's do a …
I don't see anything being downloaded, although it's not obvious to me that Cargo would even tell me if something were downloaded? Maybe we can do a little better by first clearing Cargo's …
OK, reverting my project back to a clean state (no dependencies, no …)
Now something about …
OK cool, makes sense. And now let's try a …
I don't see any …

And now if I forcefully enable …
So I think this is enough to convince me, absent other evidence, that putting the data into a separate crate does indeed avoid downloading it at all on Unix systems.
OK, so this reduces jiff's one-time crates.io download size by 11%. It's an odd thing to optimise if you ask me; you could split out more code into other crates (and pay the semver price) if that were really an issue.
I feel I was appropriately circumspect in my comment above: …
I don't see anything immediately actionable here on my end, so I'm going to close this. Happy to revisit if new data arises that influences the trade-offs here.

One other thing that isn't captured in this issue but I meant to mention is that …
Isn't the …
I don't see how to get rid of …
If we cut out …

This idiom was suggested as far back as 2016.
I think so. There is a slight unknown, though, in supporting two different types of …

I guess I'll re-open this for that issue, but honestly, it's not going to be a priority of mine and I'm not completely convinced it's even worth doing. I've said a few times now that I don't think the costs you've highlighted are significant, and the existence of the proc macro should act as a feasible alternative for cases where those costs are significant. So I'm not sure I really want to go through the effort of making …
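The discussion above touches on supporting both compile-time (static) and runtime-parsed time zone data behind one set of routines. A minimal sketch of how a storage-generic bound can cover both cases; `offset_index` is a hypothetical helper written against `AsRef<[i64]>`, not jiff's actual internals:

```rust
// Sketch: one lookup routine that works for both `&'static [i64]`
// (data baked in at compile time) and `Vec<i64>` (data parsed at
// runtime), by bounding on `AsRef<[i64]>`.
fn offset_index(timestamps: impl AsRef<[i64]>, at: i64) -> usize {
    let ts = timestamps.as_ref();
    // Index of the last transition at or before `at` (0 if none).
    match ts.binary_search(&at) {
        Ok(i) => i,
        Err(0) => 0,
        Err(i) => i - 1,
    }
}

fn main() {
    static STATIC_TS: [i64; 3] = [0, 100, 200];
    let heap_ts: Vec<i64> = vec![0, 100, 200];
    // The same routine accepts both representations.
    assert_eq!(offset_index(&STATIC_TS[..], 150), 1);
    assert_eq!(offset_index(heap_ts, 150), 1);
    println!("ok");
}
```

The same approach generalizes to the other generic parameters in the `Tzif` types shown later in the thread.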
[target.'cfg(not(any(windows, target_family = "wasm")))'.features]
tzdb-bundle-platform = []
[target.'cfg(any(windows, target_family = "wasm"))'.features]
tzdb-bundle-platform = ["dep:jiff-tzdb", "alloc"]
Where did you get that from? I don't see that documented anywhere, and if I try it, I get warnings from Cargo saying that it is unused:
Please read through rust-lang/cargo#1197.
Oh, I misread that post. Anyway, the …
Yes, indeed, I know...
The no-alloc aspect of it is compelling, I'll grant you that. I will likely experiment with this after I get the proc macro released, which will be another way to create a fully no-alloc tzdb. The advantage of the proc macro approach is that you can choose to include a subset of all time zones into your binary, whereas with …

To summarize, the remaining concerns I have here are: …
I think the only way to figure out whether those concerns are blocking is to try it out.
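For context: Cargo rejects target-specific *features* like the snippet above, but it does support target-specific *dependencies*, which can express the same intent. A minimal sketch of that supported idiom; the crate and feature names mirror jiff's, but the manifest is illustrative, not jiff's actual Cargo.toml:

```toml
# Cargo has no per-target [features] tables, but optional dependencies
# may be declared per-target. The bundled data crate then only exists
# as a dependency on platforms without a system tzdb.
[target.'cfg(any(windows, target_family = "wasm"))'.dependencies]
jiff-tzdb = { version = "0.1", optional = true }

[features]
# The feature name is defined unconditionally; enabling it only pulls
# in jiff-tzdb on targets where the cfg above matches.
tzdb-bundle-platform = ["dep:jiff-tzdb"]
```

On targets where the `cfg` does not match, enabling the feature is a no-op, which is how the bundled data can be skipped on Unix while remaining available on Windows and wasm.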
One thing I just realised is that the bundled database cache requires …
I'm not convinced that that is something clients really need. You either handle time zones yourself, in which case you don't need a TZDB (i.e. the current macros are fine), or you need to handle TZDB IDs, in which case it's risky to not understand all of them.

I think a more useful size-reduction strategy would be to drop old time zone transitions. Many programs, for example, don't handle the past at all, and even those that do probably don't need …

Unlike the zone filtering, this is something you could extend to …
Running …

Looking at the files, even most of that is TZif header overhead, so pre-parsing would cut that down again.
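The transition-dropping idea above can be sketched as a cut over the sorted transition timestamps, keeping the last transition at or before the cutoff so the offset in effect at the cutoff survives. `trim_transitions` is a hypothetical helper for illustration, not jiff's API:

```rust
// Sketch: drop transitions older than a cutoff. Transition timestamps
// are sorted ascending; we must retain the last transition at or
// before the cutoff, because it determines the offset in effect when
// the retained range begins.
fn trim_transitions(timestamps: &[i64], cutoff: i64) -> &[i64] {
    let start = match timestamps.binary_search(&cutoff) {
        Ok(i) => i,          // exact hit: keep it
        Err(0) => 0,         // cutoff precedes all transitions: keep all
        Err(i) => i - 1,     // keep the transition just before the cutoff
    };
    &timestamps[start..]
}

fn main() {
    let ts = [-1000, -500, 0, 500, 1000];
    // Cutoff at 250: the transition at 0 is still in effect, so keep it.
    assert_eq!(trim_transitions(&ts, 250), &[0, 500, 1000]);
    // Cutoff before all data: nothing can be dropped.
    assert_eq!(trim_transitions(&ts, -2000), &ts);
    println!("ok");
}
```

In a real implementation the parallel arrays (`civil_starts`, `civil_ends`, `infos`) would be cut at the same index.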
You mean this as a downside of the status quo? Yes, I agree. In general I am more than fine for users without …

I'm not saying this to say that we should therefore not care about this, but just as an expression of my general design philosophy. I also perceive the proc macro as a release valve of sorts here.
Risky, yes. But it does increase flexibility. Being able to use a …

It may be prudent to hold off adding this macro for now, but it's a very small amount of work and unlocks new workflows.
I'm hesitant to do this in …

I am okay with making such size reductions opt-in, but that is hard to do with …

This issue has used up a lot of my time so far. I have some action items, as discussed above, to experiment with. When I do that, I'll report back with my results.
Yes
They shouldn't be:
This is not a problem specific to this proposal; different …
I don't think it's fair to evaluate such proposals on the grounds of crates.io download sizes. It doesn't "still help binary size" despite the download size; it helps binary size, significantly, and if some dev has to download 10kB (1.3% of …)
I agree that the proc macro is the better place for this, as that can offer an opt-in, fully customisable TZDB. However, I think that for the sake of usability, these options should also be exposed in easier ways, and for the sake of maintainability, …
Yeah, same. But I think it's worth the time, because I think …
I'm aware. The problem arises when one application uses different tzdb data than another: "When I use …"
I don't really have the energy to engage with you on this. Bandwidth matters for lots of reasons and I don't agree with your framing. I am not making a specific proposal as to how much weight to attach to it. I don't put a ton of weight on it. I would very likely prioritize binary size over bandwidth. But it is a concern that is on my mind, and it matters whether there is a release valve (like the proc macro) for achieving smaller binary size when that is important.
I laid out what I think the next steps are in this comment. None of the comments that have followed have changed that. The next step is to actually try out the proposed …
I am unsubscribing from this thread. Future responses may take longer than past responses have.
With #287 merged, these are now the type definitions used for representing TZif data:

#[derive(Clone, Debug)]
pub struct Tzif<STR, ABBREV, TYPES, TIMESTAMPS, STARTS, ENDS, INFOS> {
pub fixed: TzifFixed<STR, ABBREV>,
pub types: TYPES,
pub transitions: TzifTransitions<TIMESTAMPS, STARTS, ENDS, INFOS>,
}
#[derive(Clone, Debug)]
pub struct TzifFixed<STR, ABBREV> {
pub name: Option<STR>,
pub version: u8,
pub checksum: u32,
pub designations: STR,
pub posix_tz: Option<PosixTimeZone<ABBREV>>,
}
#[derive(Clone, Copy, Debug)]
pub struct TzifLocalTimeType {
pub offset: i32,
pub is_dst: bool,
pub designation: (u8, u8),
pub indicator: TzifIndicator,
}
#[derive(Clone, Copy, Debug)]
pub enum TzifIndicator {
LocalWall,
LocalStandard,
UTStandard,
}
#[derive(Clone, Debug)]
pub struct TzifTransitions<TIMESTAMPS, STARTS, ENDS, INFOS> {
pub timestamps: TIMESTAMPS,
pub civil_starts: STARTS,
pub civil_ends: ENDS,
pub infos: INFOS,
}
#[derive(Clone, Copy, Debug)]
pub struct TzifTransitionInfo {
pub type_index: u8,
pub kind: TzifTransitionKind,
}
#[derive(Clone, Copy, Debug)]
pub enum TzifTransitionKind {
Unambiguous,
Gap,
Fold,
}
#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq, PartialOrd, Ord)]
pub struct TzifDateTime {
bits: i64,
}
impl TzifDateTime {
pub const ZERO: TzifDateTime = TzifDateTime::new(0, 0, 0, 0, 0, 0);
pub const fn new(
year: i16,
month: i8,
day: i8,
hour: i8,
minute: i8,
second: i8,
) -> TzifDateTime {
let mut bits = (year as u64) << 48;
bits |= (month as u64) << 40;
bits |= (day as u64) << 32;
bits |= (hour as u64) << 24;
bits |= (minute as u64) << 16;
bits |= (second as u64) << 8;
// The least significant 8 bits remain 0.
TzifDateTime { bits: bits as i64 }
}
pub const fn year(self) -> i16 {
(self.bits as u64 >> 48) as u16 as i16
}
pub const fn month(self) -> i8 {
(self.bits as u64 >> 40) as u8 as i8
}
pub const fn day(self) -> i8 {
(self.bits as u64 >> 32) as u8 as i8
}
pub const fn hour(self) -> i8 {
(self.bits as u64 >> 24) as u8 as i8
}
pub const fn minute(self) -> i8 {
(self.bits as u64 >> 16) as u8 as i8
}
pub const fn second(self) -> i8 {
(self.bits as u64 >> 8) as u8 as i8
}
}

I've excluded POSIX time zones, and for now, this does include the IANA time zone identifier too (ref #285). So for this issue, …

Short of that, this also implies that the internal TZif routines for TZ lookups will need to be generic over both of these types. That's going to be very annoying to do, but I believe it is possible. A possible alternative is if we somehow guaranteed that the representation of the types defined in …
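The `TzifDateTime` packing above can be exercised standalone. The sketch below copies the constructor's packing logic into a local `pack` function (illustrative, not jiff's API) to show that integer comparison on the packed `i64` matches chronological order:

```rust
// Standalone copy of TzifDateTime's bit packing: year in the topmost
// 16 bits, then month, day, hour, minute, second in descending byte
// positions, with the low 8 bits left zero.
const fn pack(year: i16, month: i8, day: i8, hour: i8, minute: i8, second: i8) -> i64 {
    // `year as u64` sign-extends; shifting left by 48 keeps exactly the
    // original 16 year bits at the top of the word, so the year's sign
    // bit becomes the i64's sign bit.
    let mut bits = (year as u64) << 48;
    bits |= (month as u64) << 40;
    bits |= (day as u64) << 32;
    bits |= (hour as u64) << 24;
    bits |= (minute as u64) << 16;
    bits |= (second as u64) << 8;
    bits as i64
}

fn main() {
    // Ordering holds even across year 0, because the signed year lands
    // in the sign-carrying bits of the i64.
    assert!(pack(-1, 12, 31, 23, 59, 59) < pack(0, 1, 1, 0, 0, 0));
    assert!(pack(2024, 6, 1, 0, 0, 0) < pack(2024, 6, 1, 0, 0, 1));
    // A component is recovered by shifting back down, as in the
    // accessors above.
    assert_eq!(((pack(2024, 6, 1, 12, 30, 15) as u64) >> 48) as u16 as i16, 2024);
    println!("ok");
}
```

This ordering property is what lets the derived `PartialOrd`/`Ord` on the single `bits` field stand in for field-by-field comparison.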
Now that we can create static time zones at compile time, `jiff-tzdb` could return those instead of `&'static [u8]`. This would save both time and space at runtime, as no runtime parsing would be required, and parsed time zones won't need to be cached anymore (caching would move into `jiff-tzdb-platform`).

One complication is that `TimeZone` corresponds to the current `(&'static str, &'static [u8])`, i.e. it includes the IANA name. This means we cannot deduplicate at the `TimeZone` level.