Return &'static TimeZone from jiff-tzdb #284

Open
robertbastian opened this issue Mar 2, 2025 · 31 comments
Labels
question Further information is requested

Comments

@robertbastian
Contributor

Now that we can create static time zones at compile time, jiff-tzdb could return those instead of &'static [u8]. This would save both time and space at runtime, as no runtime parsing would be required, and parsed timezones won't need to be cached anymore (caching would move into jiff-tzdb-platform).

One complication is that TimeZone corresponds to the current (&'static str, &'static [u8]), i.e. it includes the IANA name. This means we cannot deduplicate at the TimeZone level.

@BurntSushi
Owner

I'm not sure I see how this would work. jiff depends on jiff-tzdb, not the other way around.

I also don't understand the relevance of jiff-tzdb-platform here, since that's "just" a Cargo hack to make it possible for a Cargo feature to have target specific behavior. If there were some other way to express "enabling feature foo only has an effect on platforms quux and baz," then I wouldn't need jiff-tzdb-platform.

@BurntSushi
Owner

Possibly relevant: one of the things I'm planning to do for the next release is to add a static_tzdb! macro to jiff-static that creates a TimeZoneDatabase from a &'static [TimeZone]. That provides the benefits you mention (no need to do caching and what not). The deduplication complication is still present, but I think it should be possible to make something work there where the static TZif data can be shared among multiple TimeZone values with distinct IANA time zone identifiers.
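
As a rough illustration of that last point, here is a minimal sketch of several IANA identifiers sharing one copy of static TZif data. Everything in it is an assumption made for illustration: the `Entry` type, the `lookup` and `shares_data` helpers, and the placeholder byte strings are hypothetical and are not taken from jiff or jiff-tzdb.

```rust
// Hypothetical sketch: none of these names come from jiff or jiff-tzdb.

/// Placeholder blobs standing in for real (much larger) TZif data.
static TZIF_KOLKATA: &[u8] = b"TZif-placeholder-kolkata";
static TZIF_NEW_YORK: &[u8] = b"TZif-placeholder-new-york";

/// Pairs an IANA identifier with a reference to TZif data.
pub struct Entry {
    pub name: &'static str,
    pub tzif: &'static [u8],
}

/// Aliases point at the same static, so the bytes are stored only once.
/// The entries are sorted by name so that lookup can use binary search.
static ZONES: [Entry; 4] = [
    Entry { name: "America/New_York", tzif: TZIF_NEW_YORK },
    Entry { name: "Asia/Calcutta", tzif: TZIF_KOLKATA },
    Entry { name: "Asia/Kolkata", tzif: TZIF_KOLKATA },
    Entry { name: "US/Eastern", tzif: TZIF_NEW_YORK },
];

/// Returns the TZif bytes for `name`, if the identifier is known.
pub fn lookup(name: &str) -> Option<&'static [u8]> {
    ZONES
        .binary_search_by(|e| e.name.cmp(name))
        .ok()
        .map(|i| ZONES[i].tzif)
}

/// True when both identifiers resolve to the very same static data.
pub fn shares_data(a: &str, b: &str) -> bool {
    match (lookup(a), lookup(b)) {
        (Some(x), Some(y)) => std::ptr::eq(x, y),
        _ => false,
    }
}

fn main() {
    println!("alias shares data: {}", shares_data("Asia/Kolkata", "Asia/Calcutta"));
    println!("distinct zones share data: {}", shares_data("Asia/Kolkata", "US/Eastern"));
}
```

The identifier stays in the per-name entry while the data lives behind a shared reference, which is one way the name and the deduplicated data could be kept apart.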

@robertbastian
Contributor Author

robertbastian commented Mar 2, 2025

I also don't understand the relevance of jiff-tzdb-platform here

The way I understand this is that you gave jiff-tzdb-platform and jiff-tzdb the same API. Hence an API change in jiff-tzdb would imply an API change in jiff-tzdb-platform as well. And if the APIs stay the same, the caching can move into jiff-tzdb-platform, since caching the output of jiff-tzdb would be a waste of space.

                 CURRENT                                   PROPOSED                              
                  cache                                       │                   
                    │                                         │ TimeZone          
                    │ TimeZone                      ┌─────────┴──────────┐        
                    │                         ┌─────────────────┐  ┌──────────┐   
                  parse                       │   cache         │  │jiff-tzdb │   
                    │                         │     │           │  └──────────┘   
                    │ &[u8]                   │     │ TimeZone  │                 
           ┌────────┴─────────┐               │     │           │                 
   ┌────────────────────┐┌──────────┐         │   parse         │                 
   │ jiff-tzdb-platform ││jiff-tzdb │         │     │           │                 
   └────────────────────┘└──────────┘         │     │ &[u8]     │                 
                                              │     │           │                 
                                              │  <read fs>      │                 
                                              └─────────────────┘                 
                                               jiff-tzdb-platform                 

I'm not sure I see how this would work. jiff depends on jiff-tzdb, not the other way around.

Yeah, jiff-tzdb and jiff-tzdb-platform could not return TimeZones directly. They would need to return something like (&'static str, TzifStatic)/(String, TzifOwned), which jiff can then convert into TimeZones. I don't entirely grasp the combination of crates/features, but these types could either live in the jiff-tzdb* crates, or a new jiff-tzif crate if both crates need to be usable at the same time.
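
To sketch what such a pair of types might look like: the names `TzifStatic` and `TzifOwned` are the ones suggested above, but the fields, the `Transition` type, the `TzifData` trait and `offset_at` are all assumptions invented here, and are far simpler than real TZif data.

```rust
// Hypothetical sketch: the layout below is invented for illustration only.

/// One UTC offset change: from `since` (Unix seconds) onward, `offset` applies.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Transition {
    pub since: i64,
    pub offset: i32,
}

/// Pre-parsed data that can live in a `static`, e.g. in a bundled database.
pub struct TzifStatic {
    pub transitions: &'static [Transition],
}

/// Pre-parsed data built at runtime, e.g. after reading a file from disk.
pub struct TzifOwned {
    pub transitions: Vec<Transition>,
}

/// Common read-only view so that a consumer can treat both forms alike.
pub trait TzifData {
    fn transitions(&self) -> &[Transition];
}

impl TzifData for TzifStatic {
    fn transitions(&self) -> &[Transition] {
        self.transitions
    }
}

impl TzifData for TzifOwned {
    fn transitions(&self) -> &[Transition] {
        &self.transitions
    }
}

/// Offset in effect at `timestamp`, or `None` before the first transition.
pub fn offset_at(data: &dyn TzifData, timestamp: i64) -> Option<i32> {
    let ts = data.transitions();
    // Number of transitions that start at or before `timestamp`.
    let count = ts.partition_point(|t| t.since <= timestamp);
    count.checked_sub(1).map(|i| ts[i].offset)
}

/// Example data that needs no parsing at runtime.
pub static EXAMPLE: TzifStatic = TzifStatic {
    transitions: &[
        Transition { since: 0, offset: 3600 },
        Transition { since: 1000, offset: 7200 },
    ],
};

fn main() {
    let owned = TzifOwned { transitions: EXAMPLE.transitions.to_vec() };
    println!("{:?}", offset_at(&EXAMPLE, 500));
    println!("{:?}", offset_at(&owned, 2000));
}
```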

@BurntSushi
Owner

Nothing of consequence can live in jiff-tzdb-platform. For example, there is no supported Jiff configuration where users on Linux or macOS would depend on jiff-tzdb-platform (transitively or otherwise).

Yeah, jiff-tzdb and jiff-tzdb-platform could not return TimeZones directly. They would need to return something like (&'static str, TzifStatic)/(String, TzifOwned), which jiff can then convert into TimeZones. I don't entirely grasp the combination of crates/features, but these types could either live in the jiff-tzdb* crates, or a new jiff-tzif crate if both crates need to be usable at the same time.

I'm not going to do this because of my dependency philosophy. I have no interest (at this point, I may in the more distant future) in maintaining a semver boundary that exposes TZif parsing (and likely POSIX time zones and possibly other things depending on how the API is designed).

I think the static_tzdb proc macro I mentioned probably covers the benefits you're seeking here?

@robertbastian
Contributor Author

For example, there is no supported Jiff configuration where users on Linux or macOS would depend on jiff-tzdb-platform (transitively or otherwise).

Oh I was under the impression that jiff-tzdb-platform does the platform-specific lookup and parsing, but it's really just an empty crate.

If the UNIX-tzdb parsing logic lives in jiff (not jiff-tzdb-platform), can the built-in tzdb not also live in jiff (not jiff-tzdb)? jiff-tzdb could still be a dummy crate if needed.

I think the static_tzdb proc macro I mentioned probably covers the benefits you're seeking here?

This is not for my specific use case, but for the default user. As long as jiff-tzdb contains a big binary blob, there will be code paths that parse this blob at runtime, and cache the results, even though the parsing could happen at compile-time and caching would not be necessary.
You could say that this is on Windows only (I think?), and it is no less efficient than on Linux, but on Linux users pay this price in order to use latest data from the system, on Windows they pay this price for nothing.

static_tzdb would allow a user to work around this, but the blob would still be there, and used by default.
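
To make the two code paths concrete, here is a toy sketch contrasting "parse a blob at runtime and cache the result" with "look up data that was already parsed at compile time". The four-byte blob format, the `Parsed` type and every function name are assumptions for illustration; this is not how jiff is implemented.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};

// Hypothetical sketch: a toy parser and cache, not jiff's implementation.

/// Counts how often the toy parser runs.
static PARSE_CALLS: AtomicUsize = AtomicUsize::new(0);

pub fn parse_calls() -> usize {
    PARSE_CALLS.load(Ordering::SeqCst)
}

#[derive(Debug, PartialEq, Eq)]
pub struct Parsed {
    pub offset: i32,
}

/// Toy blob: four big-endian bytes holding a UTC offset in seconds (3600).
const ZONE_BLOB: &[u8] = &[0, 0, 0x0E, 0x10];

fn blob_for(name: &str) -> Option<&'static [u8]> {
    if name == "Example/Zone" { Some(ZONE_BLOB) } else { None }
}

fn parse(blob: &[u8]) -> Option<Parsed> {
    PARSE_CALLS.fetch_add(1, Ordering::SeqCst);
    if blob.len() != 4 {
        return None;
    }
    let bytes = [blob[0], blob[1], blob[2], blob[3]];
    Some(Parsed { offset: i32::from_be_bytes(bytes) })
}

fn cache() -> &'static Mutex<HashMap<String, Arc<Parsed>>> {
    static CACHE: OnceLock<Mutex<HashMap<String, Arc<Parsed>>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Runtime path: parse on first use, then serve clones from the cache.
pub fn get_runtime(name: &str) -> Option<Arc<Parsed>> {
    let mut map = cache().lock().unwrap();
    if let Some(hit) = map.get(name) {
        return Some(Arc::clone(hit));
    }
    let parsed = Arc::new(parse(blob_for(name)?)?);
    map.insert(name.to_string(), Arc::clone(&parsed));
    Some(parsed)
}

/// Static path: the data is already in parsed form, so no cache is needed.
static PREPARSED: &[(&str, Parsed)] = &[("Example/Zone", Parsed { offset: 3600 })];

pub fn get_static(name: &str) -> Option<&'static Parsed> {
    PREPARSED.iter().find(|(n, _)| *n == name).map(|(_, p)| p)
}

fn main() {
    let a = get_runtime("Example/Zone").unwrap();
    let b = get_runtime("Example/Zone").unwrap();
    println!("runtime: {} {} (parser ran {} time(s))", a.offset, b.offset, parse_calls());
    println!("static: {}", get_static("Example/Zone").unwrap().offset);
}
```

The runtime path pays for a lock, a hash map and a reference count on every lookup; the static path only reads from a table.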

@BurntSushi
Owner

If the UNIX-tzdb parsing logic lives in jiff (not jiff-tzdb-platform), can the built-in tzdb not also live in jiff (not jiff-tzdb)? jiff-tzdb could still be a dummy crate if needed.

I don't know how to do it. The requirements are:

  • Bundled TZif data is used by default on platforms without established practice for accessing a system copy of the IANA time zone database.
  • Bundled TZif data is not included at all by default on platforms with an established practice for accessing a system copy of the IANA time zone database.
  • Users of Jiff may opt into always including bundled TZif data, even when, e.g., /usr/share/zoneinfo is available.
  • These decisions must be made at compile time to avoid bundling data into the binary when it isn't necessary, and they should not require heavy-weight dependencies (like syn) by default.
  • Overall, Jiff should own the decision by default to avoid the scenario where end users are forced to figure all of this out on their own (like they do with chrono).

Jiff's crate structure to support this is an idiom (AFAIK, although I'm having trouble recalling other examples of this) for achieving this sort of target specific and optional behavior.
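
The first three requirements amount to a small truth table, which can be modelled as follows. This is only a sketch: the `Platform` and `Features` types and the `bundles_tzdb` function are hypothetical, and in the real crate the decision is made by Cargo features and target `cfg`s rather than by a function. The field names mirror the `tzdb-bundle-platform` and `tzdb-bundle-always` features.

```rust
// Hypothetical model of the bundling decision; not part of jiff.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Platform {
    Unix,
    Windows,
    Wasm,
}

impl Platform {
    /// Whether there is an established way to read a system copy of tzdb.
    pub const fn has_system_tzdb(self) -> bool {
        matches!(self, Platform::Unix)
    }
}

#[derive(Clone, Copy, Debug)]
pub struct Features {
    pub tzdb_bundle_platform: bool,
    pub tzdb_bundle_always: bool,
}

/// The default feature set: bundle only where the platform needs it.
pub const DEFAULT_FEATURES: Features = Features {
    tzdb_bundle_platform: true,
    tzdb_bundle_always: false,
};

/// A `const fn`, to echo the requirement that the decision is made at
/// compile time rather than at runtime.
pub const fn bundles_tzdb(platform: Platform, features: Features) -> bool {
    features.tzdb_bundle_always
        || (features.tzdb_bundle_platform && !platform.has_system_tzdb())
}

pub const UNIX_DEFAULT: bool = bundles_tzdb(Platform::Unix, DEFAULT_FEATURES);
pub const WINDOWS_DEFAULT: bool = bundles_tzdb(Platform::Windows, DEFAULT_FEATURES);

fn main() {
    println!("Unix bundles by default: {}", UNIX_DEFAULT);
    println!("Windows bundles by default: {}", WINDOWS_DEFAULT);
}
```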

If you're fine depending on Jiff, then jiff-tzdb should be treated as an implementation detail generally.

This is not for my specific use case, but for the default user. As long as jiff-tzdb contains a big binary blob, there will be code paths that parse this blob at runtime, and cache the results, even though the parsing could happen at compile-time and caching would not be necessary.

You could say that this is on Windows only (I think?), and it is no less efficient than on Linux, but on Linux users pay this price in order to use latest data from the system, on Windows they pay this price for nothing.

IMO, these costs are not generally significant. And if they are significant, I'm fine with telling users they should use the proc macro. The proc macro also potentially permits greater savings by constructing a TimeZoneDatabase with a subset of time zones.

If there was a way to avoid these costs easily without doing something else that I perceive as costly, then I'm open to it. But I don't really see a way (with maybe 90% confidence). The proc macro squares this circle by doing code generation in a context where jiff is available, but I don't see how to do that for something like jiff-tzdb in a way that meets the requirements above.

If I were open to pushing the TZif parser (and everything it requires) out into a new crate (say, jiff-core or something), then it might be possible to make this work. Basically, jiff-tzdb would depend on jiff-core, and it would provide an API like what you suggested above. And then we'd need a public constructor for TimeZone that takes this jiff-tzdb data type and builds a TimeZone. All of that infrastructure is already there, since it was required to build it for the proc macro. But none of it is exposed over a semver boundary. We can get away with this because of the conventions around proc macros (e.g., serde provides one-version-compatibility with serde-derive, and jiff will do the same for jiff-static).

I have a history of splitting crates apart into semver boundaries, so I wouldn't be surprised if that eventually happens with Jiff. But I am definitely not going to do it before Jiff 1.0, and I'll avoid it altogether if I can. But once jiff 1.0 is out, that doesn't mean jiff-tzdb will need to be 1.0 and can't have breaking changes. So it's possible to still iterate there.

It's also important to acknowledge that some of the difficulty in the design space here is a direct consequence of Jiff's API being a closed system, whereas chrono's API is an open system that allows arbitrary extensibility. This gives some more flexibility in how dependencies are arranged. For example, in Jiff, you can only build a TimeZone with public APIs in Jiff. But in chrono, anyone can build something that implements the TimeZone trait and use it with chrono::DateTime.
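
The difference can be sketched with toy types. These are not the real jiff or chrono definitions; the modules `closed` and `open`, and the `ThirdPartyZone` type, are hypothetical and exist only to show the shape of each design.

```rust
// Hypothetical sketch; these are not the real jiff or chrono types.

/// Closed design: the field is private, so a value can only be built
/// through constructors that the defining crate chooses to expose.
pub mod closed {
    pub struct TimeZone {
        offset_seconds: i32,
    }

    impl TimeZone {
        pub fn fixed(offset_seconds: i32) -> TimeZone {
            TimeZone { offset_seconds }
        }

        pub fn offset_at(&self, _timestamp: i64) -> i32 {
            self.offset_seconds
        }
    }
}

/// Open design: any crate may implement the trait for its own type.
pub mod open {
    pub trait TimeZone {
        fn offset_at(&self, timestamp: i64) -> i32;
    }

    /// Generic code accepts every implementor, including foreign ones.
    pub fn local_seconds<Tz: TimeZone>(tz: &Tz, timestamp: i64) -> i64 {
        timestamp + i64::from(tz.offset_at(timestamp))
    }
}

/// A zone defined outside the `open` module, e.g. by a crate that ships
/// its own pre-parsed data.
pub struct ThirdPartyZone {
    pub switch_at: i64,
    pub before: i32,
    pub after: i32,
}

impl open::TimeZone for ThirdPartyZone {
    fn offset_at(&self, timestamp: i64) -> i32 {
        if timestamp < self.switch_at { self.before } else { self.after }
    }
}

fn main() {
    let tz = ThirdPartyZone { switch_at: 100, before: 0, after: 3600 };
    println!("{}", open::local_seconds(&tz, 200));
    println!("{}", closed::TimeZone::fixed(3600).offset_at(0));
}
```

In the open design a data crate could plug in its own zone type without the core crate knowing about it; in the closed design the core crate must offer a public constructor for every source of data.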

@robertbastian
Contributor Author

I'm trying very hard to understand why the crate split is necessary. jiff-tzdb and jiff-tzdb-platform don't have any features themselves, or any platform-specific behaviour, so why can this logic not be expressed in jiff itself?

@BurntSushi
Owner

The platform specific behavior is in Cargo.toml.

I'll noodle on this to see if there is another way.

@BurntSushi
Owner

I may be forgetting a requirement.

@robertbastian
Contributor Author

I have been able to remove both jiff-tzdb-platform and jiff-tzdb, and I think the behaviour is the same. I inlined the data of jiff-tzdb into jiff, which opens the door to replacing the binary blob with TzifStatics.

master...robertbastian:jiff:tzdb

The other user of jiff-tzdb is the new get! macro, which I have removed in my branch for now.

@BurntSushi
Owner

I remember now why I split it out into a separate crate. I did it so that people who download jiff but don't need a bundled copy of tzdb wouldn't also need to download the tzdb. For example, today on Linux and macOS systems, if you depend on jiff in its default configuration, you won't download jiff-tzdb.

With that said, I'm not sure how much that's worth it. According to crates.io, jiff is over 600KB and jiff-tzdb is only 80KB. And that's before your de-duplication. So adding the tzdb to Jiff proper doesn't seem like it would expand the size much. On the other hand, I haven't spent a lot of time optimizing the size of Jiff, so maybe it's not such a good idea to bundle tzdb with all downloads of Jiff.

As I said above, I generally don't consider the costs you've highlighted to be significant. So it's hard to choose between eliminating said costs versus the costs of extra bandwidth for every download of jiff.

@robertbastian
Contributor Author

I remember now why I split it out into a separate crate. I did it so that people who download jiff but don't need a bundled copy of tzdb wouldn't also need to download the tzdb.

I don't think that's how it works? If I cargo add jiff in an empty project (on Mac), jiff-tzdb appears in the lockfile (as do all Windows dependencies).

According to crates.io, jiff is over 600KB and jiff-tzdb is only 80KB.

My PR reduced the blob size to 200kB, so the crates.io size cannot be correct. That said, the data is being downloaded anyway. The tradeoff is actually between the tzif blob and what I'm proposing, because as preparsed Rust code, this will be bigger.

As I said above, I generally don't consider the costs you've highlighted to be significant.

Inlining the crates also means a reduced API surface to stabilise, so these optimisations can be punted down the road.

@BurntSushi
Owner

A crate being in the lock file doesn't imply it is being downloaded AFAIK. Certainly worth double checking.

The crates.io size is likely compressed. Depending on the compression, the deduplication may not help it much.

@BurntSushi
Owner

jiff-tzdb is separately versioned. I think the change you are proposing is semver compatible for jiff itself. And I don't think it requires any new APIs in jiff. Which I think means the only API reduction is jiff-tzdb itself, which I don't think is significant. And jiff-tzdb being separately versioned means it can evolve independently of jiff. (Just like regex-syntax and regex-automata can and do evolve independently of regex.)

@BurntSushi
Owner

And even if cargo add downloads jiff-tzdb, that doesn't mean a cargo build will if it's already in the lock file. I would be quite surprised otherwise.

@BurntSushi added the question label Mar 3, 2025
@BurntSushi
Owner

OK, let's try it out. Starting with a clean project and no deps:

$ ls -l
total 16
-rw-rw-r-- 1 andrew users 153 Mar  3 09:14 Cargo.lock
-rw-rw-r-- 1 andrew users 173 Mar  3 09:14 Cargo.toml
-rw-rw-r-- 1 andrew users  45 Mar  3 09:14 main.rs
-rw-rw-r-- 1 andrew users  44 Mar  3 09:14 rustfmt.toml

$ cat Cargo.toml
[package]
publish = false
name = "jiff-play"
version = "0.1.0"
edition = "2021"

[dependencies]

[[bin]]
name = "jiff-play"
path = "main.rs"

[profile.release]
debug = true

$ cat Cargo.lock
# This file is automatically @generated by Cargo.
# It is not intended for manual editing.
version = 4

[[package]]
name = "jiff-play"
version = "0.1.0"

Now let's do a cargo add jiff:

$ cargo add jiff --verbose
    Updating crates.io index
      Adding jiff v0.2.1 to dependencies
             Features:
             + alloc
             + std
             + tz-system
             + tzdb-bundle-platform
             + tzdb-concatenated
             + tzdb-zoneinfo
             - js
             - logging
             - serde
             - tzdb-bundle-always
    Updating crates.io index
     Locking 22 packages to latest compatible versions
      Adding jiff v0.2.1
      Adding jiff-tzdb v0.1.2
      Adding jiff-tzdb-platform v0.1.2
      Adding log v0.4.26
      Adding portable-atomic v1.11.0
      Adding portable-atomic-util v0.2.4
      Adding proc-macro2 v1.0.94
      Adding quote v1.0.39
      Adding serde v1.0.218
      Adding serde_derive v1.0.218
      Adding syn v2.0.99
      Adding unicode-ident v1.0.17
      Adding windows-sys v0.59.0
      Adding windows-targets v0.52.6
      Adding windows_aarch64_gnullvm v0.52.6
      Adding windows_aarch64_msvc v0.52.6
      Adding windows_i686_gnu v0.52.6
      Adding windows_i686_gnullvm v0.52.6
      Adding windows_i686_msvc v0.52.6
      Adding windows_x86_64_gnu v0.52.6
      Adding windows_x86_64_gnullvm v0.52.6
      Adding windows_x86_64_msvc v0.52.6

I don't see anything being downloaded, although it's not obvious to me that Cargo would even tell me if something were downloaded? Maybe we can do a little better by first clearing Cargo's .crate cache:

$ rm -rf ~/.cargo/{git,registry}
$ tree -a ~/.cargo/
/home/andrew/.cargo/
├── bin
│   ├── find-invalid-utf8
│   ├── ppbinary
│   ├── sampler
│   ├── searchsms
│   ├── setup-system-links
│   └── write-weird-unicode-data
├── .crates2.json
├── .crates.toml
├── env
├── .global-cache
├── .package-cache
└── .package-cache-mutate

2 directories, 12 files

OK, reverting my project back to a clean state (no dependencies, no target, cache is still clear), let's see what happens when I do cargo add jiff:

$ cargo add jiff
$ tree -a ~/.cargo/
/home/andrew/.cargo/
├── bin
│   ├── find-invalid-utf8
│   ├── ppbinary
│   ├── sampler
│   ├── searchsms
│   ├── setup-system-links
│   └── write-weird-unicode-data
├── .crates2.json
├── .crates.toml
├── env
├── .global-cache
├── .package-cache
├── .package-cache-mutate
└── registry
    ├── CACHEDIR.TAG
    └── index
        └── index.crates.io-1949cf8c6b5b557f
            ├── .cache
            │   ├── 3
            │   │   ├── l
            │   │   │   └── log
            │   │   └── s
            │   │       └── syn
            │   ├── ji
            │   │   └── ff
            │   │       ├── jiff
            │   │       ├── jiff-tzdb
            │   │       └── jiff-tzdb-platform
            │   ├── po
            │   │   └── rt
            │   │       ├── portable-atomic
            │   │       └── portable-atomic-util
            │   ├── pr
            │   │   └── oc
            │   │       └── proc-macro2
            │   ├── qu
            │   │   └── ot
            │   │       └── quote
            │   ├── se
            │   │   └── rd
            │   │       ├── serde
            │   │       └── serde_derive
            │   ├── un
            │   │   └── ic
            │   │       └── unicode-ident
            │   └── wi
            │       └── nd
            │           ├── windows_aarch64_gnullvm
            │           ├── windows_aarch64_msvc
            │           ├── windows_i686_gnu
            │           ├── windows_i686_gnullvm
            │           ├── windows_i686_msvc
            │           ├── windows-sys
            │           ├── windows-targets
            │           ├── windows_x86_64_gnu
            │           ├── windows_x86_64_gnullvm
            │           └── windows_x86_64_msvc
            └── config.json

23 directories, 36 files

Now something about jiff-tzdb has been downloaded. But I would have expected this to "just" be metadata and not the entire crate. Indeed:

$ ls -lh ~/.cargo/registry/index/index.crates.io-1949cf8c6b5b557f/.cache/ji/ff/jiff-tzdb
-rw-rw-r-- 1 andrew users 736 Mar  3 09:21 /home/andrew/.cargo/registry/index/index.crates.io-1949cf8c6b5b557f/.cache/ji/ff/jiff-tzdb
$ cat ~/.cargo/registry/index/index.crates.io-1949cf8c6b5b557f/.cache/ji/ff/jiff-tzdb
etag: "2594d81cb0e2c0cec60e4f576a859764"0.0.1{"name":"jiff-tzdb","vers":"0.0.1","deps":[],"cksum":"fe66b2d044d8462898bed063b4f86674148955185353b04ec8a269207cad90a0","features":{},"yanked":false}0.1.0{"name":"jiff-tzdb","vers":"0.1.0","deps":[],"cksum":"05fac328b3df1c0f18a3c2ab6cb7e06e4e549f366017d796e3e66b6d6889abe6","features":{},"yanked":false,"rust_version":"1.70"}0.1.1{"name":"jiff-tzdb","vers":"0.1.1","deps":[],"cksum":"91335e575850c5c4c673b9bd467b0e025f164ca59d0564f69d0c2ee0ffad4653","features":{},"yanked":false,"rust_version":"1.70"}0.1.2{"name":"jiff-tzdb","vers":"0.1.2","deps":[],"cksum":"cf2cec2f5d266af45a071ece48b1fb89f3b00b2421ac3a5fe10285a6caaa60d3","features":{},"yanked":false,"rust_version":"1.70"}%

OK cool, makes sense. And now let's try a cargo build and see what happens:

$ cargo b
  Downloaded log v0.4.26
  Downloaded serde v1.0.218
  Downloaded jiff v0.2.1
  Downloaded 3 crates (798.4 KB) in 0.18s
   Compiling jiff v0.2.1
   Compiling jiff-play v0.1.0 (/home/andrew/tmp/x/rust/jiff-play)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 2.16s
$ tree -a ~/.cargo
/home/andrew/.cargo/
├── bin
│   ├── find-invalid-utf8
│   ├── ppbinary
│   ├── sampler
│   ├── searchsms
│   ├── setup-system-links
│   └── write-weird-unicode-data
├── .crates2.json
├── .crates.toml
├── env
├── .global-cache
├── .package-cache
├── .package-cache-mutate
└── registry
    ├── cache
    │   └── index.crates.io-1949cf8c6b5b557f
    │       ├── jiff-0.2.1.crate
    │       ├── log-0.4.26.crate
    │       └── serde-1.0.218.crate
    ├── CACHEDIR.TAG
    ├── index
    │   └── index.crates.io-1949cf8c6b5b557f
    │       ├── .cache
    │       │   ├── 3
    │       │   │   ├── l
    │       │   │   │   └── log
    │       │   │   └── s
    │       │   │       └── syn
    │       │   ├── ji
    │       │   │   └── ff
    │       │   │       ├── jiff
    │       │   │       ├── jiff-tzdb
    │       │   │       └── jiff-tzdb-platform
    │       │   ├── po
    │       │   │   └── rt
    │       │   │       ├── portable-atomic
    │       │   │       └── portable-atomic-util
    │       │   ├── pr
    │       │   │   └── oc
    │       │   │       └── proc-macro2
    │       │   ├── qu
    │       │   │   └── ot
    │       │   │       └── quote
    │       │   ├── se
    │       │   │   └── rd
    │       │   │       ├── serde
    │       │   │       └── serde_derive
    │       │   ├── un
    │       │   │   └── ic
    │       │   │       └── unicode-ident
    │       │   └── wi
    │       │       └── nd
    │       │           ├── windows_aarch64_gnullvm
    │       │           ├── windows_aarch64_msvc
    │       │           ├── windows_i686_gnu
    │       │           ├── windows_i686_gnullvm
    │       │           ├── windows_i686_msvc
    │       │           ├── windows-sys
    │       │           ├── windows-targets
    │       │           ├── windows_x86_64_gnu
    │       │           ├── windows_x86_64_gnullvm
    │       │           └── windows_x86_64_msvc
    │       └── config.json
    └── src
        └── index.crates.io-1949cf8c6b5b557f
            ├── jiff-0.2.1
            │   ├── Cargo.lock
            │   ├── .cargo-ok
            │   ├── Cargo.toml
            │   ├── Cargo.toml.orig
            │   ├── .cargo_vcs_info.json
            │   ├── CHANGELOG.md
            │   ├── COMPARE.md
            │   ├── COPYING
            │   ├── DESIGN.md
            │   ├── LICENSE-MIT
            │   ├── PLATFORM.md
            │   ├── README.md
            │   ├── src
            │   │   ├── civil
            │   │   │   ├── date.rs
            │   │   │   ├── datetime.rs
            │   │   │   ├── iso_week_date.rs
            │   │   │   ├── mod.rs
            │   │   │   ├── time.rs
            │   │   │   └── weekday.rs
            │   │   ├── duration.rs
            │   │   ├── error.rs
            │   │   ├── fmt
            │   │   │   ├── friendly
            │   │   │   │   ├── mod.rs
            │   │   │   │   ├── parser_label.rs
            │   │   │   │   ├── parser.rs
            │   │   │   │   └── printer.rs
            │   │   │   ├── mod.rs
            │   │   │   ├── offset.rs
            │   │   │   ├── rfc2822.rs
            │   │   │   ├── rfc9557.rs
            │   │   │   ├── serde.rs
            │   │   │   ├── strtime
            │   │   │   │   ├── format.rs
            │   │   │   │   ├── mod.rs
            │   │   │   │   └── parse.rs
            │   │   │   ├── temporal
            │   │   │   │   ├── mod.rs
            │   │   │   │   ├── parser.rs
            │   │   │   │   ├── pieces.rs
            │   │   │   │   └── printer.rs
            │   │   │   └── util.rs
            │   │   ├── lib.rs
            │   │   ├── logging.rs
            │   │   ├── now.rs
            │   │   ├── signed_duration.rs
            │   │   ├── span.rs
            │   │   ├── timestamp.rs
            │   │   ├── tz
            │   │   │   ├── concatenated.rs
            │   │   │   ├── db
            │   │   │   │   ├── bundled
            │   │   │   │   │   ├── disabled.rs
            │   │   │   │   │   ├── enabled.rs
            │   │   │   │   │   └── mod.rs
            │   │   │   │   ├── concatenated
            │   │   │   │   │   ├── disabled.rs
            │   │   │   │   │   ├── enabled.rs
            │   │   │   │   │   └── mod.rs
            │   │   │   │   ├── mod.rs
            │   │   │   │   └── zoneinfo
            │   │   │   │       ├── disabled.rs
            │   │   │   │       ├── enabled.rs
            │   │   │   │       └── mod.rs
            │   │   │   ├── mod.rs
            │   │   │   ├── offset.rs
            │   │   │   ├── posix.rs
            │   │   │   ├── system
            │   │   │   │   ├── android.rs
            │   │   │   │   ├── mod.rs
            │   │   │   │   ├── unix.rs
            │   │   │   │   ├── wasm_js.rs
            │   │   │   │   └── windows
            │   │   │   │       ├── mod.rs
            │   │   │   │       └── windows_zones.rs
            │   │   │   ├── testdata.rs
            │   │   │   ├── tzif.rs
            │   │   │   └── zic.rs
            │   │   ├── util
            │   │   │   ├── array_str.rs
            │   │   │   ├── borrow.rs
            │   │   │   ├── cache.rs
            │   │   │   ├── common.rs
            │   │   │   ├── crc32
            │   │   │   │   ├── mod.rs
            │   │   │   │   └── table.rs
            │   │   │   ├── escape.rs
            │   │   │   ├── fs.rs
            │   │   │   ├── libm.rs
            │   │   │   ├── mod.rs
            │   │   │   ├── parse.rs
            │   │   │   ├── rangeint.rs
            │   │   │   ├── round
            │   │   │   │   ├── increment.rs
            │   │   │   │   ├── mode.rs
            │   │   │   │   └── mod.rs
            │   │   │   ├── sync.rs
            │   │   │   ├── t.rs
            │   │   │   └── utf8.rs
            │   │   └── zoned.rs
            │   ├── tests
            │   │   └── lib.rs
            │   └── UNLICENSE
            ├── log-0.4.26
            │   ├── benches
            │   │   └── value.rs
            │   ├── Cargo.lock
            │   ├── .cargo-ok
            │   ├── Cargo.toml
            │   ├── Cargo.toml.orig
            │   ├── .cargo_vcs_info.json
            │   ├── CHANGELOG.md
            │   ├── .github
            │   │   └── workflows
            │   │       └── main.yml
            │   ├── .gitignore
            │   ├── LICENSE-APACHE
            │   ├── LICENSE-MIT
            │   ├── README.md
            │   ├── src
            │   │   ├── kv
            │   │   │   ├── error.rs
            │   │   │   ├── key.rs
            │   │   │   ├── mod.rs
            │   │   │   ├── source.rs
            │   │   │   └── value.rs
            │   │   ├── lib.rs
            │   │   ├── macros.rs
            │   │   ├── __private_api.rs
            │   │   └── serde.rs
            │   └── triagebot.toml
            └── serde-1.0.218
                ├── build.rs
                ├── Cargo.lock
                ├── .cargo-ok
                ├── Cargo.toml
                ├── Cargo.toml.orig
                ├── .cargo_vcs_info.json
                ├── crates-io.md
                ├── LICENSE-APACHE
                ├── LICENSE-MIT
                ├── README.md
                └── src
                    ├── de
                    │   ├── ignored_any.rs
                    │   ├── impls.rs
                    │   ├── mod.rs
                    │   ├── seed.rs
                    │   ├── size_hint.rs
                    │   └── value.rs
                    ├── format.rs
                    ├── integer128.rs
                    ├── lib.rs
                    ├── macros.rs
                    ├── private
                    │   ├── de.rs
                    │   ├── doc.rs
                    │   ├── mod.rs
                    │   └── ser.rs
                    ├── ser
                    │   ├── fmt.rs
                    │   ├── impls.rs
                    │   ├── impossible.rs
                    │   └── mod.rs
                    └── std_error.rs

56 directories, 177 files

I don't see any jiff-tzdb.crate file, but I do see jiff.crate, log.crate and serde.crate. Which, actually, is quite confusing! Why are log.crate and serde.crate being downloaded? Even the cargo build output shows this, but they aren't being compiled. I don't get that.

And now if I forcefully enable jiff-tzdb on Linux, I can see that jiff-tzdb.crate is now downloaded:

$ cargo add jiff --features tzdb-bundle-always
    Updating crates.io index
      Adding jiff v0.2.1 to dependencies
             Features:
             + alloc
             + std
             + tz-system
             + tzdb-bundle-always
             + tzdb-bundle-platform
             + tzdb-concatenated
             + tzdb-zoneinfo
             - js
             - logging
             - serde
$ tree -a ~/.cargo/ | rg jiff-tzdb
    │       │   │       ├── jiff-tzdb
    │       │   │       └── jiff-tzdb-platform
$ cargo b
  Downloaded jiff-tzdb v0.1.2
  Downloaded 1 crate (82.2 KB) in 0.11s
   Compiling jiff-tzdb v0.1.2
   Compiling jiff v0.2.1
   Compiling jiff-play v0.1.0 (/home/andrew/tmp/x/rust/jiff-play)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 2.11s
$ tree -a ~/.cargo/ | rg jiff-tzdb
    │       ├── jiff-tzdb-0.1.2.crate
    │       │   │       ├── jiff-tzdb
    │       │   │       └── jiff-tzdb-platform
            ├── jiff-tzdb-0.1.2

So I think this is enough to convince me, absent other evidence, that putting the data into a separate crate does indeed avoid downloading it at all on Unix systems.

@robertbastian
Contributor Author

Ok, so this reduces jiff's one-time crates.io download size by 11%. It's an odd thing to optimise if you ask me. You could split out more code into other crates (and pay the semver price) if that is really an issue.

@BurntSushi
Owner

I feel I was appropriately circumspect in my comment above:

With that said, I'm not sure how much that's worth it. According to crates.io, jiff is over 600KB and jiff-tzdb is only 80KB. And that's before your de-duplication. So adding the tzdb to Jiff proper doesn't seem like it would expand the size much. On the other hand, I haven't spent a lot of time optimizing the size of Jiff, so maybe it's not such a good idea to bundle tzdb with all downloads of Jiff.

As I said above, I generally don't consider the costs you've highlighted to be significant. So it's hard to choose between eliminating said costs versus the costs of extra bandwidth for every download of jiff.

I don't see anything immediately actionable here on my end, so I'm going to close this. Happy to revisit if new data arises that influences the trade-offs here.

One other thing that isn't captured in this issue but I meant to mention is that temporal_rs is currently depending on jiff-tzdb. IDK if that's their long term plan or not, but having the data in a separate crate is beneficial for things like that. I wouldn't call it a primary use case, but it's worth noting here.

@BurntSushi closed this as not planned Mar 3, 2025
@robertbastian
Contributor Author

Isn't the jiff-tzdb-platform crate still unnecessary? And can jiff-tzdb not still return parsed Tzif structs (defined in that crate)?

@BurntSushi
Owner

BurntSushi commented Mar 3, 2025

Isn't the jiff-tzdb-platform crate still unnecessary?

I don't see how to get rid of jiff-tzdb-platform in a way that avoids unnecessarily downloading jiff-tzdb on Unix systems. To avoid downloading jiff-tzdb, I believe the "which platform is this used on" logic has to be embedded into Cargo.toml somehow. And since we want that outcome to be different depending on the platform, you need a middle-man to do it. Note for example that jiff-tzdb-platform is a target specific dependency of Jiff:

[target.'cfg(any(windows, target_family = "wasm"))'.dependencies]
jiff-tzdb-platform = { version = "0.1.2", path = "crates/jiff-tzdb-platform", optional = true }

If we cut out jiff-tzdb-platform, then how do we make the tzdb-bundle-platform feature only download jiff-tzdb on platforms where it is needed?

This idiom was suggested as far back as 2016.

And can jiff-tzdb not still return parsed Tzif structs (defined in that crate)?

I think so. There is a slight unknown though in supporting two different types of &'static pointers in a const context via pointer tagging. Seems surmountable, but I'm not 100% certain. Maybe forcing them to have different alignments would do the trick if pointer arithmetic doesn't.
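To make the open question concrete, here is a minimal sketch of the pointer-arithmetic variant of tagging. It is not Jiff's actual representation: `MacroTzif`, `TzdbTzif`, `Tagged`, and the tag values are hypothetical stand-ins, and both types are assumed to be at least 4-byte aligned so that the low two bits of any address are free. Attaching the tag uses `wrapping_add`, which is callable in a `const fn`; reading the tag back needs the address as an integer, which this sketch only does at runtime.

```rust
/// Hypothetical stand-in for a TZif value generated by a proc macro.
#[repr(align(4))]
pub struct MacroTzif {
    pub offset_seconds: i32,
}

/// Hypothetical stand-in for a TZif value defined by a separate data crate.
#[repr(align(4))]
pub struct TzdbTzif {
    pub offset_seconds: i32,
}

// Both types are 4-byte aligned, so the low two address bits are always zero.
const TAG_MASK: usize = 0b11;
const TAG_MACRO: usize = 0b01;
const TAG_TZDB: usize = 0b10;

/// A single pointer-sized handle that remembers which type it points at.
#[derive(Clone, Copy)]
pub struct Tagged {
    ptr: *const u8,
}

impl Tagged {
    /// Tagging works in const contexts because it is plain pointer arithmetic.
    pub const fn from_macro(tz: &'static MacroTzif) -> Tagged {
        let base = (tz as *const MacroTzif).cast::<u8>();
        Tagged { ptr: base.wrapping_add(TAG_MACRO) }
    }

    pub const fn from_tzdb(tz: &'static TzdbTzif) -> Tagged {
        let base = (tz as *const TzdbTzif).cast::<u8>();
        Tagged { ptr: base.wrapping_add(TAG_TZDB) }
    }

    /// Untagging inspects the address, so it is a runtime-only operation here.
    pub fn offset_seconds(self) -> i32 {
        let tag = (self.ptr as usize) & TAG_MASK;
        let base = self.ptr.wrapping_sub(tag);
        match tag {
            // SAFETY: the tag records which constructor built this handle,
            // and `base` is the original `&'static` address.
            TAG_MACRO => unsafe { (*base.cast::<MacroTzif>()).offset_seconds },
            TAG_TZDB => unsafe { (*base.cast::<TzdbTzif>()).offset_seconds },
            _ => unreachable!("only the two constructors create handles"),
        }
    }
}

/// Built entirely at compile time from a promoted `&'static` value.
pub const TZDB_EXAMPLE: Tagged =
    Tagged::from_tzdb(&TzdbTzif { offset_seconds: -18_000 });
```

Whether this carries over to the real `TimeZone` representation, with its existing tags, is exactly the part that would need to be tried out.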

I guess I'll re-open this for that issue, but honestly, it's not going to be a priority of mine and I'm not completely convinced it's even worth doing. I've said a few times now that I don't think the costs you've highlighted are significant, and the existence of the proc macro should act as a feasible alternative for cases where those costs are significant. So I'm not sure I really want to go through the effort of making jiff-tzdb expose a more structured representation that would likely be bigger in size than it is today.

@BurntSushi BurntSushi reopened this Mar 3, 2025
@robertbastian
Contributor Author

If we cut out jiff-tzdb-platform, then how do we make the tzdb-bundle-platform feature only download jiff-tzdb on platforms where it is needed?

[target.'cfg(not(any(windows, target_family = "wasm")))'.features]
tzdb-bundle-platform = []

[target.'cfg(any(windows, target_family = "wasm"))'.features]
tzdb-bundle-platform = ["dep:jiff-tzdb", "alloc"]

@BurntSushi
Owner

Where did you get that from? I don't see that documented anywhere, and if I try it, I get warnings from Cargo saying that it is unused:

warning: /home/andrew/code/rust/jiff/pr/Cargo.toml: unused manifest key: target.cfg(any(windows, target_family = "wasm")).features
warning: /home/andrew/code/rust/jiff/pr/Cargo.toml: unused manifest key: target.cfg(not(any(windows, target_family = "wasm"))).features

Please read through rust-lang/cargo#1197

@robertbastian
Contributor Author

Oh I misread that post.

Anyway, the jiff-tzdb-platform hack is orthogonal to this issue. Returning parsed structs from jiff-tzdb would not only reduce runtime costs, but would actually provide a fully no-alloc tzdb.

@BurntSushi
Owner

BurntSushi commented Mar 4, 2025

Anyway, the jiff-tzdb-platform hack is orthogonal to this issue.

Yes, indeed, I know...

Returning parsed structs from jiff-tzdb would not only reduce runtime costs, but would actually provide a fully no-alloc tzdb.

The no-alloc aspect of it is compelling, I'll grant you that.

I will likely experiment with this after I get the proc macro released, which will be another way to create a fully no-alloc tzdb. The advantage of the proc macro approach is that you can choose to include a subset of all time zones in your binary, whereas with jiff-tzdb, you're kinda forced to include everything in the binary. (Short of introducing build.rs filtering shenanigans, like what chrono-tz does, which I don't want to do.)

To summarize, the remaining concerns I have here are:

  • Increased binary/download size by switching from a &'static [u8] for all tzdb to Rust source code. I don't know how much is "too much" here.
  • Whether the pointer tagging representation of TimeZone can be made to support two different &'static TZif structures.

I think the only way to figure out whether those concerns are blocking is to try it out.

@robertbastian
Contributor Author

One thing I just realised is that the bundled database cache requires std, as it uses a global RwLock. This means a no-std (but alloc) caller will do a lot of parsing.
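For context on why such a cache is tied to std, here is a minimal sketch of a parse-once cache behind a global `RwLock`. It is not Jiff's implementation: `ParsedZone`, `parse`, `get_cached`, and `PARSE_COUNT` are hypothetical, and "parsing" is reduced to recording the input length. Without a lock type like this, a no-std caller has no obvious place to keep the parsed result, so each lookup would parse again.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, RwLock};

/// Hypothetical parsed form of one time zone.
pub struct ParsedZone {
    pub name: &'static str,
    pub raw_len: usize,
}

/// Counts how often the (stand-in) parser actually runs.
pub static PARSE_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for real TZif parsing; only records the input length.
fn parse(name: &'static str, raw: &'static [u8]) -> ParsedZone {
    PARSE_COUNT.fetch_add(1, Ordering::Relaxed);
    ParsedZone { name, raw_len: raw.len() }
}

/// The global cache. `RwLock` lives in std, which is why this needs std.
static CACHE: RwLock<BTreeMap<&'static str, Arc<ParsedZone>>> =
    RwLock::new(BTreeMap::new());

/// Returns the cached zone, parsing it only on the first request.
pub fn get_cached(name: &'static str, raw: &'static [u8]) -> Arc<ParsedZone> {
    // Fast path: shared read lock, released at the end of the `if let`.
    if let Some(zone) = CACHE.read().unwrap().get(name) {
        return Arc::clone(zone);
    }
    // Slow path: exclusive lock. `entry` covers the case where another
    // thread inserted the zone between the two lock acquisitions.
    let mut cache = CACHE.write().unwrap();
    Arc::clone(cache.entry(name).or_insert_with(|| Arc::new(parse(name, raw))))
}
```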

The advantage of the proc macro approach is that you can choose to include a subset of all time zones in your binary, whereas with jiff-tzdb, you're kinda forced to include everything in the binary.

I'm not convinced that that is something clients really need. You either handle time zones yourself, in which case you don't need a TZDB (i.e. the current macros are fine), or you need to handle TZDB IDs, in which case it's risky to not understand all of them.

I think a more useful size-reduction strategy would be to drop old time zone transitions. Many programs for example don't handle the past at all, and even those that do probably don't need zic's 1970 cutoff (which has not changed since 1990). Dropping transitions can remove a lot of data, while still supporting all TZDB IDs (a lot of zones will deduplicate, and many will become small POSIX zones).

Unlike the zone filtering, this is something you could extend to jiff-tzdb, e.g. have features like tzdb-bundled and tzdb-bundled-future.

@robertbastian
Contributor Author

Running zic with -r @1741101498 (now in UNIX time) reduces concatenated-zoneinfo.dat from 202kB to 16kB.

Looking at the files, most of even that is TZif header overhead, so preparsing would cut that down again.

% hexdump -C out/Europe/Zurich
00000000  54 5a 69 66 32 00 00 00  00 00 00 00 00 00 00 00  |TZif2...........|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  00 00 00 00 00 00 00 01  00 00 00 01 00 00 00 00  |................|
00000030  00 00 00 54 5a 69 66 32  00 00 00 00 00 00 00 00  |...TZif2........|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000050  00 00 00 00 00 00 01 00  00 00 02 00 00 00 08 00  |................|
00000060  00 00 00 67 c7 19 11 01  00 00 00 00 00 00 00 00  |...g............|
00000070  0e 10 00 04 2d 30 30 00  43 45 54 00 0a 43 45 54  |....-00.CET..CET|
00000080  2d 31 43 45 53 54 2c 4d  33 2e 35 2e 30 2c 4d 31  |-1CEST,M3.5.0,M1|
00000090  30 2e 35 2e 30 2f 33 0a                           |0.5.0/3.|
00000098
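To see where the bytes in a dump like this go, the following sketch decodes the fixed 44-byte TZif header (layout per RFC 8536: a 4-byte magic, a version byte, 15 reserved bytes, then six big-endian 32-bit counts) and derives the size of the data block that follows it. `Counts`, `parse_header`, `data_len`, and `ZURICH` are names invented for this sketch, and the byte array is transcribed from the hexdump above.

```rust
/// Length of the fixed TZif header in bytes.
pub const HEADER_LEN: usize = 44;

/// The six counts stored in a TZif header, in on-disk order.
#[derive(Debug, PartialEq, Eq)]
pub struct Counts {
    pub isutcnt: u32,
    pub isstdcnt: u32,
    pub leapcnt: u32,
    pub timecnt: u32,
    pub typecnt: u32,
    pub charcnt: u32,
}

/// Parses one header, returning the version byte and the counts.
pub fn parse_header(bytes: &[u8]) -> Option<(u8, Counts)> {
    if bytes.len() < HEADER_LEN || !bytes.starts_with(b"TZif") {
        return None;
    }
    // The counts start after magic (4) + version (1) + reserved (15).
    let count = |i: usize| -> u32 {
        let s = 20 + 4 * i;
        u32::from_be_bytes([bytes[s], bytes[s + 1], bytes[s + 2], bytes[s + 3]])
    };
    let counts = Counts {
        isutcnt: count(0),
        isstdcnt: count(1),
        leapcnt: count(2),
        timecnt: count(3),
        typecnt: count(4),
        charcnt: count(5),
    };
    Some((bytes[4], counts))
}

impl Counts {
    /// Size of the data block after the header. `time_size` is 4 for the
    /// version 1 block and 8 for the version 2+ block.
    pub fn data_len(&self, time_size: usize) -> usize {
        let n = |c: u32| c as usize;
        n(self.timecnt) * time_size // transition times
            + n(self.timecnt)       // transition type indices
            + n(self.typecnt) * 6   // local time type records
            + n(self.charcnt)       // designation strings
            + n(self.leapcnt) * (time_size + 4)
            + n(self.isstdcnt)
            + n(self.isutcnt)
    }
}

/// The 152 bytes of `out/Europe/Zurich` from the hexdump.
pub const ZURICH: [u8; 152] = [
    0x54, 0x5a, 0x69, 0x66, 0x32, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x54, 0x5a, 0x69, 0x66, 0x32, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x08, 0x00,
    0x00, 0x00, 0x00, 0x67, 0xc7, 0x19, 0x11, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x0e, 0x10, 0x00, 0x04, 0x2d, 0x30, 0x30, 0x00, 0x43, 0x45, 0x54, 0x00, 0x0a, 0x43, 0x45, 0x54,
    0x2d, 0x31, 0x43, 0x45, 0x53, 0x54, 0x2c, 0x4d, 0x33, 0x2e, 0x35, 0x2e, 0x30, 0x2c, 0x4d, 0x31,
    0x30, 0x2e, 0x35, 0x2e, 0x30, 0x2f, 0x33, 0x0a,
];
```

Applied to this dump, the two headers account for 88 of the 152 bytes, the two data blocks for 36, and the POSIX TZ footer for the remaining 28.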

@BurntSushi
Owner

One thing I just realised is that the bundled database cache requires std, as it uses a global RwLock. This means a no-std (but alloc) caller will do a lot of parsing.

You mean this as a downside of the status quo? Yes, I agree. In general I am more than fine with users without std needing to pay extra CPU costs for the same functionality. The same is true in regex in some cases, although it's perhaps not as severe here.

I'm not saying this to say that we should therefore not care about this, but just as an expression of my general design philosophy. I also perceive the proc macro as a release valve of sorts here.

I'm not convinced that that is something clients really need. You either handle time zones yourself, in which case you don't need a TZDB (i.e. the current macros are fine), or you need to handle TZDB IDs, in which case it's risky to not understand all of them.

Risky, yes. But it does increase flexibility. Being able to use a TimeZoneDatabase, and also set it as the global tzdb (much like log lets you set a logger), I think would greatly improve ergonomics.

It may be prudent to hold off adding this macro for now, but it's a very small amount of work and unlocks new workflows.

I think a more useful size-reduction strategy would be to drop old time zone transitions. Many programs for example don't handle the past at all, and even those that do probably don't need zic's 1970 cutoff (which has not changed since 1990). Dropping transitions can remove a lot of data, while still supporting all TZDB IDs (a lot of zones will deduplicate, and many will become small POSIX zones).

I'm hesitant to do this in jiff-tzdb because I think it could be potentially very surprising to users, and the failure modes are likely silent. Users likely won't know anything is wrong until it shows up as an inconsistency with some other system.

I am okay with making such size reductions opt-in, but that is hard to do with jiff-tzdb since the data needs to be embedded into the crate. Everything gets downloaded, although using Cargo features for opting into size reductions would still help binary size. With the proc macro though, I think it's potentially easier to expose these size reductions because the proc macro owns the generation of TZif data. For jiff-tzdb, that all has to be fixed into the source code shipped to crates.io.


This issue has used up a lot of my time so far. I have some action items, as discussed above, to experiment with. When I do that, I'll report back with my results.

@robertbastian
Contributor Author

You mean this as a downside of the status quo?

Yes

I'm hesitant to do this in jiff-tzdb because I think it could be potentially very surprising to users, and the failure modes are likely silent.

They shouldn't be:

The output files use UT offset 0 and abbreviation “-00” in place of the omitted timestamp data.

This is not a problem specific to this proposal, different zoneinfos can already use different cutoffs. My Mac's Europe/Zurich starts in 1853, but technically pre-1970 is out of scope for the TZDB, so I wouldn't be surprised if other systems excluded them.

Everything gets downloaded, although using Cargo features for opting into size reductions would still help binary size.

I don't think it's fair to evaluate such proposals on the grounds of crates.io download sizes. It doesn't "still help binary size" despite the download size, it helps binary size, significantly, and if some dev has to download 10kB (1.3% of jiff + jiff-tzdb's size) more, once, so what?

With the proc macro though, I think it's potentially easier to expose these size reductions because the proc macro owns the generation of TZif data. For jiff-tzdb, that all has to be fixed into the source code shipped to crates.io.

I agree that the proc macro is the better place for this, as that can offer an opt-in, fully customisable TZDB. However, I think that for the sake of usability, these options should also be exposed in easier ways, and for the sake of maintainability, jiff-tzdb's build script should start using the proc macro code at some point.

This issue has used up a lot of my time so far.

Yeah, same. But I think it's worth the time, because I think jiff is a better library than chrono in many areas, just not this right now. And it's not my goal to tell you what to work on, I'm happy to work on this myself if we can agree on an approach.

@BurntSushi
Owner

This is not a problem specific to this proposal, different zoneinfos can already use different cutoffs. My Mac's Europe/Zurich starts in 1853, but technically pre-1970 is out of scope for the TZDB, so I wouldn't be surprised if other systems excluded them.

I'm aware. The problem arises when one application uses different tzdb data than another. "When I use java.time I get this result, but now when I switch to Jiff I get this other result."

I don't think it's fair to evaluate such proposals on the grounds of crates.io download sizes. It doesn't "still help binary size" despite the download size, it helps binary size, significantly, and if some dev has to download 10kB (1.3% of jiff + jiff-tzdb's size) more, once, so what?

I don't really have the energy to engage with you on this. Bandwidth matters for lots of reasons and I don't agree with your framing. I am not making a specific proposal as to how much weight to attach to it. I don't put a ton of weight on it. I would very likely prioritize binary size over bandwidth. But it is a concern that is on my mind, and it matters if there is a release valve (like the proc macro) for achieving smaller binary size when that is important.

Yeah, same. But I think it's worth the time, because I think jiff is a better library than chrono in many areas, just not this right now. And it's not my goal to tell you what to work on, I'm happy to work on this myself if we can agree on an approach.

I laid out what I think the next steps are in this comment. None of the comments that have followed have changed that. The next step is to actually try out the proposed jiff-tzdb design and see how that impacts binary size and whether it's sound to achieve with pointer tagging.

@BurntSushi
Owner

I am un-subscribing from this thread. Future responses may take longer than past responses have.

@robertbastian robertbastian closed this as not planned Mar 4, 2025
@BurntSushi BurntSushi reopened this Mar 4, 2025
@BurntSushi
Owner

With #287 merged, these are now the type definitions used for representing Tzif data:

#[derive(Clone, Debug)]
pub struct Tzif<STR, ABBREV, TYPES, TIMESTAMPS, STARTS, ENDS, INFOS> {
    pub fixed: TzifFixed<STR, ABBREV>,
    pub types: TYPES,
    pub transitions: TzifTransitions<TIMESTAMPS, STARTS, ENDS, INFOS>,
}

#[derive(Clone, Debug)]
pub struct TzifFixed<STR, ABBREV> {
    pub name: Option<STR>,
    pub version: u8,
    pub checksum: u32,
    pub designations: STR,
    pub posix_tz: Option<PosixTimeZone<ABBREV>>,
}

#[derive(Clone, Copy, Debug)]
pub struct TzifLocalTimeType {
    pub offset: i32,
    pub is_dst: bool,
    pub designation: (u8, u8),
    pub indicator: TzifIndicator,
}

#[derive(Clone, Copy, Debug)]
pub enum TzifIndicator {
    LocalWall,
    LocalStandard,
    UTStandard,
}

#[derive(Clone, Debug)]
pub struct TzifTransitions<TIMESTAMPS, STARTS, ENDS, INFOS> {
    pub timestamps: TIMESTAMPS,
    pub civil_starts: STARTS,
    pub civil_ends: ENDS,
    pub infos: INFOS,
}

#[derive(Clone, Copy, Debug)]
pub struct TzifTransitionInfo {
    pub type_index: u8,
    pub kind: TzifTransitionKind,
}

#[derive(Clone, Copy, Debug)]
pub enum TzifTransitionKind {
    Unambiguous,
    Gap,
    Fold,
}

#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq, PartialOrd, Ord)]
pub struct TzifDateTime {
    bits: i64,
}

impl TzifDateTime {
    pub const ZERO: TzifDateTime = TzifDateTime::new(0, 0, 0, 0, 0, 0);

    pub const fn new(
        year: i16,
        month: i8,
        day: i8,
        hour: i8,
        minute: i8,
        second: i8,
    ) -> TzifDateTime {
        let mut bits = (year as u64) << 48;
        bits |= (month as u64) << 40;
        bits |= (day as u64) << 32;
        bits |= (hour as u64) << 24;
        bits |= (minute as u64) << 16;
        bits |= (second as u64) << 8;
        // The least significant 8 bits remain 0.
        TzifDateTime { bits: bits as i64 }
    }

    pub const fn year(self) -> i16 {
        (self.bits as u64 >> 48) as u16 as i16
    }

    pub const fn month(self) -> i8 {
        (self.bits as u64 >> 40) as u8 as i8
    }

    pub const fn day(self) -> i8 {
        (self.bits as u64 >> 32) as u8 as i8
    }

    pub const fn hour(self) -> i8 {
        (self.bits as u64 >> 24) as u8 as i8
    }

    pub const fn minute(self) -> i8 {
        (self.bits as u64 >> 16) as u8 as i8
    }

    pub const fn second(self) -> i8 {
        (self.bits as u64 >> 8) as u8 as i8
    }
}

I've excluded POSIX time zones, and for now, this does include the IANA time zone identifier too (ref #285).
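A small usage sketch for `TzifDateTime`, with the type repeated in reduced form so the block stands alone (only the `year`, `month`, and `second` accessors are kept; `round_trip` is a hypothetical helper). It shows the round trip through the packed `i64` and one apparent consequence of putting the year in the most significant bits: the derived `Ord` compares values chronologically. That reading of the intent is an assumption; it is not stated in the thread.

```rust
#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq, PartialOrd, Ord)]
pub struct TzifDateTime {
    bits: i64,
}

impl TzifDateTime {
    /// Same packing as the definition above: year in the top 16 bits, then
    /// one byte each for month, day, hour, minute, and second.
    pub const fn new(
        year: i16,
        month: i8,
        day: i8,
        hour: i8,
        minute: i8,
        second: i8,
    ) -> TzifDateTime {
        let mut bits = (year as u64) << 48;
        bits |= (month as u64) << 40;
        bits |= (day as u64) << 32;
        bits |= (hour as u64) << 24;
        bits |= (minute as u64) << 16;
        bits |= (second as u64) << 8;
        TzifDateTime { bits: bits as i64 }
    }

    pub const fn year(self) -> i16 {
        (self.bits as u64 >> 48) as u16 as i16
    }

    pub const fn month(self) -> i8 {
        (self.bits as u64 >> 40) as u8 as i8
    }

    pub const fn second(self) -> i8 {
        (self.bits as u64 >> 8) as u8 as i8
    }
}

/// Hypothetical helper: packs a value and reads three fields back out.
pub fn round_trip(year: i16, month: i8, second: i8) -> (i16, i8, i8) {
    let dt = TzifDateTime::new(year, month, 1, 0, 0, second);
    (dt.year(), dt.month(), dt.second())
}
```

Because the year occupies the sign-carrying high bits of the `i64`, a negative year still sorts before a positive one, and within a year the remaining fields compare in month, day, hour, minute, second order.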

So for this issue, jiff-tzdb would, I believe, need to define its own copy of these types. It would not be able to use Jiff's copy, since jiff depends on jiff-tzdb. Further, in order to make this work in core-only environments, jiff::tz::TimeZone must have a const method that accepts a &'static jiff_tzdb::Tzif value and returns a TimeZone. Since these types are defined in jiff-tzdb, and are therefore distinct from the types generated by jiff-static, it follows that we will indeed need to represent this as a distinct pointer tag. So the problem of adding tag bits to a &'static T in a const context will need to be solved.

Short of that, this also implies that the internal TZif routines for TZ lookups will need to be generic over both of these types. That's going to be very annoying to do, but I believe it is possible.
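One way to picture "generic over both of these types" is a small trait that each representation implements, with the lookup written once against the trait. This is only a sketch: the `TransitionTable` trait, the `macro_like` and `tzdb_like` modules, and the simplified timestamp/offset shape are all hypothetical and far simpler than the `Tzif` definitions above.

```rust
/// Hypothetical read-only view over a sorted transition table.
pub trait TransitionTable {
    /// Transition instants in ascending order (Unix seconds).
    fn timestamps(&self) -> &[i64];
    /// `offsets()[i]` applies from `timestamps()[i]` onward.
    fn offsets(&self) -> &[i32];
}

/// Stand-in for the type a proc macro would generate inside `jiff`.
pub mod macro_like {
    pub struct Tzif {
        pub timestamps: &'static [i64],
        pub offsets: &'static [i32],
    }
}

/// Stand-in for a structurally identical type owned by a data crate.
pub mod tzdb_like {
    pub struct Tzif {
        pub timestamps: &'static [i64],
        pub offsets: &'static [i32],
    }
}

impl TransitionTable for macro_like::Tzif {
    fn timestamps(&self) -> &[i64] {
        self.timestamps
    }
    fn offsets(&self) -> &[i32] {
        self.offsets
    }
}

impl TransitionTable for tzdb_like::Tzif {
    fn timestamps(&self) -> &[i64] {
        self.timestamps
    }
    fn offsets(&self) -> &[i32] {
        self.offsets
    }
}

/// The lookup is written once and works for either representation.
/// Returns `None` when `timestamp` precedes the first transition.
pub fn offset_at<T: TransitionTable>(table: &T, timestamp: i64) -> Option<i32> {
    // Number of transitions at or before `timestamp`.
    let count = table.timestamps().partition_point(|&t| t <= timestamp);
    let index = count.checked_sub(1)?;
    table.offsets().get(index).copied()
}
```

The annoying part alluded to above is that every internal routine touching the table would need this kind of indirection, not just one lookup function.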

A possible alternative is if we somehow guaranteed that the representation of the types defined in jiff were always exactly the same as the representation of the types defined in jiff-tzdb. Then we could convert between them using unsafe without cost and, I believe, in a const context. This seems somewhat precarious to me, and I'm unsure if I want to do it. Because it means evolving the internal representation, even a little bit, of Tzif will need to go through a semver bump of jiff-tzdb. Yuck.
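A minimal sketch of that alternative, under the assumption that both crates pin the layout with `#[repr(C)]` and keep the fields identical. The `tzdb_like` and `jiff_like` modules, the reduced `TransitionInfo` type, and `convert` are hypothetical; the compile-time assertions only check size and alignment, so field-level agreement would still rest on convention, which is where the precariousness comes from.

```rust
use core::mem::{align_of, size_of};

/// Stand-in for the type as defined by the data crate.
pub mod tzdb_like {
    #[repr(C)]
    pub struct TransitionInfo {
        pub type_index: u8,
        pub kind: u8,
    }
}

/// Stand-in for the structurally identical type defined inside `jiff`.
pub mod jiff_like {
    #[repr(C)]
    pub struct TransitionInfo {
        pub type_index: u8,
        pub kind: u8,
    }
}

// Compile-time guard: the build fails if the two layouts ever disagree in
// size or alignment. It cannot detect reordered or reinterpreted fields.
const _: () = {
    assert!(size_of::<tzdb_like::TransitionInfo>() == size_of::<jiff_like::TransitionInfo>());
    assert!(align_of::<tzdb_like::TransitionInfo>() == align_of::<jiff_like::TransitionInfo>());
};

/// Zero-cost conversion that is usable in const contexts.
pub const fn convert(
    info: &'static tzdb_like::TransitionInfo,
) -> &'static jiff_like::TransitionInfo {
    // SAFETY: both types are `#[repr(C)]` with the same fields in the same
    // order, and every bit pattern is valid for `u8` fields.
    unsafe { core::mem::transmute(info) }
}
```

Any change to one definition has to be mirrored in the other in the same release, which is the semver coupling described above.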
