Failed to read manifest: Unexpected magic byte in manifest #21672

Open
tmeinlschmidt opened this issue Mar 31, 2025 · 5 comments
@tmeinlschmidt

Steps to Reproduce Issue

The node was running fine and then failed; it has not worked since. Unfortunately, no further logs were preserved.

sui 1.45.3

config:

authority-store-pruning-config:
  num-latest-epoch-dbs-to-retain: 3
  epoch-db-pruning-period-secs: 3600
  num-epochs-to-retain: 2
  num-epochs-to-retain-for-checkpoints: 2
  max-checkpoints-in-batch: 10
  max-transactions-in-batch: 1000
  pruning-run-delay-seconds: 60

Both the state-archive write and read configs are in use (sketched below).
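
For reference, a minimal sketch of what those two stanzas typically look like in fullnode.yaml, assuming S3 as the backing store. The key names (state-archive-write-config, state-archive-read-config, object-store-config, concurrency, use-for-pruning-watermark) and the region value are illustrative assumptions, not copied from the affected node's config; the bucket name matches the one in the logs below.

state-archive-write-config:
  object-store-config:
    object-store: "S3"
    bucket: "tatum-sui-archive"
    aws-region: "us-east-1"          # placeholder region (assumption)
  concurrency: 5
  use-for-pruning-watermark: false
state-archive-read-config:
  - object-store-config:
      object-store: "S3"
      bucket: "tatum-sui-archive"
      aws-region: "us-east-1"        # placeholder region (assumption)
    concurrency: 5
    use-for-pruning-watermark: false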

logs:

2025-03-31T21:09:20.035790Z  INFO sui_core::authority::authority_store: Cur epoch: 714
2025-03-31T21:09:20.058502Z  INFO mysten_network::client: DISABLE_CACHING_RESOLVER: false
2025-03-31T21:09:20.417449Z  INFO AuthorityPerEpochStore::new{epoch=714}: sui_core::authority::authority_per_epoch_store: epoch flags: [UseVersionAssignmentTablesV3, DataQuarantineFromBeginningOfEpoch]
2025-03-31T21:09:20.417794Z  INFO AuthorityPerEpochStore::new{epoch=714}: sui_core::authority::authority_per_epoch_store: authenticator_state enabled
2025-03-31T21:09:20.424295Z  INFO sui_node: created epoch store
2025-03-31T21:09:20.605856Z  INFO sui_node: creating state sync store
2025-03-31T21:09:20.605877Z  INFO sui_node: creating index store
2025-03-31T21:09:21.920642Z  INFO sui_node: creating archive reader
2025-03-31T21:09:21.921513Z  INFO sui_config::object_storage_config: Object Store bucket=Some("tatum-sui-archive") object_store_type="S3"
2025-03-31T21:09:21.921866Z  INFO object_store::aws::builder: Using Static credential provider
2025-03-31T21:09:21.934611Z  INFO sui_node: P2p network started on 0.0.0.0:30087 server_name="sui-35834a8a"
2025-03-31T21:09:21.935294Z  INFO sui_node: start state archival
2025-03-31T21:09:21.935302Z  INFO sui_config::object_storage_config: Object Store bucket=Some("tatum-sui-archive") object_store_type="S3"
2025-03-31T21:09:21.935346Z  INFO object_store::aws::builder: Using Static credential provider
2025-03-31T21:09:21.940220Z  INFO sui_config::object_storage_config: Object Store directory=Some("/data/data/archive") object_store_type="File"
2025-03-31T21:09:21.940811Z  INFO sui_network::discovery: Discovery started
2025-03-31T21:09:21.941052Z  INFO sui_network::randomness: Randomness network event loop started
2025-03-31T21:09:21.941075Z  INFO sui_network::state_sync: State-Synchronizer started
2025-03-31T21:09:21.941372Z  INFO connection-manager{peer=75b8d479}: anemo::network::connection_manager: ConnectionManager started
2025-03-31T21:09:22.137815Z  WARN request{route=/sui.Discovery/GetKnownPeersV2 remote_peer_id=c7bf6cb9 direction=outbound}: anemo_tower::trace::on_failure: response failed error=Error: connection lost latency=16 ms
2025-03-31T21:09:22.137901Z  WARN request{route=/sui.StateSync/GetCheckpointSummary remote_peer_id=c7bf6cb9 direction=outbound}: anemo_tower::trace::on_failure: response failed error=Error: connection lost latency=16 ms
2025-03-31T21:09:22.140696Z  WARN request{route=/sui.Discovery/GetKnownPeersV2 remote_peer_id=3227f8a0 direction=outbound}: anemo_tower::trace::on_failure: response failed error=Error: connection lost latency=5 ms
2025-03-31T21:09:22.145535Z  WARN request{route=/sui.StateSync/GetCheckpointSummary remote_peer_id=3227f8a0 direction=outbound}: anemo_tower::trace::on_failure: response failed error=Error: connection lost latency=10 ms
2025-03-31T21:09:22.249064Z  WARN request{route=/sui.StateSync/GetCheckpointSummary remote_peer_id=c619a5e0 direction=outbound}: anemo_tower::trace::on_failure: response failed error=Error: connection lost latency=11 ms
2025-03-31T21:09:22.252420Z  WARN request{route=/sui.Discovery/GetKnownPeersV2 remote_peer_id=c619a5e0 direction=outbound}: anemo_tower::trace::on_failure: response failed error=Error: closed latency=0 ms
2025-03-31T21:09:22.284468Z  INFO sui_network::state_sync: retrying checkpoint sync after 9.996784589s
2025-03-31T21:09:22.321489Z ERROR telemetry_subscribers: panicked at /sui/crates/sui-archival/src/writer.rs:364:18:
Failed to read manifest: Unexpected magic byte in manifest: 2065850742

Stack backtrace:
   0: anyhow::error::<impl anyhow::Error>::msg
   1: sui_archival::read_manifest_from_bytes
   2: sui_node::SuiNode::start_async::{{closure}}
   3: <core::pin::Pin<P> as core::future::future::Future>::poll
   4: tokio::runtime::task::harness::Harness<T,S>::poll
   5: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
   6: tokio::runtime::scheduler::multi_thread::worker::Context::run
   7: tokio::runtime::context::runtime::enter_runtime
   8: tokio::runtime::scheduler::multi_thread::worker::run
   9: tokio::runtime::task::core::Core<T,S>::poll
  10: tokio::runtime::task::harness::Harness<T,S>::poll
  11: tokio::runtime::blocking::pool::Inner::run
  12: std::sys::backtrace::__rust_begin_short_backtrace
  13: core::ops::function::FnOnce::call_once{{vtable.shim}}
  14: std::sys::pal::unix::thread::Thread::new::thread_start
  15: start_thread
  16: clone panic.file="/sui/crates/sui-archival/src/writer.rs" panic.line=364 panic.column=18

thread 'sui-node-runtime' panicked at /sui/crates/sui-archival/src/writer.rs:364:18:
Failed to read manifest: Unexpected magic byte in manifest: 2065850742

Stack backtrace:
   0: anyhow::error::<impl anyhow::Error>::msg
   1: sui_archival::read_manifest_from_bytes
   2: sui_node::SuiNode::start_async::{{closure}}
   3: <core::pin::Pin<P> as core::future::future::Future>::poll
   4: tokio::runtime::task::harness::Harness<T,S>::poll
   5: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
   6: tokio::runtime::scheduler::multi_thread::worker::Context::run
   7: tokio::runtime::context::runtime::enter_runtime
   8: tokio::runtime::scheduler::multi_thread::worker::run
   9: tokio::runtime::task::core::Core<T,S>::poll
  10: tokio::runtime::task::harness::Harness<T,S>::poll
  11: tokio::runtime::blocking::pool::Inner::run
  12: std::sys::backtrace::__rust_begin_short_backtrace
  13: core::ops::function::FnOnce::call_once{{vtable.shim}}
  14: std::sys::pal::unix::thread::Thread::new::thread_start
  15: start_thread
  16: clone
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: sui_node::SuiNode::start_async::{{closure}}
   4: <core::pin::Pin<P> as core::future::future::Future>::poll
   5: tokio::runtime::task::harness::Harness<T,S>::poll
   6: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
   7: tokio::runtime::scheduler::multi_thread::worker::Context::run
   8: tokio::runtime::context::runtime::enter_runtime
   9: tokio::runtime::scheduler::multi_thread::worker::run
  10: tokio::runtime::task::core::Core<T,S>::poll
  11: tokio::runtime::task::harness::Harness<T,S>::poll
  12: tokio::runtime::blocking::pool::Inner::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Contributor

Thank you for opening this issue, a team member will review it shortly. Until then, please do not interact with any users that claim to be from Sui support and do not click on any links!

@tmeinlschmidt
Author

Some more info:

I removed all references to our S3 bucket and the node is now up and syncing (I haven't verified the epoch yet). The error message itself is cryptic and deserves to be more verbose.
I also found that we have epochs 705 to 713 stored in our S3 bucket. All of them except 713 have a manifest file, but some are missing the checkpoints, epochs, indexes and rpc-index folders, which looks odd: epochs 705 and 706 have these folders, 707-709 do not, 710 and 711 do, and 712 and 713 do not.

@sadhansood
Contributor

It sounds like you might be using the same bucket for the state archive and for db checkpoints? State archives are not supposed to contain indexes, rpc-index, etc.
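
For illustration, a hedged sketch of keeping the two destinations apart; the bucket names are placeholders and the exact key names (db-checkpoint-config, perform-db-checkpoints-at-epoch-end, state-archive-write-config) may vary by node version. The point is simply that the state archive and the db checkpoints should not share a bucket or prefix.

# state archive: archived checkpoint/epoch data only
state-archive-write-config:
  object-store-config:
    object-store: "S3"
    bucket: "my-sui-state-archive"    # hypothetical archive-only bucket
  concurrency: 5
  use-for-pruning-watermark: false

# db checkpoints: database snapshots, which do carry indexes, rpc-index, etc.
db-checkpoint-config:
  perform-db-checkpoints-at-epoch-end: true
  object-store-config:
    object-store: "S3"
    bucket: "my-sui-db-checkpoints"   # hypothetical separate bucket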

@tmeinlschmidt
Author

Yes, I did. Is it mandatory to keep them separate?

@tmeinlschmidt
Author

If so, could you please share the proper config for something like an archive node? Thanks!
