[spike] pluggable datastores for CDP #5526

mollykarcher · 2024-11-12T15:17:50Z

What problem does your feature solve?

Ultimately, we want to make it as easy as possible for external contributors to create data lakes using different technologies (S3, R2, Mongo, MQTT, etc.). With how the code is currently structured, anyone who creates a new data storage option would have to contribute it back to this repository (the Go monorepo). This means responsibility for the quality and future maintenance of all datastores technically lies with the maintainers of this repo (SDF). We don't want to slow things down by putting ourselves in the middle here, and we don't want to be the arbiters of what people can build and how they build it.

What would you like to see?

A spike or design proposal that outlines how we could restructure our code or repositories so that Galexie can accept pluggable datastores. This could mean that the interface for creating a datastore is public and lives in a separate, SDF-maintained repo, while the implementations live in various other, disparate repos. Also keep in mind that we already want to pull out the consumption components of CDP (#5525) into their own repo.

The "dream" dev journey could look something like:

1. Implement the datastore interface in my own GitHub repository.
2. Download/install Galexie; its configuration accepts my pluggable datastore interface/config with no code changes to Galexie necessary.
3. Pull the ingest SDK in my language of choice; its configuration accepts my pluggable datastore config with no code changes necessary.

sreuland · 2024-12-02T19:16:58Z

Wanted to post an idea for consideration: a pluggable datastore as a separate O/S process, using inter-process RPC:

Create a new datastore.PluggableDatastore as an implementation of the datastore.Datastore interface. It encapsulates the remote datastore interactions and manages the process lifecycle, converting each datastore.Datastore method into an equivalent RPC message via 0MQ REQ/REP exchanges. This decouples datastores from being a binary dependency of application code.

A pluggable datastore service could be implemented in any programming language that can be compiled to an O/S binary and for which 0MQ provides an SDK, which is most of them.

A pluggable datastore service implementation needs to follow the pluggability contract, which is to open a 0MQ socket and support REQ handlers for each of the datastore interface methods.

[edit: a couple of days later]
Looking at other blockchain projects for precedent in distributed processing, rather than monolithic builds or dynamic loading, for composing an application at runtime: one such project is Tendermint ABCI. That architecture is analogous, with Tendermint Core playing the role of Galexie/consumer, the Application playing the role of the remote datastore instance proposed here, and ABCI playing the role of the Datastore interface.

leighmcculloch · 2025-03-02T23:52:22Z

pluggable datastore as separate O/S process, leverage inter-process rpc

We should speak to the stellar-core maintainers (@stellar/core-committers) about their experience using separate processes for data integrations. stellar-core runs external commands, typically configured to use curl, when interacting with history archives. I understand from talking to @graydon that this has proven to be inefficient. I don't think what you're proposing here is exactly the same design, because I think you're suggesting a long-running process, so it may not suffer the same problems.

Another thing we should keep in the back of our minds is that running separate processes isn't trivial, and in containers it has its own challenges, e.g. child process management, child process reaping, zombie processes, etc. This isn't a blocker; I just feel the urge to mention it early due to past scars.

leighmcculloch · 2025-03-03T00:02:50Z

allow Galexie to accept pluggable datastores

An idea that was shared a couple of years ago for the RPC was to give it a pluggable functionality¹ interface as well. Back then, one of the options we discussed was Wasm, and I think that might be a viable option here too, albeit with performance concerns similar to those above that would need evaluating. Go has supported WASI² since 1.21, and as of 1.24 it supports WASI reactor³ apps, which make it easier to build pluggable interfaces across a Wasm API boundary.

sreuland · 2025-03-03T21:16:50Z

I don't think what you're proposing here is exactly the same design because I think you're suggesting a long running process, so maybe doesn't suffer the same problems.

Yes, the design assumes the O/S process for the remote/pluggable datastore is long-running: its lifetime is tied to the client app process if the client spawned it as a child process, or it is always up if it runs as a standalone microservice.

An idea that was shared a couple years ago with the RPC was to make it contain a pluggable functionality¹ interface as well. Back then one of the options we discussed was using Wasm, and I think that might be viable option here too, albeit with similar performance concerns as above that'd need evaluating. Go as of 1.21 has support for WASI² and as of 1.24 supports reactor³ wasi apps that make it easier to build pluggable interfaces across a Wasm API boundary.

Cool suggestion. Just to confirm this approach at a high level: it would enable an existing compiled Go process (such as Galexie) to load, at runtime, a Wasm file compiled to the WASI standard from any programming language with Wasm/WASI compilation tooling, and then verify that it supports a supplied app interface spec?

mollykarcher added the cdp-horizon-scrum label Nov 12, 2024

mollykarcher added this to the platform sprint 54 milestone Nov 12, 2024

mollykarcher added this to Platform Scrum Nov 12, 2024

github-project-automation bot moved this to Backlog in Platform Scrum Nov 12, 2024

urvisavla modified the milestones: platform sprint 54, platform sprint 53 Nov 12, 2024

sreuland modified the milestones: platform sprint 53, platform sprint 54 Dec 3, 2024

urvisavla self-assigned this Dec 3, 2024

mollykarcher modified the milestones: platform sprint 54, platform sprint 55 Jan 3, 2025

tamirms modified the milestones: platform sprint 55, platform sprint 56 Feb 4, 2025

sreuland mentioned this issue Feb 12, 2025

Document DataStore exported ledger metadata file format #5577 (Open)

mollykarcher modified the milestones: platform sprint 56, platform sprint 57 Feb 25, 2025

mollykarcher removed this from the platform sprint 57 milestone Mar 13, 2025
