Allow jobs to be run in a different project #1181

Open
withnale opened this issue Mar 21, 2025 · 0 comments · May be fixed by #1180
Labels
api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API.

Comments

withnale commented Mar 21, 2025

Is your feature request related to a problem? Please describe.

It would be ideal if python-bigquery-sqlalchemy allowed a separation between GCP projects, so that the data lives in one project and the jobs run in a different project. Often the datasets live in a project that is notionally owned by a different team. Allowing for this provides:

  • better isolation of concerns and therefore a better understanding of security roles and permissions
  • a clear separation of usage, so that the visualisation cost of a product (namely BQ slots) can be recognised and budgeted for
  • simplified incident resolution in cases of 'slowness', since BQ backend usage can be correlated to a given product instance

This request mainly comes from using Superset, which relies heavily on SQLAlchemy for its support of different data platforms.

Describe the solution you'd like

Ideally, there would be a mechanism where a job-project-id can be attached as an optional engine parameter. If present, it would be used for running jobs in preference to the project_id present in the bigquery:// URL.
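To make the proposed precedence concrete, here is a minimal sketch of the rule. The helper `resolve_projects`, the `ProjectPair` type, and the `job_project_id` argument name are hypothetical and only illustrate the idea; they are not the library's API or the code in the attached PR.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class ProjectPair(NamedTuple):
    data_project: Optional[str]  # project named in the URL, where the datasets live
    job_project: Optional[str]   # project in which query jobs would run


def resolve_projects(url: str, job_project_id: Optional[str] = None) -> ProjectPair:
    """Hypothetical helper: pick the job project, preferring the explicit parameter."""
    parts = urlsplit(url)
    if parts.scheme != "bigquery":
        raise ValueError(f"expected a bigquery:// URL, got {url!r}")
    # The host part of bigquery://<project>/<dataset> is the project id.
    data_project = parts.netloc or None
    # An explicit job project wins; otherwise fall back to the URL's project.
    return ProjectPair(data_project, job_project_id or data_project)
```

With this rule, `resolve_projects("bigquery://data-project/my_dataset", "jobs-project")` keeps `data-project` for the data while jobs would run in `jobs-project`, and omitting the parameter preserves today's single-project behaviour.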

There are currently two list_datasets() calls in the codebase that do not specify the target project. Passing the project explicitly to those calls does most of the work.
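As a rough illustration of passing the project explicitly when listing datasets, the sketch below takes a client object and a data project. The function name `dataset_ids` and its parameters are hypothetical; the client is assumed to behave like `google.cloud.bigquery.Client`, whose `list_datasets()` method accepts a `project` argument.

```python
from typing import Any, List, Optional


def dataset_ids(client: Any, data_project: Optional[str] = None) -> List[str]:
    """Hypothetical helper: list dataset ids from the data project.

    `client` is assumed to be a google.cloud.bigquery.Client (or compatible).
    """
    # With project=None the client falls back to its own default project,
    # which would be the job project if the client was created for that project.
    datasets = client.list_datasets(project=data_project)
    return [item.dataset_id for item in datasets]
```

Under these assumptions, a client created with `bigquery.Client(project="jobs-project")` would run its jobs in `jobs-project`, while `dataset_ids(client, "data-project")` would still enumerate the datasets owned by the other team.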

Describe alternatives you've considered

The Superset issue apache/superset#32789 outlines my attempt to hotwire this logic into Superset. However, all of that logic feels very brittle and is unlikely to work reliably, and the scope of the changes required in Superset would be significant.

Also, implementing it within the library makes it available not only to superset but to any other project that relies on sqlalchemy to support pluggable data backends.

I've attached a PR for discussion since it's a pretty small change.

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. label Mar 21, 2025
@yirutang yirutang removed their assignment Mar 21, 2025