Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

normalize field mode #701

Merged
merged 1 commit into from
Feb 9, 2024
Merged

normalize field mode #701

merged 1 commit into from
Feb 9, 2024

Conversation

benkonz
Copy link
Contributor

@benkonz benkonz commented Feb 5, 2024

GCP docs mention that "NULLABLE" is the default value for getMode, however the field is marked as optional, and sometimes returns null, which I believe we can treat the same as "NULLABLE" for the sake of field comparisons

Copy link

codecov bot commented Feb 5, 2024

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (a42c883) 71.32% compared to head (60f6555) 71.34%.

Files Patch % Lines
...main/scala/com/spotify/ratatool/BigQueryUtil.scala 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #701      +/-   ##
==========================================
+ Coverage   71.32%   71.34%   +0.02%     
==========================================
  Files          40       41       +1     
  Lines        1747     1752       +5     
  Branches      259      252       -7     
==========================================
+ Hits         1246     1250       +4     
- Misses        501      502       +1     
Flag Coverage Δ
ratatoolCli 2.98% <0.00%> (-0.02%) ⬇️
ratatoolCommon 0.00% <0.00%> (∅)
ratatoolDiffy 31.62% <71.42%> (+0.16%) ⬆️
ratatoolExamples 16.00% <28.57%> (+0.01%) ⬆️
ratatoolSampling 62.35% <66.66%> (-0.03%) ⬇️
ratatoolScalacheck 81.48% <66.66%> (-0.24%) ⬇️
ratatoolShapeless 4.30% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

// a null TableFieldSchema mode can be treated as "NULLABLE", which is the
// default value according to the docs
private def getNormalizedFieldMode(f: TableFieldSchema): String = {
if (f.getMode == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are there cases where it returns null? And why hasn't this been encountered previously

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's unclear to me what scenarios the BigQuery API returns null vs NULLABLE, but different implementations treat this differently. For example the GCP console shows "NULLABLE" for both cases
image

however the bq CLI doesn't show any mode for nullable fields

<~>-2-> bq show telemetry-playground:staging.federicac_page_aggregate_daily_20240101
Table telemetry-playground:staging.federicac_page_aggregate_daily_20240101

   Last modified                Schema                Total Rows   Total Bytes     Expiration      Time Partitioning   Clustered Fields   Total Logical Bytes   Total Physical Bytes   Labels
 ----------------- --------------------------------- ------------ ------------- ----------------- ------------------- ------------------ --------------------- ---------------------- --------
  05 Feb 07:12:41   |- partition_timestamp: integer   15518727     3298969346    06 Mar 07:12:41                                          3298969346            3815264611
                    |- client_id: string
                    |- os_name: string
                    |- correlation_id: string
                    |- user_id: string (required)
                    |- device_type: string
                    |- page_id: string
                    |- navigational_root: string
                    |- app_version: string
                    |- foreground_time: integer
                    |- page_id_count: integer

<~>-> bq show foreground-aggregates:page_interval_daily_aggregate_sample.page_interval_daily_aggregate_sample_v1_20240101
Table foreground-aggregates:page_interval_daily_aggregate_sample.page_interval_daily_aggregate_sample_v1_20240101

   Last modified                Schema                Total Rows   Total Bytes     Expiration      Time Partitioning   Clustered Fields   Total Logical Bytes   Total Physical Bytes   Labels
 ----------------- --------------------------------- ------------ ------------- ----------------- ------------------- ------------------ --------------------- ---------------------- --------
  05 Feb 09:57:02   |- partition_timestamp: integer   15518727     3298969346    02 Jan 22:29:20                                          3298969346            2290031265
                    |- client_id: string
                    |- os_name: string
                    |- correlation_id: string
                    |- user_id: string (required)
                    |- device_type: string
                    |- page_id: string
                    |- navigational_root: string
                    |- app_version: string
                    |- foreground_time: integer
                    |- page_id_count: integer

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although I'm surprised this is the first time something like this has come up, nullable fields are very common in BQ, so I would expect this to have come up earlier

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need to merge this to unblock, can we at least open a ticket with google about this. I don't believe we should litter the codebase with google api semantics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@benkonz benkonz Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@idreeskhan I was able to replicate this by creating a BQ table via a BQ SQL CREATE query, fields marked as nullable have a null mode in the BQ Java API rather than the correct value of NULLABLE.

I think BQ SQL create queries are uncommon enough that it makes sense this is the first time the issue has come up.

// a null TableFieldSchema mode can be treated as "NULLABLE", which is the
// default value according to the docs
private def getNormalizedFieldMode(f: TableFieldSchema): String = {
if (f.getMode == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need to merge this to unblock, can we at least open a ticket with google about this. I don't believe we should litter the codebase with google api semantics

@benkonz benkonz force-pushed the normalize-field-mode branch from fe7d960 to 60f6555 Compare February 7, 2024 19:14
@benkonz benkonz merged commit 7ea7d27 into master Feb 9, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants