normalize field mode #701
Codecov Report
Attention:
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #701      +/-   ##
==========================================
+ Coverage   71.32%   71.34%   +0.02%
==========================================
  Files          40       41       +1
  Lines        1747     1752       +5
  Branches      259      252       -7
==========================================
+ Hits         1246     1250       +4
- Misses        501      502       +1
Flags with carried forward coverage won't be shown.
// a null TableFieldSchema mode can be treated as "NULLABLE", which is the
// default value according to the docs
private def getNormalizedFieldMode(f: TableFieldSchema): String = {
  if (f.getMode == null) {
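The diff excerpt stops at the null check. A minimal sketch of how such a helper could look, assuming it returns "NULLABLE" for a null mode and the stored mode otherwise. To keep it runnable without the BigQuery client library it takes the raw mode string rather than a TableFieldSchema; FieldModes and normalizedMode are hypothetical names, not the code in this PR.

```scala
// Hypothetical sketch: treat a missing (null) BigQuery field mode as "NULLABLE".
// Takes the raw mode string instead of a TableFieldSchema so it runs without
// the BigQuery client; with the real model class you would pass f.getMode.
object FieldModes {
  // "NULLABLE" is documented as the default mode for a field.
  val DefaultMode: String = "NULLABLE"

  // Option(null) is None, so a null mode falls back to the default;
  // any explicit mode ("NULLABLE", "REQUIRED", "REPEATED") is returned unchanged.
  def normalizedMode(rawMode: String): String =
    Option(rawMode).getOrElse(DefaultMode)
}
```

Wrapping the value in Option avoids an explicit if/else on null while keeping every non-null mode untouched.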
Why are there cases where it returns null? And why hasn't this been encountered previously?
It's unclear to me in what scenarios the BigQuery API returns null vs "NULLABLE", but different implementations treat this differently. For example, the GCP console shows "NULLABLE" for both cases, whereas the bq CLI doesn't show any mode for nullable fields:
<~>-2-> bq show telemetry-playground:staging.federicac_page_aggregate_daily_20240101
Table telemetry-playground:staging.federicac_page_aggregate_daily_20240101
Last modified Schema Total Rows Total Bytes Expiration Time Partitioning Clustered Fields Total Logical Bytes Total Physical Bytes Labels
----------------- --------------------------------- ------------ ------------- ----------------- ------------------- ------------------ --------------------- ---------------------- --------
05 Feb 07:12:41 |- partition_timestamp: integer 15518727 3298969346 06 Mar 07:12:41 3298969346 3815264611
|- client_id: string
|- os_name: string
|- correlation_id: string
|- user_id: string (required)
|- device_type: string
|- page_id: string
|- navigational_root: string
|- app_version: string
|- foreground_time: integer
|- page_id_count: integer
<~>-> bq show foreground-aggregates:page_interval_daily_aggregate_sample.page_interval_daily_aggregate_sample_v1_20240101
Table foreground-aggregates:page_interval_daily_aggregate_sample.page_interval_daily_aggregate_sample_v1_20240101
Last modified Schema Total Rows Total Bytes Expiration Time Partitioning Clustered Fields Total Logical Bytes Total Physical Bytes Labels
----------------- --------------------------------- ------------ ------------- ----------------- ------------------- ------------------ --------------------- ---------------------- --------
05 Feb 09:57:02 |- partition_timestamp: integer 15518727 3298969346 02 Jan 22:29:20 3298969346 2290031265
|- client_id: string
|- os_name: string
|- correlation_id: string
|- user_id: string (required)
|- device_type: string
|- page_id: string
|- navigational_root: string
|- app_version: string
|- foreground_time: integer
|- page_id_count: integer
Although I'm surprised this is the first time something like this has come up: nullable fields are very common in BQ, so I would have expected this to come up earlier.
If we need to merge this to unblock, can we at least open a ticket with Google about this? I don't believe we should litter the codebase with Google API semantics.
@idreeskhan I was able to replicate this by creating a BQ table via a BQ SQL CREATE query: fields marked as nullable have a null mode in the BQ Java API rather than the correct value of "NULLABLE".
I think BQ SQL CREATE queries are uncommon enough that it makes sense this is the first time the issue has come up.
ratatool-scalacheck/src/main/scala/com/spotify/ratatool/scalacheck/TableRowGenerator.scala (outdated)
// a null TableFieldSchema mode can be treated as "NULLABLE", which is the
// default value according to the docs
private def getNormalizedFieldMode(f: TableFieldSchema): String = {
  if (f.getMode == null) {
If we need to merge this to unblock, can we at least open a ticket with Google about this? I don't believe we should litter the codebase with Google API semantics.
ratatool-diffy/src/main/scala/com/spotify/ratatool/diffy/BigDiffy.scala (outdated)
fe7d960 to 60f6555
The GCP docs mention that "NULLABLE" is the default value for getMode; however, the field is marked as optional and sometimes returns null, which I believe we can treat the same as "NULLABLE" for the purpose of field comparisons.
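A minimal sketch of how normalized modes could be used when comparing two schemas, for example one from a table created with a BQ SQL CREATE query (mode null) and one created through the API (mode "NULLABLE"). FieldSpec and SchemaCompare are hypothetical stand-ins, not the ratatool API: FieldSpec mirrors only the name, type and mode of TableFieldSchema so the example runs without the BigQuery client.

```scala
// Hypothetical stand-in for TableFieldSchema: only name, type and mode.
final case class FieldSpec(name: String, fieldType: String, mode: String)

object SchemaCompare {
  // A null mode is treated the same as the documented default, "NULLABLE".
  private def normalizedMode(mode: String): String =
    Option(mode).getOrElse("NULLABLE")

  // Two fields match if name, type and normalized mode all agree.
  def sameField(a: FieldSpec, b: FieldSpec): Boolean =
    a.name == b.name &&
      a.fieldType == b.fieldType &&
      normalizedMode(a.mode) == normalizedMode(b.mode)

  // Two schemas match if they have the same number of fields and each
  // field matches the one at the same position.
  def sameSchema(a: Seq[FieldSpec], b: Seq[FieldSpec]): Boolean =
    a.length == b.length && a.zip(b).forall { case (x, y) => sameField(x, y) }
}
```

With this, a field whose mode is null compares equal to the same field marked "NULLABLE", while a null mode still differs from "REQUIRED". The sketch ignores nested RECORD fields; a fuller comparison would also recurse into each field's sub-fields (getFields on TableFieldSchema).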