Create single table BaseProperty class #354

Closed
amontanez24 opened this issue Jun 6, 2023 · 0 comments · Fixed by #364
Labels: feature request

Problem Description

As a developer, I would find it easier to make changes to the QualityReport if our code abstractions matched our logical and user-facing abstractions. Right now it is hard to add new properties, modify aggregation in existing ones, and handle errors, because the results are collected in a way that doesn't match the desired output.

SDMetrics has a concept of properties. A property is a collection of similar metrics that tells us about one aspect of the synthetic data (e.g., column pair trends). The problem is that our code doesn't have this concept. Instead, all the metrics are collected first and only then converted into properties for the user-facing output. This makes metric collection inefficient (the same columns and tables are looped over multiple times) and makes the code harder to read.

To solve these problems, we propose adding a new module called _properties to the reports/single_table folder and creating a BaseSingleTableProperty class in it.

Expected behavior

Attributes

  • metrics: A list of metrics that make up the property. (This may be unnecessary)
  • _details: A dataframe containing the details of each score and column/table involved. This will be used to compute averages, create graphs and return the details at a higher level.

Abstract methods

  • get_score(real_data, synthetic_data, metadata, progress_bar) - Returns a float: the average of all the individual metric scores computed.
  • get_visualization() - Returns a plotly.graph_objects._figure.Figure object.
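
The attributes and abstract methods above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the final design: the default value for progress_bar, the column names in _details, the metadata shape ({'columns': {...}}), and the ToyColumnMeans subclass with its scoring formula are all hypothetical and exist only to show how a concrete property might fill in the base class.

```python
from abc import ABC, abstractmethod

import pandas as pd


class BaseSingleTableProperty(ABC):
    """Sketch of the proposed base class; not the final SDMetrics API."""

    # Metric classes that make up the property (the issue notes this may be unnecessary).
    metrics = []

    def __init__(self):
        # One row per column (or column pair), with the metric used and its score.
        self._details = pd.DataFrame()

    @abstractmethod
    def get_score(self, real_data, synthetic_data, metadata, progress_bar=None):
        """Compute the individual scores, store them in _details, return their average."""

    @abstractmethod
    def get_visualization(self):
        """Return a plotly figure built from _details."""


class ToyColumnMeans(BaseSingleTableProperty):
    """Hypothetical property, for illustration only: compares column means."""

    def get_score(self, real_data, synthetic_data, metadata, progress_bar=None):
        rows = []
        # Assumed metadata shape: {'columns': {column_name: {...}}}.
        for column in metadata['columns']:
            real_mean = real_data[column].mean()
            synthetic_mean = synthetic_data[column].mean()
            # Toy score in (0, 1]: 1.0 when the two means match exactly.
            score = 1.0 / (1.0 + abs(real_mean - synthetic_mean))
            rows.append({'Column': column, 'Metric': 'ToyMeanSimilarity', 'Score': score})
            if progress_bar is not None:
                progress_bar.update(1)  # tqdm-style update, one step per column

        self._details = pd.DataFrame(rows)
        return float(self._details['Score'].mean())

    def get_visualization(self):
        raise NotImplementedError('Plotting is omitted from this sketch.')
```

With this layout, the report would only need to call get_score on each property and read _details afterwards, so each column is visited once per property rather than once per metric and again during aggregation.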

Additional context

  • Put this base class in its own file. Make sure to name the module with an underscore since it is not intended to be public.
  • The proposal is to store the details as a dataframe. Currently, all of this information is stored in a dict on the QualityReport class called _metric_results. The benefit of a dataframe is that it is the form in which the details are returned to the user by QualityReport.get_details. The drawback is that all of the utility functions for plotting the metrics are designed to take in the dict. We should investigate whether it is worth changing the data structure we use to return the results and, ultimately, whether we should update these plot functions.
  • We will pass the progress bar created in the report down to the get_score method so that each property can update it as its scores are computed.
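
To help weigh the dict-versus-dataframe question, here is a small sketch of converting between the two forms. The nested shape used for the dict ({metric_name: {column_name: {'score': float}}}) is an assumption made for illustration and may not match the real _metric_results layout; the helper names results_to_details and details_to_results are likewise hypothetical.

```python
import pandas as pd

# Hypothetical shape, for illustration only; the real _metric_results may differ.
metric_results = {
    'KSComplement': {'age': {'score': 0.9}, 'income': {'score': 0.8}},
    'TVComplement': {'city': {'score': 0.7}},
}


def results_to_details(results):
    """Flatten the nested dict into one row per (metric, column) pair."""
    rows = [
        {'Column': column, 'Metric': metric, 'Score': outcome['score']}
        for metric, per_column in results.items()
        for column, outcome in per_column.items()
    ]
    return pd.DataFrame(rows, columns=['Column', 'Metric', 'Score'])


def details_to_results(details):
    """Rebuild the nested dict so dict-based plot helpers could still be fed."""
    results = {}
    for row in details.itertuples(index=False):
        results.setdefault(row.Metric, {})[row.Column] = {'score': row.Score}
    return results


details = results_to_details(metric_results)
average_score = details['Score'].mean()
```

If the real structure is close to this, a thin adapter like details_to_results might let the existing plot functions keep working while the properties store dataframes, which would defer the decision about rewriting the plotting utilities.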