Resolve a few preprocessing/training bugs #394

Merged: 8 commits, Nov 11, 2021

sammlapp (Collaborator) commented on Nov 10, 2021

This branch addresses #327, #372, #392.

It makes one API change:

  • shape arguments are now (height, width) instead of (width, height) in Spectrogram.to_image, the SpecToImage action, and the Preprocessor classes. (While (width, height) is the convention for images, (height, width), i.e. (rows, columns), is the convention for arrays/tensors in numpy, torch, etc. Spectrograms and CNN samples are really arrays, though we visualize them as images.)
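As a small illustration of the array convention (not code from this branch), the sketch below shows that a NumPy array's shape reads (height, width). The sizes and the helper `wh_to_hw` are hypothetical, made up only to show how an old-style (width, height) argument maps onto the new order.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
height, width = 224, 448

# NumPy arrays are indexed (rows, columns), i.e. (height, width).
sample = np.zeros((height, width))


def wh_to_hw(shape):
    """Hypothetical helper: convert a (width, height) pair to (height, width)."""
    w, h = shape
    return (h, w)


# An old-style (width, height) argument becomes a new-style (height, width) one.
new_shape = wh_to_hw((448, 224))
```

Callers that previously passed (width, height) would need to swap the two values in a similar way.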

It also adds code to CnnPreprocessor that automatically drops rows with duplicated index values from overlay_df. A user might intentionally supply duplicated index values in overlay_df (for instance, to change the representation of samples), and this change will silently delete those duplicates.

The reason is issue #392: when overlay_df had a duplicated index, looking up the labels for that index returned a 2-d array instead of a 1-d array in the ImgOverlay action with update_labels=True. One option would be to select just one of the matching rows, perhaps at random, but there isn't an obvious "correct" behavior when the rows have different values, so instead I decided to enforce that the index of overlay_df is unique. Since CnnPreprocessor hard-codes many decisions about preprocessing, I felt comfortable adding a line that automatically drops rows with duplicated indices from an overlay_df passed to CnnPreprocessor.__init__().
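The sketch below reproduces the 2-d label problem with plain pandas and shows one way to drop duplicated index values. The DataFrame contents are made up, and `keep="first"` is an assumption: this description does not say which of the duplicated rows the branch keeps.

```python
import pandas as pd

# Hypothetical overlay_df: file paths as the index, one column per class.
overlay_df = pd.DataFrame(
    {"class_a": [1, 0, 1], "class_b": [0, 1, 1]},
    index=["a.wav", "b.wav", "a.wav"],
)

# With a duplicated index, .loc returns a DataFrame, so the labels are 2-d:
# two matching rows by two class columns.
labels_2d = overlay_df.loc["a.wav"].values

# Assumption: keep only the first row for each index value.
deduped = overlay_df[~overlay_df.index.duplicated(keep="first")]

# With a unique index, .loc returns a Series, so the labels are 1-d.
labels_1d = deduped.loc["a.wav"].values
```

Users who rely on duplicated rows would need to make the index unique themselves before passing the DataFrame in.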

#372 was a bug caused by the assertion on overlay_weight. The fix was to rewrite the input validation so that it allows either a float or a range like [0.1, 0.8].
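A minimal sketch of validation that accepts both forms might look like the following. The function name `validate_overlay_weight`, the open interval (0, 1) for valid weights, and the normalized (low, high) return value are all assumptions for illustration, not the code in this branch.

```python
def validate_overlay_weight(overlay_weight):
    """Hypothetical validator: accept a single weight or a [low, high] range.

    Assumes valid weights lie strictly between 0 and 1.
    Returns a (low, high) tuple; a single weight w becomes (w, w).
    """
    if isinstance(overlay_weight, (list, tuple)):
        if len(overlay_weight) != 2:
            raise ValueError("overlay_weight range must have exactly two values")
        low, high = float(overlay_weight[0]), float(overlay_weight[1])
        if not 0 < low < high < 1:
            raise ValueError("overlay_weight range must satisfy 0 < low < high < 1")
        return (low, high)

    weight = float(overlay_weight)
    if not 0 < weight < 1:
        raise ValueError("overlay_weight must be between 0 and 1")
    return (weight, weight)
```

Normalizing both inputs to a (low, high) pair is one design option; it lets downstream code sample a weight from the range without checking the input type again.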

#327 was already half resolved. I added an error message for the one type of IndexError in SafeDataset that should never actually happen: indexing past the end of self.df.
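To show the kind of check described, here is a simplified stand-in, not the actual SafeDataset: the class `BoundsCheckedDataset` and its message wording are hypothetical, and it wraps a plain list rather than a DataFrame.

```python
class BoundsCheckedDataset:
    """Hypothetical stand-in that raises a descriptive IndexError past the end."""

    def __init__(self, samples):
        self.samples = list(samples)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        # Fail loudly, with context, instead of surfacing a bare IndexError.
        if not 0 <= idx < len(self):
            raise IndexError(
                f"index {idx} is out of range for a dataset of length {len(self)}"
            )
        return self.samples[idx]


dataset = BoundsCheckedDataset(["a.wav", "b.wav", "c.wav"])
```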
