Resolve a few preprocessing/training bugs #394
Merged
This branch addresses #327, #372, #392.
It makes one API change:
It also adds code to CnnPreprocessor that automatically drops rows with duplicated index values from overlay_df. A user could conceivably intend to supply duplicated values in the overlay_df index (for instance, to change the representation of samples), and this change will silently delete those duplicates. The motivation was issue #392: when overlay_df contained a duplicated index, looking up the labels for that index returned a 2-d array instead of a 1-d array in the ImgOverlay action with update_labels=True. One option would be to keep just one of the rows (or pick one at random), but there is no obvious "correct" behavior when the duplicated rows have different values, so instead I decided to enforce that the index of overlay_df is unique. Since CnnPreprocessor already hard-codes many preprocessing decisions, I felt comfortable adding a line to CnnPreprocessor.__init__() that automatically drops rows with duplicated indices from the overlay_df it receives.
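The dedup step amounts to a one-liner on the pandas index. A minimal sketch (the sample DataFrame and file names here are made up for illustration; only the `index.duplicated` pattern reflects the described behavior):

```python
import pandas as pd

# toy overlay_df with a duplicated index value ("a.wav" appears twice)
overlay_df = pd.DataFrame(
    {"species_a": [1, 0, 1]},
    index=["a.wav", "a.wav", "b.wav"],
)

# keep only the first row for each index value, silently dropping the rest
overlay_df = overlay_df[~overlay_df.index.duplicated(keep="first")]
```

After this, label lookups by index value always return a single row, so update_labels gets a 1-d array.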
#372 was a bug caused by the assertion on overlay_weight: the input validation simply needed to be rewritten to accept either a single float or a range like [0.1, 0.8].
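A sketch of what the relaxed validation looks like (the function name and messages are hypothetical; the actual check in the PR may differ):

```python
def validate_overlay_weight(overlay_weight):
    """Accept a single float in (0, 1) or a two-element [min, max] range."""
    if isinstance(overlay_weight, (int, float)):
        # a single fixed weight
        assert 0 < overlay_weight < 1, "overlay_weight must be between 0 and 1"
    else:
        # a [min, max] range to sample the weight from
        assert len(overlay_weight) == 2, "overlay_weight range must have two values"
        low, high = overlay_weight
        assert 0 < low <= high < 1, "range values must be ordered and between 0 and 1"
```

Both `validate_overlay_weight(0.5)` and `validate_overlay_weight([0.1, 0.8])` pass, whereas the old assertion rejected one of the two forms.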
#327 was already half resolved. I added an error message for the kind of IndexError in SafeDataset that should never actually happen: indexing past the end of self.df.
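The guard is roughly of this shape (a hypothetical helper, not the actual SafeDataset code; names and wording are assumptions):

```python
import pandas as pd

def safe_get_row(df, idx):
    """Return row idx of df, raising a descriptive IndexError if out of bounds."""
    if idx >= len(df):
        # should never happen in practice; fail with a clear message if it does
        raise IndexError(
            f"Tried to access sample {idx}, but the dataset only contains "
            f"{len(df)} samples. This should never happen; please report it."
        )
    return df.iloc[idx]
```

The point is just that a bare IndexError from deep inside pandas is replaced by a message naming the index and the dataset length.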