Skip to content
This repository was archived by the owner on Feb 28, 2025. It is now read-only.

Silhouette plot improvements #213

Merged
merged 4 commits into from
Jan 25, 2023
Merged

Conversation

neelasha23
Copy link
Contributor

@neelasha23 neelasha23 commented Jan 15, 2023

  1. Removed figsize parameter from clustering tests and replaced baseline images with correct size.
  2. Deprecated silhouette_analysis_from_results function and converted it to an internal function _silhouette_analysis_one_cluster.
  3. Removed silhouette_analysis_from_results example from clustering user guide.
  4. Modified previous tests related to silhouette_analysis_from_results.

Issue ticket number and link

Closes #150

Checklist before requesting a review

  • I have performed a self-review of my code
  • I have added thorough tests (when necessary).
  • I have added the right documentation (when needed). Product update? If yes, write one line about this update.

📚 Documentation preview 📚: https://sklearn-evaluation--213.org.readthedocs.build/en/213/

@neelasha23 neelasha23 changed the title Issue 150 Silhouette plot improvements Jan 15, 2023
@coveralls
Copy link

coveralls commented Jan 15, 2023

Pull Request Test Coverage Report for Build 4009571746

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 28 of 30 (93.33%) changed or added relevant lines in 2 files are covered.
  • 3 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.1%) to 94.336%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/sklearn_evaluation/plot/clustering.py 21 23 91.3%
Files with Coverage Reduction New Missed Lines %
src/sklearn_evaluation/plot/clustering.py 1 97.58%
src/sklearn_evaluation/plot/precision_recall.py 2 95.92%
Totals Coverage Status
Change from base Build 3989535447: 0.1%
Covered Lines: 2748
Relevant Lines: 2913

💛 - Coveralls

@edublancas
Copy link
Contributor

edublancas commented Jan 16, 2023

silhouette_analysis_from_results was added to be consistent with elbow_curve_from_results (user wants to generate plots from custom model). Do we need to make elbow_curve_from_results internal too?

the elbow curve plot fits the models internally and generates the plot, but in some cases the user might want to train the model themselves and then pass the data for plotting. at first I thought elbow_curve_from_results was doing something similar but it doesn't like like it (and correct me if I'm wrong)

silhouette_analysis iterates over multiple models (with different num of clusters) and generates one plot per model. however, silhouette_analysis_from_results only takes a single num of clusters value and produces one plot, so they're not exactly a match.

I think it's valuable to have silhouette_analysis_from_results, but then it looks like it'd need to be able to take a list of cluster_labels, because it can only take one at the moment. the idea is that I should be able to take some input data, call silhouette_analysis_from_results and reproduce the exact same plot as silhouette_analysis, but currently, I'd need to wrap silhouette_analysis_from_results in a for loop to achieve so

let me know if this explanation is clear!

@neelasha23
Copy link
Contributor Author

Yes it's clear. So, basically, for silhouette plot the user would be interested in comparing plots for different values of number of clusters it would be better to have a function that takes a list of values corresponding to multiple n_cluster values. Since silhouette_analysis_from_results is taking one set of values only as input, it wont be of much use to the end user it's better to be made an internal function.
However for elbow plot, the same plot visualises results of a range of the n_cluster value so it would be useful.

In the PR silhouette_analysis_from_results is deprecated and replaced with an internal one.

@edublancas
Copy link
Contributor

took a look at the code.

I think if you move the for loop over range_n_clusters to the from_results version, it'll work as I described right? I think we should keep it. and maybe move the logic that plots a single model one_cluster private.

@neelasha23
Copy link
Contributor Author

neelasha23 commented Jan 16, 2023

So there'll be two methods now :

  1. silhouette_analysis_from_results (public) : will loop over range_n_clusters and generate each plot
  2. _silhouette_analysis_one_cluster (private) : this will be internally called from silhouette_analysis_from_results with in the for loop for each value of the range_n_cluster

@edublancas
Copy link
Contributor

correct!

and silhouette_analysis should also call _silhouette_analysis_one_cluster

also, let's rename _silhouette_analysis_one_cluster -> _silhouette_analysis_one_model, it's clearer since the plot is for one clustering model as opposed to a single cluster

@neelasha23
Copy link
Contributor Author

I have made the changes discussed above.
@edublancas Please let me know if this looks fine

@edublancas
Copy link
Contributor

can you rebase? I've introduced some changes in the docs so the binder links use your code, but I still see the old version.

you can do:

git rebase -i master

# do the rebase process

git push --force

if you need help with the rebase command let me know

@neelasha23
Copy link
Contributor Author

I did the rebase but I'm not sure if anything got updated. Below is what I see:

Everything up-to-date
(sk-eval) neelashasen@192 sklearn-evaluation % git rebase -i master
Successfully rebased and updated refs/heads/issue_150.
(sk-eval) neelashasen@192 sklearn-evaluation % git push origin pr_curve --force
Everything up-to-date

There weren't any conflicts

@edublancas
Copy link
Contributor

did you pull from master first?

what's the output of this?

git checkout master
git pull

@idomic
Copy link
Contributor

idomic commented Jan 20, 2023

Were we able to solve the issue? Is it ready for merge?

@edublancas
Copy link
Contributor

we were, I'll review the code now

cluster_labels = kmeans.fit_predict(X)
cluster_labels.append(kmeans.fit_predict(X))
kmeans = KMeans(n_clusters=5, random_state=1, n_init=5)
cluster_labels.append(kmeans.fit_predict(X))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's move the _from_results methods (elbow and silhouette) out of this and put them as docstring examples (if we don't have them). then, let's link from here to the docstring so people know they exist. something like:

```{tip}
If you want to train the models yourself, you can use `from_results` to plot.
```

there's a way to link directly to a class definition without a hardcoded link, I think :class:, but I'm unsure if that will work with the markdown format,let's try and see. if it doesn't work, we can use raw rst format: https://jupyterbook.org/en/stable/file-types/restructuredtext.html

Copy link
Contributor Author

@neelasha23 neelasha23 Jan 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think :class: directive works with classes but this one's a function.
I was able to link using heading labels like so

.. _silhouette-analysis-from-results-label:

silhouette_analysis_from_results
--------------------------------
.. autofunction:: sklearn_evaluation.plot.silhouette_analysis_from_results

and then using this label to create a link in the user guide using :ref:. But the tip or note directives are not working along with this reference. Still trying to find the issue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: it was an indentation issue. Will push the change

@idomic
Copy link
Contributor

idomic commented Jan 24, 2023

@neelasha23 please fix conflicts

@idomic idomic requested a review from edublancas January 24, 2023 18:35
Neelasha Sen and others added 3 commits January 26, 2023 02:07
fixed axes issue

deprecation added

revert img changes

version update

removed from docs
Added in docs

Review comments
Lint

Empty commit

doc fix
@neelasha23
Copy link
Contributor Author

Fixed the review comments
@edublancas @idomic

@edublancas
Copy link
Contributor

great! I think there is an equivalent :func: which is preferred over the :ref: approach, so let's use that next time!

@edublancas edublancas merged commit 9c4cacc into ploomber:master Jan 25, 2023
@edublancas
Copy link
Contributor

btw, there is no need to include the github link in the changelog, we'll automatically add them with the deployment script

@neelasha23 neelasha23 deleted the issue_150 branch February 16, 2023 06:08
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

silhouette_analysis plot improvements
4 participants