Commit b8e41b2

Added tutorial conversion ML-Python
1 parent 324e964 commit b8e41b2

38 files changed, +993 -195 lines changed

content/courses/python-high-performance/_index.md

+1-1
Original file line numberDiff line numberDiff line change
@@ -28,7 +28,7 @@ For this tutorial, it is assumed that you have experience with programming in Py
2828

2929
To follow along for the [Serial Optimization](#serial-optimization-strategies) and [Multiprocessing](#multiprocessing) examples, you can execute the code examples on your own computer or on UVA's high-performance computing cluster. Examples described in the last section, [Distributed Parallelization](#distributed-parallelization), are best executed on UVA's high-performance computing platform.
3030

31-
If you are using your local computer for your personal applications, not related to work, you can install the Anaconda distribution (<a href="https://www.anaconda.com/distribution/" target="balnk_">download</a>) to run the code examples. Anaconda provides multiple Python versions, an integrated development environment (IDE) with editor and profiler, Jupyter notebooks, and an easy-to-use package environment manager. If you will or might use the installation for work, or just prefer a more minimal setup that you can more easily customize, we suggest Miniforge (https://github.com/conda-forge/miniforge).
31+
If you are using your local computer for your personal applications, not related to work, you can install the [Anaconda](https://www.anaconda.com) distribution to run the code examples. Anaconda provides multiple Python versions, an integrated development environment (IDE) with editor and profiler, Jupyter notebooks, and an easy-to-use package environment manager. If you will or might use the installation for work, or just prefer a more minimal setup that you can more easily customize, we suggest Miniforge (https://github.com/conda-forge/miniforge).
3232

3333
**If you are using UVA HPC, follow these steps to verify that your account is active:**
3434

@@ -0,0 +1,29 @@
---
title: Machine Learning for Python
date: "2022-06-09T00:00:00"
type: docs
weight: 1
menu:
    python-machine-learning:
---

This tutorial covers the following topics:
* Overview of Machine Learning
* Decision Trees
* Coding Decision Trees
* Random Forest
* Coding Random Forest
* Overview of Neural Networks
* Coding Neural Networks
* TensorFlow/Keras
* Coding TensorFlow
* PyTorch
* Coding PyTorch
* Overview of Parallelizing Deep Learning
* Coding

Example code is provided for each topic. Prior experience with the Python programming language and some familiarity with machine learning concepts are helpful for this tutorial. Please download and unzip the following file to follow along with the coding activities.

{{< file-download file="notes/python-machine-learning/code/ML_with_Python.zip" text="ML_with_Python.zip" >}}
Binary file not shown.
@@ -0,0 +1,81 @@
---
title: Decision Trees
date: "2022-06-09T00:00:00"
type: docs
toc: true
weight: 150
menu:
    python-machine-learning:
---

A decision tree is a supervised learning algorithm used for classification. It determines a set of questions, or tests, on the attributes of an observation and organizes them into a tree structure that guides unlabeled data toward a classification.

> Motivating Question:
> Given a set of data, can we determine which attributes should be tested first to predict a category or outcome (i.e., which attributes lead to "high information gain")?

## Simple Scenario

Suppose we have:
* a group of people, each one with a tumor, and
* two measurements (x, y) for each tumor.

Plotting the data, with red points for malignant tumors and blue points for benign tumors, we might see a plot like this:

{{< figure src=/notes/python-machine-learning/img/pre_decision_plot.png caption="" width=60% height=60% >}}

Clearly, something happens near x=3.

{{< figure src=/notes/python-machine-learning/img/decision_plot.png caption="" width=60% height=60% >}}

With very few errors, we can use x=3 as our "decision" boundary to categorize a tumor as malignant or benign.

__Resulting decision tree:__

{{< figure src=/notes/python-machine-learning/img/result_decision_tree.png caption="" width=30% height=30% >}}

Unfortunately, it is not always this easy, especially with more complex data. With more attributes, more layers of questions can be added to the tree.
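
A split like x=3 can be found automatically by trying each candidate threshold and counting the misclassifications it produces. The following is a minimal sketch of that idea, not the algorithm used by any particular library. The data points are invented for illustration, and the sketch assumes that malignant tumors lie at larger x values, which the text does not state.

```python
# Hypothetical (x, label) pairs invented for illustration; y is omitted
# because the split in this scenario depends on x alone.
points = [
    (0.5, "benign"), (1.2, "benign"), (1.9, "benign"), (2.4, "benign"),
    (2.8, "benign"), (3.6, "benign"),
    (2.2, "malignant"), (3.1, "malignant"), (3.4, "malignant"),
    (4.2, "malignant"), (4.8, "malignant"), (5.5, "malignant"),
]


def count_errors(threshold):
    """Count misclassifications for the assumed rule 'malignant if x > threshold'."""
    errors = 0
    for x, label in points:
        predicted = "malignant" if x > threshold else "benign"
        if predicted != label:
            errors += 1
    return errors


# Candidate thresholds: midpoints between neighboring x values.
xs = sorted(x for x, _ in points)
candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

# Keep the threshold that misclassifies the fewest points.
best_threshold = min(candidates, key=count_errors)
best_errors = count_errors(best_threshold)
print(f"best threshold: x = {best_threshold:.2f} with {best_errors} errors")
```

With more than one attribute, the same search would be repeated for each attribute, which is where a measure such as information gain becomes useful for choosing among them.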

## Example: What should you do this weekend?

{{< table >}}
| Weather | Parents Visiting | Have extra cash | Weekend Activity |
| :-: | :-: | :-: | :-: |
| Sunny | Yes | Yes | Cinema |
| Sunny | No | Yes | Tennis |
| Windy | Yes | Yes | Cinema |
| Rainy | Yes | No | Cinema |
| Rainy | No | Yes | Stay In |
| Rainy | Yes | No | Cinema |
| Windy | No | No | Cinema |
| Windy | No | Yes | Shopping |
| Windy | Yes | Yes | Cinema |
| Sunny | No | Yes | Tennis |
{{< /table >}}

This table can be represented as a tree.

{{< figure src=/notes/python-machine-learning/img/tree_first.png caption="" width=65% height=65% >}}

This tree can be made more efficient.

{{< figure src=/notes/python-machine-learning/img/tree_second.png caption="" width=50% height=50% >}}

With complex data, it is also possible that not all features are needed in the decision tree.
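
To make the motivating question concrete, the sketch below computes an information gain for each attribute of the weekend table. It assumes the entropy-based definition of information gain (the measure used by algorithms such as ID3); the tutorial does not say which measure it has in mind, and the function names are chosen here for illustration.

```python
from collections import Counter
from math import log2

# Rows of the weekend table: (weather, parents visiting, extra cash, activity)
rows = [
    ("Sunny", "Yes", "Yes", "Cinema"),
    ("Sunny", "No", "Yes", "Tennis"),
    ("Windy", "Yes", "Yes", "Cinema"),
    ("Rainy", "Yes", "No", "Cinema"),
    ("Rainy", "No", "Yes", "Stay In"),
    ("Rainy", "Yes", "No", "Cinema"),
    ("Windy", "No", "No", "Cinema"),
    ("Windy", "No", "Yes", "Shopping"),
    ("Windy", "Yes", "Yes", "Cinema"),
    ("Sunny", "No", "Yes", "Tennis"),
]
attributes = ["Weather", "Parents Visiting", "Have extra cash"]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, column):
    """Entropy of the activity minus the weighted entropy after splitting on a column."""
    labels = [row[-1] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


gains = {name: information_gain(rows, i) for i, name in enumerate(attributes)}
for name, gain in sorted(gains.items(), key=lambda item: -item[1]):
    print(f"{name}: {gain:.3f}")
```

Under this measure, Weather gives the largest gain for these ten rows, so an entropy-based algorithm would test it first. Other criteria, such as Gini impurity, may rank attributes differently on other data.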

## Decision Tree Algorithms

There are many existing decision tree algorithms. A correctly implemented algorithm determines the best question or test to apply at each node of the tree.

> How do we know how accurate our decision tree is?

## Decision Tree Evaluation

* A confusion matrix is often used to show how well the model's predictions match the actual classifications.
* The matrix is not confusing – it simply illustrates how "confused" the model is!
* It is generated from test data.

{{< figure src=/notes/python-machine-learning/img/decision_tree_chart.png caption="" width=70% height=70% >}}
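
As a small illustration of how a confusion matrix is tallied, the sketch below counts (actual, predicted) pairs for a handful of test observations. The labels are hypothetical and are not taken from the figure above.

```python
from collections import Counter

# Hypothetical actual and predicted labels for eight test observations
actual = ["benign", "benign", "malignant", "malignant",
          "benign", "malignant", "benign", "malignant"]
predicted = ["benign", "malignant", "malignant", "malignant",
             "benign", "benign", "benign", "malignant"]

# Each cell of the matrix is the number of times a given
# (actual, predicted) combination occurs.
counts = Counter(zip(actual, predicted))
labels = sorted(set(actual) | set(predicted))

# Rows are actual classes; columns are predicted classes.
print("actual \\ predicted".ljust(20) + "".join(label.rjust(12) for label in labels))
for a in labels:
    print(a.ljust(20) + "".join(str(counts[(a, p)]).rjust(12) for p in labels))

# Correct predictions sit on the diagonal of the matrix.
correct = sum(counts[(label, label)] for label in labels)
accuracy = correct / len(actual)
print(f"accuracy: {accuracy:.2f}")
```

The off-diagonal cells show which classes the model confuses with each other, which a single accuracy number cannot reveal.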
@@ -0,0 +1,91 @@
---
title: Coding a Decision Tree
date: "2022-06-09T00:00:00"
type: docs
weight: 200
menu:
    python-machine-learning:
        parent: Decision Trees
---

## The Data

* For our first example, we will be using a set of measurements taken on various red wines.
* The data set is from
  * _P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009._
* The data is located at
  * [https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv](https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv)
* There are 12 measurements, taken on 1599 different red wines.

## Attribute Summary

{{< figure src=/notes/python-machine-learning/img/attribute_summary.png caption="" width=50% height=50% >}}

__Question: Can we predict the quality of the wine from the attributes?__

## Coding Decision Trees: General Steps
1. Load the decision tree package
2. Read in the data
3. Identify the target feature
4. Divide the data into a training set and a test set
5. Fit the decision tree model
6. Apply the model to the test data
7. Display the confusion matrix

### 1. Load Decision Tree Package
```python
from sklearn import tree
```

### 2. Read in the Data
```python
import pandas as pd

# The file is semicolon-delimited, so the delimiter must be given explicitly
data_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
wine = pd.read_csv(data_url, delimiter=';')
wine.info()  # info() prints its summary directly and returns None
```

### 3. Identify the Target Feature
```python
# Split the quality column out of the data
wine_target = wine['quality']
wine_data = wine.drop('quality', axis=1)
```
For the functions that we will be using, the target values (e.g., quality) must be a separate object.

### 4. Divide the Data
```python
from sklearn import model_selection

test_size = 0.30  # hold out 30% of the rows for testing
seed = 7          # fixed seed so the split is reproducible
train_data, test_data, train_target, test_target = model_selection.train_test_split(
    wine_data, wine_target, test_size=test_size, random_state=seed)
```
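
To show what a seeded split does conceptually, here is a minimal standard-library sketch. It is not scikit-learn's implementation and will not select the same rows as `train_test_split`; `split_indices` is a hypothetical helper written for this illustration, and rounding the test size up is a choice made for the sketch.

```python
import math
import random


def split_indices(n_rows, test_size, seed):
    """Shuffle row indices with a seeded generator and hold out a test fraction."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_rows * test_size)  # rounding up is a choice made for this sketch
    return indices[n_test:], indices[:n_test]


# Same row count, test fraction, and seed as in the tutorial's split
train_idx, test_idx = split_indices(n_rows=1599, test_size=0.30, seed=7)
print(len(train_idx), "training rows,", len(test_idx), "test rows")

# Repeating the call with the same seed reproduces the same split
same_again = split_indices(1599, 0.30, 7) == (train_idx, test_idx)
print("reproducible:", same_again)
```

Fixing the seed matters because the model is evaluated on the held-out rows: a different split can give a different confusion matrix.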

### 5. Fit the Decision Tree Model
```python
model = tree.DecisionTreeClassifier()
model = model.fit(train_data, train_target)
```

### 6. Apply the Model to the Test Data
```python
prediction = model.predict(test_data)
```

### 7. Display Confusion Matrix
```python
row_name = "Quality"
# Rows are the actual quality scores; columns are the predicted scores
cm = pd.crosstab(test_target, prediction,
                 rownames=[row_name], colnames=[''])
print(' '*(len(row_name)+3), "Predicted ", row_name)
print(cm)
```
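
The confusion matrix can be complemented with a single accuracy number. The sketch below uses `accuracy_score` and `confusion_matrix` from `sklearn.metrics` on short hypothetical lists that stand in for `test_target` and `prediction`; the values are invented and are not results from the wine data.

```python
from sklearn import metrics

# Hypothetical quality scores standing in for test_target and prediction
actual_quality = [5, 6, 5, 7, 6, 5, 6, 7, 5, 6]
predicted_quality = [5, 6, 6, 7, 5, 5, 6, 6, 5, 6]

# Fraction of observations whose predicted score equals the actual score
accuracy = metrics.accuracy_score(actual_quality, predicted_quality)

# Rows are actual scores and columns are predicted scores, in the order given by labels
matrix = metrics.confusion_matrix(actual_quality, predicted_quality, labels=[5, 6, 7])

print(f"accuracy: {accuracy:.2f}")
print(matrix)
```

In the tutorial's own workflow, the same two calls could be applied to `test_target` and `prediction` directly.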

## Activity: Decision Tree Program

Make sure that you can run the decision tree code: `01_Decision_Tree.ipynb`

content/notes/python-machine-learning/index.md

-182
This file was deleted.
