Commit c66a515

committed: update read me
1 parent 8b39f6d · commit c66a515

File tree

3 files changed: +13 −12 lines changed


README.md

+13 −12
@@ -1,5 +1,5 @@
- # Q-Learning - Demo Notebook
- This repositoy contains a short Demo Notebook on how to implement a Reinforcement Learning agent, which learns to solve an OpenAI Gym environment.
+ # Q-Learning - Jupyter Notebook
+ This repository contains a Jupyter Notebook with an implementation of a Q-Learning agent, which learns to solve the n-Chain OpenAI Gym environment.

  This notebook is inspired by the following notebook: [Deep Reinforcement Learning Course Notebook](https://github.com/simoninithomas/Deep_reinforcement_learning_Course/blob/master/Q%20learning/Taxi-v2/Q%20Learning%20with%20OpenAI%20Taxi-v2%20video%20version.ipynb)

@@ -10,17 +10,15 @@ Download the repository:
  Run the Jupyter Notebook:
  `q_learning_notebook.ipynb`

- ## Introduction to Reinforcement Learning
+ ## Description of the Q-Learning Algorithm

- The notebook contains a Q-Learning algorithm implementation and a training loop to solve the N-Chain OpenAI Gym environment.
+ The notebook contains a Q-Learning algorithm implementation and a training loop to solve the n-Chain OpenAI Gym environment. The image below describes the Q-Learning algorithm (an off-policy Temporal-Difference control algorithm):

- ## The Q-Learning Algorithm
+ <img src="/images/Sutton_Barto.png" alt="Q-Learning Algorithm" width="600"/>

- The below imgage describes the Q-Learning Algorithm (an off-policy Temporal-Difference control algorithm):
-
- ![Q-Learning](/Sutton_Barto.png)
- Q-Learning Algorithm: [Image](http://incompleteideas.net/book/the-book-2nd.html) taken from **Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, Second edition, 2014/2015, page 158**
+ Q-Learning Algorithm - [Image](http://incompleteideas.net/book/the-book-2nd.html) taken from **Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, Second edition, 2014/2015, page 158**

+ Legend:

  - Q: action-value function
  - s: state
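The legend above refers to the tabular Q-Learning update summarized in the Sutton and Barto figure. A minimal Python sketch of that update rule follows; it is not part of this commit, and the parameter values for `alpha` and `gamma` as well as the example call are illustrative only:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One tabular Q-Learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s_next, a') - Q(s, a))
    """
    td_target = r + gamma * np.max(Q[s_next])   # bootstrap from the greedy value of the next state
    Q[s, a] += alpha * (td_target - Q[s, a])    # move the current estimate toward the TD target
    return Q

# Illustrative call on a 5-state, 2-action table, matching the 5-Chain example further below
Q = np.zeros((5, 2))
Q = q_learning_update(Q, s=0, a=1, r=2, s_next=0)
```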
@@ -35,15 +33,18 @@ Q-Learning Algorithm: [Image](http://incompleteideas.net/book/the-book-2nd.html)
  The n-Chain environment is taken from the OpenAI Gym module: [n-Chain](https://gym.openai.com/envs/NChain-v0/) (official documentation)

  The image below shows an example of a 5-Chain (n = 5) environment with 5 states. "a" stands for action and "r" for the reward ([Image Source](https://adventuresinmachinelearning.com/reinforcement-learning-tutorial-python-keras/)).
- ![NChain](/NChain-illustration.png)
+ <!-- ![NChain](images/NChain-illustration.png)
+ -->
+ <img src="/images/NChain-illustration.png" alt="NChain" width="600"/>

- ### States
+ ### Environment States

  This environment consists of a chain with n positions; every chain position corresponds to a possible state the agent can be in:
  - state n: position n on the chain

- ### Actions and Rewards
+ ### Environment Actions and Rewards

  The agent can move along the chain using two actions (for which the agent will get a different reward):
  - action 0: move forward along the chain - get no reward
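To complement the environment description above, the sketch below shows how the n-Chain environment can be created and stepped through. It assumes an older `gym` release that still includes `NChain-v0` and the classic four-value `step` signature; the random policy and the episode loop are illustrative and not taken from the notebook:

```python
import gym

# NChain-v0: a chain of states with two actions
# (0 = move forward along the chain, 1 = return to the start); see the documentation linked above.
env = gym.make('NChain-v0')

state = env.reset()
done = False
total_reward = 0

while not done:
    action = env.action_space.sample()             # random policy, for illustration only
    state, reward, done, info = env.step(action)   # classic gym step: (obs, reward, done, info)
    total_reward += reward

print('Return of one episode under a random policy:', total_reward)
```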
File renamed without changes.
File renamed without changes.
