This project experiments with various uses of a reinforcement learning algorithm called Q-learning. A common (and simple) Q-learning example is the Frozen Lake example, which consists of a grid of tiles. The first tile is the start tile and the last tile is the goal tile. Each of the remaining tiles is either a frozen (safe) tile or a hole (unsafe) tile. An agent is placed on the start tile and has to reach the goal tile without stepping on any holes. This simple problem is a good fit for the Q-learning algorithm. On any tile the agent can move in one of four directions: left, up, right, or down. A Q-table is a matrix of values in which each row represents a tile (state) and each column represents a possible direction (action), so a two-by-two lake has a Q-table with four rows and four columns. An example of a two-by-two frozen lake and its Q-table is shown in Figure 1.
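The layout described above can be sketched in a few lines of Python. This is a minimal sketch, not the project's tf.js code: the 2x2 layout, the S/F/H/G letters, and the row-by-row numbering of tiles are assumptions made for illustration, and the layout in Figure 1 may differ.

```python
# A hypothetical 2x2 frozen lake: S = start, F = frozen (safe), H = hole, G = goal.
LAKE_2X2 = [
    "SF",
    "HG",
]

# Column order follows the text: left, up, right, down.
ACTIONS = ["left", "up", "right", "down"]


def state_index(row, col, width):
    """Number the tiles row by row, so the top-left (start) tile is state 0."""
    return row * width + col


def make_q_table(lake):
    """One row per tile (state), one column per action, every entry starting at 0.0."""
    n_states = len(lake) * len(lake[0])
    return [[0.0] * len(ACTIONS) for _ in range(n_states)]


q_table = make_q_table(LAKE_2X2)
for row in range(2):
    for col in range(2):
        s = state_index(row, col, 2)
        print(s, LAKE_2X2[row][col], q_table[s])
```

Each printed line pairs a state number and its tile letter with that state's row of four action values, all zero before training.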
Each element in the Q-table is initialized to zero, so at first no action looks better than any other and the agent effectively picks among the four actions at random. The agent uses the Q-table to guide it through the frozen lake map. If the agent falls in a hole, the attempt ends and it goes back to the start tile to try again. If the agent finds its way to the goal, it receives a reward of one point for that attempt. After each move, the Q-table entry for that state and action is updated, so the values start to change depending on where rewards were found. These values are estimates of future reward rather than probabilities. Once the agent has attempted the map a certain number of times (episodes), the Q-table holds higher values for safe actions that lead toward the goal and lower values for actions that lead into holes, and the agent can solve the map by choosing the highest-valued action on each tile.
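The training process described above can be sketched as tabular Q-learning with epsilon-greedy exploration. This is a minimal Python sketch under stated assumptions rather than the project's implementation: the 4x4 map, the hyperparameters (`alpha`, `gamma`, `epsilon`, episode and step limits), and the helper names are all hypothetical. After each move it applies the standard update Q(s, a) ← Q(s, a) + alpha * (r + gamma * max over a' of Q(s', a') - Q(s, a)), where r is 1 on the goal tile and 0 everywhere else, and the future term is dropped when the move ends the episode. Ties between equal values are broken at random, which is what makes the all-zero table behave like random moves at first.

```python
import random

# Hypothetical 4x4 map; any map with S in the top-left and G in the bottom-right works.
LAKE = [
    "SFFF",
    "FHFH",
    "FFFH",
    "HFFG",
]
# (row change, col change) for left, up, right, down, matching the Q-table's columns.
MOVES = [(0, -1), (-1, 0), (0, 1), (1, 0)]


def step(lake, state, action):
    """Move one tile; walking into an edge leaves the agent in place.
    Returns (next_state, reward, done)."""
    height, width = len(lake), len(lake[0])
    row, col = divmod(state, width)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), height - 1)
    col = min(max(col + d_col, 0), width - 1)
    next_state = row * width + col
    tile = lake[row][col]
    if tile == "G":
        return next_state, 1.0, True   # one point for reaching the goal
    if tile == "H":
        return next_state, 0.0, True   # fell in a hole: the episode ends
    return next_state, 0.0, False


def best_action(q_row, rng):
    """Highest-valued action, breaking ties at random (an all-zero row gives a random move)."""
    best = max(q_row)
    return rng.choice([a for a, value in enumerate(q_row) if value == best])


def train(lake, episodes=2000, alpha=0.8, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(MOVES) for _ in range(len(lake) * len(lake[0]))]
    for _ in range(episodes):
        state = 0  # every episode starts back on the start tile
        for _ in range(max_steps):
            # Epsilon-greedy: usually follow the table, occasionally explore at random.
            if rng.random() < epsilon:
                action = rng.randrange(len(MOVES))
            else:
                action = best_action(q[state], rng)
            next_state, reward, done = step(lake, state, action)
            # Q-learning update toward reward + discounted best value of the next state.
            future = 0.0 if done else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_path(lake, q, max_steps=50):
    """Follow the highest-valued action from the start tile, recording visited states."""
    rng = random.Random(1)
    state, path = 0, [0]
    for _ in range(max_steps):
        state, reward, done = step(lake, state, best_action(q[state], rng))
        path.append(state)
        if done:
            return path, reward
    return path, 0.0


q_table = train(LAKE)
path, reward = greedy_path(LAKE, q_table)
print("path:", path, "reached goal:", reward == 1.0)
```

Because only the goal gives a reward, value spreads backward from the goal one tile at a time over many episodes; the `epsilon` term keeps the agent occasionally trying moves the table does not currently favor.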
For this project I built an interactive website where users can build a frozen lake map and train the agent on the map. I also began to play around with the idea of using this frozen lake example as a way to visualize data.
For much of this project I used the resources I had at ITP to build an understanding of tf.js and of reinforcement learning with Q-learning. I began by following Aiden’s example and creating a simple frozen lake program in which the agent learns to solve the map (Figure 2).
I then added user interaction to this example. Figure 3 shows an example of this interaction. The user can make a custom map and submit it to be trained on by the agent. This example can also be tried out at this link.
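Before a custom map is handed to the agent, it may help to check that the goal can be reached at all, since on a map where holes cut the start off from the goal the agent never earns a reward. This is a minimal sketch, assuming the interface reports the clicked holes as a set of (row, col) pairs; `build_lake` and `goal_reachable` are hypothetical helpers, and the start and goal are placed on the first and last tiles as in the Frozen Lake description.

```python
from collections import deque


def build_lake(holes, height, width):
    """Turn clicked holes into map rows: first tile S, last tile G, others F or H.
    `holes` is a hypothetical set of (row, col) pairs from the map editor."""
    rows = []
    for r in range(height):
        row = ""
        for c in range(width):
            if (r, c) == (0, 0):
                row += "S"
            elif (r, c) == (height - 1, width - 1):
                row += "G"
            else:
                row += "H" if (r, c) in holes else "F"
        rows.append(row)
    return rows


def goal_reachable(lake):
    """Breadth-first search over non-hole tiles from S; True if G can be reached."""
    height, width = len(lake), len(lake[0])
    seen = {(0, 0)}
    queue = deque([(0, 0)])
    while queue:
        r, c = queue.popleft()
        if lake[r][c] == "G":
            return True
        for dr, dc in [(0, -1), (-1, 0), (0, 1), (1, 0)]:
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < height and 0 <= nc < width
            if inside and (nr, nc) not in seen and lake[nr][nc] != "H":
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False


lake = build_lake({(0, 1), (1, 0)}, 3, 3)
print(lake, goal_reachable(lake))  # ['SHF', 'HFF', 'FFG'] False
```

Here the two holes next to the start wall it in, so training on this map could never produce a reward.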
I then created a webpage where you could choose to make a custom map or to use a data map. For the data map I uploaded my own gym attendance data. Safe tiles on the map were days I went to the gym, while holes were days I did not go. The idea was that the agent would have to navigate through my data to get to the goal: the more often I went to the gym, the easier it would be for the agent to find the goal. This was a simple experiment to explore how data could be incorporated into this example, and I would like to explore the concept further in future iterations of the application.
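The conversion from gym data to a map might look like the following minimal sketch. It assumes the data is a list of booleans, one per day, laid out one week per row; `gym_days_to_lake` is a hypothetical name, and overwriting the first and last days with the start and goal tiles is an assumption made so that the map always has both.

```python
def gym_days_to_lake(went_to_gym, width=7):
    """Lay days out row by row (one week per row when width is 7).
    Gym days become frozen tiles (F) and missed days become holes (H)."""
    if not went_to_gym or len(went_to_gym) % width:
        raise ValueError("expected a whole number of rows of days")
    tiles = ["F" if went else "H" for went in went_to_gym]
    # Assumption: the first and last days are overwritten with the start and goal tiles.
    tiles[0], tiles[-1] = "S", "G"
    return ["".join(tiles[i:i + width]) for i in range(0, len(tiles), width)]


two_weeks = [
    True, False, True, True, False, False, True,
    True, True, False, True, True, False, True,
]
for row in gym_days_to_lake(two_weeks):
    print(row)
# SHFFHHF
# FFHFFHG
```

More gym days means fewer holes, but that alone does not guarantee a safe route: in this two-week example no path from S to G avoids the holes, so the agent could never reach the goal.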
Embedded below is the final webpage. This example can also be viewed here.
While this project served as an exploration of Q-learning and its potential implementations, I would like to go much further. One feature I would like to make clearer in a future iteration is the training process of Q-learning. I attempted this without success, but I would like to visualize the Q-table as the agent is training. This would show users more clearly how reinforcement learning works and explain the algorithm to people who have never heard of it before.
Mostly, I would like to further explore this example as a way to visualize data. The gym data set was a good start, but I would like to see how I could improve on this concept. First and foremost, I would need to implement a neural network to approximate the Q-table, so that the frozen lake map could grow to a higher resolution (more tiles) without needing a table row for every tile. Then I could perhaps visualize larger amounts of data and introduce multiple agents. Perhaps this visualization could turn into a habit motivation game where you keep track of a habit and your agent “friend” gains lives if it is able to reach the goal.
Overall, this project was a great first step into exploring these Q-learning implementations. I would definitely like to continue working on the data side of this project in order to build an interesting visualization.
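One simple way to approach visualizing the Q-table during training is to draw, for each tile, the action the table currently favors, and to redraw it every so often as the agent learns. The sketch below does this as text in Python; the arrow symbols, the `render_policy` name, and the hand-filled Q-values are hypothetical, and a browser version would draw the same information on the map itself.

```python
ARROWS = ["<", "^", ">", "v"]  # left, up, right, down (same column order as the Q-table)


def render_policy(lake, q):
    """Return a text grid showing the highest-valued action on each non-terminal tile.
    Holes and the goal show their letter; tiles with no learned value yet show '.'."""
    width = len(lake[0])
    lines = []
    for r, row in enumerate(lake):
        cells = []
        for c, tile in enumerate(row):
            values = q[r * width + c]
            if tile in ("H", "G"):
                cells.append(tile)          # terminal tiles have no action to show
            elif max(values) == 0.0:
                cells.append(".")           # nothing learned on this tile yet
            else:
                cells.append(ARROWS[values.index(max(values))])
        lines.append(" ".join(cells))
    return "\n".join(lines)


lake = ["SF", "HG"]
early = [[0.0] * 4 for _ in range(4)]  # before training: all zeros
later = [                              # hand-filled values, as if partway through training
    [0.0, 0.0, 0.9, 0.0],   # start: moving right looks best
    [0.0, 0.0, 0.0, 1.0],   # frozen tile: moving down reaches the goal
    [0.0] * 4,              # hole (episode ends here)
    [0.0] * 4,              # goal (episode ends here)
]
print(render_policy(lake, early))
print()
print(render_policy(lake, later))
```

Printing the grid every few hundred episodes would show the arrows filling in backward from the goal, which is the part of training that is hard to see from the final result alone.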
Links to project: