Reinforcement Learning and TicTacToe not working?!

02/25/2015 12:55 Shadow992#1
Hello guys,

I am currently getting deeper into the topic of reinforcement learning.
As I already know the basics of AI programming, I started with Q-learning and neural nets in AutoIt (just because I wanted to show what is possible with AutoIt). This did not work, so I decided to write a really basic implementation of Q-learning with a state/action table in AutoIt. This did not work either.

I thought it might be an AutoIt-specific problem, so I decided to implement an even more basic example in Java and tried again with core code similar to the AutoIt version.

In Java I wrote a little helper class called "QLearner", which is not much more than a Hashtable with some extended features. QLearner by itself works like a charm (no problems with adding/getting values or finding the biggest value for a given state/action pair).
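
For readers who want to picture such a helper, here is a minimal sketch of a hash-table-backed Q-table. It is not the original QLearner class: the class name QTableSketch, its method names, and the choice of encoding the board as a string key are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a Q-table; NOT the original QLearner API.
// States are assumed to be board strings such as "X...O....".
class QTableSketch {
    private final Map<String, Double> values = new HashMap<>();

    private static String key(String state, int action) {
        return state + "|" + action;
    }

    // Unseen state/action pairs default to 0.0.
    double get(String state, int action) {
        return values.getOrDefault(key(state, action), 0.0);
    }

    void put(String state, int action, double q) {
        values.put(key(state, action), q);
    }

    // Highest Q-value among the legal actions (0.0 if there are none).
    double maxValue(String state, int[] legalActions) {
        if (legalActions.length == 0) {
            return 0.0;
        }
        double best = Double.NEGATIVE_INFINITY;
        for (int a : legalActions) {
            best = Math.max(best, get(state, a));
        }
        return best;
    }

    // Legal action with the highest Q-value; ties go to the first one.
    // Returns -1 if there are no legal actions.
    int bestAction(String state, int[] legalActions) {
        int bestAction = -1;
        double best = Double.NEGATIVE_INFINITY;
        for (int a : legalActions) {
            double q = get(state, a);
            if (q > best) {
                best = q;
                bestAction = a;
            }
        }
        return bestAction;
    }
}
```

Restricting the lookup to legal actions matters here: unseen pairs default to 0.0, so an occupied cell could otherwise look better than a legal move with a negative value.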

Then I wrote a basic class for the TicTacToe board; this works too.

After that I implemented the classes "AiPlayer" and "Main".
The Main class is not complex, so there should be no problems there either.

AiPlayer itself (apart from learning) works great too, so it always takes the action with the biggest Q-value, etc.
But the learning itself seems to fail.
For learning I used this simple approach:

Remember a maximum of X moves.
After each move, update the Q-value of the move before the current move with reward = 0.

If the game ends, do the same as above, but with the final reward:
Update the Q-value of the move before the current move with reward = 100 (if won), reward = 0 (if draw), reward = -100 (if lost).
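
The scheme above can be sketched with the standard tabular Q-learning rule, Q(s,a) += alpha * (reward + gamma * max Q(s',a') - Q(s,a)). This is only an illustration, not the code from AiPlayer.train(): the class name QUpdateSketch, the learning rate ALPHA, and the discount factor GAMMA are assumed, since the post does not state them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the update described above; not the original AiPlayer code.
class QUpdateSketch {
    static final double ALPHA = 0.1; // learning rate (assumed value)
    static final double GAMMA = 0.9; // discount factor (assumed value)

    private final Map<String, Double> q = new HashMap<>();

    double get(String state, int action) {
        return q.getOrDefault(state + "|" + action, 0.0);
    }

    // Best Q-value reachable from a state; 0.0 if it has no legal actions.
    double maxQ(String state, int[] legalActions) {
        if (legalActions.length == 0) {
            return 0.0;
        }
        double best = Double.NEGATIVE_INFINITY;
        for (int a : legalActions) {
            best = Math.max(best, get(state, a));
        }
        return best;
    }

    // Game still running: reward is 0, so the target is the
    // discounted best value of the next state.
    void updateStep(String prevState, int prevAction, String nextState, int[] nextLegal) {
        update(prevState, prevAction, 0.0, maxQ(nextState, nextLegal));
    }

    // Game over: there is no next state, so the target is the final reward alone
    // (100 for a win, 0 for a draw, -100 for a loss in the scheme above).
    void updateTerminal(String prevState, int prevAction, double reward) {
        update(prevState, prevAction, reward, 0.0);
    }

    private void update(String state, int action, double reward, double nextBest) {
        double old = get(state, action);
        double target = reward + GAMMA * nextBest;
        q.put(state + "|" + action, old + ALPHA * (target - old));
    }
}
```

In a two-player game, the "next state" would usually be the board as the learner sees it on its next turn, after the opponent has replied. A common source of trouble is leaving out the max term on non-terminal steps, in which case the final reward never propagates back to earlier moves.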

That is the theory so far, and it seemed logical and legitimate to me. But something (maybe a code bug or a brain bug) makes it fail.

The 2 important classes are the following (learning is done in the train() method of the AiPlayer class):

Main.java

AiPlayer.java

I really hope you can help me. I am sure something went wrong with the Q-value calculation or with saving old states (even though saving old states itself seems to work).

The learning AI does not seem to learn at all (maybe 2-4%, but not much more).
I let it play millions of games, but the learning AI is still losing too often (my output):


Thanks in advance. :D

Edit:
If someone wants to understand what I am trying to do here, have a look at:
[Only registered and activated users can see links. Click Here To Register...]

Edit:
I actually found the cause; there were 3 problems:
1. Learning to find 3 in a row takes a long time if you do not add a simple check.
2. My Q-value calculation was wrong.
3. The lose/draw/win rewards weren't ideal.

Fixes:
1. Before each move, also check whether you can win the game by placing the next marker.
2. Fixed in code (coming later).
3. Lose reward = 0, draw reward = 0.5, win reward = 1.
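
Fix 1 can be sketched as follows. This is an illustration rather than the changed AiPlayer code: the class name WinCheckSketch, the char[9] board layout (cells 0..8 row by row, ' ' for empty), and the method names are assumptions.

```java
// Hypothetical sketch of the "can I win with the next marker?" check.
class WinCheckSketch {
    // The eight winning lines on a 3x3 board indexed 0..8, row by row.
    static final int[][] LINES = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8}, // rows
        {0, 3, 6}, {1, 4, 7}, {2, 5, 8}, // columns
        {0, 4, 8}, {2, 4, 6}             // diagonals
    };

    static boolean hasWon(char[] board, char player) {
        for (int[] line : LINES) {
            if (board[line[0]] == player
                    && board[line[1]] == player
                    && board[line[2]] == player) {
                return true;
            }
        }
        return false;
    }

    // Returns a cell that wins immediately for the player, or -1 if none exists.
    // The board is modified temporarily and restored before returning.
    static int findWinningMove(char[] board, char player) {
        for (int cell = 0; cell < 9; cell++) {
            if (board[cell] != ' ') {
                continue;
            }
            board[cell] = player;
            boolean wins = hasWon(board, player);
            board[cell] = ' ';
            if (wins) {
                return cell;
            }
        }
        return -1;
    }
}
```

An agent could call findWinningMove before consulting the Q-table and play the returned cell whenever it is not -1, falling back to the learned values otherwise.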


New code (only AiPlayer.java was changed):

Edit2:
After some more testing I found out that the previous reward function (win = 1, draw = 0, lose = -1) worked as well, maybe even a little better. That's why I changed the edited code again.
02/25/2015 14:47 XxharCs#2
I would like to help you more, but I haven't worked with reinforcement learning :/ (though it's an interesting topic :D)

But I have found a nice example that applies reinforcement learning and a NN to TicTacToe :)
[Only registered and activated users can see links. Click Here To Register...]

Maybe this will help :)
02/25/2015 15:20 Shadow992#3
Thanks for trying to help me, but my code was inspired by this example. ;)
At first I tried to implement this code in AutoIt, but I thought it did not work because of different NN libs or because of problems with AutoIt itself.

But because I just wanted to understand Q-learning itself, I started a basic approach in Java, which also failed (that's quite frustrating :D).

Maybe someone knows more about Q-learning and can help me. :)
Is it possible that Q-learning cannot learn well by playing against an opponent that makes random moves?
I do not think so, but I have not tried it yet.

Hopefully someone can help me before I have to try every possible cause. :)
02/26/2015 18:22 VisionEP1#4
@edit: Writing English on my phone does not work at all; I will fix this when I am home.