Yes, sorry, this is a known error. I haven't been able to work on it yet, but I will soon.
There are a few problems with the online version that I haven't had time to work on.
First, the network has to play against itself for quite a few iterations before it can get anywhere (I haven't done this yet).
Second, the network learns by replicating the winner. This means it never finds out when it made an incorrect move: I only tell it which moves are optimal, not which moves are incorrect. I did it this way because I wanted to whip up a quick demonstration.
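To make the difference concrete, here is a minimal sketch of the two ways of turning a finished game into training data. It is not the actual code behind the demo: the game record format (a list of `(player, state, move)` tuples), the state labels, and both function names are hypothetical. `winner_only_examples` keeps only the winner's moves as targets to copy, so a losing move simply never appears in the data. `signed_examples` is one alternative that also labels the loser's moves with -1, which is the kind of "this move was bad" signal the current approach lacks.

```python
from typing import List, Tuple

# Hypothetical game record: each entry is (player, state, move).
Record = List[Tuple[str, str, int]]


def winner_only_examples(history: Record, winner: str) -> List[Tuple[str, int]]:
    """Imitation-style targets: keep only the winner's (state, move) pairs.

    The loser's moves are dropped entirely, so the network never sees
    them and gets no signal that they were mistakes.
    """
    return [(state, move) for player, state, move in history if player == winner]


def signed_examples(history: Record, winner: str) -> List[Tuple[str, int, int]]:
    """Alternative: label every move +1 if its player won, -1 if it lost."""
    return [
        (state, move, 1 if player == winner else -1)
        for player, state, move in history
    ]


if __name__ == "__main__":
    # A made-up four-move game between players "A" and "B"; "A" wins.
    game = [("A", "s0", 4), ("B", "s1", 0), ("A", "s2", 8), ("B", "s3", 2)]
    print(winner_only_examples(game, "A"))  # [('s0', 4), ('s2', 8)]
    print(signed_examples(game, "A"))
```

With only the first kind of data, the network can learn to imitate winning play but cannot learn to avoid a particular losing move unless some winner happened to play a different move in that same position.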
I was hoping that enough people playing would cover every situation and teach it which moves lead to a win, but not enough training has happened so far.