Performance
We want games to run fast. This is true whether we are using statistical forward planning techniques such as MCTS, or Reinforcement Learning (once we get around to implementing a clear interface for this). The ForwardModel.next() and GameState.copy() methods are called many times, and should not be too slow. However, the usual software engineering adage still holds: write a good solution first, and only optimise for performance when you have a problem.
To quote Donald Knuth
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
As a guide to times for the next and copy methods, compare the games outlined below (an exhaustive list as of January 2021):
Game | Copy (μs) | Next (μs) |
---|---|---|
Diamant | 6 | 3.3 |
Dominion | 29 | 6.2 |
Exploding Kittens | 25 | 4.2 |
Colt Express | 18 | 2.8 |
Virus | 18 | 5.9 |
Uno | 14 | 1.0 |
Dots and Boxes | 57 | 1.7 |
Love Letter | 7 | 1.6 |
These figures are from a rather old desktop computer, but give a rough order of magnitude. They are medians calculated over Random-agent play using GameReport. Only bother optimising if your game differs significantly from these examples; the times will obviously always depend on the complexity of the game and the number of mutable components that need to be copied or updated.
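If you do want to measure your own game, medians like these can be gathered with a small harness along the following lines. This is only a sketch, not the GameReport implementation; the array-copy workload is a placeholder standing in for the copy() or next() call you want to time:

```java
import java.util.Arrays;
import java.util.Random;

public class MedianTimer {
    // Returns the median time, in microseconds, of 'runs' executions of the action
    static double medianMicros(Runnable action, int runs) {
        double[] samples = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            action.run();
            samples[i] = (System.nanoTime() - start) / 1000.0; // ns -> μs
        }
        Arrays.sort(samples);
        return runs % 2 == 1
                ? samples[runs / 2]
                : (samples[runs / 2 - 1] + samples[runs / 2]) / 2.0;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int[] data = rnd.ints(10_000).toArray();
        // Placeholder workload standing in for gameState.copy() or forwardModel.next(...)
        double median = medianMicros(() -> Arrays.copyOf(data, data.length), 101);
        System.out.printf("median 'copy' time: %.2f μs%n", median);
    }
}
```

Using the median, as GameReport does, keeps the figure robust against JIT warm-up outliers in the first few runs.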
Using lambdas, method references and streams
One point of contention has been the relative performance of lambdas, method references and streams in code. See for example here, or here.
Out of curiosity we ran some rough-and-ready benchmark tests to see how much we should care. Firstly, we generated a list of 10,000 random integers and summed it using the three distinct implementations below:
```java
double useMethodReference(List<Number> input) {
    return input.stream().mapToDouble(Number::doubleValue).sum();
}

double useLambda(List<Number> input) {
    return input.stream().mapToDouble(i -> i.doubleValue()).sum();
}

double useUglyJava(List<Number> input) {
    double retValue = 0.0;
    for (Number n : input) {
        retValue += n.doubleValue();
    }
    return retValue;
}
```
The table below reports the results in microseconds. Several iterations were run so that initial transients as the JVM warmed up could be accounted for.
Iteration | Simple Java | Method Ref. | Lambda |
---|---|---|---|
1 | 736 | 6445 | 1401 |
2 | 419 | 167 | 159 |
3 | 110 | 151 | 147 |
4 | 82 | 149 | 148 |
5 | 81 | 152 | 145 |
6 | 82 | 142 | 154 |
This shows that once the JVM has optimised (at runtime) for these code patterns, the non-stream approach is 40-50% faster. It is also noticeable that the plain Java code ‘warms up’ faster, implying that the JVM has less funky runtime optimisation to do - which makes sense. This latter point is not that relevant: in this context, only code called frequently enough to be fully runtime-optimised is worth considering for optimisation anyway. Note too that this test is highly abstract and does nothing as complex as real game code.
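The benchmark loop was along these lines (a reconstruction, since the original harness is not shown; absolute timings will vary by machine):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class StreamBenchmark {
    static double useMethodReference(List<Number> input) {
        return input.stream().mapToDouble(Number::doubleValue).sum();
    }

    static double useLambda(List<Number> input) {
        return input.stream().mapToDouble(i -> i.doubleValue()).sum();
    }

    static double useUglyJava(List<Number> input) {
        double retValue = 0.0;
        for (Number n : input)
            retValue += n.doubleValue();
        return retValue;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        List<Number> input = new ArrayList<>();
        for (int i = 0; i < 10_000; i++)
            input.add(rnd.nextInt());

        // Run several iterations so the JIT warm-up transient is visible in the first rows
        for (int iter = 1; iter <= 6; iter++) {
            long t0 = System.nanoTime();
            double a = useUglyJava(input);
            long t1 = System.nanoTime();
            double b = useMethodReference(input);
            long t2 = System.nanoTime();
            double c = useLambda(input);
            long t3 = System.nanoTime();
            System.out.printf("%d | %d | %d | %d | sums agree: %b%n", iter,
                    (t1 - t0) / 1000, (t2 - t1) / 1000, (t3 - t2) / 1000,
                    a == b && b == c);
        }
    }
}
```

For anything more serious than a rough-and-ready comparison like this, a proper tool such as JMH would guard against JIT dead-code elimination and other microbenchmark pitfalls.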
Secondly, we therefore converted the core MCTS algorithm so that streams were either used or avoided completely in three functions called multiple times in each iteration - namely those that count the number of node visits, sum the value of a node, and find all unexpanded actions.
```java
// Firstly, the two plain Java functions
private double actionTotValueV1(AbstractAction action, int playerId) {
    double retValue = 0.0;
    for (SingleTreeNodeNoStreams node : children.get(action)) {
        if (node != null)
            retValue += node.totValue[playerId];
    }
    return retValue;
}

private List<AbstractAction> unexpandedActionsV1() {
    List<AbstractAction> retValue = new ArrayList<>();
    for (AbstractAction action : actionsFromState) {
        if (children.get(action) == null)
            retValue.add(action);
    }
    return retValue;
}

// ...and then their streamed/method-referenced versions
private double actionTotValueV2(AbstractAction action, int playerId) {
    return Arrays.stream(children.get(action))
            .filter(Objects::nonNull)
            .mapToDouble(n -> n.totValue[playerId])
            .sum();
}

private List<AbstractAction> unexpandedActionsV2() {
    // assumes a static import of java.util.stream.Collectors.toList
    return actionsFromState.stream().filter(a -> children.get(a) == null).collect(toList());
}
```
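The third converted function, counting node visits, follows exactly the same pattern. A simplified, self-contained sketch of both styles is below; the Node class and its nVisits field here are illustrative stand-ins, not the actual TAG classes:

```java
import java.util.Arrays;
import java.util.Objects;

public class VisitCount {
    // Minimal stand-in for a tree node; the nVisits field name is an assumption
    static class Node {
        int nVisits;
        Node(int visits) { nVisits = visits; }
    }

    // Plain-Java version: loop and accumulate, skipping null children
    static int visitsPlain(Node[] children) {
        int total = 0;
        for (Node n : children)
            if (n != null) total += n.nVisits;
        return total;
    }

    // Stream version of the same calculation
    static int visitsStream(Node[] children) {
        return Arrays.stream(children)
                .filter(Objects::nonNull)
                .mapToInt(n -> n.nVisits)
                .sum();
    }

    public static void main(String[] args) {
        Node[] children = { new Node(3), null, new Node(5) };
        System.out.println(visitsPlain(children));   // 8
        System.out.println(visitsStream(children));  // 8
    }
}
```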
We then ran comparisons on two games.
- Dots and Boxes with 1s of time per move, OSLA opponents, 4 players, Closed Loop MCTS, and no rollouts or redeterminisation. The lack of rollouts and using Closed Loop means that we do very little work in the forward model, and hence the computational work is concentrated in the MCTS algorithm itself.
- Love Letter with 2s of time per move, random opponents, 4 players, Open Loop, Information Set MCTS but again with no rollouts.
Game | Stream Iter. | No-Stream Iter. | Gain |
---|---|---|---|
DotsAndBoxes | 3600 | 4100 | 10-15% |
LoveLetter | 16000 | 18000 | 10-15% |
The reported figures are median MCTS iterations completed within the computational budget. The mean is warped by outliers: a small number of iterations while the JVM warms up, and a huge number of iterations at the end of a game, when we can easily run hundreds of thousands of iterations to decide on the final move. The median is robust to these outliers.
Conclusion
In this case we decided to convert the relevant MCTS functions to the non-stream versions, because even a 10% performance increase in a core method used by every game may be helpful. However, in general, a more functional, stream-based style gives:
- clearer code
- more concise code
- less error-prone code
Outside of this quite specific core code, the decision would have gone the other way. So knock yourself out with streams and lambdas, and only bother optimising if you actually have a problem.