Jan 19, 2018

How to Get the Most out of Automated Testing—Part 2

Posted by: Rachel E. Towers

As part of our support for the Choice of Games Contest for Interactive Novels, we will be posting an irregular series of blog posts discussing important design and writing criteria for games. We hope these posts will both provide guidance for people participating in the Contest and help people understand how we think about questions of game design and some best practices. They don’t modify the evaluation criteria for the Contest, and (except where noted) participants are not required to follow our recommendations, but it’s probably a good idea to listen when judges tell you what they’re looking for.

If these topics interest you, be sure to sign up for our contest mailing list below! We’ll post more of our thoughts on game design leading up to the contest deadline on January 31, 2018.

In our previous blog post, we explained how to use randomtest for debugging and included some examples, particularly using choice_randomtest. Here we’ll continue by showing how to track down some of the more esoteric kinds of bugs, and then we’ll cover how to use randomtest to balance a game.

If your game has errors that quicktest or randomtest can catch, the tools will run until they hit one, then stop and report what the error is. These messages usually state exactly what the problem is, but if you need more detailed help with one, our forum has a category dedicated to already-answered questions, and you can post your own questions or problems there if they haven’t already been asked.

There is, however, one relatively rare kind of bug for which randomtest won’t display an error message: an infinite loop. Randomtest still exposes this kind of bug, but because it has no way to judge how long it is reasonable to spend going around a loop, it simply keeps running until it is forced to stop. Debugging these problems can be difficult, but once again the automated tools come in handy. Make sure “Show choices selected during game” is checked. In most cases, you should now see a long string of the same choice(s) being made over and over again, which tells you roughly where the game gets caught in the loop. If that’s still hard to interpret, you can recreate the playthrough by making the same choices yourself, which should reproduce the problem in the game. Occasionally, though, no repeating choices appear (the selected options just suddenly stop), and in that case we can turn on “Show full text during game” to see whether the loop shows up in the text.
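When the list of selected choices runs to thousands of entries, spotting the repetition by eye can be tedious. The short Python sketch below is our own helper, not part of ChoiceScript or randomtest. It assumes you have already copied the option names from the “Show choices selected during game” output into a list (we don’t parse randomtest’s output here, since we’re not relying on its exact format), and it reports the shortest run of choices that repeats back to back at the end of the log. The option names in the example are hypothetical.

```python
def find_repeating_tail(choices, min_repeats=5):
    """Return the shortest block of choices that repeats at least
    `min_repeats` times back to back at the end of `choices`, or None."""
    n = len(choices)
    # Try block sizes from 1 upward; the block repeated `min_repeats`
    # times must still fit inside the log.
    for size in range(1, n // min_repeats + 1):
        block = choices[n - size:]
        tail = choices[n - size * min_repeats:]
        if tail == block * min_repeats:
            return block
    return None


# Hypothetical log: one establishing choice, then the game keeps
# bouncing between the same two options and never leaves.
log = ["Strength"] + ["Visit the blacksmith", "Ask about swords"] * 6
print(find_repeating_tail(log))  # ['Visit the blacksmith', 'Ask about swords']
```

If the helper returns a block, the *label those options keep leading back to is the first place to look for the missing exit.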

Now that our game not only passes randomtest but is also well on its way to being optimized, we can move on to using some of randomtest’s features for game balancing. One thing we commonly use randomtest for is simulating a large number of playthroughs. We do this by running a number of ‘seeds’, each one a playthrough of the full game made by randomly selecting an option whenever a choice comes up. We have already mentioned line coverage briefly in this guide, but now we’re going to actually look at it. All we have to do is tick the box that says “Show line coverage statistics after test” when running the HTML version of randomtest. At the bottom of the output, it will then display a set of lines that look something like this (this example is taken from the example game that comes with the ChoiceScript download):

variables 10000: *set leadership 10
variables 10000: *set strength 10
variables 10000:
variables 10000: What do you prefer?
variables 10000: *choice
variables 10000: #Leadership
variables 5000: *set leadership +10
variables 5000: *goto action
variables 10000: #Strength
variables 5000: *set strength +10
variables 5000:
variables 10000: *label action
variables 10000: What will you do today?
variables 10000: *choice
variables 10000: #Run for class president
variables 5000: *if leadership > 15
variables 2506: You win the election.
variables 2506: *finish
variables 2494: You lose the election.
variables 2494: *finish
variables 10000: #Lift weights
variables 5000: *if strength > 15
variables 2506: You lift the weights.
variables 2506: *finish
variables 2494: You drop the weights and hurt yourself badly. You never recover.
variables 2494:
variables 2494: *goto_scene death

Looking at this, we see two different types of choices in a very simple form. The first, “What do you prefer?”, is an Establishing Choice (more on Establishing Choices can be found here), while the second, “What will you do today?”, is a Testing Choice (more about types of choices can be read here). Combined, these two choices are one of the simplest examples of how we use delayed branching. First we choose to be either strong or a leader; then, when faced with a choice about what we’re going to do, we succeed if we pick the option that reflects our skill set, and we fail if we try to do something we have no skill in.

Having run randomtest, we can see how this actually plays out. Randomtest randomly selects whether the PC specializes in Strength or Leadership, then randomly selects which skill they will attempt to use. This means there is about a 50/50 chance that the PC’s preferred skill matches the test, and the output confirms it: each *if is hit 5,000 times (out of 10,000 runs), and each success is displayed a little over 2,500 times. Of course, a real game is more complex than this, and once we add a few more Establishing Choices and a couple more variables, and switch everything to fairmath, we’re going to lose track of what ranges our stats can fall in.
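To see why the counts land near 5,000 and 2,500, here is a minimal Python model of the example game. It is a sketch, not randomtest itself: it assumes each option is picked uniformly at random, and the keys it counts are simply copies of the relevant lines from the coverage excerpt above.

```python
import random
from collections import Counter


def simulate_example_game(seeds=10_000, rng_seed=2018):
    """Play the example game `seeds` times, picking every option at random,
    and count how often some key lines are reached."""
    rng = random.Random(rng_seed)  # fixed seed so runs are repeatable
    hits = Counter()
    for _ in range(seeds):
        leadership, strength = 10, 10

        # Establishing choice: "What do you prefer?"
        if rng.choice(["Leadership", "Strength"]) == "Leadership":
            leadership += 10
        else:
            strength += 10

        # Testing choice: "What will you do today?"
        action = rng.choice(["Run for class president", "Lift weights"])
        if action == "Run for class president":
            hits["*if leadership > 15"] += 1
            won = leadership > 15
            hits["You win the election." if won else "You lose the election."] += 1
        else:
            hits["*if strength > 15"] += 1
            lifted = strength > 15
            hits["You lift the weights." if lifted else "You drop the weights..."] += 1
    return hits


for line, count in sorted(simulate_example_game().items()):
    print(f"{count:5d}: {line}")
```

Since a success requires both random picks to line up, each success line should come in at about 1/2 times 1/2, or 1/4 of the runs: roughly 2,500 out of 10,000, which matches what randomtest reported.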

So here are a few rules of thumb to keep in mind. First, if a player is given X stats to choose from, a test of one of those stats with a 1/X success rate is usually moderately difficult. For example, the test above involves two stats, so a success rate of 1/2 is moderately difficult; if we added a third stat (and matching options), a success rate of 1/3 would be roughly equivalent. At about double that success rate (again, measured by how many playthroughs actually pass the test), any player who puts significant effort into raising the appropriate stat should be able to succeed, while at about half of it, a player has to very specifically raise that stat to succeed.
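If you want to apply this rule of thumb to many tests at once, a small helper can compare a measured success rate against the 1/X baseline. This Python sketch is our own illustration. The rule of thumb only gives three anchor points (about half, about equal, about double), so snapping a rate to whichever anchor is nearest on a doubling scale is an assumption, and the labels “hard”, “moderate”, and “easy” are shorthand for the descriptions above.

```python
import math


def classify_test(success_rate, num_stats):
    """Rough difficulty label for a stat test, using the 1/X rule of thumb.

    success_rate: fraction of randomtest playthroughs that pass the test.
    num_stats: how many stats the player chooses between (X).
    """
    if success_rate <= 0:
        return "never passed"
    baseline = 1 / num_stats  # about 1/X counts as moderately difficult
    # Doublings away from the baseline: -1 = half, 0 = same, +1 = double.
    steps = round(math.log2(success_rate / baseline))
    steps = max(-1, min(1, steps))  # clamp to the three anchor points
    # "hard": the player must specifically raise this stat.
    # "easy": any significant effort in this stat should succeed.
    return {-1: "hard", 0: "moderate", 1: "easy"}[steps]


print(classify_test(0.5, 2))   # moderate
print(classify_test(0.3, 3))   # moderate
print(classify_test(0.6, 3))   # easy
print(classify_test(0.25, 2))  # hard
```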

In addition to that, there is the raw number of successes to consider. Any line that is hit around 100 times in 10,000 seeds (the default number of seeds run) is very difficult to reach: only 1% of random players end up making the right choices. Any line that is hit around 10 times in the same number of seeds is exceedingly difficult to reach, and should only be used for very specific situations, such as single lines covering rare scenarios, or very rare, special goals, which should virtually always be signposted with an achievement.
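With a long game, the coverage output can run to thousands of lines, so it can help to filter it. The Python sketch below assumes coverage lines have the “scene count: text” shape shown in the excerpt above, and flags anything at or below the rough 1% and 0.1% marks, plus lines with a count of zero. The thresholds are approximations of the “around 100” and “around 10” figures, and the scene name and sample lines are hypothetical.

```python
import re

# One coverage line, in the "scene count: text" shape of the excerpt above.
COVERAGE_LINE = re.compile(r"^(\S+) (\d+):\s?(.*)$")


def flag_rare_lines(coverage_text, seeds=10_000):
    """Return (scene, count, label, text) for lines that random players
    rarely or never reach."""
    flagged = []
    for raw in coverage_text.splitlines():
        match = COVERAGE_LINE.match(raw)
        if not match:
            continue  # skip anything that isn't a coverage line
        scene, count, text = match.group(1), int(match.group(2)), match.group(3)
        rate = count / seeds
        if count == 0:
            label = "never hit"
        elif rate <= 0.001:  # roughly 10 hits per 10,000 seeds or fewer
            label = "exceedingly difficult"
        elif rate <= 0.01:  # roughly 100 hits per 10,000 seeds or fewer
            label = "very difficult"
        else:
            continue
        flagged.append((scene, count, label, text))
    return flagged


# Hypothetical coverage lines from a scene called chapter3.
sample = """chapter3 10000: *choice
chapter3 95: You are crowned queen.
chapter3 8: You find the hidden crown.
chapter3 0: The cleric draws a sword."""

for row in flag_rare_lines(sample):
    print(row)
```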

Occasionally, though, we’ll have lines that are never hit. That is a problem, because we can’t confirm whether they are reachable at all. Sometimes the fix is as easy as tweaking some numbers up or down (giving the player higher stats, or lowering the difficulty), but occasionally we might find that’s not possible. For example, we might be writing a heroic fantasy with a scene specifically for clerics who carry swords, and find that it never fires. In that case, we might add a choice_randomtest check to our code (remembering to remove it once we’re done testing) that forces the game to play only clerics who carry swords (and perhaps realize in the process that we made clerics forbidden from using swords).
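It’s worth keeping in mind why a zero count doesn’t settle the question on its own. If a line is reached with some small probability p in each playthrough, the chance that all N seeds miss it is (1 - p)^N. The Python sketch below, which treats seeds as independent random playthroughs (a simplifying assumption), computes that chance and the number of seeds you would need to see the line at least once with a given confidence.

```python
import math


def chance_never_hit(p, seeds=10_000):
    """Probability that a line reached with probability p per playthrough
    is missed by every one of `seeds` independent playthroughs."""
    return (1 - p) ** seeds


def seeds_needed(p, confidence=0.95):
    """Smallest number of seeds that reaches the line at least once with
    the given confidence. Assumes 0 < p < 1."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


print(round(chance_never_hit(1 / 10_000), 3))  # 0.368
print(seeds_needed(1 / 100))                    # 299
```

So a line that only one playthrough in 10,000 reaches will go unseen in a 10,000-seed run about 37% of the time, which is one reason forcing the branch is often quicker than simply running more seeds.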

There are a couple of other ways we can use randomtest, too. For example, randomtest is how we determine playthrough length. To do this, we run some seeds with the “Show full text during game” option selected, which then shows the word count near the bottom of the output when it’s finished. For a 100,000-word game, we generally expect the average playthrough to be around 20% of the game, or 20,000 words. As a game’s total word count grows, a single playthrough tends to cover a smaller percentage of it, but we don’t recommend going below 10% of the total length; otherwise, so much of the game is locked away in specific playthroughs that it will feel short regardless of its total length.
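The arithmetic is simple, but here is a minimal Python sketch of the check, assuming you have a word count for each seed (if your run reports one combined total instead, divide it by the number of seeds first). The word counts in the example are hypothetical, and the 10% floor is the guideline above.

```python
def playthrough_share(playthrough_words, total_words):
    """Average words per playthrough as a fraction of the whole game."""
    average = sum(playthrough_words) / len(playthrough_words)
    return average / total_words


def check_length(playthrough_words, total_words, floor=0.10):
    """Return the share and whether it clears the recommended floor."""
    share = playthrough_share(playthrough_words, total_words)
    # Below about 10%, most of the game sits in branches a given player never sees.
    return share, share >= floor


# Hypothetical word counts from three seeds of a 100,000-word game.
share, ok = check_length([18_000, 22_000, 20_000], 100_000)
print(f"{share:.0%} of the game per playthrough, above the floor: {ok}")
```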

This is, of course, not an exhaustive list of uses for our testing tools. We might also use other tricks, like a series of *ifs with *comments to tell us how many randomtest playthroughs reach a certain section of code with certain stats, or the *bug command (documented here) to make quicktest pass or to get more information from randomtest. Maybe you’ll even come up with your own!
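As an illustration of the *if and *comment trick, suppose we want to know how strong players are when they reach a particular scene. We could put one *if per band of the stat at that point, each containing only a *comment, so the coverage count on each *comment line tells us how many random playthroughs arrived in that band. The ChoiceScript pattern appears in the comments of the Python sketch below, which mimics the same tally on a plain list of stat values; the stat name, band edges, and values are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# The ChoiceScript pattern this mimics (stat name and band edges are hypothetical):
#   *if strength < 50
#     *comment strength below 50
#   *if (strength >= 50) and (strength < 70)
#     *comment strength 50 to 69
#   *if strength >= 70
#     *comment strength 70 or more
# Each *comment line's coverage count is the number of playthroughs in that band.


def bucket_counts(values, cutoffs):
    """Count integer stat values into bands split at the given cutoffs."""
    cutoffs = sorted(cutoffs)
    labels = [f"below {cutoffs[0]}"]
    labels += [f"{lo} to {hi - 1}" for lo, hi in zip(cutoffs, cutoffs[1:])]
    labels.append(f"{cutoffs[-1]} or more")
    # bisect_right gives how many cutoffs are <= the value, i.e. the band index.
    tally = Counter(labels[bisect_right(cutoffs, v)] for v in values)
    return {label: tally[label] for label in labels}


print(bucket_counts([10, 45, 50, 69, 70, 90], [50, 70]))
# {'below 50': 2, '50 to 69': 2, '70 or more': 2}
```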
