last update 12.17.01

What in Sam Hell Possessed Me to Waste My Time With This

On a recent blustery autumn day, I discovered that I had suddenly and unexpectedly become borderline competent at the Solitaire Hangman puzzles in GAMES Magazine. My mortality rate had previously been around 50%, but on the day in question I won 12 of 12 games, and, the egomaniac I am, completely bought into their praise ("If you can solve 8 or more, you're either psychic or have a remarkable gift for words"). To determine the extent of this fluke, I scrounged up some back issues in search of more puzzles, whereupon I embarked on a 32-game winning streak finally to be broken by MUSKRAT. (My guesses were, in this order, AEOIUDNG. What I was I thinking, not guessing S, T or R before the other consonants? I blame poor nutrition.)

So that got me wondering if it were possible to achieve a 100% win rate. I set out to write a program to answer this very pressing question.

The Rules

One player selects a word and draws an underline or box for every letter in the word. The second player then declares a letter believed to be in the word. If the declared letter is in the word, the first player reveals all occurrences of the letter. If the declared letter is not in the word, it is a "miss." Misses are tallied by drawing stick figure parts: on the first miss, the first player draws a head; on the second miss, a body; on the third through sixth misses, arms and legs.

As kids, we'd also draw hats, hair, ears, facial expressions, hands, fingers, shoes, genitals, etc. But enlightened adults can't subscribe to the naive idealism of such soft-on-crime policies. No, we cannot. There are rules now, and the rules are that the guesser has exactly six misses. No more. So: Head. Body. Two arms. Two legs. Six. Boom, you're finished. Or perhaps head, body, two arms, one leg and one genital unit. But THAT'S IT. DONE. If the second player misses six times, he is "hanged" and the first player wins. If the second player reveals all the letters in the word, he escapes execution and is declared the victor.

The Algorithm

The program's algorithm is pretty straightforward. For a given wordlist, the program determines how many valid "patterns" each letter has. For example, if it were to start with the letter A, it would find patterns such as:

[A---, **]
[-A--, **]
[-A-A, **]
[----, A]

(The last pattern means "The word does not have an A.")

The program then filters the existing wordlist through that pattern to form a new list. For example, after filtering a wordlist through the pattern:


[-A--, **]

...the new wordlist would contain all entries from the old wordlist that had the letter A in second position, such as:

EATS
JAZZ
PAST
SAIL
TAKE

With this new wordlist, the program again determines all valid patterns of each letter, filters the wordlist into another wordlist, and so on. This continues until either the number of misses totals six (hanged), or until the length of the wordlist is filtered down to one (solved).

I added some features to improve speed, such as ending the search for the best letter as soon as a letter is found that solves all of the words. The program also aborts the analysis of a letter as soon as its performance is known to be inferior to a previously analyzed letter.

Despite these features, all that looping and recursion takes far too long. So I instead analyzed only the seven most frequently hit letters at each step. (This does not mean I only analyze the seven most frequently found letters in the English alphabet. For example, when analyzing the pattern -UE--, the program would look at the filtered wordlist and discover that Q is one of the most common letters in the wordlist.) Will this shortcut miss a key strategy? Probably not. In all of the observations I've made, the best letter was one of the four most common letters.

The Dictionary

For this program, I used the Enable dictionary found at this URL: http://www.puzzlers.org/secure/wordlists/dictinfo.html

Summary

Against a 2-letter word, you should start with A. Against 8, 13, 14, and 15 letter words, you should start with I. Against all else, start with E.

I can't seem to discern any pattern one should follow after the first guess. I'd initially theorized that you should choose vowels until the word is filled out a bit, then switch to consonants, but this doesn't appear to be a hard and fast rule. It does seem that the first three guesses of longer words are usually in the subset {AEIORSTLN}.

I wonder what the missed 9-letter and 11-letter words are.



Win Probability

Number of lettersNumber solved/possibleWin probabilityFirst guess
244 out of 960.458333A
3321 out of 9720.330247E
41793 out of 30730.583469E
55921 out of 86360.685618E
613121 out of 153230.861410E
721809 out of 231090.943744E
828249 out of 284200.993983I
924871 out of 248730.999919E
1020302 out of 203021.000000E
1115503 out of 155040.999935E
1211358 out of 113581.000000E
137827 out of 78271.000000I
145127 out of 51271.000000I
153192 out of 31921.000000I

For Poop's Sake, The Strategy Files Already

Okay! Click here for stuffed Mac-formatted textfiles, and here for zipped Unix or Windows textfiles.

Or you can look at individual strategies here.

And here's the program's source code. The current version of the code solves the 10-, 12-, 13-, 14-, and 15-letter games very quickly. The other games require more time; the 9-letter game took upwards of 15 hours on a P3/600 machine.

Feeble Explanations

To use the strategy files, search, either manually or with a word processor, for a situation, using the format: [word pattern, misses]. For example, let's say you're at the start of a six-letter game. Since you haven't made any guesses yet, the pattern you want to search for is [no pattern, no misses] which looks like this:

**, **

The search will return the following entry:


0. [**, **]    [E]    [13121/15323]

This entry means that the first letter you guess should be E, and by doing so, and by continuing to follow the correct strategy, you should be able to solve 13121 out of 15323 words.

Let's say you guess E, and it misses. You should now do a search for:


**, E

The search will return the following entry:

  1. [**, E]    A    [5043/5905]

This means that you should now guess A, and by doing so you can expect to solve 5043 out of 5905 words. So you guess A, and an A is revealed in the fourth position. You should do a search for:

---A--, E

The search will return the following entry:

    2. [---A--, E]    S    [347/355]

This means that you should now guess S, and by doing so you can expect to win 347 out of 355 times. And so forth.

Eventually, you will come to an entry followed by a wordlist. For example:


            6. [-O-AR-, EIS]    
             BOYARD
             COWARD
             DOTARD
             HOLARD
             NONART
             NOTARY
             TOWARD
             VOTARY
             ZONARY

(Note that missed letters are arranged in alphabetical order.)

I set the program up to simply print the wordlist once it contains fewer than 12 words, leaving the user to his or her own devices. I won't go into the boring reasons for presenting strategy in this halfassed manner. Looks like in this case, D might be the best choice, then N (if D is a miss) or T (if D is a hit).

If you follow the strategy but lose the game, a pattern search will yield a list of words that might have defeated you. For example, let's say you've lost a five-letter game, ending with the following pattern:


---AN-, CEGIRS

A search for that pattern will return the following entry:


                8. [--AN-, CEGIRS]    
                 BLAND
                 BLANK
                 FLANK
                 LLANO
                 PLANK
                 PLANT
                 QUANT
                 THANK

Frequently Volleyed Ridicule, Frequently Proffered Shrugging

Q1a. What good is this strategy if my opponent knows I use it and so only chooses words that will cause the strategy to fail?

A1. If you were to always go Rock in Roshambo, it wouldn't take long for people to figure out how to beat you. Similarly, if you blindly follow a hangman strategy that has less than a 100% success rate, you're vulnerable to exploitation. Take this example:


            6. [A-O-, EINS]  
             AGOG
             AHOY
             ALOW
             AMOK
             APOD
             ATOM
             ATOP
             AVOW
             AWOL

The best choice is W, then if you miss, P. But if you always follow this strategy, and if your opponent knows that you always follow this strategy, then your opponent will always use AGOG, AHOY, AMOK, or ATOM and will always beat you. The way to defend against this is with a "mixed strategy", which basically means you should guess other letters some of the time.

A game-theoretically optimal strategy would dictate an exact guess frequency for each letter. For example, the following strategy would result in a win 40% of the time:

The first guess should be W 40% of the time, P 30%, M 30%. If first guess is a hit, the game is solved. Otherwise: if W was the first guess, the second guess should be P 10% of the time, M 10%, G 40%, H 40%. If P was the first guess, the second guess should be M 20% of the time, G 40%, H 40%. If M was the first guess, the second guess should be P 20% of the time, G 40%, H 40%.

However, if you only mix your strategy late in the game, as in the above example, your opponent will still have a huge edge over you. You may need mix it up at all stages, possibly even from the very first guess. For example, instead of guessing E, guess A, I, or O. But then where to go from there? The strategy files don't instruct how to proceed except for when the initial guess is E!

True enough. For now, that information isn't available. I could fancy up the program to develop optimal strategies given a particular first guess, after which I could invite one of my poker guru friends to provide some instruction on the proper way to mix strategy. If I sense any interest, I'll consider furthering the analysis in this manner.

To reiterate: mixing your strategy is necessary only if the strategy is imperfect. If, for example, your opponent chooses a 12-letter word, then adhering to the strategy is a sure win, so there's no reason to stray from it.



Q2a. What if the wordmaster only chooses "hard" words?
Q2b. What if the wordmaster won't use words ending in a plural S?
Q2c. What if the wordmaster is an affectatious blowhard who insists on using British words and spelling?
Q2d. The wordlist you use is incomplete/abridged/unofficial/non-standard! You suck!
Q2e. Can you run this program for other languages?
Q2f. The REAL version of Hangman allows for seven misses, not six! What-EVER!

A2. I could run this program for any wordlist, be it truncated, elongated, or otherwise, and can also easily change the number of allowed misses. I've not yet done so, but could be persuaded to. I'd actually be very interested in a wordlist that doesn't have S-pluralized words; if you know where I can get one, send me e-mail.



Q3. No way you did this by yourself.
A3. Guilty as charged. Thanks to Aaron Bos for tech support and cpu!



Q4. <yawn!> When is AlphaCram 2.0 coming out?
A4. Soon!


Send comments to: neoncap@sharkfeeder.com.
Sign my guestbook!