Open Source Bananagrams Solver in JavaScript, PHP & MySQL

I had some time over the Christmas holidays and decided to find a programmatic solution to a game of Bananagrams… yeah, why not :)… So, I’ve built a variety of MySQL dictionaries from Aspell, HTML and CSS were used to style the page, then coded the backend in PHP and placed a solver logic into JavaScript web worker. Here is a short outline of entire process along with a web site and YouTube demo.

 

BANANAGRAMS INTRODUCTION

Bananagrams is a word game similar to Scrabble, invented by Abraham Nathanson in 2006.  From a bag of 144 lettered tiles, each player randomly picks the predetermined number of tiles (usually 21 letters). The goal of the game for each player is to use all of the given letters by creating the word grid of intersecting or interlocking letters that form valid English words. Players can rearrange their grid as many times they like. The player that uses up all of their tiles first, wins the game. The major skill that is required for this game is the ability to create anagram words formed by rearranging the letters by using all the original letters exactly once.

In general, the unwritten strategy for the game usually follows the following two simple rules:

  • Build the longest possible initial word
  • Keep the open layout structure, don’t cram the letters into a small grid

That’s the strategy I also use whenever I play the games. However, Saahil Agrawal and David Kwok of Stanford University in a paper named Bananas for Bananagrams offered another interesting idea on how to approach the problem. Their strategy suggested the use of a heuristic method of assigning a score for each of the words based on its component letters. Essentially they propose to give the less frequent English letters a higher number of points with a notion that starting the grid with letters that are more difficult to play, allows for the more flexible use of letters that are easier to play. Something similar is a strategy employed by Scrabble players, who create words with the highest possible combined letter value.

 

DEMO

The finished version of the Bananagrams Solver is online at: http://www.BananagramSolve.com

Here is a minute long video demo:

Looking at the final solution, it is not exactly ideal, has a long way to go, but

SOURCE CODE

Source Code for this little app is released under GNU Affero General Public License v3.0 on GitHub: https://github.com/JozefJarosciak/Bananagrams_Solver

 

 


 

 

Create MySQL English Word Dictionary

I decided to code the solution by using HTML, CSS, JavaScript on the frontend, PHP in the backend & MySQL as storage and word search engine.

Creating MySQL Dictionary of English Words

First I needed a good dictionary. To build one, I’ve converted a text file containing over 466k English words (found on https://github.com/dwyl/english-words) into MySQL database. I’ve also played with a dictionary of English words that I created from Aspell, here is a short article that describes How to dump and convert Aspell dictionary to wordlist or searchable MySQL/MariaDB database.

Once I had the database of English words in MySQL, I realized, that both dictionaries contain way too many words, some of them so unusual that they would likely never be regularly played by any human player. I needed to be able to select and search only among the words that are recognizable by most people. To resolve the problem, I implemented another database column, that added to each of the words their rank value based on the frequency in which the word occurs in a typical English text. You can read more about a word frequency on https://www.wordfrequency.info, who also show some examples of frequency lists based on the COCA corpus.

Once done, I had a list of English words and their associated frequency rank, which I could search by using SQL queries.

The following is my ‘dictionaries’ database and ‘en_english479k’ table definition:

Here is a SQL Dump in case someone wants to replicate it: 484k English Words with Frequency Rank.zip (3.8 MB Zip)

Search Query

MySQL Search – Search the longest possible initial word

The initial goal was to create a SQL query that would allow me to search the entire database for a word that contains as many of the letter tiles that I’ve randomly picked from the bag of lettered tiles. It’s somewhat of a tricky exercise because the query must also be capable of selecting words in situations where certain letters occur more than once, yet capable of finding words that might or might not contain any of the letters.

Ultimately, I’ve settled on using REGEX and combination of filters that used a MySQL BETWEEN condition, with order based on ranking the words by their total length (longer the better) and English word frequency rank. This is what the final SQL query looks like for a randomly selected letter sequence:

ANNEAGANRIT‘.

5 STEP MySQL Query

  1. First, build a Regex (1) based on the given lettered tiles.
  2. Then filter letters by using BETWEEN condition (2) for each character, defining the minimum and the maximum number of times a character can occur in the word. This part of the SQL will need to be generated programmatically, by counting the letter occurrence in the word.
  3. Select the allowed minimum and maximum gate for word frequency rank. In this case, I decided to only search my words among the top 50 thousand English words.
  4. Set the minimum word length. We want to solve the Bananagrams board only by using English words that are at least 3-letter long. I found that those solutions are the most readable.
  5. And finally closing the SQL by leveraging ORDER (3), sorting the words based on the descending word length and ascending frequency rank.

Let’s run the query on the randomly selected tiles ‘ANNEAGANRIT’:

As we can see the best result and the longest English word for the randomly selected tiles ‘ANNEAGANRIT’, is the word: ‘ARGENTINA‘.

Note: The importance of using the English word frequency rank is quite obvious when doing a search without the frequency rank, where the solution to randomly selected tiles ‘ANNEAGANRIT’ tiles is a quite obscure English word: ‘ANTENNARIA‘. While ‘Antennaria’ might be a valid English word, most of us would agree that it’s hardly recognizable and not really fit for purposes of building a human-like version of the Bananagrams solver. BTW. I had to look it up :) – Antennaria is a genus of herbaceous perennial plants.

PHP – Function to Find the Longest Word +Remainder of Characters

Now that we have a working SQL that can find the longest recognizable English words based on the given set tiles, we should wrap the SQL query into a PHP function that can be reused.

Let’s create PHP API endpoint that can be called from the JavaScript (or any other type of frontend UI), something like this:

  1. Input: Takes two URL variables, the sequence of characters and search offset.
  2. Result: Provide the longest found English word for the combination of submitted characters as well as the remainder of characters that could be used.

The API for finding words could look like this, where ‘q’ is the sequence of characters and ‘offset’ is the given results with 0 being the first/best word.

api.php?q=ANNEAGANRIT&offset=0

Here is the api.php file I’ve created to meet these requirements:

EXAMPLES:

Let’s test it on an example, by getting the best word for the random characters: ANNEAGANRIT

We’ll call the API like this: api.php?q=ANNEAGANRIT&offset=0

The result: ANNEAGANRIT|AN

This is what happened:

 

If we need to get the 2nd best word, we simply increase the offset by one:

We’ll call the API like this: api.php?q=ANNEAGANRIT&offset=1

The result: EARING|AANT

You get the point. PHP API lets us control which result we want to display, allowing us to control the dictionary search the way we want, which will be especially useful when we start to look for Bananagrams solution.

 

PHP – GRID SIZE

In terms of the starting grid size, I am not sure if there is an optimal grid based calculation for n-letters, but I assume for a single direction placement, we shouldn’t need height x width grid ratio larger than the total length of initially provided characters. However, that assumption changes, if we start placing tiles in the exact middle of the grid and stretch the board/grid in all directions (up, down, left, right).

For example, if we start with a 4 characters ‘STBE’, and we can construct the word from all 4 of them, then the X and Y axis in all directions would look like this:

The answer to grid size is not the quadruple (8×8) as one would assume, but rather (7×7) as shown above… At least I think :)

The formula should be:

(n-1)  x  (n-1)

Thus (n-1)  x  (n-1) is also the total number of tile combinations that will need to be constantly checked during the letter placement.

 

Finding the middle of the board should be the crucial part of the exercise, as the two-dimensional JavaScript array cannot extend below zero.

In JavaScript, the grid can be constructed by using a two-dimensional associative array, where X will be the nth element in the outer array and Y will be the nth element in the inner array:

grid[X][Y];

For example, if the initially given letters are ‘BEST’, then to start placing letters into the exact middle of the grid, we should use X: (n -1)  and Y: (n -1)

In our case start at position X:3 and Y:3

grid[3][3] = “B”;
grid[4][3] = “E”;
grid[5][3] = “S”;
grid[6][3] = “T”;

The final placing on the grid with all other unset positions would then look something like this:

 

 

JAVASCRIPT – GRID CHECK LOGIC

The grid checking is the most active part of the Bananagrams solver. Essentially, as we’re adding words into the grid, with each iteration, we need to check all rows and columns to ensure that each results in a valid English word. If there are multiple words on the row or in the column, we need to ensure they are separated by space(s). If the grid doesn’t pass the test, the word can’t be placed and a different word combination must be tried. We must try all combinations until there are no words left (Bananagrams doesn’t have a solution), or until we find a solution when there are no remaining letter tiles.

Thinking about the process, this is how I’d like to execute it, the following is an approximate idea of the algorithm logic:

 

 

I had the option to develop this logic either on the server side or on the client side, each having certain benefits. On the client side, I can get better messaging as to the state of each iteration, I didn’t want to be calling backend every single time. On the client side, having a single thread available in the browser is a nightmare to deal with, especially because of the browser UI locking. But that can be easily fixed by using the second web worker thread. Web workers as we know, don’t have access to DOM, but they can receive and also post messages from and to the main javascript.

So instead of placing the logic into PHP, I decided to code the login into a web worker script, which I called search.js:

Please note, as I coded the logic, I realized there are some holes in it, especially as I decided to not to use rank, but rather go for the longest word (or whatever is specified as a limit), which seemed like a better choice, but the reality is, I always only check the first word, I don’t go into deeper loops. In any case, the above logic, as much as it can be further improved, seems to be finding the results for me.

Search.js shown above is called from the main javascript.js, where the web worker is initialized:

The HTML file (index.html) for this is somewhat simple:

 

And so is the CSS stylesheet (styles.css):

 

Once everything was done, I’ve built a feature to scrap the word descriptions from online dictionaries. For example, to get a description from Merriam-Webster, I use a simple PHP script: get-description-en.php

And I also added multilanguage support, supporting following dictionaries in English, Czech, Danish, German, Spanish, French, Dutch, Polish, Portuguese and Slovak:

Then, registered a $3 .com domain BananagramSolve.com and this is what the result looks like for one of the queries in English:

 

 

 

If you feel like improving the code a bit, give it a try: https://github.com/JozefJarosciak/Bananagrams_Solver

Enjoy!

 

 

 

Facebook Comments