WordStuff #3

[Home]   [Puzzles & Projects]    [Delphi Techniques]   [Math topics]   [Library]   [Utilities]

 

 

Search

Search WWW

Search DelphiForFun.org

As of October, 2016, Embarcadero is offering a free release of Delphi (Delphi 10.1 Berlin Starter Edition ).     There are a few restrictions, but it is a welcome step toward making more programmers aware of the joys of Delphi.  They do say "Offer may be withdrawn at any time", so don't delay if you want to check it out.  Please use the feedback link to let me know if the link stops working.

 

Support DFF - Shop

 If you shop at Amazon anyway,  consider using this link. 

     

We receive a few cents from each purchase.  Thanks

 


Support DFF - Donate

 If you benefit from the website,  in terms of knowledge, entertainment value, or something otherwise useful, consider making a donation via PayPal  to help defray the costs.  (No PayPal account necessary to donate via credit card.)  Transaction is secure.

Mensa® Daily Puzzlers

For over 15 years Mensa Page-A-Day calendars have provided several puzzles a year for my programming pleasure.  Coding "solvers" is most fun, but many programs also allow user solving, convenient for "fill in the blanks" type.  Below are Amazon  links to the two most recent years.

Mensa® 365 Puzzlers  Calendar 2017

Mensa® 365 Puzzlers Calendar 2018

(Hint: If you can wait, current year calendars are usually on sale in January.)

Contact

Feedback:  Send an e-mail with your comments about this program (or anything else).

Search DelphiForFun.org only

 

 

 

Problem Description

This is the second (now third) installment of the  word manipulation series of programs. The first Wordstuff1 included a basic set of dictionaries, a dictionary maintenance program and a word completion program called CrosswordHelper.


 

Seven  programs are now  included:

Unscramble – an anagram solver that make one, two, or three-word phrases from a set of scrambled letters.

Decrypt – a cryptogram solver. Cryptograms are messages encoded by replacing each letter value in the message with a simple substitution code. In this type of encrypting, a each letter is replaced with a fixed letter value. I.e. all "a" with "f", all "b", with "k", etc.

Word Completion - (Crossword Helper).  Give it a partial word with unknown letters indicated by space or underscore characters, and it will return a list of all words in it's dictionary which fit the "mask". 

WordLadder - Change "dirt" to "gold"  in four steps.   This program was originally posted as a standalone program and is now also available as part of  Wordstuff.

Scrambled Pie:  Supply the the missing letter to the four letter groups in this "pie" and unscramble the resulting words.

Spellbound:  A word search helper for finding all words embedded in a scrambled set of letters.

WordStuff: A "wrapper" program that lets the user start any of the other six.

Background & Techniques

Unscramble: The trick to solving anagrams on the computer, once we have a dictionary available, is to find letter combinations that match dictionary entries. Rather that generate all possible letter combinations and see which are valid words, it’s generally more efficient to check words in the dictionary and see which contain the letters we’re matching. The complication here is that, in addition to single words, we want to generate 2 and 3 word phrases that use all of the given letters. The user provides minimum and maximum word lengths to search. Within this range we generate a firstwords list of matching words and a corresponding list of remainders, the unused letters. Then, for each of the words in this list, we’ll repeat the search procedure with the remainder list. If we’re looking for three word phrases, we repeat this one more time, making a remainders2 list to check for 3rd words.

A TWords object handles the actual checking of letters against the dictionary. The length range is determined by the max and min lengths specified by the user (except for the final search which uses a length equal to all remaining letters). A "FirstLetters" string is built containing the unique letters in the input letter string –  used as starting points for word retrieval from the dictionary.

Decrypt: Cryptogram solving is another example of trial and error problem solving. There are, however, 26! (26 factorial = 26x25x24…x3x2x1 or about 4X1026) possible substitution codes. Thus it is highly unlikely that we could decrypt the message by trying all codes. But we can use the dictionary to prune the search space significantly. I sort the encrypted words from longest to shortest on the (untested) theory that there are fewer long words than short words in the dictionary, thus we should prune invalid combinations quicker.  The trystring string is a list of the unique letters in order of appearance  this sorted list. Rather than generate a complete substitution code, we can start by creating one of the 26 single letter substitution codes. The order of the letters to translate to comes from the tryorder string. This list is currently defined as the frequency of occurrence of letters in the English language.

Now if we substitute the first letter in tryorder for the first letter in trystring everywhere it occurs and consider all other letters unknown, we basically have the word completion problem. If there is at least one possible dictionary word for each encrypted word, we can choose a second letter from tryorder, try it, then a 3rd letter, etc. If it fails, we just mark that letter as tried and go on to the next. By doing this recursively in the FindNextLetters function, we’ll eventually resolve all of the words or run out of letters to try.

WordStuff: This is an experimental example of creating a "wrapper" program than manages several other programs. It takes care of loading the dictionaries and calling one of three programs: CrosswordHelper, Unscramble, or Decrypt, based on used selections. The code required to allow the same unit to run as a main form or as a secondary form is fairly simple. Each of the 3 programs has a test in its FormCreate method to se if it is the main form (If applicaton.mainform=self). If it is the mainform, the program knows that it is responsible for managing the dictionary itself. Otherwise it can use the pubdic (and possibly privdic) dictionaries defined in the Udict unit and preloaded by the calling program.

Crossword Helper: This is a word completion program implemented and described previously as part of WordStuff1 

Addendum Feb 8, 2003:  I posted a new version of Wordstuff today which incorporates a 4th program (WordLadder) under the Wordstuff wrapper as well as some minor cleanup.  

Addendum April 5, 2003:  A revised version of Decrypt was installed in Wordstuff  today.  This version is a little smarter about deciphering encoded messages.  It builds a list of possible decipherments for long words in the message and tests those letters first.    Remaining letters are tested in the order of frequency of occurrence in the message matched against order of occurrence in the English language.    Message words not in the dictionary are a problem that still needs work.  Manual intervention has been the most effective solution so far, and I did add some "Hints and Tips"  to the Introduction  page.    A few additions were made to the "Full.dic" dictionary based on decryption testing.  The dictionary contained "cryptogram" but "cryptograms" for example.   The dictionary structure probably needs to be modified to better handle the whole problem of word suffixes.  

Addendum July 7, 2004:  In response to a user request, I've modified CrossWord Helper today to accept a list of letters to be excluded from search results.  This was to to help him in deciphering phrases in a computer game called Flip Words.   Once some letters are known, the player may guess the phrase at any time.    Excluding letters known to be in the phrase (but not in the word of interest) can reduce the number of potential answers.  

Addendum August 25, 2005:    A 5th program, Scrambled Pie, previously posted, was added to Wordstuff today.   

Addendum May 11,2006:  An update to Dicmaint was posted today with two changes:

bullet If the 'SaveAs' option was used to save the dictionary as a text file, capitalization information was lost.  Capitalized words are now saved as capitalized and the characters ",C" are appended to match the ",A" and ",F" used to flag abbreviations and foreign words.
bulletA user recent wrote about problems he was having trying to create a dictionary from a file with 350,000 French words.  Dicmaint was taking a very long time to process the file.  It turned out that the file was in Unix format (no Carriage returns) which made it look like a single record 3.7 million bytes in length to any windows standard program.  Dicmaint used a destructive Getword routine - each word was deleted from the record in memory as it was retrieved, a very slow process for a record of that length.  Getword  now extracts words without modifying the record.  It turns out that using the FileFix utility to correct the input file was even more significant it getting run time down to a few seconds.   But I left the new Getword routine in as an improvement anyway.  

Addendum November 17, 2006:  A few enhancements to Decrypt, the decryption function of Wordstuff were posted today.    Users may now selectively use or omit words in the decrypted message during the search for a solution.   Also specified plaintext words may be ignored and not counted as valid words in the deciphered message.    

Addendum December 7, 2006:  A small fix to Dicmaint was made today : When saving a text version of a dictionary, it should automatically have been saved as a text file.  The test for file extension was for '.ext' instead of '.txt'  so a .'.txt' dictionary would have been saved in compressed (.dic) format.

The change  uncovered the fact that the dictionary unit, UDict, which has been part of our DFF Library file some some time, was also included in the source code zip files for Dicmaint and Wordstuff.  As a result, naturally,   the UDict included with these programs was not the latest version.   As of today, UDict has been removed from the program source files and users wishing to recompile these programs will need  the DFF library zip file, version DFFLibV08 or later.     

Addendum July 8, 2007:  A few small minor program changes were posted today.  Namely:

bulletCrossword Helper was enhanced to solved a variation of the word completion problem.  My Mensa Daily Puzzle calendar has a puzzle type with a scrambled word that is missing one letter, kind of a combination of the "Crossword Helper" and "Unscramble" programs.  CrosswordHelper2 now uses some of the code from the Scrambled Pie solver to find the word when there is a single letter missing but no solution is found initially.  For example, the previous version would have completed "b?by" as "baby", but would have found no solution for "bby?".   The new version will ask if you want to search for anagrams and return "baby" if you click the Yes button.   It will also find the solution for the "ARUI?YNGR" calendar puzzle that led to the enhancement in the first place.
bulletSearch logic used in Scrambled Pie was moved to a new unit "USearchAnagrams" which is now used by both ScrambledPie and CrosswordHelper2 to implement the enhancement described above.
bulletI add words to full.dic whenever it failed to contain the answer to some word puzzle I'm working on.  The Dictionaries.zip download file was updated to contain the latest version of our full.dic dictionary file.  

Addendum August 10, 2007: Another small update to Crosswordhelper this month.  I'm not sure when it got broken, but a viewer recently pointed that the "Excluded letters" option did not work.  Crossword Helper displays a list of valid words from the current dictionary when  some of the letters and their positions are known.   The excluded letters list specifies letters that are not to be used in completing the word,  however the code was checking upper case word letters against lower case excluded letters.   Not surprisingly, it never found a match and included all letters in the resulting display.   Now fixed 

Addendum September 19, 2009: A recent laptop upgrade revealed some screen layout problems with some of the DFF programs, include the WordStuff modules described here.   See the DPI Scaling page for more details.  No significant changes otherwise.

Addendum April 25, 2010: Another minor change today to fix a problem that has bugged me for a long time.  When entering a partial word to be analyzed in "Word Completion" or letters to be analyzed in "Unscramble" it had been necessary to switch from keying to a mouse click on the Search button to find solutions.  Now, pressing the "Enter" key also triggers the search.  The simple code in the edit control's OnKeyPress exit looks like this (red text = comments):
    begin
        if ord(key)=13 then  
{Enter key was pressed}
        begin
            key:=char(0);
{null out the key}
            SolveBtnClick(sender);
{call the search button click routine}

        end;
    end;

October 14,  2010: A new program, Spellbound, was added to the Wordstuff whapper today.  Like the others, the source  is also available as a standalone program.  Spellbound helps in a game of the same name posted in the games section of the AARP website (http://games.aarp.org/games/spellbound.aspx).  The game's objective is to form as many words as possible from a given set of letters with a time limit on the search.   My program is a "helper" which searches a chosen dictionary for embedded words and displays the results.   I may add user play at a future date, but figuring out an efficient way to test all permutations of all subsets of a given set of letters was enough fun for awhile.  

June 17, 2011:  I recently posted a Delphi Techniques program about using Delphi's OnIdle exit to perform background processing without interfering with user actions.  As a more practical example, I just posted Version 2 of our Spellbound program to allow user play  (enter words which can be made from a given set of letters) while the program does its own search in the background.  If you approach it as a competition, I can pretty much guarantee that the program will win - at least that's the case for me.  It was a good programming exercise though to let the program search in lots of small chunks while you are thinking and still respond instantly when you type or click.  Wordstuff3 was recompiled to include the new version also.   

October 14, 2011:   Another update today to a program included under the Wordstuff umbrella, Unscramble this time.  I added an option to handle cases where, rather than being scrambled, words are "interlaced".  Letters for 2 or 3 words appear in the correct order but interspersed with letters of the other words.  So, for example, "CAT" and "'FROG" could appear as "CFRAOGT".  A new checkbox on the Unscramble form indicates that the letters represent interlaced words.   The sample set of 17 letters included on the form represent 3 synonymous words and are from a recent Mensa Puzzle Calendar page.  I did not successfully identify the words using human brainpower, but the program finds them in a minute or two. 

November 22, 2014:  Two programs changed with today's posting"

  1. Unscramble was enhanced to allow use of less than the full set of letters letters when single words anagrams are requested.  Previously only words containing all letters were "unscrambled".  Example "How many 5 letter words can you find from the letters FRIGATE?"  My program finds 10, the same count as the Mensa calendar puzzle for October 18, however I found "regia" which Mensa did not and they found  "retag" which I have now added to our "Full.dic" large dictionary.
  2. The Word Completion program (aka Crossword Helper) has an option to specify letter which that may not used to fill missing letter positions in the input template.  A bug in "Word Completion" caused words with an excluded letter in any position (prefilled or missing) to be omitted from the list of words found.   

February 1, 2015:  A small fix to Unscramble (now Version 2.2), to correct range of maximum and minimum word lengths to search for in order to keep them within the range based on the length of the current scrambled word. 

August 30,2015:  Version 2.21 of the Word Completion program allows filtering of words by specifying whether a set of given letters may or may not be used to complete partial words.

September 13, 2015:  The anagram feature of Word Completion was previously only available if no words were found when filling a single missing letter.  A recent puzzle required finding anagrams even when there were words found without anagramming.  Version 2.3 adds a check box to force anagram searches in either case. 

October 23,2016:  Two programs were updated today based on today's Mensa Calendar puzzle asking for 3 and 6 letter synonyms using all of the 9 letters in the word CARPOOLED.  My Unscramble program should have been able to solve it, but did not.  It turns out that one of the synonyms is PRO which our dictionary categorizes as an abbreviation.  Whether it is may be debated, but Unscramble had a hard-to-find option to include abbreviations.  Selecting non-standard words is now  much simpler with Unscamble V2,  also now do an alpha sort the 2 and 3 word lists to remove redundancies caused by word order.  Bottom line is that list are now much shorter and easier to scan for relevant solutions.    

In the process of researching the problem,   I also corrected some text scaling problems in the DicMaint program which allows words to be categorized as Abbreviation, Capitalized, or Foreign words as well as performing other dictionary maintenance tasks.  

November 16, 2016: As the result of testing Dictionary accessing under Delphi 10.1 Berlin edition, DicMaint and all seven programs in the WordStuff package have been recompiled  and modified to run under Delphi 7 and under the newer XE and Berlin Delphi versions.    See the Delphi 7/ Later versions Differences page.  

February 10, 2018:  Two minor irritants were removed today to create Version 3.2.1.

  1. The Unscramble program allows users to specify whether to create one, two, or three words from the input letters and also  to specify minimum and maximum length for individual words.   The default Maximum word length is always set to the number of input letters which is OK, but the minimum defaulted to 1 letter. The typical single word search is looking for words using all letters and required resetting the Minimum value before searching.  The default minimum word length for single word searches now matches the input letter count.
  2. WordSearch  (aka Crossword Helper) typically searches for words with known letters entered and placeholders inserted for  unknown letter positions.  At some point, an option was added to allow searching for anagrams of letter sets with a single letter missing.  However, once the option was selected it was disabled but still active for following searches.   The option is now available enabled when there is a single unknown letter, and reset and unavailable when there are multiple unknown letters.

 

 .      

Running/Exploring the Program 

The Dictionaries and DicMaint downloads below duplicate the download on the  WordStuff 1 page.  The dictionaries were last updated on 7/3/07.  If you have downloaded dictionaries  once since then, there is no need to download again.  If you have not previously downloaded the dictionaries,  you will need to do so, before running WordStuff3.  The current dictionary download also contains local additions as of November, 2014.

bulletDictionaries & DicMaint
bulletDownload Dictionaries
bullet Download DicMaint source
bullet Download  DicMaint executable
bulletWordStuff 
bullet Download source (for all 6 programs + WordStuff wrapper)
bulletDownload  WordStuff executable 
 
bulletDFF Library
bulletDownload current DFF Source code library (DFFLibV15) To recompile the source code for  any of the above programs, you will need the UDict unit contained in DFF library file, DFFLibV14_12Nov2016 or later.

  

bulletLazarus Code
bulletDownload Lazarus Dicmaint Source
bullet Download Lazarus Source (for all 6 programs + WordStuff wrapper)
bulletDownload Lazarus Source Code Library

Suggestions for Further Explorations

Decrypt has lots of room for study. Should  the "trystring" function be built based on frequency of letters in the message, or order of occurrence when words are sorted from shortest to longest? Perhaps doubled letters should be tried first, or look for digraphs (letter pairs that occur frequently), word endings, etc. The first step would be to implement timing code so that the efficiency of various techniques could be explored.
The current version of Decrypt is "all or  nothing",  a single encrypted word not in the dictionary will result in a "No solution found" message.  That's not a good thing.  
Personally, I’m going on to develop other word games using the dictionary. Next will be WordFinder – find as many valid words as possible from a random array of letters by moving horizontally, vertically, or diagonally from letter to letter. More path searching for the auto-solve part - oh boy!  

 

Created:  February 4,  2001

Modified: May 15, 2018

 

  [Feedback]   [Newsletters (subscribe/view)] [About me]
Copyright © 2000-2018, Gary Darby    All rights reserved.