next up previous contents index
Next: Construction Algorithms Up: Data for Applications Previous: Morphological Analysis of Unknown

Perfect Hashing

 

Hashing   maps words from a given set of n words into numbers (addresses). If the mapping is a bijection into a set tex2html_wrap_inline4976 (or tex2html_wrap_inline4978), the hashing is said to be ``perfect''.

It is possible to number the words of the language accepted by an automaton. The numbers associated with the words depend on the order in which they are stored in the automaton. The order is (roughly) lexicographical, i.e. to determine which of two words should be first, one compares the codes of their first letters. The smaller code means the corresponding word is stored first. If the codes are equal, the comparison is repeated with the words without their first letters, and so on, recursively. For example, in the automaton that stores the French regular endings of the verbs of the first group (fig. 4.3, page gif), if the words are numbered from 1, then aient receives the number 1, ais - 2, ait - 3, ant - 4, assent - 5, and so on. As the numbers result from the way the words are stored, all that is needed as input data for the algorithm that builds a perfect hashing automaton is a simple list of words. To match the order of words in the automaton, that list should be lexicographically sorted. For more information on perfect hashing, see section 6.4, page gif.

What we mean by ``words'' depends on the application. The words can be e.g. a list of entries in a dictionary holding the description of meanings of words, or a list of entries in an encyclopedia. In the first example, we can use a list of all inflected forms instead of the entries, which would provide a possibility of looking up the inflected forms as well. However, we are not limited to entire words being words in the natural language we process. The words may contain annotations similar to those described in the subsection 3.2.2. If we put categories as annotations in our dictionary example, we can provide full morphological analysis of inflected forms!  


next up previous contents index
Next: Construction Algorithms Up: Data for Applications Previous: Morphological Analysis of Unknown

Jan Daciuk
Wed Jun 3 14:37:17 CEST 1998

Software at http://www.pg.gda.pl/~jandac/fsa.html