Using Automata - Acceptors

Simple automata (acceptors) can be used for morphological analysis. They can provide faster analysis, but the range of languages that can be described with such machines is more limited than in the case of transducers. Also, special encoding techniques are necessary for the contents of dictionaries (see section 3.2.2 for instructions on how to prepare data for a dictionary in the form of an acceptor).

Note that the strings in a dictionary have an internal structure. Each string has two parts: the first is the inflected form of a word, and the second is its annotations, which describe e.g. the corresponding lexeme or morphological categories. The two parts are separated by an annotation separator. The analysis has two phases corresponding to those parts: the first recognizes the inflected form, and the second processes the annotations.
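The two phases can be illustrated with a minimal Python sketch. It is not the procedure from the thesis or from the fsa package: the acceptor is modelled as a plain trie of nested dictionaries, the separator is assumed to be "+", and the names build_acceptor, collect and analyse are hypothetical. Annotations are treated as opaque strings here.

```python
SEPARATOR = "+"  # assumed annotation separator (illustrative choice)


def build_acceptor(entries):
    """Build a trie, used here as a simple deterministic acceptor."""
    root = {}
    for entry in entries:
        node = root
        for ch in entry:
            node = node.setdefault(ch, {})
        node[None] = True  # the key None marks a final state
    return root


def collect(node, prefix=""):
    """Enumerate every string accepted starting from `node`."""
    results = []
    for label, target in node.items():
        if label is None:
            results.append(prefix)
        else:
            results.extend(collect(target, prefix + label))
    return results


def analyse(acceptor, word):
    """Phase 1: follow the inflected form. Phase 2: read off its annotations."""
    node = acceptor
    for ch in word:
        if ch not in node:
            return []  # the word is not in the dictionary
        node = node[ch]
    if SEPARATOR not in node:
        return []  # only a prefix of some inflected form was matched
    return sorted(collect(node[SEPARATOR]))


# Hypothetical dictionary strings: inflected form, separator, annotations.
SAMPLE = build_acceptor(["walks+V+3sg", "walks+N+pl", "walked+V+past"])
```

With this sample, analyse(SAMPLE, "walks") returns ["N+pl", "V+3sg"], while analyse(SAMPLE, "walk") returns an empty list, because no separator transition leaves the state reached after "walk". A real dictionary would be a minimized automaton rather than a trie, but the traversal is the same.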

Annotations may contain lexemes. The lexemes are coded (see section 3.2.2) to reduce the size of the dictionary, so they must be decoded during the analysis. Decoding consists of copying the inflected form without a few letters from the end, as indicated by the code. The procedures can be found in figure 6.5.
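As an illustration of the decoding step, the Python sketch below assumes one particular code layout, which is not specified in this section: the first character of the coded lexeme is a letter giving the number of characters to drop from the end of the inflected form ('A' for 0, 'B' for 1, and so on), and any remaining characters are appended as the ending of the lexeme. The function name decode_lexeme is hypothetical.

```python
def decode_lexeme(inflected, coded):
    """Rebuild a lexeme from an inflected form and its coded lexeme.

    Assumed layout (illustrative only): coded[0] is a letter counting the
    characters to drop from the end ('A' = 0, 'B' = 1, ...), and coded[1:]
    is the ending to append.
    """
    strip = ord(coded[0]) - ord("A")
    if strip < 0 or strip > len(inflected):
        raise ValueError("invalid strip count in code %r" % coded)
    # Slice with an explicit end index so that strip == 0 keeps the whole word.
    return inflected[:len(inflected) - strip] + coded[1:]
```

Under these assumptions, decode_lexeme("cried", "Dy") drops three characters and appends "y", giving "cry", and decode_lexeme("walks", "B") gives "walk". Many inflected forms share the same short code, which is what keeps the dictionary small.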

Figure 6.5: Morphological analysis with automata-acceptors

Note that it may be useful to introduce prefixes in the dictionary in the same way as in the analysis of unknown words. An additional code would specify how many characters should be removed from the beginning of the word to obtain the lexeme. This variation has not yet been implemented.
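Since this variation is not implemented, the following Python sketch is purely hypothetical. It assumes the coded lexeme starts with two count letters, the first for characters removed from the beginning and the second for characters removed from the end ('A' for 0, 'B' for 1, and so on), followed by the ending to append. Both the layout and the name decode_with_prefix are assumptions made for illustration.

```python
def decode_with_prefix(inflected, coded):
    """Decode a lexeme when characters may be removed at both ends.

    Assumed layout (hypothetical): coded[0] counts characters to remove from
    the beginning, coded[1] counts characters to remove from the end, and
    coded[2:] is the ending to append.
    """
    skip_front = ord(coded[0]) - ord("A")
    skip_back = ord(coded[1]) - ord("A")
    if min(skip_front, skip_back) < 0 or skip_front + skip_back > len(inflected):
        raise ValueError("invalid counts in code %r" % coded)
    return inflected[skip_front:len(inflected) - skip_back] + coded[2:]
```

For example, the German participle "gemacht" with the code "CBen" loses "ge" at the beginning and "t" at the end, and gains "en", which gives "machen". Forms without a prefix simply use 'A' as the first letter.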

Jan Daciuk
Wed Jun 3 14:37:17 CEST 1998

Software at http://www.pg.gda.pl/~jandac/fsa.html