Chapter 2 introduces basic concepts and definitions that are used in the rest of this work. They are based mostly on [HU79], [Wat93a], [Wat93b], [RS95], and [Moh97]. A new type of automata is presented and discussed.

Chapter 3 describes input data formats for programs that build automata. That format is not interpreted by those programs, but it is interpreted by applications that use automata.

Chapter 4 presents two algorithms for constructing deterministic, acyclic finite state automata. The algorithms differ by their efficiency and by the need to sort input data. The problem of representation of automata (i.e. storing them) is discussed.

Chapter 5 describes an algorithm, or a family of algorithms for constructing a finite-state automaton that can be used for guessing lexemes and categories of inflected forms. The algorithm uses the data for an existing morphological lexicon. The data is prepared in a special way, an automaton is built out of it, and then the algorithms prune it to obtain some generalizations and a smaller size.

Once the automata are built, they can be used for a variety of applications in the NLP domain. Chapter 6 discusses the algorithms that use automata for spelling correction, diacritic restoration, morphological analysis, and perfect hashing.

The dissertation ends with a summary.

Wed Jun 3 14:37:17 CEST 1998

Software at http://www.pg.gda.pl/~jandac/fsa.html