
Concluding Remarks

The method presented here leads to a single device that can perform a morphological analysis of both known and unknown words as accurately as possible (see section 6.3.3 for an evaluation), while preserving the advantages of finite-state automata: great speed and small size. It eliminates the need for two separate devices (one for known words and one for unknown words), and therefore uses less space.

We plan to experiment with other threshold values in rules R6 and R7. Another issue that needs further investigation is the impact of including closed categories in the data for our guesser: how much they inflate the lexicon and how they affect the recognition process. Excluding those categories from the lexicon might reduce its size. However, excluding them requires additional linguistic knowledge, which contradicts our goals. On the other hand, the current treatment of prefixes already requires such knowledge. Nevertheless, we believe that existing algorithms (e.g. [DC92], [TC97]) make it possible to process all data automatically.

We believe that our method can be used for lexical acquisition. Annotations may contain morphological descriptions (i.e. data for a morphology program). A small core morphology can be used to construct the guesser, which can then process new words from corpora to acquire new lexemes. It should be noted that, on average, the current method gives no more than 1.6 lexemes per analyzed word.
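The acquisition loop suggested above can be illustrated with a minimal sketch. It is not the implementation used in our experiments: the suffix table `SUFFIX_RULES`, the functions `guess` and `acquire`, and the `min_forms` threshold are all hypothetical, and a plain dictionary of suffixes stands in for the guessing automaton. The idea shown is only that candidate lexemes proposed for unknown word forms can be pooled, and that a candidate supported by several distinct forms is a more plausible new lexeme.

```python
from collections import defaultdict

# Hypothetical toy guesser: maps a word-final suffix to candidate annotations.
# Each annotation is (characters to strip, ending to append, category).
# A real guesser would be a finite-state automaton built from a core lexicon.
SUFFIX_RULES = {
    "ing": [(3, "", "V"), (3, "e", "V")],
    "ed": [(2, "", "V"), (1, "", "V")],
    "s": [(1, "", "N"), (1, "", "V")],
}


def guess(word):
    """Return candidate lexemes (lemma, category) for the longest known suffix."""
    for length in range(len(word) - 1, 0, -1):
        rules = SUFFIX_RULES.get(word[-length:])
        if rules:
            return [
                (word[: len(word) - strip] + ending, category)
                for strip, ending, category in rules
            ]
    return []  # no suffix matched: the toy guesser has no proposal


def acquire(known_words, corpus_words, min_forms=2):
    """Pool guesses for unknown words; keep lexemes seen with enough forms."""
    support = defaultdict(set)
    for word in corpus_words:
        if word in known_words:
            continue  # only unknown words are sent to the guesser
        for lexeme in guess(word):
            support[lexeme].add(word)
    return {
        lexeme: forms
        for lexeme, forms in support.items()
        if len(forms) >= min_forms
    }


if __name__ == "__main__":
    corpus = ["walking", "walked", "walks", "the"]
    for lexeme, forms in acquire({"the"}, corpus, min_forms=3).items():
        print(lexeme, sorted(forms))
```

In this toy setting each unknown form receives two candidate lexemes, and only the candidate shared by all three forms survives the threshold. Requiring support from several forms is one possible way to filter the guesser's output; other criteria could be used instead.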

The programs used in our experiments are available free of charge for research purposes from http://www.pg.gda.pl/~jandac/fsa.html. The French data comes from ISSCO, Université de Genève, and is available for research purposes from ftp://issco-ftp.unige.ch/MULTEXT/ (file french.gz). The file french.gz is input data for mmorph, ISSCO's morphology program, which is available from the same address. The Polish data comes from Zygmunt Vetulani from the Adam Mickiewicz University in Poznan, Poland, and is not available for research purposes.



Jan Daciuk
Wed Jun 3 14:37:17 CEST 1998

Software at http://www.pg.gda.pl/~jandac/fsa.html