next up previous contents index
Next: More Generalization Up: Method Description Previous: Lexicon Reduction

Default Annotations

 

The process described above leads to the construction of a finite state automaton that contains all required information. However, the automaton is still big, and an effort can be made to further reduce its size. Looking at its contents, we can see many states with a majority of transitions leading to one state (we will use the term default state   ), but with other transitions as well. The target state must have one outgoing transition, and it must be labeled with the annotation separator . We can treat less frequent transitions as exceptions, assuming that all other transitions, even those that had not appeared in our lexicon, lead to the default state. Acting under this assumption, we can replace the frequent transitions and the default state with the transition leading from the default state. A limit can be imposed on the ratio of frequent/less frequent transitions to trigger the pruning.

There is a difference between rules R1, R2, ...R5, and this rule. Rule R6 introduces a generalization. While still 100% of words present in the lexicon are annotated correctly, R6 may select one annotation among many as the correct one, and hide other possibilities. This may speed up an annotation process, but it can also introduce errors: some correct (but less probable) possibilities may not be shown, as the lexicon may not contain data that associates their endings with their correct annotations.

tex2html_wrap5224 R6. If for a given state the number of transitions leading to one state (the default state) is greater or equal to the number of all other transitions multiplied by a small integer, and the default state has only one outgoing transition and it is labeled with annotation separator , then the default state can be removed, and all transitions that lead to it should be replaced by the transition leaving the default state.


next up previous contents index
Next: More Generalization Up: Method Description Previous: Lexicon Reduction

Jan Daciuk
Wed Jun 3 14:37:17 CEST 1998

Software at http://www.pg.gda.pl/~jandac/fsa.html