
More Generalization

 

Sometimes it is impossible to devise a rule that associates an ending with the correct annotation, because the choice is lexicalized, i.e. it depends on the particular word and seems arbitrary from the morphological point of view. For example, in Polish there is a rule that transforms the adjectival lexeme ending -śny into -śniejszy in comparatives and superlatives. There is, however, another rule that transforms the ending -sny into the same -śniejszy. Therefore, given only the ending of a comparative or superlative, there is no way of telling which lexeme ending is correct other than a dictionary lookup. R6 introduces artificial divisions, e.g.:

-raśniejszy → -raśny
-iaśniejszy → -iasny
-maśniejszy → -maśny
-waśniejszy → -waśny
jaśniejszy → jasny
-ośniejszy → -ośny
-dośniejszy → -dosny
-ześniejszy → -zesny
-oleśniejszy → -olesny
-bleśniejszy → -bleśny
-uśniejszy → -uśny

while the right answer is that both annotations must be considered:

-śniejszy → -śny
-śniejszy → -sny
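The ambiguity can be illustrated with a minimal Python sketch. It applies both rules quoted above to a comparative (or superlative) and returns every candidate lexeme; only a lookup in a lexicon can then pick the right one. The function names and the one-word lexicon are hypothetical illustrations, not part of the method described here.

```python
# Both rules from the text: -śniejszy -> -śny and -śniejszy -> -sny.
RULES = [("śniejszy", "śny"), ("śniejszy", "sny")]


def candidate_lemmas(word: str) -> list[str]:
    """Return every lexeme from which some rule could have produced `word`."""
    candidates = []
    for comp_ending, lemma_ending in RULES:
        if word.endswith(comp_ending):
            stem = word[: -len(comp_ending)]
            candidates.append(stem + lemma_ending)
    return candidates


def disambiguate(word: str, lexicon: set[str]) -> list[str]:
    """Keep only the candidates confirmed by a (hypothetical) lexicon."""
    return [c for c in candidate_lemmas(word) if c in lexicon]


if __name__ == "__main__":
    print(candidate_lemmas("jaśniejszy"))         # ['jaśny', 'jasny']
    print(disambiguate("jaśniejszy", {"jasny"}))  # ['jasny']
```

Both candidates are equally plausible from the ending alone; the choice of jasny comes entirely from the lexicon.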

To cope with this situation, we introduce a new rule that accommodates such cases. We use the term first annotated state for a state that is the target of a transition labeled with the annotation separator, i.e. a state that begins an annotation or a set of annotations.

R7. If, for a given state, the number of first annotated states reachable from that state does not exceed a given limit, then:

Note that it is also possible to impose a lower limit on the number of states to be removed, to ensure that the rule applies only to cases such as the one described above (-śny and -sny). The rule can then work in parallel with R6.
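The applicability test of R7 can be sketched as follows. The sketch assumes an automaton stored as a Python dictionary mapping each state number to its outgoing transitions (label → target state) and uses '+' as the annotation separator; both are hypothetical choices. First annotated states are collected as the targets of separator transitions reachable from the given state. We take the "states to be removed" to be these first annotated states, and the default lower limit of 2 (merging a single state would change nothing) is likewise an assumption.

```python
from collections import deque

SEP = "+"  # hypothetical annotation separator
Automaton = dict[int, dict[str, int]]  # state -> {label: target state}


def reachable_first_annotated(aut: Automaton, start: int) -> set[int]:
    """Targets of SEP-labeled transitions reachable from `start` (breadth-first)."""
    seen = {start}
    queue = deque([start])
    found: set[int] = set()
    while queue:
        state = queue.popleft()
        for label, target in aut[state].items():
            if label == SEP:
                found.add(target)  # this target begins an annotation
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return found


def r7_applies(aut: Automaton, state: int, upper: int, lower: int = 2) -> bool:
    """True if the number of reachable first annotated states lies in [lower, upper]."""
    n = len(reachable_first_annotated(aut, state))
    return lower <= n <= upper


if __name__ == "__main__":
    # Toy automaton: endings "a" and "b", each followed by SEP and an annotation.
    aut = {0: {"a": 1, "b": 2}, 1: {SEP: 3}, 2: {SEP: 4},
           3: {"x": 5}, 4: {"y": 6}, 5: {}, 6: {}}
    print(sorted(reachable_first_annotated(aut, 0)))  # [3, 4]
    print(r7_applies(aut, 0, upper=3))                # True
    print(r7_applies(aut, 1, upper=3))                # False: only one state to merge
```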

To make this precise, we need to define a union of states. It is a new state that has all the transitions of the contributing states. Where transitions with the same label lead to different states, the new state has a single transition with that label, leading to the union of those target states:

A union of states A and B is a state that has (1) all transitions of A and B whose labels occur in only one of the two states, (2) all transitions whose label occurs in both A and B and that lead to the same state, and (3) for every label that occurs in both A and B but leads to different states, a transition with that label leading to the union of the two target states.
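The definition translates into a recursive construction. The Python sketch below assumes the same hypothetical dictionary representation (state → {label: target}) plus a separate set of final states, and builds the union of any number of states. Making a union final when any contributing state is final is our assumption, since the definition speaks only of transitions. Groups of original states are memoized, so each distinct union is created once; a group of one state returns that state itself, which covers the case of identical labels leading to the same state.

```python
from itertools import count

Automaton = dict[int, dict[str, int]]  # state -> {label: target state}


def union_states(aut: Automaton, finals: set[int], states: list[int]) -> int:
    """Add the union of `states` to `aut` (in place) and return its state number."""
    memo: dict[frozenset[int], int] = {}
    fresh = count(max(aut) + 1)  # hypothetical numbering for new states

    def build(group: frozenset[int]) -> int:
        if len(group) == 1:
            # Same label, same target: the existing state is reused unchanged.
            return next(iter(group))
        if group in memo:
            return memo[group]
        new = next(fresh)
        memo[group] = new  # register first, so cyclic references terminate
        aut[new] = {}
        if group & finals:
            finals.add(new)  # assumption: final if any contributing state is final
        targets: dict[str, set[int]] = {}
        for s in group:
            for label, t in aut[s].items():
                targets.setdefault(label, set()).add(t)
        for label, ts in targets.items():
            # One target: keep it; several targets: point to their union.
            aut[new][label] = build(frozenset(ts))
        return new

    return build(frozenset(states))


def language(aut: Automaton, finals: set[int], state: int, prefix: str = "") -> set[str]:
    """All strings accepted from `state` (assumes an acyclic automaton)."""
    words = {prefix} if state in finals else set()
    for label, target in aut[state].items():
        words |= language(aut, finals, target, prefix + label)
    return words


if __name__ == "__main__":
    # Two toy annotation branches: "ab" (from state 1) and "ac" (from state 4).
    aut = {1: {"a": 2}, 2: {"b": 3}, 3: {}, 4: {"a": 5}, 5: {"c": 6}, 6: {}}
    finals = {3, 6}
    u = union_states(aut, finals, [1, 4])
    print(sorted(language(aut, finals, u)))  # ['ab', 'ac']
```

The shared label "a" leads to different states (2 and 5), so the union gets one "a" transition to a further union state holding both "b" and "c".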

It is worth noting that while R6 introduces very fine distinctions, R7 discards details. For the guesser, the result of applying R7 is that it returns more alternatives than it would if neither R6 nor R7 had been applied. As for the size of the lexicon, R7 removes small differences between similar word forms, which makes it possible to infer more general and more compact relations between endings and annotations.

Note also that although no annotation possibility is lost and the automaton is much smaller, the answers for known words are no longer 100% accurate: the correct answer always appears, but it may be accompanied by other, incorrect possibilities. In many cases, exceptions are merged with regular rules. Imposing a lower limit on the number of states removed by this rule can solve the problem.
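The trade-off can be checked on the divisions listed earlier in this section. The sketch below is a set-level simplification that ignores the automaton itself: it replaces each R6-style division by the merged answer {-śny, -sny} that merging would produce and verifies that the correct lexeme ending is always among the answers. The function names are hypothetical, and precision is computed assuming each listed ending has exactly one correct answer.

```python
# (comparative ending, correct lexeme ending) pairs listed earlier in the section
R6_PAIRS = [
    ("raśniejszy", "raśny"), ("iaśniejszy", "iasny"), ("maśniejszy", "maśny"),
    ("waśniejszy", "waśny"), ("jaśniejszy", "jasny"), ("ośniejszy", "ośny"),
    ("dośniejszy", "dosny"), ("ześniejszy", "zesny"), ("oleśniejszy", "olesny"),
    ("bleśniejszy", "bleśny"), ("uśniejszy", "uśny"),
]
SUFFIX = "śniejszy"


def merged_answers(ending: str) -> set[str]:
    """After merging, every -śniejszy ending yields both lexeme endings."""
    stem = ending[: -len(SUFFIX)]
    return {stem + "śny", stem + "sny"}


def evaluate(pairs: list[tuple[str, str]]) -> tuple[bool, float]:
    """Return (no correct answer lost?, precision over all returned answers)."""
    answers = [merged_answers(comp) for comp, _ in pairs]
    nothing_lost = all(correct in ans for ans, (_, correct) in zip(answers, pairs))
    # One correct answer per listed ending, divided by all answers returned.
    precision = len(pairs) / sum(len(ans) for ans in answers)
    return nothing_lost, precision


if __name__ == "__main__":
    print(evaluate(R6_PAIRS))  # (True, 0.5)
```

Every correct ending survives, but each of the eleven entries now also receives one incorrect candidate, which is the loss of accuracy for known words described above.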



Jan Daciuk
Wed Jun 3 14:37:17 CEST 1998

Software at http://www.pg.gda.pl/~jandac/fsa.html