next up previous contents index
Next: Representation Up: Construction Algorithms Previous: From Data in Arbitrary

Algorithm by Dominique Revuz

 

We know of one algorithm that constructs minimal deterministic acyclic finite-state automata incrementally. Dominique Revuz ([Rev91]) does it in two stages: the first one is called a pseudominimization, the second one - the proper minimization. We give here an account of that algorithm based on [Rev91].

A notion of the longest common suffix  is introduced. The longest common suffix of a word and a set is the longest common suffix of the word and a suffix of a word of the set.

To simplify the search of the longest common suffix (written lsc(w,X) from French ``le plus long suffixe commun''), the words are appended in inversed lexicograhical order, in which the comparison between the letters of words is done from right to left. This order makes it possible to enter words one after another so that the longest common prefix is the longest common prefix of the last word with the set of words already entered.

While there is a transition labeled with the current letter (of the word) and the state reached with that transition has only one predecessor, one moves to that state using the transition.

While the transition exists but the target state has more than one predecessor, the target state is duplicated, and the transition is redirected towards the copy. One moves to that copy.

While the longest common suffix is not reached, one creates a new transition labeled with the current letter and a new state, to which one moves using the transition.

Now the longest common suffix is reached.

While the state of the longest common suffix belongs to the path in the automaton used by the prefix of the word, one creates a new transition labeled with the current letter and a new state, to which one moves using the transition.

If the state in the longest common prefix does does not belong to the state used by the prefix of the word, one creates a transition that leads to that state and one moves to that state.

Let d be the last state reached in one of the last two loops. If d is final, then:

That was the pseudominimization phase. It is followed by a true minimization phase, which consists in partitioning the states into classes that have the same maximal length of strings in their right languages. The states are sorted in their classes, starting from the class that have the maximal length of string in its right language equal to 0, and proceeding with 1, 2, etc. During the sorting, the equivalent states are eliminated.


next up previous contents index
Next: Representation Up: Construction Algorithms Previous: From Data in Arbitrary

Jan Daciuk
Wed Jun 3 14:37:17 CEST 1998

Software at http://www.pg.gda.pl/~jandac/fsa.html