This page is not infested with Javascript. Enjoy reading without necessity of analyzing the source in HTML.

Viewable With Any Browser


Finite-state automata (FSA) and directed acyclic word graphs (DAWG)

This page is an attempt to gather information about various automata-related and DAWG-related resources in one place. We want to make it the reference site. The need for it became obvious as more and more people are rediscovering the same algorithms and methods, sometimes using different terminology. Additions and corrections are welcome - send them to the maintainer.

Table of Contents

Definitions and general information

General introductory material

There are several books that contain introduction to automata theory. Some of them are listed below:

  1. Emmanuel Roche and Yves Schabes, Finite-State Language Processing, Bradford Book series, MIT Press, Cambridge, Massachusetts, USA, 1997.[bibtex]
  2. Michael A. Arbib and A.J. Kfoury and Robert N. Moll, Introduction to Formal Language Theory, Springer Verlag, New York, New York, USA, 1988.[bibtex]
  3. A. V. Aho, J. E. Hopcroft and J. D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley Publishing Company, 1974.[bibtex]
  4. John E. Hopcroft and Jefferey D. Ullman, Introduction to Automata Theory, Languages, and Computation, Adison-Wesley Publishing Company, Reading, Massachusets, USA, 1979.[bibtex]
  5. Alfred V. Aho and Ravi Sethi and Jeffrey D. Ullman, Compilers. Principles, Techniques and Tools, Addison Wesley, 1986.[bibtex]
  6. Jacques Sakarovitch, Éléments de theorie des automates, éditions Vuibert, 2003. Table of Contents, preface and introductions to chapters available at http://perso.enst.fr/~jsaka/ETA/. [bibtex]

You can also find a few introductory courses on-line:

  1. A Concise Tutorial on Finite Automata from CoLoS, Milan
  2. CPS 360 Textbook
  3. Finite-State Methods in Natural-Language Processing
  4. Finite State Machine Design

Formal foundations

  1. Jean Berstel, Transductions and Context-Free Languages, Teubner Studienbücher, Stuttgart, 1979.[bibtex]
  2. Ch. Choffrut, Une caractérisation des fonctions séquentielles et des fonctions sous-séquentielles en tant que relations rationelles, Theoretical Computer Science, vol. 5, pp. 325--338, 1977.[bibtex]
  3. Dominique Perrin, Finite Automata, in Handbook of Theoretical Computer Science. Volume B: Formal Models and Semantics, J. van Leeuwen (ed.), Elsevier and the MIT Press, pp. 1--57, 1990.[bibtex]
  4. Dominique Perrin and Jean-Eric Pin, Infinite Words, (Automata, Semigroups,Logic and Games), Pure and Applied Mathematics Vol 141, Elsevier 2004, ISBN 0-12-532111-2.
  5. Meera Blattner and Tom Head, Single-valued a-transducers, Journal of Computer and System Sciences, 15(3), pp. 328--353, 1977.[bibtex]
  6. Mehryar Mohri, Finite-State Transducers in Language and Speech Processing, Computational Linguistics, 23(2), pp. 269-311, June, 1997.[bibtex]
  7. Sheng Yu, Regular Languages, in Handbook of Formal Languages, Vol. 1, G. Rozenberg and A. Salomaa eds., Springer, pp.41-110, 1998.[bibtex]

Short definitions

The NIST Dictionary of Algorithms and Data Structures provides many useful definitions. Wikipedia has an interesting entry on finite automata as well.

A Note on Memory Efficiency

This page makes it absolutely clear that there are various algorithms for achieving the same goal. They can differ in various aspects. One of them is memory requirements. It became quite fashionable in some circles, e.g. among computational linguists, but also (surprisingly) among some computer scientists, to refer to any concern about memory requirements as old-fashioned. An argument that an algorithm using less memory than another algorithm is better than that other algorithm is no argument for them. They say it used to be like that some ten years ago, but not anymore. They are clearly wrong.

Ten years ago (whenever you read it - the growth is exponential anyway), computers were about 100 times slower, and they had about 100 times smaller memories. It is amazing that the difference is so huge. Does it mean that we use only 1% of the total amount of memory in our computers, and the other 99% is simply wasted? Have we totally abandoned compression? The answer is no. The reason for that is the same as for the fact that we have not abandoned fast algorithms in favor of slower but simpler ones. The data we use grows with our computers. We also use new kinds of data in new applications. While they might require new algorithms, some old algorithms work perfectly well on them - they just have bigger input.

Our computers not only grow; they can have... offsprings. The mode of interaction with every generation changes. First dinosaurs were fed with punched cards, and the results were available from the computer center the next day. Most of us have computers on their desks. Some of us have laptops or notebooks that can be used while traveling. There will be computers in watches. Why cannot we use them for the same tasks as mainframes? Although they are subject to the same law of growth, they have clearly much more limited memories. Some programs just don't fit into them. Unless we use memory-efficient algorithms, of course.

Algorithms and methods

Creation

A taxonomy of automata construction algorithms can be found in:
  1. Bruce W. Watson, Taxonomies and Toolkits of Regular Language Algorithms, Ph.D. thesis, Eindhoven University of Technology, the Netherlands, 1995.[bibtex]
  2. Bruce W. Watson, A Taxonomy of Finite Automata Construction Algorithms, Eindhoven University of Technology, The Netherlands, 1993.[bibtex]
  3. Bruce W. Watson, A taxonomy of algorithms for constructing minimal acyclic deterministic automata, Proceedings of the Fourth Workshop on Implementing Automata, Potsdam, Germany, 1999.[bibtex]

Acyclic deterministic automata - recognizers

All the algorithms presented in this subsection take special advantage of the acyclicity of automata.

A traditional approach to constructing acyclic recognizers consists in building a trie (the algorithm for that is trivial), and then minimizing it using any of the well-known minimization algorithms. It can still be fast, but the trie can be enormous. Incremental (and to some extent also semi-incremental) algorithms presented below have much lower memory requirements.

An algorithm for incremental construction of minimal, deterministic, acyclic FSA from a sorted list of words (strings). It is very fast, and it uses the smallest amount of memory among all available methods for this task. Incremental means that words are added and integrated into a minimal automaton (or minimal except for one word) on-the-fly.

  1. Jan Daciuk, Stoyan Mihov, Bruce Watson, and Richard Watson, Incremental Construction of Minimal Acyclic Finite State Automata, Computational Linguistics, 26(1), pp. 3--16, April 2000.[bibtex]
  2. Jan Daciuk, Richard E. Watson, and Bruce W. Watson, Incremental Construction of Acyclic Finite-State Automata and Transducers, proceedings of Finite State Methods in Natural Language Processing, Bilkent University, Ankara, Turkey, June -- July, 1998.[bibtex]
  3. Stoyan Mihov, Direct Building of Minimal Automaton for Given List, Annuaire de l'Université de Sofia ``St. Kl. Ohridski'', Faculté de Mathematique et Informatique, Sofia, Bulgaria, volume 91, livre 1, February 1998.[bibtex]
  4. Stoyan Mihov, Direct Construction of Minimal Acyclic Finite States Automata, Annuaire de l'Université de Sofia ``St. Kl. Ohridski'', Faculté de Mathematique et Informatique, Sofia, Bulgaria, volume 92, livre 2, November, 1998.[bibtex]

An algorithm for incremental construction of minimal, deterministic, acyclic FSA from an unsorted list of words (strings). It is very fast, but slower than the algorithm described above, and it is also economic in its use of memory, but the fact that the words may come in arbitrary order may make an intermediate automaton and its associated structures (a register of states) larger.

  1. Jan Daciuk, Stoyan Mihov, Bruce Watson, and Richard Watson, Incremental Construction of Minimal Acyclic Finite State Automata, Computational Linguistics, 26(1), pp. 3--16, April 2000.[bibtex]
  2. Jan Daciuk, Richard E. Watson, and Bruce W. Watson, Incremental Construction of Acyclic Finite-State Automata and Transducers, proceedings of Finite State Methods in Natural Language Processing, Bilkent University, Ankara, Turkey, June -- July, 1998.[bibtex]
  3. K. Sgarbas, N. Fakotakis, and G. Kokkinakis, Two Algorithms for incremental construction of Directed Acyclic Word Graphs, International Journal on Artificial Intelligence Tools, World Scientific, 4(3), pp. 369--381, 1995.[bibtex]
  4. J. Aoe, K.Morimoto, and M.Hase, An Algorithm for Compressing Common Suffixes Used in Trie Structures, Trans. IEICE, 1992.[bibtex]
  5. Dominique Revuz, Dynamic Acyclic Minimal Automaton, CIAA 2000, Fifth International Conference on Implementation and Application of Automata, pp. 226-232, London, Canada.[bibtex]
  6. K. Sgarbas, N. Fakotakis, and G. Kokkinakis, Optimal Insertion in Deterministic DAWGs, Theoretical Computer Science, Elsevier, Vol.301/1-3, pp.103-117, 2003.[bibtex]
The penultimate paper contains only a sketch of the algorithm, but it is accompanied by a sketch of an algorithm for removing words (not much different from adding words). That algorithm has a generalization for cyclic automata.

An algorithm for semi-incremental construction of minimal, deterministic, acyclic FSA from a presorted list of words. This algorithm does not construct the minimal automaton in one step, and the intermediate automaton can be large. The strings are inverted, then sorted, and then inverted again to compress endings, but this does not give the true minimization. Actually, the algorithm is used without presorting in the INTEX tool. The algorithm has the best worst case computational complexity, but in practice it is slower than the algorithms described above, except for the case when the strings have very long common suffixes (then it performs the best)..

  1. Dominique Revuz, Dictionnaires et lexiques: méthodes et algorithmes, Thèse de doctorat, Institut Blaise Pascal, Paris, France, 1991, LITP 91.44.[bibtex]

An algorithm for semi-incremental construction of minimal, deterministic, acyclic FSA from a presorted list of words (strings). Words are sorted according to their decresing length.

  1. Bruce Watson, A Fast New (Semi-Incremental) Algorithm for the Construction of Minimal Acyclic DFAs, Third Workshop on Implementing Automata, Rouen, France, 17-19 September, 1998.[bibtex]

If you want to compare various construction methods for acyclic automata, you might be interested in this page.

It contains an extended version of:
  1. Jan Daciuk, Comparison of Construction Algorithms for Minimal, Acyclic, Deterministic, Finite-State Automata from Sets of Strings, Seventh International Conference on Implementation and Application of Automata CIAA '2002, Tours, France, 2002.[bibtex]

Acyclic nondeterministic automata - recognizers

An algorithm for incremental construction of nondeterministic, acyclic FSA from an unsorted list of words (strings).

  1. K. Sgarbas, N. Fakotakis, and G. Kokkinakis, Two Algorithms for incremental construction of Directed Acyclic Word Graphs, International Journal on Artificial Intelligence Tools, World Scientific, 4(3), pp. 369--381, 1995.[bibtex]
  2. K. Sgarbas, N. Fakotakis, and G. Kokkinakis,Incremental Construction of Compact Acyclic NFAs, Proc. ACL-2001, 39th Annual Meeting of the Association for Computational Linguistics, pp.474-481, Toulouse, France, 6-11 July, 2001.[bibtex]

Automata (recognizers) with cycles

This is a generalization of the
"unsorted data" algorithm for acyclic automata. This means that the algorithm is also incremental. It should perhaps be incremental modification, as it is not possible to introduce cycles by just adding simple strings. The paper also contains an algorithm to remove a string from the automaton.
  1. Rafael C. Carrasco and Mikel L. Forcada, Incremental Construction and Maintenance of Minimal Finite-State Automata, Computational Linguistics, 28(2), June 2002.[bibtex]

Acyclic deterministic transducers

  1. Stoyan Mihov and Denis Maurel, Direct Construction of Minimal Acyclic Subsequential Transducers, Fifth International Conference on Implementation and Application of Automata CIAA 2000, pp. 1--1, London, Ontario, Canada, July, 2000.[bibtex]

Pseudo-minimal automata

In these transducers, for each word recognized by them, there is a corresponding proper element in the automaton. That proper element is either a transition, or a final state, that is not shared with any other word.
  1. Denis Maurel, Pseudo-minimal transducer, Theoretical Computer Science, 231, pp. 129-139, 2000.[bibtex]

From regular expressions

There are two basic methods. Both lead to nondeterministic automata that can later be determinized and minimized.

The standard construction or Thompson construction is described in all major textbooks. It has been popularized by its use in grep and other UNIX tools. It comes in several variants.

  1. S. C. Kleene, Representations of Events in Nerve Sets and Finite Automata, In: C. E. Shannon and J. McCarthy, editors, Automata Studies, pages 3-42, Princeton, NJ, 1956. Princeton University Press.
  2. K. Thompson, Regular Expression Search Algorithm, Communications of the ACM, 11:419-422, 1968.[bibtex]

The Glushkov or McNaughton-Yamada construction produces smaller automata. It was discovered independently by Glushkov and by McNaughton and Yamada. It is less known than the standard construction. The first three papers describe the original algorithm, the next two bring results about its complexity and its relation to Brzozowski derivatives, the last two present an elegant inductive step-by-step construction (also called Champarnaud-Glushkov construction). The Champarnaud-Glushkov construction is a different algorithm than the original one, but it leads to the same automaton (Glushkov automaton), and it is easier to understand and to implement.

  1. V. M. Glushkov, On Synthesis Algorithm for Abstract Automata, Ukr. Mathem. Zhurnal, 12(2):147-156, 1960. In Russian.[bibtex]
  2. V. M. Glushkov, The Abstract Theory of Automata, Russian Mathematical Surveys, 16:1-53, 1961.[bibtex]
  3. R. McNaughton and H. Yamada, Regular Expressions and State Graphs for Automata, IEEE Transactions on Electronic Computers, 9:39-47, 1960.[bibtex]
  4. Anne Brügermann-Klein, Regular Expressions into Finite Automata, Proceedings of Latin '92, 1992.[bibtex]
  5. G. Berry and R. Sethi, From Regular Expressions to Deterministic Automata, Theoretical Computer Science, 48:117-126, 1986.[bibtex]
  6. Jean-Marc Champarnaud, From a Regular Expression to an Automaton, Rapport LIR-97.08, Laboratoire d'Informatique de Rouen, France, 1997.[bibtex]
  7. Jean-Marc Champarnaud, J.-L. Ponty, and Djelloul Ziadi, From Regular Expressions to Finite Automata, International Journal of Computer Mathematics, 72, 415--431, 1999. [bibtex]

The first two papers below introduce the ZPC-structure, that provides an ε-free alternative for Thompson's technique. The third paper essentially compares Thompson and ZPC-based techniques. The last one extends ZPC-based techniques to the case of weighted expressions.

  1. Djelloul Ziadi, J.-L. Ponty and Jean-Marc Champarnaud, Passage d'une expression rationnelle à un automate fini non-déterministe, Bulletin of the Belgian Mathematical Society Simon Stevin, 4(2), pp. 177-203, 1997. [bibtex]
  2. Djelloul Ziadi, J.-L. Ponty and Jean-Marc Champarnaud, A new quadratic algorithm to convert a regular expression into an automaton, in WIA'96, Lecture Notes in Computer Science, D. Raymond and D. Wood eds., Springer-Verlag, 1260, pp. 109-119, 1997. [bibtex]
  3. Jean-Marc Champarnaud, Evaluation of Three Implicit Structures to Implement Nondeterministic Automata from Regular Expressions, International Journal of Foundations of Computer Science, 13(1), pp. 99-113, 2004. [bibtex]
  4. Jean-Marc Champarnaud, E. Laugerotte, F. Ouardi and Djelloul Ziadi, From regular weighted expressions to finite automata, International Journal of Foundations of Computer Science, 15(5), pp. 687-700, 2004. [bibtex]

The three following papers deal with the construction of a quotient of the Glushkov automaton, namely Antimirov automaton. The last one allows to convert a regular expression into its follow automaton, via its ZPC-structure.

  1. Jean-Marc Champarnaud and Djelloul Ziadi, From Mirkin's Prebases to Antimirov's Word Partial Derivatives, Fundamenta Informaticae, 45, pp. 195--205, 2001. [bibtex]
  2. Jean-Marc Champarnaud and Djelloul Ziadi, From C-Continuations to New Quadratic Algorithms for Automaton Synthesis, International Journal of Algorithms and Computation, 11(6), pp. 707-735, 2001. [bibtex]
  3. Jean-Marc Champarnaud and Djelloul Ziadi, Canonical Derivatives, Partial Derivatives, and Finite Automaton Constructions, Theoretical Computer Science, 289, pp. 137-163, 2002. [bibtex]

For a comparison of Thompson and Glushkov constructions see the first paper. The second paper contains a chapter (about 50 pages) on various variants of both constructions (including determinization). The third one is different from that chapter in the second one.

  1. Dora Giammarresi, Jean-Luc Ponty, and Derick Wood, Glushkov and Thompson Constructions: A Synthesis, Tech. Report 98-17, Universita' Ca' Foscari di Venezia, 1998.
  2. Bruce W. Watson, Taxonomies and Toolkits of Regular Language Algorithms, Ph.D. thesis, Eindhoven University of Technology, the Netherlands, 1995.[bibtex]
  3. Bruce W. Watson, A taxonomy of algorithms for constructing minimal acyclic deterministic automata, Proceedings of the Fourth Workshop on Implementing Automata, Potsdam, Germany, 1999.[bibtex]

For an extension of regular expressions see:

  1. Dale Gerdemann and Gertjan van Noord, Transducers from Rewrite Rules with Backreferences, Ninth Conference of the European Chapter of the Association for Computational Linguistics, Bergen, Norway, 1999.[bibtex]
  2. Gertjan van Noord and Dale Gerdemann, An Extendible Regular Expression Compiler for Finite-state Approaches in Natural Language Processing, Workshop on Implementing Automata; WIA99 Pre-Proceedings, O. Boldt and H. Juergensen and L. Robbins (eds.), Potsdam, Germany, 1999.[bibtex]

Minimization

Deterministic recognizers

A taxonomy of minimization algorithms can be found in:

  1. Bruce W. Watson, A Taxonomy of finite automata minimization algorithms, Eindhoven University of Technology, The Netherlands, Computing Science Note, 93/44, 1993.[bibtex]
  2. Bruce W. Watson, Taxonomies and Toolkits of Regular Language Algorithms, Ph.D. thesis, Eindhoven University of Technology, the Netherlands, 1995.[bibtex]

The fastest (O(|Q|log|Q|), where |Q| is the number of states) minimization algorithm (by John E. Hopcroft) is described in:

  1. John E. Hopcroft, An n log n algorithm for minimizing the states in a finite automaton, in The Theory of Machines and Computations, Z. Kohavi (ed.), pp. 189-196, Academic Press, 1971.[bibtex]
  2. David Gries, Describing an Algorithm by Hopcroft, Acta Informatica, vol. 2, pp. 97-109, 1973.[bibtex]
  3. A. V. Aho, J. E. Hopcroft and J. D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley Publishing Company, 1974.[bibtex]
The third item on the list (i.e. the book) is probably the easiest to find, but beware that the description is somewhat hidden. Look for "partitioning". The problem of minimizing states in the set of states of an automaton is equivalent to finding the coarsest partition of the set of states, consistent with the initial partition into final and non-final states, such that if states s and t are in one block of the partition, then states delta(s, a) and delta(t, a) are also in one block of the partition for each input symbol a.

For an evaluation of various implementations of that algorithm, look at:

  1. George Anton Kiraz and Edmund Grimley-Evans, Multi-tape Automata for Speech and Language Systems: A Prolog Implementation, Automata Implementation. Second Internation Workshop on Implementing Automata, WIA '97, Derick Wood and Sheng Yu (eds.), Springer Lecture Notes in Computer Science 1436, pp. 87-103, 1997. [bibtex]

An algorithm that is considered by some people to be equally fast for frequently encountered data is Brzozowski's algorithm. It minimizes an automaton by performing reversal and determinization twice. The subset construction (determinization) is O(2|Q|), where Q is a set of states, but it depends on input (see master thesis by Ted Leslie in the Determinization section). The second paper here describes complexity of operations needed to implement Brzozowski's algorithm.

  1. J. A. Brzozowski, Canonical regular expressions and minimal state graphs for definite events, in Mathematical Theory of Automata, Volume 12 of MRI Symposia Series, pp. 529-561, Polytechnic Press, Polytechnic Institute of Brooklyn, N.Y., 1962.[bibtex]
  2. C. Câmpeanu, K. Culik II, and Sheng Yu, State Complexity of Basic Operations on Finite Languages, Automata Implementation. Proceedings of 4th International Workshop on Implementing Automata, WIA'99, Oliver Boldt and Helmut Jürgensen (Eds.), Postdam, Germany, LNCS 2214, Springer, 2001.[bibtex]

The following algorithm is different from all others because partial results can be used. It tests pairs of states for equivalence. The first paper describes a version with exponential time complexity. The second paper adds modifications that make it almost O(|Q|2).

  1. Bruce W. Watson, An incremental DFA minimization algorithm, Finite State Methods in Natural Language Processing 2001, ESSLLI Workshop, August 20-24, Helsinki, Finland, 2001.[bibtex]
  2. Bruce B. Watson, Jan Daciuk, An efficient incremental DFA minimization algorithm, Natural Language Engineering, Volume 9, Issue 01, March 2003. [bibtex]

The Hopcroft-Ullman is one of the best known. Its complexity is O(|Q|2), and it also requires O(|Q|2) memory, so it does not seem to be suitable for large automata. Pairs of states are considered, and the automaton has to be complete (i.e. it must have a sink, or dead state). A pair of states is distinguished if one of the states in the pair is final, and the other non-final. A pair of states is also distinguished if transitions labeled with the same symbol lead to another pair of states that was found distinguishable. For each pair of states, a list of pairs is found that are targets of transitions. Initially, pairs containing one final, and one non-final state are marked as distinguishable. Then all pairs that are on lists associated with those pairs are marked, and so on until no new pairs are marked. Pairs that are not marked are equivalent.

  1. John E. Hopcroft and Jefferey D. Ullman, Introduction to Automata Theory, Languages, and Computation, Adison-Wesley Publishing Company, Reading, Massachusets, USA, 1979.[bibtex]

The Aho-Sethi-Ullman algorithm has the same complexity as the Hopcroft-Ullman algorithm, i.e. O(|Q|2), but it needs only O(|Q|) memory. As in the Hopcroft algorithm, the set of states is partitioned, and the blocks are refined in each step, until no more divisions occur. Two states are put into different blocks if they have transitions leading to states in different partitions. In difference to the Hopcroft algorithm, forward transitions are used for that purpose (the Hopcroft algorithm uses back transitions to split blocks with transitions leading to the current block).

  1. Alfred V. Aho and Ravi Sethi and Jeffrey D. Ullman, Compilers. Principles, Techniques and Tools, Addison Wesley, 1986.[bibtex]

For acyclic automata, there are better methods than those given above. The minimization part of incremental and semi-incremental construction algorithms can be used.

Transducers

  1. Mehryar Mohri, Compact Representations by Finite-State Transducers, ACL'94, Morgan Kaufmann, San Francisco, California, USA, 1994.[bibtex]

Determinization

The term deterministic can mean two things:

In both cases it implies that recognition can be performed without backtracking.

Determinization of the first kind is performed using subset construction. Ted Leslie investigates various approaches to the problem, as well as the influence of various properties of automata (density etc.) on the complexity of determinization. Gertjan van Noord focuses on a subproblem of determinization - removal of epsilon-transitions. Jean-Marc Champarnaud investigates determinization of specific automata (the 4th paper), and then (with coauthors) applies the results to safe regular expression search. The last paper addresses the problem of reducing the number of unreachable states produced by a brute force determinization.

  1. Ted Leslie, Efficient Approaches to Subset Construction, masters thesis, Computer Science, University of Waterloo, 1995.[bibtex]
  2. J. Howard Johnson and Derick Wood, Instruction Computation in Subset Construction, in Automata Implementation, Darrell Raymond and Derick Wood and Sheng Yu (eds.), Lecture Notes in Computer Science 1260, pp. 64-71, Springer Verlag, 1997.[bibtex]
  3. Gertjan van Noord, The Treatment of Epsilon Moves in Subset Construction, Computational Linguistics, 26(1), pp. 61-76, April 2000. Available from www.let.rug.nl/~vannoord/papers/[bibtex]
  4. Jean-Marc Champarnaud, Subset Construction Complexity for Homogeneous Automata, Position Automata and ZPC-Structures, Theoretical Computer Science, 267, pp. 17-34, 2001.[bibtext]
  5. Jean-Marc Champarnaud, F. Coulon and T. Paranthoën, Compact and Fast Algorithms for Safe Regular Expression Search, International Journal of Computer Mathematics, 81(4), pp. 383-401, 2004. [bibtex]
  6. Jean-Marc Champarnaud, F. Coulon and T. Paranthoën, Brute Force Determinization of NFAs by Means of State Covers, International Journal of Foundations of Computer Science, 16(3), pp. 441-451, 2005. [bibtex]

For transducers, the first three below include weighted transducers:

  1. Mehryar Mohri, Finite-State Transducers in Language and Speech Processing, Computational Linguistics, 23(2), pp. 269-311, June 1997.[bibtex]
  2. Mehryar Mohri, On some applications of finite-state automata theory to natural language processing, Natural Language Engineering, vol. 2, pp. 61-80, 1996. Originally appeared in 1994 as Technical Report, institut Gaspard Monge, Paris.[bibtex]
  3. Mehryar Mohri and Fernando C.N. Pereira and Michael Riley, A Rational Design for a Weighted Finite-State Transducer Library, Automata Implementation. Second International Workshop on Implementing Automata, WIA '97, Lecture Notes in Computer Science 1436, Springer Verlag, 1998.[bibtex]
  4. Emmanuel Roche and Yves Schabes, Deterministic Part-of-Speech Tagging with Finite-State Transducers, Computational Linguistics, 21(2), pp. 227-253, June, 1995.[bibtex]

Operations on automata

Most operations are described in:

  1. Emmanuel Roche and Yves Schabes, Finite-State Language Processing, Bradford Book series, MIT Press, Cambridge, Massachusetts, USA, 1997.[bibtex]

These include weighted automata:

  1. Mehryar Mohri, On some applications of finite-state automata theory to natural language processing, Natural Language Engineering, vol. 2, pp. 61-80, 1996. Originally appeared in 1994 as Technical Report, institut Gaspard Monge, Paris.[bibtex]

Fast operations on automata

  1. A. Bergeron and S. Hamel, Fast Implementations of Automata Computations, Pre-Proc. 5th Int. Conf. on Implementation and Application of Automata (CIAA), M. Daley, M. G. Eramian, and S. Yu (eds.), pp. 16-25, the University of Western Ontario, London, Canada, July 2000.[bibtex]

Factorization of FSTs

One big transducer is divided into smaller ones. This economizes the space, but running two or more transducers is usually slower than running a single one.
  1. André Kempe, Factorization of Ambiguous Finite-State Transducers, in: Proc. CIAA 2000, London, Ontario, Canada, pp. 157-164, 2000.[bibtex]

"Optimization"/size reduction of FSTs

  1. André Kempe, Reduction of Intermediate Alphabets in Finite-State Transducer Cascades, in: Proc. TALN 2000, Lausanne, Switzerland, pp. 207-215, 2000.[bibtex]

Finite-state approximation of more powerful grammatical formalisms

A number of algorithms to approximate a context-free language by means of a smaller or larger regular language are discussed and compared empirically by (Nederhof, 2000; CL journal). Some theoretical arguments relating different algorithms are provided by (Nederhof, 2000; book chapter, Bunt & Nijholt, eds). Simple approximations in the form of grammar transformations are due to (Mohri & Nederhof, 2001). For approximation of feature grammars by regular languages see (Black, 1989) and (Johnson, 1998).
  1. A.W. Black, Finite state machines from feature grammars, International Workshop on Parsing Technologies, pp. 277-285, Pittsburgh, USA, 1989.[bibtex]
  2. Edmund Grimley Evans, Approximating Context-Free Grammars with a Finite-State Calculus, 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics, pp. 452-459, Madrid, 1997.[bibtex]
  3. Mark Johnson, Finite-state Approximation of Constraint-based Grammars using Left-corner Grammar Transforms, COLING-ACL '98. 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics. Proceedings of the Conference, Montreal, 1998.[bibtex]
  4. Mark-Jan Nederhof, Regular approximation of CFLs: a grammatical view, In H. Bunt and A. Nijholt, editors, Advances in Probabilistic and other Parsing Technologies, chapter 12, pages 221-241. Kluwer Academic Publishers, 2000.[bibtext]
  5. Mark-Jan Nederhof, Context-free Parsing through regular approximation, Finite-state Methods in Natural Language Processing, pp. 13-24, Bilkent University, Ankara, Turkey, 1998.[bibtex]
  6. Fernando C. N. Pereira and Rebecca N. Wright, Finite-State Approximation of Phrase-Structure Grammars, in Finite-State Language Processing, Emmanuel Roche and Yves Schabes (eds.), pp. 149-173, MIT Press, Cambridge, 1997.[bibtex]
  7. R.E. Stearns, A Regularity Test for Pushdown Machines, Information and Control, vol. 11, pp. 323-340, 1967.[bibtex]
  8. J.S. Ullian, Partial Algorithm Problems for Context Free Languages, Information and Control, vol. 11, pp. 80-101, 1967.[bibtex]
  9. C.M. Rood, Efficient Finite-State Approximation of Context Free Grammars, in Extended Finite State Models of Language, Proceedings of the ECAI'96 workshop, pp. 58-64, Budapest University of Economic Sciences, Hungary, 1996.[bibtex]
  10. L.G. Valiant, Regularity and Related Problems for Deterministic Pushdown Automata, Journal of the ACM, 22(1), pp. 1-10, 1975.[bibtex]
  11. Mark-Jan Nederhof, Practical Experiments with Regular Approximation of Context-Free Languages, Computational Linguistics, 26(1), April, 2000.[bibtex]
  12. Mehryar Mohri, Mark-Jan Nederhof, Regular Approximation of Context-Free Grammars through Transformation, Robustness in Language and Speech Technology, J.-C. Junqua and G. van Noord (eds.), Kluwer Academic Publishers, pp. 153-163, 2001.[bibtex]

Different and efficient representations of automata

Reversed alternating finite automata r-AFA

Alternating finite automata (AFA) are an extension of NFA (nondeterministic finite automata). They are almost the same as boolean automata introduced by Brzozowski and Leiss. It has been shown that an n-state AFA is equivalent to a 2n-state NFA (and a k-state NFA is equivalent to a 2k-state DFA). AFA are backward deterministic, i.e. they are deterministic if we consider them working on an input string from the rightmost letter to the leftmost. So the use of reversed AFA (r-AFA) instead of DFA guarantees a logarithmic reduction in the number of states. However, the transition function of an r-AFA can be very complex.
  1. K. Salomaa, X. Wu, and S. Yu, Efficient Implementation of Regular Languages Using r-AFA, Second International Workshop on Implementing Automata, WIA'97, Springer Verlag LNCS 1436, pp. 176-184, 1997.[bibtex]
  2. S. Huerter, K. Salomaa, X. Wu, and S. Yu, Implementing r-AFA Operations, Third International Workshop on Implementing Automata, WIA'98, Jean-Marc Champarnaud and Denis Maurel and Djelloul Ziadi (eds.), Rouen, France, Springer Verlag LNCS 1660, pp. 69-81, 1998.[bibtex]

Cover automata

A cover automaton for a finite language L is a finite automaton that accepts all words in L and possibly other words that are longer than any word in L. In other words, a cover automaton is a cyclic automaton and a counter (the maximal length of words) replacing an acyclic automaton (since L is final, a DFA for L is acyclic) in order to reduce the number of states. The first paper introduces cover automata. The second and the third one cover efficient minimization. The fourth one investigates the number of minimal cover automata on the basis of the similarity relation. That relation is also studied in the fifth paper, while the sixth one is an attempt to extend this notion to the notion of transducer cover. The last paper features incremental construction.
  1. C. C\^{a}mpeanu, N. Santean, and S. Yu, Minimal Cover-Automata for Finite Languages, Third International Workshop on Implementing Automata, WIA'98, Jean-Marc Champarnaud and Denis Maurel and Djelloul Ziadi (eds.), Rouen, France, Springer Verlag LNCS 1660, pp. 43-56, 1998. [bibtex]
  2. A. P\u{a}un, N. Seatan, and S. Yu, An O(n2) Algorithm for Constructing Minimal Cover Automata for Finite Languages, Fifth International Conference on Implementation and Application of Automata CIAA 2000, M. Dalej and M. G. Eramian and S. Yu (eds.), London, Ontario, Canada, (to appear in Springer Verlag LNCS 2088), 2000.[bibtex]
  3. Heiko Goeman, On Minimizing Cover Automata in O(n log n) Time, proceedings of the Seventh International Conference on Implementation and Application of Automata, Jean-Marc Champarnaud and Denis Maurel (eds.), University of Tours, France, July, 2002.[bibtex]
  4. Cezar C\^{a}mpeanu and Andrei P\u{a}un, The Number of Similarity Relations and the Number of Minimal Deterministic Finite Cover Automata, proceedings of the Seventh International Conference on Implementation and Application of Automata, Jean-Marc Champarnaud and Denis Maurel (eds.), University of Tours, France, July, 2002.[bibtex]
  5. Jean-Marc Champarnaud, F. Guingne and G. Hansel, Similarity Relations and Cover Automata, RAIRO-Theoretical Informatics and Applications, 39(1), pp. 115-123, 2005. [bibtex]
  6. Jean-Marc Champarnaud, F. Guingne and G. Hansel, Cover Transducers for Functions with Finite Domain, International Journal of Foundations of Computer Science, 16(5), pp. 851--865, 2005. [bibtex]
  7. Cezar C\^{a}mpeanu and Andrei P\u{a}un, Incremental Construction of Minimal Deterministic Finite Cover Automata, Theoretical Computer Science, vol. 363, no. 2, pp. 135-148, 2006.[bibtex]

Compression of automata

Many compression methods are described in:

  1. Tomasz Kowaltowski, Cláudio L. Lucchesi, and Jorge Stolfi, Minimization of Binary Automata, First South American String Processing Workshop, Belo Horizonte, Brasil, 1993.[bibtex]

Another compression method, mostly incompatible with those above, but giving superb recognition speed (but slower for other tasks) is given in the first paper below. It is based on the second paper here.

  1. Franklin Mark Liang, Word Hy-phen-a-tion by Computer, Ph.D. thesis, Stanford, 1983.[bibtex]
  2. Robert Endre Tarjan and Andrew Chi-Chih Yao, Storing a Sparse Table, Communications of the ACM, 22(11), pp. 606-611, November, 1979.[bibtex]

Text compression with automata

There is something by Dominique Revuz, but he does not even care to list his publications.

Text Indexation

  1. Max Silberztein, Text Indexation with INTEX, Computers and the Humanities, 33(3), pp. 265-280, 1999.[bibtex]

Perfect hashing

Perfect hashing means a 1:1 mapping between a set of n unique words and numbers 1...n (or 0...n-1). There are two basic, similar ways to implement that. At each state, a number of strings recognized by the part of the automaton reachable from that state is stored. During the word to number conversion, numbers from targets of transitions preceding the transition being followed are added along the path spelling out the word. During the number to word conversion, at each state, numbers stored at targets of transitions are added to the total as long as it does not become greater than the word number. Then the next transition is followed. The numbers can be stored in transitions instead of states, but it increases the size of the automaton.

The second method is to store in each transition the number of words that are recognized along all paths that precede the current transition. In other words, a number of words that have smaller numbers than any word such that the current transition belongs to its path is stored in the current transition. This method is faster than the previous one, it can be used together with the sparse matrix representation (which also gives very fast recognition speed), and allows for transition sharing, but as there are more transitions than states (even when some transitions are shared), it takes more space.

  1. Cláudio Lucchiesi and Tomasz Kowaltowski, Applications of Finite Automata Representing Large Vocabularies, Software Practice and Experience, 23(1), pp. 15--30, Jan. 1993.[bibtex]

Spelling correction

There are two papers describing the same method. The method is based on computing the edit distance using dynamic programming.

  1. Kemal Oflazer and Cemalettin Gúzey, Spelling Correction in Agglutinative Languages, 4th Conference on Applied Natural Language Processing, pp. 194-195, Stuttgart, Germany, October, 1994.[bibtex]
  2. Kemal Oflazer, Error-tolerant Finite State Recognition with Applications to Morphological Analysis and Spelling Correction, Computational Linguistics, 22(1), pp. 73--89, March, 1996.[bibtex]

Tokenization and lexical analysis

  1. Alfred V. Aho and Ravi Sethi and Jeffrey D. Ullman, Compilers. Principles, Techniques and Tools, Addison Wesley, 1986.[bibtex]
  2. M. E. Lesk and E. Schmidt, Lex - a Lexical Analyzer Generator, CS Technical Report No. 39, Bell Laboratories, 1975.[bibtex]
  3. S. C. Johnson and M. E. Lesk, Language Development Tools, Bell System Technical Journal, 57(6), 1978.[bibtex]
  4. Gregory Grefenstette and Pasi Tapanainen, What is a word, What is a sentence? Problems of Tokenization, Xerox Research Centre Europe, MLTT, MLTT-004, 1994.[bibtex]

Speech Recognition

  1. Martin Oerder and Hermann Ney, Word Graphs: An Efficient Interface Between Continuous-Speech Recognition and Language Understanding, ICASSP Volume 2, pp. 119-122, 1993.[bibtex]
  2. Mehryar Mohri and Michael Riley and Don Hindle and Andrej Ljolje and Fernando Pereira, Full expansion of context-dependent networks in large vocabulary speech recognition, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '98), Seattle, 1998.[bibtex]
  3. Fernando C. N. Pereira and Michael D. Riley, LSpeech Recognition by Composition of Weighted Finite Automata, 1996.[bibtex]

Morphology and phonology

The classical work (two-level and co.)

Douglas Johnson showed that phonological rules can be modeled by finite-state transducers. His work was forgotten for a long time, and the same discovery was made independently by Ronald Kaplan and Martin Kay. Their paper circulated for years as a draft until it was published in 1994.
  1. C. Douglas Johnson, Formal Aspects of Phonological Description, Mouton, The Hague, 1972.[bibtex]
  2. Ronald M. Kaplan and Martin Kay, Regular Models of Phonological Rule Systems, Computational Linguistics, 20(3), pp. 331-378, September 1994.[bibtex]
Kimmo Koskenniemi observed that phonological rules can work in parallel instead of a cascade.
  1. Kimmo Koskenniemi, Two-Level Model for Morphological Analysis, IJCAI-83, pp. 683--685, Karlsruhe, Germany, 1983.[bibtex]
  2. Kimmo Koskenniemi, A General Computational Model for Word-Form Recognition and Production, COLING-84, pp. 178--181, Stanford University, California, USA, 1984.[bibtex]
  3. Kimmo Koskenniemi, Two-level Morphology: a General Computational Model for Word-form Recognition and Production, technical report 11, Department of General Linguistics, University of Helsinki, Finland, 1983.[bibtex]
A simplified form of two-level morphology, claimed to be easier for linguists to learn, was proposed by Herbert Ruessink. A compiler for a similar formalism is described by Edmund Grimley Evans, George Anton Kiraz, and Stephen G. Pulman.
  1. Herbert Ruessink, Two-level formalisms, Working Papers in Natural Language Processing, Katholieke Universiteit Leuven, Stichting Taaltechnologie Utrecht, vol. 5, 1989.[bibtex]
  2. Edmund Grimley Evans and George Anton Kiraz and Stephen G. Pulman, Compiling a Partition-Based Two-Level Formalism, Proceedings of the 16th International Conference on Computational Linguistics (COLING), Copenhagen, 1996.[bibtex]
The papers below build on the two-level formalism and extend it (introducing unification, etc.).
  1. Graeme Ritchie and Steve Pulman and Alan Black and Graham Russel, A Computational Framework for Lexical Description, Computational Linguistics, 13(3-4), 1987.[bibtex]
  2. John Bear, A Morphological Recognizer with Syntactic and Phonological Rules, proceedings of COLING-86, Bonn, 1986.[bibtex]
  3. George Anton Kiraz, Compiling Regular Formalisms with Rule Features into Finite-State Automata, 33th Annual Meeting of the Association for Computational Linguistics, Madrid, 1997.[bibtex]
Lauri Karttunen noticed that combining phonological rules with the lexicon produces a transducer of a manageable size (actually smaller than the transducer for phonological rules).
  1. Lauri Karttunen, Constructing Lexical Transducers, COLING-94, pp. 406--411, Kyoto, Japan, 1994.[bibtex]
The papers in this group propose extensions to regular expressions and phonological rules that prove to be useful in morphology.
  1. Lauri Karttunen,The Replace Operator, ACL-95, pp. 16--23, Boston, Massachusetts, 1995.[bibtex]
  2. Lauri Karttunen, Directed Replacement, The Proceedings of the 34rd Annual Meeting of the Association for Computational Linguistics ACL-96, Santa Cruz, California, USA, 1996.[bibtex]
  3. Lauri Karttunen, Finite-state Constraints, Proceedings International Conference on Current Issues in Computational Linguistics, Universiti Sains Malaysia, Penang, 1991.
  4. [bibtex]
  5. André Kempe and Lauri Karttunen, Parallel Replacement in the Finite-State Calculus, Proceedings of the 16th International Conference on Computational Linguistics (COLING), Copenhagen, Denmark, 1996.
  6. Lauri Karttunen and Jean-Pierre Chanod and and Anne Schiller, Regular Expressions for Language Engineering, Natural Language Engineering, 2(4), pp. 305-238, 1996.[bibtex]

A short history of two-level morphology was presented in talk given by Lauri Karttunen and Kenneth R. Beesley at Finite State Methods in Natural Language Processing 2001, ESSLLI Workshop, August 20-24, Helsinki.

Including probabilities

  1. Mehryar Mohri and Richard Sproat, An Efficient Compiler for Weighted Rewrite Rules, 34th Annual Meeting of the Association for Computational Linguistics, Santa Cruz, 1996.[bibtex]

With recognizers

Dictionary-based morphological analysis:

  1. Tomasz Kowaltowski, Cláudio L. Lucchesi, and Jorge Stolfi, Finite Automata and Efficient Lexicon Implementation, technical report IC-98-02, University of Campinas, Brazil, 1998.[bibtex]
The same or similar methods are also used in INTEX and Unitex.

Approximative morphological analysis based on prefixes, infixes, and endings. The method basically consists of reversing words (prefixes and infixes have to be specially treated), appending the result (the analysis), constructing a DFA for such strings, and pruning the automaton by removing all states and transitions that all lead from a certain prefix to the same result. In the recognition phase, words are reversed and looked up in the automaton. When there are no more transitions to follow, all results reachable from the last state reached are printed.

  1. Jan Daciuk, Treatment of Unknown Words, proceedings of Workshop on Implementing Automata WIA'99, Postdam, Germany, 1999.[bibtex]

One-level morphology/phonology

  1. Steven Bird and T. Mark Ellison, One-Level Phonology: Autosegmental Representations and Rules as Finite Automata, Computational Linguistics, 20(1), pp. 55-90, 1994.[bibtex]
  2. Markus Walther, One-Level Prosodic Morphology, technical report MAL-1, MAL - Marburger Arbeiten zur Linguistik, Institüt für Germanistische Sprachwissenschaft, Philipps-Universität Marburg, 1999. cs.CL/9911011. See MAL archive at http://www.uni-marburg.de/linguistik/mal/.[bibtex]
  3. Markus Walther, Finite-State Reduplication in One-Level Prosodic Morphology, First Conference of the North American Chapter of the Association for Computational Linguistics, Seattle, 2000.[bibtex]
  4. Markus Walther, Temiar Reduplication in One-Level Prosodic Morphology, Proceedings of the Fifth Workshop of the ACL Special Interest Group in Computational Phonology, pp. 13-21, Luxembourg, August 6th, 2000. cs.CL/0008015.[bibtex]

Optimality theory

  1. Robert Frank and Giorgio Satta, Optimality Theory and the Generative Complexity of Constraint Violability, Computational Linguistics, 24(2), pp. 307--315, June, 1998.[bibtex]
  2. Lauri Karttunen, The Proper Treatment of Optimality in Computational Phonology, Proceedings of Finite State Methods in Natural Language Processing, pp. 1--12, Bilkent University, Ankara, Turkey, June -- July 1998.[bibtex]
  3. Dale Gerdemann and Gertjan van Noord, Approximation and Exactness in Finite State Optimality Theory, Coling Workshop Finite State Phonology, Luxembourg, 2000.[bibtex]
  4. Markus Walther, One-Level Prosodic Morphology, technical report 1, Institüt für Germanistische Sprachwissenschaft, Philipps-Universität Marbug, 1999. cs.CL/9911011.[bibtex]
  5. Jason Eisner, Efficient Generation in Primitive Optimality Theory, 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, 1998.[bibtex]
  6. Mark T. Ellison, Phonological derivation in optimality theory, Proceedings of the 15th International Conference on Computational Linguistics (COLING), pp. 1007-1013, Kyoto, 1994.[bibtex]

Multi-tape

  1. George Anton Kiraz, Multi-Tape Two-Level Morphology: a Case Study in Semitic Non-linear Morphology, Proceedings of the 15th International Conference on Computational Linguistics, Kyoto, Japan, 1994.[bibtex]
  2. George Anton Kiraz, Multitiered Nonlinear Morphology Using Multitape Finite Automata: A Case Study on Syriac and Arabic, Computational Linguistics, 26(1), pp. 77--105, 2000.[bibtex]

Restoration of diacritics

Tagging

Conversion of a Brill's tagger into a finite-state transducer

  1. Emmanuel Roche and Yves Schabes, Deterministic Part-of-Speech Tagging with Finite-State Transducers, Computational Linguistics, 21(2), pp. 227-253, June, 1995.[bibtex]

Approximation/conversion of HMM by/into FSTs

Many statistical taggers use HMMs.
  1. André Kempe, Finite State Transducers Approximating Hidden Markov Models, in: Proc. ACL'97, Madrid, Spain, pp. 460-467, 1997.[bibtex]
  2. André Kempe, Look-Back and Look-Ahead in the Conversion of Hidden Markov Models into Finite-State Transducers, in: Proc. NeMLaP'98, Sydney, Australia, pp. 29-37, 1998.[bibtex]
Also Voutilainen, but I must get a reference.

Parsing

  1. Kemal Oflazer, Dependency Parsing with an Extended Finite State Approach, Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, Maryland, USA, June, 1999.
  2. D.T. Langendoen, Finite-State Parsing of Phrase-Structure Languages and the Status of Readjustment Rules in Grammar, Linguistic Inquiry, 6(4), pp. 533-554, 1975.[bibtex]
  3. Steven Krauwer and Louis des Tombe, Transducers and grammars as theories of language, Theoretical Linguistics, 8, pp. 173--202, 1981.[bibtex]
  4. Maurice Gross, The use of Finite Automata in the Lexical Representation of Natural Language, Lecture Notes in Computer Science, Springer Verlag, Berlin, 1989.[bibtex]
  5. Maurice Gross, Local Grammars, in: Finite-State Language Processing, Emmanuel Roche and Yves Schabes (eds.), pp. 330-354, MIT Press, Cambridge, 1997.[bibtex]
  6. Fred Karlsson, Atro Voutilainen, Juha Heikkila, and Atro Anttila, Constraint Grammar, A Language-independent System for Parsing Unrestricted Text, Mouton de Gruyter, 1995.[bibtex]
  7. Emmanuel Roche, Parsing with Finite-State Transducers, in: Finite-State Language Processing, Emmanuel Roche and Yves Schabes (eds.), pp. 241-281, MIT Press, Cambridge, 1997.[bibtex]
  8. Pasi Tapanainen, Applying a Finite-State Intersection Grammar, in: Finite-State Language Processing, Emmanuel Roche and Yves Schabes (eds.), pp. 311-327, MIT Press, Cambridge, 1997.[bibtex]
  9. Atro Voutilainen and Pasi Tapanainen, Ambiguity Resolution in a Reductionist Parser, Sixth Conference of the European Chapter of the Association for Computational Linguistics, Utrecht, 1993.[bibtex]
  10. Atro Voutilainen, Designing a (Finite-State) Parsing Grammar, in: Finite-State Language Processing, Emmanuel Roche and Yves Schabes (eds.), pp. 283-310, MIT Press, Cambridge, 1997.[bibtex]
  11. Steven Abney, Partial Parsing via Finite-State Cascades, in: Workshop on Robust Parsing; Eight European Summer School in Logic, Language and Information, John Carroll (ed.), pp. 8-15, 1995.[bibtex]
  12. Steven Abney, Parsing By Chunks, in: Principle-Based Parsing, Robert Berwick and Steven Abney and Carol Tenny (eds.), Kluwer Academic Press, Dordrecht, 1991.[bibtex]
  13. Steven Abney, Tagging and Partial Parsing, in: Corpus-Based Methods in Language and Speech, Ken Church and Steve Young and Gerrit Bloothooft (eds.), An ELSNET volume. Kluwer Academic Publishers, Dordrecht, 1996.[bibtex]
  14. Gregory Grefenstette, Light Parsing as Finite-State Filtering, EACI 1996 Workshop Extended Finite-State Models of Language, Budapest, 1996.[bibtex]
  15. Jean-Pierre Chanod and Pasi Tapanainen, A Robust Finite-State Grammar for French, Workshop on Robust Parsing, John Carroll (ed.), Prague, 1996. These proceedings are also available as Cognitive Science Research Paper #435; School of Cognitive and Computing Sciences, University of Sussex.[bibtex]
  16. Fred Karlsson and Atro Voutilainen and Juha Heikkilä and Arto Anttila, Constraint Grammar: A Language-Independent Framework for Parsing Unrestricted Text, Mouton de Gruyter, Berlin/New York, 1995.[bibtex]

Pattern Matching

  1. Jan Lehoda and Borivoj Melichar, Pattern Matching in Text Coded by Finite Translation Automaton, in Proceedings of the 7th International Multiconference Information Society IS'2004. Ljubjana: Institut Jozef Stefan, 2004, vol. D, pp. 212-214, ISBN 961-6303065-1.[bibtex]

Image processing

DNA sequencing

VLSI design

Visualization

The first two allow for visualization on screen as well as printing, the last one is a LaTeX package. There is also another LaTeX package for drawing automata:
GasTeX.
  1. Eleftherios Koutsofios and Stephen C. North, Editing graphs with dotty, AT&T Bell Laboratories, dotty User Manual, 1994.[bibtex]
  2. G. Sander, Graph Layout through the VCG Tool, in R. Tamassia and I. G. Tollis (eds.), Graph Drawing, DIMACS International Workshop GD '94, Proceedings; Lecture Notes in Computer Science, pp. 194-205, Springer Verlag, 1995.
  3. Sylvain Lombardy and Jacques Sakarovitch, Vaucanson. A package for drawing automata. Presentation and user's manual, May 30, 2002. Available at http://perso.enst.fr/~lombardy/Vauc/.[bibtex]

DAWGs

A DAWG is either a suffix automaton or a factor automaton.
  1. Maxime Crochemore and Renaud Vérin, On Compact Directed Acyclic Word Graphs, in Structures in Logic and Computer Science, a selection of essays in honor of Andrzej Ehrenfeucht, Jan Mycielski, Grzegorz Rozenberg and Arto Salomaa, eds., LNCS 1261, Springer Verlag, 1997, pp. 192--211.[bibtex]
  2. Jan Holub, Maxime Crochemore, On the Implementation of Compact DAWG's. Proceedings of the 7th Conference on Implementation and Application of Automata, University of Tours, Tours, France, July 2002, LNCS 2608, Springer Verlag, 2003, pp. 289-294.[bibtex]

FSA/FST/DAWG related sites

A large
list is maintained by Sheng Yu at the University of Western Ontario, Canada.

Researchers

There are links on (some) authors of papers in previous sections. Of course, they are not the only researchers in the field.

Conferences

Most conferences come in series:

Courses

Additional information

There is a forum for disscussion of FSA-related issues at: http://groups.yahoo.com/group/FSA-Research. To subscribe, send message to FSA-Research-subscribe@yahoogroups.com. Since egroups were taken over by Yahoo, it is more difficult to register at the new site, and registering is required to view the list archive (but you can send and receive messages without registering). They require information about sex, age, occupation etc. Registering as a newly born Bolivian woman (Bolivia just happened to be under the cursor) did not work, because the system requires the users to be at least 2 years old. Registering as a two years old Bolivian woman did not work either because the system asked me to bring my parents. Now I'm 50 years old Bolivian woman...

If you register at the Yahoo site, don't forget to write down your password. Until recently, you could rely on cookies to let the server let you in. Now Yahoo became more aggressive, and they require you to resupply your password, or to give them your personal information - Big Brother is watching you. Perhaps we should move somewhere else... I'm glad this page is not on the Yahoo server.


Send new entries/corrections to the current maintainer Jan Daciuk (not a woman) at user@domain, where user=jandac, and domain=eti.pg.gda.pl. Several people contributed to this page: Gertjan van Noord, André Kempe, Mark-Jan Nederhof, Max Silberztein, Sheng Yu, Bruce Watson, Jean-Marc Champarnaud (let me know if I forgot someone). A few others promised to contribute...
The page has been moved from Groningen to http://www.eti.pg.gda.pl/~jandac/fsm_algorithms/. Due to local redirections here, a different address may show on the address bar in your browser. However, if you bookmark this page, please use the address given above.
This page is free of Javascript.

Viewable With Any Browser