Do state machine based programs have a processing loop – or even exist

finite-state machineprogramming-languages

Short Version

If we had an application based on a state machine – transitioning between states to run the application: is repetition, and in fact the entire state machine, based on the presence of an event loop?

Something like:

void Main()
{
    while (_currentState != State.Exit)
    {
       switch (_currentState)
       case State.CharacterEntityReference: DoCharacterEntityReference();
       case State.NamedCharacterReference: DoNamedCharacterReference();
       case State.NumericCharacterReference: DoNumericCharacterReference();
       case State.AmbiguousAmpersandState: DoAmbiguousAmpersandState();
       case State.HexadecimalCharacterReferenceState: DoHexadecimalCharacterReferenceState();
       case State.DecimalCharacterStartState: DoDecimalCharacterStartState();
       case State.DecimalCharacterReferenceState: DoDecimalCharacterReferenceState();
       default: break;
    }
}

I ask because I've never seen or imagined something like that before:

  • an entire application programmed as a state machine
  • entirely based on short subroutines, and a giant switch statement.

Is this a thing? Was this ever a thing? Was this programming in the 1950s?

Long Version

I know about state machines: a data structure of nodes and edges, with your current "location", and various state transitions triggered by some event.

  • But this isn't a data structure inside the program
  • this is the program.

I was looking at the whatwg HTML 5 spec, where they document the correct parsing of HTML (i.e. into a DOM tree).

And I was reading it as though it described subroutines and methods. But I got stuck when a "function" to read a decimal string seemed like it would only read one digit and then give up:

13.2.5.79 Decimal character reference state

Consume the next input character:

  • ASCII digit: Multiply the character reference code by 10. Add a numeric version of the current input character (subtract 0x0030 from the character's code point) to the character reference code.
  • U+003B SEMICOLON: Switch to the numeric character reference end state.
  • Anything else: This is a missing-semicolon-after-character-reference parse error. Reconsume in the numeric character reference end state.

My reasonable translation of that might have been something like:

void DecimalCharacterReferenceState()
{
   Consume();
   switch (_currentInputCharacter)
   {
      AsciiDigit:
         referenceCode *= 10;
         referenceCode += (Word)currentInputCharacter - 0x0030;
         break;
      default: 
         Reconsume();
         return;
   }
}

Except in all cases that method only reads one character, and then gives up. But this is the function that's supposed to collect a whole bunch of characters. That's when I realized that the function can only run again if:

  • the state machine's loop jumps back to the top,
  • realizes we're in this state,
  • and calls our method again.

I'd never considered structuring a real program like that. Is this how Chromium, Safari, or Firefox perform their work? In a giant case statement?

Or is the State Machine model just a mental model:

13.2.5 Tokenization

Implementations must act as if they used the following state machine to tokenize HTML. The state machine must start in the data state. Most states consume a single character, which may have various side-effects, and either switches the state machine to a new state to reconsume the current input character, or switches it to a new state to consume the next character, or stays in the same state to consume the next character. Some states have more complicated behavior and can consume several characters before switching to another state. In some cases, the tokenizer state is also changed by the tree construction stage.

I'm used to the idea of a state machine being a data structure that I query, and push to a new state (like TCP connection state diagram).

But I'm not used to a state machine being the program.

Or is this just a Turing machine? (but the Turing machine didn't have a while loop)

Best Answer

It depends on where you draw the line. (Von Neumann) Computers are by definition finite state machines in an infinite loop. Most programs aren’t modeled this way because, well, it’s hella tedious and impractical to list all of the states of an integer to make math work.

But parsers very often are implemented this way, because their grammars are almost always in BNF (or some variant) which can be trivially generated into this finite state machine style code. They do it because it is very important to cover all of the possible permutations of input in an unambiguous way. Protocols have similar sort of needs, so also tend to be state machines even if they are (usually) less rigorously defined.

Related Topic