Add some line feeds and it becomes fairly self-evident:
    case state is
        when S0 =>
            state <= S1;
        when S1 =>
            state <= S2;
        when S2 =>
            state <= S0;
    end case;
What it does is schedule state to be set to the next state, rotating from S2 back to S0: S0 -> S1 -> S2 -> S0 and so on. If it is not within a clocked process, it will go as fast as it can forever, which is most likely not what you want to happen.
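As a minimal sketch of what "within a clocked process" could look like, here is the same case statement wrapped so that state advances once per rising clock edge. The entity name, the clk and reset ports, and the state_t type name are assumptions made for illustration; they are not from the original question.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper entity; names and ports are illustrative only.
entity rotate_fsm is
    port (
        clk   : in std_logic;
        reset : in std_logic   -- assumed synchronous, active-high reset
    );
end entity rotate_fsm;

architecture rtl of rotate_fsm is
    type state_t is (S0, S1, S2);
    signal state : state_t := S0;
begin
    -- The process wakes only on clk events, so state moves one step
    -- per rising edge instead of cycling once per delta cycle.
    process (clk)
    begin
        if rising_edge(clk) then
            if reset = '1' then
                state <= S0;
            else
                case state is
                    when S0 =>
                        state <= S1;
                    when S1 =>
                        state <= S2;
                    when S2 =>
                        state <= S0;
                end case;
            end if;
        end if;
    end process;
end architecture rtl;
```

A real design would also drive some output from state; that is left out here to keep the sketch short.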
By omitting the "after" clause from the first assignment in your sequence of assignments (known as a waveform), you are implying "after 1 delta cycle". Delta cycles are the key to how VHDL simulation is performed. A delta cycle can be thought of as an infinitesimally small delay, although in reality that is a gross simplification. In your case, this infinitesimally small delay is why the high phase of the clock does not last for very long, although this behaviour will vary slightly between simulators.
I would suggest that you do some further reading on delta cycles and follow it up with reading your simulator's manual to understand their implementation.
Edit:
A concurrent statement is evaluated every time a signal on its right-hand side changes. It may help to think about concurrent statements such as these in their equivalent process form. A concurrent statement is equivalent to a process that is sensitive to the signals it references on the right-hand side of the statement. If there are no referenced signals, then it is equivalent to a process with an empty sensitivity list and a final wait statement.
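To make the equivalence concrete, here is a small sketch that puts each concurrent assignment next to its process form. The entity and all signal names (a, b, y1, y2, k1, k2) are made up for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo entity; every name here is illustrative.
entity concurrent_equiv is
    port (
        a, b : in  std_logic;
        y1   : out std_logic;
        y2   : out std_logic;
        k1   : out std_logic;
        k2   : out std_logic
    );
end entity concurrent_equiv;

architecture demo of concurrent_equiv is
begin
    -- Concurrent form: re-evaluated whenever a or b changes.
    y1 <= a and b;

    -- Equivalent process form: sensitive to the same signals.
    process (a, b)
    begin
        y2 <= a and b;
    end process;

    -- No signals on the right-hand side: evaluated once.
    k1 <= '1';

    -- Equivalent process form: no sensitivity list, final wait.
    process
    begin
        k2 <= '1';
        wait;  -- suspends forever, so the assignment runs only once
    end process;
end architecture demo;
```

In simulation y1 and y2 should always match, as should k1 and k2.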
The "clk <= '0', not clk after 50ns" example is evaluated every time clk changes. Each time, clk is scheduled with '0' after one delta cycle and "not clk" (i.e. '1') after 50 ns. At 50 ns clk changes, so the statement is evaluated again, at which point clk is scheduled with '0' after one delta cycle and "not clk" 50 ns later. Therefore clk is '1' for only one delta cycle before it returns to '0'.
The "clk <= '0', '1' after 50ns" example is only evaluated once because there are no signals on the right-hand side for it to be sensitive to. Therefore clk is scheduled with '0' after one delta cycle and '1' after 50 ns, at which point no further changes are scheduled.
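For comparison, a commonly used way to get a free-running clock with a real high phase is to give clk an initial value and use a single-element waveform. This is a sketch of that idiom, not something taken from the question; the testbench entity name is an assumption.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical simulation-only testbench; it has no ports.
entity clk_gen_tb is
end entity clk_gen_tb;

architecture sim of clk_gen_tb is
    -- The initial value matters: "not 'U'" is still 'U' for std_logic,
    -- so without it the clock would never start toggling.
    signal clk : std_logic := '0';
begin
    -- Re-evaluated on every change of clk. Each evaluation schedules
    -- the opposite value 50 ns later, so high and low both last 50 ns.
    clk <= not clk after 50 ns;
end architecture sim;
```

Because the only waveform element carries the 50 ns delay, nothing is scheduled one delta cycle ahead and the high phase is not cut short. The simulation runs until it is stopped externally.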
Best Answer
There are many ways to answer your question, but the most important thing to remember is that VHDL was developed by a U.S. Department of Defense committee and therefore we should not expect things like logic or reason, which is ironic because we're talking about logic.
VHDL is a "strongly typed" language. Normally in a strongly typed language there are many different data types that are similar to each other, but differ only in usage. For example, you could use an integer, SLV, or enumerated data type for a state machine variable. But for most situations only an enum works "optimally".
This method helps to prevent the programmer from introducing bugs, while giving the compiler the most freedom to create optimized code/logic. With an enumerated state variable, the compiler can choose the best state encoding (one-hot, binary, etc.). If the state variable were implemented as an integer or SLV, the compiler would not have that ability. Also, with an integer or SLV you could easily assign invalid states to the state variable by accident, creating bugs that can be hard to diagnose.
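A rough sketch of the difference, with made-up state names (IDLE, RUN, DONE) and a hypothetical entity. The enum version leaves the encoding to the tools and rejects anything that is not a declared state, while the SLV version accepts any 2-bit pattern.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo entity; all names are illustrative.
entity state_types_demo is
    port (clk : in std_logic);
end entity state_types_demo;

architecture demo of state_types_demo is
    -- Enumerated state: no encoding is fixed in the source.
    type state_t is (IDLE, RUN, DONE);
    signal state_enum : state_t := IDLE;

    -- SLV state: the encoding is fixed by hand, and "11" is unused.
    signal state_slv : std_logic_vector(1 downto 0) := "00";
begin
    process (clk)
    begin
        if rising_edge(clk) then
            -- Only IDLE, RUN or DONE can be assigned here.
            state_enum <= RUN;
            -- state_enum <= "11";  -- would be rejected at compile time

            -- This compiles, even though "11" is not a defined state.
            state_slv <= "11";
        end if;
    end process;
end architecture demo;
```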
The same is true for Boolean, although it is harder to see the benefit. It forces the writer to be explicit where, without strong data typing, the code would be less "obvious". Essentially, it's harder to make a mistake with Boolean.
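As a small illustration (entity and signal names are hypothetical), compare a std_logic control signal with a boolean one inside an if statement. The std_logic signal needs an explicit comparison, so the writer has to state its polarity; the boolean is used directly.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo entity; all names are illustrative.
entity bool_vs_logic is
    port (
        reset_n : in  std_logic;  -- assumed active low, per the _n suffix
        enabled : in  boolean;
        q       : out std_logic
    );
end entity bool_vs_logic;

architecture demo of bool_vs_logic is
begin
    process (reset_n, enabled)
    begin
        -- std_logic: the comparison spells out which level is active.
        if reset_n = '0' then
            q <= '0';
        -- boolean: TRUE can only mean "enabled", no polarity to track.
        elsif enabled then
            q <= '1';
        else
            q <= '0';
        end if;
    end process;
end architecture demo;
```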
The side effect of strong data types is that some things are more "wordy" in VHDL. It's the price we pay.
But to directly answer your questions...
"When does 0/1 not cut it?" Technically, 0/1 would work. The main problem with 0/1 is that it doesn't tell you which state is "true". Is that signal active low, or active high? You can't get confused with the Boolean TRUE/FALSE. But in all other cases, 0/1 would be perfectly fine. Boolean is there, not for technical reasons, but for logistical programming reasons.
"Is Boolean implemented differently?" Not really. The logic generated isn't any different than the well-written alternative. Again, it's mostly there to keep the programmer from inadvertently doing something stupid.
I should also point out that when discussing strongly vs. weakly typed programming languages, things quickly degenerate into a philosophical debate, and the nuances of this debate cannot be fully understood unless you've written lots of code (1+ million lines of code). Another thing that is frequently overlooked is that there are levels of strongly and weakly typed data. I would describe C as being weakly typed, and C++ as being strongly typed, although tables on Wikipedia show almost every language as strongly typed.