All of your logic is sound, except that I think your understanding of functional programming is a bit too extreme. In the real world, functional programming, just like object-oriented or imperative programming, is about mindset and how you approach the problem. You can still write programs in the spirit of functional programming while modifying application state.
In fact, you have to modify application state to actually do anything. The Haskell guys will tell you their programs are 'pure' because they wrap all of their state changes in a monad. However, their programs still interact with the outside world. (Otherwise, what would be the point?)
Functional programming emphasizes "no side effects" where it makes sense. However, to do real-world programming, as you said, you do need to modify the state of the world. (For example, responding to events, writing to disk, and so on.)
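To make that concrete, here is a minimal Haskell sketch (my own illustration, with a made-up `shoutBack` function): the pure part is an ordinary function, and every interaction with the world is confined to the IO monad.

```haskell
import Data.Char (toUpper)

-- A pure function: the same input always gives the same output.
shoutBack :: String -> String
shoutBack s = map toUpper s ++ "!"

-- All interaction with the outside world lives in IO.
main :: IO ()
main = do
  line <- getLine                          -- read from the world
  writeFile "reply.txt" (shoutBack line)   -- write to disk
  putStrLn (shoutBack line)                -- respond to the event
```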
For more information on asynchronous programming in functional languages, I strongly urge you to look into F#'s Asynchronous Workflows programming model. It allows you to write functional programs while hiding all the messy details of thread transitions within a library. (In a manner very similar to Haskell-style monads.)
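F#'s syntax is different, but to give a flavour of the same idea in Haskell terms, here is a sketch using the `async` package (an assumed dependency, not something from the answer): the library hides the forking and joining of threads behind a small monadic API.

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (concurrently)

-- Hypothetical slow I/O actions (placeholders for real requests).
fetchGreeting :: IO String
fetchGreeting = threadDelay 1000000 >> return "hello"

fetchCount :: IO Int
fetchCount = threadDelay 1000000 >> return 42

main :: IO ()
main = do
  -- The library forks, runs both actions on their own threads,
  -- and joins the results; no manual thread plumbing in sight.
  (g, n) <- concurrently fetchGreeting fetchCount
  putStrLn (g ++ ", " ++ show n)
```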
If the 'body' of the thread simply computes a value, then spawning multiple threads and having them compute values in parallel is still within the functional paradigm.
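As a sketch of that (using Haskell's `parallel` package, my own choice of library): each spark below just computes a value, and the result is identical to the sequential map, so nothing about purity is lost.

```haskell
import Control.Parallel.Strategies (parMap, rseq)

-- A pure, CPU-heavy function.
fib :: Int -> Integer
fib n
  | n < 2     = fromIntegral n
  | otherwise = fib (n - 1) + fib (n - 2)

-- Each spark just computes a value; the answer is the same as the
-- sequential `map fib`, so purity is preserved.
main :: IO ()
main = print (sum (parMap rseq fib [25 .. 32]))
-- (compile with -threaded and run with +RTS -N to use several cores)
```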
The one thing that is unequivocally bad practice here is claiming that something is a pure function when it isn't.
If mutable variables are used in a way that is truly and completely self-contained, the function is externally pure and everyone is happy. Haskell in fact supports this explicitly, with the type system even ensuring that mutable references can't be used outside the function that creates them.
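A minimal sketch of that mechanism: `runST` lets a function mutate local `STRef`s, and the type system guarantees the references cannot escape, so callers see an ordinary pure function (`sumSquares` below is a made-up example).

```haskell
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, readSTRef, modifySTRef')

-- Externally pure: callers cannot observe the mutable accumulator.
sumSquares :: [Int] -> Int
sumSquares xs = runST $ do
  acc <- newSTRef 0                            -- local mutable state
  mapM_ (\x -> modifySTRef' acc (+ x * x)) xs
  readSTRef acc                                -- the STRef never escapes runST

main :: IO ()
main = print (sumSquares [1 .. 10])            -- 385, every time
```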
That said, I think talking about "side effects" is not the best way to look at it (and is why I said "pure" above). Anything that creates a dependency between the function and external state makes things harder to reason about, and that includes things like knowing the current time or using concealed mutable state in a non-thread-safe way.
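To illustrate with a small, made-up example (assuming the `time` package): a function that reads the clock itself is harder to test and reason about than one that is handed the time as an explicit argument.

```haskell
import Data.Time.Calendar (toGregorian)
import Data.Time.Clock (UTCTime, getCurrentTime, utctDay)

-- Hidden dependency on external state: the result varies between calls.
greetingNow :: IO String
greetingNow = fmap greetingAt getCurrentTime

-- Pure version: the dependency is an explicit input, trivial to test.
greetingAt :: UTCTime -> String
greetingAt t =
  let (year, _, _) = toGregorian (utctDay t)
  in "Happy " ++ show year ++ "!"

main :: IO ()
main = greetingNow >>= putStrLn
```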
Best Answer
Writing your functions/methods without side effects - so they're pure functions - makes it easier to reason about the correctness of your program.
It also makes it easy to compose those functions to create new behaviour.
It also makes certain optimisations possible, where the compiler can, for instance, memoise the results of functions or use Common Subexpression Elimination.
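A small sketch of both points, with invented function names: pure pieces plug together with ordinary composition, and a value like `expensive x` can safely be computed once and reused precisely because the function has no side effects.

```haskell
import Data.Char (toUpper)

-- Composition: build new behaviour out of existing pure pieces.
normalise :: String -> String
normalise = unwords . words            -- collapse runs of whitespace

emphasise :: String -> String
emphasise = (++ "!") . map toUpper

announce :: String -> String
announce = emphasise . normalise       -- still a pure function

-- Sharing is safe only because `expensive` has no side effects:
-- the compiler (or the programmer) may compute it once and reuse it.
expensive :: Int -> Int
expensive n = n * n                    -- stand-in for a costly pure function

both :: Int -> (Int, Int)
both x = let y = expensive x           -- manual common-subexpression elimination
         in (y + 1, y * 2)

main :: IO ()
main = do
  putStrLn (announce "  hello   world ")
  print (both 7)
```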
Edit: at Benjol's request: Because a lot of your state is stored on the stack (data flow, not control flow, as Jonas has called it here), you can parallelise or otherwise reorder the execution of those parts of your computation that are independent of each other. You can easily find those independent parts because one part doesn't provide inputs to the other.
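For example (a sketch using `par`/`pseq` from Haskell's `parallel` package, my own illustration): the two sub-results below take no input from each other, so they can be evaluated in either order, or at the same time, without changing the answer.

```haskell
import Control.Parallel (par, pseq)

-- Two independent pure computations over the same input: neither one
-- provides an input to the other, so they may be reordered or run in parallel.
sumOfSquares, sumOfCubes :: [Integer] -> Integer
sumOfSquares xs = sum [x * x     | x <- xs]
sumOfCubes   xs = sum [x * x * x | x <- xs]

report :: [Integer] -> Integer
report xs =
  let a = sumOfSquares xs
      b = sumOfCubes xs
  in a `par` (b `pseq` (a + b))   -- spark a, evaluate b, then combine

main :: IO ()
main = print (report [1 .. 100000])
```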
In environments with debuggers that let you roll back the stack and resume computing (like Smalltalk), having pure functions means that you can very easily see how a value was computed, because the previous states are available for inspection. In a mutation-heavy calculation, unless you explicitly add do/undo actions to your structure or algorithm, you cannot see the history of the computation. (This ties back to the first paragraph: writing pure functions makes it easier to reason about the correctness of your program.)