I would recommend OCaml.
In my personal point of view, the main basis of modern¹ functional programmings are higher-order functions, a static type system, and algebraic datatypes and pattern matching.
Between a Scheme, a ML and a Haskell, I would choose the ML because I think it's the most relevant to this definition. Scheme doesn't have static typing (there is Typed Racket, but it's not for the scheme beginners), and Haskell has too much other stuff (monads, lazy evaluation...) that, while interesting, may divert attention from the important basis.
SML and OCaml are equally interesting; I'm more used to OCaml, and it has a more "practical" feeling that is nice (but if you really want "practical" to the risk of losing your soul, you may as well pick F#).
Scheme and Haskell are also very interesting languages. Scheme emphasis on macros is interesting, but not directly related to functional programming (it's another of the things in the world you should definitely try, as well as logic programming, stack-based languages, and capability-oriented E).
Haskell is definitely a great language and, I think, a mandatory point for the aspiring gurus of functional programming. But as the core languages of OCaml and Haskell are very much similar (except for lazy evaluation that is distracting for the beginner), it's easy to learn Haskell once you know the basics of OCaml. Or rather, you can concentrate on the weird stuff, and you don't have to assimilate the basics at the same time.
Similarly, once you have seen OCaml, and possibly also Haskell, and still want to learn more, you should look at Coq or Agda. Yet few would recommend Coq or Agda for a first introduction to functional programming...
To make my point clear : I think that learning OCaml (or SML) then Haskell will make you as good a functional programmer as learning Haskell directly, but more easily (or less painfully).
Besides, OCaml and Haskell both have good things differentiating them, and it's interesting to know about the advanced features of both. Just learning Haskell is poorer in that aspect (though of course you could learn OCaml after Haskell; I think it's less logical and will get you more frustrated).
For learning OCaml I would recommend Jason Hickey's book draft (PDF).
¹this definition is controversial. Some Scheme folks will claim static typing has nothing to do with functional programming. Some Haskell people will claim that their definition of purity ("what Haskell does, but no more") is a sine qua non condition for being a functional language. I agree to disagree.
Other than the fact that Erlang was specifically developed to be run in concurrent/parallelized/distributed situations, the two main techniques that it employs making this possible are:
No side effects
This means, when you give a function a piece of data to execute against, it will not except in very strict cases affect anything else in the system/running process. This means that if you execute a function 300 times all at once concurrently, none of those 300 executions of the function will effect any of the others.
The implementation technique for ensuring no side effects is called "immutability" which roughly means, may not be mutated(changed). This means that as soon as you create a variable, the value of that variable may not be modified. Erlang implements this behavior with "single assignment" so after you assign a value to a variable, you may not assign a value to it again.
X = 1.
X = 2. // This is not a valid operation
This ensures no code may accidentally change the value of X causing a race condition, therefore it is inherently thread-safe and concurrent use becomes trivial. This is a very uncommon behavior among software languages and the biggest way Erlang manages to be so well suited for concurrent execution.
The actor model
This is a particular way of modelling that has shown to make the implementation and management of concurrent processing very simple for developers. Straight from Wikipedia:
The Actor model adopts the philosophy that everything is an actor.
This is similar to the everything is an object philosophy used by some
object-oriented programming languages, but differs in that
object-oriented software is typically executed sequentially, while the
Actor model is inherently concurrent. An actor is a computational
entity that, in response to a message it receives, can concurrently:
send a finite number of messages to other actors; create a finite
number of new actors; designate the behavior to be used for the next
message it receives. There is no assumed sequence to the above actions
and they could be carried out in parallel. Decoupling the sender from
communications sent was a fundamental advance of the Actor model
enabling asynchronous communication and control structures as patterns
of passing messages.
Best Answer
The only question I have is what is your web service doing? If the web service is truly a functional problem, then Haskell will be a better fit.
Erlang isn't necessarily a functional language. It's a procedural language with a very strong execution model for massively parallel systems. It was designed for the telecom industry, and it would definitely make an excellent fit for responding to web service requests.
See this page* for an overview of the differences between procedural and functional programming. (Apologies in advance for the ugly black on cyan page).
If your web service is doing a fair amount of pattern matching and applying rules, then Haskell is your choice. If you just want a scalable infrastructure that isn't too different from the languages you probably already know, choose Erlang.
(* link via Wayback machine. The original file has been removed)