Scala – What are the disadvantages to declaring Scala case classes


If you're writing code that's using lots of beautiful, immutable data structures, case classes appear to be a godsend, giving you all of the following for free with just one keyword:

Everything immutable by default
Getters automatically defined
Decent toString() implementation
Compliant equals() and hashCode()
Companion object with unapply() method for matching

But what are the disadvantages of defining an immutable data structure as a case class?

What restrictions does it place on the class or its clients?

Are there situations where you should prefer a non-case class?

Best Answer

First the good bits:

Everything immutable by default

Yes, and can even be overridden (using var) if you need it

Getters automatically defined

Possible in any class by prefixing params with val

Decent toString() implementation

Yes, very useful, but doable by hand on any class if necessary

Compliant equals() and hashCode()

Combined with easy pattern-matching, this is the main reason that people use case classes

Companion object with unapply() method for matching

Also possible to do by hand on any class by using extractors
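As a rough sketch of what "by hand" means here, the following gives an ordinary class its own companion object with apply and unapply. The names Temperature and describe are hypothetical and exist only for illustration.

```scala
object ExtractorDemo {
  // A plain (non-case) class; `val` gives it a public getter.
  class Temperature(val degrees: Double)

  object Temperature {
    // Hand-written factory, so `new` is optional at call sites.
    def apply(degrees: Double): Temperature = new Temperature(degrees)

    // Hand-written extractor: this is what enables `case Temperature(d)`.
    def unapply(t: Temperature): Option[Double] = Some(t.degrees)
  }

  def describe(t: Temperature): String = t match {
    case Temperature(d) if d > 30.0 => "hot"
    case Temperature(_)             => "not hot"
  }
}
```

Note that equals, hashCode and toString are still the defaults from AnyRef here; a case class would have generated those too.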

This list should also include the uber-powerful copy method, one of the best things to come to Scala 2.8
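To make the list concrete, here is a small sketch of the generated members in use, including copy. Point and the value names are hypothetical.

```scala
object CaseClassDemo {
  // One line gives immutable fields, getters, toString, equals/hashCode,
  // a companion apply/unapply, and copy.
  case class Point(x: Int, y: Int)

  val p = Point(1, 2)

  // copy builds a new instance, changing only the named fields.
  val moved = p.copy(y = 5)

  // Structural equality comes from the generated equals.
  val sameAsP = moved.copy(y = 2) == p
}
```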

Then the bad. There are only a handful of real restrictions with case classes:

You can't define `apply` in the companion object using the same signature as the compiler-generated method (this restriction was lifted in Scala 2.12.2)

In practice though, this is rarely a problem. Changing the behaviour of the generated apply method is guaranteed to surprise users and should be strongly discouraged. The only justification for doing so is to validate input parameters, a task best done in the main constructor body (which also makes the validation available when using copy).
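A minimal sketch of validating in the constructor body, assuming a hypothetical Percentage class. The check uses the standard require method, which throws IllegalArgumentException when the condition fails.

```scala
object ValidationDemo {
  case class Percentage(value: Int) {
    // Runs for every construction path: `new`, the generated apply, and copy.
    require(value >= 0 && value <= 100, s"value must be in 0..100, got $value")
  }

  val half = Percentage(50)

  // copy goes through the primary constructor, so the same check applies.
  val copyRejected: Boolean =
    scala.util.Try(half.copy(value = 150)).isFailure
}
```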

You can't subclass one case class with another

True, though it's still possible for a case class to itself be a descendant. One common pattern is to build up a class hierarchy of traits, using case classes as the leaf nodes of the tree.

It's also worth noting the sealed modifier. Any direct subclass of a trait with this modifier must be declared in the same file. When pattern-matching against instances of the trait, the compiler can then warn you if you haven't checked for all possible concrete subclasses. When combined with case classes, this can offer you a very high level of confidence in your code if it compiles without warning.
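The trait-plus-leaf-case-classes pattern might look like the sketch below. Shape, Circle, Rectangle and area are hypothetical names.

```scala
object SealedDemo {
  // All direct subtypes of a sealed trait must live in this file.
  sealed trait Shape
  case class Circle(radius: Double) extends Shape
  case class Rectangle(width: Double, height: Double) extends Shape

  // If one of the cases below were removed, the compiler would warn
  // that the match may not be exhaustive.
  def area(shape: Shape): Double = shape match {
    case Circle(r)       => math.Pi * r * r
    case Rectangle(w, h) => w * h
  }
}
```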

As a subclass of Product, case classes can't have more than 22 parameters (a limit that applied up to Scala 2.10 and was lifted in 2.11)

No real workaround on those versions, except to stop abusing classes with this many params :)

Also...

One other restriction sometimes noted is that Scala doesn't (currently) support lazy params (like lazy vals, but as parameters). The workaround to this is to use a by-name param and assign it to a lazy val in the constructor. Unfortunately, by-name params don't mix with pattern matching, which prevents the technique being used with case classes as it breaks the compiler-generated extractor.

This is relevant if you want to implement highly-functional lazy data structures, and will hopefully be resolved with the addition of lazy params to a future release of Scala.
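For reference, the by-name plus lazy val workaround looks roughly like this on a regular (non-case) class. Deferred and the evaluations counter are hypothetical names used only to show when evaluation happens.

```scala
object LazyParamDemo {
  // A plain class: the by-name parameter is only evaluated when the
  // lazy val is first read, and the result is cached after that.
  class Deferred[A](compute: => A) {
    lazy val value: A = compute
  }

  // Counts how many times the by-name argument has actually run.
  var evaluations = 0
  val d = new Deferred({ evaluations += 1; 42 })
}
```

Constructing d does not run the block; the first read of d.value does, and later reads reuse the cached result.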

Related Solutions

Scala – the difference between a var and val definition in Scala

As so many others have said, the object assigned to a val cannot be replaced, and the object assigned to a var can. However, said object can have its internal state modified. For example:

class A(n: Int) {
  var value = n
}

class B(n: Int) {
  val value = new A(n)
}

object Test {
  def main(args: Array[String]): Unit = {
    val x = new B(5)
    // x = new B(6) // Doesn't compile, because I can't replace the object created on the line above with this new one.
    // x.value = new A(6) // Doesn't compile, because I can't replace the object assigned to B.value with a new one.
    x.value.value = 6 // Works, because A.value is a var and can receive a new value.
  }
}

So, even though we can't change the object assigned to x, we could change the state of that object. At the root of it, however, there was a var.

Now, immutability is a good thing for many reasons. First, if an object doesn't change internal state, you don't have to worry if some other part of your code is changing it. For example:

val x = new B(0)
f(x)
if (x.value.value == 0)
  println("f didn't do anything to x")
else
  println("f did something to x")

This becomes particularly important with multithreaded systems. In a multithreaded system, the following can happen:

val x = new B(1)
f(x)
if (x.value.value == 1) {
  print(x.value.value) // Can be different than 1!
}

If you use val exclusively, and only use immutable data structures (that is, avoid arrays, everything in scala.collection.mutable, etc.), you can rest assured this won't happen. That is, unless there's some code, perhaps even a framework, doing reflection tricks -- reflection can change "immutable" values, unfortunately.

That's one reason, but there is another reason for it. When you use var, you can be tempted into reusing the same var for multiple purposes. This has some problems:

It will be more difficult for people reading the code to know what is the value of a variable in a certain part of the code.
You may forget to re-initialize the variable in some code path, and end up passing wrong values downstream in the code.

Simply put, using val is safer and leads to more readable code.

We can, then, go in the other direction. If val is that much better, why have var at all? Well, some languages did take that route, but there are situations in which mutability improves performance a lot.

For example, take an immutable Queue. When you either enqueue or dequeue things in it, you get a new Queue object. How then, would you go about processing all items in it?

I'll go through that with an example. Let's say you have a queue of digits, and you want to compose a number out of them. For example, if I have a queue with 2, 1, 3, in that order, I want to get back the number 213. Let's first solve it with a mutable.Queue:

def toNum(q: scala.collection.mutable.Queue[Int]) = {
  var num = 0
  while (!q.isEmpty) {
    num *= 10
    num += q.dequeue() // dequeue() mutates q, so it is called with parentheses
  }
  num
}

This code is fast and easy to understand. Its main drawback is that the queue that is passed is modified by toNum, so you have to make a copy of it beforehand. That's the kind of object management that immutability makes you free from.

Now, let's convert it to an immutable.Queue:

def toNum(q: scala.collection.immutable.Queue[Int]) = {
  def recurse(qr: scala.collection.immutable.Queue[Int], num: Int): Int = {
    if (qr.isEmpty)
      num
    else {
      val (digit, newQ) = qr.dequeue
      recurse(newQ, num * 10 + digit)
    }
  }
  recurse(q, 0)
}

Because I can't reuse some variable to keep track of my num, like in the previous example, I need to resort to recursion. In this case, it is a tail-recursion, which has pretty good performance. But that is not always the case: sometimes there is just no good (readable, simple) tail recursion solution.
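If you want the compiler to confirm that a helper like this really is tail-recursive, one option is the standard @tailrec annotation. The sketch below is the same function with that annotation added; wrapping it in an object named TailRecDemo is only for illustration.

```scala
import scala.annotation.tailrec
import scala.collection.immutable.Queue

object TailRecDemo {
  def toNum(q: Queue[Int]): Int = {
    // @tailrec makes compilation fail if the call is not in tail position.
    @tailrec
    def recurse(qr: Queue[Int], num: Int): Int =
      if (qr.isEmpty) num
      else {
        val (digit, newQ) = qr.dequeue
        recurse(newQ, num * 10 + digit)
      }
    recurse(q, 0)
  }
}
```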

Note, however, that I can rewrite that code to use an immutable.Queue and a var at the same time! For example:

def toNum(q: scala.collection.immutable.Queue[Int]) = {
  var qr = q
  var num = 0
  while (!qr.isEmpty) {
    val (digit, newQ) = qr.dequeue
    num *= 10
    num += digit
    qr = newQ
  }
  num
}

This code is still efficient, does not require recursion, and you don't need to worry whether you have to make a copy of your queue or not before calling toNum. Naturally, I avoided reusing variables for other purposes, and no code outside this function sees them, so I don't need to worry about their values changing from one line to the next -- except when I explicitly do so.

Scala opted to let the programmer do that, if the programmer deemed it to be the best solution. Other languages have chosen to make such code difficult. The price Scala (and any language with widespread mutability) pays is that the compiler doesn't have as much leeway in optimizing the code as it could otherwise. Java's answer to that is optimizing the code based on the run-time profile. We could go on and on about pros and cons to each side.

Personally, I think Scala strikes the right balance, for now. It is not perfect, by far. I think both Clojure and Haskell have very interesting notions not adopted by Scala, but Scala has its own strengths as well. We'll see what comes up in the future.

Scala – Case objects vs Enumerations in Scala

One big difference is that Enumerations come with support for instantiating them from some name String. For example:

object Currency extends Enumeration {
   val GBP = Value("GBP")
   val EUR = Value("EUR") //etc.
}

Then you can do:

val ccy = Currency.withName("EUR")

This is useful when wishing to persist enumerations (for example, to a database) or create them from data residing in files. However, I find in general that enumerations are a bit clumsy in Scala and have the feel of an awkward add-on, so I now tend to use case objects. A case object is more flexible than an enum:

sealed trait Currency { def name: String }
case object EUR extends Currency { val name = "EUR" } //etc.

case class UnknownCurrency(name: String) extends Currency

So now I have the advantage of...

trade.ccy match {
  case EUR                   =>
  case UnknownCurrency(code) =>
}

As @chaotic3quilibrium pointed out (with some corrections to ease reading):

Regarding "UnknownCurrency(code)" pattern, there are other ways to handle not finding a currency code string than "breaking" the closed set nature of the Currency type. UnknownCurrency being of type Currency can now sneak into other parts of an API.

It's advisable to push that case outside Enumeration and make the client deal with an Option[Currency] type that would clearly indicate there is really a matching problem and "encourage" the user of the API to sort it out him/herself.
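One way to read that suggestion is sketched below: the lookup returns an Option and no catch-all subtype exists. The fromName helper is a hypothetical name, not something from the answers above.

```scala
object CurrencyLookupDemo {
  sealed trait Currency { def name: String }
  case object GBP extends Currency { val name = "GBP" }
  case object EUR extends Currency { val name = "EUR" }

  object Currency {
    // Hypothetical helper: unknown codes yield None instead of a
    // catch-all subtype, so the set of Currency values stays closed.
    def fromName(code: String): Option[Currency] = code match {
      case "GBP" => Some(GBP)
      case "EUR" => Some(EUR)
      case _     => None
    }
  }
}
```

The caller then has to handle None explicitly, which keeps the unknown-code problem at the boundary of the API.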

To follow up on the other answers here, the main drawbacks of case objects compared with Enumerations are:

Can't iterate over all instances of the "enumeration". This is certainly the case, but I've found it extremely rare in practice that this is required.
Can't instantiate easily from persisted value. This is also true but, except in the case of huge enumerations (for example, all currencies), this doesn't present a huge overhead.
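Both drawbacks can be softened with a little boilerplate in the companion object. This is only a sketch: values and withName here are hand-written members that mimic the Enumeration API, and nothing checks that the list stays complete.

```scala
object CurrencyValuesDemo {
  sealed trait Currency { def name: String }
  case object GBP extends Currency { val name = "GBP" }
  case object EUR extends Currency { val name = "EUR" }

  object Currency {
    // Maintained by hand: the compiler does not check that every case
    // object is listed here (a known weakness of this sketch).
    val values: List[Currency] = List(GBP, EUR)

    // Rebuilds an instance from a persisted name, if one matches.
    def withName(name: String): Option[Currency] =
      values.find(_.name == name)
  }
}
```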