Scala – Writing a functional and yet functional image processing library in Scala

functional programmingimage processingprogramming-languagesscala

We are developing a small image processing library for Scala (student project). The library is completely functional (i.e. no mutability). The raster of image is stored as Stream[Stream[Int]] to exploit the benefits of lazy evaluation with least efforts. However upon performing a few operations on an image the heap gets full and an OutOfMemoryError is thrown. (for example, up to 4 operations can be performed on a jpeg image sized 500 x 400, 35 kb before JVM heap runs out of space.)

The approaches we have thought of are:

Twiddling with JVM options and increase the heap size. (We don't know how to do this under IDEA – the IDE we are working with.)
Choosing a different data structure than Stream[Stream[Int]], the one which is more suited to the task of image processing. (Again we do not have much idea about the functional data structures beyond the simple List and Stream.)

The last option we have is giving up on immutability and making it a mutable library (like the popular image processing libraries), which we don't really want to do. Please suggest us some way to keep this library functional and still functional, if you know what I mean.

Thank you,
Siddharth Raina.

ADDENDUM:

For an image sized 1024 x 768, the JVM runs out of heap space even for a single mapping operation. Some example code from our test:

val image = Image from "E:/metallica.jpg"
val redded = image.map(_ & 0xff0000)
redded.display(title = "Redded")

And the output:

"C:\Program Files (x86)\Java\jdk1.6.0_02\bin\java" -Didea.launcher.port=7533 "-Didea.launcher.bin.path=C:\Program Files (x86)\JetBrains\IntelliJ IDEA Community Edition 10.0.2\bin" -Dfile.encoding=windows-1252 -classpath "C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\charsets.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\deploy.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\javaws.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\jce.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\jsse.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\management-agent.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\plugin.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\resources.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\rt.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\ext\dnsns.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\ext\localedata.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\ext\sunjce_provider.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\ext\sunmscapi.jar;C:\Program Files (x86)\Java\jdk1.6.0_02\jre\lib\ext\sunpkcs11.jar;C:\new Ph\Phoebe\out\production\Phoebe;E:\Inventory\Marvin.jar;C:\scala-2.8.1.final\lib\scala-library.jar;C:\scala-2.8.1.final\lib\scala-swing.jar;C:\scala-2.8.1.final\lib\scala-dbc.jar;C:\new Ph;C:\scala-2.8.1.final\lib\scala-compiler.jar;E:\Inventory\commons-math-2.2.jar;E:\Inventory\commons-math-2.2-sources.jar;E:\Inventory\commons-math-2.2-javadoc.jar;E:\Inventory\jmathplot.jar;E:\Inventory\jmathio.jar;E:\Inventory\jmatharray.jar;E:\Inventory\Javax Media.zip;E:\Inventory\jai-core-1.1.3-alpha.jar;C:\Program Files (x86)\JetBrains\IntelliJ IDEA Community Edition 10.0.2\lib\idea_rt.jar" com.intellij.rt.execution.application.AppMain phoebe.test.ImageTest
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at scala.collection.Iterator$class.toStream(Iterator.scala:1011)
    at scala.collection.IndexedSeqLike$Elements.toStream(IndexedSeqLike.scala:52)
    at scala.collection.Iterator$$anonfun$toStream$1.apply(Iterator.scala:1011)
    at scala.collection.Iterator$$anonfun$toStream$1.apply(Iterator.scala:1011)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:565)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:557)
    at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:168)
    at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:168)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:565)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:557)
    at scala.collection.immutable.Stream$$anonfun$flatten1$1$1.apply(Stream.scala:453)
    at scala.collection.immutable.Stream$$anonfun$flatten1$1$1.apply(Stream.scala:453)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:565)
    at scala.collection.immutable.Stream$Cons.tail(Stream.scala:557)
    at scala.collection.immutable.Stream.length(Stream.scala:113)
    at scala.collection.SeqLike$class.size(SeqLike.scala:221)
    at scala.collection.immutable.Stream.size(Stream.scala:48)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:388)
    at scala.collection.immutable.Stream.toArray(Stream.scala:48)
    at phoebe.picasso.Image.force(Image.scala:85)
    at phoebe.picasso.SimpleImageViewer.<init>(SimpleImageViewer.scala:10)
    at phoebe.picasso.Image.display(Image.scala:91)
    at phoebe.test.ImageTest$.main(ImageTest.scala:14)
    at phoebe.test.ImageTest.main(ImageTest.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:115)

Process finished with exit code 1

Best Answer

If I understood correctly, you store each individual pixel in one Stream element, and this can be inefficient. What you can do is create your custom LazyRaster class which contains lazy references to blocks of the image of some size (for instance, 20x20). The first time some block is written, its corresponding array is initialized, and from there on changing a pixel means writing to that array.

This is more work, but may result in better performance. Furthermore, if you wish to support stacking of image operations (e.g. do a map - take - map), and then evaluating the image in "one-go", the implementation could get tricky - stream implementation is the best evidence for this.

Another thing one can do is ensure that the old Streams are being properly garbage collected. I suspect image object in your example is a wrapper for your streams. If you wish to stack multiple image operations (like mapping) together and be able to gc the references you no longer need, you have to make sure that you don't hold any references to a stream - note that this is not ensured if:

you have a reference to your image on the stack (image in the example)
your Image wrapper contains such a reference.

Without knowing more about the exact use cases, its hard to say more.

Personally, I would avoid Streams altogether, and simply use some immutable array-based data structure which is both space-efficient and avoids boxing. The only place where I potentially see Streams being used is in iterative image transformations, like convolution or applying a stack of filters. You wouldn't have a Stream of pixels, but a Stream of images, instead. This could be a nice way to express a sequence of transformations - in this case, the comments about gc in the link given above apply.

tl;dr

class C defines a class, just as in Java or C++.
object O creates a singleton object O as instance of some anonymous class; it can be used to hold static members that are not associated with instances of some class.
object O extends T makes the object O an instance of trait T; you can then pass O anywhere, a T is expected.
if there is a class C, then object C is the companion object of class C; note that the companion object is not automatically an instance of C.

Also see Scala documentation for object and class.

`object` as host of static members

Most often, you need an object to hold methods and values/variables that shall be available without having to first instantiate an instance of some class. This use is closely related to static members in Java.

object A {
  def twice(i: Int): Int = 2*i
}

You can then call above method using A.twice(2).

If twice were a member of some class A, then you would need to make an instance first:

class A() {
  def twice(i: Int): Int = 2 * i
}

val a = new A()
a.twice(2)

You can see how redundant this is, as twice does not require any instance-specific data.

`object` as a special named instance

You can also use the object itself as some special instance of a class or trait. When you do this, your object needs to extend some trait in order to become an instance of a subclass of it.

Consider the following code:

object A extends B with C {
  ...
}

This declaration first declares an anonymous (inaccessible) class that extends both B and C, and instantiates a single instance of this class named A.

This means A can be passed to functions expecting objects of type B or C, or B with C.

Additional Features of `object`

There also exist some special features of objects in Scala. I recommend to read the official documentation.

def apply(...) enables the usual method name-less syntax of A(...)
def unapply(...) allows to create custom pattern matching extractors
if accompanying a class of the same name, the object assumes a special role when resolving implicit parameters

Why hasn’t functional programming taken over yet

Because all those advantages are also disadvantages.

Stateless programs; No side effects

Real-world programs are all about side effects and mutation. When the user presses a button it's because they want something to happen. When they type in something, they want that state to replace whatever state used to be there. When Jane Smith in accounting gets married and changes her name to Jane Jones, the database backing the business process that prints her paycheque had better be all about handling that sort of mutation. When you fire the machine gun at the alien, most people do not mentally model that as the construction of a new alien with fewer hit points; they model that as a mutation of an existing alien's properties.

When the programming language concepts fundamentally work against the domain being modelled, it's hard to justify using that language.

Concurrency; Plays extremely nice with the rising multi-core technology

The problem is just pushed around. With immutable data structures you have cheap thread safety at the cost of possibly working with stale data. With mutable data structures you have the benefit of always working on fresh data at the cost of having to write complicated logic to keep the data consistent. It's not like one of those is obviously better than the other.

Programs are usually shorter and in some cases easier to read

Except in the cases where they are longer and harder to read. Learning how to read programs written in a functional style is a difficult skill; people seem to be much better at conceiving of programs as a series of steps to be followed, like a recipe, rather than as a series of calculations to be carried out.

Productivity goes up (example: Erlang)

Productivity has to go up a lot in order to justify the massive expense of hiring programmers who know how to program in a functional style.

And remember, you don't want to throw away a working system; most programmers are not building new systems from scratch, but rather maintaining existing systems, most of which were built in non-functional languages. Imagine trying to justify that to shareholders. Why did you scrap your existing working payroll system to build a new one at the cost of millions of dollars? "Because functional programming is awesome" is unlikely to delight the shareholders.

Imperative programming is a very old paradigm (as far as I know) and possibly not suitable for the 21th century

Functional programming is very old too. I don't see how the age of the concept is relevant.

Don't get me wrong. I love functional programming, I joined this team because I wanted to help bring concepts from functional programming into C#, and I think that programming in an immutable style is the way of the future. But there are enormous costs to programming in a functional style that can't simply be wished away. The shift towards a more functional style is going to happen slowly and gradually over a period of decades. And that's what it will be: a shift towards a more functional style, not a wholesale embracing of the purity and beauty of Haskell and the abandoning of C++.

I build compilers for a living and we are definitely embracing a functional style for the next generation of compiler tools. That's because functional programming is fundamentally a good match for the sorts of problems we face. Our problems are all about taking in raw information -- strings and metadata -- and transforming them into different strings and metadata. In situations where mutations occur, like someone is typing in the IDE, the problem space inherently lends itself to functional techniques such as incrementally rebuilding only the portions of the tree that changed. Many domains do not have these nice properties that make them obviously amenable to a functional style.

Best Answer

Related Solutions

Scala – Difference between object and class in Scala

tl;dr

object as host of static members

object as a special named instance

Additional Features of object

Why hasn’t functional programming taken over yet

Related Topic

`object` as host of static members

`object` as a special named instance

Additional Features of `object`