There's one thing that you can do concisely and efficiently in Java that you can't in Scala: enumerations. For everything else, even for constructs that are slow in Scala's library, you can get efficient versions working in Scala.
So, for the most part, you don't need to add Java to your code. Even for code that uses enumerations in Java, there's often a Scala solution that is adequate or good -- the exception being enumerations that have extra methods and whose int constant values are used.
As for what to watch out for, here are some things.
If you use the enrich-my-library pattern, always convert to a named class rather than a structural type. For example:
// WRONG -- the implementation uses reflection when calling "isWord"
implicit def toIsWord(s: String) = new { def isWord = s matches "[A-Za-z]+" }
// RIGHT
class IsWord(s: String) { def isWord = s matches "[A-Za-z]+" }
implicit def toIsWord(s: String): IsWord = new IsWord(s)
Be wary of collection methods -- because they are polymorphic for the most part, the JVM does not optimize them well. You need not avoid them, but pay attention to them in critical sections. Be aware that for in Scala is implemented through method calls and anonymous classes.
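As a minimal sketch of what this means in practice (the names here are invented for illustration): the for loop below compiles to a foreach call taking an anonymous function, while a hand-written while loop avoids both the closure and the polymorphic call.

```scala
object ForDesugar {
  // The for-comprehension is desugared by the compiler into
  // xs.foreach(x => total += x) -- a method call on an anonymous function.
  def sumFor(xs: Array[Int]): Int = {
    var total = 0
    for (x <- xs) total += x
    total
  }

  // A hand-written while loop avoids the closure allocation and the
  // polymorphic foreach call on hot paths.
  def sumWhile(xs: Array[Int]): Int = {
    var total = 0
    var i = 0
    while (i < xs.length) { total += xs(i); i += 1 }
    total
  }
}
```

Both versions compute the same result; the difference only matters in tight inner loops.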
If using a Java class, such as String, Array, or the AnyVal classes that correspond to Java primitives, prefer the methods provided by Java when alternatives exist. For example, use length on String and Array instead of size.
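As a tiny illustration (the object name is made up), both calls below return the same value, but length is answered directly by the underlying Java object, while size goes through an implicit enrichment that delegates back to length:

```scala
object LenDemo {
  val s = "hello"
  val a = Array(1, 2, 3)

  // length is defined by java.lang.String and by JVM arrays themselves
  val n1 = s.length
  val n2 = a.length

  // size is added by an implicit conversion to a Scala wrapper,
  // which then delegates to length -- an extra hop on hot paths
  val n3 = s.size
}
```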
Avoid careless use of implicit conversions, as you can find yourself using conversions by mistake instead of by design.
Extend classes instead of traits. For example, if you are extending Function1, extend AbstractFunction1 instead.
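A minimal sketch of that advice (Doubler is a made-up example; the point is the superclass choice): extending scala.runtime.AbstractFunction1 rather than the Function1 trait spares the compiler from generating trait-forwarder methods in the subclass.

```scala
import scala.runtime.AbstractFunction1

// Extending the abstract class instead of the Function1 trait means
// the compiler need not emit forwarder methods for the trait's members.
class Doubler extends AbstractFunction1[Int, Int] {
  def apply(x: Int): Int = x * 2
}
```

An instance is still an Int => Int, so it can be passed anywhere that function type is expected.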
Use -optimise and specialization to get the most out of Scala.
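For instance (Box is a hypothetical container, not a library type), annotating a type parameter with @specialized makes the compiler emit primitive variants of the class, so Int values avoid boxing:

```scala
// @specialized(Int) generates an Int-specialized variant of Box,
// so value and map operate on unboxed ints for Box[Int].
class Box[@specialized(Int) T](val value: T) {
  def map[@specialized(Int) U](f: T => U): Box[U] = new Box(f(value))
}
```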
Understand what is happening: javap
is your friend, and so are a bunch of Scala flags that show what's going on.
Scala idioms are designed to improve correctness and make the code more concise and maintainable. They are not designed for speed, so if you need to use null instead of Option in a critical path, do so! There's a reason why Scala is multi-paradigm.
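As a sketch of that trade-off (the cache class is invented for illustration), a hot-path lookup can return null directly and still expose an Option-based view for ordinary callers:

```scala
final class WordCache {
  private val m = new java.util.HashMap[String, String]

  def put(k: String, v: String): Unit = m.put(k, v)

  // Hot path: no Some allocation per hit; null signals absence
  def getOrNull(k: String): String = m.get(k)

  // Safe view for non-critical code
  def get(k: String): Option[String] = Option(m.get(k))
}
```

Keeping the null-returning method private to the hot path limits how far the unsafety can leak.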
Remember that the true measure of performance is running code. See this question for an example of what may happen if you ignore that rule.
You can expand your matrix grammar into an ADT with perfect sharing with a little bit of trickery:
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
import Data.Map
import Data.Foldable
import Data.Functor
import Data.Traversable

-- | Type synonym for non-terminal symbols
type NonTerminal = String

-- | Data type for the right hand side of a production
data RHS a = DownStep NonTerminal NonTerminal NonTerminal NonTerminal | Terminal a
  deriving (Eq, Ord, Show, Read, Functor, Foldable, Traversable)

data G a = G NonTerminal (Map NonTerminal (RHS a))
  deriving (Eq, Ord, Show, Read, Functor)

data M a = Q (M a) (M a) (M a) (M a) | T a
  deriving (Functor, Foldable, Traversable)

-- | Expand a grammar into its (lazily shared) matrix
tabulate :: G a -> M a
tabulate (G s pm) = loeb (expand <$> pm) ! s where
  expand (DownStep a11 a12 a21 a22) m = Q (m!a11) (m!a12) (m!a21) (m!a22)
  expand (Terminal a) _ = T a

loeb :: Functor f => f (f b -> b) -> f b
loeb x = xs where xs = fmap ($xs) x
Here I generalized your grammars to allow for any data type, not just Int, and tabulate will take the grammar and expand it by folding it in upon itself using loeb.
loeb is described in an article by Dan Piponi.
The resulting expansion as an ADT physically takes no more memory than the original grammar -- in fact it takes a fair bit less, because it doesn't need the extra log-factor for the Map spine, and it doesn't need to store the strings at all.
Unlike the naive expansion, using loeb lets me 'tie the knot' and share the thunks for all occurrences of the same non-terminal.
If you want to dip more into the theory of all of this, we can see that RHS could be turned into a base functor:
data RHS t nt = Q nt nt nt nt | L t
and then my M type is just the fixed point of that Functor:
M a ~ Mu (RHS a)
while G a would consist of a chosen string and a map from strings to RHS a String.
We can then expand G into M by looking up each entry in a map of expanded strings lazily.
This is sort of the dual of what is done in the data-reify package, which can take such a base functor and something like M and recover the moral equivalent of your G from it. They use a different type for the non-terminal names, which is basically just an Int:
data Graph e = Graph [(Unique, e Unique)] Unique
and provide a combinator
reifyGraph :: MuRef s => s -> IO (Graph (DeRef s))
which can be used with an appropriate instance on the above data types to get a graph (MatrixGrammar) out of an arbitrary matrix. It won't do deduplication of identical but separately stored quadrants, but it'll recover all of the sharing that is present in the original graph.
It depends on the platform and implementation.
C++ guarantees that the size of char is exactly one byte and at least 8 bits wide. The size of a short int is at least 16 bits and not smaller than that of char. The size of an int is at least as big as that of a short int, and the size of a long int is at least 32 bits and not smaller than that of an int. In short:
sizeof(char) == 1; sizeof(long int) >= sizeof(int) >= sizeof(short int) >= sizeof(bool) >= sizeof(char)
The actual memory model of C++ is very compact and predictable, though. For example, there is no metadata in objects, arrays, or pointers. Structures and classes are contiguous just like arrays, although padding may be inserted where necessary.
Frankly, though, such a comparison is silly at best, as Java's memory usage depends more on the Java implementation than on the code it runs.