There's one thing that you can do concisely and efficiently in Java that you can't in Scala: enumerations. For everything else, even for constructs that are slow in Scala's library, you can get efficient versions working in Scala.
So, for the most part, you don't need to add Java to your code. Even for code that uses enumerations in Java, there's often a Scala solution that is adequate or good -- the exception being enumerations that have extra methods and whose int constant values are used.
As for what to watch out for, here are some things.
If you use the enrich-my-library pattern, always convert to a named class rather than a structural type. For example:
// WRONG -- the implementation uses reflection when calling "isWord"
implicit def toIsWord(s: String) = new { def isWord = s matches "[A-Za-z]+" }
// RIGHT
class IsWord(s: String) { def isWord = s matches "[A-Za-z]+" }
implicit def toIsWord(s: String): IsWord = new IsWord(s)
Be wary of collection methods -- because they are polymorphic for the most part, the JVM does not optimize them well. You need not avoid them, but pay attention to them in critical sections. Be aware that for in Scala is implemented through method calls and anonymous classes.
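As a minimal sketch of what this means in practice (the names here are invented for illustration): the for loop below compiles to a foreach call taking an anonymous function, while a hand-written while loop avoids both the closure and the polymorphic call.

```scala
object ForDesugar {
  // The for-comprehension is desugared by the compiler into
  // xs.foreach(x => total += x) -- a method call on an anonymous function.
  def sumFor(xs: Array[Int]): Int = {
    var total = 0
    for (x <- xs) total += x
    total
  }

  // A hand-written while loop avoids the closure allocation and the
  // polymorphic foreach call on hot paths.
  def sumWhile(xs: Array[Int]): Int = {
    var total = 0
    var i = 0
    while (i < xs.length) { total += xs(i); i += 1 }
    total
  }
}
```

Both versions compute the same result; the difference only matters in tight inner loops.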
If using a Java class, such as String, Array, or the AnyVal classes that correspond to Java primitives, prefer the methods provided by Java when alternatives exist. For example, use length on String and Array instead of size.
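As a tiny illustration (the object name is made up), both calls below return the same value, but length is answered directly by the underlying Java object, while size goes through an implicit enrichment that delegates back to length:

```scala
object LenDemo {
  val s = "hello"
  val a = Array(1, 2, 3)

  // length is defined by java.lang.String and by JVM arrays themselves
  val n1 = s.length
  val n2 = a.length

  // size is added by an implicit conversion to a Scala wrapper,
  // which then delegates to length -- an extra hop on hot paths
  val n3 = s.size
}
```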
Avoid careless use of implicit conversions, as you can find yourself using conversions by mistake instead of by design.
Extend classes instead of traits. For example, if you are extending Function1, extend AbstractFunction1 instead.
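A minimal sketch of that advice (Doubler is a made-up example; the point is the superclass choice): extending scala.runtime.AbstractFunction1 rather than the Function1 trait spares the compiler from generating trait-forwarder methods in the subclass.

```scala
import scala.runtime.AbstractFunction1

// Extending the abstract class instead of the Function1 trait means
// the compiler need not emit forwarder methods for the trait's members.
class Doubler extends AbstractFunction1[Int, Int] {
  def apply(x: Int): Int = x * 2
}
```

An instance is still an Int => Int, so it can be passed anywhere that function type is expected.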
Use -optimise and specialization to get the most out of Scala.
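For instance (Box is a hypothetical container, not a library type), annotating a type parameter with @specialized makes the compiler emit primitive variants of the class, so Int values avoid boxing:

```scala
// @specialized(Int) generates an Int-specialized variant of Box,
// so value and map operate on unboxed ints for Box[Int].
class Box[@specialized(Int) T](val value: T) {
  def map[@specialized(Int) U](f: T => U): Box[U] = new Box(f(value))
}
```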
Understand what is happening: javap
is your friend, and so are a bunch of Scala flags that show what's going on.
Scala idioms are designed to improve correctness and make the code more concise and maintainable. They are not designed for speed, so if you need to use null instead of Option in a critical path, do so! There's a reason why Scala is multi-paradigm.
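As a sketch of that trade-off (the cache class is invented for illustration), a hot-path lookup can return null directly and still expose an Option-based view for ordinary callers:

```scala
final class WordCache {
  private val m = new java.util.HashMap[String, String]

  def put(k: String, v: String): Unit = m.put(k, v)

  // Hot path: no Some allocation per hit; null signals absence
  def getOrNull(k: String): String = m.get(k)

  // Safe view for non-critical code
  def get(k: String): Option[String] = Option(m.get(k))
}
```

Keeping the null-returning method private to the hot path limits how far the unsafety can leak.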
Remember that the true measure of performance is running code. See this question for an example of what may happen if you ignore that rule.
You can expand your matrix grammar into an ADT with perfect sharing with a little bit of trickery:
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
import Data.Map
import Data.Foldable
import Data.Functor
import Data.Traversable

-- | Type synonym for non-terminal symbols
type NonTerminal = String

-- | Data type for the right hand side of a production
data RHS a = DownStep NonTerminal NonTerminal NonTerminal NonTerminal | Terminal a
  deriving (Eq, Ord, Show, Read, Functor, Foldable, Traversable)

data G a = G NonTerminal (Map NonTerminal (RHS a))
  deriving (Eq, Ord, Show, Read, Functor)

data M a = Q (M a) (M a) (M a) (M a) | T a
  deriving (Functor, Foldable, Traversable)

-- | Expand a grammar into its (lazily shared) matrix
tabulate :: G a -> M a
tabulate (G s pm) = loeb (expand <$> pm) ! s where
  expand (DownStep a11 a12 a21 a22) m = Q (m!a11) (m!a12) (m!a21) (m!a22)
  expand (Terminal a) _ = T a

loeb :: Functor f => f (f b -> b) -> f b
loeb x = xs where xs = fmap ($xs) x
Here I generalized your grammars to allow for any data type, not just Int, and tabulate will take the grammar and expand it by folding it in upon itself using loeb.
loeb is described in an article by Dan Piponi.
The resulting expansion as an ADT physically takes no more memory than the original grammar -- in fact it takes a fair bit less, because it doesn't need the extra log-factor for the Map spine, and it doesn't need to store the strings at all.
Unlike the naive expansion, using loeb lets me 'tie the knot' and share the thunks for all occurrences of the same non-terminal.
If you want to dip more into the theory of all of this, we can see that RHS could be turned into a base functor:
data RHS t nt = Q nt nt nt nt | L t
and then my M type is just the fixed point of that Functor:
M a ~ Mu (RHS a)
while G a would consist of a chosen string and a map from strings to RHS a String.
We can then expand G into M by looking up each entry in a map of expanded strings lazily.
This is sort of the dual of what is done in the data-reify package, which can take such a base functor and something like M and recover the moral equivalent of your G from it. They use a different type for the non-terminal names, which is basically just an Int:
data Graph e = Graph [(Unique, e Unique)] Unique
and provide a combinator
reifyGraph :: MuRef s => s -> IO (Graph (DeRef s))
which can be used with an appropriate instance on the above data types to get a graph (MatrixGrammar) out of an arbitrary matrix. It won't do deduplication of identical but separately stored quadrants, but it'll recover all of the sharing that is present in the original graph.
It depends on the platform and implementation.
C++ guarantees that the size of char is exactly one byte and at least 8 bits wide. The size of a short int is at least 16 bits and not smaller than that of char. The size of an int is at least as big as that of a short int, and the size of a long int is at least 32 bits and not smaller than that of an int. In short:
sizeof(char) == 1; sizeof(long int) >= sizeof(int) >= sizeof(short int) >= sizeof(bool) >= sizeof(char)
The actual memory model of C++ is very compact and predictable, though. For example, there is no metadata in objects, arrays, or pointers. Structures and classes are contiguous just like arrays, although padding may be inserted where necessary.
Frankly, though, such a comparison is silly at best, as Java's memory usage depends more on the Java implementation than on the code it runs.