How can I increase the memory available for Apache Spark executor nodes?
I have a 2 GB file that is suitable for loading into Apache Spark. For the moment I am running Apache Spark on one machine, so the driver and executor are on the same machine. The machine has 8 GB of memory.
When I try to count the lines of the file after setting the file to be cached in memory, I get these errors:
2014-10-25 22:25:12 WARN CacheManager:71 - Not enough space to cache partition rdd_1_1 in memory! Free memory is 278099801 bytes.
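Roughly, what I am running in the shell looks like this (the file path is a placeholder):

```scala
// Load the file, mark it for in-memory caching, then trigger caching with an action
val lines = sc.textFile("/path/to/file.txt")   // placeholder path
lines.cache()
lines.count()
```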
I looked at the documentation here and set spark.executor.memory to 4g in $SPARK_HOME/conf/spark-defaults.conf.
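That is, something like this line in the file (the value is what I intended to set; formatting is illustrative):

```
spark.executor.memory   4g
```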
The UI shows this variable is set in the Spark Environment. You can find a screenshot here.
However, when I go to the Executors tab, the memory limit for my single executor is still set to 265.4 MB. I also still get the same error.
I tried various things mentioned here, but I still get the error and don't have a clear idea of where I should change the setting.
I am running my code interactively from the spark-shell.
Best Answer
Since you are running Spark in local mode, setting spark.executor.memory won't have any effect, as you have noticed. The reason is that the Worker "lives" within the driver JVM process that is started when you launch spark-shell, and the default memory for that process is 512 MB. You can increase it by setting spark.driver.memory to something higher, for example 5g. You can do that either by setting it in the properties file (the default is $SPARK_HOME/conf/spark-defaults.conf), or by supplying the configuration setting at runtime, as shown below.
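For example (the value 5g is illustrative):

```
# in $SPARK_HOME/conf/spark-defaults.conf
spark.driver.memory   5g
```

or at runtime:

```
$ ./bin/spark-shell --driver-memory 5g
```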
Note that this cannot be achieved by setting it in the application, because by then it is already too late: the process has already started with some amount of memory.
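In other words, something like the following sketch has no effect on the heap of a driver that is already running:

```scala
// Too late: the spark-shell JVM was already launched with its heap size,
// so changing the conf from inside the application cannot enlarge it.
val conf = new org.apache.spark.SparkConf().set("spark.driver.memory", "5g")
```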
The reason for 265.4 MB is that Spark dedicates spark.storage.memoryFraction * spark.storage.safetyFraction of the heap to storage memory, and by default they are 0.6 and 0.9, i.e. roughly 54% of the 512 MB driver heap (the usable heap the JVM reports is a bit under the configured 512 MB, which is why the figure comes out around 265 MB rather than 276 MB).
So be aware that not the whole amount of driver memory will be available for RDD storage.
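As a rough sanity check (following the formula above; these configuration keys apply to Spark 1.x), you can compute the figure from within spark-shell:

```scala
// Approximate storage memory = usable heap * memoryFraction * safetyFraction (Spark 1.x)
val maxMem = Runtime.getRuntime.maxMemory
val memoryFraction = sc.getConf.getDouble("spark.storage.memoryFraction", 0.6)
val safetyFraction = sc.getConf.getDouble("spark.storage.safetyFraction", 0.9)
println(f"Storage memory: ${maxMem * memoryFraction * safetyFraction / 1024 / 1024}%.1f MB")
```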
But when you start running this on a cluster, the spark.executor.memory setting will take over when calculating the amount to dedicate to Spark's memory cache.
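For instance, when submitting to a cluster you would pass it along these lines (the master URL, class name, and jar are placeholders):

```
$ ./bin/spark-submit --master spark://master:7077 \
    --executor-memory 4g \
    --class com.example.MyApp myapp.jar
```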