Apache-spark – How does Spark parallelize slices to tasks/executors/workers

apache-spark

I have a 2-node Spark cluster with 4 cores per node.

        MASTER
(Worker-on-master)              (Worker-on-node1)

Spark config:

  • slaves: master, node1
  • SPARK_WORKER_INSTANCES=1

I am trying to understand Spark's parallelize behaviour. The SparkPi example has this code:

import scala.math.random

val slices = 8  // my test value for slices
val n = 100000 * slices
// Monte Carlo estimate of Pi: each element maps to 1 if the random point
// falls inside the unit circle, 0 otherwise, then the results are summed.
val count = spark.parallelize(1 to n, slices).map { i =>
  val x = random * 2 - 1
  val y = random * 2 - 1
  if (x * x + y * y < 1) 1 else 0
}.reduce(_ + _)

As per documentation:

Spark will run one task for each slice of the cluster. Typically you want 2-4 slices for each CPU in your cluster.

I set slices to 8, which means the working set will be divided among 8 tasks on the cluster; in turn, each worker node gets 4 tasks (1:1 per core).

Questions:

  1. Where can I see task-level details? Inside executors I don't see the task breakdown, so I cannot see the effect of slices on the UI.

  2. How can I programmatically find the working set size for the map function above? I assume it is n/slices (100000 above).

  3. Are the multiple tasks run by an executor executed sequentially, or in parallel across multiple threads?

  4. What is the reasoning behind 2-4 slices per CPU?

  5. I assume that, ideally, we should tune SPARK_WORKER_INSTANCES to match the number of cores in each node (in a homogeneous cluster) so that each core gets its own executor and task (1:1:1).

Best Answer

I will try to answer your questions as best I can:

1.- Where can I see task level details?

When submitting a job, Spark stores information about the task breakdown on each worker node, apart from the master. This data is stored, I believe (I have only tested with Spark for EC2), in the work folder under the Spark directory.

2.- How to programmatically find the working set size for the map function?

Although I am not sure whether it stores the in-memory size of the slices, the logs mentioned in the first answer provide information about the number of lines each RDD partition contains.
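If you want to check this from the driver instead, one option (a sketch, reusing the spark context, n and slices from the question's snippet) is to count the elements of each partition with mapPartitionsWithIndex:

// Each task emits (partitionIndex, elementCount) for its own slice, so you
// can verify the expected n / slices (= 100000) elements per partition.
val partitionSizes = spark
  .parallelize(1 to n, slices)
  .mapPartitionsWithIndex { (idx, iter) => Iterator((idx, iter.size)) }
  .collect()

partitionSizes.foreach { case (idx, size) =>
  println(s"partition $idx holds $size elements")
}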

3.- Are the multiple tasks run by an executor executed sequentially or in parallel across multiple threads?

I believe different tasks inside a node run sequentially. This is shown in the logs indicated above, which record the start and end time of every task.
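One way to check this yourself (again a sketch, assuming the spark context and n from the question) is to record which executor thread processed each partition; if the same executor shows several distinct task-launch threads, tasks were handled by more than one thread within that executor:

val threadsUsed = spark
  .parallelize(1 to n, 8)
  .mapPartitions { iter =>
    // Each task reports the executor thread that ran it and how many
    // elements it processed.
    Iterator((Thread.currentThread().getName, iter.size))
  }
  .collect()

threadsUsed.foreach { case (thread, size) =>
  println(s"$thread processed $size elements")
}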

4.- Reasoning behind 2-4 slices per CPU

Some nodes finish their tasks faster than others. Having more slices than available cores distributes the tasks in a more balanced way, avoiding long processing times caused by slower nodes.
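As a rough illustration (a sketch with a hypothetical 8-core cluster, reusing the SparkPi logic from the question): with exactly 8 slices the stage finishes only when the slowest task finishes, whereas with 3 slices per core the cores that finish early simply pick up the remaining tasks:

import scala.math.random

val coresInCluster = 8          // hypothetical total cores across the cluster
val slicesPerCore  = 3          // 2-4 slices per CPU, per the documentation
val numSlices      = coresInCluster * slicesPerCore
val n              = 100000 * numSlices

val count = spark.parallelize(1 to n, numSlices).map { i =>
  val x = random * 2 - 1
  val y = random * 2 - 1
  if (x * x + y * y < 1) 1 else 0
}.reduce(_ + _)

println(s"Pi is roughly ${4.0 * count / n}")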