Apache-spark – How to check for empty RDD in PySpark

apache-spark, pyspark, rdd, spark-streaming

tweetStream.foreachRDD((rdd, time) => {
  val count = rdd.count()
  if (count > 0) {
    val fileName = outputDirectory + "/tweets_" + time.milliseconds.toString
    val outputRDD = rdd.repartition(partitionsEachInterval)
    outputRDD.saveAsTextFile(fileName)
  }
})

I am trying to do the same thing in Python: check the count, or whether the RDD is empty, for each batch of streaming data. I am having a hard time finding a way to do it, and I have also tried the examples from the Spark Streaming programming guide:
http://spark.apache.org/docs/latest/streaming-programming-guide.html

Best Answer

Use RDD.isEmpty:

Returns true if and only if the RDD contains no elements at all.

>>> sc.range(0, 0).isEmpty()
True
>>> sc.range(0, 1).isEmpty()
False