I am trying to map an RDD to a pair RDD in Scala so that I can use reduceByKey later. Here is what I did:
userRecords is of type org.apache.spark.rdd.RDD[UserElement].
I try to create a pair RDD from userRecords like below:
val userPairs: PairRDDFunctions[String, UserElement] = userRecords.map { t =>
  val nameKey: String = t.getName()
  (nameKey, t)
}
However, I got the error:
type mismatch; found : org.apache.spark.rdd.RDD[(String, com.mypackage.UserElement)]
required: org.apache.spark.rdd.PairRDDFunctions[String,com.mypackage.UserElement]
What am I missing here? Thanks a lot!
Best Answer
You don't need to do that, as it is done via implicits (specifically rddToPairRDDFunctions). Any RDD whose element type is Tuple2[K, V] can automatically be used as a PairRDDFunctions, so the map result only needs to be typed as RDD[(String, UserElement)] (or left unannotated) for reduceByKey to be available. If you REALLY want to, you can explicitly do what the implicit does and wrap the RDD in a PairRDDFunctions:
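A minimal sketch of both routes, assuming Spark is on the classpath. The `UserElement` case class here is a hypothetical stand-in for `com.mypackage.UserElement` (its `score` field and the `merge` helper are invented only so there is something to reduce), and `PairRddSketch` is an illustrative name, not part of the original code.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.{PairRDDFunctions, RDD}

// Hypothetical stand-in for com.mypackage.UserElement; `score` is an
// assumption added purely for illustration.
case class UserElement(name: String, score: Int) {
  def getName(): String = name
}

object PairRddSketch {
  // Annotate the result as an RDD of tuples, not as PairRDDFunctions.
  def toPairs(userRecords: RDD[UserElement]): RDD[(String, UserElement)] =
    userRecords.map { t => (t.getName(), t) }

  // Hypothetical merge function passed to reduceByKey.
  def merge(a: UserElement, b: UserElement): UserElement =
    UserElement(a.name, a.score + b.score)

  // Implicit route: rddToPairRDDFunctions supplies reduceByKey.
  // (On Spark versions before 1.3 this also needed
  // `import org.apache.spark.SparkContext._`.)
  def reduceImplicitly(pairs: RDD[(String, UserElement)]): RDD[(String, UserElement)] =
    pairs.reduceByKey(merge _)

  // Explicit route: wrap the RDD by hand, as the implicit conversion would.
  def reduceExplicitly(pairs: RDD[(String, UserElement)]): RDD[(String, UserElement)] =
    new PairRDDFunctions(pairs).reduceByKey(merge _)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("pair-rdd-sketch").setMaster("local[1]")
    val sc = new SparkContext(conf)
    try {
      val userRecords = sc.parallelize(
        Seq(UserElement("ann", 1), UserElement("bob", 2), UserElement("ann", 3)))
      val userPairs = toPairs(userRecords)
      reduceImplicitly(userPairs).collect().foreach(println)
      reduceExplicitly(userPairs).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Both routes should give the same totals per name (4 for "ann", 2 for "bob" with the sample data above). In practice the implicit route is the usual choice; the explicit wrapper is shown only to make visible what the conversion does.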