I'm trying to flatten an RDD of tuples (using flatMap with a no-op/identity function), but I'm getting a type error:
val fromTuples = sc.parallelize( List((1,"a"), (2, "b"), (3, "c")) )
val flattened = fromTuples.flatMap(x => x)
println(flattened.collect().mkString(", "))
This gives:
error: type mismatch;
found : (Int, String)
required: TraversableOnce[?]
val flattened = fromMap.flatMap(x => x)
The equivalent with a list of Lists or Arrays works fine, e.g.:
val fromList = sc.parallelize(List(List(1, 2), List(3, 4)))
val flattened = fromList.flatMap(x => x)
println(flattened.collect().mkString(", "))
Can Scala handle this? If not, why not?
Best Answer
Tuples aren't collections. Unlike Python, where a tuple is essentially just an immutable list, a tuple in Scala is more like a class (or more like a Python namedtuple). You can't "flatten" a tuple, because it's a heterogeneous group of fields.

You can convert a tuple to something iterable by calling .productIterator on it, but what you get back is an Iterator[Any]. You can certainly flatten such a thing, but you've lost all compile-time type safety that way. (Most Scala programmers shudder at the thought of a collection of type Any.)
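A minimal sketch of the options, using a plain Scala List in place of the RDD so it runs without Spark. It assumes RDD.flatMap imposes the same requirement as List.flatMap (the function must return a collection for each element); the object name FlattenTuples is just for illustration. It shows the productIterator route collapsing everything to Any, plus two alternatives that keep precise types.

```scala
// Plain Scala List stands in for the RDD (assumption: RDD.flatMap has the same
// contract, i.e. the function must return a collection for each element).
object FlattenTuples {
  val pairs: List[(Int, String)] = List((1, "a"), (2, "b"), (3, "c"))

  // pairs.flatMap(x => x) would not compile: x is an (Int, String), not a collection.

  // 1) productIterator: compiles, but every element is now statically typed as Any.
  val untyped: List[Any] = pairs.flatMap(_.productIterator)

  // Getting the Ints back means checking runtime types by hand.
  val intsFromAny: List[Int] = untyped.collect { case i: Int => i }

  // 2) Typed alternative: turn each tuple into a real collection with a single
  //    element type (here String), so the result keeps a precise static type.
  val asStrings: List[String] = pairs.flatMap { case (n, s) => List(n.toString, s) }

  // 3) Or keep the fields apart instead of flattening them together.
  val (ints, strings) = pairs.unzip

  def main(args: Array[String]): Unit = {
    println(untyped)     // List(1, a, 2, b, 3, c)
    println(intsFromAny) // List(1, 2, 3)
    println(asStrings)   // List(1, a, 2, b, 3, c)
    println(ints)        // List(1, 2, 3)
    println(strings)     // List(a, b, c)
  }
}
```

On the RDD itself, the same lambdas should carry over (fromTuples.flatMap(_.productIterator) gives an RDD[Any]), and for a pair RDD, fromTuples.keys and fromTuples.values give typed RDD[Int] and RDD[String] without going through Any.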