Scala – How to access elements in Row RDD in Scala

apache-spark, scala

My row RDD looks like this:

Array[org.apache.spark.sql.Row] = Array([1,[example1,WrappedArray([Standford,Organisation,NNP], [is,O,VP], [good,LOCATION,ADP])]])

I got this by converting a DataFrame to an RDD; the DataFrame schema was:

root
 |-- article_id: long (nullable = true)
 |-- sentence: struct (nullable = true)
 |    |-- sentence: string (nullable = true)
 |    |-- attributes: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- tokens: string (nullable = true)
 |    |    |    |-- ner: string (nullable = true)
 |    |    |    |-- pos: string (nullable = true)

Now, how do I access elements in the row RDD? With the DataFrame I can use df.select("sentence"). I would like to access elements such as "Stanford" and the other nested elements.

Best Answer

As @SarveshKumarSingh wrote in a comment, you can access the rows in an RDD[Row] like you would access any other element in an RDD. Accessing the elements within a row can be done in a couple of ways. Either simply call get like this:

rowRDD.map(row => row.get(2).asInstanceOf[MyType])

or, if it is a built-in type, you can avoid the type cast by using a typed getter:

rowRDD.map(row => row.getList(4))

or you might simply use pattern matching:

rowRDD.map{case Row(field1: Long, field2: MyType) => field2}
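To tie this back to the schema in the question, here is a minimal sketch of both approaches for pulling out the nested tokens. It assumes `spark-sql` is on the classpath and, to stay self-contained, builds the row by hand with `Row(...)` instead of reading it from a DataFrame; `Token`, `NestedRowAccess`, `withGetters`, `withPatterns` and `sample` are hypothetical names chosen for illustration.

```scala
import org.apache.spark.sql.Row

// Hypothetical case class for one element of sentence.attributes
case class Token(token: String, ner: String, pos: String)

object NestedRowAccess {
  // Positions follow the schema in the question:
  //   top level:             article_id = 0, sentence = 1
  //   inside sentence:       sentence = 0, attributes = 1
  //   inside each attribute: tokens = 0, ner = 1, pos = 2

  // Approach 1: typed getters.
  def withGetters(row: Row): (Long, List[Token]) = {
    val articleId  = row.getLong(0)           // long column
    val sentence   = row.getStruct(1)         // a struct comes back as a nested Row
    val attributes = sentence.getSeq[Row](1)  // array<struct> comes back as a Seq of Rows
    val tokens = attributes
      .map(a => Token(a.getString(0), a.getString(1), a.getString(2)))
      .toList
    (articleId, tokens)
  }

  // Approach 2: pattern matching with the Row extractor.
  // scala.collection.Seq is used so the pattern also matches the
  // mutable array wrappers Spark may hand back for array columns.
  def withPatterns(row: Row): List[String] = row match {
    case Row(_: Long, Row(_: String, attributes: scala.collection.Seq[Any])) =>
      attributes.collect { case Row(token: String, _, _) => token }.toList
    case _ => Nil // shape did not match, e.g. a null sentence
  }

  // Hand-built row mirroring the data shown in the question
  val sample: Row = Row(
    1L,
    Row("example1", Seq(
      Row("Standford", "Organisation", "NNP"),
      Row("is", "O", "VP"),
      Row("good", "LOCATION", "ADP")
    ))
  )

  def main(args: Array[String]): Unit = {
    println(withGetters(sample)._2.map(_.token)) // List(Standford, is, good)
    println(withPatterns(sample))                // List(Standford, is, good)
  }
}
```

On a real RDD[Row] you would call something like `rowRDD.map(NestedRowAccess.withGetters)`. Since every field in the schema is nullable, real data may need `isNullAt(i)` checks before primitive getters such as `getLong`; the pattern-matching version simply falls through to `Nil` when the shape does not match.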

I hope this helps :)

Scala – Difference between object and class in Scala

tl;dr

class C defines a class, just as in Java or C++.
object O creates a singleton object O as an instance of some anonymous class; it can be used to hold static members that are not associated with instances of any class.
object O extends T makes the object O an instance of trait T; you can then pass O anywhere a T is expected.
If there is a class C, then object C is the companion object of class C; note that the companion object is not automatically an instance of C.
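A minimal sketch of the four cases above, using the names `C`, `O` and `T` from the list; the members `n`, `zero` and `greet` and the `TlDrDemo` object are made up for illustration. It also checks that the companion object `C` is not itself an instance of class `C`.

```scala
// class C: an ordinary class, instantiated with `new`
class C(val n: Int)

// object C: the companion of class C (same name, same source file).
// It is a singleton, but it is not itself an instance of class C.
object C {
  def zero: C = new C(0) // a static-like factory living on the companion
}

// object O extends T: O is a singleton instance of (a subclass of) trait T
trait T { def greet(name: String): String }

object O extends T {
  def greet(name: String): String = s"Hello, $name"
}

object TlDrDemo {
  def useT(t: T): String = t.greet("Scala") // accepts any T, including O

  def main(args: Array[String]): Unit = {
    println(C.zero.n)                 // 0
    println((C: Any).isInstanceOf[C]) // false
    println(useT(O))                  // Hello, Scala
  }
}
```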

Also see the Scala documentation for object and class.

`object` as host of static members

Most often, you need an object to hold methods and values/variables that should be available without first having to instantiate some class. This use is closely related to static members in Java.

object A {
  def twice(i: Int): Int = 2*i
}

You can then call the above method as A.twice(2).

If twice were a member of some class A, then you would need to make an instance first:

class A() {
  def twice(i: Int): Int = 2 * i
}

val a = new A()
a.twice(2)

You can see how redundant this is, as twice does not require any instance-specific data.

`object` as a special named instance

You can also use the object itself as a special instance of a class or trait. To do this, your object extends that class or trait, which makes it an instance of an (anonymous) subclass of it.

Consider the following code:

object A extends B with C {
  // ...
}

This declaration first declares an anonymous (inaccessible) class that extends both B and C, and instantiates a single instance of this class named A.

This means A can be passed to functions expecting objects of type B or C, or B with C.
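A runnable version of the `object A extends B with C` example, assuming `B` and `C` are simple traits (their members `b` and `c` and the `MixinDemo` object are hypothetical), shows the same singleton passed to parameters typed `B`, `C` and `B with C`:

```scala
// Hypothetical traits standing in for B and C from the text
trait B { def b: String = "from B" }
trait C { def c: String = "from C" }

// A singleton whose anonymous class mixes in both traits
object A extends B with C

object MixinDemo {
  def needsB(x: B): String = x.b
  def needsC(x: C): String = x.c
  def needsBoth(x: B with C): String = x.b + " and " + x.c

  def main(args: Array[String]): Unit = {
    println(needsB(A))    // from B
    println(needsC(A))    // from C
    println(needsBoth(A)) // from B and from C
  }
}
```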

Additional Features of `object`

Objects in Scala also have some special features. I recommend reading the official documentation.

def apply(...) enables the method-name-less call syntax A(...)
def unapply(...) lets you create custom pattern-matching extractors
if accompanying a class of the same name, the object assumes a special role when resolving implicit parameters
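A small sketch of all three features on a hypothetical `Point` class and its companion (the names `Point`, `ordering`, `CompanionDemo` and `describe` are illustrative): `apply` enables `Point(1, 2)` without `new`, `unapply` makes `case Point(x, y)` work, and an implicit `Ordering[Point]` defined in the companion is found by `sorted` without any import.

```scala
// Hypothetical class with a companion providing apply, unapply and an implicit
class Point(val x: Int, val y: Int)

object Point {
  // Point(1, 2) is sugar for Point.apply(1, 2): no `new` needed
  def apply(x: Int, y: Int): Point = new Point(x, y)

  // Lets `case Point(x, y)` destructure a Point in a match
  def unapply(p: Point): Option[(Int, Int)] = Some((p.x, p.y))

  // Implicits in the companion are part of Point's implicit scope,
  // so `sorted` finds this Ordering without an import
  implicit val ordering: Ordering[Point] = Ordering.by((p: Point) => (p.x, p.y))
}

object CompanionDemo {
  def describe(p: Point): String = p match {
    case Point(0, 0) => "origin"
    case Point(x, y) => s"($x, $y)"
  }

  def main(args: Array[String]): Unit = {
    val pts = List(Point(2, 1), Point(0, 0), Point(1, 5))
    println(pts.sorted.map(describe)) // List(origin, (1, 5), (2, 1))
  }
}
```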
