My row RDD looks like this:
Array[org.apache.spark.sql.Row] = Array([1,[example1,WrappedArray([Standford,Organisation,NNP], [is,O,VP], [good,LOCATION,ADP])]])
I got this by converting a DataFrame to an RDD. The DataFrame schema was:
root
|-- article_id: long (nullable = true)
|-- sentence: struct (nullable = true)
| |-- sentence: string (nullable = true)
| |-- attributes: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- tokens: string (nullable = true)
| | | |-- ner: string (nullable = true)
| | | |-- pos: string (nullable = true)
Now, how do I access the elements of this row RDD? With the DataFrame I can use df.select("sentence"), but here I would like to access values such as "Standford" and the other nested elements.
Best Answer
As @SarveshKumarSingh wrote in a comment, you can access the rows in an RDD[Row] just like any other element in an RDD (for example with map, first or collect). The fields inside a row can then be read in a couple of ways. Either simply call get(i), which returns Any and therefore needs a cast, or, if the field is a built-in type, use one of the typed getters such as getLong, getString, getStruct or getSeq to avoid the cast.
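A minimal sketch of both approaches, assuming spark-sql is on the classpath. The object name RowAccessSketch and the sample row are hypothetical: the row is a hand-built copy of the one printed in the question, so the code runs without a SparkSession. The array field is read as scala.collection.Seq because, depending on the Scala/Spark version, Spark may hand back a mutable wrapper rather than an immutable Seq.

```scala
import org.apache.spark.sql.Row

// Hypothetical helper object; names are illustrative only.
object RowAccessSketch {
  // Hand-built copy of the row from the question:
  // [article_id, [sentence, [[tokens, ner, pos], ...]]]
  val row: Row = Row(
    1L,
    Row(
      "example1",
      Seq(
        Row("Standford", "Organisation", "NNP"),
        Row("is", "O", "VP"),
        Row("good", "LOCATION", "ADP")
      )
    )
  )

  // Option 1: get(i) returns Any, so every level needs an explicit cast.
  def tokensViaGet(r: Row): Seq[String] = {
    val sentence = r.get(1).asInstanceOf[Row]
    val attributes = sentence.get(1).asInstanceOf[scala.collection.Seq[Row]]
    attributes.map(_.get(0).asInstanceOf[String]).toSeq
  }

  // Option 2: typed getters do the cast for you.
  def tokensViaTypedGetters(r: Row): Seq[String] = {
    val sentence: Row = r.getStruct(1)
    val attributes: Seq[Row] = sentence.getSeq[Row](1)
    attributes.map(_.getString(0))
  }

  // Primitive columns have their own getters as well.
  def articleId(r: Row): Long = r.getLong(0)
}
```

On the real data you would apply the same functions per element, e.g. rdd.map(RowAccessSketch.tokensViaTypedGetters). Rows obtained through df.rdd also carry their schema, so name-based access such as getAs[String]("tokens") works on them; on a schema-less Row like the hand-built sample, name lookup throws UnsupportedOperationException.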
Alternatively, you might want to simply use pattern matching on the row and its nested fields.
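A minimal pattern-matching sketch, again assuming spark-sql is on the classpath and using a hypothetical hand-built copy of the question's row. It relies on Row.unapplySeq, which lets you destructure the outer row, the nested sentence struct and each element of the attributes array. The object RowPatternSketch and the case class Token are illustrative names, not part of Spark.

```scala
import org.apache.spark.sql.Row

// Hypothetical helper object; names are illustrative only.
object RowPatternSketch {
  // Illustrative case class for one element of sentence.attributes.
  final case class Token(token: String, ner: String, pos: String)

  // Hand-built copy of the row from the question.
  val row: Row = Row(1L, Row("example1", Seq(
    Row("Standford", "Organisation", "NNP"),
    Row("is", "O", "VP"),
    Row("good", "LOCATION", "ADP"))))

  // Destructure [article_id, [sentence, attributes]] in one pattern.
  def tokens(r: Row): Seq[Token] = r match {
    case Row(_: Long, Row(_: String, attributes: scala.collection.Seq[_])) =>
      // Array elements that do not match (e.g. with null fields) are skipped by collect.
      attributes.collect {
        case Row(token: String, ner: String, pos: String) => Token(token, ner, pos)
      }.toSeq
    case _ => Seq.empty // rows with an unexpected shape yield no tokens
  }
}
```

On the RDD itself this becomes rdd.flatMap(RowPatternSketch.tokens). Pattern matching is convenient when the shape is fixed: a row that does not fit falls through to the default case instead of throwing a ClassCastException.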
I hope this helps :)