Apache Spark – Spark SQL errors

apache-spark, apache-spark-sql

I am trying to work with spark-sql, but I get the following errors:

error: missing or invalid dependency detected while loading class file
'package.class'. Could not access term annotation in package
org.apache.spark, because it (or its dependencies) are missing. Check
your build definition for missing or conflicting dependencies. (Re-run
with -Ylog-classpath to see the problematic classpath.) A full
rebuild may help if 'package.class' was compiled against an
incompatible version of org.apache.spark. warning: Class
org.apache.spark.annotation.InterfaceStability not found - continuing
with a stub. error: missing or invalid dependency detected while
loading class file 'SparkSession.class'. Could not access term
annotation in package org.apache.spark, because it (or its
dependencies) are missing. Check your build definition for missing or
conflicting dependencies. (Re-run with -Ylog-classpath to see the
problematic classpath.) A full rebuild may help if
'SparkSession.class' was compiled against an incompatible version of
org.apache.spark.

My configuration:

  • Scala 2.11.8
  • spark-core_2.11-2.1.0
  • spark-sql_2.11-2.1.0

  • Note: I use SparkSession.
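
In pom.xml terms, that configuration corresponds to dependency declarations like the following (a sketch, assuming a Maven build; the original build file is not shown in the question):

<!-- sketch: reconstructed from the version list above -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.1.0</version>
</dependency>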

Best Answer

After digging into the error message, I worked out how to solve this kind of error. For example:

Error - Symbol 'term org.apache.spark.annotation' is missing... A full rebuild may help if 'SparkSession.class' was compiled against an incompatible version of org.apache.spark

Open SparkSession.class and search for "import org.apache.spark.annotation."; you will find import org.apache.spark.annotation.{DeveloperApi, Experimental, InterfaceStability}. These classes are clearly missing from the classpath, so you need to find the artifact that contains them.
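
If you don't have a decompiler at hand, a class file's package-level references can also be listed with the JDK's jdeps tool (bundled with JDK 8+); packages that cannot be resolved on the classpath are reported as "not found":

jdeps SparkSession.class   # unresolved packages (e.g. org.apache.spark.annotation) are flagged "not found"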

So open https://search.maven.org and search with c:"DeveloperApi" AND g:"org.apache.spark"; you will find that the missing artifact is spark-tags, as @Prakash answered.

In my situation, just adding the spark-catalyst and spark-tags dependencies to pom.xml worked.
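
For reference, a minimal sketch of that addition (assuming the Scala 2.11 artifacts, and that the versions should match the Spark version already in the build; adjust 2.1.0 as needed):

<!-- versions assumed to match the Spark version in use -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-catalyst_2.11</artifactId>
  <version>2.1.0</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-tags_2.11</artifactId>
  <version>2.1.0</version>
</dependency>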


But it's strange: why doesn't Maven resolve the transitive dependencies automatically here?

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.2.0</version>
  <scope>provided</scope>
</dependency>

If I use the above dependency, only spark-core_2.11-2.2.0.jar shows up among the Maven dependencies, while if I change the version to 2.1.0 or 2.3.0, all the transitive dependencies are there.
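
To see exactly what Maven resolved in each case, the dependency tree can be inspected with the standard maven-dependency-plugin goal (the includes filter here is just an illustration):

mvn dependency:tree -Dincludes=org.apache.spark

Comparing this output across the three versions makes it easy to spot which transitive Spark artifacts are missing.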
