Hadoop – Cloudera Hadoop Class file for org.apache.hadoop.classification.InterfaceAudience not found

Tags: cloudera, compilation, hadoop, javac, word-count

Here is the error I get when trying to compile this WordCount.java file.

$ javac -classpath /usr/lib/hadoop/hadoop-common-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop/client/hadoop-mapreduce-client-core-2.0.0-cdh4.0.1.jar -d ~/wordcount /usr/lib/hadoop/wordcount_classes/WordCount.java
/usr/lib/hadoop/hadoop-common-2.0.0-cdh4.0.1.jar(org/apache/hadoop/fs/Path.class): warning: Cannot find annotation method 'value()' in type 'LimitedPrivate': class file for org.apache.hadoop.classification.InterfaceAudience not found
1 warning

Best Answer

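Add hadoop-annotations-2.0.0-cdh4.0.1.jar to the classpath. The org.apache.hadoop.classification.InterfaceAudience annotation class lives in that jar, which is why javac cannot find it when only hadoop-common and hadoop-mapreduce-client-core are listed.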
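As a rough sketch of the fix, the compile command from the question can be rebuilt with the annotations jar added. The location of the annotations jar is an assumption here (it is taken to sit next to hadoop-common in /usr/lib/hadoop), and the variable names HADOOP_LIB, VERSION, CP and SRC are purely illustrative; adjust them to your install.

```shell
#!/bin/bash
# ASSUMPTION: hadoop-annotations sits in /usr/lib/hadoop next to hadoop-common.
# Run `find /usr/lib/hadoop* -name 'hadoop-annotations*.jar'` to confirm.
HADOOP_LIB=/usr/lib/hadoop
VERSION=2.0.0-cdh4.0.1

# Build the classpath one jar at a time so it stays readable
CP="$HADOOP_LIB/hadoop-common-$VERSION.jar"
CP="$CP:$HADOOP_LIB/hadoop-annotations-$VERSION.jar"
CP="$CP:$HADOOP_LIB/client/hadoop-mapreduce-client-core-$VERSION.jar"

echo "classpath: $CP"

# Source path taken from the question; compile only if the file is present
SRC=/usr/lib/hadoop/wordcount_classes/WordCount.java
if [ -f "$SRC" ]; then
    mkdir -p ~/wordcount
    javac -classpath "$CP" -d ~/wordcount "$SRC"
fi
```

With the annotations jar on the classpath, javac should be able to resolve InterfaceAudience and the warning should go away.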

Related Solutions

Java – How to get WordCount.java to compile on Cloudera 4

I have a script that builds my hadoop classes. Try:

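#!/bin/bash
# Usage: pass the Java source file as the first argument, e.g. WordCount.java

# Strip the extension to get the program name (WordCount.java -> WordCount)
program=`echo $1 | awk -F "." '{print $1}'`

# Create the output directory for compiled classes if it does not exist
if [ ! -d "${program}_classes" ]; then
    mkdir "${program}_classes"
fi

# Compile against the Hadoop common and MapReduce client jars
javac -classpath /usr/lib/hadoop/hadoop-common-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop/client/hadoop-mapreduce-client-core-2.0.0-cdh4.0.1.jar -d "${program}_classes/" "$1"

# Package the compiled classes into a jar
jar -cvf "${program}.jar" -C "${program}_classes/" .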
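To make the naming in the script concrete, here is a small sketch of how the program name, class directory and jar name are derived from the argument. The input value WordCount.java is just an example, and the variable src stands in for the script's $1.

```shell
#!/bin/bash
# Example input; in the build script this comes from the first argument ($1)
src=WordCount.java

# awk splits on "." and keeps the first field, i.e. the name without extension
program=$(echo "$src" | awk -F "." '{print $1}')

echo "$program"             # program name
echo "${program}_classes"   # directory that receives the .class files
echo "${program}.jar"       # jar produced at the end
```

Because awk keeps only the text before the first dot, an argument such as ./WordCount.java or a file name containing extra dots would yield the wrong name, so the script is best run from the directory that holds the source file.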

You were probably missing the key jars:

 /usr/lib/hadoop/hadoop-common-2.0.0-cdh4.0.1.jar

and

/usr/lib/hadoop/client/hadoop-mapreduce-client-core-2.0.0-cdh4.0.1.jar

Sql – Type II dimension joins

By this phrase:

With the combination of StartDateTime and EndDateTime, they would be unique.

do you mean that the ranges never overlap, or that they satisfy the database UNIQUE constraint?

If the former, then you can use the StartDateTime in joins, but note that it may be inefficient, since it will use a "<=" condition instead of "=".

If the latter, then just use a surrogate identity column as the key.

Databases in general do not allow an efficient algorithm for this query:

SELECT  *
FROM    TransactionState
WHERE   @value BETWEEN StartDateTime AND EndDateTime

, unless you do arcane tricks with SPATIAL data.

That's why you'll have to use this condition in a JOIN:

SELECT  *
FROM    factTable
CROSS APPLY
        (
        SELECT  TOP 1 *
        FROM    TransactionState
        WHERE   StartDateTime <= factDateTime
        ORDER BY
                StartDateTime DESC
        ) ts

, which will deprive the optimizer of the possibility of using a HASH JOIN, which is the most efficient method for such queries in many cases.

See this article for more details on this approach:

Converting currencies

Rewriting the query so that it can use a HASH JOIN resulted in a 600% performance gain, though this is only possible if your datetimes have a resolution of one day or coarser (otherwise the hash table will grow very large).

Since the time component is stripped from your StartDateTime and EndDateTime, you can create a CTE like this:

WITH    cal AS
        (
        SELECT CAST('2009-01-01' AS DATE) AS cdate
        UNION ALL
        SELECT DATEADD(day, 1, cdate)
        FROM   cal
        WHERE  cdate <= '2009-03-01'
        ),
        state AS
        (
        SELECT  cdate, ts.*
        FROM    cal
        CROSS APPLY
                (
                SELECT  TOP 1 *
                FROM    TransactionState
                WHERE   StartDateTime <= cdate
                ORDER BY
                        StartDateTime DESC
                ) ts
        WHERE   ts.EndDateTime >= cdate
        )
SELECT  *
FROM    factTable
JOIN    state
ON      cdate = CAST(factDate AS DATE)

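If your date range spans more than 100 days, raise the recursion limit with the MAXRECURSION query hint, for example by appending OPTION (MAXRECURSION 0) to the statement. By default a recursive CTE stops with an error after 100 recursion levels.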
