Scala – How to register a function as a sqlContext UDF in Scala

apache-spark apache-spark-sql scala

I have a method called getAge(timestamp: Long) and I want to register it as a SQL function.

I have

sqlContext.udf.register("getAge",getAge) 

But it's telling me I need to supply arguments or use _ afterwards. I tried using _, but it gives me an error. How do I register it with an argument? I am new to Scala, so I have no idea how to do this.

Best Answer

sqlContext.udf.register("getAge",getAge) 

should be:

sqlContext.udf.register("getAge",getAge _)

The underscore (there must be a space between the method name and the underscore, otherwise getAge_ is read as a single identifier) turns the method into a function value, also called a partially applied function, that can be passed to the registration call.
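To see the registration end to end, here is a minimal sketch. It assumes the Spark 1.x-style API implied by the question's sqlContext (SQLContext, registerTempTable) and a local master. The body of getAge, the table name events, and the column name ts are all made up for illustration, since the question shows only the method signature.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object RegisterGetAge {
  // Hypothetical body: treats the argument as epoch milliseconds and
  // returns the number of whole (365-day) years elapsed since then.
  def getAge(timestamp: Long): Long = {
    val millisPerYear = 365L * 24 * 60 * 60 * 1000
    (System.currentTimeMillis() - timestamp) / millisPerYear
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("udf-demo").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // `getAge _` turns the method into a Long => Long function value.
    sqlContext.udf.register("getAge", getAge _)

    // Hypothetical table "events" with a single Long column "ts".
    val df = sqlContext
      .createDataFrame(Seq(Tuple1(0L), Tuple1(1000000000000L)))
      .toDF("ts")
    df.registerTempTable("events")

    // The registered name can now be called from SQL.
    sqlContext.sql("SELECT ts, getAge(ts) AS age FROM events").show()

    sc.stop()
  }
}
```

Once registered, the name passed as the first argument ("getAge") is what you call inside the SQL string; it does not have to match the Scala method name.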

More explanation

When we invoke a function, we have to pass in all the required parameters. If we don't, the compiler will complain.

We can, however, ask for the function as a value and pass in the required parameters at a later time. We do this with the underscore.

Writing getAge on its own means invoking getAge. For example, given def getAge = 10, the expression getAge evaluates to 10. Here we don't want the result, we want the function itself. Moreover, with your definition, the compiler sees that getAge requires a parameter and complains that none was given.

What we want here is to pass getAge as a function value. We tell Scala that we don't know the parameter yet: we want the function as a value and will supply the required parameter later. So we use getAge _.
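The difference between invoking the method and taking it as a value can be tried without Spark. This is only a sketch: the body of getAge is invented, and the commented-out line describes Scala 2 behaviour.

```scala
object MethodAsValue {
  // Hypothetical body; the question only gives the signature.
  def getAge(timestamp: Long): Long = timestamp / 1000L

  // val broken = getAge  // Scala 2 rejects this: the argument list is missing

  // The method as a function value; no argument has been supplied yet.
  val ageFn: Long => Long = getAge _

  def main(args: Array[String]): Unit = {
    // The parameter is supplied later, when the function value is applied.
    println(ageFn(5000L)) // prints 5
  }
}
```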

Assuming the signature of getAge is:

def getAge(l: Long): Long

getAge _ becomes a function value:

Long => Long = <function>

which means it takes a parameter of type Long and, when invoked, yields a value of type Long.
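Because the result has type Long => Long, it can be handed to anything that expects a function of that type, which is what the registration call relies on. A small sketch, where applyToAll is a made-up helper standing in for such a caller and the body of getAge is again invented:

```scala
object FunctionTypeDemo {
  // Hypothetical body; only the signature comes from the answer.
  def getAge(l: Long): Long = l / 1000L

  // Hypothetical helper that accepts any Long => Long and applies it
  // to each element, much as a caller of a registered function would.
  def applyToAll(values: Seq[Long], f: Long => Long): Seq[Long] =
    values.map(f)

  val ages: Seq[Long] = applyToAll(Seq(1000L, 2000L, 3000L), getAge _)

  def main(args: Array[String]): Unit =
    println(ages) // prints List(1, 2, 3)
}
```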
