I had been working around these errors case by case. Then I googled a bit and tried to list every reason that can cause this exception (Task not serializable).
Here I will describe the simplest one.
It shows up when writing on Databricks (in a single class, let's say) as soon as I try to use a method of that class inside an RDD operation.
Let's define 3 classes. Exception1 is the class that throws the Task not serializable exception.
The reason is that when the closure passed to an RDD operation calls a method of the class, Spark has to serialize the whole enclosing class instance, and if that class is not Serializable the task fails.
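To see the capture concretely, here is a minimal sketch outside Spark (Holder and CaptureDemo are made-up names; Spark serializes task closures with plain Java serialization, which is what is imitated here): a closure that calls an instance method has to hold a reference to this, so serializing it drags the whole non-serializable instance along.

import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// A plain class that is NOT Serializable, like Exception1 defined below.
class Holder {
  def add(a: Int) = a + 1
  // A closure that calls the method must capture `this`, i.e. the whole Holder:
  val closure: Int => Int = x => add(x)
}

object CaptureDemo extends App {
  val out = new ObjectOutputStream(new ByteArrayOutputStream())
  try {
    out.writeObject(new Holder().closure)
    println("serialized fine")
  } catch {
    // Prints: not serializable: Holder
    case e: NotSerializableException => println(s"not serializable: ${e.getMessage}")
  }
}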
There are 2 solutions.
1) Just make the class Serializable, as NoException2 does by extending java.io.Serializable.
2) Define add as a function value instead of a method:

val add = (a: Int) => a + 1 // function value
instead of
def add(a: Int) = a + 1 // method

Function values in Scala are serializable, and since this one does not use any other member of the class, it does not capture the enclosing instance, so only the function itself has to be shipped to the executors.
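As a quick sanity check, here is a small sketch in plain Scala, no Spark needed (FunctionValueDemo is a made-up name): a function value that uses nothing else from its surrounding scope round-trips through Java serialization on its own.

import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object FunctionValueDemo extends App {
  val add = (a: Int) => a + 1 // a Function1 value; captures nothing from its surroundings

  // Round-trip through plain Java serialization.
  val buffer = new ByteArrayOutputStream()
  new ObjectOutputStream(buffer).writeObject(add)
  val copy = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    .readObject()
    .asInstanceOf[Int => Int]

  println(copy(1)) // prints 2
}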
class Exception1 {
  val rdd = sc.parallelize(List(1, 2, 3))

  def addFunc = {
    val result = rdd.map(add)
    result.take(3)
  }

  def add(a: Int) = a + 1
}

class NoException1 {
  val rdd = sc.parallelize(List(1, 2, 3))

  def addFunc = {
    val result = rdd.map(add)
    result.take(3)
  }

  val add = (a: Int) => a + 1
}

class NoException2 extends java.io.Serializable {
  val rdd = sc.parallelize(List(1, 2, 3))

  def addFunc = {
    val result = rdd.map(add)
    result.take(3)
  }

  def add(a: Int) = a + 1
}

Calling the functions:
(new Exception1()).addFunc
(new NoException1()).addFunc
(new NoException2()).addFunc

Results:
org.apache.spark.SparkException: Task not serializable
res9: Array[Int] = Array(2, 3, 4)
res9: Array[Int] = Array(2, 3, 4)