I had been fixing these errors one way or another. Then I googled a bit and tried to list every reason that can cause this exception. Here I will describe the simplest one.
When writing code on Databricks (in a single class, let's say), if at some point I try to call a method of that class inside an RDD operation such as map, I hit this exception.
Let's define 3 classes. Exception1 is the class that throws the serialization exception.
The reason is that when a method of the class is referenced inside an RDD operation, Spark has to serialize the closure it ships to the executors, and because the method belongs to the enclosing object, the whole enclosing instance gets pulled into that closure.
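To see what gets captured, here is roughly what the compiler turns a call like rdd.map(add) into when add is a def. This is only an illustrative sketch; the class name Expanded is made up, and it assumes a Databricks notebook or spark-shell where sc already exists:

class Expanded {
  val rdd = sc.parallelize(List(1, 2, 3))

  def add(a: Int) = a + 1

  // rdd.map(add) is eta-expanded to something like the line below; the lambda
  // holds a reference to `this`, so Spark tries to serialize the whole
  // Expanded instance when it ships the task.
  def addFunc() = rdd.map(a => this.add(a)).take(3)
}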
There are 2 solutions.

1) Make the class extend java.io.Serializable:
class NoException2 extends java.io.Serializable
Just making the class serializable lets Spark ship the enclosing instance to the executors.

2) Define add as a function value with val instead of a method with def:
val add = (a: Int) => a + 1 // function value
instead of
def add(a: Int) = a + 1 // method
Function values in Scala compile to serializable function objects, so passing one to map does not drag the enclosing class along. A def, on the other hand, belongs to the class, and referencing it forces Spark to serialize the whole instance.
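Both claims can be checked without Spark at all, because Spark's default closure serializer is plain Java serialization. Below is a minimal sketch; the canSerialize helper and the Holder class (with addDef and addVal) are made up for illustration and are not part of the classes that follow:

import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical helper: try to Java-serialize a value and report whether it worked.
def canSerialize(obj: AnyRef): Boolean =
  try {
    new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
    true
  } catch {
    case _: java.io.NotSerializableException => false
  }

class Holder {                           // like Exception1: not Serializable
  def addDef(a: Int) = a + 1             // a method
  val addVal = (a: Int) => a + 1         // a function value
}

val h = new Holder
canSerialize(h.addVal)                   // true: the function value stands on its own
canSerialize((a: Int) => h.addDef(a))    // false: the closure captures h, which is not serializable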
The three classes:

class Exception1 {
  val rdd = sc.parallelize(List(1, 2, 3))

  def addFunc() = {
    val result = rdd.map(add)  // referencing the method pulls `this` into the closure
    result.take(3)
  }

  def add(a: Int) = a + 1      // a def: a method of the (non-serializable) class
}
class NoException1 {
  val rdd = sc.parallelize(List(1, 2, 3))

  def addFunc() = {
    val result = rdd.map(add)  // a function value is passed, `this` is not needed
    result.take(3)
  }

  val add = (a: Int) => a + 1  // a val: a standalone, serializable function object
}
class NoException2 extends java.io.Serializable {
  val rdd = sc.parallelize(List(1, 2, 3))

  def doIT() = {
    val result = rdd.map(add)  // `this` is still captured, but the class is Serializable
    result.take(3)
  }

  def add(a: Int) = a + 1
}
Calling the functions:

(new Exception1()).addFunc()
(new NoException1()).addFunc()
(new NoException2()).doIT()

Results:

org.apache.spark.SparkException: Task not serializable
res9: Array[Int] = Array(2, 3, 4)
res9: Array[Int] = Array(2, 3, 4)