Tuesday, January 17, 2017

SparkException, Task not serializable

I usually run my code on Databricks for a first test, and I was hitting lots of "Task not serializable" exceptions.
I was fixing them one by one. Then I googled a bit and tried to list every reason that can cause this exception.
Here I will write up the simplest one.
When writing on Databricks (everything in a single class, let's say), if at some point I try to map an RDD using a method of that class, I get this exception.


Let's define 3 classes. Exception1 is the class that throws the "Task not serializable" exception; NoException1 and NoException2 do not.
The reason is that when the closure passed to map calls a method of the enclosing class, Spark must serialize the whole enclosing instance to ship the task to the executors, and Exception1 is not serializable.
There are 2 solutions.

1) Make the class serializable, as NoException2 below does:
class NoException2 extends java.io.Serializable

2) Define add as a function value instead of a method:
val add = (a: Int) => a + 1 // function value
instead of
def add(a: Int) = a + 1 // method

Function values in Scala are serializable, so Spark can ship them to the executors without dragging the enclosing class along (the sketch below shows this).
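
To convince yourself of this without Spark, here is a minimal sketch (not from the original post; the Holder class and the trySerialize helper are made up for illustration). It Java-serializes both flavours of add, which is essentially the check that fails inside Spark when a task's closure cannot be serialized.

import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical helper: attempt plain Java serialization, the same kind of
// check Spark performs on a task's closure before sending it to executors.
def trySerialize(x: AnyRef): String =
  try {
    new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(x)
    "serializable"
  } catch {
    case e: NotSerializableException => s"not serializable: ${e.getMessage}"
  }

class Holder {                     // note: does NOT extend Serializable
  val addVal = (a: Int) => a + 1   // function value
  def addDef(a: Int) = a + 1       // method
}

val h = new Holder
println(trySerialize(h.addVal))                 // serializable: a plain, self-contained Function1
println(trySerialize((x: Int) => h.addDef(x)))  // not serializable: the closure must capture h, and Holder is not Serializable

The second closure fails for exactly the same reason as Exception1 below: calling a method forces the closure to hold on to the instance that owns it.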


class Exception1 {

  // sc is the SparkContext that the Databricks notebook already provides
  val rdd = sc.parallelize(List(1, 2, 3))

  def addFunc = {
    val result = rdd.map(add) // add is a method, so this closure captures `this`, an Exception1, which is not serializable
    result.take(3)
  }

  def add(a: Int) = a + 1
}
class NoException1 {
  val rdd = sc.parallelize(List(1, 2, 3))

  def addFunc() = {
    val result = rdd.map(add) // add is a function value, so only that Function1 is serialized
    result.take(3)
  }

  val add = (a: Int) => a + 1
}
class NoException2 extends java.io.Serializable {
  val rdd = sc.parallelize(List(1, 2, 3))

  def addFunc() = {
    val result = rdd.map(add) // the enclosing instance is still captured, but now it is serializable
    result.take(3)
  }

  def add(a: Int) = a + 1
}

Calling the functions:

(new Exception1()).addFunc
(new NoException1()).addFunc
(new NoException2()).addFunc
Results, in the same order as the calls:

org.apache.spark.SparkException: Task not serializable
res9: Array[Int] = Array(2, 3, 4)
res9: Array[Int] = Array(2, 3, 4)
