Hi all,
I'm trying to create a base class that knows the structure of my final DF and child classes that implement the logic behind altering the data.
i.e., something like:
```python
from abc import ABC, abstractmethod

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType


class Parent(ABC):
    def __init__(self):
        # wrap the (abstract) method as a UDF so child classes only
        # have to implement the row-level logic
        self.temp_udf = udf(self.temp.__get__(object), StringType())

    @abstractmethod
    def temp(self, temp_obj):
        pass

    def work(self):
        df1 = spark.read.format("delta").load("...")
        df1 = df1.withColumn("temp", self.temp_udf("id"))
        df1.select("temp").show()


class Child(Parent):
    def __init__(self):
        super(Child, self).__init__()

    def temp(self, temp_obj):
        return "Child" + temp_obj  # I'll do some logic here


c = Child()
c.work()
```
This keeps failing with the following error:
```
Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.

PicklingError: Could not serialize object: Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
```
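My rough understanding (which may well be wrong) is that the bound method I hand to `udf()` gets pickled together with the whole instance, and the instance also holds the udf object created in `__init__`, which ends up referencing the SparkContext once it has been applied to a column. One direction I'm considering, purely as an untested sketch, is to build the UDF locally inside `work()` instead of storing it on `self`, so the instance that gets pickled stays plain Python:

```python
from abc import ABC, abstractmethod

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType


class Parent(ABC):
    @abstractmethod
    def temp(self, temp_obj):
        pass

    def work(self):
        spark = SparkSession.builder.getOrCreate()
        # Build the UDF locally instead of storing it on self, so the
        # instance pickled along with the bound method does not also
        # carry a UDF object (and, through it, a SparkContext reference).
        temp_udf = udf(self.temp, StringType())
        df1 = spark.read.format("delta").load("...")
        df1 = df1.withColumn("temp", temp_udf("id"))
        df1.select("temp").show()


class Child(Parent):
    def temp(self, temp_obj):
        return "Child" + temp_obj  # I'll do some logic here


Child().work()
```

I don't know whether that actually avoids the serialization problem, though, or whether the whole approach is flawed.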
Am I trying to achieve the impossible? Or am I missing something?