
0 votes
627 views
in Technique[技术] by (71.8m points)

pyspark - Is it possible to get the attempt number of a running Spark Task?

I have a Spark job in which I apply a custom transformation function to every record of my RDD. This function is long and somewhat "fragile": it can fail unexpectedly. I also have a fallback function that should be used in case of failure; it is much faster and more stable. For reasons beyond the scope of this question, I can't split the primary function into smaller pieces, nor can I catch its failure inside the function itself and run the fallback there (with try/except, for example), because I also want to handle failures caused by the execution environment, such as OOM.

Here's a simple example of what I'm trying to achieve:

def primary_logic(*args):
    ...

def fallback_logic(*args):
    ...

def spark_map_function(*args):
    current_task_attempt_num = ...  # how do I get this?
    if current_task_attempt_num == 0:
        return primary_logic(*args)
    else:
        return fallback_logic(*args)

result = rdd.map(spark_map_function)
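For illustration, here is a minimal sketch of one way the attempt number might be read inside the task, assuming pyspark.TaskContext is usable for this purpose (TaskContext.get() is only populated inside a running task on an executor and returns None on the driver); whether this covers every environment-level failure such as OOM is not verified here:

from pyspark import TaskContext

def spark_map_function(*args):
    ctx = TaskContext.get()             # only available inside a running task on an executor
    attempt_num = ctx.attemptNumber()   # 0 for the first attempt, >0 for retries of this task
    if attempt_num == 0:
        return primary_logic(*args)
    else:
        return fallback_logic(*args)

result = rdd.map(spark_map_function)

Note that the number of retries a task gets is bounded by spark.task.maxFailures (4 by default), so the fallback only runs if the job is allowed at least one retry.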


1 Answer

0 votes
by (71.8m points)
Waiting for an expert to reply.

...