Welcome to Vigges Developer Community: Open, Learning, Share
Welcome To Ask or Share your Answers For Others

pyspark - Make single DataFrame from list of Dataframes

I have a list of DataFrames, one DataFrame per list element, and I need to combine them all into a single DataFrame. This has to be done in PySpark. Previously, with pandas, I was using:

dataframe_new = pd.concat(listName)

Solution 1 I tried:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

# Reuse the active session, or create one if none exists yet.
spark = SparkSession.builder.getOrCreate()

customSchema = StructType([
    StructField("col1", StringType(), True),
    StructField("col2", StringType(), True),
    StructField("col3", StringType(), True),
    StructField("col4", StringType(), True),
    StructField("col5", StringType(), True),
    StructField("col6", StringType(), True),
    StructField("col7", StringType(), True)
])

# Only converts the first element of the list.
df = spark.createDataFrame(queried_dfs[0], schema=customSchema)

Solution 2 I tried (iterating through the list of DataFrames, but I don't know how to combine them):

for x in ListOfDataframe:
    new_df = union_all()

but this always creates a new new_df on each iteration instead of accumulating the result.

Any help resolving this?

Question from: https://stackoverflow.com/questions/65923884/make-single-dataframe-from-list-of-dataframes


1 Answer


This is a useful function for combining a list of DataFrames, even when their columns or column order differ:

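from functools import reduce
from pyspark.sql.functions import lit

def Zconcat(dfs):
    # Union by position, after reordering each frame's columns to match the first one.
    return reduce(lambda df1, df2: df1.union(df2.select(df1.columns)), dfs)

def union_all(dfs):
    # Collect every column name that appears in any of the dataframes.
    columns = reduce(lambda x, y: set(x).union(set(y)), [i.columns for i in dfs])

    padded = []  # build a new list so the caller's list is left untouched
    for d in dfs:
        for c in columns:
            if c not in d.columns:
                # Add the missing column, filled with nulls.
                d = d.withColumn(c, lit(None))
        padded.append(d)

    return Zconcat(padded)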

Then pass union_all a list of dataframes, e.g.

union_all([df1, df2, df3])
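As an alternative sketch, assuming Spark 3.1 or later: `DataFrame.unionByName` accepts `allowMissingColumns=True`, which matches columns by name and fills columns absent from one side with nulls, so the padding loop above may not be needed. The helper name `union_by_name` is made up for illustration and is not part of PySpark.

```python
from functools import reduce


def union_by_name(dfs):
    """Fold a non-empty list of PySpark DataFrames into one DataFrame.

    Assumes Spark >= 3.1, where unionByName supports allowMissingColumns.
    Columns are matched by name; missing ones are filled with nulls.
    """
    return reduce(
        lambda left, right: left.unionByName(right, allowMissingColumns=True),
        dfs,
    )


# Hypothetical usage, with df1, df2, df3 being existing DataFrames:
# combined = union_by_name([df1, df2, df3])
```

This also addresses the loop in the question: `reduce` carries the accumulated result forward from one step to the next, rather than overwriting `new_df` on every iteration.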
