Welcome toVigges Developer Community-Open, Learning,Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
334 views
in Technique[技术] by (71.8m points)

arrays - Join dataframe columns in python

I have a dataframe called X

X has 3 columns (Uni, Sub, Colours)

Uni is string, Sub is string

but Colours is an array of strings

data in X look like this

Uni     Sub      Colours
-----   ------   -------
Melb    Math     (Red, Blue, Green)
Melb    English  (Yellow, Blue, Brown, White)
Sydney  Math     (Green, Pink, Red, White, Black)
Sydney  Art      (White, Pink)
Sydney  Med      (Yellow, Red, Pink, Brown, White, Black, Green)
Ottawa  Math     (Blue, Yellow)
Ottawa  Physics  (Red, Orange, Black)

I am trying to join these columns in one string so I can pass it to SQL Stored Procedure

my desired results should look like this

Melb;Math;Red, Blue, Green|Melb;English;Yellow, Blue, Brown, White|Sydney;Math;Green, Pink, Red, White, Black| ...

Where columns separated by a semicolon ; and rows separated by a bar |

I have this code

MyString = X[['Uni', 'Sub', 'Colours']].apply(lambda x: ';'.join(x), axis = 1)

But I was getting this error

TypeError: sequence item 2: expected str instance, tuple found

So I changed it to this

X["Codes"] = ",".join(map(str, X["Codes"]))
MyString = X[['PtsID', 'VisitID', 'Codes']].apply(lambda x: ';'.join(x), axis = 1)

but the end results are not what i want as it was a list and it look like this

Melb    Math     Red
1    Blue
2    Green
3
Melb    English     Yellow
1    Blue
2    Brown
3    White
4
.
.
.

Any idea how can I get my desired format for a single string?


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Consider df:

In [346]: df = pd.DataFrame({'Uni':['Melb', 'Melb', 'Sydney', 'Sydney'], 'Sub':['Math', 'English', 'Math', 'Art'], 'Colours':[('Red', 'Blue', 'Green'), ('Yellow', 'Blue', 'Brown', 'White'), ('Green', 'Pink', 'Red', 'White', 'Black'), ('White', 'Pink')]})
     ...: 

In [347]: df
Out[347]: 
      Uni      Sub                           Colours
0    Melb     Math                (Red, Blue, Green)
1    Melb  English      (Yellow, Blue, Brown, White)
2  Sydney     Math  (Green, Pink, Red, White, Black)
3  Sydney      Art                     (White, Pink)

Use df.agg with Series.str.cat:

In [350]: df.Colours = df.Colours.apply(lambda x: ','.join(x))

In [352]: ans = df.agg(';'.join, 1).str.cat(sep='|')

In [353]: ans
Out[353]: 'Melb;Math;Red,Blue,Green|Melb;English;Yellow,Blue,Brown,White|Sydney;Math;Green,Pink,Red,White,Black|Sydney;Art;White,Pink'

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to Vigges Developer Community for programmer and developer-Open, Learning and Share
...