Welcome to Vigges Developer Community - Open, Learning, Share
Welcome To Ask or Share your Answers For Others


pandas - Counting non-zero values in each column of a DataFrame in Python

I have a pandas DataFrame in which the first column is user_id and the remaining columns are tags (tag_0 to tag_122). The data is in the following format:

UserId  Tag_0   Tag_1
7867688 0   5
7867688 0   3
7867688 3   0
7867688 3.5 3.5
7867688 4   4
7867688 3.5 0

My aim is to compute Sum(Tag)/Count(NonZero(Tags)) for each user_id.

df.groupby('user_id').sum() gives me sum(tag), but I don't know how to count the non-zero values.

Is it possible to achieve Sum(Tag)/Count(NonZero(Tags)) in one command?

In MySQL I could achieve this as follows:

select user_id, sum(tag)/count(nullif(tag,0)) from table group by 1

Any help would be appreciated.


1 Answer


My favorite way of getting the number of non-zero values in each column is:

df.astype(bool).sum(axis=0)

For the number of non-zero values in each row, use:

df.astype(bool).sum(axis=1)

(Thanks to Skulas)
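
As a quick check, here is a minimal sketch that applies both calls to the sample data from the question. The DataFrame is rebuilt by hand, and selecting only the tag columns is my own assumption: otherwise the non-zero UserId values would be counted too.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "UserId": [7867688] * 6,
    "Tag_0": [0, 0, 3, 3.5, 4, 3.5],
    "Tag_1": [5, 3, 0, 3.5, 4, 0],
})

# Leave out the id column so it is not counted as a non-zero value
tags = df[["Tag_0", "Tag_1"]]

# Non-zero values become True, zeros become False; summing counts the Trues
col_counts = tags.astype(bool).sum(axis=0)  # one count per column
row_counts = tags.astype(bool).sum(axis=1)  # one count per row

print(col_counts)
print(row_counts)
```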

If you have NaNs in your df, replace them with zero first; otherwise each NaN is counted as a non-zero value, because NaN converts to True.

df.fillna(0).astype(bool).sum(axis=1)
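
To see the difference, a small sketch with a made-up frame (the values are illustrative, not from the question) compares the counts with and without fillna(0):

```python
import numpy as np
import pandas as pd

# Hypothetical data containing NaNs
df = pd.DataFrame({
    "Tag_0": [0, np.nan, 3.5],
    "Tag_1": [np.nan, np.nan, 0],
})

# Without fillna, every NaN converts to True and inflates the count
naive = df.astype(bool).sum(axis=0)

# With fillna(0), NaNs are treated as zeros and are not counted
cleaned = df.fillna(0).astype(bool).sum(axis=0)

print(naive)
print(cleaned)
```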

(Thanks to SirC)
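
For the per-user ratio asked about in the question, one possible sketch combines the same non-zero count with groupby. The column names user_id, Tag_0 and Tag_1 are assumptions taken from the question's text and sample. The second variant, which replaces zeros with NaN and takes the mean, is an alternative I am suggesting rather than part of the original answer; it relies on mean() skipping NaN by default.

```python
import numpy as np
import pandas as pd

# Sample data from the question; column names are assumed
df = pd.DataFrame({
    "user_id": [7867688] * 6,
    "Tag_0": [0, 0, 3, 3.5, 4, 3.5],
    "Tag_1": [5, 3, 0, 3.5, 4, 0],
})
tag_cols = ["Tag_0", "Tag_1"]

# Sum of each tag per user
sums = df.groupby("user_id")[tag_cols].sum()

# Number of non-zero (and non-NaN) entries of each tag per user
nonzero = df[tag_cols].fillna(0).astype(bool).groupby(df["user_id"]).sum()

# Sum(Tag) / Count(NonZero(Tag)); a user with no non-zero values yields NaN
result = sums / nonzero

# Alternative: turn zeros into NaN so that mean() ignores them
result_alt = df[tag_cols].replace(0, np.nan).groupby(df["user_id"]).mean()

print(result)
print(result_alt)
```

Both variants should agree on this sample; the mean-based one is closer to the single-statement form of the MySQL query.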

