Find the top n clients for a year, then bucket those clients' volume across each month of the year
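Good morning all,
I would like to report the top n clients for the year and then show how each of those top n clients performed across the year. Sample df: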
import pandas as pd
dfTest = [
    ('Client', ['A','A','A','A',
                'B','B','B','B',
                'C','C','C','C',
                'D','D','D','D']),
    ('Year_Month', ['2018-08', '2018-09', '2018-10','2018-11',
                    '2018-08', '2018-09', '2018-10','2018-11',
                    '2018-08', '2018-09', '2018-10', '2018-11',
                    '2018-08', '2018-09', '2018-10', '2018-11']),
    ('Volume', [100, 200, 300,400,
                1, 2, 3,4,
                10, 20, 30,40,
                1000, 2000, 3000,4000]
     ),
    ('state', ['Done', 'Tied Done', 'Tied Done','Done',
               'Passed', 'Done', 'Passed', 'Done',
               'Rejected', 'Done', 'Passed', 'Done',
               'Done', 'Done', 'Done', 'Done']
     )
]
# DataFrame.from_items was removed in pandas 1.0; a dict keeps the column order
df = pd.DataFrame(dict(dfTest))
print(df)
   Client Year_Month  Volume      state
0       A    2018-08     100       Done
1       A    2018-09     200  Tied Done
2       A    2018-10     300  Tied Done
3       A    2018-11     400       Done
4       B    2018-08       1     Passed
5       B    2018-09       2       Done
6       B    2018-10       3     Passed
7       B    2018-11       4       Done
8       C    2018-08      10   Rejected
9       C    2018-09      20       Done
10      C    2018-10      30     Passed
11      C    2018-11      40       Done
12      D    2018-08    1000       Done
13      D    2018-09    2000       Done
14      D    2018-10    3000       Done
15      D    2018-11    4000       Done
Now determine the top, say two (n), clients by Done trades:
d = [
    ('Done_Volume', 'sum')
]
# first filter by state, then aggregate the filtered df
mask = ((df['state'] == 'Done') | (df['state'] == 'Tied Done'))
df_Client_Done_Volume = df[mask].groupby(['Client'])['Volume'].agg(d)
print(df_Client_Done_Volume)
        Done_Volume
Client
A              1000
B                 6
C                60
D             10000
print(df_Client_Done_Volume.nlargest(2, 'Done_Volume'))
        Done_Volume
Client
D             10000
A              1000
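So clients A and D are my top two (n) performers.
I now want to feed this list or df back into the original data to retrieve their performance across the year, with Year_Month going across the top and Client listed as rows: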
Client  2018-08  2018-09  2018-10  2018-11
A           100      200      300      400
D          1000     2000     3000     4000
You need the pandas.pivot_table method.
Here is my suggestion:
def get_top_n_performer(df, n):
    # keep only completed trades
    df_done = df[df['state'].isin(['Done', 'Tied Done'])]
    # total Done volume per client
    aggs = {'Volume': ['sum']}
    data = df_done.groupby('Client').agg(aggs)
    data = data.reset_index()
    data.columns = ['Client', 'Volume_sum']
    # largest totals first, then keep the first n rows
    data = data.sort_values(by='Volume_sum', ascending=False)
    return data.head(n)

ls = list(get_top_n_performer(df, 2).Client.values)
# one row per top client, one column per Year_Month
data = pd.pivot_table(df[df['Client'].isin(ls)], values='Volume', index=['Client'],
                      columns=['Year_Month'])
data = data.reset_index()
print(data)
Output:
Year_Month Client  2018-08  2018-09  2018-10  2018-11
0               A      100      200      300      400
1               D     1000     2000     3000     4000
I hope this helps!
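Two details of the pivot_table call above are worth knowing: it pivots every row of the top clients (not only the Done and Tied Done ones), and pivot_table averages by default (aggfunc='mean'). With the sample data this makes no difference, because A and D have one row per month and all of them are Done or Tied Done. A minimal sketch of a stricter variant follows, assuming the monthly figures should use the same Done filter as the ranking and should be summed. The names top_n_monthly and DONE_STATES are made up for this illustration.

```python
import pandas as pd

# Same sample data as in the question, built from a plain dict.
df = pd.DataFrame({
    "Client": list("AAAABBBBCCCCDDDD"),
    "Year_Month": ["2018-08", "2018-09", "2018-10", "2018-11"] * 4,
    "Volume": [100, 200, 300, 400, 1, 2, 3, 4,
               10, 20, 30, 40, 1000, 2000, 3000, 4000],
    "state": ["Done", "Tied Done", "Tied Done", "Done",
              "Passed", "Done", "Passed", "Done",
              "Rejected", "Done", "Passed", "Done",
              "Done", "Done", "Done", "Done"],
})

# Assumption: these two states count as completed trades.
DONE_STATES = ["Done", "Tied Done"]


def top_n_monthly(df, n, states=DONE_STATES):
    """Monthly Done volume for the n clients with the largest Done total."""
    done = df[df["state"].isin(states)]
    # Rank clients by total Done volume; the index holds the top n clients.
    top = done.groupby("Client")["Volume"].sum().nlargest(n).index
    table = done[done["Client"].isin(top)].pivot_table(
        index="Client", columns="Year_Month", values="Volume",
        aggfunc="sum",   # sum instead of the default mean
        fill_value=0,    # months without Done trades show 0
    )
    # Order the rows by rank instead of alphabetically.
    return table.reindex(top)


result = top_n_monthly(df, 2)
print(result)
```

Reindexing by the ranked index puts D before A, which matches the order returned by nlargest.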
IIUC
# keep completed trades and drop the state column
s = df.loc[df.state.isin(['Done', 'Tied Done'])].drop(columns='state')
# keyword arguments: pandas 2.0+ no longer accepts positional arguments here
s = s.pivot(index='Client', columns='Year_Month', values='Volume')
# select the rows of the two clients with the largest yearly total
s.loc[s.sum(axis=1).nlargest(2).index]
Year_Month  2018-08  2018-09  2018-10  2018-11
Client
D            1000.0   2000.0   3000.0   4000.0
A             100.0    200.0    300.0    400.0
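One caveat with the pivot approach: DataFrame.pivot raises a ValueError when the same (Client, Year_Month) pair occurs more than once, which is likely with real trade data. A minimal sketch of a workaround is to aggregate with groupby and unstack before ranking. The trades frame below is invented for this illustration and is not the data from the question.

```python
import pandas as pd

# Hypothetical data: client A has two Done trades in 2018-08.
trades = pd.DataFrame({
    "Client": ["A", "A", "A", "B", "B", "C"],
    "Year_Month": ["2018-08", "2018-08", "2018-09",
                   "2018-08", "2018-09", "2018-09"],
    "Volume": [100, 50, 200, 1, 2, 20],
    "state": ["Done", "Done", "Tied Done", "Done", "Passed", "Done"],
})

done = trades[trades["state"].isin(["Done", "Tied Done"])]

# Sum duplicates per client and month, then spread the months into columns.
monthly = (
    done.groupby(["Client", "Year_Month"])["Volume"]
    .sum()
    .unstack(fill_value=0)
)

# Keep the rows of the two clients with the largest yearly total.
top = monthly.loc[monthly.sum(axis=1).nlargest(2).index]
print(top)
```

Because unstack is given fill_value=0, months without Done trades stay as integer zeros rather than becoming NaN and turning the columns into floats.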