I am an R user who is new to Python. I have some data:
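import pandas as pd
from pandas import DataFrame

dat1 = DataFrame({'user_id': ['a1', 'a1', 'a4', 'a3', 'a1', 'a15', 'a8', 'a15', 'a1', 'a5'],
                  'Visits': [1, 4, 2, 1, 3, 1, 1, 8, 1, 9], 'cell': [14, 21, 14, 14, 19, 10, 18, 17, 10, 11],
                  'date': ['2011-01-05', '2011-01-05', '2011-01-12', '2011-01-12', '2011-01-12', '2011-01-12', '2011-01-02', '2011-01-19', '2011-01-19', '2011-01-19']})
dat1['date'] = pd.to_datetime(dat1['date'])
# sort_index(by=...) was deprecated and later removed from pandas; sort_values replaces it.
# kind='mergesort' is a stable sort, so rows sharing a date keep their original order.
dat2 = dat1.sort_values(by='date', kind='mergesort')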
This gives me a DataFrame of the form
Visits  cell        date  user_id
     1    18  2011-01-02       a8
     1    14  2011-01-05       a1
     4    21  2011-01-05       a1
     2    14  2011-01-12       a4
     1    14  2011-01-12       a3
     3    19  2011-01-12       a1
     1    10  2011-01-12      a15
     8    17  2011-01-19      a15
     1    10  2011-01-19       a1
     9    11  2011-01-19       a5
I want to create a DataFrame in which each column corresponds to a unique user_id and each row to a unique date. Each cell should contain 1 if that user_id and that date appear together in some row of the original DataFrame, and 0 otherwise. In R I would use sapply with a user-defined function for this, but in Python I am struggling to find a solution.
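A rough Python counterpart of the "sapply plus a user-defined function" idea is a dictionary comprehension that calls a function once per user and binds the results together as columns. This is a minimal sketch, not the only way: `seen_on` is a hypothetical helper (not a pandas function), and the data are rebuilt from the question so the block runs on its own.

```python
import pandas as pd

# Rebuild the example data from the question.
dat1 = pd.DataFrame({
    'user_id': ['a1', 'a1', 'a4', 'a3', 'a1', 'a15', 'a8', 'a15', 'a1', 'a5'],
    'Visits': [1, 4, 2, 1, 3, 1, 1, 8, 1, 9],
    'cell': [14, 21, 14, 14, 19, 10, 18, 17, 10, 11],
    'date': ['2011-01-05', '2011-01-05', '2011-01-12', '2011-01-12', '2011-01-12',
             '2011-01-12', '2011-01-02', '2011-01-19', '2011-01-19', '2011-01-19'],
})
dat1['date'] = pd.to_datetime(dat1['date'])
dat2 = dat1.sort_values(by='date', kind='mergesort')

dates = pd.DatetimeIndex(dat2['date'].unique())  # one row per unique date
user_names = dat2['user_id'].unique()            # one column per unique user


def seen_on(user):
    """Hypothetical helper: 1 for each date on which `user` has a row, else 0."""
    user_dates = dat2.loc[dat2['user_id'] == user, 'date']
    return pd.Series(dates.isin(user_dates).astype(int), index=dates)


# Like sapply: apply the function to every user and bind the results as columns.
result = pd.DataFrame({user: seen_on(user) for user in user_names})
print(result)
```

The per-user function makes the logic explicit, at the cost of one filter per user; for many users a vectorized approach scales better.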
I store the unique user_ids in an array:
user_names = dat2['user_id'].unique()
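Note that `unique()` returns values in order of first appearance rather than sorting them, which is why the desired columns run a8, a1, a4, ... instead of alphabetically. A small check, using the user_id column of dat2 copied from the sorted table above:

```python
import pandas as pd

# The user_id column of dat2 (already sorted by date), copied from the table above.
user_ids = pd.Series(['a8', 'a1', 'a1', 'a4', 'a3', 'a1', 'a15', 'a15', 'a1', 'a5'])

# unique() keeps the order of first appearance (it does not sort), so this
# order can be used directly as the column order of the final DataFrame.
user_names = user_ids.unique()
print(list(user_names))
```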
My final DataFrame should be of the form (one row per date, one column per user):
            a8  a1  a4  a3  a15  a5
2011-01-02   1   0   0   0    0   0
2011-01-05   0   1   0   0    0   0
2011-01-12   0   1   1   1    1   0
2011-01-19   0   1   0   0    1   1
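One vectorized way to get this, offered as a sketch rather than the definitive answer, is `pd.crosstab`, which counts the rows for every (date, user_id) pair; any count above zero then becomes 1. The columns are reordered with `user_names` so they follow the first-appearance order shown above. The data are rebuilt from the question so the block runs on its own.

```python
import pandas as pd

# Rebuild the example data from the question.
dat1 = pd.DataFrame({
    'user_id': ['a1', 'a1', 'a4', 'a3', 'a1', 'a15', 'a8', 'a15', 'a1', 'a5'],
    'Visits': [1, 4, 2, 1, 3, 1, 1, 8, 1, 9],
    'cell': [14, 21, 14, 14, 19, 10, 18, 17, 10, 11],
    'date': ['2011-01-05', '2011-01-05', '2011-01-12', '2011-01-12', '2011-01-12',
             '2011-01-12', '2011-01-02', '2011-01-19', '2011-01-19', '2011-01-19'],
})
dat1['date'] = pd.to_datetime(dat1['date'])
dat2 = dat1.sort_values(by='date', kind='mergesort')
user_names = dat2['user_id'].unique()

# Count rows per (date, user_id); rows are dates, columns are user_ids,
# and pairs that never occur are filled with 0.
counts = pd.crosstab(dat2['date'], dat2['user_id'])

# Any count above 0 becomes 1 (a1 has two rows on 2011-01-05, for example).
present = (counts > 0).astype(int)

# crosstab sorts the columns alphabetically; reorder them to match user_names.
result = present[user_names]
print(result)
```

Selecting `[user_names]` only reorders the columns: every user already appears in the crosstab because each has at least one row.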