Explode and expand rows into multiple rows, or columns into multiple columns, using a pandas DataFrame


import pandas as pd


Generate some example data:

data = [ [['python','C'],'John'],[['python','Go'],'Mark'] ]
df = pd.DataFrame(data=data,columns=['language','name'])
print(df)
       language  name
0   [python, C]  John
1  [python, Go]  Mark

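1. First, we explode the language column, putting each element of the list into its own row. The other columns are repeated, and each new row keeps the index label of the row it came from.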

df2 = df.explode('language')
print(df2)


  language  name
0   python  John
0        C  John
1   python  Mark
1       Go  Mark
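It can help to see how explode treats cells that are not ordinary lists. The following is a minimal sketch, assuming a made-up third person ("Ann") and an empty list for Mark, neither of which is part of the example data above:

```python
import pandas as pd

# Hypothetical data: a normal list, an empty list, and a plain string
edge = pd.DataFrame(
    {
        "language": [["python", "C"], [], "Go"],
        "name": ["John", "Mark", "Ann"],
    }
)

exploded = edge.explode("language")

# The empty list becomes a single row holding NaN,
# and the plain string is passed through unchanged.
print(exploded)
```

If rows with missing values are not wanted, chaining dropna(subset=['language']) after explode is one way to remove them.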

If we want to reset the index so that every row gets its own label, chain reset_index(drop=True):

df2 = df.explode('language').reset_index(drop=True)
print(df2)

  language  name
0   python  John
1        C  John
2   python  Mark
3       Go  Mark
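As a shorter alternative, explode accepts an ignore_index argument. This sketch assumes pandas 1.1.0 or newer, where that argument is available, and reuses the same example data:

```python
import pandas as pd

data = [[["python", "C"], "John"], [["python", "Go"], "Mark"]]
df = pd.DataFrame(data=data, columns=["language", "name"])

# ignore_index=True relabels the result 0..n-1 in a single step
df_short = df.explode("language", ignore_index=True)

# Same result as the two-step version shown above
df_long = df.explode("language").reset_index(drop=True)

print(df_short)
```

Both forms should give the same frame; which one to use is mostly a matter of taste and of the pandas version available.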

1.2 What if the original column holds delimited strings rather than lists? Use assign with str.split to turn each string into a list, then chain it with explode.

data = [ ['python,C','John'],['python,Go','Mark'] ]
df = pd.DataFrame(data=data,columns=['language','name'])
print(df)
    language  name
0   python,C  John
1  python,Go  Mark

df2 = df.assign(language=df.language.str.split(",")).explode('language')
print(df2)
  language  name
0   python  John
0        C  John
1   python  Mark
1       Go  Mark
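Real-world strings often carry stray spaces around the delimiter. The sketch below assumes input such as 'python, C' (hypothetical data, not from the example above) and strips each piece after exploding:

```python
import pandas as pd

# Hypothetical data with spaces around the commas
messy = pd.DataFrame(
    data=[["python, C", "John"], ["python , Go", "Mark"]],
    columns=["language", "name"],
)

clean = (
    messy.assign(language=messy["language"].str.split(","))
    .explode("language")
    .reset_index(drop=True)
    # Remove leading and trailing whitespace from every piece
    .assign(language=lambda d: d["language"].str.strip())
)

print(clean)
```

Stripping after the explode keeps the split pattern simple; splitting on a regular expression that swallows the spaces would be another option.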

2. In the examples above, we expanded one row into multiple rows based on the list-like elements of a column. Sometimes we instead need to expand one column into multiple columns.

Let’s generate some data again:

data = [ [['age','27'],'John'],[['age','30'],'Mark'] ]
df = pd.DataFrame(data=data,columns=['age_info','name'])
print(df)
    age_info  name
0  [age, 27]  John
1  [age, 30]  Mark

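We can use to_list() and assign the result to the new columns. Each inner list must have exactly as many elements as there are new columns.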

df[['attribute','value']] = df.age_info.to_list()
# or df[['attribute','value']] = df['age_info'].to_list()
print(df)
    age_info  name attribute value
0  [age, 27]  John       age    27
1  [age, 30]  Mark       age    30
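A variation on the same idea is to build the new columns as a separate DataFrame and join it back, which makes it easy to drop the original list column and fix dtypes. This is a sketch only; the custom index labels 10 and 20 and the name `parts` are assumptions added for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {"age_info": [["age", "27"], ["age", "30"]], "name": ["John", "Mark"]},
    index=[10, 20],  # hypothetical non-default index
)

# Build the new columns, reusing the original index so the join lines up
parts = pd.DataFrame(
    df["age_info"].to_list(),
    columns=["attribute", "value"],
    index=df.index,
)

# Drop the list column, attach the new columns, and convert the value to int
result = df.drop(columns="age_info").join(parts)
result["value"] = result["value"].astype(int)

print(result)
```

Passing index=df.index matters here: without it, `parts` would get a default 0..n-1 index and the join would not match rows labelled 10 and 20.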

Same question as before: what if the column is a string that can be split? Use str.split with expand=True, which returns one column per piece.

data = [ ['john,f','1'],['mark,y','2'] ]
df = pd.DataFrame(data=data,columns=['full_name','id'])
print(df)
  full_name id
0    john,f  1
1    mark,y  2

df[['First','Last']] = df.full_name.str.split(",",expand=True)
print(df)
  full_name id First Last
0    john,f  1  john    f
1    mark,y  2  mark    y
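The assignment above relies on every string splitting into exactly two pieces. When some values may contain extra delimiters, limiting the number of splits with n=1 is one way to keep the result at two columns. The value 'john,f,jr' below is hypothetical data added to show the effect:

```python
import pandas as pd

# Hypothetical data: the first name string contains two commas
names = pd.DataFrame(
    data=[["john,f,jr", "1"], ["mark,y", "2"]],
    columns=["full_name", "id"],
)

# n=1 splits only at the first comma, so there are always two pieces
names[["First", "Last"]] = names["full_name"].str.split(",", n=1, expand=True)

print(names)
```

Without n=1, the split would produce three columns for this data and the assignment to two column names would not line up.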

Author: robot learner
Reprint policy: Unless otherwise stated, all articles in this blog are licensed under CC BY 4.0. If reproduced, please credit the source: robot learner.