In this post, we will explore how to use the pandas groupby method to group a DataFrame by one column and then join the values of another column with commas.
Let’s start by creating an example DataFrame:
import pandas as pd
Our example DataFrame has two columns: Name and Fruit. We want to group the DataFrame by the Name column and then join the values in the Fruit column with commas for each group. To accomplish this, we can combine the groupby method with a lambda function.
grouped = df.groupby('Name').apply(lambda x: ','.join(x['Fruit']))
In this code, we use the groupby method to group the DataFrame by the Name column. We then use the apply method to apply a lambda function to each group of rows. The lambda function receives each group as a DataFrame, selects the Fruit column using x['Fruit'], and joins the values in that column with a comma using ','.join(). The result is a new Series object, grouped, that is indexed by the unique values in the Name column. Each entry is the joined string of Fruit values for the corresponding group of rows in the original DataFrame.
print(grouped)
Output:
Name
As we can see from the output, the rows have been grouped by the unique values in the Name column, and the corresponding values in the Fruit column have been joined with commas.
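For readers who want to run the steps above end to end, here is a minimal, self-contained sketch. The DataFrame named sample and its contents are hypothetical, made up purely for illustration, and are not the article's original data. This variant also selects the Fruit column before calling apply, so the function only ever receives the values it needs to join.

```python
import pandas as pd

# Hypothetical sample data; the names and fruits are illustrative only.
sample = pd.DataFrame({
    "Name": ["Alice", "Bob", "Alice", "Bob", "Carol"],
    "Fruit": ["Apple", "Banana", "Cherry", "Date", "Elderberry"],
})

# Group by Name, pick the Fruit column, and join each group's values.
# ",".join is passed directly; it behaves like lambda x: ",".join(x).
joined = sample.groupby("Name")["Fruit"].apply(",".join)

# joined is a Series indexed by Name, with one joined string per group.
print(joined)
```

Selecting the column first should give the same joined strings as the lambda that indexes x['Fruit'] inside apply, while keeping the function itself trivial.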
If you want a DataFrame with two columns instead of a Series indexed by Name, you can add a call to the reset_index() method.
grouped = df.groupby('Name')['Fruit'].apply(lambda x: ','.join(x)).reset_index()
In this updated code, we have added the ['Fruit'] selection after the groupby call, so that only the Fruit column of each group is passed to the lambda function; the grouping itself is still done by the Name column. We then use the reset_index() method to move Name out of the index and convert the resulting Series object into a DataFrame with two columns: Name and Fruit.
print(grouped)
Output:
Name Fruit
As we can see from the output, the resulting DataFrame now has two columns: Name and Fruit, with the joined values of the Fruit column grouped by the corresponding values in the Name column.
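The two-column version can be sketched in the same self-contained way. As before, the sample DataFrame is hypothetical illustration data rather than the article's own, and the variable name result is chosen here only for clarity.

```python
import pandas as pd

# Hypothetical sample data; the names and fruits are illustrative only.
sample = pd.DataFrame({
    "Name": ["Alice", "Bob", "Alice", "Bob", "Carol"],
    "Fruit": ["Apple", "Banana", "Cherry", "Date", "Elderberry"],
})

# Join the fruits per name, then move Name from the index into a column.
result = sample.groupby("Name")["Fruit"].apply(",".join).reset_index()

# result is a DataFrame with the columns Name and Fruit.
print(result.columns.tolist())
print(result)
```

Because groupby sorts the group keys by default, the rows of result come back ordered by Name rather than by first appearance in the input.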