Pandas is a popular data analysis library in Python that provides powerful tools for manipulating and analyzing data. One common task in data analysis is to sample rows from a dataframe based on some grouping criteria. In this post, we’ll explore how to use Pandas to sample rows from a dataframe by group.
Suppose we have a dataframe with a column called ‘vertical’ and we want to sample up to 100 random rows for each unique value in the ‘vertical’ column. Here’s how we can achieve this:
import pandas as pd
In this example, we first create a sample dataframe with a ‘vertical’ column and a ‘value’ column. We then group the dataframe by the ‘vertical’ column using the groupby() method and apply a lambda function to each group that samples up to 100 random rows with sample(). Finally, apply() concatenates the sampled groups back into a single dataframe.
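The steps described above can be sketched roughly as follows. This is a minimal reconstruction from the description, not necessarily the exact original listing: the data (three verticals with 250, 100, and 40 rows) and the random_state value are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data: three verticals with 250, 100, and 40 rows.
verticals = ["retail"] * 250 + ["travel"] * 100 + ["finance"] * 40
df = pd.DataFrame({"vertical": verticals, "value": range(len(verticals))})

# For each vertical, draw up to 100 rows without replacement.
# min() caps the sample size at the group's row count.
sampled = df.groupby("vertical").apply(
    lambda x: x.sample(n=min(len(x), 100), random_state=0)
)

# The group label becomes the outer index level of the result,
# so the per-group counts can be read from that level.
counts = sampled.index.get_level_values("vertical").value_counts()
print(counts.sort_index())
```

Depending on the pandas version, the ‘vertical’ column itself may or may not be kept in the result of apply() (newer releases may also emit a deprecation warning about this), which is why the sketch reads the group labels from the index rather than from the column.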
Note that the min(len(x), 100) argument passed to sample() ensures that we don’t sample more rows than are available in a given group. This is useful in cases where a group may have fewer than 100 rows.
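To see why the guard matters, here is a small sketch on a hypothetical three-row group (the data is made up for illustration). Asking sample() for more rows than exist, without replacement, raises a ValueError, while capping the request with min() returns every available row.

```python
import pandas as pd

# Hypothetical group with only 3 rows.
small = pd.DataFrame({"vertical": ["finance"] * 3, "value": [1, 2, 3]})

# Without the guard, asking for 100 rows without replacement fails.
try:
    small.sample(n=100)
    error = None
except ValueError as exc:
    error = exc

# With the guard, the request is capped at the 3 available rows.
capped = small.sample(n=min(len(small), 100))
print(type(error).__name__, len(capped))
```

Passing replace=True to sample() would also avoid the error, but it would return duplicate rows, which is usually not what is wanted when the goal is "up to 100 rows per group".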