Splitting a Pandas DataFrame

Author: JIYIK  Last Updated: 2025/05/02

This tutorial explains how to split a DataFrame into multiple smaller DataFrames using row indexing, the DataFrame.groupby() method, and the DataFrame.sample() method.

We will use the following apprix_df DataFrame to demonstrate each approach.

import pandas as pd

apprix_df = pd.DataFrame(
    {
        "Name": ["Anish", "Rabindra", "Manish", "Samir", "Binam"],
        "Post": ["CEO", "CTO", "System Admin", "Consultant", "Engineer"],
        "Qualification": ["MBA", "MS", "MCA", "PhD", "BE"],
    }
)

print("Apprix Team DataFrame:")
print(apprix_df, "\n")

Output:

Apprix Team DataFrame:
       Name          Post Qualification
0     Anish           CEO           MBA
1  Rabindra           CTO            MS
2    Manish  System Admin           MCA
3     Samir    Consultant           PhD
4     Binam      Engineer            BE

Splitting a DataFrame using row index

import pandas as pd

apprix_df = pd.DataFrame(
    {
        "Name": ["Anish", "Rabindra", "Manish", "Samir", "Binam"],
        "Post": ["CEO", "CTO", "System Admin", "Consultant", "Engineer"],
        "Qualification": ["MBA", "MS", "MCA", "PhD", "BE"],
    }
)

print("Apprix Team DataFrame:")
print(apprix_df, "\n")

apprix_1 = apprix_df.iloc[:2, :]
apprix_2 = apprix_df.iloc[2:, :]

print("The DataFrames formed by splitting the Apprix Team DataFrame are:", "\n")
print(apprix_1, "\n")
print(apprix_2, "\n")

Output:

Apprix Team DataFrame:
       Name          Post Qualification
0     Anish           CEO           MBA
1  Rabindra           CTO            MS
2    Manish  System Admin           MCA
3     Samir    Consultant           PhD
4     Binam      Engineer            BE

The DataFrames formed by splitting the Apprix Team DataFrame are:

       Name Post Qualification
0     Anish  CEO           MBA
1  Rabindra  CTO            MS

     Name          Post Qualification
2  Manish  System Admin           MCA
3   Samir    Consultant           PhD
4   Binam      Engineer            BE

It splits the apprix_df DataFrame into two parts using the row index. The first part contains the first two rows of apprix_df, while the second part contains the last three rows.

We specify the rows that go into each part through the iloc attribute. [:2, :] selects the rows before index 2 (the row at index 2 is not included) and all columns of the DataFrame. Therefore, apprix_df.iloc[:2, :] selects the first two rows of apprix_df, at indexes 0 and 1.
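
The same slicing extends to more than two pieces. As a minimal sketch, the code below splits a DataFrame into consecutive chunks of at most a given number of rows. The helper name split_into_chunks and its chunk_size parameter are illustrative assumptions and are not part of pandas.

```python
import pandas as pd


def split_into_chunks(df, chunk_size):
    """Return consecutive row slices of df, each with at most chunk_size rows."""
    # Hypothetical helper: walk over row positions in steps of chunk_size
    # and take each slice with iloc, keeping all columns.
    return [
        df.iloc[start : start + chunk_size, :]
        for start in range(0, len(df), chunk_size)
    ]


apprix_df = pd.DataFrame(
    {
        "Name": ["Anish", "Rabindra", "Manish", "Samir", "Binam"],
        "Post": ["CEO", "CTO", "System Admin", "Consultant", "Engineer"],
        "Qualification": ["MBA", "MS", "MCA", "PhD", "BE"],
    }
)

chunks = split_into_chunks(apprix_df, 2)

for chunk in chunks:
    print(chunk, "\n")
```

With five rows and a chunk size of 2, this should give three DataFrames of 2, 2, and 1 rows. Each chunk keeps the original index labels.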

Splitting a DataFrame using `groupby()`

import pandas as pd

apprix_df = pd.DataFrame(
    {
        "Name": ["Anish", "Rabindra", "Manish", "Samir", "Binam"],
        "Post": ["CEO", "CTO", "System Admin", "Consultant", "Engineer"],
        "Qualification": ["MBA", "MS", "MS", "PhD", "MS"],
    }
)

print("Apprix Team DataFrame:")
print(apprix_df, "\n")

groups = apprix_df.groupby(apprix_df.Qualification)
ms_df = groups.get_group("MS")
mba_df = groups.get_group("MBA")
phd_df = groups.get_group("PhD")

print("Group with Qualification MS:")
print(ms_df, "\n")

print("Group with Qualification MBA:")
print(mba_df, "\n")

print("Group with Qualification PhD:")
print(phd_df, "\n")

Output:

Apprix Team DataFrame:
       Name          Post Qualification
0     Anish           CEO           MBA
1  Rabindra           CTO            MS
2    Manish  System Admin            MS
3     Samir    Consultant           PhD
4     Binam      Engineer            MS

Group with Qualification MS:
       Name          Post Qualification
1  Rabindra           CTO            MS
2    Manish  System Admin            MS
4     Binam      Engineer            MS

Group with Qualification MBA:
    Name Post Qualification
0  Anish  CEO           MBA

Group with Qualification PhD:
    Name        Post Qualification
3  Samir  Consultant           PhD

It divides the apprix_df DataFrame into three parts based on the values of the Qualification column. Rows with the same value in the Qualification column are placed in the same group.

The groupby() method forms groups based on the values of the Qualification column. We then use the get_group() method to extract the rows belonging to each group.
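
When the group values are not known in advance, calling get_group() once per value becomes awkward. One possible alternative, sketched here under the assumption that a dictionary of DataFrames is an acceptable result, is to iterate over the GroupBy object, which yields pairs of group key and sub-DataFrame. The name split_dfs is chosen only for illustration.

```python
import pandas as pd

apprix_df = pd.DataFrame(
    {
        "Name": ["Anish", "Rabindra", "Manish", "Samir", "Binam"],
        "Post": ["CEO", "CTO", "System Admin", "Consultant", "Engineer"],
        "Qualification": ["MBA", "MS", "MS", "PhD", "MS"],
    }
)

# Iterating over a GroupBy object yields (key, sub-DataFrame) pairs,
# so one comprehension collects every group without naming the values.
split_dfs = {
    qualification: group
    for qualification, group in apprix_df.groupby("Qualification")
}

for qualification, group in split_dfs.items():
    print(f"Group with Qualification {qualification}:")
    print(group, "\n")
```

Each value in split_dfs is an ordinary DataFrame, for example split_dfs["MS"] holds the three rows whose Qualification is MS.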

Splitting a DataFrame using `sample()`

We can form a DataFrame by randomly sampling rows from another DataFrame using the sample() method. The frac parameter sets the fraction of rows to sample from the parent DataFrame.

import pandas as pd

apprix_df = pd.DataFrame(
    {
        "Name": ["Anish", "Rabindra", "Manish", "Samir", "Binam"],
        "Post": ["CEO", "CTO", "System Admin", "Consultant", "Engineer"],
        "Qualification": ["MBA", "MS", "MS", "PhD", "MS"],
    }
)

print("Apprix Team DataFrame:")
print(apprix_df, "\n")

random_df = apprix_df.sample(frac=0.4, random_state=60)

print("Random split from the Apprix Team DataFrame:")
print(random_df)

Output:

Apprix Team DataFrame:
       Name          Post Qualification
0     Anish           CEO           MBA
1  Rabindra           CTO            MS
2    Manish  System Admin            MS
3     Samir    Consultant           PhD
4     Binam      Engineer            MS

Random split from the Apprix Team DataFrame:
    Name      Post Qualification
0  Anish       CEO           MBA
4  Binam  Engineer            MS

It randomly samples 40% of the rows from the apprix_df DataFrame and then displays the DataFrame formed by the sampled rows. Setting random_state ensures that every run returns the same random sample.
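
sample() returns only the sampled part. To get a complete split into two non-overlapping DataFrames, one common approach is to drop the sampled index labels from the parent. The following is a minimal sketch; it assumes the index labels of the parent DataFrame are unique, and the names sampled_df and rest_df are illustrative.

```python
import pandas as pd

apprix_df = pd.DataFrame(
    {
        "Name": ["Anish", "Rabindra", "Manish", "Samir", "Binam"],
        "Post": ["CEO", "CTO", "System Admin", "Consultant", "Engineer"],
        "Qualification": ["MBA", "MS", "MS", "PhD", "MS"],
    }
)

# Randomly pick 40% of the rows (2 of the 5 rows here).
sampled_df = apprix_df.sample(frac=0.4, random_state=60)

# Everything that was not sampled: drop the sampled index labels.
# This relies on the index labels being unique.
rest_df = apprix_df.drop(sampled_df.index)

print("Sampled rows:")
print(sampled_df, "\n")

print("Remaining rows:")
print(rest_df)
```

Together, sampled_df and rest_df contain every row of apprix_df exactly once, which is the usual shape of a random train/test style split.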
