JIYIK CN >

Current Location:Home > Learning > PROGRAM > Python >

Splitting a Pandas DataFrame

Author:JIYIK Last Updated:2025/05/02 Views:

This tutorial explains how to split a DataFrame into multiple smaller DataFrames using row indexing, DataFrame.groupby()methods, and methods.DataFrame.sample()

We will use the following apprix_dfDataFrame to explain how to split a DataFrame into multiple smaller DataFrames.

import pandas as pd

apprix_df = pd.DataFrame(
    {
        "Name": ["Anish", "Rabindra", "Manish", "Samir", "Binam"],
        "Post": ["CEO", "CTO", "System Admin", "Consultant", "Engineer"],
        "Qualification": ["MBA", "MS", "MCA", "PhD", "BE"],
    }
)

print("Apprix Team DataFrame:")
print(apprix_df, "\n")

Output:

Apprix Team DataFrame:
       Name          Post Qualification
0     Anish           CEO           MBA
1  Rabindra           CTO            MS
2    Manish  System Admin           MCA
3     Samir    Consultant           PhD
4     Binam      Engineer            BE

Splitting a DataFrame using row index

import pandas as pd

apprix_df = pd.DataFrame(
    {
        "Name": ["Anish", "Rabindra", "Manish", "Samir", "Binam"],
        "Post": ["CEO", "CTO", "System Admin", "Consultant", "Engineer"],
        "Qualification": ["MBA", "MS", "MCA", "PhD", "BE"],
    }
)

print("Apprix Team DataFrame:")
print(apprix_df, "\n")

apprix_1 = apprix_df.iloc[:2, :]
apprix_2 = apprix_df.iloc[2:, :]

print("The DataFrames formed by splitting of Apprix Team DataFrame are: ", "\n")
print(apprix_1, "\n")
print(apprix_2, "\n")

Output:

Apprix Team DataFrame:
       Name          Post Qualification
0     Anish           CEO           MBA
1  Rabindra           CTO            MS
2    Manish  System Admin           MCA
3     Samir    Consultant           PhD
4     Binam      Engineer            BE

The DataFrames formed by splitting the Apprix Team DataFrame are:

       Name Post Qualification
0     Anish  CEO           MBA
1  Rabindra  CTO            MS

     Name          Post Qualification
2  Manish  System Admin           MCA
3   Samir    Consultant           PhD
4   Binam      Engineer            BE

It splits the DataFrame into two parts using the row index apprix_df. The first part contains apprix_dfthe first two rows of the DataFrame, while the second part contains the last three rows.

We can ilocspecify the rows to split each time in the attribute. [:2,:]means select 2the rows before index ( 2rows at index are not included) and all columns in the DataFrame. Therefore, apprix_df.iloc[:2,:]select the first two rows of the DataFrame apprix_dfat indexes 0and .1


groupby()Split the DataFrame using

import pandas as pd

apprix_df = pd.DataFrame(
    {
        "Name": ["Anish", "Rabindra", "Manish", "Samir", "Binam"],
        "Post": ["CEO", "CTO", "System Admin", "Consultant", "Engineer"],
        "Qualification": ["MBA", "MS", "MS", "PhD", "MS"],
    }
)

print("Apprix Team DataFrame:")
print(apprix_df, "\n")

groups = apprix_df.groupby(apprix_df.Qualification)
ms_df = groups.get_group("MS")
mba_df = groups.get_group("MBA")
phd_df = groups.get_group("PhD")

print("Group with Qualification MS:")
print(ms_df, "\n")

print("Group with Qualification MBA:")
print(mba_df, "\n")

print("Group with Qualification PhD:")
print(phd_df, "\n")

Output:

Apprix Team DataFrame:
       Name          Post Qualification
0     Anish           CEO           MBA
1  Rabindra           CTO            MS
2    Manish  System Admin            MS
3     Samir    Consultant           PhD
4     Binam      Engineer            MS

Group with Qualification MS:
       Name          Post Qualification
1  Rabindra           CTO            MS
2    Manish  System Admin            MS
4     Binam      Engineer            MS

Group with Qualification MBA:
    Name Post Qualification
0  Anish  CEO           MBA

Group with Qualification PhD:
    Name        Post Qualification
3  Samir  Consultant           PhD

It divides the DataFrame into three parts based on Qualificationthe values ​​of the column . Rows with the same column value will be placed in the same group.apprix_dfQualification

groupby()The function will Qualificationform groups based on the values ​​of the column. We then use get_group()the method to extract the groupby()rows grouped by the method.


sample()Split the DataFrame using

We can form a DataFrame by randomly sampling rows from a DataFrame using sample()the method. We can set the ratio of rows to be sampled from the parent DataFrame.

import pandas as pd

apprix_df = pd.DataFrame(
    {
        "Name": ["Anish", "Rabindra", "Manish", "Samir", "Binam"],
        "Post": ["CEO", "CTO", "System Admin", "Consultant", "Engineer"],
        "Qualification": ["MBA", "MS", "MS", "PhD", "MS"],
    }
)

print("Apprix Team DataFrame:")
print(apprix_df, "\n")

random_df = apprix_df.sample(frac=0.4, random_state=60)

print("Random split from the Apprix Team DataFrame:")
print(random_df)

Output:

Apprix Team DataFrame:
       Name          Post Qualification
0     Anish           CEO           MBA
1  Rabindra           CTO            MS
2    Manish  System Admin            MS
3     Samir    Consultant           PhD
4     Binam      Engineer            MS

Random split from the Apprix Team DataFrame:
    Name      Post Qualification
0  Anish       CEO           MBA
4  Binam  Engineer            MS

It apprix_dfrandomly samples 40% of the rows from the DataFrame and then displays the DataFrame formed by the sampled rows. The setting random_stateis to ensure that each sampling can get the same random sample.

For reprinting, please send an email to 1244347461@qq.com for approval. After obtaining the author's consent, kindly include the source as a link.

Article URL:

Related Articles

How to Convert DataFrame Column to String in Pandas

Publish Date:2025/05/02 Views:161 Category:Python

We will look at methods for converting Pandas DataFrame columns to strings. Pandas Series.astype(str) Method DataFrame.apply() Methods operate on the elements in a column We will use the same DataFrame below in this article. import pandas a

How to count the frequency of values in a Pandas DataFrame

Publish Date:2025/05/02 Views:84 Category:Python

Sometimes, when you use DataFrame , you may want to count the number of times a value occurs in a column, or in other words, calculate the frequency. There are mainly three methods used for this. Let's look at them one by one. df.groupby().

How to get value from Pandas DataFrame cell

Publish Date:2025/05/02 Views:147 Category:Python

We'll look at using to get values ​​from cells in iloc Pandas , which is great for selecting by position, and how it differs from . We'll also learn about the and methods, which we can use when we don't want to set the return type to .

How to Add a Row to a Pandas DataFrame

Publish Date:2025/05/02 Views:127 Category:Python

Pandas is designed to load a fully populated DataFrame . We can pandas.DataFrame add them one by one in . This can be done by using various methods, such as .loc , dictionary, pandas.concat() or DataFrame.append() . .loc [index] Add rows to

How to change the order of Panas DataFrame columns

Publish Date:2025/05/02 Views:184 Category:Python

We will show how to use insert and reindex to change the order of columns in different ways pandas.DataFrame , such as assigning column names in a desired order. pandas.DataFrame Sort the columns in the new order The easiest way is columns

How to pretty print an entire Pandas Series/DataFrame

Publish Date:2025/05/02 Views:167 Category:Python

We will introduce various methods to pretty print the entire Pandas Series/DataFrame, such as option_context, set_option, and options.display. option_context Pretty Printing Pandas DataFrame We can option_context use with one or more option

How to Convert a Pandas Dataframe to a NumPy Array

Publish Date:2025/05/02 Views:151 Category:Python

We will introduce to_numpy() the method to pandas.Dataframe convert a to NumPy an array, which is introduced in pandas v0.24.0, replacing the old .values method. We can define it on Index , Series , and DataFrame objects to_numpy . The old

How to add a header row to a Pandas DataFrame

Publish Date:2025/05/02 Views:161 Category:Python

We will look at methods for adding a header row to a pandas dataframe, as well as the option to pass in the names directly in the dataframe or by assigning the column names in a list directly to dataframe.columns the method. We will also in

Scan to Read All Tech Tutorials

Social Media
  • https://www.github.com/onmpw
  • qq:1244347461

Recommended

Tags

Scan the Code
Easier Access Tutorial