Comparing Pandas DataFrame Objects
This tutorial explains how to compare Pandas DataFrame objects in Python. We can use the == operator to compare DataFrames.
import pandas as pd
data_season1 = {
"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
"Goals": [10, 8, 6, 5, 4],
}
data_season2 = {
"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
"Goals": [7, 8, 6, 7, 4],
}
df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2)
print("df_1:")
print(df_1)
print("")
print("df_2:")
print(df_2)
Output:
df_1:
Player Goals
0 Lewandowski 10
1 Haland 8
2 Ronaldo 6
3 Messi 5
4 Mbappe 4
df_2:
Player Goals
0 Lewandowski 7
1 Haland 8
2 Ronaldo 6
3 Messi 7
4 Mbappe 4
In this article, we will use the DataFrames df_1 and df_2 to demonstrate DataFrame comparison.
Comparing Pandas DataFrame Objects Using the == Operator
import pandas as pd
data_season1 = {
"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
"Goals": [10, 8, 6, 5, 4],
}
data_season2 = {
"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
"Goals": [7, 8, 6, 7, 4],
}
df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2)
print(df_1 == df_2)
Output:
Player Goals
0 True False
1 True True
2 True True
3 True False
4 True True
The == operator compares the corresponding elements of df_1 and df_2. It returns True if the elements at a given position are the same, and False otherwise.
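The result of the comparison is itself a DataFrame of boolean values, so we can keep working with it. The following is a minimal sketch, assuming the same df_1 and df_2 as above; the variable name result is only for illustration, and DataFrame.eq() is used as the method form of the same element-wise comparison.

```python
import pandas as pd

df_1 = pd.DataFrame(
    {
        "Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
        "Goals": [10, 8, 6, 5, 4],
    }
)
df_2 = pd.DataFrame(
    {
        "Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
        "Goals": [7, 8, 6, 7, 4],
    }
)

# Element-wise comparison; the result has the same shape as the inputs
result = df_1 == df_2

# True counts as 1, so summing gives the number of matching cells per column
print(result.sum())

# DataFrame.eq() performs the same element-wise comparison as ==
print(result.equals(df_1.eq(df_2)))
```

Here all five Player values match, while only three of the five Goals values do.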
We can use the pandas.DataFrame.all() method to find out which rows in df_1 and df_2 are the same.
import pandas as pd
data_season1 = {
"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
"Goals": [10, 8, 6, 5, 4],
}
data_season2 = {
"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
"Goals": [7, 8, 6, 7, 4],
}
df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2)
print((df_1 == df_2).all(axis=1))
Output:
0 False
1 True
2 True
3 False
4 True
dtype: bool
In the output, the rows with the value True are identical in df_1 and df_2. The rows with the value False differ in at least one column.
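If we only need a single answer rather than a row-by-row result, the same boolean Series can be reduced further. This is a small sketch under the assumption that both DataFrames share the same index and columns; rows_equal is an illustrative name, and DataFrame.equals() is shown as an alternative whole-frame check.

```python
import pandas as pd

df_1 = pd.DataFrame(
    {
        "Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
        "Goals": [10, 8, 6, 5, 4],
    }
)
df_2 = pd.DataFrame(
    {
        "Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
        "Goals": [7, 8, 6, 7, 4],
    }
)

# One boolean per row: True when every column matches
rows_equal = (df_1 == df_2).all(axis=1)

# Number of rows that are identical in both DataFrames
print(rows_equal.sum())

# True only if every row matches
print(rows_equal.all())

# DataFrame.equals() checks the two DataFrames as a whole
print(df_1.equals(df_2))
```

With the sample data, three rows match, so both whole-frame checks report that the DataFrames are not equal.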
We can use this boolean result as a mask to list all rows where the values of df_1 and df_2 are different.
import pandas as pd
data_season1 = {
"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
"Goals": [10, 8, 6, 5, 4],
}
data_season2 = {
"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
"Goals": [7, 8, 6, 7, 4],
}
df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2)
# ~ inverts the mask, keeping the rows where at least one column differs
print(df_1[~(df_1 == df_2).all(axis=1)])
Output:
Player Goals
0 Lewandowski 10
3 Messi 5
It lists all the rows in df_1 that have values that differ from the corresponding rows in df_2.
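To see the differing values from both DataFrames at once, we can apply the same mask to each of them and join the results. This is only a sketch: the suffixes _df_1 and _df_2 and the names mask and side_by_side are chosen here for illustration.

```python
import pandas as pd

df_1 = pd.DataFrame(
    {
        "Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
        "Goals": [10, 8, 6, 5, 4],
    }
)
df_2 = pd.DataFrame(
    {
        "Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
        "Goals": [7, 8, 6, 7, 4],
    }
)

# True for rows where at least one column differs
mask = ~(df_1 == df_2).all(axis=1)

# Join the differing rows on the index; the suffixes keep the
# overlapping column names apart
side_by_side = df_1[mask].join(df_2[mask], lsuffix="_df_1", rsuffix="_df_2")
print(side_by_side)
```

The result keeps rows 0 and 3 and shows the Goals values from both seasons next to each other.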
If df_1 and df_2 had different indexes, we would get an error such as ValueError: Can only compare identically-labeled DataFrame objects.
import pandas as pd
data_season1 = {
"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
"Goals": [10, 8, 6, 5, 4],
}
data_season2 = {
"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
"Goals": [7, 8, 6, 7, 4],
}
df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2, index=["a", "b", "c", "d", "e"])
print(df_1 == df_2)
Output:
Traceback (most recent call last):
...
ValueError: Can only compare identically-labeled DataFrame objects
We can use the pandas.DataFrame.reset_index() method to reset the index and overcome the above problem.
import pandas as pd
data_season1 = {
"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
"Goals": [10, 8, 6, 5, 4],
}
data_season2 = {
"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
"Goals": [7, 8, 6, 7, 4],
}
df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2, index=["a", "b", "c", "d", "e"])
df_2.reset_index(drop=True, inplace=True)
print(df_1 == df_2)
Output:
Player Goals
0 True False
1 True True
2 True True
3 True False
4 True True
It resets the index of df_2 before comparing df_1 and df_2 so that both DataFrames have the same index, making comparison possible.
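If we want to keep the original labels of df_2, a possible variation is to call reset_index() without inplace=True, which returns a new DataFrame and leaves df_2 untouched. The name df_2_positional is only illustrative. Note that the comparison is then made by position, so this sketch assumes the rows of both DataFrames are in the same order.

```python
import pandas as pd

df_1 = pd.DataFrame(
    {
        "Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
        "Goals": [10, 8, 6, 5, 4],
    }
)
df_2 = pd.DataFrame(
    {
        "Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
        "Goals": [7, 8, 6, 7, 4],
    },
    index=["a", "b", "c", "d", "e"],
)

# Without inplace=True, reset_index() returns a new DataFrame
df_2_positional = df_2.reset_index(drop=True)

# df_2 still carries its original labels
print(list(df_2.index))

# The copy has the default 0..4 index, so it can be compared with df_1
print(df_1 == df_2_positional)
```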
You also have to make sure you have the same number of rows in your DataFrames before comparing them.
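One way to guard against this is to check the shape of both DataFrames before comparing them. The following is a minimal sketch; df_short is a hypothetical DataFrame that holds only the first three rows of df_1, and result is an illustrative name.

```python
import pandas as pd

df_1 = pd.DataFrame(
    {
        "Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
        "Goals": [10, 8, 6, 5, 4],
    }
)

# Hypothetical shorter DataFrame with only the first three rows
df_short = df_1.iloc[:3]

# shape is a (rows, columns) tuple, so this checks both dimensions
if df_1.shape == df_short.shape:
    result = df_1 == df_short
    print(result)
else:
    result = None
    print("Cannot compare: shapes are", df_1.shape, "and", df_short.shape)
```

Matching shapes are necessary but not sufficient: the index and column labels still have to be identical, as shown in the previous examples.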