Filtering Data in a Pandas DataFrame
Filtering data in a Pandas DataFrame is a common task in data analysis and manipulation. Pandas provides several ways to filter data based on various conditions, allowing you to extract the specific information you need from your dataset.
Basic Filtering
The most basic way to filter a DataFrame is by using boolean indexing. This involves creating a boolean mask, which is a Series of True and False values that correspond to the rows in the DataFrame. You can then use this mask to select the rows that meet the specified condition.
import pandas as pd

# Example DataFrame
data = {'Name': ['John', 'Jane', 'Bob', 'Alice'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'London', 'Paris', 'Tokyo']}
df = pd.DataFrame(data)

# Filter for rows where Age is greater than 30
mask = df['Age'] > 30
filtered_df = df[mask]
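To make the mask itself visible, the following minimal sketch rebuilds the same example DataFrame and inspects the intermediate values. The variable names older_names and renumbered are illustrative only and do not appear in the original example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Jane", "Bob", "Alice"],
    "Age": [25, 30, 35, 40],
    "City": ["New York", "London", "Paris", "Tokyo"],
})

# The mask is a boolean Series aligned with df's index, one value per row.
mask = df["Age"] > 30
print(mask.tolist())  # [False, False, True, True]

# df[mask] and df.loc[mask] select the same rows;
# .loc additionally lets you pick columns in the same step.
older_names = df.loc[mask, "Name"]
print(older_names.tolist())  # ['Bob', 'Alice']

# Filtering keeps the original index labels (2 and 3 here).
# reset_index(drop=True) renumbers the result from 0 if that is preferred.
renumbered = df[mask].reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]
```

Note that filtering returns a new DataFrame and leaves df unchanged.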
Multiple Conditions
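You can also filter a DataFrame on multiple conditions by combining boolean expressions with the element-wise operators & (and), | (or), and ~ (not). Wrap each condition in parentheses, because these operators bind more tightly than comparisons such as > and ==.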
# Filter for rows where Age is greater than 30 and City is Paris
mask = (df['Age'] > 30) & (df['City'] == 'Paris')
filtered_df = df[mask]
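The example above covers only the & operator. As a rough sketch on the same example data, the block below shows | and ~ as well, and demonstrates why Python's own and keyword cannot be used in their place. The names either, not_paris, and raised are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Jane", "Bob", "Alice"],
    "Age": [25, 30, 35, 40],
    "City": ["New York", "London", "Paris", "Tokyo"],
})

# OR: rows where Age is below 30 or City is Tokyo.
either = df[(df["Age"] < 30) | (df["City"] == "Tokyo")]
print(either["Name"].tolist())  # ['John', 'Alice']

# NOT: rows where City is anything except Paris.
not_paris = df[~(df["City"] == "Paris")]
print(not_paris["Name"].tolist())  # ['John', 'Jane', 'Alice']

# Python's `and` asks for a single truth value of a whole Series,
# which pandas refuses to guess, so it raises ValueError.
raised = False
try:
    df[(df["Age"] > 30) and (df["City"] == "Paris")]
except ValueError:
    raised = True
print(raised)  # True
```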
Filtering with isin()
The isin() method is useful when you want to keep the rows whose value in a column matches any item in a list of values.
# Filter for rows where City is either New York or Tokyo
cities = ['New York', 'Tokyo']
mask = df['City'].isin(cities)
filtered_df = df[mask]
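A common variation is excluding a list of values rather than selecting it. The sketch below, again on the example data, negates the isin() mask with ~ and checks that isin() gives the same result as chaining == comparisons with |. The names included, excluded, and chained are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Jane", "Bob", "Alice"],
    "Age": [25, 30, 35, 40],
    "City": ["New York", "London", "Paris", "Tokyo"],
})

cities = ["New York", "Tokyo"]

# Rows whose City is in the list.
included = df[df["City"].isin(cities)]
print(included["Name"].tolist())  # ['John', 'Alice']

# Rows whose City is NOT in the list: negate the mask with ~.
excluded = df[~df["City"].isin(cities)]
print(excluded["Name"].tolist())  # ['Jane', 'Bob']

# isin() is equivalent to chaining equality checks with |,
# but stays readable as the list of values grows.
chained = df[(df["City"] == "New York") | (df["City"] == "Tokyo")]
print(included.equals(chained))  # True
```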
Filtering with query()
Pandas also provides the query() method, which lets you filter a DataFrame with a string expression that refers to columns by name.
# Filter for rows where Age is greater than 30 and City is Paris
filtered_df = df.query('Age > 30 and City == "Paris"')
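Thresholds and value lists often live in Python variables rather than in the expression string. As a sketch, query() can reference local variables by prefixing their names with @. The variables min_age and cities below are illustrative, and the boolean-mask version is included only to show that both approaches select the same rows.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Jane", "Bob", "Alice"],
    "Age": [25, 30, 35, 40],
    "City": ["New York", "London", "Paris", "Tokyo"],
})

min_age = 30
cities = ["Paris", "Tokyo"]

# @name pulls a local Python variable into the query expression.
result = df.query("Age > @min_age and City in @cities")
print(result["Name"].tolist())  # ['Bob', 'Alice']

# The same filter written with a boolean mask and isin().
expected = df[(df["Age"] > min_age) & df["City"].isin(cities)]
print(result.equals(expected))  # True
```

The query() form tends to read more cleanly for long conditions, while boolean masks are easier to build up programmatically.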
By understanding these various filtering techniques, you can effectively extract the data you need from your Pandas DataFrames. In the next section, we'll explore some more advanced filtering methods.