The value_counts() method in pandas counts how many times each unique value occurs in a Series. It returns a new Series whose index holds the unique values and whose values are their counts, sorted in descending order by default. This method is particularly useful for understanding the distribution of categorical data.
Key Features:
- Counts Unique Values: It counts how many times each unique value appears in the Series.
- Sorting: By default, the results are sorted in descending order based on the counts.
- Normalization: You can normalize the counts to get the relative frequencies by setting the normalize parameter to True.
- Drop NaN: By default, NaN values are excluded from the counts, but you can include them by setting the dropna parameter to False.
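The sorting and missing-value options above can be sketched as follows. The Series name fruits and its contents are made up for illustration; only the dropna and ascending parameters come from pandas itself.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data containing one missing value
fruits = pd.Series(["apple", "banana", np.nan, "apple"])

# Default behavior: NaN is ignored, most frequent value comes first
default_counts = fruits.value_counts()

# dropna=False keeps NaN as its own entry in the result
with_nan = fruits.value_counts(dropna=False)

# ascending=True reverses the sort so the rarest value comes first
rarest_first = fruits.value_counts(ascending=True)

print(default_counts)
print(with_nan)
print(rarest_first)
```

With dropna=False the counts add up to the length of the Series, which is a quick way to check whether any values are missing.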
Example:
Here’s a simple example of how to use value_counts():
import pandas as pd
# Sample Series
data = pd.Series(['apple', 'banana', 'apple', 'orange', 'banana', 'banana'])
# Count unique values
counts = data.value_counts()
print(counts)
Output:
banana    3
apple     2
orange    1
dtype: int64
Normalization Example:
To get the relative frequencies:
relative_counts = data.value_counts(normalize=True)
print(relative_counts)
Output:
banana    0.500000
apple     0.333333
orange    0.166667
dtype: float64
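As a rough check on what normalize=True does, the sketch below recomputes the proportions by hand and places counts and proportions side by side. The names manual and summary and the column labels "count" and "share" are illustrative choices, not part of the pandas API.

```python
import pandas as pd

data = pd.Series(['apple', 'banana', 'apple', 'orange', 'banana', 'banana'])

counts = data.value_counts()
relative_counts = data.value_counts(normalize=True)

# Each proportion is the count divided by the total number of counted values
manual = counts / counts.sum()

# Combine both views into one table for easier reading
summary = pd.DataFrame({"count": counts, "share": relative_counts})
print(summary)
```

The proportions always sum to 1, so multiplying them by 100 gives percentages directly.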
Conclusion:
The value_counts() method is a convenient way to summarize categorical data and understand how frequently each category appears in your dataset.
