Handling NaN and Duplicates

# Introduction

In this challenge, you will deal with a dataset containing missing (NaN) values and duplicate entries. The main objective is to clean and preprocess the dataset by handling these NaN and duplicate values using the pandas library. This challenge will test your ability to work with complex data structures, manipulate and analyze data, and make decisions based on the dataset's characteristics.

## Load data as a Pandas DataFrame

In this challenge, you will work with a hypothetical dataset, which contains information about sales transactions for an e-commerce store. The dataset consists of the following columns:

1. `transaction_id`: A unique identifier for each transaction (integer)
2. `customer_id`: A unique identifier for each customer (integer)
3. `product_id`: A unique identifier for each product (integer)
4. `product_category`: The category of the product (string)
5. `transaction_date`: The date of the transaction (date)
6. `quantity`: The number of units purchased (integer)
7. `price`: The price per unit (float)
8. `rating`: The customer's rating for the product (float, 1-5)

Here is a sample record from the dataset:

| transaction_id | customer_id | product_id | product_category | transaction_date | quantity | price  | rating |
| -------------- | ----------- | ---------- | ---------------- | ---------------- | -------- | ------ | ------ |
| 1              | 101         | 301        | Electronics      | NaN              | 2        | 199.99 | NaN    |

You can load this sample record into a Pandas DataFrame as follows:

```python
import pandas as pd

# Build a one-row DataFrame matching the sample record above.
# None marks the missing transaction_date and rating values.
df = pd.DataFrame({
    'transaction_id': [1],
    'customer_id': [101],
    'product_id': [301],
    'product_category': ['Electronics'],
    'transaction_date': [None],
    'quantity': [2],
    'price': [199.99],
    'rating': [None]
})
```
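Before cleaning, it usually helps to measure how many values are missing and how many rows are repeated. The following is a minimal sketch, not the challenge's reference solution. It assumes a hypothetical two-row frame in which the sample record appears twice, so that the duplicate check has something to find. The names `nan_counts`, `duplicate_count`, and `deduped` are illustrative only.

```python
import pandas as pd

# Hypothetical data: the sample record repeated once. The second row is an
# assumption added for illustration; it is not part of the original dataset.
df = pd.DataFrame({
    'transaction_id': [1, 1],
    'customer_id': [101, 101],
    'product_id': [301, 301],
    'product_category': ['Electronics', 'Electronics'],
    'transaction_date': [None, None],
    'quantity': [2, 2],
    'price': [199.99, 199.99],
    'rating': [None, None],
})

# Count missing values in each column.
nan_counts = df.isna().sum()

# Count rows that exactly repeat an earlier row.
duplicate_count = int(df.duplicated().sum())

# Remove exact duplicates, keeping the first occurrence of each row.
deduped = df.drop_duplicates()

print(nan_counts)
print("duplicate rows:", duplicate_count)
print("rows after deduplication:", len(deduped))
```

Whether to drop, fill, or keep the missing `transaction_date` and `rating` values depends on the dataset's characteristics, so this sketch only reports them.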
