Text Document Classification

# Introduction

This lab demonstrates how to use scikit-learn to classify text documents into different categories. We will use the 20 newsgroups dataset, which contains around 18,000 newsgroup posts on 20 topics. We will use a bag of words approach and a Tf-idf-weighted document-term sparse matrix to encode the features. The lab will also demonstrate various classifiers that can efficiently handle sparse matrices.

## VM Tips

After the VM startup is done, click the top left corner to switch to the **Notebook** tab to access Jupyter Notebook for practice.

Sometimes, you may need to wait a few seconds for Jupyter Notebook to finish loading. The validation of operations cannot be automated because of limitations in Jupyter Notebook.

If you face issues during learning, feel free to ask Labby. Provide feedback after the session, and we will promptly resolve the problem for you.
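To make the approach described in the introduction more concrete, here is a minimal sketch of a Tf-idf encoding followed by a linear classifier that accepts sparse input. It is not the lab's own code: the six-document corpus, its labels, and the choice of `RidgeClassifier` are assumptions made for illustration, so that the example runs without downloading anything. In the lab itself the documents would come from the 20 newsgroups dataset (for example via `sklearn.datasets.fetch_20newsgroups`).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import RidgeClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for the 20 newsgroups posts.
train_docs = [
    "the rocket launch reached orbit",
    "nasa plans a new moon mission",
    "the satellite entered orbit around mars",
    "the team won the hockey game",
    "the goalie made a great save in the game",
    "hockey playoffs start next week",
]
train_labels = ["space", "space", "space", "hockey", "hockey", "hockey"]

# Bag of words with Tf-idf weighting: one row per document, one column
# per vocabulary term, stored as a SciPy sparse matrix.
vectorizer = TfidfVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train_docs)
print("document-term matrix shape:", X_train.shape)

# Chain the vectorizer and a classifier that handles sparse matrices,
# so raw text can be passed directly to fit() and predict().
model = make_pipeline(TfidfVectorizer(stop_words="english"), RidgeClassifier())
model.fit(train_docs, train_labels)

test_docs = [
    "the shuttle returned from orbit",
    "a late goal decided the hockey game",
]
predictions = model.predict(test_docs)
print(list(predictions))
```

Wrapping both steps in a pipeline keeps the vocabulary learned from the training documents tied to the classifier, so new documents are encoded with the same columns at prediction time.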
