One Cut Into Two | Challenge

# Introduction

In this challenge, we will implement a function that tokenizes a given string according to three rules: character-level tokenization for Chinese characters, subword-level tokenization of English words using the greedy longest-match-first algorithm, and removal of all other symbols. The function takes a string as input and returns the tokens as a list.
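To make the three rules concrete, here is a minimal sketch in Python. The function name `tokenize`, the `vocab` parameter (a set of known English subwords), and the fallback of emitting a single character when no subword matches are assumptions for illustration; the challenge may specify a different signature or vocabulary format.

```python
import re

def tokenize(text, vocab):
    """Tokenize `text`: one token per Chinese character, greedy
    longest-match-first subwords for English words, and every
    other symbol dropped. Returns the tokens as a list."""
    tokens = []
    # Keep only Chinese characters and runs of English letters;
    # everything else (digits, punctuation, whitespace) is discarded.
    for piece in re.findall(r"[\u4e00-\u9fff]|[A-Za-z]+", text):
        if re.match(r"[\u4e00-\u9fff]", piece):
            tokens.append(piece)  # character-level: each Chinese char stands alone
            continue
        # Greedy longest-match-first: take the longest prefix of the
        # remaining word that is in the vocabulary, then advance.
        start = 0
        while start < len(piece):
            end = len(piece)
            while end > start and piece[start:end] not in vocab:
                end -= 1
            if end == start:
                # Assumption: fall back to a single character when
                # no vocabulary entry matches.
                tokens.append(piece[start])
                start += 1
            else:
                tokens.append(piece[start:end])
                start = end
    return tokens

# Hypothetical vocabulary and input, for illustration only.
vocab = {"hello", "token", "iz", "ation"}
print(tokenize("hello, 世界! tokenization", vocab))
# ['hello', '世', '界', 'token', 'iz', 'ation']
```

Shrinking the candidate match from the right end of the remaining word gives the longest-match-first behavior: at every step the greediest possible split is taken before the scan advances.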

