Search Platform/Weekly Updates/2024-12-13
Ongoing work
Language Stuff: Kuromoji+Sudachi
- Recently spent a fair amount of time improving analysis tools to deal with issues that came up when running and reviewing the Sudachi tokenizer (suggested as an option by a reviewer).
- Determined a reasonable Sudachi config and ran examples for review.
- Completed a first analysis of the evals of the Kuromoji and ICU tokenizers. Kuromoji is the better of the two, and very good on Wikipedia articles and queries (at the sentence level). It is less good on Wiktionary articles and queries, so I need to look a little more closely at those.