NLPIR-ICTCLAS Chinese Lexical Analysis System

Multi-language Word Segmentation

Automatic tokenization and part-of-speech (POS) tagging for both Chinese and English text.

Keyword Extraction

Extracts keywords with an information-entropy algorithm, covering both words already recorded in the lexicon and words not yet recorded in it.

New Word Identification and Adaptive Word Segmentation

Identifies new words in a given untagged corpus using information entropy. The extracted new words are added to the language model, which is then used to analyze high-frequency words and achieve adaptive segmentation.
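
The entropy-based idea above can be illustrated with a small sketch. It is not NLPIR's implementation or API: it assumes one common reading of "information entropy" for new-word discovery, namely left/right branching entropy, where a character string that is followed and preceded by many different characters is likely to be a free-standing word. The function names (`branching_entropy`, `candidate_new_words`) and the thresholds (`max_len`, `min_freq`, `min_entropy`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def branching_entropy(neighbors):
    """Shannon entropy (in bits) of a neighbor-character frequency table."""
    total = sum(neighbors.values())
    return -sum((c / total) * math.log2(c / total) for c in neighbors.values())


def candidate_new_words(text, known_words, max_len=4, min_freq=2, min_entropy=1.0):
    """Return (string, frequency, score) for strings that look like unknown words.

    Assumption: a string is a word candidate when both its left and right
    neighbor distributions are diverse, i.e. have high entropy.
    """
    freq = Counter()
    left = defaultdict(Counter)   # string -> counts of the preceding character
    right = defaultdict(Counter)  # string -> counts of the following character

    # Count every substring of length 2..max_len together with its neighbors.
    for n in range(2, max_len + 1):
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            freq[gram] += 1
            if i > 0:
                left[gram][text[i - 1]] += 1
            if i + n < len(text):
                right[gram][text[i + n]] += 1

    result = []
    for gram, count in freq.items():
        if count < min_freq or gram in known_words:
            continue  # too rare, or already in the lexicon
        if not left[gram] or not right[gram]:
            continue  # only seen at a text boundary: no evidence on one side
        # The weaker side decides: a true word varies freely on both sides.
        score = min(branching_entropy(left[gram]), branching_entropy(right[gram]))
        if score >= min_entropy:
            result.append((gram, count, score))

    # Highest entropy first, then highest frequency, then alphabetical.
    return sorted(result, key=lambda r: (-r[2], -r[1], r[0]))
```

In an adaptive setup, the strings returned here would be reviewed and then added to the segmenter's lexicon so that later passes treat them as single tokens. Real systems typically combine this score with other statistics and tune the thresholds on the target corpus.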

User-defined Lexicon

Imports user-defined words into the NLPIR system one at a time or in bulk, refining segmentation results in real time.
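
To show how a user-defined lexicon can change segmentation output immediately, here is a minimal sketch. It does not use or reproduce the NLPIR API: `LexiconSegmenter`, `add_word`, and `import_words` are hypothetical names, and the segmenter is a toy forward-maximum-matching algorithm standing in for the real statistical model.

```python
class LexiconSegmenter:
    """Toy forward-maximum-matching segmenter with an editable lexicon."""

    def __init__(self, words=()):
        self.words = set(words)
        # Longest lexicon entry bounds how far ahead segment() has to look.
        self.max_len = max((len(w) for w in self.words), default=1)

    def add_word(self, word):
        """Add a single user-defined word; blank input is ignored."""
        word = word.strip()
        if word:
            self.words.add(word)
            self.max_len = max(self.max_len, len(word))

    def import_words(self, lines):
        """Bulk import, one word per line; returns how many new words were added."""
        before = len(self.words)
        for line in lines:
            self.add_word(line)
        return len(self.words) - before

    def segment(self, text):
        """Greedily take the longest lexicon match at each position."""
        tokens, i = [], 0
        while i < len(text):
            for n in range(min(self.max_len, len(text) - i), 0, -1):
                piece = text[i:i + n]
                # Fall back to a single character when nothing matches.
                if n == 1 or piece in self.words:
                    tokens.append(piece)
                    i += n
                    break
        return tokens


seg = LexiconSegmenter(["中文", "分词"])
before = seg.segment("中文分词系统")   # "系统" is unknown, so it is split
seg.add_word("系统")                   # takes effect on the next call
after = seg.segment("中文分词系统")
```

Because the lexicon is an in-memory set, a newly added word affects the very next `segment` call, which is the behavior the "real time" claim describes. The bulk path accepts any iterable of lines, so an open text file can be passed directly.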
