Great news! Our paper “Automated Phrase Mining from Massive Text Corpora” has just been accepted by IEEE Transactions on Knowledge and Data Engineering! Please check out the toolkit on my GitHub: AutoPhrase. It currently supports English, Spanish, and Chinese.
If you encounter any issues, please raise them on GitHub, and we will do our best to answer. Meanwhile, we will keep adding more languages; Japanese and Arabic are on their way.
If you are using our tools, please cite the following two papers:
- Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R Voss, Jiawei Han, “Automated Phrase Mining from Massive Text Corpora”, accepted by IEEE Transactions on Knowledge and Data Engineering, Feb. 2018.
- Jialu Liu*, Jingbo Shang*, Chi Wang, Xiang Ren and Jiawei Han, “Mining Quality Phrases from Massive Text Corpora”, Proc. of 2015 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD’15), Melbourne, Australia, May 2015. (* equally contributed, slides)