SIGKDD 2017 Tutorial:
Mining Entity-Relation-Attribute Structures from Massive Text Data
Jingbo Shang, Xiang Ren, Meng Jiang, Jiawei Han
Computer Science Department, University of Illinois at Urbana-Champaign
Time: Aug 13, 2017, Sunday, 8:00AM - 12:00PM
Location: Room 200C2
Entity-Relation-Attribute (ERA) structures, which form structured networks between entities and attributes, have proven flexible for storing rich information and effective for gaining insights and knowledge. However, the majority of the massive amount of real-world data is unstructured text, ranging from news articles, social media posts, and advertisements to a wide range of textual information from various domains (e.g., medical records, corporate reports). Without heavy human annotation and curation, most existing approaches have difficulty extracting named entities and their relations, as well as typing and organizing knowledge as networks.
We provide preliminary versions of the slides here. Separate slides for each part can be found in the following sections. The final version will be posted after the tutorial.
Part I. Entity Extraction through Phrase Mining
- Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R Voss, Jiawei Han, “Automated Phrase Mining from Massive Text Corpora”, submitted to TKDE, under review.
- Jialu Liu, Jingbo Shang, and Jiawei Han, “Phrase Mining from Massive Text and Its Applications”, Synthesis Lectures on Data Mining and Knowledge Discovery, Morgan & Claypool Publishers, 2017.
- Jialu Liu, Jingbo Shang, Chi Wang, Xiang Ren, Jiawei Han, “Mining Quality Phrases from Massive Text Corpora”, in Proc. of 2015 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD’15), Melbourne, Australia, May 2015 (won Grand Prize in Yelp Dataset Challenge, 2015)
- Ahmed El-Kishky, Yanglei Song, Chi Wang, Clare R. Voss, and Jiawei Han, “Scalable Topical Phrase Mining from Text Corpora”, PVLDB 8(3): 305 - 316, 2015. Also, in Proc. 2015 Int. Conf. on Very Large Data Bases (VLDB’15), Kohala Coast, Hawaii, Sept. 2015.
Part II. Typing Entities and Relations
- Liyuan Liu, Xiang Ren, Qi Zhu, Shi Zhi, Huan Gui, Heng Ji and Jiawei Han, “Heterogeneous Supervision for Relation Extraction: A Representation Learning Approach”, in Proc. of 2017 Conf. on Empirical Methods in Natural Language Processing (EMNLP’17), Copenhagen, Denmark, Sept. 2017
- Xiang Ren, Zeqiu Wu, Wenqi He, Meng Qu, Clare Voss, Heng Ji, Tarek Abdelzaher and Jiawei Han, “CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases”, in Proc. of 2017 World-Wide Web Conf. (WWW’17), Perth, Australia, Apr. 2017.
- Xiang Ren, Wenqi He, Meng Qu, Lifu Huang, Heng Ji, and Jiawei Han, “AFET: Automatic Fine-Grained Entity Typing by Hierarchical Partial-Label Embedding”, in Proc. of 2016 Conf. on Empirical Methods in Natural Language Processing (EMNLP’16), Austin, TX, Nov. 2016
- Xiang Ren, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Jiawei Han, “Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label Embedding”, in Proc. of 2016 ACM SIGKDD Conf. on Knowledge Discovery and Data Mining (KDD’16), San Francisco, CA, Aug. 2016
- Xiang Ren, Ahmed El-Kishky, Chi Wang, Fangbo Tao, Clare R. Voss, Heng Ji, Jiawei Han, “ClusType: Effective Entity Recognition and Typing by Relation Phrase-Based Clustering”, in Proc. of 2015 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD’15), Sydney, Australia, Aug. 2015
Part III. Pattern-based Methods for Attribute Discovery
- Meng Jiang, Jingbo Shang, Taylor Cassidy, Xiang Ren, Lance Kaplan, Timothy Hanratty and Jiawei Han, “MetaPAD: Meta Pattern Discovery from Massive Text Corpora”, in Proc. of 2017 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD’17), Halifax, Nova Scotia, Canada, Aug. 2017
Part IV. Structure Discovery from Text: Application Exploration
- Huan Gui, Qi Zhu, Liyuan Liu, Aston Zhang, and Jiawei Han, “Expert Finding in Heterogeneous Bibliographic Networks with Locally-trained Embeddings”, submitted for publication
- Jiaming Shen, Zeqiu Wu, Dongming Lei, Jingbo Shang, Xiang Ren, Jiawei Han, “SetExpan: Corpus-based Set Expansion via Context Feature Selection and Rank Ensemble”, in Proc. of 2017 European Conf. on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD’17), Skopje, Macedonia, Sept. 2017
- Meng Qu, Xiang Ren and Jiawei Han, “Automatic Synonym Discovery with Knowledge Bases”, in Proc. of 2017 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD’17), Halifax, Nova Scotia, Canada, Aug. 2017
- Fangbo Tao, Honglei Zhuang, Chi Wang Yu, Qi Wang, Taylor Cassidy, Lance Kaplan, Clare Voss, Jiawei Han, “Multi-Dimensional, Phrase-Based Summarization in Text Cubes”, Data Eng. Bulletin 39(3), Sept. 2016, pp. 74-84.
- Jialu Liu, Xiang Ren, Jingbo Shang, Taylor Cassidy, Clare Voss and Jiawei Han, “Representing Documents via Latent Keyphrase Inference”, in Proc. of 2016 Int. World-Wide Web Conf. (WWW’16), Montreal, Canada, April 2016
Summary and Future Directions
Jingbo Shang is a Ph.D. student in the Department of Computer Science, University of Illinois at Urbana-Champaign. His research focuses on mining and constructing structured knowledge from massive text corpora. He is the recipient of the Computer Science Excellence Scholarship and the Grand Prize of the Yelp Dataset Challenge in 2015. He received a Google PhD Fellowship in Structured Data and Database Management in 2017.
Xiang Ren is an incoming assistant professor in the Department of Computer Science at USC and a member of the USC Machine Learning Center. Currently, he is a visiting researcher at Stanford University and a PhD candidate in Computer Science at UIUC, where he works with Jiawei Han. Xiang’s research develops data-driven and machine learning methods for turning unstructured text data into machine-actionable structures. More broadly, his research interests span data mining, machine learning, and natural language processing, with a focus on making sense of big text data. His research has been recognized with several prestigious awards, including a Google PhD Fellowship, a Yahoo!-DAIS Research Excellence Award, a WWW 2017 Outstanding Reviewer Award, a Yelp Dataset Challenge award and a C. W. Gear Outstanding Graduate Student Award from CS@Illinois. Technologies he developed have been transferred to the US Army Research Lab, the National Institutes of Health, Microsoft, Yelp and TripAdvisor.
Meng Jiang received his B.E. and Ph.D. degrees in 2010 and 2015 from the Department of Computer Science and Technology at Tsinghua University. He is now an assistant professor in the Department of Computer Science and Engineering at the University of Notre Dame. He worked as a postdoctoral research associate at the University of Illinois at Urbana-Champaign from 2015 to 2017. He has published over 20 papers on behavior modeling and information extraction in top conferences and journals in the field, such as IEEE TKDE, ACM SIGKDD, AAAI, ACM CIKM and IEEE ICDM. He has also delivered six tutorials on these topics at major conferences. He was a best paper finalist at ACM SIGKDD 2014.
Jiawei Han is Abel Bliss Professor in the Department of Computer Science at the University of Illinois. His research covers data mining, information network analysis, and database systems, with over 600 publications. He served as the founding Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data (TKDD). Jiawei has received the ACM SIGKDD Innovation Award (2004), the IEEE Computer Society Technical Achievement Award (2005), the IEEE Computer Society W. Wallace McDowell Award (2009), and the Daniel C. Drucker Eminent Faculty Award at UIUC (2011). He is a Fellow of ACM and a Fellow of IEEE. He is currently the Director of the Information Network Academic Research Center (INARC), supported by the Network Science-Collaborative Technology Alliance (NS-CTA) program of the U.S. Army Research Lab. His co-authored textbook “Data Mining: Concepts and Techniques” (Morgan Kaufmann) has been adopted worldwide.