SIGKDD 2017 Tutorial:

Mining Entity-Relation-Attribute Structures from Massive Text Data

Jingbo Shang, Xiang Ren, Meng Jiang, Jiawei Han
Computer Science Department, University of Illinois at Urbana-Champaign
Time: Aug 13, 2017, Sunday, 8:00 AM to 12:00 PM
Location: Room 200C2


Entity-Relation-Attribute (ERA) structures, which form structured networks between entities and attributes, have proven flexible for storing rich information and effective for gaining insights and knowledge. However, the majority of the massive amount of real-world data is unstructured text, ranging from news articles, social media posts, and advertisements to a wide range of textual information from various domains (e.g., medical records and corporate reports). Without heavy human annotation and curation, most existing approaches have difficulty extracting named entities and their relations, as well as typing and organizing knowledge as networks.


We provide preliminary versions of the slides here. Separate slides for each part can be found in the following sections. The final version will be posted after the tutorial.



Part I. Entity Extraction through Phrase Mining


Part II. Typing Entities and Relations


Part III. Pattern-based Methods for Attribute Discovery


  • Meng Jiang, Jingbo Shang, Taylor Cassidy, Xiang Ren, Lance Kaplan, Timothy Hanratty and Jiawei Han, “MetaPAD: Meta Pattern Discovery from Massive Text Corpora”, in Proc. of 2017 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD’17), Halifax, Nova Scotia, Canada, Aug. 2017

Part IV. Structure Discovery from Text: Application Exploration


Summary and Future Directions



Jingbo Shang is a Ph.D. student in the Department of Computer Science, University of Illinois at Urbana-Champaign. His research focuses on mining and constructing structured knowledge from massive text corpora. He is the recipient of the Computer Science Excellence Scholarship and the Grand Prize of the Yelp Dataset Challenge in 2015. He received a Google PhD Fellowship in Structured Data and Database Management in 2017.

Xiang Ren is an incoming assistant professor in the Department of Computer Science at USC and a member of the USC Machine Learning Center. Currently, he is a visiting researcher at Stanford University and a PhD candidate in Computer Science at UIUC, where he works with Jiawei Han. Xiang’s research develops data-driven and machine learning methods for turning unstructured text data into machine-actionable structures. More broadly, his research interests span data mining, machine learning, and natural language processing, with a focus on making sense of big text data. His research has been recognized with several prestigious awards, including a Google PhD Fellowship, a Yahoo!-DAIS Research Excellence Award, a WWW 2017 Outstanding Reviewer Award, a Yelp Dataset Challenge award and a C. W. Gear Outstanding Graduate Student Award from CS@Illinois. Technologies he developed have been transferred to the US Army Research Lab, the National Institutes of Health, Microsoft, Yelp and TripAdvisor.

Meng Jiang received his B.E. and Ph.D. degrees in 2010 and 2015, respectively, from the Department of Computer Science and Technology at Tsinghua University. He is now an assistant professor in the Department of Computer Science and Engineering at the University of Notre Dame. He worked as a postdoctoral research associate at the University of Illinois at Urbana-Champaign from 2015 to 2017. He has published over 20 papers on behavior modeling and information extraction in top conferences and journals in the field, such as IEEE TKDE, ACM SIGKDD, AAAI, ACM CIKM and IEEE ICDM. He has also delivered six tutorials on the same topics at major conferences. He was a best paper finalist at ACM SIGKDD 2014.

Jiawei Han is Abel Bliss Professor in the Department of Computer Science at the University of Illinois. His research covers data mining, information network analysis, and database systems, with over 600 publications. He served as the founding Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data (TKDD). Jiawei has received the ACM SIGKDD Innovation Award (2004), the IEEE Computer Society Technical Achievement Award (2005), the IEEE Computer Society W. Wallace McDowell Award (2009), and the Daniel C. Drucker Eminent Faculty Award at UIUC (2011). He is a Fellow of ACM and a Fellow of IEEE. He is currently the Director of the Information Network Academic Research Center (INARC), supported by the Network Science-Collaborative Technology Alliance (NS-CTA) program of the U.S. Army Research Lab. His co-authored textbook “Data Mining: Concepts and Techniques” (Morgan Kaufmann) has been adopted worldwide.