<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://shangjingbo1226.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://shangjingbo1226.github.io/" rel="alternate" type="text/html" /><updated>2026-05-15T19:45:41-07:00</updated><id>https://shangjingbo1226.github.io/feed.xml</id><title type="html">Jingbo Shang</title><subtitle>Associate Professor @ UC San Diego</subtitle><author><name>Jingbo Shang</name><email>jshang@ucsd.edu</email></author><entry><title type="html">NSF C-Accel Award OIA-2040727 (Project Page)</title><link href="https://shangjingbo1226.github.io/2020-11-01-nsf-c-accel" rel="alternate" type="text/html" title="NSF C-Accel Award OIA-2040727 (Project Page)" /><published>2020-11-01T11:00:00-08:00</published><updated>2020-11-01T11:00:00-08:00</updated><id>https://shangjingbo1226.github.io/nsf-c-accel</id><content type="html" xml:base="https://shangjingbo1226.github.io/2020-11-01-nsf-c-accel"><![CDATA[<h3 id="nsf-convergence-accelerator-track-d">NSF Convergence Accelerator Track D:</h3>
<center>
<h1>
Towards Intelligent Sharing and Search for AI Models and Datasets
</h1>
Jingbo Shang, Rajesh Gupta, Lucila Ohno-Machado, Arun Kumar, Giorgio Quer<br />
University of California San Diego &amp; Scripps Research<br />
</center>

<h2 id="abstract">Abstract</h2>

<p>A major goal of AI-driven applications is to discover the underlying patterns in domain-specific datasets, which typically requires tremendous field experience and interdisciplinary knowledge to design or even select suitable AI models. For instance, AI modeling for COVID-19 patient imaging and social distancing datasets requires an understanding of not only the epidemiological processes but also the bioinformatics that informs mutation rates and their effects on models, coupled with socio-economic models that accurately capture living and working conditions. Such a model selection process is far beyond the capabilities of the search services available on existing platforms (e.g., Google Dataset Search, IEEE DataPort, and GitHub).</p>

<p>We envision an open-source, privacy-preserving intelligent system for searching and navigating through large-scale collections of AI models and datasets for scientific and other applications. The envisioned system would transform AI models and datasets into ‘computational resources’ such that model-dataset pairs can be searched and matched easily based on their semantics. It will serve as a sharing portal for models and datasets matched via contextual information, captured as ‘metadata’ that relies upon innovations in metadata methods and tools in the application context. More importantly, the confidential and private information embedded in the models and datasets will be protected by novel, rigorous privacy techniques. In this way, the system would allow clinicians to upload a patient imaging dataset and issue a query such as “Coronavirus hazard assessment from chest CT”; it would then return suitable AI models and related datasets without risk of leaking patient information.</p>

<p>[<a href="https://youtu.be/-yG2ov24ikE">Marking Video</a>]</p>

<h2 id="team">Team</h2>

<ul>
  <li><strong>PI/Co-PI</strong>: <a href="http://shangjingbo1226.github.io/">Jingbo Shang</a>, <a href="http://mesl.ucsd.edu/gupta/">Rajesh Gupta</a>, <a href="https://medschool.ucsd.edu/som/dbmi/people/faculty/Pages/lucila-ohno-machado.aspx">Lucila Ohno-Machado</a>, <a href="http://cseweb.ucsd.edu/~arunkk/">Arun Kumar</a>, <a href="https://www.scripps.edu/science-and-medicine/translational-institute/about/people/giorgio-quer/">Giorgio Quer</a></li>
  <li><strong>Senior Personnel</strong>: <a href="https://medschool.ucsd.edu/som/dbmi/people/fellows/Pages/Luca-Bonomi,-PhD.aspx">Luca Bonomi</a>, <a href="https://cseweb.ucsd.edu/~dehong/">Dezhi Hong</a></li>
  <li><strong>Graduate Student Research Assistants</strong>: Ranak Roy Chowdhury, Zichao Li, Vraj Shah, Xianjie Shen, Zihan Wang, Zeyun Wu</li>
  <li><strong>External Partners</strong>: <a href="https://ieee-dataport.org/">IEEE DataPort</a>, <a href="https://www.tensorflow.org/tfx">Google TensorFlow Extended Team</a>, <a href="https://aws.amazon.com/">Amazon AWS</a>, <a href="https://databricks.com/">Databricks</a>, <a href="https://www.openml.org/">OpenML</a>, <a href="https://www.snowflake.com/">Snowflake</a>, <a href="https://www.tempus.com/">Tempus Lab</a></li>
</ul>

<h2 id="publication--pre-prints">Publications &amp; Pre-Prints</h2>

<ul>
  <li>
    <p>Unsupervised Deep Keyphrase Generation <br />
Xianjie Shen, Yinghan Wang, Rui Meng and Jingbo Shang. <strong>AAAI 2022</strong>. [<a href="https://arxiv.org/abs/2104.08729">arXiv:2104.08729</a>] [<a href="https://github.com/Jayshen0/Unsupervised-Deep-Keyphrase-Generation">code</a>]</p>
  </li>
  <li>
    <p>UniTS: Short-Time Fourier Inspired Neural Networks for Sensory Time Series Classification <br />
Shuheng Li, Ranak Roy Chowdhury, Jingbo Shang, Rajesh K. Gupta and Dezhi Hong. <strong>SenSys</strong> 2021. [<a href="https://github.com/Shuheng-Li/UniTS-Sensory-Time-Series-Classification">code</a>]</p>
  </li>
  <li>
    <p>Coarse2Fine: Fine-grained Text Classification on Coarsely-grained Annotated Data <br />
Dheeraj Mekala, Varun Gangal and Jingbo Shang. <strong>EMNLP</strong> 2021.</p>
  </li>
  <li>
    <p>“Average” Approximates “First Principal Component”? An Empirical Analysis on Representations from Neural Language Models <br />
Zihan Wang, Chengyu Dong and Jingbo Shang. <strong>EMNLP</strong> 2021. (short) [<a href="https://arxiv.org/abs/2104.08673">arXiv:2104.08673</a>] [<a href="https://github.com/ZihanWangKi/AverageApproxFirstPC">code</a>]</p>
  </li>
  <li>
    <p>BFClass: A Backdoor-free Text Classification Framework <br />
Zichao Li, Dheeraj Mekala, Chengyu Dong and Jingbo Shang. <strong>EMNLP</strong> (Findings) 2021.</p>
  </li>
  <li>
    <p>UCPhrase: Unsupervised Context-aware Quality Phrase Tagging <br />
Xiaotao Gu*, Zihan Wang*, Zhenyu Bi, Yu Meng, Liyuan Liu, Jiawei Han and Jingbo Shang. <strong>KDD 2021</strong>. [<a href="https://arxiv.org/abs/2105.14078">arXiv:2105.14078</a>] [<a href="https://github.com/xgeric/UCPhrase-exp">code</a>]</p>
  </li>
  <li>
    <p><a href="https://arxiv.org/abs/2010.12794">X-Class: Text Classification with Extremely Weak Supervision</a> <br />
Zihan Wang, Dheeraj Mekala and Jingbo Shang. <strong>NAACL</strong> 2021. [<a href="https://github.com/ZihanWangKi/XClass">code</a>]</p>
  </li>
  <li>
    <p>TaxoClass: Hierarchical Multi-Label Text Classification Using Only Class Names <br />
Jiaming Shen, Wenda Qiu, Yu Meng, Jingbo Shang, Xiang Ren and Jiawei Han. <strong>NAACL</strong> 2021.</p>
  </li>
  <li>
    <p><a href="https://epubs.siam.org/doi/pdf/10.1137/1.9781611976700.53">“Misc”-Aware Weakly Supervised Aspect Classification</a> <br />
Peiran Li*, Fang Guo* and Jingbo Shang. <strong>SDM</strong> 2021. [<a href="https://arxiv.org/abs/2004.14555">arXiv:2004.14555</a>] [<a href="https://github.com/peiranli/ARYA">code</a>]</p>
  </li>
  <li>
    <p>Sensei: Self-Supervised Sensor Name Segmentation <br />
Jiaman Wu, Dezhi Hong, Rajesh Gupta and Jingbo Shang. <strong>ACL</strong> (Findings) 2021. [<a href="https://arxiv.org/abs/2101.00130">arXiv:2101.00130</a>] [<a href="https://github.com/work4cs/sensei">code</a>]</p>
  </li>
  <li>
    <p><a href="https://www.dropbox.com/s/tsmu1h9fk90hgg6/%5BEMNLP%2720%20Findings%5DSeNsER-%20Learning%20Cross-Building%20Sensor%20Metadata%20Tagger.pdf?dl=1">SeNsER: Learning Cross-Building Sensor Metadata Tagger</a> <br />
Yang Jiao*, Jiacheng Li*, Jiaman Wu, Dezhi Hong, Rajesh Gupta and Jingbo Shang. <strong>EMNLP</strong> (Findings) 2020. [<a href="https://github.com/JiachengLi1995/SeNsER">code</a>]</p>
  </li>
  <li>
    <p><a href="https://www.dropbox.com/s/95mtglly9bydj5i/%5BEMNLP%2720%5DMETA-%20Metadata-Empowered%20Weak%20Supervision%20for%20Text%20Classification.pdf?dl=1">META: Metadata-Empowered Weak Supervision for Text Classification</a> <br />
Dheeraj Mekala, Xinyang Zhang and Jingbo Shang. <strong>EMNLP</strong> 2020. [<a href="https://github.com/dheeraj7596/META">code</a>]</p>
  </li>
  <li>
    <p>Towards Benchmarking Feature Type Inference for AutoML Platforms<br />
Vraj Shah, Jonathan Lacanlale, Premanand Kumar, Kevin Yang, and Arun Kumar. <strong>SIGMOD</strong> 2021. [<a href="https://github.com/mltypebench/MLFeatureTypeInference">code</a>] [<a href="https://adalabucsd.github.io/sortinghat.html">project page</a>]</p>
  </li>
  <li>
    <p><a href="https://pubmed.ncbi.nlm.nih.gov/33617456/">A Comprehensive Explanation Framework for Biomedical Time Series Classification</a><br />
Praharsh Ivaturi, Matteo Gadaleta, Amitabh C Pandey, Michael Pazzani, Steven R Steinhubl, and Giorgio Quer. <strong>IEEE J Biomed Health Inform</strong>. Feb. 2021.</p>
  </li>
  <li>
    <p><a href="https://pubmed.ncbi.nlm.nih.gov/33478654/">Machine Learning and the Future of Cardiovascular Care: JACC State-of-the-Art Review</a><br />
Giorgio Quer, Ramy Arnaout, Michael Henne, and Rima Arnaout. <strong>J Am Coll Cardiol</strong>. Jan. 2021.</p>
  </li>
  <li>
    <p><a href="https://adalabucsd.github.io/papers/TR_2020_TFDV.pdf">Improving Feature Type Inference Accuracy of TFDV with SortingHat</a><br />
Vraj Shah, Kevin Yang, and Arun Kumar. Preprint.</p>
  </li>
</ul>

<h2 id="contact">Contact</h2>

<p>jshang [at] ucsd [dot] edu</p>

<h2 id="acknowledgment">Acknowledgment</h2>

<p>This project is supported in part by the NSF Convergence Accelerator under award <a href="https://app.dimensions.ai/details/grant/grant.9399212">OIA-2040727</a>.</p>]]></content><author><name>Jingbo Shang</name><email>jshang@ucsd.edu</email></author><category term="nsf" /><category term="award" /><category term="project" /><summary type="html"><![CDATA[NSF Convergence Accelerator Track D: Towards Intelligent Sharing and Search for AI Models and Datasets Jingbo Shang, Rajesh Gupta, Lucila Ohno-Machado, Arun Kumar, Giorgio Quer University of California San Diego &amp; Scripps Research]]></summary></entry><entry><title type="html">SIGKDD Dissertation Award Runner-up!</title><link href="https://shangjingbo1226.github.io/2020-08-25-kdd-dissertation-award" rel="alternate" type="text/html" title="SIGKDD Dissertation Award Runner-up!" /><published>2020-08-25T07:00:00-07:00</published><updated>2020-08-25T07:00:00-07:00</updated><id>https://shangjingbo1226.github.io/kdd-dissertation-award</id><content type="html" xml:base="https://shangjingbo1226.github.io/2020-08-25-kdd-dissertation-award"><![CDATA[<p>My PhD thesis “<a href="https://www.ideals.illinois.edu/bitstream/handle/2142/106218/SHANG-DISSERTATION-2019.pdf?sequence=1&amp;isAllowed=y">Constructing and Mining Heterogeneous Information Networks From Massive Text</a>” has been awarded Runner-up in the SIGKDD Dissertation Award competition. Thanks a lot to my advisor Dr. Jiawei Han and the award committee! Here is a <a href="https://www.dropbox.com/s/ipafsdxq1bb8dsp/kdd-dissertation-award-recording.mp4?dl=0">brief video introduction</a> to my work.</p>]]></content><author><name>Jingbo Shang</name><email>jshang@ucsd.edu</email></author><category term="kdd" /><category term="dissertation award" /><category term="2020" /><summary type="html"><![CDATA[My PhD thesis “Constructing and Mining Heterogeneous Information Networks From Massive Text” has been awarded Runner-up in the SIGKDD Dissertation Award competition. 
Thanks a lot to my advisor Dr. Jiawei Han and the award committee! Here is a brief video introduction to my work.]]></summary></entry><entry><title type="html">KDD 2020 Tutorial</title><link href="https://shangjingbo1226.github.io/2020-08-19-kdd-tutorial" rel="alternate" type="text/html" title="KDD 2020 Tutorial" /><published>2020-08-19T07:00:00-07:00</published><updated>2020-08-19T07:00:00-07:00</updated><id>https://shangjingbo1226.github.io/kdd-tutorial</id><content type="html" xml:base="https://shangjingbo1226.github.io/2020-08-19-kdd-tutorial"><![CDATA[<p>We will discuss “Scientific Text Mining and Knowledge Graphs”. More details can be found <a href="http://www.meng-jiang.com/tutorial-kdd20-scikg.html">here</a>.</p>]]></content><author><name>Jingbo Shang</name><email>jshang@ucsd.edu</email></author><category term="kdd" /><category term="tutorial" /><category term="2020" /><summary type="html"><![CDATA[We will discuss “Scientific Text Mining and Knowledge Graphs”. More details can be found here.]]></summary></entry><entry><title type="html">ICML Paper Accepted!</title><link href="https://shangjingbo1226.github.io/2020-06-01-icml-paper" rel="alternate" type="text/html" title="ICML Paper Accepted!" 
/><published>2020-06-01T07:00:00-07:00</published><updated>2020-06-01T07:00:00-07:00</updated><id>https://shangjingbo1226.github.io/icml-paper</id><content type="html" xml:base="https://shangjingbo1226.github.io/2020-06-01-icml-paper"><![CDATA[<p>Our novel training method, LipGrow, can cut training time for deep ResNets by ~50%, and it is backed by theory!</p>]]></content><author><name>Jingbo Shang</name><email>jshang@ucsd.edu</email></author><category term="icml" /><category term="2020" /><summary type="html"><![CDATA[Our novel training method, LipGrow, can cut training time for deep ResNets by ~50%, and it is backed by theory!]]></summary></entry><entry><title type="html">Two ACL Papers!</title><link href="https://shangjingbo1226.github.io/2020-04-04-two-acl-papers" rel="alternate" type="text/html" title="Two ACL Papers!" /><published>2020-04-04T07:00:00-07:00</published><updated>2020-04-04T07:00:00-07:00</updated><id>https://shangjingbo1226.github.io/two-acl-papers</id><content type="html" xml:base="https://shangjingbo1226.github.io/2020-04-04-two-acl-papers"><![CDATA[<p>Two of our papers were recently accepted to ACL. Travel plans are still unclear due to COVID-19. The camera-ready versions are coming soon. Please stay tuned.</p>

<ul>
  <li>Contextualized Weak Supervision for Text Classification. Dheeraj Mekala, Jingbo Shang. <strong>ACL</strong> 2020.</li>
  <li>Empower Entity Set Expansion via Language Model Probing. Yunyi Zhang, Jiaming Shen, Jingbo Shang and Jiawei Han. <strong>ACL</strong> 2020.</li>
</ul>]]></content><author><name>Jingbo Shang</name><email>jshang@ucsd.edu</email></author><category term="acl" /><category term="2020" /><summary type="html"><![CDATA[Two of our papers were recently accepted to ACL. Travel plans are still unclear due to COVID-19. The camera-ready versions are coming soon. Please stay tuned.]]></summary></entry><entry><title type="html">Topcoder X UCSD Lightning Marathon Match</title><link href="https://shangjingbo1226.github.io/2020-02-25-topcoder-ucsd-challenge" rel="alternate" type="text/html" title="Topcoder X UCSD Lightning Marathon Match" /><published>2020-02-25T06:00:00-08:00</published><updated>2020-02-25T06:00:00-08:00</updated><id>https://shangjingbo1226.github.io/topcoder-ucsd-challenge</id><content type="html" xml:base="https://shangjingbo1226.github.io/2020-02-25-topcoder-ucsd-challenge"><![CDATA[<p>To celebrate the annual event of the Halıcıoğlu Data Science Institute, we are collaborating with Topcoder to bring a Lightning Data Science Marathon Match to UCSD! It will run <strong>from 5 PM PST Feb 26 (Wed) to 5 AM PST Mar 1 (Sun)</strong>. We will announce the winners during the HDSI annual event on Mar 2.</p>

<p>The match lets you exercise your brain while helping to solve a real-world problem. Have your coffee ready for an intense ~84-hour battle to prove yourself the best of the best! What’s more, while you hone your skills, <strong>you can also win a share of $2000 in prizes, plus Topcoder T-shirts</strong>.</p>

<h3 id="topcoder-x-ucsd-lightning-marathon-match">Topcoder X UCSD Lightning Marathon Match</h3>

<p><strong>Problem Statement</strong>: The problem will be kept secret until the contest launches, but it will certainly carry the following tags: interesting, real-world, data science, machine learning, prediction! The ranking is based purely on the accuracy of your predictions.</p>

<p><strong>Duration</strong>: ~84 Hours</p>

<p><strong>Prizes</strong>: $1000, $500, $250, $150, $100, and Topcoder T-shirts for the top 20</p>

<h3 id="how-to-compete">How to compete?</h3>

<p>In order to compete in a Topcoder Marathon Match, you will need to click the Register button next to the appropriate Marathon Match within the Active Challenge List and agree to the rules of the event. Once you register, make sure you go into the challenge forums and check out the discussions there.</p>

<p>If you want to participate in the challenge, please fill out the following form: <a href="https://forms.gle/d4Mcsskn6FfcoSNq9">https://forms.gle/d4Mcsskn6FfcoSNq9</a>.</p>

<h3 id="want-to-practice">Want to practice?</h3>
<p>To learn how to compete and submit on the Topcoder platform, and to get a feel for a Topcoder Marathon Match, you can try this practice match: <a href="https://www.topcoder.com/challenges/30094385">https://www.topcoder.com/challenges/30094385</a>.</p>]]></content><author><name>Jingbo Shang</name><email>jshang@ucsd.edu</email></author><category term="Topcoder" /><category term="UCSD" /><category term="Data Science" /><summary type="html"><![CDATA[To celebrate the annual event of the Halıcıoğlu Data Science Institute, we are collaborating with Topcoder to bring a Lightning Data Science Marathon Match to UCSD! It will run from 5 PM PST Feb 26 (Wed) to 5 AM PST Mar 1 (Sun). We will announce the winners during the HDSI annual event on Mar 2.]]></summary></entry><entry><title type="html">VLDB 2019 Tutorial (Tutorial Page)</title><link href="https://shangjingbo1226.github.io/2019-08-23-vldb-tutorial" rel="alternate" type="text/html" title="VLDB 2019 Tutorial (Tutorial Page)" /><published>2019-08-23T07:00:00-07:00</published><updated>2019-08-23T07:00:00-07:00</updated><id>https://shangjingbo1226.github.io/vldb-tutorial</id><content type="html" xml:base="https://shangjingbo1226.github.io/2019-08-23-vldb-tutorial"><![CDATA[<h3 id="vldb-2019-tutorial">VLDB 2019 Tutorial:</h3>
<center>
<h1>
Tutorial 6: TextCube: Automated Construction and Multidimensional Exploration
</h1>
Yu Meng, Jiaxin Huang, Jingbo Shang, Jiawei Han<br />
Computer Science Department, University of Illinois at Urbana-Champaign<br />
Time: <b>2:00PM - 5:30PM, Aug 29, 2019</b><br />
Location: <b>Avalon</b><br />
</center>

<h2 id="slides">Slides</h2>

<p><a href="https://www.dropbox.com/s/gbqd83zocy2szph/VLDB%2719%20tutorial.pdf?dl=1">Preliminary Version</a></p>

<h2 id="abstract">Abstract</h2>

<p>Today’s society is immersed in a wealth of text data, ranging from news articles, to social media, research literature, medical records, and corporate reports. A grand challenge of data science and engineering is to develop effective and scalable methods to extract structures and knowledge from massive text data to satisfy diverse applications, without extensive, corpus-specific human annotations.</p>

<p>In this tutorial, we show that TextCube provides a critical information organization structure that will satisfy such an information need. We give an overview of a set of recently developed data-driven methods that facilitate automated construction of TextCubes from massive, domain-specific text corpora, and show that TextCubes so constructed will enhance text exploration and analysis for various applications. We focus on new TextCube construction methods that are scalable, weakly supervised, domain-independent, language-agnostic, and effective (i.e., generating quality TextCubes from large corpora of various domains). We will demonstrate, on real datasets (including news articles, scientific publications, and product reviews), how TextCubes can be constructed to assist multidimensional analysis of massive text corpora.</p>

<h2 id="outline">Outline</h2>

<ol>
  <li>Introduction
    <ul>
      <li>Motivations &amp; Prior Art</li>
      <li>Overview of Multidimensional Text Analysis</li>
    </ul>
  </li>
  <li>Phrase Mining
    <ul>
      <li>What are quality phrases?</li>
      <li>Supervised Methods
        <ul>
          <li>Noun Phrase Chunking Methods</li>
          <li>Parsing-based Methods</li>
          <li>How to rank entities at the corpus-level?</li>
        </ul>
      </li>
      <li>Unsupervised Methods
        <ul>
          <li>Raw Frequency based Methods</li>
          <li>Concordance based Methods</li>
          <li>Topic Model based Methods</li>
          <li>Comparative Methods</li>
        </ul>
      </li>
      <li>Weakly/Distantly Supervised Methods
        <ul>
          <li>Phrasal Segmentation and its Variants</li>
          <li>How to leverage distant supervision?</li>
        </ul>
      </li>
      <li>System demos and software introduction
        <ul>
          <li>A multilingual phrase mining system which integrates <a href="https://github.com/shangjingbo1226/AutoPhrase">AutoPhrase</a>, <a href="https://github.com/shangjingbo1226/SegPhrase">SegPhrase</a>, and TopMine together and supports phrase mining in multiple languages (e.g., English, Spanish, Chinese, Arabic, and Japanese).</li>
        </ul>
      </li>
    </ul>
  </li>
  <li>Text Representation
    <ul>
      <li>Unsupervised Word Embedding
        <ul>
          <li>Context-free representation</li>
          <li>Contextualized representation</li>
        </ul>
      </li>
      <li>Other embeddings: Network embedding
        <ul>
          <li>DeepWalk, LINE, node2vec, …</li>
        </ul>
      </li>
      <li>Category name-guided word embedding
        <ul>
          <li>CatE</li>
        </ul>
      </li>
      <li>System demos and software introduction
        <ul>
          <li>Our CatE system demo</li>
        </ul>
      </li>
    </ul>
  </li>
  <li>Entity Recognition
    <ul>
      <li>What is named entity recognition?</li>
      <li>Handcrafted Features + Human Supervision
        <ul>
          <li>Classical Models: Conditional Random Field</li>
          <li>Stanford NER</li>
          <li>Twitter NER</li>
        </ul>
      </li>
      <li>Automated Features + Human Supervision
        <ul>
          <li>LSTM-CRF, LSTM-CNN-CRF, …</li>
          <li>LM-LSTM-CRF, ELMo, Flair, …</li>
          <li>Multi-task learning</li>
        </ul>
      </li>
      <li>Automated Features + Distant Supervision
        <ul>
          <li>AutoEntity, SwellShark, ClusType, Distant-LSTM-CRF, …</li>
          <li>FuzzyCRF &amp; AutoNER</li>
        </ul>
      </li>
      <li>System Demos and Software
        <ul>
          <li>Named entity recognition inference Python package: <a href="https://github.com/LiyuanLucasLiu/LightNER">LightNER</a>. This module helps users easily apply the pre-trained NER models to their own corpus in an efficient and portable manner.</li>
        </ul>
      </li>
    </ul>
  </li>
  <li>Text Cube Construction
    <ul>
      <li>Taxonomy Basics and Construction</li>
      <li>Cluster-based Taxonomy Construction
        <ul>
          <li>Hierarchical Topic Modeling</li>
          <li>General Graphical Model Approach</li>
          <li>Hierarchical Clustering</li>
        </ul>
      </li>
      <li>Text Cube Basics and Construction
        <ul>
          <li>What is a Text Cube?</li>
          <li>Automatic document allocation for Text Cube construction</li>
        </ul>
      </li>
      <li>System Demos and Software
        <ul>
          <li>Publication Dataset Analysis Demo</li>
        </ul>
      </li>
    </ul>
  </li>
  <li>Text Cube Exploration
    <ul>
      <li>Cube-based Multidimensional Analysis
        <ul>
          <li>Statistical Measures Aggregation</li>
          <li>Phrase-based Cell Summarization</li>
          <li>Key N-gram based Ranking and Exploration</li>
        </ul>
      </li>
      <li>System Demos and Software
        <ul>
          <li>Demo: MissionCube</li>
        </ul>
      </li>
    </ul>
  </li>
  <li>Summary and Future Directions
    <ul>
      <li>Summary of Text Cube
        <ul>
          <li>Principles and Techniques</li>
          <li>Advantages and Limitations</li>
          <li>How to build a text cube based on your application?</li>
        </ul>
      </li>
      <li>Future Directions</li>
    </ul>
  </li>
</ol>

<h2 id="presenters">Presenters</h2>

<p><img align="left" img="" src="/images/img/BIO/yumeng.jpg" alt="Drawing" style="width: 200px;margin-right:50px;" /><strong>Yu Meng</strong>, Ph.D. student, Computer Science, UIUC. His research focuses on mining structured knowledge from massive text corpora with minimum human supervision.</p>

<p><br />
<br />
<br />
<br />
<br />
<br /></p>

<p><img align="left" img="" src="/images/img/BIO/jiaxinhuang.jpg" alt="Drawing" style="width: 200px;margin-right:50px;" /><strong>Jiaxin Huang</strong>, Ph.D. student, Computer Science, UIUC. Her research focuses on mining structured knowledge from massive text corpora. She is a recipient of the Chirag Foundation Graduate Fellowship in Computer Science.</p>

<p><br />
<br /></p>

<p><img align="left" img="" src="/images/img/BIO/jingbo.jpg" alt="Drawing" style="width: 200px;margin-right:50px;margin-top:10px" /><strong>Jingbo Shang</strong>, Ph.D. candidate, Department of Computer Science, Univ. of Illinois at Urbana-Champaign. His research focuses on mining and constructing structured knowledge from massive text corpora with minimal human effort. His research has been recognized by multiple prestigious awards, including the Grand Prize of the Yelp Dataset Challenge (2015) and the Google PhD Fellowship (2017-2019) in Structured Data and Database Management. Mr. Shang has extensive experience delivering tutorials at major conferences (SIGMOD’17, WWW’17, SIGKDD’17, SIGKDD’18, SIGKDD’19).</p>

<p><img align="left" img="" src="/images/img/BIO/hanj.jpg" alt="Drawing" style="width: 200px;margin-right:50px;" /><strong>Jiawei Han</strong> is Abel Bliss Professor in the Department of Computer Science at the University of Illinois. He has been researching data mining, information network analysis, and database systems, with over 600 publications. He served as the founding Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data (TKDD). Jiawei has received the ACM SIGKDD Innovation Award (2004), the IEEE Computer Society Technical Achievement Award (2005), the IEEE Computer Society W. Wallace McDowell Award (2009), and the Daniel C. Drucker Eminent Faculty Award at UIUC (2011). He is a Fellow of ACM and a Fellow of IEEE. He is currently the Director of the Information Network Academic Research Center (INARC), supported by the Network Science-Collaborative Technology Alliance (NS-CTA) program of the U.S. Army Research Lab. His co-authored textbook “Data Mining: Concepts and Techniques” (Morgan Kaufmann) has been adopted worldwide.</p>]]></content><author><name>Jingbo Shang</name><email>jshang@ucsd.edu</email></author><category term="vldb" /><category term="tutorial" /><category term="2019" /><summary type="html"><![CDATA[VLDB 2019 Tutorial: Tutorial 6: TextCube: Automated Construction and Multidimensional Exploration Yu Meng, Jiaxin Huang, Jingbo Shang, Jiawei Han Computer Science Department, University of Illinois at Urbana-Champaign Time: 2:00PM - 5:30PM, Aug 29, 2019 Location: Avalon]]></summary></entry><entry><title type="html">Joining UC San Diego as Assistant Professor!</title><link href="https://shangjingbo1226.github.io/2019-07-24-joining-ucsd" rel="alternate" type="text/html" title="Joining UC San Diego as Assistant Professor!" 
/><published>2019-07-24T12:00:00-07:00</published><updated>2019-07-24T12:00:00-07:00</updated><id>https://shangjingbo1226.github.io/joining-ucsd</id><content type="html" xml:base="https://shangjingbo1226.github.io/2019-07-24-joining-ucsd"><![CDATA[<p>I’m joining UC San Diego as Assistant Professor starting from Jan 2020. I will be jointly appointed by <a href="https://cse.ucsd.edu/">Computer Science and Engineering (CSE)</a> and <a href="https://datascience.ucsd.edu/">Halıcıoğlu Data Science Institute (HDSI)</a>. I’m looking for talented &amp; dedicated students to work together on data-driven, large-scale text mining and network mining problems.</p>]]></content><author><name>Jingbo Shang</name><email>jshang@ucsd.edu</email></author><category term="UCSD" /><category term="AP" /><category term="CSE" /><category term="HDSI" /><summary type="html"><![CDATA[I’m joining UC San Diego as Assistant Professor starting from Jan 2020. I will be jointly appointed by Computer Science and Engineering (CSE) and Halıcıoğlu Data Science Institute (HDSI). I’m looking for talented &amp; dedicated students to work together on data-driven, large-scale text mining and network mining problems.]]></summary></entry><entry><title type="html">KDD 2019 Tutorial Accepted! (Tutorial Page)</title><link href="https://shangjingbo1226.github.io/2019-04-22-kdd-tutorial" rel="alternate" type="text/html" title="KDD 2019 Tutorial Accepted! (Tutorial Page)" /><published>2019-04-22T12:00:00-07:00</published><updated>2019-04-22T12:00:00-07:00</updated><id>https://shangjingbo1226.github.io/kdd-tutorial</id><content type="html" xml:base="https://shangjingbo1226.github.io/2019-04-22-kdd-tutorial"><![CDATA[<h3 id="sigkdd-2019-tutorial">SIGKDD 2019 Tutorial:</h3>
<center>
<h1>
T17: Constructing and Mining Heterogeneous Information Networks from Massive Text
</h1>
Jingbo Shang, Jiaming Shen, Liyuan Liu, Jiawei Han<br />
Computer Science Department, University of Illinois at Urbana-Champaign<br />
Time: <b>1:00PM - 5:00PM, Aug 4, 2019</b><br />
Location: <b>Kahtnu 2-Level 2, Dena’ina</b><br />
</center>

<h2 id="slides">Slides</h2>

<p><a href="https://www.dropbox.com/s/asqpts97hz7zmaf/kdd19-slides-preliminary-version.pdf?dl=0">preliminary version</a></p>

<h2 id="abstract">Abstract</h2>

<p>Real-world data exists largely in the form of unstructured text. A grand challenge in data mining research is to develop effective and scalable methods that transform <em>unstructured text</em> into <em>structured knowledge</em>. In our view, it is highly beneficial to transform such text into <em>structured heterogeneous information networks</em>, from which <em>actionable knowledge</em> can be generated based on the user’s need.</p>

<p>In this tutorial, we provide a comprehensive overview of recent research and development in this direction. First, we introduce a series of effective methods that construct <em>heterogeneous information networks</em> from massive, domain-specific text corpora. Then we discuss methods that mine such text-rich networks based on the user’s need. Specifically, we focus on scalable, effective, weakly supervised, language-agnostic methods that work on various kinds of text. We further demonstrate, on real datasets (including news articles, scientific publications, and product reviews), how information networks can be constructed and how they can assist further exploratory analysis.</p>

<h2 id="outline">Outline</h2>

<ol>
  <li>Introduction
    <ul>
      <li>Motivations: Why construction and mining of heterogeneous information networks from massive text?</li>
      <li>An overview of network construction from massive texts</li>
      <li>An overview of exploring applications of the constructed networks</li>
    </ul>
  </li>
  <li>Phrase Mining
    <ul>
      <li>Why phrase mining and how to define high-quality phrases?</li>
      <li>Supervised Methods
        <ul>
          <li>Noun Phrase Chunking Methods</li>
          <li>Parsing-based Methods</li>
          <li>How to rank entities at the corpus-level?</li>
        </ul>
      </li>
      <li>Unsupervised Methods
        <ul>
          <li>Raw Frequency based Methods</li>
          <li>Concordance based Methods</li>
          <li>Topic Model based Methods</li>
          <li>Comparative Methods</li>
        </ul>
      </li>
      <li>Weakly/Distantly Supervised Methods
        <ul>
          <li>Phrasal Segmentation and its Variants</li>
          <li>How to leverage distant supervision?</li>
        </ul>
      </li>
      <li>System demos and software introduction
        <ul>
          <li>A multilingual phrase mining system which integrates <a href="https://github.com/shangjingbo1226/AutoPhrase">AutoPhrase</a>, <a href="https://github.com/shangjingbo1226/SegPhrase">SegPhrase</a>, and TopMine together and supports phrase mining in multiple languages (e.g., English, Spanish, Chinese, Arabic, and Japanese).</li>
        </ul>
      </li>
    </ul>
  </li>
  <li>Information Extraction: Entity, Attribute, and Relation
    <ul>
      <li>What is Named Entity Recognition (NER)?</li>
      <li>Traditional Supervised Methods
        <ul>
          <li>CoNLL03 shared task</li>
          <li>Sequence labeling framework</li>
          <li>Conditional random fields</li>
          <li>Handcrafted features</li>
        </ul>
      </li>
      <li>Modern End-to-End Neural Models
        <ul>
          <li>Bidirectional LSTM-based models</li>
          <li>Language model and contextualized representations</li>
          <li>Raw-to-end models</li>
        </ul>
      </li>
      <li>Distantly Supervised Models
        <ul>
          <li>Data programming for entity typing</li>
          <li>Learning from domain-specific dictionaries</li>
        </ul>
      </li>
      <li>Meta-Pattern based Information Extraction
        <ul>
          <li>Meta-Pattern Discovery</li>
          <li>Meta-Pattern-Enhanced NER</li>
        </ul>
      </li>
      <li>System Demos and Software
        <ul>
          <li>Named entity recognition inference Python package: <a href="https://github.com/LiyuanLucasLiu/LightNER">LightNER</a>. This module helps users easily apply the pre-trained NER models to their own corpus in an efficient and portable manner.</li>
        </ul>
      </li>
    </ul>
  </li>
  <li>Taxonomy Construction
    <ul>
      <li>Taxonomy Basics
        <ul>
          <li>Taxonomy Definition</li>
          <li>Taxonomy Application</li>
          <li>Taxonomy Construction Pipeline</li>
        </ul>
      </li>
      <li>Instance-based Taxonomy Construction
        <ul>
          <li>Used Resources Overview</li>
          <li>Pattern-based Methods</li>
          <li>Supervised Methods</li>
          <li>Weakly-supervised Methods</li>
        </ul>
      </li>
      <li>Cluster-based Taxonomy Construction
        <ul>
          <li>Hierarchical Topic Modeling</li>
          <li>General Graphical Model Approach</li>
          <li>Hierarchical Clustering</li>
        </ul>
      </li>
    </ul>
  </li>
  <li>Mining Heterogeneous Information Networks (Structured Analysis)
    <ul>
      <li>Basic Analysis System Demo
        <ul>
          <li>AutoNet system: It constructs a large structured network from PubMed papers (titles &amp; abstracts) and supports online construction (new documents) and intelligent exploration (search).</li>
        </ul>
      </li>
      <li>Summarization
        <ul>
          <li>Graph-based Summarization</li>
          <li>Clustering and Ranking for Summarization</li>
        </ul>
      </li>
      <li>Meta-Path Guided Exploration
        <ul>
          <li>Meta-Path based Similarity</li>
          <li>Meta-Path guided Node Embedding</li>
        </ul>
      </li>
      <li>Link Prediction
        <ul>
          <li>Task-Guided Node Embedding</li>
          <li>Link Enrichment in Constructed Networks</li>
        </ul>
      </li>
    </ul>
  </li>
  <li>Summary and Future Directions
    <ul>
      <li>Summary
        <ul>
          <li>Principles and Techniques</li>
          <li>Advantages and Limitations</li>
        </ul>
      </li>
      <li>Challenges and Future Research Directions</li>
      <li>Interaction with the Audience
        <ul>
          <li>How to construct and mine heterogeneous information networks based on your text data and application need?</li>
        </ul>
      </li>
    </ul>
  </li>
  <li>Question Answering and Discussions</li>
</ol>

<h2 id="presenters">Presenters</h2>

<p><img align="left" img="" src="/images/img/BIO/jingbo.jpg" alt="Drawing" style="width: 200px;margin-right:50px;margin-top:10px" /><strong>Jingbo Shang</strong>, Ph.D. candidate, Department of Computer Science, Univ. of Illinois at Urbana-Champaign. His research focuses on mining and constructing structured knowledge from massive text corpora with minimum human effort. His research has been recognized by multiple prestigious awards, including the Grand Prize of the Yelp Dataset Challenge (2015) and a Google PhD Fellowship (2017-2019) in Structured Data and Database Management. Mr. Shang has extensive experience delivering tutorials at major conferences (SIGMOD’17, WWW’17, SIGKDD’17, and SIGKDD’18).</p>

<p><img align="left" img="" src="/images/img/BIO/jiaming.jpeg" alt="Drawing" style="width: 200px;margin-right:50px;" /><strong>Jiaming Shen</strong>, Ph.D. candidate, Department of Computer Science, Univ. of Illinois at Urbana-Champaign. His research focuses on turning massive unstructured text corpora into structured knowledge, for better retrieval, exploration, and analysis of domain-specific corpora. He received the Brian Totty Graduate Fellowship in 2016.
<br />
<br /></p>

<p><img align="left" img="" src="/images/img/BIO/liyuan.jpg" alt="Drawing" style="width: 200px;margin-right:50px;" /><strong>Liyuan Liu</strong>, Ph.D. candidate, Department of Computer Science, Univ. of Illinois at Urbana-Champaign. His research interests lie mainly in data-driven text mining, including contextualized representations from language modeling and weak and heterogeneous supervision.</p>

<p><br />
<br /></p>

<p><img align="left" img="" src="/images/img/BIO/hanj.jpg" alt="Drawing" style="width: 200px;margin-right:50px;" /><strong>Jiawei Han</strong> is Abel Bliss Professor in the Department of Computer Science at the University of Illinois. He has conducted research in data mining, information network analysis, and database systems, with over 600 publications. He served as the founding Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data (TKDD). Jiawei has received the ACM SIGKDD Innovation Award (2004), the IEEE Computer Society Technical Achievement Award (2005), the IEEE Computer Society W. Wallace McDowell Award (2009), and the Daniel C. Drucker Eminent Faculty Award at UIUC (2011). He is a Fellow of ACM and a Fellow of IEEE. He is currently the Director of the Information Network Academic Research Center (INARC), supported by the Network Science-Collaborative Technology Alliance (NS-CTA) program of the U.S. Army Research Lab. His co-authored textbook “Data Mining: Concepts and Techniques” (Morgan Kaufmann) has been adopted worldwide.</p>]]></content><author><name>Jingbo Shang</name><email>jshang@ucsd.edu</email></author><category term="kdd" /><category term="tutorial" /><category term="2019" /><summary type="html"><![CDATA[SIGKDD 2019 Tutorial: T17: Constructing and Mining Heterogeneous Information Networks from Massive Text Jingbo Shang, Jiaming Shen, Liyuan Liu, Jiawei Han Computer Science Department, University of Illinois at Urbana-Champaign Time: 1:00PM - 5:00PM, Aug 4, 2019 Location: Kahtnu 2-Level 2, Dena’ina]]></summary></entry><entry><title type="html">WWW Paper Accepted!</title><link href="https://shangjingbo1226.github.io/2019-01-11-www" rel="alternate" type="text/html" title="WWW Paper Accepted!" 
/><published>2019-01-11T06:00:00-08:00</published><updated>2019-01-11T06:00:00-08:00</updated><id>https://shangjingbo1226.github.io/www</id><content type="html" xml:base="https://shangjingbo1226.github.io/2019-01-11-www"><![CDATA[<p>Our <strong>NetTaxo</strong> paper has been accepted by WWW 2020. The camera-ready versions are coming soon. Please stay tuned.</p>

<ul>
  <li>Jingbo Shang*, Xinyang Zhang*, Liyuan Liu, Sha Li and Jiawei Han, “NetTaxo: Automated Topic Taxonomy Construction from Large-Scale Text-Rich Network”, in Proc. 2020 Int. World Wide Web Conf. (WWW’20), Taipei, Taiwan, Apr. 2020</li>
</ul>]]></content><author><name>Jingbo Shang</name><email>jshang@ucsd.edu</email></author><category term="www" /><category term="2020" /><summary type="html"><![CDATA[Our NetTaxo paper has been accepted by WWW 2020. The camera-ready versions are coming soon. Please stay tuned.]]></summary></entry></feed>