NSF C-Accel Award OIA-2040727 (Project Page)

Published: November 01, 2020

NSF Convergence Accelerator Track D:

Towards Intelligent Sharing and Search for AI Models and Datasets

Jingbo Shang, Rajesh Gupta, Lucila Ohno-Machado, Arun Kumar, Giorgio Quer
University of California San Diego & Scripps Research

Abstract

A major goal of AI-driven applications is to discover the underlying patterns in domain-specific datasets, which typically requires extensive field experience and interdisciplinary knowledge to design, or even select, suitable AI models. For instance, AI modeling for COVID-19 patient imaging and social distancing datasets requires an understanding not only of epidemiological processes but also of the bioinformatics that informs mutation rates and their effects on models, coupled with socio-economic models that accurately capture living and working conditions. Such a model selection process is far beyond the capabilities of the search services available on existing platforms (e.g., Google Dataset Search, IEEE DataPort, and GitHub).

We envision an open-source, privacy-preserving intelligent system for searching and navigating large-scale collections of AI models and datasets for scientific and other applications. The envisioned system would transform AI models and datasets into ‘computational resources’ so that model-dataset pairs can be searched and matched easily based on their semantics. It will serve as a sharing portal for models and datasets matched via contextual information, captured as ‘metadata’, relying on innovations in metadata methods and tools for the application context. More importantly, the confidential and private information embedded in the models and datasets will be protected by novel, rigorous privacy techniques. A clinician could then upload a patient imaging dataset and issue a query such as “Coronavirus hazard assessment from chest CT”, and the system would return suitable AI models and related datasets without risk of leaking patient information.
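To make the idea of metadata-based matching concrete, here is a minimal sketch in Python. It is not the project's method: it assumes each model or dataset carries a free-text metadata description and ranks entries by plain bag-of-words cosine similarity, a simple stand-in for semantic matching. The catalog entries, the `search` function, and all names are hypothetical, and the sketch does not address privacy protection at all.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, catalog, top_k=2):
    """Return names of the top_k entries whose metadata overlaps the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(e["metadata"]))), e["name"]) for e in catalog]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]


# Hypothetical catalog entries, invented for illustration only.
CATALOG = [
    {"name": "chest-ct-classifier",
     "metadata": "model for coronavirus hazard assessment from chest CT scans"},
    {"name": "building-sensor-tagger",
     "metadata": "model for tagging sensor names in building metadata"},
    {"name": "chest-ct-dataset",
     "metadata": "dataset of anonymized chest CT images"},
]

results = search("Coronavirus hazard assessment from chest CT", CATALOG)
print(results)
```

In this toy catalog the query surfaces the chest CT model and the chest CT dataset, while the unrelated sensor tagger is filtered out because it shares no tokens with the query. A real system would presumably replace the token-overlap score with learned semantic representations of the metadata.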

[Marketing Video]

Team

PI/Co-PI: Jingbo Shang, Rajesh Gupta, Lucila Ohno-Machado, Arun Kumar, Giorgio Quer
Senior Personnel: Luca Bonomi, Dezhi Hong
Graduate Student Research Assistants: Ranak Roy Chowdhury, Zichao Li, Vraj Shah, Xianjie Shen, Zihan Wang, Zeyun Wu
External Partners: IEEE DataPort, Google TensorFlow Extended Team, Amazon AWS, Databricks, OpenML, Snowflake, Tempus Lab

Publications & Preprints

Unsupervised Deep Keyphrase Generation
Xianjie Shen, Yinghan Wang, Rui Meng and Jingbo Shang. AAAI 2022. [arXiv:2104.08729] [code]
UniTS: Short-Time Fourier Inspired Neural Networks for Sensory Time Series Classification
Shuheng Li, Ranak Roy Chowdhury, Jingbo Shang, Rajesh K. Gupta and Dezhi Hong. SenSys 2021. [code]
Coarse2Fine: Fine-grained Text Classification on Coarsely-grained Annotated Data
Dheeraj Mekala, Varun Gangal and Jingbo Shang. EMNLP 2021.
“Average” Approximates “First Principal Component”? An Empirical Analysis on Representations from Neural Language Models
Zihan Wang, Chengyu Dong and Jingbo Shang. EMNLP 2021. (short) [arXiv:2104.08673] [code]
BFClass: A Backdoor-free Text Classification Framework
Zichao Li, Dheeraj Mekala, Chengyu Dong and Jingbo Shang. EMNLP (Findings) 2021.
UCPhrase: Unsupervised Context-aware Quality Phrase Tagging
Xiaotao Gu*, Zihan Wang*, Zhenyu Bi, Yu Meng, Liyuan Liu, Jiawei Han and Jingbo Shang. KDD 2021. [arXiv:2105.14078] [code]
X-Class: Text Classification with Extremely Weak Supervision
Zihan Wang, Dheeraj Mekala and Jingbo Shang. NAACL 2021. [code]
TaxoClass: Hierarchical Multi-Label Text Classification Using Only Class Names
Jiaming Shen, Wenda Qiu, Yu Meng, Jingbo Shang, Xiang Ren and Jiawei Han. NAACL 2021.
“Misc”-Aware Weakly Supervised Aspect Classification
Peiran Li*, Fang Guo* and Jingbo Shang. SDM 2021. [arXiv:2004.14555] [code]
Sensei: Self-Supervised Sensor Name Segmentation
Jiaman Wu, Dezhi Hong, Rajesh Gupta and Jingbo Shang. ACL (Findings) 2021. [arXiv:2101.00130] [code]
SeNsER: Learning Cross-Building Sensor Metadata Tagger
Yang Jiao*, Jiacheng Li*, Jiaman Wu, Dezhi Hong, Rajesh Gupta and Jingbo Shang. EMNLP (Findings) 2020. [code]
META: Metadata-Empowered Weak Supervision for Text Classification
Dheeraj Mekala, Xinyang Zhang and Jingbo Shang. EMNLP 2020. [code]
Towards Benchmarking Feature Type Inference for AutoML Platforms
Vraj Shah, Jonathan Lacanlale, Premanand Kumar, Kevin Yang, and Arun Kumar. SIGMOD 2021. [code] [project page]
A Comprehensive Explanation Framework for Biomedical Time Series Classification
Praharsh Ivaturi, Matteo Gadaleta, Amitabh C Pandey, Michael Pazzani, Steven R Steinhubl, and Giorgio Quer. IEEE J Biomed Health Inform. Feb. 2021.
Machine Learning and the Future of Cardiovascular Care: JACC State-of-the-Art Review
Giorgio Quer, Ramy Arnaout, Michael Henne, and Rima Arnaout. J Am Coll Cardiol. Jan. 2021.
Improving Feature Type Inference Accuracy of TFDV with SortingHat
Vraj Shah, Kevin Yang, and Arun Kumar. Preprint.

Contact

jshang [at] ucsd [dot] edu

Acknowledgment

This project is supported in part by the NSF Convergence Accelerator under award OIA-2040727.
