NSF Convergence Accelerator Track D:
Towards Intelligent Sharing and Search for AI Models and Datasets
Jingbo Shang, Rajesh Gupta, Lucila Ohno-Machado, Arun Kumar, Giorgio Quer
University of California San Diego & Scripps Research
A major goal of AI-driven applications is to discover the underlying patterns in domain-specific datasets. Designing, or even just selecting, suitable AI models for this typically requires extensive field experience and interdisciplinary knowledge. For instance, building AI models for COVID-19 patient imaging and social distancing datasets requires an understanding not only of epidemiological processes but also of the bioinformatics of viral mutation rates and their effects on models, together with socio-economic models that accurately capture living and working conditions. Such a model selection process is far beyond the capabilities of the search services available on existing platforms (e.g., Google Dataset Search, IEEE DataPort, and GitHub).
We envision an open-source, privacy-preserving intelligent system for searching and navigating large-scale collections of AI models and datasets for scientific and other applications. The envisioned system would transform AI models and datasets into ‘computational resources’ so that model-dataset pairs can be easily searched and matched based on their semantics. It would also serve as a sharing portal in which models and datasets are matched via contextual information captured as ‘metadata’, building on new metadata methods and tools tailored to the application context. Equally important, the confidential and private information embedded in models and datasets would be protected by novel, rigorous privacy techniques. For example, a clinician could upload a patient imaging dataset and issue a query such as “Coronavirus hazard assessment from chest CT”, and the system would return suitable AI models and related datasets without risk of leaking patient information.
- PI/Co-PI: Jingbo Shang, Rajesh Gupta, Lucila Ohno-Machado, Arun Kumar, Giorgio Quer
- Senior Personnel: Luca Bonomi, Dezhi Hong
- Graduate Student Research Assistants: Ranak Roy Chowdhury, Zichao Li, Vraj Shah, Xianjie Shen, Zihan Wang, Zeyun Wu
- External Partners: IEEE DataPort, Google TensorFlow Extended Team, Amazon AWS, Databricks, OpenML, Snowflake, Tempus Labs
Publications & Preprints
Unsupervised Deep Keyphrase Generation
Xianjie Shen, Yinghan Wang, Rui Meng and Jingbo Shang. arXiv:2104.08729.
X-Class: Text Classification with Extremely Weak Supervision
Zihan Wang, Dheeraj Mekala and Jingbo Shang. NAACL 2021. [code]
TaxoClass: Hierarchical Multi-Label Text Classification Using Only Class Names
Jiaming Shen, Wenda Qiu, Yu Meng, Jingbo Shang, Xiang Ren and Jiawei Han. NAACL 2021.
“Misc”-Aware Weakly Supervised Aspect Classification
Peiran Li*, Fang Guo* and Jingbo Shang. SDM 2021. [arXiv:2004.14555] [code]
SeNsER: Learning Cross-Building Sensor Metadata Tagger
Yang Jiao*, Jiacheng Li*, Jiaman Wu, Dezhi Hong, Rajesh Gupta and Jingbo Shang. EMNLP (Findings) 2020. [code]
META: Metadata-Empowered Weak Supervision for Text Classification
Dheeraj Mekala, Xinyang Zhang and Jingbo Shang. EMNLP 2020. [code]
A Comprehensive Explanation Framework for Biomedical Time Series Classification
Praharsh Ivaturi, Matteo Gadaleta, Amitabh C Pandey, Michael Pazzani, Steven R Steinhubl, and Giorgio Quer. IEEE J Biomed Health Inform. Feb. 2021.
Machine Learning and the Future of Cardiovascular Care: JACC State-of-the-Art Review
Giorgio Quer, Ramy Arnaout, Michael Henne, and Rima Arnaout. J Am Coll Cardiol. Jan. 2021.
Improving Feature Type Inference Accuracy of TFDV with SortingHat
Vraj Shah, Kevin Yang, and Arun Kumar. Preprint.
jshang [at] ucsd [dot] edu
This project is supported in part by the NSF Convergence Accelerator under award OIA-2040727.