NSF Convergence Accelerator Track D:
Towards Intelligent Sharing and Search for AI Models and Datasets
Jingbo Shang, Rajesh Gupta, Lucila Ohno-Machado, Arun Kumar, Giorgio Quer
University of California San Diego & Scripps Research
A major goal of AI-driven applications is to discover the underlying patterns in domain-specific datasets. Designing, or even just selecting, a suitable AI model for this typically requires extensive field experience and interdisciplinary knowledge. For instance, AI modeling for COVID-19 patient imaging and social distancing datasets requires an understanding not only of the epidemiological processes but also of the bioinformatics that informs mutation rates and their effects on models, coupled with socio-economic models that accurately capture living and working conditions. Such a model selection process is far beyond the capabilities of the search services available on existing platforms (e.g., Google Dataset Search, IEEE DataPort, and GitHub).
We envision an open-source, privacy-preserving intelligent system for searching and navigating through large-scale collections of AI models and datasets for scientific and other applications. The envisioned system would transform AI models and datasets into ‘computational resources’ so that model-dataset pairs can be easily searched and matched based on their semantics. It will serve as a sharing portal in which models and datasets are matched via contextual information captured as ‘metadata’, relying on innovations in metadata methods and tools for the application context. More importantly, the confidential and private information embedded in the models and datasets will be protected through novel, rigorous privacy techniques that we will develop. For example, a clinician could upload a patient imaging dataset and issue a query such as “Coronavirus hazard assessment from chest CT”; the system would then return suitable AI models and related datasets without risking the leakage of patient information.
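To make the idea of matching a free-text query against model metadata concrete, here is a minimal sketch. It is not the project's method: it swaps in plain bag-of-words cosine similarity where the envisioned system would use richer semantic matching, and it omits the privacy protections entirely. The metadata records, model names, and the helpers `tokenize`, `cosine`, and `search` are all hypothetical and are used only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical metadata records describing shared AI models.
# Names and descriptions are illustrative only, not real catalog entries.
MODEL_METADATA = {
    "covid-ct-classifier": "coronavirus covid-19 hazard assessment chest ct imaging classification",
    "xray-pneumonia-net": "pneumonia detection chest x-ray imaging",
    "mobility-forecaster": "social distancing mobility time series forecasting",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, metadata: dict[str, str], top_k: int = 2) -> list[tuple[str, float]]:
    """Rank metadata entries by their bag-of-words similarity to the query."""
    q = Counter(tokenize(query))
    scored = [(name, cosine(q, Counter(tokenize(text)))) for name, text in metadata.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for name, score in search("Coronavirus hazard assessment from chest CT", MODEL_METADATA):
        print(f"{name}: {score:.3f}")
```

With these toy records, the clinician's example query ranks the hypothetical `covid-ct-classifier` first because it shares the most metadata terms. Keyword overlap of this kind is precisely what breaks down when the query and the metadata use different vocabulary, which is one reason the project envisions semantic matching instead.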
- PI/Co-PI: Jingbo Shang, Rajesh Gupta, Lucila Ohno-Machado, Arun Kumar, Giorgio Quer
- Senior Personnel: Luca Bonomi, Dezhi Hong
- Graduate Student Research Assistants: Ranak Roy Chowdhury, Zichao Li, Vraj Shah, Xianjie Shen, Zihan Wang, Zeyun Wu
- External Partner: IEEE DataPort
Publications & Preprints
XClass: Text Classification with Extremely Weak Supervision
Zihan Wang, Dheeraj Mekala, Jingbo Shang. arXiv:2010.12794. [code]
META: Metadata-Empowered Weak Supervision for Text Classification
Dheeraj Mekala, Xinyang Zhang and Jingbo Shang. EMNLP 2020. [code]
SeNsER: Learning Cross-Building Sensor Metadata Tagger
Yang Jiao*, Jiacheng Li*, Jiaman Wu, Dezhi Hong, Rajesh Gupta and Jingbo Shang. EMNLP (Findings) 2020. [code]
jshang [at] ucsd [dot] edu
This project is supported in part by the NSF Convergence Accelerator under award OIA-2040727.