Computational Linguistics for COVID-19!
Introduction
In response to the COVID-19 pandemic, we are carrying out text mining activities that may support research on this deadly disease, which is having a major impact on our society.
In particular we focus on the following areas of research:
Our goal is to automatically process COVID-19-related scientific publications in order to detect mentions of domain-specific entities of particular relevance (such as genes, symptoms, drugs, and organs). The primary purpose of this work is to make the literature more accessible, for example by simplifying the search for papers dealing with a particular gene, or by identifying unexpected connections between different entities.
We process and make available two datasets:
- The LitCovid dataset (abstracts only)
- The LitCovid PMC dataset (full text papers, subset of the above)
We also provide access via API to the pipeline that we use to automatically annotate the articles.
A second line of research involves the analysis of social media conversations (Twitter in particular) related to the COVID-19 pandemic. Different types of visualization and analysis make it possible to investigate changing trends in the public perception of the disease and of the measures taken to deal with it.
Simplifying access to the COVID-19 literature for Spanish-speaking clinicians.
As part of a multi-institutional collaboration, we are contributing to a repository of clinical literature with Spanish translations and a classification into clinically relevant categories.
This initiative started as a manually classified collection of COVID-19 literature (relevant in a clinical context), and was later extended to include automatic classification of new papers and semi-automated translation of the abstracts into Spanish.
Currently the repository is based at the National Autonomous University of Mexico (UNAM), and it is supported by contributors from the University of Zurich, IDSIA, and UNAM.
Annotation API
We provide a fast, efficient, and accurate document annotation service. It finds mentions of biomedically relevant entities in any document provided as input. Please find information here.
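To give a rough idea of what "finding mentions of biomedically relevant entities" means in practice, here is a minimal, purely local sketch of dictionary-based annotation. It is not the annotation service or its API: the `annotate` function, the `Mention` type, and the four-entry lexicon are hypothetical names made up for illustration, and the real pipeline is likely far more sophisticated.

```python
import re
from typing import NamedTuple


class Mention(NamedTuple):
    """One entity mention, located by character offsets in the input text."""
    start: int
    end: int
    text: str
    entity_type: str


# Hypothetical toy lexicon; a real system would rely on large curated terminologies.
LEXICON = {
    "ACE2": "gene",
    "remdesivir": "drug",
    "fever": "symptom",
    "lung": "organ",
}


def annotate(text, lexicon=LEXICON):
    """Return all case-insensitive, whole-word lexicon matches found in `text`."""
    # Try longer terms first so that a long name wins over a shorter one inside it.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    types_by_term = {term.lower(): etype for term, etype in lexicon.items()}
    return [
        Mention(m.start(), m.end(), m.group(0), types_by_term[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    sentence = "Remdesivir did not reduce fever; ACE2 expression in lung tissue was unchanged."
    for mention in annotate(sentence):
        print(mention)
```

Storing each mention with its character offsets and entity type is what makes it possible to highlight entities in an article, or to index papers by the genes, drugs, or symptoms they mention.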
Collaborations
We collaborate with other research groups on COVID-19-related tasks:
- with the NLP group at FBK, Italy, to extract relationships between COVID-19 and other relevant domain entities from the literature.
- with the Hunter Group at the University of Colorado, Denver. The goal is to provide rich annotations on a large literature dataset, such as CORD-19.
- with the RegulonDB group (Julio Collado Vides) of UNAM, Mexico. The purpose is to create a "linked dataset" of COVID-19 literature, which will show interrelationships among different publications.
Other NLP+COVID19 initiatives
- ACL NLP COVID-19 Workshop
- COVID-19 virtual biohackathon, April 5-11, Results
- CORD-19 Research Dataset Challenge, a Kaggle-based challenge organized around a number of literature-based discovery tasks and supported by the CORD-19 dataset.
- All LitCovid annotations on PubAnnotations
Other useful resources
- LitCovid in PubTator
- PMC COVID-19 Initiative
- IR task on CORD-19
- Oxford COVID-19 Evidence Service
- Neural Covidex: a "semantic" search engine on the CORD-19 corpus, based on neural network models.
General information about COVID-19
Who are we?
This page is currently maintained by the NLP Group at the Dalle Molle Institute for Artificial Intelligence (IDSIA).
The work described on this page was initially carried out by the Biomedical Text Mining group at the Institute of Computational Linguistics, University of Zurich. It is now being continued at IDSIA, where the PI of the group (Fabio Rinaldi) and some group members have moved.
For additional information about the tools and research activities described on this page, please contact Fabio Rinaldi.