Computational Linguistics for COVID-19
1 Social Media Mining
1.1 People
Joseph Cornelius, Tilia Ellendorff, Fabio Rinaldi
1.2 Activities
The goal is to analyze social media mentions of COVID-related issues and to derive useful insights. So far we have been working with the PANACEA dataset of about 40 million tweets (see below).
- current results
- See also Joseph's GitHub.
1.2.1 Timeline
- sentiment analysis on specific hashtags
- topic models
- basic preprocessing, language distribution, extraction of URLs
- started working on the PANACEA dataset
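The basic preprocessing steps listed above (language distribution, extraction of URLs, collecting hashtags) can be sketched roughly as follows. This is only an illustrative sketch, not the group's actual pipeline: it assumes each tweet is available as a dictionary with hypothetical `text` and `lang` fields, and the function name `preprocess` and the regular expressions are chosen here for illustration.

```python
import re
from collections import Counter

# Simple patterns for illustration; real tweets may need more robust handling.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")


def preprocess(tweets):
    """Compute language distribution, URLs, and hashtag counts.

    `tweets` is an iterable of dicts with the (assumed) keys
    'text' and 'lang'; missing languages are counted as 'und'.
    """
    lang_counts = Counter()
    urls = []
    hashtags = Counter()
    for tweet in tweets:
        lang_counts[tweet.get("lang", "und")] += 1
        text = tweet.get("text", "")
        urls.extend(URL_RE.findall(text))
        # Remove URLs before looking for hashtags, so that URL
        # fragments such as '#section' are not counted as hashtags.
        text_without_urls = URL_RE.sub(" ", text)
        hashtags.update(
            tag.lower() for tag in HASHTAG_RE.findall(text_without_urls)
        )
    return lang_counts, urls, hashtags
```

The hashtag counts returned by such a function could then be used to select the tweets for hashtag-specific sentiment analysis.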
1.3 Datasets
- PANACEA dataset: Covid-19 Twitter chatter dataset for scientific use, collected by the PanaceaLab at Georgia State University. The corpus contains COVID-related tweets from Jan 1st 2020 onward. The first part of the collection (until March 11, 2020) contains tweets related to coronavirus that were part of a dataset collected for other purposes. From March 11th the lab started collecting tweets specifically related to coronavirus, about 4.4 million a day.
2 Who are we?
This page is currently maintained by the NLP Group at the Dalle Molle Institute for Artificial Intelligence (IDSIA).
The work described on this page was initially carried out by the Biomedical Text Mining group at the Institute of Computational Linguistics, University of Zurich. It is now being continued at IDSIA, where the PI of the group (Fabio Rinaldi) and some group members have moved.
For additional information about the tools and research activities described on this page, please contact Fabio Rinaldi.