Computational Linguistics for COVID-19!

1. Social Media Mining
2. Who are we?

1 Social Media Mining

1.1 People

Joseph Cornelius, Tilia Ellendorff, Fabio Rinaldi

1.2 Activities

The goal is to analyze social media mentions of COVID-related issues and to derive useful insights. So far we have been working with the PANACEA dataset of about 40 million tweets (see below).

Current results
See also Joseph's GitHub.

1.2.1 Timeline

[2020-04-06 Mon] sentiment analysis on specific hashtags
[2020-04-02 Thu] topic models
[2020-03-25 Wed] basic preprocessing, language distribution, extraction of URLs
[2020-03-24 Tue] started working on the PANACEA dataset
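
As an illustration of the preprocessing steps listed above (URL extraction and hashtag handling), here is a minimal sketch in Python. It is not the group's actual code: the regular expressions, the function names (`extract_urls`, `extract_hashtags`, `summarize`) and the sample tweets are all assumptions made for this example, and real tweet text would likely need more careful handling (shortened links, Unicode hashtags, retweet markers).

```python
import re
from collections import Counter

# Assumed, deliberately simple patterns; real tweets may need stricter rules.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_urls(text):
    """Return the URLs found in a tweet, without trailing punctuation."""
    return [url.rstrip(".,;:!?)") for url in URL_RE.findall(text)]


def extract_hashtags(text):
    """Return the hashtags found in a tweet, lowercased and without '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def summarize(tweets):
    """Count URLs and hashtags over an iterable of tweet texts."""
    url_counts = Counter()
    tag_counts = Counter()
    for text in tweets:
        url_counts.update(extract_urls(text))
        tag_counts.update(extract_hashtags(text))
    return url_counts, tag_counts


# Hypothetical sample input, not taken from the PANACEA dataset.
sample = [
    "Stay home! #COVID19 https://example.org/info.",
    "New numbers today #covid19 #lockdown",
]

url_counts, tag_counts = summarize(sample)
print(tag_counts.most_common(5))
print(url_counts.most_common(5))
```

Lowercasing hashtags before counting groups variants such as #COVID19 and #covid19 together, which is usually what is wanted when selecting hashtags for a later step such as sentiment analysis.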

1.3 Datasets

PANACEA dataset: COVID-19 Twitter chatter dataset for scientific use, collected by the PanaceaLab at Georgia State University. The corpus contains COVID-related tweets from January 1, 2020 onward. The first part of the collection (until March 11, 2020) consists of coronavirus-related tweets that were part of a dataset collected for other purposes. From March 11, 2020, the lab started collecting tweets specifically related to the coronavirus, at a rate of about 4.4 million a day.

2 Who are we?

This page is currently maintained by the NLP Group at the Dalle Molle Institute for Artificial Intelligence (IDSIA).

The work described on this page was initially carried out by the Biomedical Text Mining group at the Institute of Computational Linguistics, University of Zurich. It is now being continued at IDSIA, to which the PI of the group (Fabio Rinaldi) and some group members have moved.

For additional information about the tools and research activities described on this page, please contact Fabio Rinaldi.
