Processing LitCovid with OGER-BB

1 LitCovid

LitCovid is a collection of PubMed abstracts related to COVID-19, released by the National Library of Medicine. The abstracts are categorized by research topic and geographic location.

We have processed the LitCovid corpus with our entity recognition tools: Bio Term Hub and OGER-BB, which are described below. The current version (as of [2020-08-26 Wed]) contains 35'313 abstracts.

The annotations are made accessible in different formats:

It is also possible to submit any PubMed abstract (via PubMed ID), PubMed Central full-text paper (via PMC ID), or any plain text (via copy and paste) to our OGER annotation tool and have it annotated (see screenshot below).

2 Bio Term Hub (BTH)

The Bio Term Hub (BTH) is an aggregator of biomedical terminologies sourced from manually curated databases. The BTH allows the quick construction of a terminology resource in a simple standardized format for text mining purposes. The terminologies are sourced from well-known life science databases. The user can select the specific concept types (proteins, genes, diseases, cell lines, etc.) to be included in the generated terminology. The terminologies are provided with unique term identifiers from the original databases. The resources provided by our Bio Term Hub are kept up-to-date by checking the original databases for possible updates. Optionally, the user can request the generation of lexical statistics about the selected terminologies.
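The selection of concept types described above can be illustrated with a minimal Python sketch. It assumes a tab-separated terminology with three columns (term_id, term, entity_type); this layout, the sample rows, the EX: identifiers, and the function name select_concept_types are all hypothetical and are not the actual BTH export format.

```python
import csv
import io

# Hypothetical terminology sample; the column layout and the
# identifiers are assumptions made for illustration only.
SAMPLE_TSV = (
    "term_id\tterm\tentity_type\n"
    "EX:0001\tACE2\tgene\n"
    "EX:0002\tCOVID-19\tdisease\n"
    "EX:0003\tVero\tcell_line\n"
)


def select_concept_types(tsv_text, wanted_types):
    """Return the terminology rows whose entity type is in wanted_types."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row["entity_type"] in wanted_types]


if __name__ == "__main__":
    # Keep only genes and diseases, dropping the cell line entry.
    for row in select_concept_types(SAMPLE_TSV, {"gene", "disease"}):
        print(row["term_id"], row["term"], row["entity_type"], sep="\t")
```

Each retained row keeps its identifier, so a term found in text can still be traced back to its source entry.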

Try it at:


Tilia Renate Ellendorff, Adrian van der Lek, Lenz Furrer, Fabio Rinaldi. A Combined Resource of Biomedical Terminology and its Statistics. Proceedings of the conference Terminology and Artificial Intelligence 2015 (Granada, Spain), pp. 39–50.


OGER is a fast, efficient, dictionary-based annotation tool, which is tightly coupled with the BTH, and allows rapid annotation of large quantities of text.
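To illustrate what dictionary-based annotation means in general, here is a minimal Python sketch. It is not OGER's implementation: the term dictionary, the EX: identifiers, and the function name annotate are hypothetical, and the matching strategy (case-insensitive, longest term first, whole tokens only) is a simplifying assumption.

```python
import re

# Hypothetical mini-terminology: lowercased term -> (identifier, entity type).
TERMS = {
    "ace2": ("EX:0001", "gene"),
    "severe acute respiratory syndrome": ("EX:0002", "disease"),
    "sars-cov-2": ("EX:0003", "organism"),
}


def annotate(text, terms=TERMS):
    """Return (start, end, matched_text, identifier, entity_type) tuples."""
    # Try longer terms first so multi-word terms win over shorter ones.
    alternatives = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(t) for t in alternatives) + r")(?!\w)",
        re.IGNORECASE,
    )
    hits = []
    for match in pattern.finditer(text):
        identifier, entity_type = terms[match.group(1).lower()]
        hits.append(
            (match.start(), match.end(), match.group(1), identifier, entity_type)
        )
    return hits


if __name__ == "__main__":
    for hit in annotate("SARS-CoV-2 binds ACE2."):
        print(hit)
```

A lookup of this kind needs no training data and runs in a single pass over the text, which is why dictionary-based approaches suit the annotation of large document collections.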

OGER can be accessed either through a web interface for testing purposes (single document annotation), or as a RESTful web service (typically for batch annotations).



Lenz Furrer, Fabio Rinaldi. OGER: OntoGene’s Entity Recogniser in the BeCalm TIPS Task. Proceedings of the BioCreative V.5 Challenge Evaluation Workshop. Barcelona, Spain, 26–27 April 2017, pp. 175–182.


OGER has been coupled with a deep learning model (based on BioBERT), which has been fine-tuned using the CRAFT corpus, in order to increase both precision and recall.

Details of this work can be found in the following papers:

Lenz Furrer, Joseph Cornelius, Fabio Rinaldi. UZH@CRAFT-ST: a Sequence-labeling Approach to Concept Recognition. Proceedings of The 5th Workshop on BioNLP Open Shared Tasks. November 2019.

Lenz Furrer, Joseph Cornelius, Fabio Rinaldi. Parallel sequence tagging for concept recognition. 2020.

5 Who are we?

This page is currently maintained by the NLP Group at the Dalle Molle Institute for Artificial Intelligence (IDSIA).

The work described on this page was initially carried out by the Biomedical Text Mining group at the Institute of Computational Linguistics, University of Zurich. It is now being continued at IDSIA, where the PI of the group (Fabio Rinaldi) and some group members have moved.

For additional information about the tools and research activities described on this page, please contact Fabio Rinaldi.

Author: Fabio Rinaldi

Created: 2021-01-14 Thu 15:28