Processing LitCovid/PMC with OGER-BB
Table of Contents
1 LitCovid/PMC
LitCovid is set of COVID19-related abstracts, released by the National Libray of Medicine. A subset of the LitCovid paper has a corresponding full text available from PubMed Central. We call this subset LitCovid/PMC. As of it contains 20617 full text articles, with 55951606 annotations in total.
As we have done previously with the LitCovid corpus, we have now processed the LitCovid/PMC corpus with our entity recognition tools, Bio Term Hub and OGER-BB. The results are made accessible in different formats:
from our own servers using the brat tool (see screenshot below):
https://pub.cl.uzh.ch/projects/ontogene/brat/#/LitCovidPMC/ (beware, it might be slow to load)
How to use: select a pubmed identifier from the initial menu, and the corresponding abstract will be visualized, with annotations. You can then browse through the abstracts using the top arrows.
- as downloadable archive files in different formats:
- in BioC json format: covid19lit-pmc.bioc.json.tgz
- in a simple tab separated format: covid19lit-pmc.tsv.tgz
- the tab separated format gives the entities and their position in the original documents, which can be found here: covid19lit-pmc.txt.tgz
through DBCLS's PubAnnotation interface:
- we have also submitted this dataset to Europe PMC
1.1 Timeline
Processing full text articles is computationally quite demanding, therefore we cannot update this dataset frequently.
The first version was processed mid June, and made available on June 21. It contained 7883 full-text articles.
The current version was downloaded on October 2, 2020. After processing it, we made it available on October 14, 2020. This dataset contains 20617 full text articles, with 55951606 annotations in total.
Last update:
2 How to use OGER
You can submit a pubmed abstract (via PubMed ID), a pubmed central full text paper (via PMC ID), or any plain text (via cut and paste) to our OGER annotation service.
3 Who are we?
This page is currently maintained by the NLP Group at the Dalle Molle Institute for Artificial Intelligence (IDSIA).
The work described in this page was initially carried out by the Biomedical Text Mining group at the Institute of Computational Linguistics, University of Zurich. It is now being continued at IDSIA where the PI of the group (Fabio Rinaldi) and some group members have moved.
For additional information about the tools and research activities described in this page, please contact Fabio Rinaldi.