PADI-web

The Platform for Automated extraction of animal Disease Information from the web (PADI-web) automatically collects news articles via customized multilingual queries, classifies them and extracts epidemiological information. We detail each step of the PADI-web pipeline, with a focus on the new user-oriented features.

PADI-web retrieves articles daily from the news aggregator Google News through customized RSS feeds. Each RSS feed is defined by a query that combines terms (disease names, symptoms or hosts). These terms were identified using an approach that combines text mining with domain expertise. The RSS feeds are of two types:

  • Disease-based surveillance feeds consist of disease names and target seven animal diseases.
  • Symptom-based surveillance feeds combine clinical signs and hosts, without any disease names.

RSS feeds are implemented in 28 languages (e.g. English, French, Chinese, Arabic, Italian, Russian and Turkish).
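
To make the two feed types concrete, the following minimal Python sketch shows how a term-based query could be turned into a language-specific RSS feed URL. It is an illustration under stated assumptions, not PADI-web's actual implementation: the function names (build_query, feed_url), the example terms, the OR/quoting query syntax and the Google News URL parameters (q, hl, gl, ceid) are all assumptions introduced here.

```python
from urllib.parse import urlencode

# Assumed endpoint for Google News RSS search; not taken from the PADI-web source.
GOOGLE_NEWS_RSS = "https://news.google.com/rss/search"


def build_query(term_groups):
    """Combine groups of terms into one query string.

    Terms inside a group are alternatives (joined with OR); groups are
    joined with a space, which is assumed to act as an implicit AND.
    Multi-word terms are quoted so they are matched as phrases.
    """
    parts = []
    for group in term_groups:
        quoted = [f'"{term}"' if " " in term else term for term in group]
        if len(quoted) == 1:
            parts.append(quoted[0])
        else:
            parts.append("(" + " OR ".join(quoted) + ")")
    return " ".join(parts)


def feed_url(query, lang, country):
    """Build an RSS feed URL for a query in a given language and country."""
    params = {
        "q": query,
        "hl": lang,                    # interface language (assumed parameter)
        "gl": country,                 # country edition (assumed parameter)
        "ceid": f"{country}:{lang}",   # edition identifier (assumed parameter)
    }
    return f"{GOOGLE_NEWS_RSS}?{urlencode(params)}"


# Disease-based feed: disease names only (illustrative term).
disease_query = build_query([["African swine fever"]])

# Symptom-based feed: clinical signs and hosts, no disease names (illustrative terms).
symptom_query = build_query([["fever", "sudden death"], ["pig", "cattle"]])

disease_feed = feed_url(disease_query, "en", "US")
symptom_feed = feed_url(symptom_query, "en", "US")
```

In this sketch, covering another language would mean translating the term lists and changing the lang and country arguments, so that one feed exists per query and language pair.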