PADI-web
The Platform for Automated extraction of animal Disease Information from the web (PADI-web) automatically collects news via customized multilingual queries, classifies them and extracts epidemiological information. We detail each step of the PADI-web pipeline, with a focus on the new user-oriented features.
PADI-web retrieves articles daily from the news aggregator Google News through customized RSS feeds. Each RSS feed is defined by a query that combines terms (disease names, symptoms or hosts). These terms were identified by an approach combining text mining and domain expertise. The RSS feeds are of two types:
- Disease-based surveillance feeds consist of disease names and target seven animal diseases.
- Symptom-based surveillance feeds combine clinical signs and hosts, without any disease names.
RSS feeds are implemented in 28 languages (e.g. English, French, Chinese, Arabic, Italian, Russian and Turkish).
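To make the two feed types concrete, the following minimal Python sketch shows how such queries might be assembled into feed URLs. It is an illustration only, not the PADI-web implementation: the endpoint URL, its parameter names (q, hl, gl, ceid), the boolean OR/AND query syntax, the function names and the example terms are all assumptions that do not come from the text above.

```python
from urllib.parse import urlencode

# Assumed Google News RSS search endpoint (not specified in the text).
BASE_URL = "https://news.google.com/rss/search"


def disease_query(disease_names):
    """Disease-based feed: match any of the quoted disease names."""
    return " OR ".join(f'"{name}"' for name in disease_names)


def symptom_query(clinical_signs, hosts):
    """Symptom-based feed: a clinical sign together with a host, no disease name.

    The OR/AND syntax is an assumption about how terms could be combined.
    """
    signs = " OR ".join(f'"{sign}"' for sign in clinical_signs)
    host_part = " OR ".join(f'"{host}"' for host in hosts)
    return f"({signs}) AND ({host_part})"


def feed_url(query, lang, country):
    """Build a feed URL for one language/country pair (hypothetical parameters)."""
    params = {"q": query, "hl": lang, "gl": country, "ceid": f"{country}:{lang}"}
    return f"{BASE_URL}?{urlencode(params)}"


# Illustrative placeholder terms, not the actual PADI-web term lists.
disease_feed = feed_url(disease_query(["african swine fever"]), "en", "US")
symptom_feed = feed_url(symptom_query(["fever"], ["pigs", "cattle"]), "en", "US")
```

Under these assumptions, covering an additional language would amount to translating the term lists and calling feed_url with a different language and country code.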