Populating a Database from Parallel Texts Using Ontology-Based Information Extraction
Legacy data in many mature descriptive sciences is distributed across multiple text descriptions. The challenge is both to extract this data and to correlate it once extracted. The MultiFlora system addresses both tasks using an established Information Extraction system that is tuned to the domain of botany and integrated with a formal ontology to structure and store the data. A range of output formats is supported through the W3C RDFS standard, making it simple to populate a database as desired.
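
As a rough illustration of the correlate-then-export idea, the sketch below merges facts extracted from two parallel descriptions of the same taxon and writes them out as RDF-style triples in N-Triples syntax. It is a minimal sketch under stated assumptions, not the MultiFlora implementation: the namespace, the record layout, the sample data, and the function names `correlate` and `to_ntriples` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical namespace, used only for this illustration.
NS = "http://example.org/flora#"

# Hypothetical extraction output: (source text, taxon, character, value).
extracted = [
    ("flora_a", "Ranunculus_acris", "petalColour", "yellow"),
    ("flora_b", "Ranunculus_acris", "petalColour", "yellow"),
    ("flora_b", "Ranunculus_acris", "petalCount", "5"),
]


def correlate(records):
    """Group values by (taxon, character) and record which sources gave each value."""
    merged = defaultdict(lambda: defaultdict(set))
    for source, taxon, character, value in records:
        merged[(taxon, character)][value].add(source)
    return merged


def to_ntriples(merged):
    """Emit one triple per distinct (taxon, character, value) combination.

    Values are written as plain literals; this sketch assumes they contain
    no quotes or backslashes that would need escaping.
    """
    lines = []
    for (taxon, character), values in sorted(merged.items()):
        for value in sorted(values):
            lines.append(f'<{NS}{taxon}> <{NS}{character}> "{value}" .')
    return lines


if __name__ == "__main__":
    for line in to_ntriples(correlate(extracted)):
        print(line)
```

Because the two sources agree on the petal colour, that fact is emitted once, while the set of contributing sources remains available in the merged structure for checking agreement between texts.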