Knowledge bases enable valuable downstream applications such as information retrieval, question answering, medical diagnosis, and data visualization. However, building high-quality knowledge bases can be incredibly difficult. While extensive efforts have focused on unstructured text, troves of information remain untapped in richly formatted data, where relations are conveyed through textual, structural, tabular, and visual cues.
We recently built Fonduer, the first knowledge base construction framework for information extraction from richly formatted data. Fonduer uses a new unified data model that preserves structural and semantic information across different data modalities, and it relies on a human-in-the-loop paradigm called data programming to train machine learning systems.
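To make the data programming idea concrete, here is a minimal sketch in plain Python. It is not Fonduer's API: the `Candidate` class, its fields, the three labeling functions, and `majority_vote` are all hypothetical names invented for illustration. The sketch assumes that a user writes small heuristic functions that each vote on whether a candidate relation is correct, drawing on textual and tabular cues, and that these noisy votes are then combined into a training label. Combining by simple majority is a stand-in here; data programming typically learns how much to trust each function rather than counting every vote equally.

```python
from dataclasses import dataclass

# Label values a labeling function may return.
ABSTAIN, NEGATIVE, POSITIVE = 0, -1, 1


@dataclass
class Candidate:
    """Hypothetical candidate relation: a (part, value) pair plus its context."""
    part: str
    value: str
    same_table_row: bool  # tabular cue: do both mentions share a table row?
    header_text: str      # textual cue: header of the column holding the value


def lf_same_row(c: Candidate) -> int:
    # Tabular heuristic: related mentions usually sit in the same row.
    return POSITIVE if c.same_table_row else NEGATIVE


def lf_header_mentions_current(c: Candidate) -> int:
    # Textual heuristic: vote only when the header names the quantity we want.
    return POSITIVE if "current" in c.header_text.lower() else ABSTAIN


def lf_value_is_numeric(c: Candidate) -> int:
    # Sanity heuristic: a non-numeric value cannot be a valid rating.
    try:
        float(c.value)
    except ValueError:
        return NEGATIVE
    return ABSTAIN


LABELING_FUNCTIONS = [lf_same_row, lf_header_mentions_current, lf_value_is_numeric]


def majority_vote(c: Candidate) -> int:
    """Combine the votes by their sign; a tie (or all abstains) yields ABSTAIN."""
    total = sum(lf(c) for lf in LABELING_FUNCTIONS)
    if total > 0:
        return POSITIVE
    if total < 0:
        return NEGATIVE
    return ABSTAIN
```

The point of the sketch is that no candidate is labeled by hand. The user's effort goes into writing and refining heuristics, and the resulting noisy labels are what the downstream machine learning model is trained on.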