Semantic Analysis of Documents from the HathiTrust Corpus
PI: Annika Hinze
This project developed a tool to help scholars identify and select resources within the HathiTrust Document Corpus, which is currently accessible through text-based search over full text and metadata. The conceptual design and initial implementation of a new framework afford the benefits of semantic search while minimizing the problems associated with applying existing semantic analysis at scale. The resulting Capisco system was developed through the creation of a knowledge base and semantic document analysis. Scholars can use two tools for search and exploration: a semantic search feature and a workset explorer. This toolset is complemented by a synonym browser, concept browser, context browser, synonym adder, and a basket of knowledge. The latter two tools assist with further development of knowledge base concepts. Additionally, semantic links between concepts and documents created via the Capisco system can be integrated into HathiTrust metadata records for further enrichment.
A copy of the project’s final report is available via IDEALS.