May 06 2014

The HathiTrust Research Center (HTRC) is pleased to announce the recipients of four Workset Creation for Scholarly Analysis (WCSA) prototyping project awards. These projects represent a range of approaches to developing new tools and techniques designed to assist researchers and scholars in 1) identifying and selecting resources from within the HathiTrust and 2) creating worksets of these resources for scholarly analysis.

Each project will receive $40,000 to develop a prototype over a nine-month period beginning in spring 2014, for a combined total of $160,000 in project funding. HTRC received 15 proposals in response to an RFP released in November, and eight finalists were invited to present projects at a shortlist meeting in February.

The following prototyping projects have been selected:

“Workset Creation through Image Analysis of Document Pages”, Texas A&M University (PI: Keith Biggers)

Biggers will work with Neal Audenaert and Natalie M. Houston to develop a software application that uses the visual characteristics of digitized printed pages to identify documents that contain three types of visually distinctive materials of interest to humanities researchers: poetry, music, and illustrations. This prototype will demonstrate the value of using visual analysis of document images in conjunction with more traditional textual analysis to enable scholars to ask more refined questions about texts and their physical manifestations.

“Semantic Analysis of Documents from the HathiTrust Corpus”, Waikato University (PI: Annika Hinze)

Hinze’s team will develop a suite of tools that analyze documents by the semantics of their content and metadata. Clustering documents by semantic similarity will open up a wealth of opportunities for scholarly research. The project is designed in close collaboration with two humanities scholars from the areas of Maori & Pacific Studies and Historical Anthropology, who not only drive this project with research questions based on their scholarly practice but also provide ongoing input and feedback during the development process.

“Distributed Metadata Correction and Annotation”, Maryland Institute for Technology in the Humanities, University of Maryland (PI: Trevor Muñoz)

Muñoz will collaborate with Peter Mallios and the Foreign Literatures in America (FLA) project team to develop a set of services and interfaces that will allow the FLA project (and other projects like it) to pull metadata records from the HathiTrust, correct and annotate these records using standardized vocabularies, gather corrections and annotations from other teams or scholars, and export enhanced metadata in formats suitable for publication as linked data.

“ElEPHãT: Early English Print in HathiTrust, a Linked Semantic Workset Prototype”, Oxford University (PI: Kevin Page)

Page will work with colleagues from the Bodleian Library to produce software that exposes the necessary metadata from individual collections for building aggregate worksets drawn from multiple sources. The prototype will build integrated worksets that combine resources from the HathiTrust and from the Early English Books Online Text Creation Partnership (EEBO-TCP) collection, which focuses on high-quality images and accurate transcriptions of items usually found in libraries’ special collections.

By awarding several prototyping projects from a variety of institutions through the WCSA project, HTRC intends to increase awareness of issues surrounding workset creation, uncover new techniques, and deliver prototypes that will enhance the value of the HathiTrust corpus. It will also foster interactions among the HTRC, developers, and researchers. “We’re excited to establish connections with new partners, and we hope the prototyping projects will lead to longer-term collaborations among participating institutions and the HTRC,” said J. Stephen Downie, HTRC Co-Director and WCSA PI.

WCSA is funded with a generous grant from the Andrew W. Mellon Foundation and directed by: WCSA PI and HTRC Co-Director J. Stephen Downie, Associate Dean for Research at the University of Illinois Graduate School of Library and Information Science; WCSA Co-PI and HTRC Co-Director Beth A. Plale, Professor, School of Informatics and Computing, Indiana University; and WCSA Co-PI Timothy W. Cole, Professor, University Library, University of Illinois at Urbana-Champaign. WCSA is administered in part by the Center for Informatics Research in Science and Scholarship at the University of Illinois. For more information please contact Megan Senseney.

Nov 26 2013

The HathiTrust Research Center is seeking proposals for prototyping projects to define and implement a tool or service that will help scholars better identify and select relevant resources at scale from the HathiTrust corpus and/or facilitate the construction of large-scale worksets useful for scholarly analyses. Grants of $40,000 will be offered to each of four successful respondents for projects to be conducted over a nine-month period beginning April 2014. Workset Creation for Scholarly Analysis: Prototyping Project (WCSA) is generously funded by the Andrew W. Mellon Foundation.

A complete copy of the RFP is available online at:

RFP Schedule:

RFP Available: 22 November 2013
Letters of Intent Due (preferred): 16 December 2013
Final Proposals Due: 13 January 2014
Shortlist Meeting Invitations Issued: 20 January 2014
Shortlist Meeting: 20 February 2014
Award Notification: No later than 15 March 2014

Program Description (see the full RFP for more detail):

The HathiTrust (HT) is a large digitized-text corpus (> 10 million volumes) of keen interest to researchers working in a wide range of scholarly disciplines. To tap the analytic potential of this large and diverse corpus, to tame it and make it useful to them, many researchers need the wherewithal to gather together, into a kind of personal digital carrel, cohesive and coherent subsets of HT texts (potentially tens or hundreds of thousands of volumes or parts of volumes) amenable to the in-depth forms of analysis they want to do. The attributes on which they seek to collocate digitized texts are not always recorded in standard bibliographic descriptions.

The HTRC will collaborate with four independent sub-awardees in conducting individual prototyping projects to develop and validate the potential of specific algorithms, services and/or tools that can enable the creation of large- and small-scale worksets of digitized texts and parts of digitized texts for scholarly analysis in ways not currently feasible. We are seeking proposals from engaged teams of digital humanists, librarians and computer scientists. We anticipate that the proposals received will approach the problem in a variety of different and complementary ways. Proposed prototype experiments must respond to the real needs and requirements of scholars.

Respondents are urged to contact a member of the project team in advance of proposal submission to discuss eligibility, project details, prerequisites, and HTRC support. Prime award project PIs are:

J. Stephen Downie, Graduate School of Library and Information Science, University of Illinois
Tim Cole, University Library, University of Illinois
Beth Plale, Data to Insight Center, Indiana University

Nov 14 2013

The WCSA project team will be attending two upcoming conferences. J. Stephen Downie will be presenting a paper in the Theories and Methodologies in the Digital Humanities session at the Chicago Colloquium on Digital Humanities & Computer Science (December 5-9), and Timothy W. Cole will be presenting a project briefing at the 2013 CNI Fall Membership Meeting (December 9-10).

The WCSA RFP will be released prior to these two events. If you’re interested in submitting a proposal, these events would be an excellent opportunity to meet with project PIs and discuss your ideas.