The HathiTrust Research Center (HTRC) is the research branch of the HathiTrust, a repository of over 10 million volumes (3 billion pages) of text. HTRC offers a suite of tools and services that enable computational access to the HathiTrust corpus. From the digitized library collections in HathiTrust, scholars select subsets for computational analysis according to their particular research objectives. We refer to these subsets, together with their associated external data sources, as “worksets”. Worksets are a type of machine-actionable, referential collection. User requirements for workset creation are growing more sophisticated and complex as humanities scholarship becomes more interdisciplinary and more digital.

HTRC holds transformative promise for humanities scholarship: it enables scholars to sift through a massive corpus and construct the precise worksets their investigations require. How scholars use collections and worksets remains a central research problem in this initiative. The Workset Creation for Scholarly Analysis: Prototyping Project (WCSA) is a two-year effort, funded by the Andrew W. Mellon Foundation, that aims to engage scholars in designing tools for exploring, locating, and analytically grouping materials so that they can routinely conduct computational scholarship at scale, based on meaningful worksets.

The three major goals of the WCSA project are to:

  1. Enrich the metadata in the HathiTrust corpus,
  2. Augment string-based metadata with URIs to leverage discovery and sharing through external services, and
  3. Formalize the notion of collections and worksets in the context of the HTRC.
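
As a rough illustration of goals 2 and 3, the sketch below shows one way a string-based metadata field could be paired with a URI, and how a workset could be modelled as a referential collection that stores volume identifiers rather than the texts themselves. Everything in it is an assumption made for illustration: the class names, the field names, the identifiers, and the `example.org` URI are hypothetical and do not reflect the actual HTRC data models or any HathiTrust API.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class VolumeRecord:
    """Hypothetical, minimal metadata record for one volume."""
    volume_id: str                     # illustrative identifier, not a real one
    title: str
    author: str                        # string-based metadata
    author_uri: Optional[str] = None   # URI added by augmentation, if any


# Hypothetical lookup table standing in for an external authority service.
AUTHOR_URIS: Dict[str, str] = {
    "Austen, Jane": "http://example.org/authority/austen-jane",
}


def augment_with_uri(record: VolumeRecord, lookup: Dict[str, str]) -> VolumeRecord:
    """Return a copy of the record with a URI attached when the string matches."""
    return replace(record, author_uri=lookup.get(record.author))


@dataclass
class Workset:
    """A referential collection: it holds identifiers, not the volumes."""
    name: str
    volume_ids: List[str] = field(default_factory=list)

    def resolve(self, catalog: Dict[str, VolumeRecord]) -> List[VolumeRecord]:
        """Look up the referenced records, skipping identifiers not in the catalog."""
        return [catalog[v] for v in self.volume_ids if v in catalog]


record = augment_with_uri(VolumeRecord("vol-001", "Emma", "Austen, Jane"), AUTHOR_URIS)
catalog = {record.volume_id: record}
workset = Workset("austen-novels", ["vol-001", "vol-404"])
resolved = workset.resolve(catalog)
```

Because the workset stores only references, the same volume can belong to many worksets, and enriched metadata becomes visible to all of them as soon as the catalog record is updated.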

In November 2013, HTRC will release an open, competitive Request for Proposals intended to fund four prototyping projects that will build tools for enriching and augmenting metadata for the HathiTrust corpus. Concurrently, HTRC will work closely with the Center for Informatics Research in Science and Scholarship to develop and instantiate a set of formal data models that will be used to capture the outputs of the funded prototyping projects and integrate them with the larger HathiTrust corpus.

