Introduction to Digital Libraries


  • Edward A. Fox, Virginia Tech
Duration: Sunday, June 19, 9am - 5pm (full-day)

This tutorial offers a thorough introduction to the digital library (DL) field, covering key concepts and terminology as well as services, systems, technologies, methods, standards, projects, issues, and practices. It builds on a theoretical foundation, starting with the '5S' set of intuitive aspects (Streams, Structures, Spaces, Scenarios, Societies), gives careful definitions and explanations of all the key parts of a 'minimal digital library', and expands from that basis to cover key DL issues. Illustrations come from a set of case studies. Attendees will be introduced to four Morgan & Claypool books, published 2012-2014, that elaborate on 5S. Complementing the coverage of '5S' will be an overview of key aspects of the DELOS Reference Model and DL.org activities. Finally, the use of a Hadoop cluster to support DLs will be described.

Introduction to the Digital Public Library of America API

  • Unmil P. Karadkar, The University of Texas at Austin
  • Audrey Altman, Digital Public Library of America
  • Mark Breedlove, Digital Public Library of America
Duration: Sunday, June 19, 1pm - 4pm (half-day)


The DPLA API is a Web-based RESTful API that provides programmatic access to over 11 million cultural heritage objects indexed in the DPLA. The API exposes more metadata fields than the Web portal presents. API-based access thus lets non-programmers explore the DPLA metadata in greater depth and lets software developers create innovative apps. This tutorial introduces participants to the API and has the following objectives:

  • Understand and use RESTful APIs in general and the DPLA API in particular
  • Locate relevant information on the DPLA developer pages
  • Read the DPLA data model documentation
  • Retrieve DPLA metadata via a Web browser and a command-line interface
  • Manipulate saved DPLA data using OpenRefine
  • Retrieve DPLA data using a modern programming language such as Python, PHP, JavaScript, or Java
  • Use Python wrappers such as dpla_utils and DPyLA (for Python programmers)

The tutorial is designed for both programmers and non-programmers. We encourage you to bring your own computers and participate in the hands-on use of the DPLA API during the tutorial.
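As a rough illustration of the programmatic-retrieval objective above, the following Python sketch builds a search request and pulls titles out of a decoded response. It is not taken from the tutorial materials. The endpoint `https://api.dp.la/v2/items`, the parameters `q`, `page_size`, and `api_key`, and the `docs` / `sourceResource` / `title` response layout are assumptions that should be checked against the DPLA developer pages; the helper names `build_items_url`, `fetch_items`, and `titles` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; confirm against the DPLA developer pages before relying on it.
BASE_URL = "https://api.dp.la/v2/items"


def build_items_url(query, api_key, page_size=5):
    """Return a search URL for the (assumed) items endpoint."""
    params = {"q": query, "page_size": page_size, "api_key": api_key}
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def fetch_items(query, api_key, page_size=5):
    """Send the GET request and decode the JSON body.

    Requires network access and a valid API key.
    """
    url = build_items_url(query, api_key, page_size)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def titles(response):
    """Collect titles from a decoded response.

    Assumes each record in "docs" keeps its title under "sourceResource",
    either as a single string or as a list of strings.
    """
    found = []
    for doc in response.get("docs", []):
        title = doc.get("sourceResource", {}).get("title")
        if isinstance(title, list):
            found.extend(title)
        elif title is not None:
            found.append(title)
    return found
```

With a real key in hand, `titles(fetch_items("civil war", "YOUR_API_KEY"))` would return the titles from the first page of results. Pasting the output of `build_items_url` into a Web browser exercises the same request without any programming.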

Information Extraction for Scholarly Digital Libraries


  • Kyle Williams, Pennsylvania State University
  • Jian Wu, Pennsylvania State University
  • Zhaohui Wu, Pennsylvania State University
  • C. Lee Giles, Pennsylvania State University
Duration: Sunday, June 19, 9am - 12pm (half-day)

Scholarly documents contain many data entities, such as titles, authors, affiliations, figures, and tables. These entities can be used to improve digital library services through richer metadata and to enable the development of new services and tools for interacting with and exploring scholarly data. However, in a world of scholarly big data, extracting these entities in a scalable, efficient, and accurate manner can be challenging. In this tutorial, we introduce the broad field of information extraction for scholarly digital libraries. Drawing on our experience in running the CiteSeerX digital library, which has performed information extraction on over 7 million academic documents, we argue for the need for automatic information extraction, describe different approaches to performing it, present tools and datasets that are readily available, and describe best practices and areas of research interest.