A guide to digitization
Digitization is ubiquitous in various institutions, such as libraries, archives, and museums. “Digital resources for the humanities begin with a period of digitisation, as digital representations of cultural and historical documents, artefacts, and images are created, generally for online delivery to users.” (Melissa Terras, Digitisation and Digital Resources in the Humanities).
Even before the era of the World Wide Web and Web 2.0, digitization had begun to transform research in the humanities. Digitization converts traditional forms of information storage, such as paper, photographs, objects, audio, and video, into binary code (zeros and ones) for storage on computers. This conversion facilitates access to digitized primary sources and to the data that represent them, and therefore enables the use of computational methodologies to analyze them. The task of interpretation is, and will continue to be, the responsibility of humans.
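As a small illustration of what "binary code" means in practice, the sketch below shows how a short piece of text is stored as bytes and how those bytes look as zeros and ones. The helper names `to_bits` and `from_bits` are hypothetical, chosen only for this example, and UTF-8 is assumed as the text encoding.

```python
def to_bits(data: bytes) -> str:
    """Render each byte as eight binary digits, separated by spaces."""
    return " ".join(f"{byte:08b}" for byte in data)


def from_bits(bits: str) -> bytes:
    """Reverse to_bits: turn groups of binary digits back into bytes."""
    return bytes(int(chunk, 2) for chunk in bits.split())


# Assumption: the text is encoded as UTF-8 before storage.
text = "Old Bailey"
encoded = text.encode("utf-8")
bits = to_bits(encoded)

print(bits)
# The round trip recovers the original text unchanged.
print(from_bits(bits).decode("utf-8"))
```

Images, audio, and video are stored on the same principle, only with far more bytes and with a file format that describes how to interpret them.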
Some types of items are easier to digitize than others. Photographs, documents, and other paper-based items are digitized by scanning, and the resulting image is saved in a digital format. Holding a digital version of a photograph opens it up to several digital uses. Some paper-based items are too fragile to withstand the scanning process, or their typeface is too hard to read. These are treated as objects and photographed instead, or transcribed by hand. Optical Character Recognition (OCR) software can also be used to convert an image of text into searchable text, but its success depends on the quality of the available image and the nature of the document in question: clearly printed black-and-white text is easily OCRed.
3D objects are digitized by photographing them with a digital camera. The original object can then be viewed in digital form without having to be handled. Where OCR is not viable, text has to be keyed by hand. As Professors Hitchcock and Shoemaker attest in Digitising History From Below: The Old Bailey Proceedings Online, 1674-1834: “Because the original typeface is so irregular, it was not possible to use an optical character reader, and the text had to be ‘double rekeyed’: that is, typed in twice and the results electronically compared in order to identify errors.”
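The comparison step in double rekeying can be sketched in a few lines. This is only an illustration of the idea, not the procedure used by the Old Bailey project: the function name `compare_keyings` is hypothetical, the sample sentences are invented, and Python's standard `difflib` module is assumed as the comparison tool.

```python
import difflib


def compare_keyings(first: str, second: str) -> list[tuple[int, str, str]]:
    """Return (word position, first version, second version) for each disagreement."""
    a, b = first.split(), second.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    disagreements = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            disagreements.append((i1, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return disagreements


# Invented sample text: two typists key the same line independently.
keying_one = "the prisoner was indicted for stealing a silver watch"
keying_two = "the prisoner was indited for stealing a silver watch"

for position, v1, v2 in compare_keyings(keying_one, keying_two):
    print(f"word {position}: '{v1}' vs '{v2}' (check against the original)")
```

The comparison only flags where the two keyings differ. Deciding which version is right still requires a person to consult the source, and an error made identically by both typists goes undetected.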
The barrier to entry is not negligible: digitization requires technical infrastructure and training, and it is time-consuming, with the time required depending on the complexity of the object. Despite the cost, working with digitized representations transforms how we understand different kinds of items and our ability to use them for different purposes. Different types of items call for different approaches, highlighting issues of cost, purpose, longevity, use, and value.
Kenney and Rieger (2000, p. 1) endorse digitizing because it offers opportunities to provide and enhance digital resources “for learning, teaching, research, scholarship, documentation, and public accountability”. Digital versions allow users to evaluate and creatively reuse these resources to discover new ideas and to innovate. Digitization also creates a new channel for raising revenue and increases awareness of the holdings of libraries and archives.
Digital files permit access to the metadata of objects, which can then be analyzed. The concept of metadata, well known in information science, is central to the digital humanities. Metadata is data about data: the cultural object has a series of descriptors that provide extended and contextualized information about it, such as its date and place of production and, if known, its author, to name a few basic examples. Today we can add, analyze, and visualize the metadata of those repositories of digital cultural objects that allow data mining.
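A minimal sketch of what such metadata might look like, and of one simple analysis over it, is given below. The `ObjectMetadata` fields, the sample records, and the `count_by_decade` helper are all hypothetical and invented for illustration. Real repositories typically follow an established metadata schema rather than an ad hoc structure like this one.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectMetadata:
    """Basic descriptors for one digitized object (hypothetical fields)."""
    identifier: str
    title: str
    year: Optional[int]          # date of production, if known
    place: Optional[str]         # place of production, if known
    creator: Optional[str] = None  # author, if known


# Invented sample records, not drawn from any real collection.
records = [
    ObjectMetadata("obj-001", "Portrait of a family", 1843, "Edinburgh", "Unknown studio"),
    ObjectMetadata("obj-002", "Harbour view", 1847, "Edinburgh"),
    ObjectMetadata("obj-003", "Market street", 1861, "London"),
    ObjectMetadata("obj-004", "Untitled letter", None, None),
]


def count_by_decade(items: list[ObjectMetadata]) -> dict[int, int]:
    """Count objects per decade, skipping records with no known year."""
    counts = Counter((item.year // 10) * 10 for item in items if item.year is not None)
    return dict(sorted(counts.items()))


print(count_by_decade(records))
```

Even a count this simple depends on how missing values are handled: here, undated objects are silently left out, which is itself an interpretive decision.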
These new forms of access and organization of the objects of human culture can be useful to formulate new research questions about the past or to investigate old questions with greater access to information.
Paul Conway, in his paper “Building Meaning from Digital Photographs”, also stresses that “[d]ecisions have to be made about spatial resolution, tone reproduction, and color space before images are digitized. In most cases, it will not be the goal to reproduce the physical properties of the original, but to reproduce its appearance under certain viewing conditions.” (Conway, p. 22)
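To make the spatial resolution decision concrete, the arithmetic below shows how print size and scanning resolution determine pixel dimensions and uncompressed file size. The figures used (an 8 x 10 inch print scanned at 400 pixels per inch) and the function names are assumptions chosen for illustration. They are not recommendations taken from Conway or from any guideline.

```python
def pixel_dimensions(width_in: float, height_in: float, ppi: int) -> tuple[int, int]:
    """Pixel width and height of a scan, given print size in inches and pixels per inch."""
    return round(width_in * ppi), round(height_in * ppi)


def uncompressed_bytes(width_px: int, height_px: int,
                       channels: int = 3, bits_per_channel: int = 8) -> int:
    """Size of the raw image data in bytes, before any compression or file headers."""
    return width_px * height_px * channels * bits_per_channel // 8


# Assumed example: an 8 x 10 inch print scanned at 400 ppi in RGB color.
width_px, height_px = pixel_dimensions(8, 10, 400)
size_8bit = uncompressed_bytes(width_px, height_px)                        # 8 bits per channel
size_16bit = uncompressed_bytes(width_px, height_px, bits_per_channel=16)  # 16 bits per channel

print(f"{width_px} x {height_px} pixels")
print(f"{size_8bit / 1_000_000:.1f} MB at 8 bits per channel")
print(f"{size_16bit / 1_000_000:.1f} MB at 16 bits per channel")
```

Doubling the resolution quadruples the pixel count, and doubling the bit depth doubles the file size, which is why these choices have to be settled before a large collection is scanned.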
The Frey/Reilly model tries to codify the relationship between photograph digitization processes and the appearance of the digital product. Depending upon the type, condition, and perceived value of the originals, the model outlines four manipulation considerations. Digitizing photographs involves four major steps: the identification and selection of photographs; digitization; indexing at the item level; and website development to support online searching, browsing, display, and other functions. Each of these steps is complex and time-consuming, and each involves a significant variety of decisions, a process that is increasingly well described in digitization guidelines.
Since 2004, at least four important guidelines on the digitization of photographic materials have been published in the United States. The four digitization guidelines under review here provide guidance on a set of seven decisions that are required to establish the visual properties of the representation.