CROWDSOURCED DIGITIZATION

2.
WHAT CAN YOU DO WITH CROWDSOURCED DIGITIZATION?

Crowdsourcing gives museums, libraries, archives, and the humanities a tool for digitizing or processing vast amounts of data, and a way to tap into the wisdom of crowds to help solve a diverse array of problems. The term ‘wisdom of crowds’ refers to the combined insights and knowledge of many people, often working in very different fields. Examples include Wikipedia, a reference encyclopedia covering almost any topic, and Trove, which gives users free access to digital records of content from Australian cultural and research institutions. However, each explores the possibilities of partnering with non-academic media bodies differently.

Other terms for crowdsourcing describe the ‘crowd’ of participants more precisely, such as ‘community-sourcing’, ‘targeted crowdsourcing’, or ‘micro-volunteering’, which helps clear up the misconception that the crowd is always huge and unidentified. These terms also reflect the fact that, while some cultural heritage crowdsourcing projects are driven by a desire for greater public engagement, the more specific the skills, knowledge or equipment required, the more selective a project has to be, because individuals unable to acquire the necessary skills drop out of the pool of potential participants.

Crowdsourcing projects engage with various communities of users and ask different questions. Wide adoption brings several benefits, including problem identification, idea creation, idea discussion and development, problem-solving, and integration of dispersed knowledge. “Projects range from crowd-curated photography and art exhibitions to collecting objects at in-person ‘roadshow’ events.” They draw on a variety of disciplines, from historians interested in scholarly editions of archival documents to technologist- and collections-led public engagement and data enhancement projects in museums.

Museums, libraries, and archives take different approaches to overseeing physical collections and the information available about them, and the same is true of their public access and engagement platforms. Project direction can also change when new tools make the digitization process more efficient; for example, the Bentham Project unexpectedly ended up incorporating handwriting recognition technology once the technology matured.

Crowdsourcing in cultural heritage benefits from the ability of institutions to present their efforts as part of the ‘greater good’ when encouraging participation. This may explain why projects generally follow collaborative and cooperative, rather than competitive, models. However, friction can occur when the traditional role of expertise is disrupted by contributors who are not seen as professionals.

Historically, the challenges of crowdsourcing have revolved around quality control, privacy, intellectual property concerns, and resource costs. Crowdsourced innovation typically follows a series of steps and gates in which a problem and its incentives are presented to a large community, usually in the form of a challenge or contest. The challenge owner then works with that community, receiving inputs and ideas that are ultimately filtered down to one or a handful of winners.

Cultural heritage crowdsourcing projects ask the public to undertake tasks that cannot be done automatically, in an environment where the activities and/or goals give intrinsic rewards for participation and where their involvement contributes to a shared goal. Institutions need the right blend of source material and tasks for their ‘crowd’, which is normally a combination of occasional participants and dedicated ‘super contributors’. Building a fruitful crowdsourcing project requires an understanding of the motivations for initial and ongoing participation, the characteristics of tasks suited to crowdsourcing, and the application of best practices.

Technologies deliver almost instantaneous data gathering and feedback, computational validation of contributions, and the ability to reach both broad and niche groups through loose networks. The ability to track data provenance computationally and verify remediated primary sources is particularly important for scholarly projects. Digitization has also helped manage the limitations of physical space, conservation, location and opening hours that previously affected access to collections.
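As an illustration of what computational validation of contributions might look like, here is a minimal sketch in Python. It assumes one common approach, accepting a transcription only when enough independent volunteers agree on it; this is not described in the projects mentioned here, and the function names and the 0.6 threshold are hypothetical.

```python
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not count as disagreement."""
    return " ".join(text.lower().split())


def consensus(transcriptions, min_agreement=0.6):
    """Return the most common normalized transcription if enough volunteers agree.

    Returns None when the list is empty or when agreement falls below
    min_agreement (an assumed threshold), signalling that the item
    should go to a human reviewer instead.
    """
    if not transcriptions:
        return None
    counts = Counter(normalize(t) for t in transcriptions)
    text, votes = counts.most_common(1)[0]
    if votes / len(transcriptions) >= min_agreement:
        return text
    return None


# Three volunteers transcribe the same (invented) line; two agree after normalization.
result = consensus(["The  Cat", "the cat", "the cot"])
```

A real project would likely need something more forgiving than exact matching, since historical handwriting invites legitimate disagreement, but the sketch shows how agreement between contributors can be checked without an editor reading every line.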

There are also validation concerns with contributed work. Transcribe Bentham questioned the editorial quality of volunteer-produced transcripts. The tasks performed by participants involve transforming content from one format to another, for example, transcribing text, describing artefacts, synthesizing new knowledge, or producing creative artefacts such as photography. Semantic context is required for structured text search (for example, searches for specific entities like people, places or events within large datasets) and can be supported through ‘structured transcription’, in which metadata that describe the entity through emergent or externally defined concepts are recorded alongside the transcribed text.
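To make ‘structured transcription’ more concrete, the following Python sketch shows one possible way to store entity metadata alongside transcribed text so that entities can be searched. The class names, fields and sample sentence are all invented for illustration and do not come from any of the projects discussed.

```python
from dataclasses import dataclass, field


@dataclass
class EntityAnnotation:
    """Metadata describing one entity mentioned in the transcribed text."""
    start: int          # character offset where the mention begins
    end: int            # character offset just past the mention
    entity_type: str    # e.g. "person", "place", "event"


@dataclass
class StructuredTranscription:
    """Transcribed text plus the entity annotations recorded alongside it."""
    text: str
    annotations: list = field(default_factory=list)

    def tag(self, phrase, entity_type):
        """Annotate the first occurrence of phrase with an entity type."""
        start = self.text.find(phrase)
        if start == -1:
            raise ValueError(f"{phrase!r} not found in transcription")
        self.annotations.append(
            EntityAnnotation(start, start + len(phrase), entity_type)
        )

    def find(self, entity_type):
        """Return every tagged mention of the given entity type."""
        return [
            self.text[a.start:a.end]
            for a in self.annotations
            if a.entity_type == entity_type
        ]


# Hypothetical sample sentence, not taken from a real collection.
record = StructuredTranscription("Received of Mr. Smith at Boston the sum of ten pounds.")
record.tag("Mr. Smith", "person")
record.tag("Boston", "place")
places = record.find("place")
```

Because the annotations point back into the text by offset, a search for all places or people can run across a large dataset without re-reading the transcripts themselves.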

The inherent variability of materials in cultural heritage collections means that the same class of task, whether transcribing handwriting, tagging a painting or georeferencing a map, could be quick and uncomplicated or could require tricky subjective judgement to accomplish.

Creating interfaces that are both productive and engaging enhances a user’s experience. The improvements to the Scripto interface, designed to help transcribers work with documents more effectively, are a good example.

There is potential in archive-based participatory digitization projects embedded in the work researchers are already performing, such as the Papers of the War Department. The discussion of Transcribe Bentham hints at challenges ahead: advances in machine learning and computational capability for tasks that were previously completed by people, such as transcribing handwriting, OCR correction, describing images and discerning patterns, might render these activities less meaningful as crowdsourced tasks.
