This guide was last revised 3 June 2009.
The first step in the digital content life cycle is selecting content to be made digital. There is a common belief that everything worth keeping is worth digitising, but that may not always be desirable or even feasible. In some cases the technology may not be mature enough, in others the cost of digital creation and ongoing maintenance may be unaffordable. Whether you plan to create new material or digitise existing material, having a robust selection and prioritisation process will be key to a successful outcome.
Make it Digital has two detailed Selecting for Digitisation guides:
Once analogue content is digitised and separated from its original form, it loses its distinctiveness from digitally created content. As part of the digital content life cycle, digitised content can be used, manipulated or even lost just as readily as any other digital content.
Not all content being digitised needs to be carefully selected beforehand. For instance, where only a small number of items are involved it may be just as easy to digitise all of them without selecting the best fit. Some selection processes will only happen after the digital content is created, such as the selection of digital photographs for publishing, or the editing of a sound or video recording – in these cases the process used will likely be quite different from pre-digital selection.
Selection of content for digitisation becomes important where it has potential for more than a one-time use, and where multiple related items will form part of a managed digital resource or collection. Archives, libraries and museums in particular are creating these resources on a significant scale. Commercial services wanting to sell access to e-books and articles, out of print government information and other content may also need to be selective in choosing what they offer.
While there are no definitive rules or standards for selecting content for digitisation, we consider that it is possible to develop good practice around planning and decision-making. With this in mind, we developed a good practice framework to encourage selection processes that will make good use of available resources and produce digitised content that can be building blocks for future use. The five principles of the framework are:
The latest version of the framework is available for download from our Selection resources section.
Digital content presents opportunities for access, discovery and use that once were only available to the most dedicated researchers and creators. Digital content has very few constraints around physical location or physical copy, and in most formats can be readily edited and repurposed to match the requirements of the user. This has huge implications for the way we need to manage and organise information in the future.
The ease of discovery and use of digital content puts analogue content at a disadvantage – despite the vast majority of information and cultural knowledge still being held in non-digital formats. There is a spectrum of responses to this situation – at one end, making non-digital content more visible to potential users, in the middle, indexing and describing the content and making that information available digitally, and at the other end, digitising content so that it too can be accessed and discovered digitally. Outside this spectrum may also be content that is inappropriate or unnecessary to digitise or even make more visible.
Getting comfortable with the notion that not everything will be digitised in the foreseeable future is a useful part of the digitisation selection process, and will make the scale of the task less overwhelming. Learning to manage digitised and analogue material alongside each other is also a typical consequence of digitisation activity.
Digitisation is a popular term that can convey a wide range of meanings and expectations. There is no agreed technical definition that can be used across all situations, as there are many types of digitisation activities that have little in common with each other. Caution is needed in using the term for anything other than a very broad category of activity.
Our preferred approach for Make it Digital is to define digitisation as:
Digital content creation by making a digital copy or digital recording of analogue information, where that information can reside in a document, artefact, sound, performance, geographical feature or natural phenomenon.
Digital content creation includes data-entry and transcription, digital imaging, photography, sound and video recording and transfer – in fact any analogue-to-digital transfer. It excludes transcoding or migration of digital information into a different digital format or media (digital-to-digital transfer), software manipulation or programmed machine creation of new digital information (born-digital information), and analogue output of digital information such as printing or audiovisual playback (digital-to-analogue transfer).
A focus solely on digitisation activity can understate the importance of the whole digital lifecycle of information. Creators are likely to experience a particular piece of information as one object, for instance taking a photograph, transferring it to a computer, and then printing out a copy. In doing so they may potentially underestimate changes that can occur at each point, such as an irreversible loss of detail through software processing. With purely analogue content such as documents, film or photographs, loss of information is often visible early on to the naked eye. In contrast, loss of information for digital content may not be noticeable until much later when the content is accessed or moved to a different format e.g. from computer screen to print.
Analogue content is almost always fused with or defined by its analogue carrier, frequently resulting in an imperfect copy being made, regardless of the technology used. This means that unless care is taken during the copying process, a digitised copy can be of much poorer quality than the original, missing or even destroying important detail in the process of copying. Marginalia, film batch numbers, storage boxes etc. can all be cues for dating or organising content, and are easily missed or disposed of in a digitisation process. It is shortsighted to think that digitising will automatically capture all of the value of the original being copied.
One advantage of digital content is the ease with which it can be copied perfectly from one carrier to another. That quality has enabled all kinds of technologies from email and the web to data warehouses and backup systems. However, real world limitations in software and format choices often mean a less than perfect copy of digital content is made, such as when ‘lossy’ compression formats are used or the content is opened or re-saved in different software. Perfect copies can also be permanently corrupted by causes as random as a cosmic ray hitting the storage media.
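One common way to check whether a digital copy is still a perfect copy is to compare checksums of the two files. The following is a minimal sketch in Python, assuming SHA-256 is an acceptable checksum; the function names, file names and sample bytes are hypothetical and are not part of the Make it Digital guides.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_identical_copy(original: Path, copy: Path) -> bool:
    """True when both files produce the same SHA-256 digest."""
    return sha256_of(original) == sha256_of(copy)


def demo() -> tuple:
    """Compare a master file with a good copy and a copy with one flipped bit."""
    with tempfile.TemporaryDirectory() as folder:
        master = Path(folder) / "master.bin"        # hypothetical file names
        good = Path(folder) / "good_copy.bin"
        damaged = Path(folder) / "damaged_copy.bin"

        payload = b"digitised page image bytes"     # stand-in for real content
        master.write_bytes(payload)
        good.write_bytes(payload)

        corrupted = bytearray(payload)
        corrupted[0] ^= 0x01                        # simulate a single-bit error
        damaged.write_bytes(bytes(corrupted))

        return is_identical_copy(master, good), is_identical_copy(master, damaged)
```

Recording the checksum when the copy is first made, and re-checking it periodically, would let later corruption be detected even when the damage is not visible on screen.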
Unlike analogue content, reading, watching or listening to any digital content made available over a computer network or the internet requires copying for delivery. This can trigger issues of copyright and licensing, even where the use is inconsequential or no one stands to lose commercial opportunity. There are cases where the copyrights and royalties are so complex that material has been bypassed for digitisation.
The growth of the digital environment has increased expectations of cultural institutions to store, preserve and make accessible born-digital information, which requires balancing with the needs of traditional materials. Digitisation is also increasing demand for access to the original item, as awareness of what is held in a collection increases. This can place greater pressure on, and risk to, the original than may have existed before digitisation.
Overall, while digital technologies present many opportunities to increase access to documentary heritage or out of circulation resources, they also present many challenges in relation to digital preservation.
In the 1990s the term ‘digital dark age’ was used to describe the prospect of a significant volume of digital data being lost due to predominantly proprietary, non-interoperable software formats, poor storage strategies and lack of standardised practices for digital archiving. Add to this a large volume of digitised records where originals were not being retained, and the consequences were quite daunting.
Digital technology is not only speeding up the creation of obsolescence-prone digital information, it is potentially speeding up the loss of original heritage materials where digitising is not properly tested to be the best strategy for preservation. UNESCO’s Memory of the World programme has a particular focus on safeguarding information in both non-digital and digital form. Its philosophy is that information and its systematic retrieval is the basis of the memory of civilisations, meaning protecting and keeping that information accessible as documentary heritage is a significant task.
Today’s issues of digital preservation are much greater for digital content that has never had a non-digital form, simply because the volume currently being created is much greater. However, reformatting through digitisation can be the only viable strategy for preservation of analogue content – the short life expectancy of magnetic audio and video tape is a prime example, where it is electronic, rather than digital, data that faces the greater threat of loss. In contrast, digitisation of text and photographs that have relatively long life expectancy often has accessibility as a key focus. As digitisation technology and techniques improve, the quality of digital copies of some materials is reaching the point where the copies serve as acceptable ‘back ups’ or surrogates for original items, particularly where they are rare or fragile.
The emerging consensus is that a ‘digital dark age’ can be avoided through using open formats and standards, developing good content management strategies, and having well documented and standardised practices for archiving and long term preservation.
Mass digitisation initiatives such as Google Books have been controversial due to issues of quality, cost and focus. Supporters like the fact that Google is enabling large volumes of out of print content from university libraries to be digitised and made available on a huge scale, allowing users to search millions of results that would not otherwise be visible. Google’s commercial model has, however, been criticised for creating digital copies that are exclusively discoverable through its search engine, scanning whole texts without a copyright licence, and encouraging poor quality texts to be given prominence over better quality, but in print, materials.
While other mass-digitisation models exist (such as the Open Content Alliance, which focuses solely on out of copyright texts), the mass-digitisation model for scanning printed texts does not easily translate to other media such as archival papers, images, audio and film. While it has been argued that the presence of industrial-scale copying and storage technologies will allow this to occur, in many cases the technology standards and rights issues are more complex and the task significantly more expensive to complete.
Selective large-scale approaches that focus on digital collection development, user-benefit and a balanced representation of print and non-print based content will likely be more useful and valuable in the long-term, and will help drive standards development for a variety of formats and practices.
Not all opportunities for increasing access, discovery and use of non-digital content need lead to digitisation. The value of non-digital content can be enhanced by including references and source locations in digital resources, or content can be sampled rather than digitised as a whole (e.g. taking highlights from raw footage recorded for a documentary). Alternatively, digital technology can be used to improve the description and organisation of such content and make it more visible to potential users (e.g. publishing annotated catalogues of non-digital content).
These kinds of strategies can assist in prioritising the kinds of content that will benefit most from digitisation. The important aspect is to make an assessment, as good selection involves making choices and using informed judgement. Attempting to select what should be digitised based only on technical or arbitrary criteria, such as format, record order, or the ease of task, is likely to lead to poor results and an under-utilised digital collection.
Digitisation debates and policies often pose access and preservation as two competing priorities in tension with each other, or place preservation as a sub-set of access. By instead viewing access and preservation strategies as ways to address different points on a continuum of time, digitisation can be seen as a technology solution that can be applied according to the situation.
Access strategies are generally focused on current user needs and available distribution mechanisms. Where digitisation is involved, access strategies tend to favour catalogue-style browsing or searching of low-resolution or text based content, sampling for inclusion in other digital resources or displays, and digitisation-on-demand. Large-scale activities such as digitising whole collections or whole genres of sound or film are often desired but less commonly resourced.
In contrast, preservation strategies focus on maintaining long-term access to the objects being preserved or to the information they contain. Preservation usually involves making a copy of original material to ‘back up’ or to reduce wear of a non-digital item in addition to other efforts aimed at conserving the original for as long as possible. On some occasions it may focus on last-chance migration of content from deteriorating carriers.
Considerations of risk to the original, rarity of the original, access costs involved in viewing, and demand for use are constraining factors for physical collections and frequently lead to the creation of preservation and access copies. These surrogate copies are increasingly being made through digitisation, which has become a core part of modern archival practice alongside microfilming.
Much of the content of interest to today’s users is still under copyright, which can make large-scale provision of digitised content difficult. Until recently common practice has been to copy material for access without doing any work to establish copyright status. Good practice is to identify the copyright status and embed that information into the content's metadata. Not undertaking this work will either encourage unlicensed use of the content or will greatly limit the usefulness of the content into the future.
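As an illustration of recording copyright status alongside digitised content, the sketch below builds a small rights record that could be saved as a JSON sidecar file or mapped to a metadata schema. It is an assumption-laden example: the field names, the list of statuses and the identifier are hypothetical, and a real project would follow whatever metadata standard it has adopted.

```python
import json

# Hypothetical controlled vocabulary for copyright status.
ALLOWED_STATUSES = {"in_copyright", "out_of_copyright", "unknown"}


def build_rights_record(identifier, status, rights_holder=None, licence=None):
    """Return a dictionary describing the rights status of one digitised item."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"Unrecognised copyright status: {status!r}")
    return {
        "identifier": identifier,
        "rights": {
            "status": status,
            "holder": rights_holder,   # None when the holder is not known
            "licence": licence,        # None when no licence has been agreed
        },
    }


def to_sidecar_json(record):
    """Serialise the record as readable JSON for storage next to the file."""
    return json.dumps(record, indent=2, sort_keys=True)


# Example with made-up values.
example_record = build_rights_record(
    "photo-0001", "in_copyright", rights_holder="Example Estate"
)
example_json = to_sidecar_json(example_record)
```

Rejecting unrecognised statuses at the point of entry keeps the rights information consistent, which matters when the content is later searched or filtered by what can legally be re-used.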
Discovery, reformatting for different devices or applications, and being able to re-use, re-mix or share the content legally are features highly valued by digitally literate users. Whether focused on access or preservation, these long-term usage trends need to be factored into any decision to digitise content.
A written selection policy provides an opportunity to describe what the drivers and purposes of a digitisation programme are, and ensures some basic thinking and planning has to be undertaken before embarking.
Knowing the purpose and audience, and undertaking related research, may be a useful requirement of a selection policy.
Digitisation creates a copy of content in a new format, and chances are the copy will be made in order to:
Researching the purpose and audience for digitised content helps the selection process by ensuring that the type of digitisation proposed is appropriate, and potential users demonstrate an interest or need for the content. Undertaking this before selection will greatly enhance the likelihood of a successful project.
Access and preservation are both considerations for a selection policy. Simply digitising content is not enough to make it accessible or usable, particularly if the content is in the wrong format or is missing information or context vital to potential users. If preservation is the aim, consideration needs to be given to how well the original is being looked after, and whether a digital copy is the best way of providing protection.
If you are digitising on behalf of an organisation, consider how digitised content fits with the goals or services the organisation provides. For instance, if your organisation only delivers services locally, how will digitised content enhance the experience for local users or members, and is the content available elsewhere in digital form?
Matching content with goals and services will help ensure both the content and the services complement rather than compete with each other. Identifying a theme or specific need expressed by users may also provide a useful starting point for a digitisation programme.
Working out the skills, resources and planning needed for managing the whole digital content life cycle is essential. Failure to have robust back-up processes or proper training in equipment may lead to complete loss of the digitised content. Any content proposed for digitisation should have an identified life cycle management strategy, including a budget for on-going content management costs.
Having a basic understanding of how copyright law applies to your content is essential before content is digitised and made available. Confirming that you have the right to place a copy of something on your website, and knowing who originally created the content you are copying, will help ensure only appropriate content makes it through to digitisation. It can be an expensive mistake to copy a work and then discover that no one has the rights to use it. Alternatively, if your organisation has copyright (for example, copyright is often transferred with a collection received by way of bequest), you need to know how it will be licensed.
Beyond copyright are moral rights and privacy rights of the creators and the subjects of the content, particularly where they are still living. Was the content originally expected to be available to the public? Issues of cultural or historic sensitivity also play a role – some materials may not be appropriate to copy due to their changed meaning or the way the original was acquired.
Having a clear rights policy, including how breaches will be dealt with, will help ensure only appropriate material gets selected for digitisation.
The type, nature and scope of materials and techniques for digitisation are so varied that it can be difficult to apply a consistent approach to selecting content for digitisation. Having a clear assessment process can allow a digitised collection to be developed over a period of years or can incrementally improve preservation or access for physical collections. Developing or using checklists or decision trees can make this task somewhat easier to achieve, while creating a record of decisions for future reference. Make it Digital has developed a scorecard for selection and prioritisation of content that may assist with this process.
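To show how a scorecard can turn assessments into a ranked list, here is a minimal Python sketch of weighted scoring. It is not the Make it Digital scorecard: the criteria, weights, 0 to 5 scale and sample collections are all hypothetical, chosen only to reflect considerations discussed in this guide.

```python
from dataclasses import dataclass, field

# Hypothetical criteria and weights, for illustration only.
WEIGHTS = {
    "user_demand": 3,
    "risk_to_original": 3,
    "rights_cleared": 2,
    "fits_organisation_goals": 2,
}


@dataclass
class Candidate:
    title: str
    # Each criterion is scored from 0 (low) to 5 (high) by the assessor.
    scores: dict = field(default_factory=dict)


def weighted_score(candidate: Candidate) -> int:
    """Sum each criterion score multiplied by its weight; missing scores count as 0."""
    return sum(weight * candidate.scores.get(name, 0) for name, weight in WEIGHTS.items())


def prioritise(candidates):
    """Return candidates ordered from highest to lowest weighted score."""
    return sorted(candidates, key=weighted_score, reverse=True)


# Made-up candidates for digitisation.
tapes = Candidate(
    "Oral history tapes",
    {"user_demand": 3, "risk_to_original": 5, "rights_cleared": 2, "fits_organisation_goals": 4},
)
photos = Candidate(
    "Town photographs",
    {"user_demand": 4, "risk_to_original": 2, "rights_cleared": 5, "fits_organisation_goals": 3},
)
ranked_titles = [c.title for c in prioritise([photos, tapes])]
```

Keeping the scores for each candidate, not just the final ranking, creates the record of decisions mentioned above and lets the weights be revisited later without reassessing every item.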
All digitisation activity from digitisation on demand through to large-scale initiatives requires some prior planning of the policies, expertise and tools required to successfully complete the work.
We recommend good practice for digitisation planning based on these principles:
If not carefully planned, digitisation can result in unintended damage to or even destruction of the original. Copies may not suit an identified purpose, or equipment purchased may not be appropriate for the task. Conversely, a well-planned digitised collection may be useful for multiple purposes and easily migrated into different hardware and software environments. Decisions you make now in planning for digitisation may have an impact for years to come.