Preserving Digital Content


This page was last revised 8 March 2010

Digital content relies exclusively on physical media that can decay or fail rapidly without visible signs of damage. Digital technologies also become obsolete within relatively short periods of time. Back-up strategies are a vital first step to avoiding short-term content loss. In the long term, storage for preservation involves planning for archival copies of your content to be migrated, and contingencies for transfer to a new owner should your organisation or service face closure. This is where your choices of appropriate formats, descriptions, collections policy and rights statements really come into their own.

Make it Digital has one detailed Preserving Digital Content guide:

  1. Preservation Resources


Understanding digital preservation

Access, use and reuse on a regular basis depend on the reliable maintenance of digital content. As a component of digital content management, digital preservation aims to keep digital content usable in the long term. Digital preservation can protect the equivalent of a digital 'original' while ensuring that older content no longer in active use stays available for later access. In order to do this, additional steps are needed beyond good practice digital content management.

Preservation activities have been practised in the library, archive and museum professions for decades, with varying degrees of success. It has rarely ever been possible to preserve every record or artefact for a particular era or field, making it necessary to design preservation systems knowing that some material will be lost over time. What is put in place will be a trade-off between resource constraints and minimising the risk of loss for items of value.

Managing trade-offs is particularly important for digital content. Unlike non-digital items where loss can be visible, gradual and partial, digital content and the physical media it is stored on can very quickly be lost completely without any visible signs of decay to indicate a problem. The volume of digital content being created today is also incredibly large, making it potentially difficult to assess and prioritise. This creates a challenge for digital preservation practice, particularly in conserving materials for long enough prior to copying or migration.

Digital preservation in this context differs from digitising for preservation. Digitising for preservation involves creating a digital copy of a non-digital item, such as an audio tape, intended to replace or stand in for the original when it is no longer usable. Digitising is just one of many strategies available for preservation of non-digital material. Digital preservation on the other hand is about keeping digital items - including digitised copies - usable in the long term, and generally requires a distinct set of strategies.

Strategies for digital preservation

There are a variety of published strategies and guides freely available on digital preservation. Depending on the source, they may emphasise different dimensions of preservation. Some aim to preserve content as part of a series of related activities and processes over time. Others propose separating items for preservation into a discrete realm of activities and processes. Strategies may focus on preserving the bitstream of digital content - the structure of bits and bytes - as a representation of the original, or they may emphasise the need to preserve the information and content in a usable form.

If you are developing a preservation strategy it is important to consider how to align it with your organisation's culture and current practice, and how to determine which long term objectives are more important. For example, if you specialise in archiving music CDs, it may be very important to preserve the original pressed CD, cover artwork and liner notes as long as possible, even with the risk of CDs becoming obsolete. But if your speciality is archiving oral histories or speeches, it may be of only minor importance that the physical media and format of a recording is an audio CD. Similarly, you may want to manage archival and current content in the same way if your users need to access contemporary and historical items at the same time.

Strategies and guides can greatly help with deciding what you need to do. Look out for a preservation strategy or guidance that follows good practice by addressing the following areas:

  • the maintenance of descriptions of digital content, particularly relating to creation information and change history
  • backing up of digital content through duplication onto separate storage media
  • refreshing of storage media by periodic copying of digital content to new media of the same type
  • the migration of digital content to new hardware, software and storage environments to avoid obsolescence
  • the maintenance of access by making access copies or through emulation of software and hardware environments
  • organisational continuity through provisions for ownership of digital content in the event of organisational change or transfer of responsibilities

Strategies such as migration and emulation may require specialised resources and staff that are not available to smaller organisations. In such cases it may be appropriate to focus on descriptions, backup and refreshing as primary strategies until items can be transferred to a larger archive or specialist resources are found.

Minimising the risk of loss is central to any preservation strategy. Preventing physical loss requires responses around storage such as security and temperature control. Preventing informational loss for digital content is a lot harder, as poorly executed preservation strategies can themselves be the cause of loss (e.g. reformatting using lossy compression). It is important, for instance, to identify early on whether a digital record also has an analogue form, or whether source versions or masters are held or owned by someone else. Understanding which digital content is irreplaceable and which is a mere copy is key to managing risk and preventing loss. A basic knowledge of records or collection management, as covered in our Managing Digital Content guide, will greatly assist.

Following the basics

Any organisation or person responsible for keeping digital content for any reasonable length of time needs to follow the basic steps involved in preventing information loss. Even without access to expensive archival software or repositories, there are three practices to always follow wherever digital content is kept.

1. Make regular back-up copies

Despite the claims of some manufacturers, no digital storage medium currently exists that can be considered safe for long-term storage. All storage media, such as hard disk drives or optical disks, are prone to failure or corruption over time. In the short term, the way to manage this risk is through back-up copies. An easy-to-remember guide for back-up is the one used by the American Society of Media Photographers, known as the 3-2-1 backup rule:

  • keep 3 copies of any important file, the primary (or master) file and two backups
  • keep 2 of the copies on different media types, or at least use physically separate media and brands
  • keep 1 other backup copy stored off-site

New back-up copies should be made whenever significant changes or additions are made to your content. As a guide, a significant change is likely to be one where changes or additions are not easily re-created. Having a regular back-up schedule or routine is the simplest way to minimise disruption and information loss caused by storage failure.
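As a rough illustration, the 3-2-1 rule above can be turned into a simple self-check. The sketch below is only an assumption about how you might record your copies: the Copy record, its field names and the check_3_2_1 function are hypothetical, and the check treats "different media types" strictly, ignoring the rule's fallback of physically separate media and brands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of a file (hypothetical record for illustration)."""
    label: str        # e.g. "master", "backup-1"
    media_type: str   # e.g. "hard disk", "optical disk", "tape"
    offsite: bool     # True if stored away from the primary location


def check_3_2_1(copies):
    """Return a list of the 3-2-1 conditions that the given copies fail."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than 3 copies")
    if len({c.media_type for c in copies}) < 2:
        problems.append("copies are not spread over 2 media types")
    if not any(c.offsite for c in copies):
        problems.append("no copy is stored off-site")
    return problems


if __name__ == "__main__":
    copies = [
        Copy("master", "hard disk", offsite=False),
        Copy("backup-1", "optical disk", offsite=False),
        Copy("backup-2", "hard disk", offsite=True),
    ]
    # An empty list means all three conditions are met.
    print(check_3_2_1(copies))
```

An empty result means the recorded copies meet all three conditions; any text returned names the condition that still needs attention.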

2. Use file integrity checking for archived copies

If you have back-up copies of digital content that does not change over time, such as archived files, it is important to be able to verify the integrity and completeness of both the master file and the copies. The use of a file integrity check, such as generating MD5 checksums, is a good way of gaining a level of assurance about the integrity of your content. An MD5 checksum is the equivalent of a digital fingerprint that you can match between copies or over different time periods to check that a digital file has not been modified or corrupted. There are many software programs available cheaply or freely, such as FastSum, that will generate a checksum code for each file or folder, which can be saved for later checking.
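If you prefer a script to a dedicated program, the same kind of check can be sketched with Python's standard hashlib module. This is a minimal example rather than a replacement for a tool such as FastSum: the function names and the idea of keeping a "manifest" (a table of file names and checksums) are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 checksum of a file as a hex string, reading in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its MD5 checksum."""
    root = Path(folder)
    return {
        p.relative_to(root).as_posix(): md5_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_to_manifest(folder, manifest):
    """Return the relative paths that are missing or whose checksum differs."""
    current = build_manifest(folder)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )
```

In practice you would build the manifest once when the files are archived, save it alongside the content, and run the comparison against the master and each back-up copy at regular intervals. An empty result means every listed file is present and unchanged.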

If you are copying to optical media using a CD or DVD burner, in addition to a checksum, always finalise the disk and verify the data you have created. Avoid using re-writable CDs and DVDs as they are significantly less reliable than write-once CDs and DVDs.

3. Have a workflow for managing your content

Back-up strategies can fail if your digital workflow does not address the different stages of the digital content lifecycle. Use the good practice principles identified in our Managing Digital Content guide to establish a process for managing your digital content from the point of creation through to digital archiving. Good inventories, filenaming schemes, and having someone responsible for entering information and undertaking back-ups all minimise the risk of unintended loss of your digital content.

Refreshing and migration strategies

Storage media generally have a rated life that may be many times longer than what you can realistically expect to experience. Hard disk drives may be warranted to last between 12 months and 5 years under normal operation, but some fail very quickly, with the cost being permanent loss of information. Optical disks like CDs and DVDs can easily lose information due to improper handling and storage, even if they are rated for 10, 20 or 100 years. Those created by standard computer burners also have a shorter life than commercially made disks, as they are more prone to oxidisation or dye fading. Solid State Drives (SSDs), while not having moving parts, are prone to wear and like hard disk drives may lose data through cosmic radiation.

[Image: compact discs. Caption: CDs and DVDs easily lose information when improperly handled or stored]

The only current way to avoid data corruption caused by storage media is to periodically refresh your media. This means copying your digital content to fresh storage media, generally of the same kind as the previous media, e.g. from one hard disk drive to another. Errors found during regular file integrity checking of your archived content are an indicator that media refreshing is required.
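A refresh is only useful if the new copy is verified. The sketch below shows one possible way to copy a folder to fresh media and confirm each file afterwards, using Python's standard shutil and hashlib modules. The function names and folder arguments are hypothetical, and the new location is assumed to be separate from the old one.

```python
import hashlib
import shutil
from pathlib import Path


def file_checksum(path):
    """Return the MD5 checksum of a file as a hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_media(old_root, new_root):
    """Copy every file from old_root to new_root and verify each copy.

    Returns the relative paths whose copy did not match the source.
    """
    old_root, new_root = Path(old_root), Path(new_root)
    mismatches = []
    for source in sorted(old_root.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(old_root)
        target = new_root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copy2 also tries to keep timestamps
        if file_checksum(source) != file_checksum(target):
            mismatches.append(relative.as_posix())
    return mismatches
```

Note that this only confirms the new copy matches the old media as it is read today. If a file was already corrupted on the old media, the corruption is copied faithfully, so it is worth also comparing the refreshed files against checksums saved when the content was first archived.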

Over time a bigger issue than media failure is technology obsolescence. Software and hardware systems continue to rapidly change and evolve, making it harder over time to keep using the same media. This may require periodic migration to a new storage type and software environment. The migration from analogue tape to digital tape or hard disk storage is one of the more significant ones in recent years, as media stored on magnetic tape from the 1950s through to the 1990s has suffered from both obsolescence and physical decay.

A migration strategy is something to carefully consider and plan beforehand as issues may arise around the need to:

  • format-shift content to be readable on new software
  • develop software emulation or 'virtual machines' to run older software, or
  • preserve obsolete hardware in order to keep older content accessible

The best way to manage the issues you are likely to face is to seek advice from professionals or experts in content and systems migration. Attention to using open standards and formats for content creation along with widely used storage media types will also help limit your exposure to obsolescence.

Digital continuity and contingency planning

If digital content you collect is important or worthwhile keeping, chances are that it will need to be kept for longer than the careers of most individuals in your organisation. Thinking and planning ahead to what future staff or volunteers need to know about your digital content will help ensure it remains usable and accessible for as long as it is needed.

Things to consider include:

  • having written policies and workflow processes for managing your digital content, including processes for back-ups, media refreshing, migration and disaster recovery
  • identifying long-term solutions for securely storing digital content, including possible provisions for transfer to external repositories for archiving
  • making provisions for what will happen in the event of an organisation wind-up, including ensuring continued ownership of the content

Human error, poor record keeping and accidental loss are significant contributors to loss of content or information, and are as likely to occur as storage media corruption or technology obsolescence. Continuity planning and proper management of your digital content as an organisation play a key role in digital preservation.

Trusted digital repositories

It seems inevitable that collections of the future will be increasingly digital. Blogs, websites, digital publishing, photographs and video are all born-digital content that may have no analogue original or copy to fall back on. This makes the task of preserving digital content in the long term a serious challenge for cultural institutions and research bodies. A key solution to this challenge is to develop and build trusted digital repositories that can hold indefinitely a diverse range of cultural and research knowledge in open, accessible formats. The OCLC has described the characteristics of trusted digital repositories as needing:

  • compliance with the Reference Model for an Open Archival Information System (OAIS)
  • administrative responsibility
  • organisational viability
  • financial sustainability
  • technological and procedural suitability
  • system security
  • procedural accountability

If your digital collection continues to grow, a possible outcome is that your organisation may need to consider building its own repository. By good planning you can be aware of the kinds of capacity you may need to develop ahead of time. However, not every collecting organisation is likely to be capable of building its own repository with the kind of characteristics OCLC recommends. It may instead be useful to develop relationships with regional, national or sector-based digital repositories, and to consider whether your long term archival content is able to be managed by them.