Describing Digital Content


This guide was last revised 4 June 2009

If you want your digital content to be stored, found and used over time, it needs to have good file naming and associated metadata that describes what the content is, where it came from, and who can use it. File naming conventions are essential to good workflow and organisation, while structured metadata that follows open standards is central to usability and interoperability.

Make it Digital has one detailed Describing Digital Content guide:

  1. Metadata Resources


Getting started with metadata

The digital content life cycle identifies describing your content as the next stage after content creation. While adding metadata at this point is a critical step, metadata can be generated at any stage of the life cycle, and for different purposes. At the creation stage, for example, an image file from a digital camera can automatically generate information about its resolution, compression, and date of capture. New metadata might also be added later in the life cycle through user tagging on the web. If managed well, the metadata created across the whole life cycle adds significant value to the digital content, and helps ensure it is efficiently managed, discovered, shared and reused over a very long time.

What is metadata?

Metadata is any information that describes digital content. It can describe the attributes and characteristics of digital content in structured and standard ways, or in less structured ways through the use of general descriptions and tagging.

Although Make it Digital focuses on the use of metadata for digital content, metadata is also used to describe physical objects such as books and magazines and entities like people, organisations, geographic places, events, museum objects and corporate records.

Metadata can usefully describe any kind of digital content such as an image, video file, audio file, or text. It can also describe all sorts of things about the digital content, for instance the person or organisation that created the content, the date it was created, its length (e.g. “duration: 3:27 at 15 fps”), and technical details such as who entered the metadata and its processing history.
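As a rough illustration of the kinds of information listed above, a metadata record for a single video file might be sketched in Python as follows. The class name, field names and values are all hypothetical and do not follow any particular metadata scheme.

```python
from dataclasses import dataclass, asdict


@dataclass
class MetadataRecord:
    """Hypothetical, scheme-free description of one piece of digital content."""
    title: str
    creator: str        # person or organisation that created the content
    date_created: str   # ISO 8601 date, e.g. "2009-06-04"
    content_type: str   # image, video, audio or text
    duration: str       # length of a time-based item
    entered_by: str     # who entered the metadata


# Invented example values, for illustration only.
record = MetadataRecord(
    title="Harbour time-lapse",
    creator="Example Museum",
    date_created="2009-06-04",
    content_type="video",
    duration="3:27 at 15 fps",
    entered_by="cataloguer01",
)

# Print the record as simple "name: value" pairs.
for name, value in asdict(record).items():
    print(f"{name}: {value}")
```

A real system would normally take its field names from a published metadata scheme rather than inventing them.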

Metadata is not just applied to an individual object such as an image, video or document. In most digital content management systems, metadata is also applied to a group of similar and related objects (e.g. a set of diaries and memorabilia of a person), and also at the level of a collection of items (e.g. the painting collection of a museum or a fonds of archival records).
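The three levels described above (collection, group and item) can be pictured as a simple nested structure. The following is a minimal sketch only; the keys, titles and the `walk` helper are hypothetical and not taken from any content management system.

```python
# Hypothetical three-level hierarchy: collection -> group -> items.
collection = {
    "level": "collection",
    "title": "Example Museum manuscripts",
    "children": [
        {
            "level": "group",
            "title": "Diaries and memorabilia of one person",
            "children": [
                {"level": "item", "title": "Diary, volume 1", "children": []},
                {"level": "item", "title": "Diary, volume 2", "children": []},
            ],
        }
    ],
}


def walk(node, depth=0):
    """Yield (depth, level, title) for a node and all of its descendants."""
    yield depth, node["level"], node["title"]
    for child in node["children"]:
        yield from walk(child, depth + 1)


# Print the hierarchy with indentation showing each level.
for depth, level, title in walk(collection):
    print("  " * depth + f"[{level}] {title}")
```

Metadata recorded at the collection or group level does not need to be repeated on every item beneath it, which is one reason multi-level description is common.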

Almost all structured metadata used for digital content has been designed to follow particular standard formulas, or schemes.

Standards play a significant role in formulating and structuring the way that good metadata is documented. All commonly used metadata schemes follow open standards.

Benefits of structured metadata

When metadata is added to digital content, groupings and collections in a standardised and consistent way, it can be managed and organised so users are better able to discover, share and use that content. Metadata:

  • describes digital content and its relationship to similar content
  • enables sharing and reuse of content
  • enables management of digital content
  • serves as a record of ownership
  • makes it possible to exchange digital content
  • reduces duplication of effort
  • is an asset to an organisation.

To gain maximum benefit from metadata, it helps to consider the use and purpose for which it is intended and plan for this accordingly. Choosing metadata standards that are fit for purpose and in common use will be economical, reduce risk and help protect the future value of your descriptions and content.

[Image: an example of an image with structured metadata]

Planning for Metadata Use

Planning for use of metadata is an important activity and there are many and varied aspects to consider in the planning process. Some of the more significant ones to consider are:

  • Does the organisation require a metadata plan for the entire organisation?
  • Could various parts of the organisation have different requirements?
  • Are there areas where requirements overlap?
  • What happens now and could it be improved?
  • Is there a need for different metadata for different kinds of content, services and activities?
  • Where will metadata be stored – with the content and/or separately?
  • How will metadata be created – automatically or manually?
  • Who will maintain the metadata, and how?
  • Which community will the metadata be shared with?
  • Lastly, selecting which metadata standards to use is one of the most important planning activities you can undertake.

Metadata for user communities

Metadata schemes

Professional subject communities and sectors most often package metadata, ready to use, in what is known as a metadata scheme.

A metadata scheme provides a standard and consistent way to create, manage and share metadata. A scheme is generally made up of:

  • a set of specifications, which can contain information about the purpose for which the scheme is intended
  • its maintenance agency (the organisation responsible for it)
  • the names of the metadata elements (also known as labels) with their meaning (semantics) and ways the elements can be used
  • recommended values for the elements themselves, such as thesauri use and encoding schemes
  • an abstract or entity-relationship model illustrating a high-level purpose or view of the scheme.

Communities generally choose a scheme which best suits their purpose.

Examples of metadata schemes

 

  • Dublin Core Metadata Element Set (DCMES), version 1.1, published as ISO 15836, identifies a standard set of metadata terms, now known as properties, that can be used to describe digital content. Some of the Dublin Core properties are title, creator, date, subject and description. Dublin Core is a cross-domain scheme, which means that it can be used as a core scheme to which other metadata schemes are mapped.
  • Friend of a Friend (FOAF), a project developing the use of metadata for machine-readable Web homepages for people, groups, and organisations. http://www.foaf-project.org/
  • Online Information Exchange (ONIX) which publishers can use to distribute metadata about their books to book dealers, other publishers, and anyone else involved in the sale of books. http://www.bisg.org/documents/onix.html
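To make the idea of a scheme more concrete, the sketch below builds a small XML record from the Dublin Core properties named above, using Python's standard library. The namespace URI is the published Dublin Core 1.1 elements namespace; the `record` wrapper element, the function name and the example values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(properties):
    """Build an XML element holding one dc:* child per (property, value) pair."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in properties:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Invented example values, for illustration only.
record = dublin_core_record([
    ("title", "Harbour time-lapse"),
    ("creator", "Example Museum"),
    ("date", "2009-06-04"),
    ("subject", "Harbours"),
    ("description", "A short time-lapse video of a harbour."),
])

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because the property names come from a shared scheme, another system that understands Dublin Core can interpret this record without knowing anything about the system that produced it.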

Metadata profiles

Metadata schemes can be created for an entire domain or subject community and a metadata profile can be created based on that scheme for a specific purpose within that community.

A metadata profile further refines and interprets a metadata scheme.

Example of a Metadata Profile

 

 The United States Federal Geographic Data Committee (FGDC) has created a metadata standard for digital geospatial metadata. This metadata scheme has been developed for the entire geospatial sector. It is known as the Content Standard for Digital Geospatial Metadata.

 

Two metadata profiles have been developed for sectors within the geospatial domain.

Application Profiles

Application profiles allow for the mixing and matching of metadata schemes. A particular metadata scheme may immediately suit a metadata implementer, but on occasion elements, vocabularies and terms from another metadata scheme may need to be used. By developing an application profile, a metadata implementer can create metadata for their unique purpose and use.

Example of an Application Profile  

 

The Food and Agriculture Organisation has developed an application profile, known as the AGRIS Application Profile, for the International Information System on Agricultural Sciences and Technology.

 

It uses metadata terms from the following metadata schemes:

  • Dublin Core Elements and Qualifiers
  • Agricultural Metadata Element Set
  • Australian Government Locator Service Metadata Set

The AGRIS Application Profile is available from the Food and Agriculture Organisation.
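One way to picture an application profile is as a list of the terms borrowed from each scheme, against which records can be checked. The sketch below is illustrative only: the `ex` prefix, its terms and the `check_record` helper are hypothetical, and the profile shown is not the AGRIS Application Profile.

```python
# Hypothetical application profile: terms allowed from each scheme.
# "dc" stands for Dublin Core; "ex" is an invented second scheme.
PROFILE = {
    "dc": {"title", "creator", "subject", "date"},
    "ex": {"cropType", "region"},
}


def check_record(record, profile):
    """Return the qualified names in record that the profile does not allow."""
    problems = []
    for qualified_name in record:
        prefix, _, term = qualified_name.partition(":")
        if term not in profile.get(prefix, set()):
            problems.append(qualified_name)
    return problems


# Invented record that mixes terms from both schemes.
record = {
    "dc:title": "Rice yields survey",
    "ex:cropType": "rice",
    "dc:colour": "green",  # not part of the profile
}

print(check_record(record, PROFILE))
```

Keeping the scheme prefix on every term makes it clear where each element's definition comes from.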

Content Standards for metadata schemes

Most metadata schemes, while specifying metadata elements and their meaning, do not contain content standards.

Content standards provide instruction on how to populate metadata elements. They provide standard and consistent ways to transcribe and describe attributes of the digital content within the metadata scheme. For example, the content standard for libraries, the Anglo-American Cataloguing Rules (AACR), instructs cataloguers to transcribe an author's name as <last name, first name>.
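A content rule of this kind can be applied automatically when names are captured in display order. The following is a deliberately simplified sketch; the function name is hypothetical, and the real cataloguing rules cover many cases (titles, compound surnames, corporate names) that it ignores.

```python
def invert_name(display_name):
    """Turn 'First [Middle] Last' into 'Last, First [Middle]'.

    Simplified assumption: the last word is the surname.
    Single-word names are returned unchanged.
    """
    parts = display_name.split()
    if len(parts) < 2:
        return display_name.strip()
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


print(invert_name("Katherine Mansfield"))
print(invert_name("Homer"))
```

Applying one rule consistently means that records created by different people sort and match in the same way.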

Content standards, like metadata schemes, may also make recommendations on values for the elements themselves, such as thesauri use and encoding schemes.

Content standards are most often stand-alone documents that are tied to a metadata scheme. This is because metadata schemes and content standards have been developed within subject communities and specialities. The museums, libraries, and education communities and many of the various science sectors have developed their own metadata schemes and companion content standards.

Cultural heritage sector

The content standard Cataloging Cultural Objects (CCO): A Guide to Describing Cultural Works and Their Images is used with VRA Core 4.0, the metadata scheme published by the Visual Resources Association for the cultural heritage sector.

Libraries

The content standard Anglo-American Cataloguing Rules, published by the American Library Association et al., is used with MARC21 (MAchine-Readable Cataloguing), the metadata scheme published by the Library of Congress for the library sector.

Archives

The content standard Describing Archives: A Content Standard (DACS), published by the Society of American Archivists, is used to describe archival materials. DACS can be used with the metadata schemes MARC21 and Encoded Archival Description (EAD). EAD, published by the Society of American Archivists and the Library of Congress, is a standard for encoding archival finding aids using eXtensible Markup Language (XML).

Types of Metadata Schemes

Metadata schemes are created for different purposes. A primary purpose may be the discovery of digital content on the web; another may be the management of digital content for long-term preservation, and so on. Some metadata schemes have a primary purpose but are also able to fulfil multiple purposes such as description, discovery and administration. For simplicity, this guide lists three primary types of metadata by purpose, though variations on this typology can be found in the literature.

Descriptive and discovery metadata

Descriptive and discovery metadata are created in order to:

  • describe digital content – for example, an abstract element may summarise what the digital content is all about.

Many of the metadata schemes and content standards that are available are used for discovery and description purposes. The Dublin Core Metadata Element Set is an example of a metadata scheme for discovery. It can be used by many sectors as a common layer to map their own schemes to when making digital content available on the web.
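Mapping a local scheme to Dublin Core as a common layer is often called a crosswalk. A minimal sketch is shown below, assuming a hypothetical museum scheme: the local field names, the mapping itself and the `to_dublin_core` helper are invented for illustration.

```python
# Hypothetical crosswalk: local field name -> Dublin Core property.
CROSSWALK = {
    "object_name": "title",
    "maker": "creator",
    "production_date": "date",
    "summary": "description",
}


def to_dublin_core(local_record, crosswalk=CROSSWALK):
    """Return (dc_record, unmapped) for a dict of local field names to values."""
    dc_record, unmapped = {}, {}
    for field, value in local_record.items():
        if field in crosswalk:
            dc_record[crosswalk[field]] = value
        else:
            unmapped[field] = value  # kept so that nothing is silently lost
    return dc_record, unmapped


# Invented local record, for illustration only.
local = {
    "object_name": "Carved figure",
    "maker": "Unknown",
    "production_date": "1850",
    "storage_bay": "B12",  # purely local detail with no Dublin Core equivalent
}

dc_record, unmapped = to_dublin_core(local)
print(dc_record)
print(unmapped)
```

Fields with no equivalent in the common layer stay in the local system; only the mapped properties need to be shared on the web.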

The following is an example of a metadata scheme for description:

“Categories for the Description of Works of Art” published by the J. Paul Getty Trust and College Art Association

This metadata scheme, like some others, allows for description at a single (item) level and at multiple levels.

Administrative Metadata

Administrative metadata is designed primarily to manage digital content. Those managing content over time need to be able to undertake activities such as:

  • archive digital content
  • track digital content and its representations
  • ensure file formats can be read and transformed
  • ensure the authenticity and integrity of digital content over time
  • identify the source of the metadata and its updates

A scheme known as Preservation Metadata: Implementation Strategies (PREMIS) is an example of a metadata scheme for administration. This scheme can be used for the long-term management of any type of digital content. Organisations managing digital repositories are its primary users.
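One common way to support integrity checks over time is to record a checksum when content is first stored and to compare it later. The sketch below uses Python's standard `hashlib` module; the function names and dictionary keys are illustrative assumptions, not PREMIS element names.

```python
import hashlib


def fixity(data, algorithm="sha256"):
    """Return the checksum of some bytes, with the algorithm used to make it."""
    digest = hashlib.new(algorithm)
    digest.update(data)
    return {"algorithm": algorithm, "digest": digest.hexdigest()}


def is_unchanged(data, stored_fixity):
    """Recompute the checksum and compare it with the stored one."""
    current = fixity(data, stored_fixity["algorithm"])
    return current["digest"] == stored_fixity["digest"]


# Stand-in for the bytes of a stored file.
original = b"example file contents"
stored = fixity(original)  # recorded as administrative metadata at ingest

print(is_unchanged(original, stored))         # same bytes
print(is_unchanged(original + b"!", stored))  # altered bytes
```

Storing the algorithm name alongside the digest allows the check to be repeated even if the preferred algorithm changes in future.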

Rights management information is also administrative metadata. “Rights languages”, as they are generally known, are used to express rights information over content.

An example of a language for digital rights management is the Open Digital Rights Language (ODRL), an international effort aimed at developing and promoting an open standard for rights expressions for the publishing, distribution and consumption of digital media across all sectors and communities.

Creative Commons, a non-profit organisation, has developed a machine-readable language for expressing rights for digital content on the web. Creative Commons licences are a set of licences that apply over and above copyright.

Technical Metadata

When digital content is created, whether an image, music, a sound recording or a video, the resulting file contains some form of embedded technical metadata. This kind of metadata generally defines the technical characteristics or attributes of digital content. The technical metadata may contain information such as:

  • the compression, resolution and pixel dimensions of an image
  • date and time a document was created
  • the number of bits of sample depth of an image
  • the number of frames per second of a video

Examples of technical metadata formats

Technical metadata is found in a range of file (MIME) types and their corresponding file formats. Three open standards that contain some technical metadata are:

  • TIFF (Tagged Image File Format) is a high-quality uncompressed file format used for creating archival copies (raster images) and derivative files. TIFF can be used by many scanners, printers and computer display hardware, and does not favour particular operating systems, file systems, compilers or processors.
  • JPEG2000 is a developing standard also intended for archival use.
  • MPEG-7 is an ISO/IEC standard developed by MPEG (Moving Picture Experts Group) with metadata for descriptive, technical and administrative use.
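To show what embedded technical metadata looks like in practice, the sketch below reads a few tags from the header of a TIFF file using only Python's standard library. It is a simplified reader that assumes a classic TIFF header and handles only single SHORT or LONG values stored inline; the `read_tiff_tags` and `entry` helpers are hypothetical, and the sample bytes contain a header and tag directory only, not a complete image.

```python
import struct

# A few baseline TIFF tag numbers and their names.
TAG_NAMES = {256: "ImageWidth", 257: "ImageLength",
             258: "BitsPerSample", 259: "Compression"}


def read_tiff_tags(data):
    """Read inline SHORT/LONG values from the first tag directory of TIFF bytes."""
    order = {b"II": "<", b"MM": ">"}[data[:2]]  # byte order marker
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF header")
    (entry_count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(entry_count):
        start = ifd_offset + 2 + i * 12  # each directory entry is 12 bytes
        tag, field_type, count = struct.unpack(order + "HHI", data[start:start + 8])
        if count != 1 or field_type not in (3, 4):
            continue  # other types and out-of-line values are skipped in this sketch
        fmt, size = ("H", 2) if field_type == 3 else ("I", 4)
        (value,) = struct.unpack(order + fmt, data[start + 8:start + 8 + size])
        tags[TAG_NAMES.get(tag, tag)] = value
    return tags


def entry(tag, field_type, value):
    """Pack one little-endian directory entry holding a single inline value."""
    fmt = "<HHIH2x" if field_type == 3 else "<HHII"
    return struct.pack(fmt, tag, field_type, 1, value)


# Invented sample: header, four tags, and a zero "next directory" offset.
sample = (
    struct.pack("<2sHI", b"II", 42, 8)
    + struct.pack("<H", 4)
    + entry(256, 4, 3000)   # width in pixels
    + entry(257, 4, 2000)   # height in pixels
    + entry(258, 3, 8)      # bits per sample
    + entry(259, 3, 1)      # 1 means no compression
    + struct.pack("<I", 0)
)

print(read_tiff_tags(sample))
```

For real files, an established imaging library or metadata extraction tool would be a more robust choice than hand-written parsing.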

In 2006 the American National Information Standards Organisation (NISO) published “Technical Metadata for Digital Still Images” to define a set of non-proprietary, open technical metadata elements for digital still images.

In the library sector, the Library of Congress has developed technical metadata formats as XML schemes for audio, video, text and images to ensure interoperability amongst libraries:

  • AudioMD: Audio Technical Metadata Extension Scheme
  • VideoMD: Video Technical Metadata Extension Scheme
  • TextMD: technical metadata for text-based digital objects
  • Image (MIX): NISO Metadata for Images in XML Scheme

Another type of technical metadata brings together separate component parts (e.g. scanned pages stored as individual digital files) into one logical unit (e.g. a book). This kind of metadata is also known as structural metadata.
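As a small illustration of structural metadata, the sketch below records the reading order of scanned page files so that they can be presented as one book. The file names, keys and labels are hypothetical.

```python
# Hypothetical scanned page files, listed in no particular order.
pages = [
    {"file": "scan_0003.tif", "order": 3, "label": "Page 2"},
    {"file": "scan_0001.tif", "order": 1, "label": "Title page"},
    {"file": "scan_0002.tif", "order": 2, "label": "Page 1"},
]

# The structural metadata: one logical unit with its parts in sequence.
book = {
    "title": "Example book",
    "pages": sorted(pages, key=lambda page: page["order"]),
}

reading_order = [page["file"] for page in book["pages"]]
print(reading_order)
```

The order is stored explicitly rather than inferred from file names, so the book can still be assembled if files are renamed.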

Combining and encoding metadata schemes

The various types of metadata – descriptive, administrative and technical – as well as different metadata schemes can be combined and used together when encoded into mark-up languages.

Mark-up languages are used on the World Wide Web to structure metadata in a consistent way, so that web technologies can use and reuse it in different ways. Guidelines are available on encoding metadata schemes into mark-up languages. For example, the Internet Engineering Task Force has published RFC (Request for Comments) 2731 on encoding Dublin Core metadata in HTML 4.0.
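Dublin Core properties are commonly embedded in the head of a web page as `meta` elements whose names carry a `DC.` prefix. The sketch below generates such tags from a list of properties. It follows that general convention only; the function name and example values are assumptions, and the exact form should be checked against RFC 2731 and current Dublin Core guidance.

```python
from html import escape


def dc_meta_tags(properties):
    """Return HTML meta tags for (property, value) pairs, one tag per line."""
    # The link element declares which scheme the "DC." prefix refers to.
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for name, value in properties:
        content = escape(value, quote=True)  # keep quotes and ampersands safe
        lines.append(f'<meta name="DC.{name}" content="{content}">')
    return "\n".join(lines)


# Invented example values, for illustration only.
html_head = dc_meta_tags([
    ("title", "Harbour & coast"),
    ("creator", "Example Museum"),
    ("date", "2009-06-04"),
])
print(html_head)
```

Escaping the values matters because titles and descriptions often contain characters that have special meaning in HTML.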

The Metadata Encoding & Transmission Standard (METS) can encode descriptive, administrative and technical metadata.
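The sketch below shows, in outline, how a METS document can wrap descriptive and technical metadata from different schemes in separate sections. It uses the METS and Dublin Core namespaces with Python's standard library, but it is an unvalidated skeleton: it omits parts that a complete METS document needs (such as a structural map), and the `imageWidth` element and helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("dc", DC_NS)


def q(namespace, tag):
    """Return a namespace-qualified tag name in ElementTree's notation."""
    return f"{{{namespace}}}{tag}"


def wrap(parent, section_tag, section_id, mdtype):
    """Add a metadata section and return the element that holds its content."""
    section = ET.SubElement(parent, q(METS_NS, section_tag), ID=section_id)
    md_wrap = ET.SubElement(section, q(METS_NS, "mdWrap"), MDTYPE=mdtype)
    return ET.SubElement(md_wrap, q(METS_NS, "xmlData"))


mets = ET.Element(q(METS_NS, "mets"))

# Descriptive metadata, expressed in Dublin Core.
descriptive = wrap(mets, "dmdSec", "DMD1", "DC")
ET.SubElement(descriptive, q(DC_NS, "title")).text = "Harbour photograph"

# Technical metadata sits inside the administrative section.
amd_sec = ET.SubElement(mets, q(METS_NS, "amdSec"))
technical = wrap(amd_sec, "techMD", "TECH1", "OTHER")
ET.SubElement(technical, "imageWidth").text = "3000"  # hypothetical element

mets_xml = ET.tostring(mets, encoding="unicode")
print(mets_xml)
```

Each section states which scheme its content follows, so different schemes can sit side by side in one document without being confused.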

The Canadian Heritage Information Network (CHIN) has more on encoding and combining metadata – Metadata Standards for Museum Cataloguing