
CCMetadataWorkingPlan


CyberCemetery Metadata, Working Plan

Starr Hoffman

7.19.2006


CC Metadata Entry Rules, last modified 7/19/06

 

 

What do we currently capture for CRS (report-level)?

  • DC Title
  • DC Creator
  • DC Subject
  • DC Description
  • DC Contributor
  • DC Publisher
  • DC Identifier
  • DC Source
  • DC Relation
  • DC Language
  • DC Coverage
  • DC Date
  • DC Resource Type
  • DC Format
  • DC Rights
  • Institution
  • Collection
  • System
  • Digital Objects
  • Permalink


What would be easy to use for CC (site-level)?

  • DC Title
    • do we use what appears on the homepage, or what appears in the title bar/coded as “title”?
      • On 7/13, Valerie & Starr agreed that the metadata/coded title should take precedence. If there is no coded title, use the title that appears near the top of the homepage and/or in its largest font. In the unusual cases where that title and the coded title don’t match, use good judgment.
      • if there are two or more options for the title, then include the secondary title(s) under the qualifier: “alternatetitle”
    • is this repeatable? yes
    • can it be qualified? yes
  • DC Subject
    • use LIV for this? other controlled vocabulary?
    • LIV not currently an option on the dpu template
    • there would be many for each site; perhaps deal with this later
  • DC Description
    • use from the html meta-description
    • if this is not available, use text from homepage or “about us”
    • if neither is available, write a brief description?
  • DC Identifier
    • what will we use for this? the permalink? (if not, where will we store this?)
    • applicable options: URL / other
  • DC Language
    • almost always English
  • DC Coverage
    • options: time period, place name, single date, and/or date range
  • DC Resource Type
    • DC Type: “Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the DCMI Type Vocabulary (DCT1)). To describe the physical or digital manifestation of the resource, use the FORMAT element.”
    • appropriate choices might include: text / html / multimedia / website
    • current closest options in template: text / interactive_resource
    • should ask Mark to add either html or website as an option
  • DC Format
    • should this be “website” or “html” for all?
    • is this a repeatable field? no; can we make it repeatable?
    • what about referring to specific aspects, like pdfs or streaming media?
    • what about files that require specific software or hardware to access? can we note this?
    • from DC site: “Typically, Format may include the media-type or dimensions of the resource. Format may be used to identify the software, hardware, or other equipment needed to display or operate the resource. Examples of dimensions include size and duration. Recommended best practice is to select a value from a controlled vocabulary (for example, the list of Internet Media Types (MIME) defining computer media formats).”
    • current options in template: audio, video, text, image, other
  • DC Rights
    • statement about GPO/digital preservation?
    • automatically generate identical statement for each CC site?
  • Institution
    • always UNT
  • Collection
    • “CC?” (CRS Reports are labeled CRSR)
    • UNTGD (UNT GovDocs) is the choice currently available
  • System
    • “CC?” (CRS Reports are labeled CRS)
  • Digital Objects
    • number of digital objects associated with each site (!!!)
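The title rule agreed on 7/13 (coded title first, homepage heading as fallback, any other candidate under “alternatetitle”), together with the html meta-description as the source for DC Description, could be partly automated. The following is a minimal Python sketch and not part of the current workflow. It assumes the site's homepage HTML is on hand and treats the first `<h1>` as a stand-in for “the title near the top and/or in the largest font”. The names `HomepageScanner` and `pick_titles` are hypothetical. A non-empty `alternatetitle` list flags a mismatch that still needs a person's judgment.

```python
from html.parser import HTMLParser


class HomepageScanner(HTMLParser):
    """Collect the coded <title>, the first <h1>, and the meta description (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.coded_title = ""
        self.first_h1 = ""
        self.meta_description = ""
        self._capturing = None  # "title" or "h1" while inside that element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title" and not self.coded_title:
            self._capturing = "title"
        elif tag == "h1" and not self.first_h1:
            # Assumption: the first <h1> approximates the most prominent on-page title.
            self._capturing = "h1"
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == self._capturing:
            self._capturing = None

    def handle_data(self, data):
        if self._capturing == "title":
            self.coded_title += data
        elif self._capturing == "h1":
            self.first_h1 += data


def pick_titles(html):
    """Apply the coded-title-first rule; a differing heading becomes an alternate title."""
    scanner = HomepageScanner()
    scanner.feed(html)
    scanner.close()
    # Collapse runs of whitespace left over from the markup.
    coded = " ".join(scanner.coded_title.split())
    heading = " ".join(scanner.first_h1.split())
    title = coded or heading  # use the homepage heading only if there is no coded title
    alternates = [heading] if coded and heading and heading != coded else []
    return {
        "title": title,
        "alternatetitle": alternates,  # non-empty means the two sources disagree
        "description": scanner.meta_description,
    }
```

If a site has no meta description, `description` comes back empty, and the fallback to homepage or “about us” text stays a manual step.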

 

What fields might be used sometimes, but could prove problematic for some sites?

  • DC Creator / DC Contributor / DC Publisher
    • distinguishing between each might be difficult in some cases
    • DC Creator: “An entity primarily responsible for making the content of the resource. Examples of Creator include a person, an organization, or a service. Typically, the name of a Creator should be used to indicate the entity.”
      • In the case of an e-zine, should the editor appear as creator, and the other staff appear as contributors? yes
      • can we note these titles somewhere? yes: use a qualifier (ex.: editor)
      • discard titles, such as Mr./Mrs./Ms./Dr./Rev.?
    • DC Publisher: “An entity responsible for making the resource available. Examples of Publisher include a person, an organization, or a service. Typically, the name of a Publisher should be used to indicate the entity.”
      • should we name this element “Agency” instead of “Publisher?”
    • DC Contributor: “An entity responsible for making contributions to the content of the resource. Examples of Contributor include a person, an organization, or a service. Typically, the name of a Contributor should be used to indicate the entity.”
      • In the case of an e-zine, should the editor appear as creator, and the other staff appear as contributors? yes
      • can also use qualifiers to illuminate the role of each contributor
      • NOTE: for “Access America,” I stopped listing contributors after the letter “C”
  • DC Date
    • dates of site creation/modification will not always be present
    • following CRS practice, record the day as “1” if the specific date was not recorded
      • e.g., “8/1/01” for “August 2001”
    • it would be good to be able to distinguish between:
      • site creation date
      • site last modified date
      • harvest date (we’ve got this for most sites)
      • date made public on CC
      • metadata creation
      • metadata modification (can this be automatically generated/updated?)
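The CRS day-“1” convention is mechanical enough to sketch in code. Below is a minimal Python example. It assumes dates are found in one of a few input formats, and the format list and the `crs_style_date` name are illustrations rather than existing tools. Because `datetime.strptime` fills in a missing day as 1, month-and-year inputs come out as the first of the month.

```python
from datetime import datetime

# Hypothetical input formats; extend as new patterns turn up on harvested sites.
_FULL_FORMATS = ("%B %d, %Y", "%m/%d/%Y", "%Y-%m-%d")
_MONTH_ONLY_FORMATS = ("%B %Y", "%b %Y", "%m/%Y")


def crs_style_date(raw: str) -> str:
    """Normalize a date string to M/D/YY, using day 1 when only month and year are known."""
    text = raw.strip()
    for fmt in _FULL_FORMATS + _MONTH_ONLY_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next format
        # For month-only formats strptime sets day=1, matching the CRS practice.
        return f"{parsed.month}/{parsed.day}/{parsed:%y}"
    raise ValueError(f"unrecognized date: {raw!r}")
```

For example, `crs_style_date("August 2001")` gives `"8/1/01"`. Anything that matches none of the formats raises an error, so the date gets reviewed by hand instead of being guessed.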

 

What fields seem unlikely to see much use?

  • DC Source
  • DC Relation
    • maybe a related report?
    • linked sites?

 

What additional information could/should be captured?

  • a list of all file types included in the site
  • information about the capture: date, software used, etc.?
  • should we record the original URL anywhere? is that useful for any reason?
  • software used to harvest
    • some will be an educated guess
    • some sent by agency, some by GPO
    • record of what we’ve got: G:/digital projects/CyberCemetery/CCdocumentation.xls
  • should we note when/if a site is a particular “snapshot” of a changing site (for instance, an e-zine), or is this irrelevant, since each site is essentially a snapshot?
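Here is a rough sketch of how the list of file types, and a raw file count toward the Digital Objects field, might be gathered from a harvested copy. It assumes each harvest is stored as a local directory tree. It also assumes that guessing MIME types from file extensions with Python's standard `mimetypes` module is close enough, which is in line with the DC Format note recommending Internet Media Types. The function name `inventory_file_types` is hypothetical.

```python
import mimetypes
import os
from collections import Counter


def inventory_file_types(site_root: str) -> Counter:
    """Count the files under a harvested site directory by guessed MIME type (hypothetical helper)."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(site_root):
        for name in filenames:
            # Guess from the extension only; file contents are not inspected.
            mime, _encoding = mimetypes.guess_type(name)
            counts[mime or "unknown"] += 1
    return counts
```

Summing the counts gives a total file count for the site. Whether every file should count as a separate digital object is still an open question.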

 

Questions

  • which of the above fields are required?

 
