A quick Google search confirms that I am not the only one who despises the way the iPhone numbers pictures and videos.  I am in the habit of copying the pictures my brother takes of my nephews from his computer, both to build a source of images for scrapbooks and to serve as a form of off-site backup.  In the years that I’ve been copying these pictures, the iPhone’s file naming scheme has caused me great consternation.  I do not own an iPhone myself, but as best I can tell from personal experience and online research, once pictures have been downloaded to a computer, the iPhone starts numbering new pictures at 001 again.  As a result, when I copy pictures from several folders (each named for the date on which its pictures were downloaded), I wind up with different images that share the same file name.  I sometimes even wind up with two copies of the same picture under different file names; apparently, if a picture is left on the iPhone after a download, its number is reset as well.  Needless to say, if I have a photo with the number 009 printed on the back, figuring out when it was taken is not straightforward, because a search for that file name may turn up three or more images.
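A short script can at least surface these clashes before they cause trouble.  The Python below is a minimal sketch under stated assumptions: it assumes each download folder is named with an ISO-style date such as 2014-03-15 and that the pictures carry names like IMG_0009.JPG (both hypothetical examples).  `name_collisions` reports file names that appear in more than one folder, `dated_name` shows one way to make a name unique by prefixing its folder's date, and `identical_contents` flags byte-for-byte duplicates saved under different numbers.

```python
import hashlib
from collections import defaultdict
from pathlib import PurePosixPath


def name_collisions(paths):
    """Return {file name: [folders]} for names found in more than one folder.

    `paths` are relative paths such as "2014-03-15/IMG_0009.JPG".
    """
    folders_by_name = defaultdict(list)
    for path in map(PurePosixPath, paths):
        folders_by_name[path.name].append(path.parent.name)
    return {name: folders for name, folders in folders_by_name.items()
            if len(folders) > 1}


def dated_name(path):
    """Prefix a file name with its date-named folder, e.g.
    "2014-03-15/IMG_0009.JPG" -> "2014-03-15_IMG_0009.JPG"."""
    path = PurePosixPath(path)
    return f"{path.parent.name}_{path.name}"


def identical_contents(files):
    """Group paths whose contents are byte-for-byte identical.

    `files` maps each path to its bytes; matching SHA-256 digests reveal the
    same picture saved under two different numbers.
    """
    paths_by_digest = defaultdict(list)
    for path, data in files.items():
        paths_by_digest[hashlib.sha256(data).hexdigest()].append(path)
    return [group for group in paths_by_digest.values() if len(group) > 1]
```

On a real folder tree, the relative paths could be gathered with `pathlib.Path(root).rglob("*")` and the bytes with `Path.read_bytes()`.  Hashing catches exact copies only; an edited or re-compressed version of the same picture will not match.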

In addition to generating personal frustration, these encounters with iPhone pictures give me reason to reflect on why metadata matters.  Earlier this year on the ACRL TechConnect Blog, Meghan Frazer posted “An Elevator Pitch for File Naming Conventions,” in which she offers her own recommendations about file naming conventions:

  1. include the date of creation
  2. leave out spaces and special characters
  3. use descriptive file names

She also links to other online resources that justify these recommendations.  Two of the most useful are a document created by the Bentley Historical Library and a series of videos produced by the State Library of North Carolina.  In addition, she points out Automator, a program on Macs that can be used to batch rename files.
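Frazer's three recommendations can be expressed as a small function.  This Python sketch is one possible reading of them, not her specification: the ISO 8601 date format (YYYY-MM-DD, which sorts chronologically as plain text), the underscore between date and description, and the hyphens between words are all assumptions.  Every character other than a lowercase ASCII letter or digit is collapsed into a hyphen, so accented letters are dropped as well.

```python
import re
from datetime import date


def conventional_name(created, description, extension):
    """Build a name such as "2014-03-15_nephews-birthday-party.jpg".

    created     -- date the file was created (rule 1: include the date)
    description -- free-text description (rule 3: descriptive names)
    extension   -- file extension without the dot
    """
    # Rule 2: lowercase, then collapse every run of characters that is not
    # a letter or digit into one hyphen, so no spaces or special characters
    # survive; trim hyphens left at either end.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    # isoformat() yields YYYY-MM-DD, which sorts correctly as text.
    return f"{created.isoformat()}_{slug}.{extension.lower()}"
```

A scripted batch renamer could then apply such names with `pathlib.Path.rename`, serving the same purpose Automator does on a Mac.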

I began using a personal computer in the era of the MS-DOS FAT file system, so the names of the older files that I have migrated from computer to computer over the years are constrained by the 8.3 file name convention (a maximum of 8 characters for the base file name and 3 characters for the extension).  In trying to create descriptive file names at the time, I used the extension to indicate something about the content of the file; for example, *.let was a letter that I wrote, while *.civ was a file that I created for my civics class.  Using the extension this way, however, removed any indication of the software program that I used to create these files.  A post on How-To Geek provides step-by-step instructions for four methods of batch renaming files in Windows.  I know that the word processor I used was WordPerfect, so I could easily change the extensions from the command line.  But that would discard the descriptive information contained in my original extensions, so instead I’ve decided to try out one of the most highly recommended software solutions, Bulk Rename Utility, which offers a more sophisticated way to handle my file name changes.  (Stay tuned for the results!)
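The idea of keeping the descriptive extension while restoring a format extension can be sketched in a few lines of Python.  This is a hypothetical illustration, not how Bulk Rename Utility works: it folds the old extension into the base name and appends `.wpd`, which is assumed here as the WordPerfect extension to use (adjust it if the files need something else).  The example file names are invented.

```python
from pathlib import PurePath

# Assumption: .wpd is the extension wanted for the WordPerfect files.
TARGET_EXTENSION = ".wpd"


def keep_descriptive_extension(filename, target=TARGET_EXTENSION):
    """Move a descriptive 8.3 extension into the base name and add a format
    extension, e.g. "MOM0395.LET" -> "MOM0395_let.wpd".

    `filename` is a bare file name (no directory part).
    """
    name = PurePath(filename)
    old_extension = name.suffix.lstrip(".").lower()
    if not old_extension:
        # Nothing descriptive to preserve; just add the format extension.
        return name.stem + target
    return f"{name.stem}_{old_extension}{target}"
```

Mapping this over a list of names yields the new names; actually renaming the files still requires a step such as `pathlib.Path.rename`.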

To this point, I have relied on personal examples of why metadata matters, but as Frazer points out, when repositories make files available to the public, file naming matters for both discovery and access.  For example, the Moby Dick Big Read project provided downloadable audio versions of each chapter of the book.  Moby Dick had been on my reading list for years, and I had a long car trip coming up, so I downloaded the chapters to an SD card and popped it into my car’s audio system.  Downloading these files had been very time-consuming because I had to download each chapter separately, so I hadn’t taken the time to notice how they were named.  But when I tried to play the book, I realized the file names were going to make my listening very frustrating.  The chapters were named along the lines of c1.mp3, so when the files were sorted by name in ascending order, chapter 1 was followed not by chapter 2 but by chapters 10 through 19 and the other numbers beginning with 1.  When I later renamed the files, I had to keep in mind that there were 136 chapters and use leading zeroes so that they would play in the right order.  (A more minor issue, but one the producers could easily have addressed from the outset, was that only two of the files had embedded ID3 information, so my audio system could not display the title and artist for the rest.)
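The leading-zero fix can be automated.  This Python sketch assumes each chapter file name contains its chapter number as the first run of digits, as in the c1.mp3 pattern; names with other digits in front (a year, for instance) would need a different pattern.  It pads each number to the width of the largest one, so with 136 chapters every number becomes three digits.

```python
import re

# Assumption: the chapter number is the first run of digits in each name.
FIRST_NUMBER = re.compile(r"\d+")


def pad_numbers(filenames):
    """Zero-pad the first number in each name to the width of the widest one,
    so that text order matches numeric order (c1.mp3 -> c001.mp3)."""
    matches = [FIRST_NUMBER.search(name) for name in filenames]
    width = max((len(m.group()) for m in matches if m), default=1)
    # Names without any number are returned unchanged.
    return [FIRST_NUMBER.sub(lambda m: m.group().zfill(width), name, count=1)
            for name in filenames]
```

Plain text sorting compares names character by character, which is why c10.mp3 lands before c2.mp3; once every number has the same width, text order and numeric order agree.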

Obviously, repositories can leave it to users to rename files as necessary.  I eventually got Moby Dick to play in the correct order, so other people can do it, too.  But it strikes me that this is a relatively simple way for archives and digital libraries to facilitate discovery and access while also modeling best practices for file naming.  In the long run, this can only improve the state of the born-digital records that repositories accession.  Personally, I don’t think it adds too much to the workflow.  Here are some resources that can help address this need:

  • MIT Libraries, Data Management and Publishing – covers naming and versioning, and also describes data identifiers
  • University of Michigan, Digital Preservation Glossary – provides succinct definitions of key terms related to digital preservation and distinguishes among various types of metadata

If you want to delve into metadata more fully, you can peruse these sites:

  • Dublin Core: defines a set of fifteen “core” metadata elements; has the flexibility to address various needs (see the “Levels of interoperability” section to determine what is needed by your organization)
  • Metadata Encoding & Transmission Standards (METS): as this web site explains, “The METS schema is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium.  The standard is maintained in the Network Development and MARC Standards Office of the Library of Congress, and is being developed as an initiative of the Digital Library Federation.”
  • Metadata Object Description Schema (MODS): MODS is an XML schema for a bibliographic element set; one of the most useful features of this web site is its listing of conversions between MODS and other metadata schemas
  • Open Archival Information System (OAIS): for technical information about digital preservation, the latest version of the Magenta Book (2012) contains the full set of recommended practices
  • PREMIS: the Preservation Metadata: Implementation Strategies working group was convened by OCLC and RLG and developed the PREMIS data dictionary with the goal of creating an implementable set of “core” preservation metadata elements; Version 2.2 is available, and Version 3.0 is in the works
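To see what a simple Dublin Core description looks like in practice, here is a minimal Python sketch that writes the fifteen-element vocabulary as XML in the standard dc namespace (http://purl.org/dc/elements/1.1/).  The `metadata` wrapper element and the sample values are hypothetical; real exchange formats such as OAI-PMH's oai_dc define their own container element.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
})
# Serialize the namespace with the conventional "dc:" prefix.
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialize {element: value} pairs as simple Dublin Core XML.

    The <metadata> wrapper is a hypothetical container, not part of the standard.
    """
    root = ET.Element("metadata")
    for element, value in fields.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")
```

Rejecting unknown names such as "artist" keeps a record within the core vocabulary; an audio file described this way would carry its title and creator even where no ID3 tags were embedded.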