Creating Metadata by Hand: Musings on the Limits of Automation in Archives

This post was written by Alice Griffin, who has worked in La MaMa’s Archives since November as the Metadata/Digitization Assistant. She’s leaving La MaMa at the end of July to pursue a Master’s degree at the Pratt Institute’s School of Information. We asked her to offer some reflections about her time at La MaMa. (We will miss her terribly and wish her all the best in her next adventure.)

“But… a computer could just do your job.” The first time I heard this remark it made me pause, seriously question the future of my career, and turn to my professional mentors for reassurance. Now, after being in this position for seven months, I feel confident that my position is not so easily automated away.

In the La MaMa Archives/Ellen Stewart Private Collection, I am the metadata/digitization assistant. My job is to add digital media to corresponding catalog records on the fantastically vast La MaMa Archives Digital Collections site, created by several catalogers and Project Manager Rachel Mattson over the past three years. As a result of this project, researchers all over the world can now see the photographs and programs that were initially just minimally described. This project of digitization requires a scanner, some metadata know-how, creativity, patience, and lots of time. The La MaMa Archives does have a lovely professional scanner, my metadata knowledge continues to grow, and I do have a considerable amount of patience. However, time is running short as the grant I was hired on comes to an end. I have added hundreds of digital objects to the digital collections since November 2016, but it feels as though my job has just started.

Image: Alice Griffin, a human, at her desk in the La MaMa Archives.

So, why can’t a computer just do my job? A computer is already helping me with many aspects of this task. The scanner I use to digitize photographs, programs, flyers, postcards, and other objects is connected directly to my computer, and once I choose settings and a file name, there’s not much more to do except click “scan.” Once I have my preservation (TIFF) files and access (JPEG) files created in Photoshop, it’s just a matter of an easy drag and drop to initiate Secure File Transfer Protocol (SFTP) through Cyberduck to store them on the La MaMa Archives server or upload them to our digital collections site through a CollectiveAccess-powered backend. I also manually add metadata: a paragraph describing the material at hand, links to Library of Congress Name Authorities and Subject Headings, information about the storage location and preservation needs of the object, and other bits to make it as complete a record as possible. But in the era of self-driving cars, why do we need a human to do this work? Even though I don’t think anyone would accuse a surgeon of obsolescence because of the rise of robotics in the operating room, I think this is a fair question and I would like to attempt a response.

Simply put, a computer does not yet exist that automates all aspects of my workflow; human labor and expertise are always involved. The labs page of the Stanford Libraries website lists the equipment that is used for digitization projects and the rate of digitization. The robotic book scanner can scan four times as many pages in an hour as someone operating the manual book scanner. So, why even continue to pay student workers to do that manual work? The Stanford Libraries’ robotic book scanners are not safe for fragile bound materials, and therefore careful human hands are necessary. Of course, book scanners are being engineered to have that gentle touch. In her article “The Hidden Faces of Automation,” Lily Irani mentions a “patented machine” engineered to turn the pages of rare books for digitization. But even this kind of machine was not fully automated; it “housed a worker who flipped the pages in time to a rhythm-regulated soundtrack” (34).

In 2006, the System for the Automated Migration of Media Assets (SAMMA), a system of robotics, hardware, and software, began being sold as a way for institutions to transfer media from obsolete formats to digital files in a streamlined and cost-effective manner. Three factors make SAMMA unusable for my project. First, I am not working with or digitizing La MaMa’s audiovisual materials (for some information about La MaMa’s awesome audiovisual materials, see Rachel Mattson’s blog post here). Second, SAMMA does not create metadata about the content of the materials, such as who or what is depicted. And third, SAMMA’s cost-effectiveness is relative; the costs for a community archives, such as La MaMa, to use tools like SAMMA or the book scanners mentioned above would be prohibitive.

While robotics, hardware, and software are useful, there is still always human skill and precision involved. Before even beginning to scan I must make decisions about whether each object is appropriate for digitization – are there privacy or rights concerns? And if there are duplicates of an object, I must choose the best copy to digitize. When scanning begins it is not just a matter of sticking a stack of papers into the automatic feed on a photocopier, or placing a book or videotape into a robotic scanner. Materials I work with must be handled carefully so that they do not tear or crinkle. Additionally, in order to fully describe an object I am digitizing, I must fill in several fields to physically characterize the object or objects: how big is the object? How many duplicates are there? Is it color or black and white? Throughout this work, the materials must be handled with care, one page/photograph/poster at a time. We want these originals to last because while digital files generally allow for easier access, they do not necessarily stand the test of time. Original photographic prints, negatives, and papers cannot just go in the trash once you have a digital surrogate.

Image: Object record for production photographs from the 1985 production of The Cotton Club Gala [OBJ.1985.0307] as viewed on La MaMa’s digital collections website.

Adding metadata also requires a human mind. The description field, in particular, even requires some creativity because, as a cataloger, I have to think about how different people will use the catalog. How will La MaMa archives staff search the catalog versus the La MaMa marketing or development staff? How does an academic researcher use the catalog versus an artist that has performed at La MaMa before? A human cataloger can take advantage of these nuances of use to create a more robust, user-oriented catalog in a way that a rigid computer program simply can’t. To give an example, I asked myself these questions while cataloging photographs from the 1985 production of “Cotton Club Gala,” directed by Ellen Stewart with music by Aaron Bell and choreography by Larl Becham. The description field is a beautiful thing because it allows you to tell the researcher in full sentences about the object: what production it’s from, who is depicted, anything of note about the object, or even if you’re not sure of the date. So, in the case of the Cotton Club Gala photographs, I made sure to address all these users in the description:

This folder contains eight photographic prints, five of which are duplicates, from “Cotton Club Gala,” directed by Ellen Stewart and produced at La MaMa in 1985. This folder also includes a typewritten letter on Vogue letterhead from assistant to Amy Gross, David DeNicolo, to La MaMa archivist Doris Pettijohn thanking her for letting them look at the photographs.

Valois Mickens is depicted in the third image.

The description is neither long nor complicated, but it provides information in a readable format. There is information about prints and duplicates for archives staff. It identifies the production as directed by Ellen Stewart, which means it could be an important production for marketing use. An academic researcher may value the whole description, including the letter from Vogue, because it gives context for the objects. An artist searching the catalog might also appreciate the whole description, or they might find information about who worked on the production and who is depicted more interesting. The description field is different for every object record, and therefore requires flexibility, creativity, and brevity to produce a paragraph that contextualizes the object without overwhelming the user.

The La MaMa Archives holds many one-of-a-kind materials; for some productions, the programs, photographs, or posters here may be the only remaining evidence that they took place. In this way, the La MaMa catalog does not just hold information gleaned from other sources, but it is a producer of information itself. When a researcher or an archives staff member notices a mistake in the catalog, we usually need to consult our own material to solve the problem; a Google search will not help us. For example, when digitizing photographic prints for the 1965 and 1967 productions of The Sand Castle, written by Lanford Wilson and directed by Marshall W. Mason, I noticed that the performers depicted in the photographs weren’t matching up with the production dates that were handwritten on the back of the photographs. The La MaMa catalog was the only source I could turn to in order to resolve the confusion. I cross-referenced the performers listed in the programs with those depicted in the images and compared sets and costumes for both productions. In this way the La MaMa catalog functions as both repository and generator of the history of off-off-Broadway.

Image: Production photograph from the 1967 production of The Sand Castle [OBJ.1965.0216]. (This item was originally cataloged, in error, as documenting the 1965 production.)

While my position may appear to be a solitary one, it does require person-to-person interaction of a kind that a computer cannot provide. I am in regular contact with James D. Gossage, a photographer who documented many of La MaMa’s early shows. His own files and memories have corrected and enriched the catalog, and in March 2017 Gossage donated programs, a poster, and some photographs that the Archives did not have before. He gave us the rights to three of the photographs [OBJ.1967.0349], which depict Tom Eyen, a playwright and director of many La MaMa shows who is probably best known for writing Dreamgirls. These are beautiful portraits with dramatic light and shadow, and the La MaMa Archives is excited to have them. It’s possible that Gossage felt comfortable passing along these prints into our care because, despite some errors in the catalog, he could see the work that we put into describing these materials to the best of our knowledge and ability. The humanity (and therefore error) present in the La MaMa Digital Collections website reflects the deep humanity in the artists and their productions that the photographs, programs, correspondence, and posters document.

Image: Portrait of Tom Eyen by James D. Gossage, circa 1967. [OBJ.1967.0349]

No, my position cannot be simply automated away, but I’m sure I will continue to field questions about my position’s relevance. And while not receiving proper recognition for my work is mostly an inconvenience or a blow to my ego, it does reveal a widespread misunderstanding, or even misrecognition, of the mechanisms behind automation and making information available on the Internet. I am glad to see that there is growing scholarship on how obscuring the connection between human beings and automation deeply affects individuals and communities economically and emotionally. There is too much to delve into here in this blog post, but I would like to suggest some further reading. First, Safiya Umoja Noble’s article “Google Search: Hyper-visibility as a Means of Rendering Black Women and Girls Invisible,” examines how Google search results are not separate from human influence, but are in fact embedded in racist and sexist stereotypes that benefit advertisers. This aspect of Google is mostly ignored or glossed over. Noble reminds us that “the results that surface on the web in commercial spaces like Google are not neutral processes—they are linked to human experiences, decision-making, and culture.” Another article that reveals the human influence behind a process commonly thought of as automated is Sarah Roberts’ “Commercial Content Moderation: Digital Laborers’ Dirty Work.” Roberts exposes the human labor behind the moderation of user-generated content and how these workers impact the content they screen while that content also takes a toll on their well-being.

The third article I want to recommend here is Lily Irani’s short piece “The Hidden Faces of Automation.” In it, Irani explains how the “data janitors” behind “cultural data work,” such as “transcribing small audio clips, putting unstructured text into structured database fields, and ‘content-moderating’…user-generated content” (37), are so easily and consistently undervalued and underpaid. Irani then asks two very important questions that I would like to highlight here: “What would computer science look like if it did not see human-algorithmic partnerships as an embarrassment, but rather as an ethical project where the humans were as, or even more, important than the algorithms? What would it look like if artificial intelligence and human-computer interaction put the human care and feeding of computing at the center rather than hiding it in the shadows?” I think Irani brings up a remarkable point in these questions. Even though technology fields are booming, computers continue to be limited by the limitations of humans; limitations of technical knowledge, sure, but also limitations of empathy for human workers. Perhaps technologists need to embrace this level of social responsibility in their work. It is not a failure to admit we still need to do things by hand; rather, this honesty sheds light on a previously concealed issue.

Suggestions for further reading:
