Much of the content we view online is ephemeral. URLs are moved, taken down, and reorganized, and often they are removed from the web forever. Additionally, the growing volume of born-digital material, which has no physical analogue, poses a serious challenge to institutions tasked with collecting and preserving cultural heritage. The New York Art Resources Consortium (NYARC) is engaged in capturing this type of information. Members of the consortium focus on the documentation of art history and the art market. As a result, the ability to properly record and preserve web-based or born-digital materials is especially important.

The NYARC Web Resources Program selects, preserves, describes, and provides public access to a number of archived websites. These sites complement the print collections of each of the three art research libraries within the consortium as well as their respective institutional websites. There are 10 collections in the NYARC Web Archive:

  1. Art Resources
  2. Artists’ Websites
  3. Auction Houses
  4. Brooklyn Museum
  5. Catalogues Raisonnes
  6. Museum of Modern Art
  7. New York Art Resources Consortium
  8. New York City Galleries
  9. Restitution of Lost or Looted Art
  10. The Frick Collection

The websites in each collection are carefully chosen. During the appraisal process, both their content and their degree of ephemerality are considered.


In 2010, Deborah Kempe, the Frick’s Chief of Collections Development and Access, was introduced to the web-based service Archive-It at an ARLIS/NY meeting held at the Metropolitan Museum of Art. This introduction demonstrated the feasibility of using Archive-It’s technology to implement a web archiving program at NYARC (Kempe 2013).

“Going forward, one of the biggest challenges scholars and curators of contemporary art and architecture face currently, and will increasingly face, is how to store, retrieve, and investigate born-digital materials.”

–James Cuno, How Art History Is Failing at the Internet (2012)

Inspired by James Cuno’s popular 2012 blog post stressing the value of preserving born-digital content, Kempe and others at NYARC decided to establish the NYARC web archiving program. This program recognized the need to collect and preserve online resources that would be of value to the consortium and to the public.

In 2012, NYARC received an initial twelve-month grant from the Andrew W. Mellon Foundation to assist with the planning phase of the initiative. In 2013, NYARC received a two-year grant–Making the Black Hole Gray: Implementing the Web Archiving of Specialist Art Resources–to assist with the development of a web archiving program.


Since 2015, two graduate students from Pratt Institute’s School of Information have served as Web Archiving Fellows at NYARC. Over the course of two semesters, the fellows work with the NYARC web archiving team to gain “professional-level experience working on a variety of projects ranging from cataloging to digital preservation” (NYARC n.d.). Fellows contribute to NYARC’s unique collection of web archives devoted to born-digital resources, including websites for New York City galleries, artists’ websites, catalogues raisonnés, and auction catalogs. Moreover, the fellows are involved in the appraisal process and often evaluate candidate websites that may be added to the collection. As part of their training, the fellows have the opportunity to work extensively with Archive-It and Webrecorder, two state-of-the-art web archiving technologies. They also participate in meetings with the Frick Collection and NYARC staff members. The fellows also have several opportunities to present their findings and are introduced to people and communities in the national and international web archiving community. In 2019, for example, both Pratt fellows had the opportunity to participate in the inaugural Webrecorder community call, which included a preview of planned features as well as a question-and-answer exchange with the Webrecorder development team. The fellows were also encouraged to attend webinars on data and web archiving and were engaged in user testing throughout the semester.


While the fellows were largely involved with quality assurance (QA) of crawled sites (discussed here) during the 2018-2019 academic year, they were also involved with more experimental explorations of the web archive as a potential place for humanities research. Inspired by Thomas Padilla’s “Collections as Data” project, the fellows designed a “persona,” or archetypal user, for web archives research. In exploring the scholarly use of web archives, they identified web historians as a group that might see data from web archives as a primary historical source for understanding the digital age. The work of Ian Milligan, Niels Brügger, and Emily Maemura was also influential in formulating ideas around engaging archives from a scholarly perspective. Milligan’s work at the Archives Unleashed Project at the University of Waterloo is significant as it provides web archivists and Internet researchers with a potential avenue to further analyze archived websites. In particular, he and his team have developed tools to extract data for text analysis and for visualizations. Niels Brügger’s The Archived Web: Doing History in the Digital Age (2018) provides the first book-length analysis of web archives as a rich resource for scholarly inquiry. Emily Maemura’s article “What’s Cached is Prologue: Reviewing Recent Web Archives Research Towards Supporting Scholarly Use” further contributes to this area of research and highlights important issues such as ethics in web archiving.

archetypal persona for web archives research inspired by “Collections as Data” Project

Using the persona as a jumping-off point, the fellows performed open-ended explorations of several of the NYARC collections. This included creating a dataset from the NYC Galleries web archive and using the Archives Unleashed Toolkit to analyze two collections in the NYARC web archive: the Frick Collection and Restitution of Lost or Looted Art.


During the early 2000s, many art galleries, art dealers, museums, and art-related organizations began to move their publications to digital formats. By 2008, this trend had accelerated, perhaps as a cost-saving maneuver in the wake of the recession. While being digital allows for wider, faster access to the materials, it also makes them highly ephemeral and liable to loss. The efforts of NYARC, which archives the websites of over 200 New York City-based art galleries, are a response to this situation. The archived websites are valuable because they show a range of galleries across New York City, including neighborhood galleries in Williamsburg and the Rockaways as well as the more established galleries in Chelsea and the Lower East Side. In addition to preserving the websites of active galleries, NYARC is also engaged in finding and archiving the websites of galleries that are no longer open.

To understand these closures better, the Fellows created a dataset of all the galleries. Then, using this dataset in Tableau, they created an interactive data visualization that lets the user explore the collection in a new way. Rather than presenting an alphabetized list of the galleries represented in the collection, the visualization places them on an interactive virtual map. The user can zoom in on specific locations of interest or explore in an open-ended way. Each dot represents a gallery within the collection and, when hovered over, displays the number of captures of the website, when it was first captured, and when it was most recently captured. Those making acquisition decisions about potential new websites can observe the breadth of galleries within scope and use the map as a tool during collection development. This type of tool can therefore be useful to both users and web archivists.
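As a rough illustration of the dataset behind such a map, the sketch below collapses a gallery's capture dates into the three values shown when a dot is hovered over. The `GallerySummary` record, the `summarize` helper, and the sample gallery are hypothetical; the source does not describe the actual dataset layout or how it was prepared for Tableau.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class GallerySummary:
    """One row of a hypothetical gallery dataset (field names are assumptions)."""
    name: str
    captures: int
    first_capture: date
    last_capture: date


def summarize(name, capture_dates):
    """Collapse a gallery's capture dates into count, earliest, and latest."""
    if not capture_dates:
        raise ValueError(f"no captures recorded for {name}")
    return GallerySummary(
        name=name,
        captures=len(capture_dates),
        first_capture=min(capture_dates),
        last_capture=max(capture_dates),
    )


# Illustrative data only; not drawn from the NYARC collection.
example = summarize(
    "Example Gallery",
    [date(2016, 3, 1), date(2014, 9, 15), date(2018, 11, 2)],
)
print(example)
```

Rows of this shape, with added location fields, are the kind of table a mapping tool could plot as one dot per gallery.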

Some of the websites for closed galleries remain live. Others, however, have been retired and exist exclusively within NYARC’s web archive. Without NYARC’s work to capture gallery websites, a vast amount of digital content would be lost. This work will serve the research goals of those searching for information related to the history of the art market, the evolution of the gallery scene in NYC, and trends in contemporary art.

Chelsea gallery closures

Upper East Side gallery closures


In addition to using Tableau, the Fellows also synced the NYARC Web Archive collection with the Archives Unleashed Cloud. The Archives Unleashed Cloud, maintained by the University of Waterloo, is an “open source cloud-based analysis tool that helps researchers and scholars conduct web archive analysis” (Archives Unleashed Project 2019). Developed by Nick Ruest (PI), the Archives Unleashed Cloud requires no programming (unlike the non-cloud-based Archives Unleashed Toolkit). In as little as three clicks, the Fellows were able to connect the NYARC collection to Archives Unleashed. The Archives Unleashed Cloud software ingested the NYARC collection, and within 48 hours the Fellows were able to select portions of the entire collection for further analysis: the Frick Collection and the Restitution of Lost or Looted Art were chosen. Both collections underwent extensive analysis through Archives Unleashed, a process that took two days for each selection. Once finished, the software returned a Gephi network, a raw network, a complete list of URLs, all of the text derived from the HTML, and a .zip file of text from the top 10 domains for each collection.

“Archives Unleashed aims to make petabytes of historical internet content accessible to scholars and others interested in researching the recent past.”

–The Archives Unleashed Project

Visualizations made from Archives Unleashed and Voyant
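The raw network file mentioned above can also be summarized with a short script before it is opened in a visualization tool. The sketch below assumes the file is a CSV edge list with `crawl_date`, `source`, `target`, and `count` columns; those column names, the `inbound_link_totals` helper, and the sample domains are assumptions for illustration, and the real file layout may differ.

```python
import csv
import io
from collections import Counter


def inbound_link_totals(edge_csv):
    """Sum link counts per target domain across all crawl dates."""
    totals = Counter()
    reader = csv.DictReader(io.StringIO(edge_csv))
    for row in reader:
        totals[row["target"]] += int(row["count"])
    return totals


# Hypothetical edge list; the domains are placeholders, not archive data.
sample = """crawl_date,source,target,count
20190101,museum.example,library.example,12
20190201,museum.example,library.example,9
20190201,museum.example,video.example,4
"""

totals = inbound_link_totals(sample)
for domain, links in totals.most_common():
    print(domain, links)
```

Summing across crawl dates matters because the same link is recorded once per crawl, so a per-crawl view would understate how heavily a domain is linked over time.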


In 2012, the Frick’s website was relaunched using Drupal to enhance its clarity and accessibility. Several areas of the website were added, including an online digital collection using the eMuseum image database from Gallery Systems (Frick 2018). The Frick website comprises several main pages. These include the Art page, where visitors can browse the collection; the Exhibition page, which includes information on current and recently ended exhibitions; and the Interact page, where videos of lectures and symposia filmed at the Frick are hosted. As with all partner institutional websites, the Frick website, along with the NYARC website, Frick Future, and The Frick Collection Instagram, is part of the NYARC web archive. Because this collection is comparatively small (96 GB), the Fellows chose to sync it to Archives Unleashed. The size of the files was important because the service has data limits, and the Fellows did not wish to overly tax the system by pushing a larger amount of data through it. Syncing produced several files, including a Gephi network, a raw network, and text pulled from the HTML. Unfortunately, the text files were not readily usable because they consisted mostly of repeated passages (caused by monthly crawls) as well as long strings of text from Google videos. In short, these text files would need substantial cleaning before use. The Gephi network and raw network, however, provided an almost immediate look at how the collected URLs are connected. One of the Fellows is versed in network visualizations and was able to extend the raw Gephi file provided by Archives Unleashed (pictured).

Gephi Visualization of hyperlinks in the Frick Collection
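The cleaning that the derived text files would need, as described above, could start with removing passages that repeat verbatim from one monthly crawl to the next. The following is a minimal sketch, assuming each record is a `(crawl_date, url, text)` tuple; that record shape and the `dedupe_passages` helper are hypothetical, and this is not the method the Fellows used.

```python
import hashlib


def dedupe_passages(records):
    """Keep the first occurrence of each passage and drop later verbatim repeats."""
    seen = set()
    unique = []
    for crawl_date, url, text in records:
        # Collapse whitespace and case so trivially different copies match.
        normalized = " ".join(text.split()).lower()
        key = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        unique.append((crawl_date, url, text))
    return unique


# Illustrative records only; the URLs and text are placeholders.
records = [
    ("20190101", "https://museum.example/visit", "Open Tuesday through Sunday."),
    ("20190201", "https://museum.example/visit", "Open  Tuesday through Sunday."),
    ("20190201", "https://museum.example/news", "A new exhibition opens in March."),
]
cleaned = dedupe_passages(records)
print(len(records), "records reduced to", len(cleaned))
```

Exact-match deduplication only catches unchanged pages; text that changes slightly between crawls, or embedded video transcripts, would need further filtering.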

Restitution of Lost or Looted Art

During World War II, the Third Reich systematically looted cultural treasures from Nazi-occupied countries. These works of art were stolen not only from museums and cultural sites, but also from the personal collections of Jewish families. More recently, the Syrian Civil War has brought additional cultural loss. ISIL has deliberately destroyed or stolen significant cultural heritage material and has targeted historic sites in Aleppo, Raqqa, and Palmyra, to name just a few (Buffenstein 2017). Such plundering, looting, and destruction of cultural goods have had resounding effects on the world’s cultural fabric. This loss has nevertheless drawn tremendous attention, as news outlets, archaeologists, art historians, and organizations such as UNESCO have publicly decried the destruction of cultural heritage. Websites, blogs, and social media accounts have also been created to track looting and restitution efforts. NYARC’s Restitution of Lost or Looted Art Collection aims to capture this material by harvesting 26 websites on the subject.

Gephi Network of 26 websites in the Restitution of Lost or Looted Art Collection


As more scholars become interested in using web archives as sources for scholarly inquiry and data collection, it is increasingly important to create and maintain sustainable workflows for acquiring, assessing, and preserving data. The collected data must be useable (i.e. must be analyzable and settable) and retrievable so that scholars using web archives have reliable materials with which to work. This is especially important when archiving cultural heritage information. Both the NYC Art Galleries data and the information from the Restitution of Lost or Looted Art collection demonstrate the importance of longitudinal data for filling in potential gaps in our knowledge. The Frick Collection provides insights into the strong interconnections between major institutions like the Frick and other arts organizations. This information matters for future scholarship, and organizing and preserving it requires sustaining such archiving efforts. The analyses performed by the Fellows are a first step for the NYARC collection. Moving forward, it will be important to continue to collect and investigate the archives as they grow and evolve. What are the narratives to be found in the data? How might the information derived from services like Archives Unleashed be used in constructing (or even correcting) the scholarly record?

Works Cited

Archives Unleashed Project. (2019). Archives Unleashed Cloud. Retrieved from

Buffenstein, Alyssa. (2017). “A Monumental Loss: Here Are the Most Significant Cultural Heritage Sites That ISIS Has Destroyed to Date” [blog post]. ArtNet. Retrieved from

Cuno, James. (2012). “How Art History Is Failing at the Internet” [blog post]. The Daily Dot. Retrieved from

Frick Collection, The. (2018). Technology and Digital Media. The Frick Collection. Retrieved from

Kempe, Deborah. (2013). “It Is Time to Embrace the Present” [blog post]. Web Archiving Section. Retrieved from