Digital preservation: a long-term path
What is digital preservation? Is digitization the same as preservation? Is having a backup preservation? What about sharing content? Is it even right to ask whether we do or don’t do digital preservation? The answer is not as simple as a YES or a NO. Understanding this key concept is crucial for organizations to succeed in implementing their digital preservation strategy for the future.
Let’s start at the beginning. Digital preservation can be seen as all those processes aimed at ensuring the continuity of digital heritage materials for as long as they are needed[1]. It is important to understand digital preservation not as an absolute but in terms of levels, that is, as a dynamic process that can steadily evolve and improve. Therefore, the effort should focus on understanding what our level of digital preservation is, how other organisations around the world approach it, and how we can improve it.
Starting point: How to know the level of digital preservation?
There are some good practice guidelines for digital preservation. Two of the simplest are the NDSA levels and the DPC RAM Model. Both allow institutions to evaluate the work they are doing in digital preservation in a simple way to identify areas for improvement, set goals, and define a long-term preservation strategy. Let’s learn a bit about them!
The NDSA Levels
The National Digital Stewardship Alliance has created the Levels of Digital Preservation Matrix[2], known as the NDSA Levels, to allow any institution to evaluate its preservation work in a very simple way, either as a starting point or to see which areas need to be optimized. This matrix can also be used to evaluate the functionalities that a digital preservation system must fulfil. Today it is one of the most important benchmarks in digital preservation, both useful and easy to understand. The NDSA Levels focus on 5 Functional Areas (Storage, Integrity, Control, Metadata and Content) and 4 Levels (Know your content, Protect your content, Monitor your content, and Sustain your content). To fulfil any level, you must also fulfil all the previous ones.
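The cumulative rule can be illustrated with a minimal Python sketch. The function name and the sets of fulfilled levels are hypothetical and are not part of the NDSA matrix itself; the sketch only shows that a level counts when every level before it is also met.

```python
# Level names as listed in the NDSA Levels matrix.
LEVELS = {
    1: "Know your content",
    2: "Protect your content",
    3: "Monitor your content",
    4: "Sustain your content",
}


def achieved_level(levels_met):
    """Return the highest level n such that levels 1..n are all met (0 if none)."""
    achieved = 0
    for level in sorted(LEVELS):
        if level not in levels_met:
            break  # a gap: higher levels cannot count
        achieved = level
    return achieved


# Hypothetical self-assessment results.
complete_run = achieved_level({1, 2, 3})  # levels 1 to 3 all met
with_gap = achieved_level({1, 3})         # level 2 is missing, so level 3 does not count
```

In this sketch, meeting the requirements of level 3 without those of level 2 still leaves the institution at level 1, which is why the matrix works well as an ordered list of next steps.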
The DPC RAM Model
The Digital Preservation Coalition Rapid Assessment Model[3], known as the DPC RAM Model, is a maturity modelling tool designed to enable rapid benchmarking of an organization’s digital preservation capability (less than 2 hours for someone with good knowledge of the organization) whilst remaining agnostic to solutions and strategy. The model evaluates 11 aspects, grouped into organizational and service-level capabilities, that are ranked according to a simple and consistent set of maturity levels. It enables organizations to monitor their progress as they develop and improve their preservation capability and infrastructure, and to set future maturity goals. Unlike the NDSA Levels, in the DPC RAM Model it is possible to have different maturity levels in the different aspects evaluated.
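Because each aspect is scored independently, a self-assessment can be compared with a set of target levels to see where the largest gaps are. The following Python sketch assumes a numeric maturity scale from 0 to 4 and uses only four aspect names for brevity; the scores, the targets and the function name are hypothetical, and the names and scale should be checked against the published model.

```python
def maturity_gaps(current, target):
    """Return {aspect: levels still to gain}, largest gap first."""
    gaps = {
        aspect: goal - current.get(aspect, 0)
        for aspect, goal in target.items()
        if goal > current.get(aspect, 0)
    }
    return dict(sorted(gaps.items(), key=lambda item: item[1], reverse=True))


# Hypothetical scores on an assumed 0-4 scale.
current = {
    "Policy and strategy": 1,
    "IT capability": 3,
    "Bitstream preservation": 2,
    "Metadata management": 1,
}
target = {
    "Policy and strategy": 3,
    "IT capability": 3,
    "Bitstream preservation": 3,
    "Metadata management": 2,
}

gaps = maturity_gaps(current, target)
for aspect, gap in gaps.items():
    print(f"{aspect}: {gap} level(s) below target")
```

Aspects that already meet their target are left out, so the result reads as a prioritized to-do list for the next review.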
What next? Preserve first, curate later
Once the work of the institution has been assessed and the level of preservation is known, it is time to establish the strategy to achieve the goals.
A few years ago, this evaluation and audit process was very time-consuming. Content had to be structured to fit into digital preservation plans. This often became a problem in and of itself, because a lot of work had to be done before the content was ingested into the digital preservation system, resulting in paralysis, in some cases due to overly perfectionist practices and in others due to lack of resources.
Nowadays, there are digital preservation solutions that aim to preserve content as soon as possible. These out-of-the-box solutions allow a high level of preservation with less effort. Once the unstructured content is ingested, the system takes care of various preservation-related tasks such as integrity checks, multiple copies, automatic metadata extraction, characterization and validation of formats, and detection of viruses, duplicates and PII (personally identifiable information). In addition, once the content is preserved within the platform, it is possible to restructure it, add metadata (individually or in bulk), run full-text searches (in the content or in the metadata), add tags, create workflows, and perform other actions that allow the archivist to curate, organize and manage the assets within the digital preservation system. In this way, the content is preserved first and curated later, minimizing the risk of data loss.
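Two of these tasks, integrity checks and duplicate detection, can be sketched in a few lines of Python using checksums. This is a minimal illustration that assumes SHA-256 digests and a simple in-memory manifest; the function names are hypothetical and it does not describe how any particular platform implements these features.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Record a checksum for every file under root, keyed by relative path."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify(root, manifest):
    """Return the relative paths that are missing or whose checksum changed."""
    root = Path(root)
    failed = []
    for relative, expected in manifest.items():
        path = root / relative
        if not path.is_file() or sha256_of(path) != expected:
            failed.append(relative)
    return failed


def find_duplicates(manifest):
    """Group relative paths that share the same checksum."""
    by_digest = defaultdict(list)
    for relative, digest in manifest.items():
        by_digest[digest].append(relative)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

A manifest built at ingest time with build_manifest can be passed to verify at any later date: an empty result means every file still matches its recorded checksum.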
Better together: The LIBNOVA approach
LIBNOVA’s approach to assisting and supporting institutions to achieve the highest level of preservation possible starts before a digital preservation project begins and goes beyond the end of the project.
LIBNOVA not only assists in the technical part of the project, offering the most advanced digital preservation solutions, but is involved from the very beginning of the project by teaming up with the Institution’s own information professionals, by sharing and teaching its methodology, by facing together challenges that are often similar to those faced by other customers and looking for solutions, by offering unlimited training for the different user groups involved in the project, and by revisiting the adopted digital preservation strategy to adapt and improve it as many times as necessary.
LIBNOVA’s goal is to empower the Archive Staff to have overall governance of the whole project. And the best way to do this is together, through a partnership approach with customers.
Looking ahead: Challenges on the horizon
Since its beginnings more than a decade ago, LIBNOVA has focused on providing the community with the most advanced digital preservation platform. A few years ago, it created LIBNOVA RESEARCH LABS, an observatory to coordinate the lines of research to be followed in technological innovation within the company. The results of completed research have a direct impact on the LIBSAFE digital preservation platform.
This laboratory, therefore, is used to identify and work on the future challenges of digital preservation, and the technology to address those challenges.
One very real challenge is the digital preservation and management of research data sets. The LIBNOVA-led consortium is participating in the EU-funded ARCHIVER Project[4] to build the next-generation solution for archiving and preserving research datasets at the multi-petabyte scale.
Preserving research data has some unique challenges, making it really complex to achieve without proper processes and tools. Managing the provenance, solving the reproducibility problems, or aligning the data sets with the best practices are key areas to address, with many communities involved.
Another challenge that we have been working on for a long time and that always allows room for improvement is to simplify digital preservation tasks for the end user. We call this Assisted Digital Preservation. Thanks to the application of algorithms based on the latest advances in Artificial Intelligence (AI) and the use of artificial neural networks (ANN), the software enables users to do complex things with a single click. These advances are being implemented in tasks such as the automation of ingest processes, the creation of workflows, the analysis of the content of digital objects, etc.
The last challenge to highlight, but not the least, is how to understand and reduce the environmental impact of digital preservation actions. Everything we do in our daily lives has an environmental impact, and digital preservation does too. So what’s the point of preserving information for the future if we are destroying the future?
On this point, at LIBNOVA we are working to reduce our environmental impact as a company by implementing internal environmental management policies (remote work to reduce transportation emissions, reducing waste generation, improving office energy efficiency, etc.). We are also analysing how to reduce the environmental impact of our customers’ digital preservation work, which is more interesting and has a higher impact. Applying Artificial Intelligence here to analyse storage use or access to content, for example, will allow our customers to better choose the amount and type of storage they need and change it when necessary, not only saving them costs but also reducing their environmental impact.
[1] Concept of Digital Preservation https://en.unesco.org/themes/information-preservation/digital-heritage/concept-digital-preservation
[2] Levels of Digital Preservation https://ndsa.org/publications/levels-of-digital-preservation/
[3] DPC RAM Model https://www.dpconline.org/digipres/dpc-ram
[4] LIBNOVA-led Consortium selected for the ARCHIVER Project https://www.libnova.com/libnovas-consortium-selected-for-the-cern-led-research-project-to-build-the-next-generation-research-data-preservation-solution/
About LIBNOVA
LIBNOVA’s mission is to safeguard the world’s research and cultural heritage. Forever.
Year after year, LIBNOVA has been pushing the boundaries of what is possible in digital preservation, bringing innovations that empower organizations to preserve their content in an easier and more efficient way. The LIBSAFE platform is advanced, OAIS-aligned digital preservation software that covers several digital preservation needs.
LIBNOVA was founded in 2009, has offices in the US and Europe, and is now present in 15 countries, serving the academic, cultural heritage and research communities. LIBNOVA Research Labs (2017) manages all research initiatives for the company.
Customers like the British Library, HILA Stanford University, the EPFL, the University of Oxford and many more already trust LIBNOVA.
More info: https://www.libnova.com/
Antonio Guillermo Martinez
Antonio G. Martinez is the CEO of LIBNOVA, a worldwide leader in digital preservation. An entrepreneur since he was 17, he has created several companies in the technology industry and in others. LIBNOVA’s success is based on the anticipation of needs and the development of simple products to solve complex problems. He provides technology vision, management skills and experience in software and hardware engineering, along with a strong focus on customer satisfaction and commitment to customers.
Teo Redondo
Teo Redondo is the CTO and Head of Research & Development at LIBNOVA, where he leads several innovation projects on Digital Preservation solutions for Libraries, Archives and Museums, and Research institutions. He also leads LIBNOVA Research Labs in the areas of future functionalities, mostly around implementing Artificial Intelligence techniques for better handling of research data and content.
Maria Fuertes
Maria Fuertes is the Head of Marketing at LIBNOVA, where she has specialized in analysing the needs of the digital preservation community and works closely with the innovation department to develop the products that customers need and to offer the community the most advanced digital preservation platform.