Conference Agenda

Session
SES-04 (PANEL): SOLRWAYBACK: BEST PRACTICE, COMMUNITY USAGE & ENGAGEMENT
Time:
Thursday, 11/May/2023:
1:30pm - 2:30pm

Session Chair: Thomas Langvann, National Library of Norway
Location: Theatre 2


Presentations

SolrWayback: Best practice, community usage and engagement

Thomas Egense¹, László Tóth², Youssef Eldakar³, Sara Aubry⁴, Anders Klindt Myrvoll¹

¹Royal Danish Library (KB); ²National Library of Luxembourg (BnL); ³Bibliotheca Alexandrina (BA); ⁴National Library of France (BnF)

Panel description

This panel will focus on the current state of SolrWayback, existing implementations, and where the tool is heading, including the growing open source community that is adopting SolrWayback and contributing to its development, making it more resilient.

Thomas Egense will give an update on the current development and the flourishing user community and some thoughts on making SolrWayback even more resilient in the future.

László Tóth will talk about the National Library of Luxembourg's (BnL) development of a fully automated archiving workflow comprising the capture, indexing and playback of Luxembourgish news websites. The solution combines the powerful features of SolrWayback, such as full-text search, wildcard search, category search and more, with the high playback quality of PyWb.

Youssef Eldakar will present how SolrWayback has enhanced the way researchers can search for content and view the 18 IIPC special collections, and will also raise some considerations about scaling the system.

Sara Aubry will present how the National Library of France (BnF) has been using SolrWayback to give research teams the possibility to explore, analyze and visualize specific collections. She will also share how BnF contributed to the development of the application, including the extension of its data visualisation features.

Thomas Egense: Increasing community interactions and the near future of SolrWayback

During the last year, the number of community interactions, such as direct email questions and bug reports or feature requests posted on GitHub/Jira, has increased every week. It is indeed good news that so many libraries, institutions and researchers have already embraced SolrWayback, but to keep up this momentum, more community engagement in this open source project would be welcome.

By submitting a feature request or bug report on GitHub, you help prioritize the work that will benefit the most, so do not hold back. More programmers for the backend (Java) or the frontend (GUI) would speed up the development of SolrWayback.

Recently BnF helped improve some of the visualization tools by allowing time intervals shorter than years. For newly established collections this is a much more useful visualization. It is a good example of how the needs of a new collection just 1 year old differ from those of collections with 25 years of web harvests. It had not been in our focus, though it was a very useful improvement.

In the very near future I expect that more time will be spent on supporting new users attempting to implement SolrWayback. Also, the hybrid setup of SolrWayback combined with PyWb for playback seems to be the direction many choose to go. Finally, large collections will run into a Solr scaling problem that can be solved by switching to SolrCloud. There is a need for better documentation and workflow support in the SolrWayback bundle for this scaling issue.

László Tóth: A Hybrid SolrWayback-PyWb playback system with parallel indexing using the Camunda Workflow Engine

Within the framework of its web archiving programme, the National Library of Luxembourg (BnL) is developing a fully automated archiving workflow comprising the capture, indexing and playback of Luxembourgish news websites.

Our workflow design takes into account several key features, such as the efficiency of crawls (both in time and space) and of the indexing processes, all while providing a high-quality end-user experience. In particular, we have chosen a hybrid approach for the playback of our archived content, making use of several well-known technologies in the field.


Our solution combines the powerful features of SolrWayback, such as full-text search, wildcard search, category search and so forth, with the high playback quality of PyWb (for instance its ability to handle complex websites, in particular with respect to POST requests). Thus, once a website is harvested, the corresponding WARC files are indexed in both systems. Users are then able to perform fine-tuned searches using SolrWayback and view the chosen pages using PyWb. This also means that we need to store our indexes in two different places: the first is an OutbackCDX indexing server connected to our PyWb instance, the second is a larger Solr ecosystem put in place specifically for SolrWayback. This parallel indexing process, together with the entire workflow from start to finish, is managed by the Camunda Workflow Engine, which we have configured in a highly flexible manner.


This way, we can quickly respond to new requirements, or even to small adjustments such as new site-specific behaviors. All of our updates, including new productive tasks or workflows, can be deployed on-the-fly without needing any downtime. This combination of technologies allows us to provide a seamless and automated workflow together with an enjoyable user experience. We will present the integrated workflow with Camunda and how users interact with the whole system.
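
The parallel indexing step described in this abstract can be illustrated with a minimal sketch. This is not BnL's Camunda workflow: a plain Python thread pool stands in for the workflow engine, and the two indexing functions are hypothetical stand-ins that only return a label instead of contacting OutbackCDX or Solr. The names index_in_outbackcdx, index_in_solr and index_warc_in_parallel are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def index_in_outbackcdx(warc: Path) -> str:
    """Hypothetical stand-in: a real job would send CDX records for this WARC to OutbackCDX."""
    return f"outbackcdx:{warc.name}"


def index_in_solr(warc: Path) -> str:
    """Hypothetical stand-in: a real job would index this WARC into the Solr used by SolrWayback."""
    return f"solr:{warc.name}"


def index_warc_in_parallel(warc: Path) -> list[str]:
    """Submit both indexing jobs for one WARC at once and wait for both to finish."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(job, warc) for job in (index_in_outbackcdx, index_in_solr)]
        # result() re-raises any exception, so a failure in either index is not silently lost.
        return [future.result() for future in futures]


if __name__ == "__main__":
    for line in index_warc_in_parallel(Path("example.warc.gz")):
        print(line)
```

In a setup like the one described, a harvested site would only be offered for search and playback once both jobs have completed, since search results from SolrWayback are viewed through PyWb.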


Youssef Eldakar: Where We Are a Year Later with the IIPC Collections and Researcher Access through SolrWayback

One year ago, we presented a joint effort, spanning the IIPC Research Working Group, the IIPC Content Development Working Group, and Bibliotheca Alexandrina, to republish the IIPC collections for researcher access through alternative interfaces, namely, LinkGate and SolrWayback.


This effort aims to re-host the IIPC collections, originally harvested on Archive-It, at Bibliotheca Alexandrina with the purpose of offering researchers the added value of being able to explore a web archive collection as a temporal graph with the data indexed in LinkGate, as well as search the full text of a web archive collection and run other types of analyses with the data indexed in SolrWayback.


At the time of last year's presentation, the indexing of the 18 collections, with a total compressed size of approximately 30 TB, for publishing through both LinkGate and SolrWayback was at an early stage. As part of this panel on SolrWayback, one year later, we present an update on what is now available to researchers after the progress made on indexing and on tuning the deployment, focusing on showcasing access to the data through the different tools found in the SolrWayback user interface.


We also present a brief technical overview of how the underlying deployment has changed to meet the demands of scaling up to the growing volume of data. Finally, we share thoughts on next steps. See the republished collections at https://iipc-collections.bibalex.org/ and the presentation from 2022.

Sara Aubry: SolrWayback at the National Library of France (BnF): an exploration tool for researchers and the web archiving team engagement to contribute to its evolution

With the opening of its DataLab in October 2021 and the Respadon project (which will also be presented during the WAC), the BnF web archiving team is currently concentrating on the development of services, tools, methods and documentation to ease the understanding and appropriation of web archives for research. The underlying objective is to provide the research community, along with information professionals, with a diversity of tools dedicated to the building, exploration and analysis of web corpora. Among all the tools we have tested with researchers, SolrWayback has a particular place because of its ease of use and its rich functionality. Beyond a first contact with the web archives, it allows researchers to question and analyze the focused collections to which it gives access. This presentation will focus on researcher feedback on using SolrWayback, how the application promotes the development of skills on web archives, and how we accompany researchers in the use of this application. We will also present how research use and feedback have led us to contribute to the development of this open source tool.




 
Conference: IIPC WAC 2023