IFLA 2014 Satellite Meeting
Linked data in libraries: from project to practice

Organised by the Information Technology Section
and the Semantic Web Special Interest Group
Bibliothèque nationale de France, Paris
14 August 2014, 9:00 - 17:00

Programme and speakers

The conference offers plenary sessions open to all. In parallel, two workshops are organised for targeted audiences, with a limited number of places.

The conference is held in English, without simultaneous translation.


9:00 – 9:30  Welcome

9:30 – 9:45 Welcome address by the BnF, the IFLA Information Technology Section and the Semantic Web Special Interest Group (SWSIG)

9:45 – 11:15 Plenary session
They made it happen... Library linked data success stories

  • Linking Libraries in The European Library and Europeana by Valentine Charles, Nuno Freire and Antoine Isaac
    The European Library and Europeana both have extensive experience in aggregating metadata for bibliographical records or digital resources from the cultural heritage institutions of Europe. For both of them, meeting the challenges posed by multilingual and heterogeneous data is an ongoing effort. The growth of the Semantic Web and the more generalised publication of knowledge organisation systems as Linked Open Data offer the possibility to make these services truly multilingual.
    This paper shows how The European Library and Europeana exploit the semantic relations and translations offered by knowledge organisation systems in order to solve the problem of data integration at a European scale. It also demonstrates the potential of Linked Open Vocabularies for enabling multilingual search and retrieval services.
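
    As a rough illustration of the idea described in this abstract (not the actual implementation of The European Library or Europeana), a multilingual search can expand a query term to every label of the matching vocabulary concept. The vocabulary, the concept URIs and the function name `expand_query` are invented for this sketch.

```python
# Hypothetical SKOS-like vocabulary: concept URI -> preferred label per language.
# The URIs and labels are made up for illustration only.
VOCABULARY = {
    "http://example.org/concept/manuscript": {
        "en": "manuscript", "fr": "manuscrit", "de": "Handschrift",
    },
    "http://example.org/concept/map": {
        "en": "map", "fr": "carte", "de": "Karte",
    },
}


def expand_query(term):
    """Return every language variant of the concept(s) whose label matches term."""
    needle = term.casefold()
    expanded = set()
    for labels in VOCABULARY.values():
        if any(label.casefold() == needle for label in labels.values()):
            expanded.update(labels.values())
    # Fall back to the raw term when no concept matches.
    return sorted(expanded) if expanded else [term]


if __name__ == "__main__":
    # A French query term is expanded to its English and German equivalents.
    print(expand_query("manuscrit"))
```

    A search service could then run the query with all returned labels, so that a record described in German is found from a French query.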

  • We grew up together: data.bnf.fr from the BnF and Logilab perspectives by Agnès Simon, Sébastien Peyrard, Vincent Michel and Adrien Di Mascio
    Three years after the launch of the Linked Open Data site data.bnf.fr, we can report on the experience of the project from the combined perspective of a public institution, the National Library of France (BnF), and a company, Logilab. Having started as a small innovative project, with little data and a small team, data.bnf.fr is now becoming a fully running service that is widely consulted and progressively integrates all the resources of the BnF catalogues.
    This paper shares what made data.bnf.fr a success story: librarians and IT services from the BnF and Logilab programmers working together following agile software development practices; using the free software CubicWeb, based on a relational database; and relying on the library's long-term cataloguing and dissemination policy.
    Yet we are now tackling technical, organizational and strategic issues concerning scalability, dependencies and stability, but also knowledge transfer to newcomers on the project. We are now considering the project in a long-term perspective, integrating it into the BnF's routines and concerns while continuing to innovate.

  • Web NDL Authorities: Authority Data of the National Diet Library, Japan, as Linked Data by Tadahiko Oshiba and Kazuo Takehana
    In January 2012, the National Diet Library, Japan, (NDL) launched Web NDL Authorities, a system capable of providing NDL authority data as Linked Data. The NDL had published its subject headings list in book form since 1964 and its author name authority since 1979. It also began providing the latter in a MARC format in 1997. Web NDLSH, a Web version of the National Diet Library Subject Headings (NDLSH) in the context of the Semantic Web, was first published in 2010. After this, the NDL expanded Web NDLSH to provide both name authority data and subject authority data as Linked Data, and this new system is known as Web NDL Authorities. Web NDL Authorities has been exchanging links with the Virtual International Authority File (VIAF) since the start of the NDL's participation in the VIAF in October 2012.
    After providing a brief history of the NDL's authority data, this paper summarizes the why and how of providing NDL's authority data as Linked Data via Web NDL Authorities. This paper also describes links from Web NDL Authorities to other authority data such as the VIAF and the LCSH. This service can be accessed at http://id.ndl.go.jp/auth/ndla.

11:15 – 11:30 Break

11:30 – 12:45 Plenary session (in parallel with the workshop for beginners below)
Perspectives for developing linked libraries and related applications

  • An unbroken chain: approaches to implementing Linked Open Data in libraries; comparing local, open source, collaborative and commercial systems by Lukas Koster and Rurik Greenall
    This paper compares methods for libraries to interact with the Web of Data by assessing the benefits and risks associated with local development, free-and-open-source software, collaborative and commercial solutions. Through a number of case studies, we provide insight into how each approach can be implemented and the extent to which these approaches can be reconciled.

  • Methodological Proposals for Designing Federative Platforms in Cultural Linked Open Data: the example of MoDRef by Antoine Courtin and Jean-Luc Minel
    As part of the on-going Labex project "Past in the present", our proposal aims at highlighting the organizational issues of Linked Data projects that have to deal with multi-institutional contexts, libraries among them. First, we will discuss what is at stake. Second, we will present a methodology based on the building of several diagrams which highlight technical, conceptual, and organizational obstacles. We will also address the issues of designing and producing an information system intended to ensure the transmission of scientific skills, the exploitation by foreign institutions of major vocabularies associated with specific vocabularies, and the harmonizing of, or building of bridges between, heterogeneous descriptions.

  • Internal and external interoperability of books metadata using work concept and semantic web technologies by Pierre Boudigues, Joëlle Aernoudt, Gautier Poupeau and Stéphane Bizeul
    Metadata is a key feature of book distribution workflows in general, and of e-books in particular. Traditional players in the book industry have to take into account the production workflows, quality and scope of their metadata in order to keep a leading role in the digitization process, unlike what happened in the music industry. This issue has to be addressed when designing the digital publishing workflow. Metadata management happens at every step of the process and involves every player (publishers, booksellers, librarians...), each at their own level. Their successful collaboration relies on the use of standards, identifiers and vocabularies, all required in order to reach the necessary interoperability level for exchanging, linking, and using the data they produce.
      1) Features of metadata in the book industry: the work concept
      Traditional bibliographic records are no longer sufficient to provide useful metadata for the digital world. New models like FRBR, centered on the notion of Work and links between entities, are required. Use case: the Work concept, based on the FRBR specification, developed by Electre and the migration of the traditional bibliographic database.
      2) Internal interoperability: Sharing and aggregating data in the book industry
      Several types of metadata go beyond the traditional bibliographic description:
      - metadata related to the audience and success of the book like critics (newspaper articles, awards, media events) and user reviews (social networks, comments)
      - metadata related to the content (places, stories, characters...)
      - metadata related to the author (biography, book signing events...).
      The different types of metadata mentioned above are not created at the same time and rely on different producers at different steps in the workflow. Aggregating and linking these data requires the common use of standards and identifiers (ISBN, ISTC, ISNI...). The example of Electre's data warehouse using Semantic Web technologies shows the shortcomings of current practice in the book industry in France and demonstrates the gap that has to be bridged in order to successfully combine and use the data at a global level.
      3) External interoperability: Linking book data on the World Wide Web
      It is important that the data thus aggregated and combined is shared on the web outside the book industry, so that external users can take advantage of highly structured information provided by the traditional players in the field. Book data producers can then link their data to other datasets already available on the Web in order to create new services with real added value. We will demonstrate this with the external enrichment of our data warehouse by collecting open data content (RDF data from the BnF's website data.bnf.fr, DBpedia, Wikidata, Wikimedia Commons).
      It is necessary to build a common ecosystem for producers and users in the digital publishing workflow, including a set of standards and identifiers. Semantic Web and Linked Data standards provide a suitable framework for linking successfully different types of data from different producers and silos. The business model for metadata distribution still has to be explored in this perspective.

11:30 - 12:45 Workshop for beginners (in parallel with the plenary session above)
Maximum 40 participants

Workshop led by Richard Wallis

Target audience
This workshop is intended for librarians who are keen on linked data but have not yet mastered its technical side. Most participants will come from the library world, but representatives of other types of organisations, public or commercial, are welcome.

- Understand the benefits of creating and using linked data in libraries
- Learn the basic technical concepts in order to follow the more technical presentations more easily

Topics that may be covered
At the start of the session, a few discussion topics will be selected according to the participants' practical cases and main interests, from among the following:
- What is linked data and why does it matter for libraries?
- Introduction to basic concepts such as RDF, URIs, vocabularies and ontologies
- Examples of available datasets: DBpedia, VIAF, etc.
- Examples of concrete applications and the benefits they can bring

12:45 – 13:45 Lunch break
A buffet is offered to all participants thanks to OCLC, partner of the event


13:45 – 15:00 Plenary session (in parallel with the workshop for managers below)
Creating, maintaining and using vocabularies for library linked data

  • Making MODS to Linked Open Data: A Collaborative Effort for Developing MODS/RDF by Ray Denenberg, Rebecca Guenther, Myung-Ja Han, Jeff Mixter, Amy L. Nurnberger, Melanie Wacker, Kathryn Pope and Brian Luna Lucero
    Publishing library catalog records as Linked Open Data is a challenge for many libraries because there is no community-driven best practice that each individual library can easily follow and implement in its workflow.
    Publishing library data as Linked Open Data is common practice for many national libraries, notably the British Library, the French National Library, and the German National Library, as well as for metadata aggregators and service providers such as Europeana and the Online Computer Library Center (OCLC). However, the ways in which these institutions implement Linked Open Data differ in many respects. These differences are typically found in the data model used, the granularity of data, and the Linked Open Data sources used in the data, to name a few examples.

    The Metadata Object Description Schema (MODS) RDF Group was formed in late 2013 as a virtual working group to test and develop a MODS/RDF ontology. The group is a follow-on to an initiative of the Library of Congress. MODS was originally developed in 2002 to "give special support to cataloguing electronic resources" and as an alternative that is less detailed than, although highly compatible with, MARC21.
    For this reason, it has been adopted by a wide variety of users and applications. MODS is also used as a metadata standard to which a library's traditional catalog records can be transformed while maintaining quality and granularity.
    In addition, MODS has proved that its data model and rich semantics can work well in Semantic Web environments:
    - MODS can accommodate the entity data structure introduced in FRBR
    - MODS has semantics that accommodate URIs as values, in addition to strings.
    The MODS/RDF Group, consisting of librarians and programmers from a number of libraries (primarily academic institutions, OCLC and the Library of Congress), has been working to develop a MODS/RDF ontology that will allow MODS users to convert their MODS/XML metadata to RDF. The Group also hopes to publish a transformation tool (XSLT) as an end product. Since its first meeting in January 2014, the Group has created an openly viewable GitHub page, and members work together to solve the common issues in creating new, as well as using already established, Linked Data semantics that best work for the MODS data model and the information that library catalog records describe.
    This presentation will share the challenges that have been encountered and the progress so far. The Group also would like to draw suggestions and recommendations for future work, especially in conjunction with other linked data work, such as BIBFRAME and Schema.org.
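
    To make the MODS/XML to RDF conversion mentioned in this abstract more concrete, here is a minimal sketch in Python. It is not the Group's ontology or its XSLT tool: the target namespace `http://example.org/modsrdf#`, the property names, the sample record and the function `mods_to_triples` are all assumptions made for illustration. Only the MODS XML namespace and the `valueURI` attribute come from MODS itself.

```python
import xml.etree.ElementTree as ET

# Real MODS XML namespace.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}
# Placeholder namespace: NOT the ontology developed by the MODS/RDF Group.
EX = "http://example.org/modsrdf#"

# Invented sample record, for illustration only.
SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>An Example Title</title></titleInfo>
  <name valueURI="http://example.org/authority/n1">
    <namePart>Doe, Jane</namePart>
  </name>
</mods>"""


def literal(text):
    """Format a plain string as a quoted literal (basic escaping only)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def mods_to_triples(xml_text, record_uri):
    """Map titles and names of one MODS record to (subject, predicate, object)."""
    root = ET.fromstring(xml_text)
    triples = []
    for title in root.findall("mods:titleInfo/mods:title", MODS_NS):
        if title.text:
            triples.append((record_uri, EX + "title", literal(title.text)))
    for name in root.findall("mods:name", MODS_NS):
        uri = name.get("valueURI")
        if uri:
            # A URI value becomes a link to another resource.
            triples.append((record_uri, EX + "name", "<%s>" % uri))
        else:
            # Without a URI, fall back to the string form of the name.
            part = name.find("mods:namePart", MODS_NS)
            if part is not None and part.text:
                triples.append((record_uri, EX + "name", literal(part.text)))
    return triples


if __name__ == "__main__":
    for s, p, o in mods_to_triples(SAMPLE, "http://example.org/record/1"):
        print("<%s> <%s> %s ." % (s, p, o))
```

    The sketch prefers the URI over the string when both are present, which reflects the abstract's point that MODS can carry URIs as values.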

  • Making library Linked Data using the Europeana Data Model by Marko Knepper and Valentine Charles
    Europeana provides a common access point to digital cultural heritage objects across different cultural domains, including libraries. The recent development of the Europeana Data Model (EDM) provides new ways for libraries to experiment with Linked Data. Indeed, the model is designed as a framework reusing various well-known standards developed in the Semantic Web community, such as the Resource Description Framework (RDF), the OAI Object Reuse and Exchange (ORE), and Dublin Core namespaces. It provides new opportunities for libraries to provide rich and interlinked metadata to the Europeana aggregation.
    However, to be able to provide data to Europeana, libraries need to create mappings from the library standard to EDM. This step involves decisions based on domain-specific requirements and on the possibilities offered by EDM. Because the cross-domain nature of EDM in some cases limits the completeness of the mappings, extensions of the model have been proposed to accommodate library needs.
    The "Digitised Manuscripts to Europeana" project (DM2E) has created an extension of EDM to optimize the mapping of library data for manuscripts. This extension takes the form of subclasses and subproperties that further specialize EDM concepts and properties. It includes spatial creation and publishing information, specific contributor and publication type properties, and more.
    Furthermore, the granularity of the mapping has been extended to allow references and annotations at page level, as required for scholarly work. As part of this project, the metadata of the Hebrew Manuscripts as well as of the Medieval Manuscripts presented in the Digital Collections of the Frankfurt University Library have been mapped to this extension. This includes links to the Integrated Authority File (GND) of the German National Library, with further links to the Virtual International Authority File (VIAF).
    Based on this development, a new comprehensive mapping from the digitization metadata format METS/MODS to EDM has been established for all materials of the Frankfurt Judaica in "Judaica Europeana". It demonstrates today's capabilities for creating Linked Data structures in Europeana based on library catalogue data and structural data from the digitization process.

  • Versioning Vocabularies in a Linked Data World by Diane Hillmann, Gordon Dunsire and Jon Phipps
    Policies regarding change management in open or public vocabularies used in the context of Linked Open Data have lagged behind those driving other web-based communities of practice. A fresh emphasis on vocabulary management and maintenance has begun to emerge, as the reliance on potentially volatile vocabularies, and the implications of their ongoing growth and change, has begun to permeate the conversation.
    Particularly in libraries, where management of commonly used vocabularies has long been a community wide activity, management of vocabularies has been seen as the realm of larger institutions and organizations. This centralized control has been workable (if slow to evolve to incorporate new needs) so long as data distribution has also been centralized, but this pattern of distribution has become more questionable as a transition to the more open world of Linked Data begins to demonstrate the inflexibility of traditional practices. As more attention shifts to new vocabulary standards and usages outside libraries, researchers and innovative organizations have sought to take advantage of this boom in interest, but unlike librarians, they have little experience in implementation over time.
    Merging the technology of the Semantic Web with the information management experience of libraries seems a reasonable strategy, but better understanding by all of where practices must change is critical.

  • From UNIMARC bibliographic and authority record to Linked Open Data by Mirna Willer and Leonardo Jelenković
    The paper describes the results of a research project aimed at publishing bibliographic and authority data of the Croatian Union Catalogue CROLIST, and potentially other catalogues that implement IFLA UNIMARC formats, as Linked Open Data (LOD). It describes the problems of mapping UNIMARC records to RDF using available ontological vocabularies. Namely, a choice had to be made regarding the methodology: whether to map UNIMARC records to a set of published vocabularies such as Dublin Core, ISBD, Bibo, Foaf, etc. – following the mix&match method applied by some of the national libraries (BL, BnF) – or to map the data in parallel to all those vocabularies whose published namespaces are relevant to bibliographic and authority data. In the latter case, the UNIMARC record data would be published using the UNIMARC, ISBD, DC, Bibo, MADS, RDA, EDM, DNB gnd, etc. namespaces, allowing services that "talk" a particular language to reuse the published data according to their needs. Additionally, such an approach would, on the one hand, retain the context of the data and the informational value of the UNIMARC vocabularies – rich and lonely – and on the other, enable the reuse of "dumbed-down" data in simpler vocabularies such as DC and Bibo by library and non-library communities and users – poor and popular. A third approach would be to expose one's data in one vocabulary only, whether a locally or internationally published vocabulary, and let other services exploit it by taking advantage of maps that use the sub-property ladder method (from a fine-granularity element to a coarse-granularity element), which is being developed by some international standards such as ISBD.

    The choice made for the CROLIST data was the second approach, that is, to publish LOD in parallel in the available vocabularies, as mappings to other vocabularies would be internally controlled in order to manage and ensure the rendering of their contextual informational value.
    Another issue that was dealt with was automatic linking to internal and external LOD sources, such as a local (CROLIST) linked name authority file and external ones such as VIAF, DBpedia, etc. It became clear that the system's infrastructure will have to be redesigned to provide a linking process that ensures reliability and quality control under professional supervision.
    A series of other questions were recognised that would require further research and deliberation. Some of these are: Do we know who the users of our LOD are? Can we envisage the uses our LOD will be put to, and should or need we care? Should or need everything be linked OPEN data? How to ensure the sustainability of our standards, or others' standards used in publishing our LOD? How to maintain Universal Bibliographic Control, that is, how to become the target LOD source for other libraries and, by extension, non-library services? Will libraries compete, or will they produce information pollution if they all expose their linked open data to the Web of data?
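
    The sub-property ladder mentioned in this abstract can be sketched in a few lines of Python: a consumer climbs from a fine-grained property to the first broader property it understands, and so obtains "dumbed-down" data. The property names under the `ex:` prefix, the mapping to `dc:title` and the function names are invented for this sketch and are not taken from the CROLIST project or from any published map.

```python
# Hypothetical ladder: each property points to its broader parent property,
# in the manner of rdfs:subPropertyOf. The mapping itself is invented.
SUB_PROPERTY_OF = {
    "ex:titleProper": "ex:title",
    "ex:title": "dc:title",
    "ex:parallelTitle": "ex:title",
}


def climb(prop, target_prefix):
    """Climb the ladder until a property of the target vocabulary is reached."""
    seen = set()
    current = prop
    while current is not None and current not in seen:
        if current.startswith(target_prefix):
            return current
        seen.add(current)  # guard against cycles in a faulty map
        current = SUB_PROPERTY_OF.get(current)
    return None  # no broader property exists in the target vocabulary


def dumb_down(triples, target_prefix):
    """Rewrite triples to the coarser vocabulary, dropping unmappable ones."""
    result = []
    for subject, prop, value in triples:
        broader = climb(prop, target_prefix)
        if broader is not None:
            result.append((subject, broader, value))
    return result


if __name__ == "__main__":
    data = [
        ("ex:record1", "ex:titleProper", "An Example Title"),
        ("ex:record1", "ex:localNote", "Shelf copy"),
    ]
    print(dumb_down(data, "dc:"))
```

    The fine-grained data stays the published source; the coarser view is derived on demand, which matches the third approach described in the abstract.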

13:45 – 15:00 Workshop for managers (in parallel with the plenary session above)
Maximum 40 participants

Workshop led by Gildas Illien and Emmanuelle Bermès

Target audience
This workshop is intended for decision-makers, managers and middle managers (with or without technical expertise) who wish to share their experience, good practices, questions and difficulties in setting up sustainable projects, products and processes that use Semantic Web standards within their organisation.
Most participants will come from the library world, but representatives of other types of organisations, public or commercial, are welcome.

- Discuss the management and organisational issues specific to Semantic Web technologies and their environment
- Compare your own experience with that of other professionals
- Build a network of linked data managers in libraries

Topics that may be covered
At the start of the session, a few discussion topics will be selected according to the participants' practical cases and main interests, from among the following:
- Advocacy: how to make the case for a Semantic Web project and promote its benefits internally and externally?
- Legal issues: metadata licences, ownership, provenance, personal information and other legal aspects
- Funding: what does a linked data service cost and what does it return?
- Skills: what are the ideal skills and professional profiles for building a linked data project?
- Cooperation: how to build lasting, trusted partnerships in the linked data environment?
- Users and services: how to identify, understand and serve end users (human or machine) in the linked data environment?
- From innovation to operations: from start-up to production, what are the stages and what is at stake?

15:00 – 15:15 Break

15:30 – 16:30 Plenary session
Designing Linked Data software and services for libraries

by Teodore Fons, Nicolas Chauvat and Schlomo Sanders

This session will take the form of a round table in which software vendors and developers will explain why they are interested in linked data and its implementation in libraries and other institutions, share feedback from experience and show existing tools.

16:30 – 17:00 Conclusion and end of the day


© BnF 2014