Metadata enrichment tools

Software in the enrichment tools

Three main metadata enrichment tools are available for use before content is aggregated to Europeana. These tools are:

  • NERD Annotator
    • Perform Named Entity Recognition and Disambiguation (NERD)
    • Better performance on longer texts with rich context
    • Plug-n-play solution, no fine-tuning needed
  • Context-based Topic Analysis Tool
    • Uses state-of-the-art BERT-based deep learning models for word and sentence embeddings (representations)
    • Cultural Heritage domain pre-training, improving topic coherence and context analysis
  • Linked-Data Annotator
    • Link text to Thesaurus/Vocabulary terms
    • Smart string matching using state-of-the-art NLP technologies
    • Improved time performance
    • Use an existing thesaurus/vocabulary or create a custom one from a list of keywords and their respective URIs

WEAVE keywords list for automatic enrichments

These tools draw on Wikidata entries to enrich the metadata. For this reason, a WEAVE-specific keywords list was created, with links to Wikidata and, where available, Getty AAT.

The terms were collected by the content providers to reflect the different terminologies used for the different types of cultural heritage in the WEAVE collections. The WEAVE keywords list was used to support the automatic enrichments.
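
To illustrate how a keywords list with URIs can drive an enrichment, the sketch below links a record description to vocabulary terms by whole-word matching. It is only a minimal illustration, not the Linked-Data Annotator itself: the function name `link_terms`, the sample keywords, and the example.org URIs are all hypothetical placeholders standing in for real Wikidata or Getty AAT links.

```python
import re

# Hypothetical vocabulary: keyword -> URI.
# The example.org URIs are placeholders for Wikidata or Getty AAT links.
VOCABULARY = {
    "flamenco": "https://example.org/vocab/flamenco",
    "folk dance": "https://example.org/vocab/folk-dance",
    "embroidery": "https://example.org/vocab/embroidery",
}


def link_terms(text, vocabulary):
    """Return sorted (keyword, uri) pairs whose keyword occurs in the text
    as a whole word or phrase, ignoring case."""
    matches = []
    for keyword, uri in vocabulary.items():
        # \b anchors prevent matching inside longer words.
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            matches.append((keyword, uri))
    return sorted(matches)


description = "Photograph of a folk dance performance with Flamenco costumes."
links = link_terms(description, VOCABULARY)
for keyword, uri in links:
    print(keyword, "->", uri)
```

Exact matching of this kind misses spelling variants and inflected forms, which is presumably where the smarter string matching of the actual tool comes in.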

Enrichment validation

As part of the automatic enrichment task, the collections aggregated by WEAVE content providers were imported into the tools and used to perform the enrichment. The content partners then carried out a round of validation of the automatically generated enrichments on their respective collections, accepting or rejecting the annotations that the machine had recognized for each record.

As a result of the validation, all accepted enrichments will be re-aggregated to the respective collections and republished in Europeana.
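
The accept/reject step can be pictured as a simple filter over the generated enrichments. The sketch below is an assumption about how such data might be represented, not the format used by the WEAVE tools: the `Enrichment` class, its field names, the record identifiers, and the example.org URIs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Enrichment:
    record_id: str   # identifier of the enriched record (hypothetical format)
    term_uri: str    # URI of the vocabulary term proposed by the tool
    accepted: bool   # decision recorded by the content partner


def accepted_by_record(enrichments):
    """Group the URIs of accepted enrichments by record, dropping rejected ones."""
    result = {}
    for enrichment in enrichments:
        if enrichment.accepted:
            result.setdefault(enrichment.record_id, []).append(enrichment.term_uri)
    return result


validated = [
    Enrichment("record-1", "https://example.org/vocab/embroidery", True),
    Enrichment("record-1", "https://example.org/vocab/flamenco", False),
    Enrichment("record-2", "https://example.org/vocab/folk-dance", True),
]
to_publish = accepted_by_record(validated)
print(to_publish)
```

Only the entries in `to_publish` would then be carried forward to re-aggregation.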