ATLAS: Article Tracking, Linking and Analysis of Swedish Encyclopedias

Python, PyTorch, Scikit-learn, Jupyter Notebook, Vector Database, Data Analysis

A natural language processing project developed in the EDAN70 course at LTH, focusing on entity linking and matching in historical Swedish encyclopedias. The project is a pipeline for processing these encyclopedias, with the goal of using the results to analyze how knowledge is distributed across time.

Key Features

  • Entity recognition and linking with fine-tuned BERT models
  • Custom data processing pipelines for entity extraction and classification
  • Vector database with similarity search for cross-edition linking and Wikidata linking
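
To illustrate the similarity-search idea behind cross-edition linking, here is a minimal sketch in plain Python. It is not the project's actual code: the function names, the 0.8 threshold, the article IDs, and the toy 3-dimensional vectors are all assumptions made for the example. In the real pipeline the vectors would presumably be embeddings of article text, and the brute-force loop would be replaced by a query to the vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_across_editions(query_vectors, index_vectors, threshold=0.8):
    """Map each query article to its most similar indexed article.

    An article is left unlinked (None) when even its best match scores
    below the threshold. The threshold value is an illustrative assumption.
    """
    links = {}
    for query_id, query_vec in query_vectors.items():
        best_id, best_score = None, -1.0
        for cand_id, cand_vec in index_vectors.items():
            score = cosine_similarity(query_vec, cand_vec)
            if score > best_score:
                best_id, best_score = cand_id, score
        links[query_id] = best_id if best_score >= threshold else None
    return links


# Toy vectors standing in for real article embeddings (hypothetical data).
edition_1 = {"e1:Lund": [0.9, 0.1, 0.0], "e1:Kalmar": [0.1, 0.9, 0.1]}
edition_2 = {"e2:Lund": [0.8, 0.2, 0.1], "e2:Ystad": [0.0, 0.1, 0.9]}

links = link_across_editions(edition_2, edition_1)
print(links)
```

Here the "Lund" article in the second edition is linked to its counterpart in the first, while "Ystad" has no sufficiently similar article and stays unlinked. The same nearest-neighbour lookup could in principle be pointed at vectors for Wikidata entries instead of another edition.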

Challenges & Solutions

  • Handling unstructured data with OCR errors and poor proofreading
  • Developing a model for extracting headwords
