ATLAS: Article Tracking, Linking and Analysis of Swedish Encyclopedias
Python, PyTorch, Scikit-learn, Jupyter Notebook, Vector Database, Data Analysis
A natural language processing project developed in the EDAN70 course at LTH, focusing on entity linking and matching in historical Swedish encyclopedias. The project provides a pipeline for processing these encyclopedias, so that its output can be used to analyze how knowledge is distributed over time.
Key Features
- ✓ Entity recognition and linking with fine-tuned BERT models
- ✓ Custom data processing pipelines for entity extraction and classification
- ✓ Vector database with similarity search for cross-edition linking and Wikidata linking
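
The similarity-search step can be illustrated with a minimal sketch. It assumes each entry has already been embedded as a vector and uses a plain dictionary in place of a real vector database. The names `cosine_similarity` and `link_entries`, the toy vectors, and the 0.8 threshold are all hypothetical and not taken from the project.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def link_entries(query_vectors, index_vectors, threshold=0.8):
    """Map each query entry to its most similar index entry.

    An entry is linked only if its best score reaches `threshold`
    (an assumed value); otherwise it maps to None.
    """
    links = {}
    for query_id, query_vec in query_vectors.items():
        best_id, best_score = None, threshold
        for candidate_id, candidate_vec in index_vectors.items():
            score = cosine_similarity(query_vec, candidate_vec)
            if score >= best_score:
                best_id, best_score = candidate_id, score
        links[query_id] = best_id
    return links


# Toy embeddings, made up for illustration only.
edition_1 = {"Stockholm": [0.9, 0.1, 0.0], "Linné": [0.1, 0.8, 0.3]}
edition_2 = {"Stockholm (stad)": [0.85, 0.15, 0.05], "Lund": [0.0, 0.2, 0.9]}

links = link_entries(edition_1, edition_2)
```

Here "Stockholm" links to "Stockholm (stad)", while "Linné" has no sufficiently similar counterpart and stays unlinked. A real vector database would replace the inner loop with an approximate nearest-neighbor query, and the same lookup could in principle be run against embedded Wikidata descriptions.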
Challenges & Solutions
- ↳ Navigating unstructured data with OCR errors and poor proofreading
- ↳ Developing a model for extracting headwords
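
To make the headword problem concrete, here is a rule-based baseline sketch. It is not the model developed in the project. It assumes an entry begins with a capitalized headword that ends at the first comma, period, or opening bracket, an assumption that OCR noise will often break. `HEADWORD_RE` and `extract_headword` are hypothetical names.

```python
import re

# Assumption: the headword starts with an uppercase letter and runs up to
# the first comma, period, "(" or "[" in the entry text.
HEADWORD_RE = re.compile(r"^\s*([A-ZÅÄÖÉ][^,.(\[]*?)\s*[,.(\[]")


def extract_headword(entry_text):
    """Return the leading headword of an entry, or None if none is found."""
    match = HEADWORD_RE.match(entry_text)
    return match.group(1) if match else None


examples = [
    "Lund, stad i Skåne.",
    "Abbas I (den store), persisk schah.",
    "och därefter fortsatte han sina studier.",  # continuation text, not an entry
]
headwords = [extract_headword(text) for text in examples]
```

A heuristic like this fails when OCR drops the delimiter or misreads the initial capital, which is one reason a learned model is a more robust choice for noisy scans.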