bnl_ground_truth_newspapers_before_1878 #79

@ymaurer


A URL for this dataset

https://data.bnl.lu/data/historical-newspapers/

Dataset description

33,000 transcribed text lines from historical newspapers (published before 1878), along with the cropped images of the original scans

  • Text-line-based OCR
  • 19,000 text lines in Antiqua
  • 14,000 text lines in Fraktur
  • Transcribed using double-keying (99.95% accuracy)
  • Public Domain, CC0 (see copyright notice)
  • Best for training an OCR engine
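
Since the dataset pairs each cropped line image with its transcription, a loader for OCR training might look roughly like the sketch below. The on-disk layout is an assumption, not something stated in this issue: it supposes that every image has a UTF-8 text file with the same stem next to it, and `pair_lines` is a hypothetical helper name. Check the actual archive structure before relying on it.

```python
from pathlib import Path

# Assumed (hypothetical) layout: each cropped line image sits next to a
# UTF-8 text file with the same stem, e.g. line_0001.png + line_0001.txt.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def pair_lines(root):
    """Return (image_path, transcription) pairs found under root, sorted by path."""
    pairs = []
    for image in sorted(Path(root).rglob("*")):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        text_file = image.with_suffix(".txt")
        if not text_file.is_file():
            continue  # skip images that have no transcription file
        text = text_file.read_text(encoding="utf-8").strip()
        pairs.append((image, text))
    return pairs
```

Images without a matching text file are skipped rather than raising an error, so a partially extracted archive still yields usable pairs.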

The newspapers used are:

  • Le Gratis luxembourgeois (1857-1858)
  • Luxemburger Volks-Freund (1869-1876)
  • L'Arlequin (1848)
  • Courrier du Grand-Duché de Luxembourg (1844-1868)
  • L'Avenir (1868-1871)
  • Der Wächter an der Sauer (1849-1869)
  • Luxemburger Zeitung (1844-1845)
  • Luxemburger Zeitung = Journal de Luxembourg (1858-1859)
  • Der Volksfreund (1848-1849)
  • Cäcilia (1862-1871)
  • Kirchlicher Anzeiger für die Diözese Luxemburg (1871-1878)
  • L'Indépendance luxembourgeoise (1871-1878)
  • Luxemburger Anzeiger (1856)
  • L'Union (1860-1871)
  • Diekircher Wochenblatt (1837-1848)
  • Das Vaterland (1869-1870)
  • D'Wäschfra (1868-1878)
  • Luxemburger Bauernzeitung (1857)
  • Luxemburger Wort (1848-1878)

Dataset modality

Mixed

Dataset licence

Creative Commons Public Domain Dedication and Certification

Other licence

No response

How can you access this data

As a download from a repository/website

size of dataset

500MB-2GB

Confirm the dataset has an open licence

  • To the best of my knowledge, this dataset is accessible via an open licence

Contact details for data custodian

opendata@bnl.etat.lu
