OCR Text Dataset

To build this dataset, I wrote a script that retrieves every OCR text file for the collection of N. W. Ayer & Son’s American Newspaper Annual and Directory: A Catalogue of American Newspapers. The dataset is available through the UNT Digital Library.

The Process