Our Goal

We aim to uncover and extract hidden insights about slavery, the experiences of Freedmen, the obstacles they faced, and the significant contributions they made to society and their cause. To do this, we built a Natural Language Processing (NLP) and Machine Learning (ML) pipeline that was central to our exploration.

The Process

The process began with Optical Character Recognition (OCR) to convert the textual content from scanned images into machine-readable text files.

Scan the original document into a digital image.

Transcribe the digital image into a text file.

Highlight the key words (annotation).

Use the highlighted key data as input values.

Technical Overview

We annotated the documents by labeling words with their corresponding data types, such as names, places, dates, events, and relationships. Fitting the historical documents into this scheme is the most time-consuming step of the entire operation, but it is crucial: the annotations serve as input data, and the more accurate they are, the more accurate the analysis will be. NLP techniques such as named entity recognition (NER) and part-of-speech tagging were employed to identify key elements.
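One way to picture an annotation is as a labeled character span over the transcribed text. The sketch below is only an illustration of that idea: the `Annotation` class, the `annotate` helper, the label names, and the sample sentence are all hypothetical and are not taken from our pipeline or from the archive.

```python
from dataclasses import dataclass

# Label set mirroring the data types named in the text (names are assumptions).
LABELS = {"NAME", "PLACE", "DATE", "EVENT", "RELATIONSHIP"}


@dataclass(frozen=True)
class Annotation:
    start: int  # character offset where the labeled span begins
    end: int    # character offset just past the labeled span
    label: str  # one of LABELS


def annotate(text, phrase, label):
    """Label the first occurrence of `phrase` in `text`."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label}")
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"phrase not found: {phrase!r}")
    return Annotation(start, start + len(phrase), label)


# Invented sample sentence, used only to show the record format.
sample = "John Smith registered a labor contract in Richmond on March 3, 1866."
annotations = [
    annotate(sample, "John Smith", "NAME"),
    annotate(sample, "Richmond", "PLACE"),
    annotate(sample, "March 3, 1866", "DATE"),
]

for a in annotations:
    print(a.label, sample[a.start:a.end])
```

Storing offsets rather than copied strings keeps each label tied to its exact position in the transcription, which makes it easier to check an annotation against the source text.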
For structured documents, such as forms, GPT-3 performed the entire annotation. For very loosely structured documents, people manually picked out the important key words in a handful of documents.
These hand-annotated documents were then used to train GPT-3 or GPT-4 to annotate the rest. At the heart of our pipeline was GPT-3, a large language model that allowed us to extract meaningful narratives from the content and generate summaries highlighting key historical events, challenges, and contributions of Freedmen.
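As a rough illustration of how a handful of manually annotated documents can guide a language model, the sketch below assembles a few-shot prompt. It assumes few-shot prompting rather than fine-tuning, which is only one possible reading of "train" here. The `build_few_shot_prompt` function, the prompt wording, and the sample sentences are hypothetical, and no model API is called.

```python
import json


def build_few_shot_prompt(examples, new_text):
    """Assemble a prompt from hand-annotated examples plus one unlabeled text.

    `examples` is a list of (document_text, labels_dict) pairs.
    """
    parts = ["Label the names, places, and dates in each document as JSON."]
    for text, labels in examples:
        # sort_keys keeps the label order stable across examples.
        parts.append(
            f"Document: {text}\nLabels: {json.dumps(labels, sort_keys=True)}"
        )
    # The final entry is left open for the model to complete.
    parts.append(f"Document: {new_text}\nLabels:")
    return "\n\n".join(parts)


# Invented examples, not drawn from the archive.
examples = [
    (
        "Mary Jones was issued rations in Savannah on June 1, 1866.",
        {"name": ["Mary Jones"], "place": ["Savannah"], "date": ["June 1, 1866"]},
    ),
]

prompt = build_few_shot_prompt(
    examples, "Henry Lee signed a contract in Macon on May 9, 1867."
)
print(prompt)
```

The resulting string would be sent to the model, and the JSON it returns would be parsed back into labels for the new document.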
In addition to text generation, we employed information extraction techniques to identify and capture specific historical details from the documents. This included the extraction of dates, locations, and significant events, as well as the relationships between different individuals mentioned in the text.
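For a sense of what extracting one kind of detail can look like, the sketch below pulls dates written as "Month D, YYYY" out of a sentence with a regular expression. This is a deliberately simple stand-in, not the technique used in our pipeline: the pattern, the `extract_dates` function, and the sample sentence are assumptions for illustration, and real documents contain many other date formats.

```python
import re
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Matches dates such as "March 3, 1866"; other formats are not handled.
DATE_PATTERN = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2}),\s*(\d{4})\b"
)


def extract_dates(text):
    """Return every 'Month D, YYYY' date in `text` as a datetime.date."""
    found = []
    for month, day, year in DATE_PATTERN.findall(text):
        try:
            found.append(date(int(year), MONTHS.index(month) + 1, int(day)))
        except ValueError:
            # Skip impossible dates such as "February 30, 1866".
            continue
    return found


# Invented sample sentence, not taken from the archive.
sample = "The contract was signed on March 3, 1866 and renewed on January 15, 1867."
print(extract_dates(sample))
```

Converting matches to `date` objects makes it possible to sort events chronologically or filter them by year once they are extracted.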

DISCLAIMER:

The data presented here, sourced from the Smithsonian's official website and other external sources related to the Freedmen's Bureau, remains the sole property of the respective owners. While our analyses and insights are independently generated, we do not claim ownership of the underlying data. We prioritize ethical standards and intellectual property rights. Feel free to share and utilize the insights provided here for educational or non-commercial purposes. For any inquiries or concerns, please contact us. Your use of this website implies acceptance of these terms.