Automating TEI encoding (using the Overtoom/Jockers Python script)

We all know that appropriate standards are required if electronic textual scholarship is to become precisely what it claims to be: scholarly. Enter the TEI and the various debates on its use; the rest is history, and we now have a standard for electronic textual encoding. What next? Well, encoding, what else? Textual encoding is a tedious process, particularly if you are working with a large corpus. Thankfully, Michiel Overtoom set about writing a Python script to automate the conversion of Project Gutenberg plain-text files to a format more suited to his own purposes (this included removing the Gutenberg boilerplate). Stanford’s Matt Jockers (@mljockers) took this a step further in terms of textual scholarship, adapting Overtoom’s script so that it converts the Gutenberg text to a TEI-compliant XML file.
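To give a rough sense of what such a conversion involves, here is a minimal Python sketch. It is not the Overtoom/Jockers script: the function names `strip_boilerplate` and `to_tei` are hypothetical, the marker patterns assume the common `*** START OF ...` and `*** END OF ...` lines (which vary between Gutenberg files), and the TEI header is only a bare skeleton with placeholder text.

```python
import re
from xml.sax.saxutils import escape

# Assumption: the boilerplate is delimited by "*** START OF ..." and
# "*** END OF ..." lines. Older Gutenberg files may use other wording.
START_RE = re.compile(r"^\*\*\*\s*START OF.*$", re.MULTILINE | re.IGNORECASE)
END_RE = re.compile(r"^\*\*\*\s*END OF.*$", re.MULTILINE | re.IGNORECASE)


def strip_boilerplate(text):
    """Return the text between the START and END marker lines, if present."""
    start = START_RE.search(text)
    if start:
        text = text[start.end():]
    end = END_RE.search(text)
    if end:
        text = text[:end.start()]
    return text.strip()


def to_tei(text, title="Untitled"):
    """Wrap each blank-line-separated paragraph in <p> inside a TEI skeleton."""
    body = strip_boilerplate(text)
    # Split on blank lines, then collapse hard-wrapped lines within a paragraph.
    paragraphs = [
        " ".join(block.split())
        for block in re.split(r"\n\s*\n", body)
        if block.strip()
    ]
    # escape() converts &, < and > so the output stays well-formed XML.
    ps = "\n".join("      <p>{}</p>".format(escape(p)) for p in paragraphs)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<TEI xmlns="http://www.tei-c.org/ns/1.0">\n'
        "  <teiHeader>\n"
        "    <fileDesc>\n"
        "      <titleStmt><title>{}</title></titleStmt>\n"
        "      <publicationStmt><p>Unpublished working copy.</p></publicationStmt>\n"
        "      <sourceDesc><p>Project Gutenberg plain text.</p></sourceDesc>\n"
        "    </fileDesc>\n"
        "  </teiHeader>\n"
        "  <text>\n"
        "    <body>\n"
        "{}\n"
        "    </body>\n"
        "  </text>\n"
        "</TEI>\n"
    ).format(escape(title), ps)
```

Calling `to_tei(open("book.txt", encoding="utf-8").read(), title="My Book")` (with a hypothetical file name) would return the XML as a string. A real workflow would also need to handle chapter divisions, front matter, and fuller header metadata, which this sketch ignores.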
