Introduction to GRETIL and associated technologies

by Maximilian Mehner, M.A. (Philipps University Marburg)

Digital Literacy / Literacies

grep -hoR "\w\+\( and \w\+\)\? literac[iy]\(es\)\?" dhq/xml | sort | uniq

Manually filtered results:

| algorithmic literacy     | multimodal literacy    |
| cod(e/ing) literacy      | new literacies         |
| computation(al) literacy | software literacy      |
| computer literacy        | spatial literacy       |
| critical literacy        | statistical literacy   |
| data literacy            | technical literacy     |
| DH literacy              | technological literacy |
| digital literacy         | transmedia literacy    |
| hypermedia literacy      | visualization literacy |
| information(al) literacy | visual literacy        |
| media literacy           | web literacy           |
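The grep call can be mirrored in Python where GNU grep is not at hand. A minimal sketch, assuming the corpus is a local directory of .xml files (dhq/xml, as in the command) and that matching on the raw XML, markup included, is acceptable; read_corpus and find_literacy_terms are hypothetical helper names. The order of sorted() may differ slightly from the locale-dependent order of sort.

```python
import re
from pathlib import Path

# The GNU grep (BRE) pattern  \w\+\( and \w\+\)\? literac[iy]\(es\)\?
# rewritten in Python regex syntax; the groups are non-capturing so that
# findall() returns the whole match, like grep -o.
LITERACY = re.compile(r"\w+(?: and \w+)? literac[iy](?:es)?")


def read_corpus(root):
    """Yield the text of every .xml file below `root` (like grep -R)."""
    for path in sorted(Path(root).rglob("*.xml")):
        yield path.read_text(encoding="utf-8", errors="replace")


def find_literacy_terms(texts):
    """Collect all matches and return them de-duplicated and sorted (sort | uniq)."""
    found = set()
    for text in texts:
        found.update(LITERACY.findall(text))
    return sorted(found)


if __name__ == "__main__":
    sample = [
        "<p>Teaching digital literacy and data literacy.</p>",
        "<p>On new literacies and code and data literacies.</p>",
    ]
    print(find_literacy_terms(sample))
```

On the corpus itself: find_literacy_terms(read_corpus("dhq/xml")). The results still need the same manual filtering.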

Digital Literacies – discussion points

critical, but focused on what?

  • broad pedagogy vs. institutional
    • trans-medial vs. specific
    • surface-level reading vs. writing (code)
  • systematic training vs. exploration incentives
  • FOSS vs. corporate proprietary options

Cf. Spence 2020.

Digital Humanities (DH)

DH are perceived/used … (Thaller in Jannidis 2017)

  1. by most scholars as more efficient means for conventional research questions or
  2. by some scholars as entirely new methods that expand and redefine research questions.

DH apply their research methods to … (Spence 2020)

  1. digitised surrogates of analogue content and
  2. born-digital content.

The Digital & the Humanities in time

Cf. Thaller in Jannidis 2017, pp. 3–12.

1949 Roberto Busa collaborates with IBM on Index Thomisticus
1964 Linguistic Computing Centre in Cambridge
1966 Journal: Computers and the Humanities
1968 Statistical Package for the Social Sciences (SPSS)
  quantitative turn
1973 Journal: Literary and Linguistic Computing
1978 TUebinger System von TExtverarbeitungs-Programmen (TUSTEP)
1985 rise of Personal Computers (PC) and email
  computing for communication
  desktop publishing; digital editions on CD-ROM
1987 Text Encoding Initiative (TEI)
1993 World Wide Web goes public
  common, content-independent interface
  archives and libraries get involved
2007 Online journal: Digital Humanities Quarterly

GRETIL – goals and development

= Göttingen Register of Electronic Texts in Indian Languages

Cf. fact sheet (2017).

exemplary processing routines

… for a linguistic research design

Figure 1: Teich/Fankhauser in Flanders 2019, p. 238.

workflow until 2016

http://gretil.sub.uni-goettingen.de/gretilbk.htm#Manu

# __        __            _ ____            __           _   
# \ \      / /__  _ __ __| |  _ \ ___ _ __ / _| ___  ___| |_ 
#  \ \ /\ / / _ \| '__/ _` | |_) / _ \ '__| |_ / _ \/ __| __|
#   \ V  V / (_) | | | (_| |  __/  __/ |  |  _|  __/ (__| |_ 
#    \_/\_/ \___/|_|  \__,_|_|   \___|_|  |_|  \___|\___|\__|

1. Analyse and convert the encoding [0/2]
   - [ ] analyse: UTF-8?
   - [ ] convert
     # alt+F3 # -> "conv_sep" -> select from the submenu (e.g. *utf8-CSX)
2. Markup 1 [0/5]
   - [ ] page numbers in bold
     # alt+F12 # -> "1"
   - [ ] header in italics
     # alt+F12 # -> "2"
   - [ ] footnote apparatus in redline, the last one manually
     # alt+F12 # -> "3" + # ctrl+F8 # -> "redline"
   - [ ] hide redline
     # alt+F12 # -> "4"
   - [ ] switch off display of hidden text
     # alt+F5 # -> "7", "
3. Cleanup [0/5]
   - [ ] remove "°", "<", ">"
   - [ ] remove superfluous spaces [0/3]
     # ctrl+F10 # -> "ref" (edit ref); toggle log: # ctrl+PgUp # -> "3"
     - [ ] at the beginning of a line
       # "[HRt]  [?]"
     - [ ] in the middle of a line
       # "[?]  [?]"
     - [ ] at the end of a line
       # "   [HRt]"
   - [ ] uniform stanza indentation (5, 10, 15 blanks)
     # alt+F12 # -> "bz" or "bw"
   - [ ] uniform paragraph indentation (5 blanks)
     # alt+F12 # -> "bz" or "bw"
   - [ ] rewrap multi-column footnotes
     # alt+F12 # -> "fn "
4. Markup 2 [0/4]
   - [ ] mark other numbering
   - [ ] footnotes [0/4]
     - [ ] mark stanzas and prose
       # alt+F10 # "refptsp0" for paragraphs indented by (15, 10,) 5, 0 blanks, in descending order
       # structure of the marking:
|----------+---------------+----------+-----------------+----------+---------------+----------+----------------------|
| Mark -2  |               | Mark -1  |                 | Mark +1  |               | Mark +2  | Scope                |
|----------+---------------+----------+-----------------+----------+---------------+----------+----------------------|
| 12,100   | [no., if any] | 12,0     | paragraph start | 12,102   | [no., if any] |          |                      |
|          |               | 12,101   | running text    | 12,102   |               |          | prose                |
|          |               |          | paragraph end   | 12,10    |               | 12,110   |                      |
|----------+---------------+----------+-----------------+----------+---------------+----------+----------------------|
| 12,100   |               | 12,0     | line            | 12,10    |               | 12,110   | stanzas, single line |
|----------+---------------+----------+-----------------+----------+---------------+----------+----------------------|
     - [ ] footnotes in superscript
       # alt+F12 # -> "7"
     - [ ] hide and switch off display
       # alt+F12 # -> "8"
     - [ ] search for remaining numbers
       # alt+F12 # -> "NR"
   - [ ] convert each run of 5 blanks into "hard blanks"
     # alt+F2 # -> replace " " with # home+space #
   - [ ] complete the header if necessary
     # alt+F12 # -> "_hl" and "_hr"
5. Preamble [0/9]
   - [ ] title
   - [ ] keyword indexing of the content
   - [ ] Based on ...
   - [ ] Input by ...
   - [ ] Copyright
   - [ ] Notice
   - [ ] Additional Notes
   - [ ] version information
   - [ ] Structure of References
6. Generate the final WP versions (e.g. from "..._C.09") [0/2]
   - [ ] original layout (O) [0/7]
     - [ ] show hidden text and delete the "hidden" mark
     - [ ] remove character sets 12 and 4
     - [ ] check for "<" and ">" (because of HTML)
     - [ ] replace WP markup
     - [ ] edit markup table and version info
     - [ ] insert date
     - [ ] save as WP 5.1 file: "...OC."
   - [ ] plain text version (P) [0/9]
     - [ ] delete hidden text
       # alt+F12 # -> "09"
     - [ ] generate running text
       # alt+F12 # -> "f1"
     - [ ] straddle message in redline
       # alt+F12 # -> "f2"
     - [ ] remove character sets 12 and 4
       # alt+F12 # -> "_x" and "_y"
     - [ ] check for "<" and ">" (because of HTML)
     - [ ] replace WP markup
       # alt+F3 # -> "formausz"
     - [ ] edit markup table and version info
     - [ ] insert date
       # alt+F3 # -> "GRETdate"
     - [ ] save as WP 5.1 file: "...PC."
       # F10 #
       # F10 #
       # alt+F3 # -> "GRETdate"
       # alt+F3 # -> "formausz"
       # alt+F12 # -> "_x" and "_y"
       # alt+F5 #, # alt+F2 # -> "HiddenOn" to ""
7. Generate output files with WP5.1 (prerequisite: final version in CSX) [0/5]
   - [ ] create template (here from the "...OC" or "...PC" file) [0/5]
     - [ ] set printer driver to "neutral"
       # shift+F7 #
     - [ ] insert spaces
     - [ ] load file into document
       # F5 #
     - [ ] save
       # F10 #
     - [ ] save in WP4.2 format under the same name with a macro
       # alt+F10 # -> "zuc"
   - [ ] CSX [0/4]
     - [ ] delete front matter
     - [ ] convert to .txt with a macro
       # alt+F10 # -> "formtxt"
     - [ ] character list between markup index and start of text
       # Smartkey (network Windows): # alt+c (loads .../gretil/_dia_csx)
     - [ ] save as .txt with a macro
       # alt+F10 # -> "zuct"
   - [ ] REE [0/8]
     - [ ] open a new window and select the REE printer
       # alt+F10 # -> "ree"
       # shift+F7 #
     - [ ] reset printer driver to "neutral"
       # shift+F7 #
     - [ ] load template
       # F5 #
     - [ ] save
       # F10 # -> "...R."
     - [ ] delete front matter
     - [ ] convert to .txt with a macro
       # alt+F10 # -> "formtxt"
     - [ ] character list between markup index and start of text
       # Smartkey (network Windows): # alt+r (loads .../gretil/_dia_ree)
     - [ ] save as .txt with a macro
       # alt+F10 # -> "zurt"
   - [ ] UTF-8 [0/6]
     - [ ] open a new window and insert the HTML environment with Smartkey
       # Smartkey (network Windows): # alt+ur
     - [ ] load template
       # F5 #
     - [ ] delete markup list
     - [ ] from the beginning of the loaded section: [0/2]
       - [ ] insert <br>
         # Smartkey (network Windows): # ü
       - [ ] move front matter to the top [0/2]
         # Smartkey (network Windows): # ä
         - [ ] place the cursor on the title
           # ENTER
         - [ ] correct the title in the HTML header
     - [ ] convert to .htm with a macro
       # alt+F10 # -> "formhtm"
     - [ ] save manually as "...u.htm"
       # ctrl+F5 # -> "1", "1"
   - [ ] check
8. GRETIL website files [0/3]
   - [ ] edit 4 files according to the following pattern [0/4]
     - [ ] "GRETIL.___" = main site with history and links to the 3 following files
     - [ ] "GRET_CSX.___"
     - [ ] "GRET_REE.___"
     - [ ] "GRET_UTF.___"
   - [ ] open all in WP6.2 and create a new entry in the main file with a macro
     # alt+F3 # -> "GR_NEU"
   - [ ] convert each file individually to HTML with a macro
     # alt+F3 # -> "gretsave"

workflow since 2019

http://gretil.sub.uni-goettingen.de/gretil.html#Manu

  1. Transfer to TEI-conforming template GRETILdummy_sa.xml.
  2. Apply XSLT stylesheets with:

    # $1: TEI source file; "${1%.*}" strips its extension for the output name
    java -jar /usr/share/java/saxon/saxon9he.jar -s:"$1" \
         -xsl:xslt2-stylesheets/plain-text.xsl \
         > transformations/plaintext/"${1%.*}.txt"
    # same source with the second stylesheet: HTML output
    java -jar /usr/share/java/saxon/saxon9he.jar -s:"$1" \
         -xsl:xslt2-stylesheets/html.xsl \
         > transformations/html/"${1%.*}.htm"

  3. Prepare for upload by running the script prep4uplad.sh.
  4. Upload files to the server.
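Step 2 can be wrapped for batch processing. A minimal Python sketch, assuming the jar path and stylesheet locations from the commands above and relative input paths; build_commands and transform are hypothetical names, and nothing is executed unless run=True, so the commands can be inspected without Java or Saxon installed.

```python
import subprocess
import sys
from pathlib import Path

# Paths taken from the shell commands above; adjust to the local setup.
SAXON_JAR = "/usr/share/java/saxon/saxon9he.jar"
TARGETS = [  # (stylesheet, output directory, extension)
    ("xslt2-stylesheets/plain-text.xsl", "transformations/plaintext", ".txt"),
    ("xslt2-stylesheets/html.xsl", "transformations/html", ".htm"),
]


def build_commands(source):
    """Return one (argv, output path) pair per stylesheet for a relative TEI path."""
    src = Path(source)
    jobs = []
    for xsl, out_dir, ext in TARGETS:
        # like "${1%.*}.txt": swap the extension, keep any directory part
        out = Path(out_dir) / src.with_suffix(ext)
        argv = ["java", "-jar", SAXON_JAR, f"-s:{src}", f"-xsl:{xsl}"]
        jobs.append((argv, out))
    return jobs


def transform(source, run=False):
    """Build the commands; only execute them when run=True (needs Java and Saxon)."""
    jobs = build_commands(source)
    if run:
        for argv, out in jobs:
            out.parent.mkdir(parents=True, exist_ok=True)
            with open(out, "wb") as fh:
                # Saxon writes to stdout here, as with "> file" in the shell version
                subprocess.run(argv, stdout=fh, check=True)
    return jobs


if __name__ == "__main__":
    for name in sys.argv[1:]:
        transform(name, run=True)
```

Like the shell version, the output keeps the directory part of the input path: texts/example.xml would end up as transformations/plaintext/texts/example.txt.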

TEI for critical editing

recap

… on descriptive (as opposed to procedural) markup

  • structural information and content are separated from:
    1. procedural information (how to process content) and
    2. renditional information (how to render content);
  • the encoding is thereby independent of any particular application or rendering.
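To make the separation concrete, a minimal Python sketch (standard library xml.etree.ElementTree) renders one toy descriptive fragment in two different ways; to_plain_text and to_html are hypothetical stand-ins for what XSLT stylesheets would do. The markup says nothing about underlining or line breaks; those decisions live entirely in the rendering code.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI = "{http://www.tei-c.org/ns/1.0}"

# A toy descriptive encoding: it states *what* the parts are
# (a heading, numbered verse lines), not how they should look.
SOURCE = """<lg xmlns="http://www.tei-c.org/ns/1.0">
  <head>Example stanza</head>
  <l n="1">first verse line</l>
  <l n="2">second verse line</l>
</lg>"""


def to_plain_text(lg):
    """Rendering 1: underlined heading, numbered lines."""
    head = lg.find(TEI + "head").text
    out = [head, "-" * len(head)]
    for line in lg.findall(TEI + "l"):
        out.append(f"{line.get('n')}. {line.text}")
    return "\n".join(out)


def to_html(lg):
    """Rendering 2: the same structure as an HTML fragment."""
    out = [f"<h2>{escape(lg.find(TEI + 'head').text)}</h2>"]
    for line in lg.findall(TEI + "l"):
        out.append(f"{escape(line.text)}<br/>")
    return "\n".join(out)


if __name__ == "__main__":
    lg = ET.fromstring(SOURCE)
    print(to_plain_text(lg))
    print(to_html(lg))
```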

elements and attributes for critical editing

3 methods of linking apparatus to text

Witness A 10.85cd:

somasūryavibhedena vinayas tatra kāraṇam

Witness D 10.85cd:

sūryasomavibhedena vinayas tatra kāraṇam

1. location-referenced method, internal

TEI/teiHeader/encodingDesc/:

<variantEncoding method="location-referenced" location="internal"/>

TEI/text/body/:

<l xml:id="_10.85cd">sūryasomavibhedena vinayas tatra kāraṇam
<app>
  <rdg wit="#D">sūryasoma</rdg>
  <rdg wit="#A">somasūrya</rdg>
</app>
</l>
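A processor can read the variants of each line straight off the internally placed apparatus. A minimal sketch with Python's xml.etree.ElementTree, assuming the example is wrapped in a <body> in the TEI namespace; variants_by_line is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id

# The internal location-referenced example, wrapped in a namespaced <body>.
DOC = """<body xmlns="http://www.tei-c.org/ns/1.0">
<l xml:id="_10.85cd">sūryasomavibhedena vinayas tatra kāraṇam
<app>
  <rdg wit="#D">sūryasoma</rdg>
  <rdg wit="#A">somasūrya</rdg>
</app>
</l>
</body>"""


def variants_by_line(body):
    """Map each <l>'s xml:id to (base text, [(witness, reading), ...])."""
    result = {}
    for line in body.iterfind(".//tei:l", NS):
        base = (line.text or "").strip()
        readings = [(rdg.get("wit"), rdg.text)
                    for rdg in line.iterfind("tei:app/tei:rdg", NS)]
        result[line.get(XML_ID)] = (base, readings)
    return result
```

Note what the method does not say: the encoding ties the variants to the line, but which words of the base text they replace is left to the reader.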

1. location-referenced method, external

TEI/teiHeader/encodingDesc/:

<variantEncoding method="location-referenced" location="external"/>

TEI/text/body/:

<l xml:id="_10.85cd">sūryasomavibhedena vinayas tatra kāraṇam</l>

somewhere else in TEI/text/body/ or in a different file:

<app loc="#_10.85cd">
  <rdg wit="#D">sūryasoma</rdg>
  <rdg wit="#A">somasūrya</rdg>
</app>
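With the apparatus stored externally, the processor has to resolve the reference itself. A minimal sketch, assuming (as in the example) that @loc holds "#" plus the xml:id of the line and that text and apparatus sit in the same document; attach_apparatus is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The external location-referenced example, wrapped in a namespaced <body>.
DOC = """<body xmlns="http://www.tei-c.org/ns/1.0">
<l xml:id="_10.85cd">sūryasomavibhedena vinayas tatra kāraṇam</l>
<app loc="#_10.85cd">
  <rdg wit="#D">sūryasoma</rdg>
  <rdg wit="#A">somasūrya</rdg>
</app>
</body>"""


def attach_apparatus(body):
    """Pair each <l> with the readings of every <app> whose @loc refers to it."""
    attached = {line.get(XML_ID): ((line.text or "").strip(), [])
                for line in body.iterfind(".//tei:l", NS)}
    for app in body.iterfind(".//tei:app", NS):
        # the example writes @loc as "#" + the line's xml:id
        target = app.get("loc", "").lstrip("#")
        if target not in attached:
            continue  # entry for a location outside this document
        for rdg in app.iterfind("tei:rdg", NS):
            attached[target][1].append((rdg.get("wit"), rdg.text))
    return attached
```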

2. double-end-point-attached method

TEI/teiHeader/encodingDesc/:

<variantEncoding method="double-end-point" location="external"/>

TEI/text/body/:

<l xml:id="_10.85cd">sūryasoma<anchor xml:id="_10.85_1"/>vibhedena vinayas tatra kāraṇam</l>

<app from="#_10.85cd" to="#_10.85_1">
  <rdg wit="#D">sūryasoma</rdg>
  <rdg wit="#A">somasūrya</rdg>
</app>
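Here the lemma is whatever lies between the two points. A minimal sketch, assuming the simple case of the example: @from points at the <l> itself (its start) and @to at an <anchor> that is a direct child of that line; lemma_between is a hypothetical name, and spans crossing element boundaries are not handled.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The double-end-point example, wrapped in a namespaced <body>.
DOC = """<body xmlns="http://www.tei-c.org/ns/1.0">
<l xml:id="_10.85cd">sūryasoma<anchor xml:id="_10.85_1"/>vibhedena vinayas tatra kāraṇam</l>
<app from="#_10.85cd" to="#_10.85_1">
  <rdg wit="#D">sūryasoma</rdg>
  <rdg wit="#A">somasūrya</rdg>
</app>
</body>"""


def lemma_between(body, app):
    """Return the base text between the points named by @from and @to."""
    by_id = {el.get(XML_ID): el for el in body.iter() if el.get(XML_ID)}
    start = by_id[app.get("from").lstrip("#")]
    end = by_id[app.get("to").lstrip("#")]
    parts = [start.text or ""]          # text right after the start point
    for child in start:
        if child is end:                # stop at the end point
            break
        parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    else:
        raise ValueError("end point is not a direct child of the start element")
    return "".join(parts)
```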

3. parallel segmentation method

TEI/teiHeader/encodingDesc/:

<variantEncoding method="parallel-segmentation" location="internal"/>

TEI/text/body/:

<l xml:id="_10.85cd"><app>
  <rdg wit="#D">sūryasoma</rdg>
  <rdg wit="#A">somasūrya</rdg>
</app>vibhedena vinayas tatra kāraṇam</l>
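Parallel segmentation keeps every witness's text recoverable from the encoding alone. A minimal sketch, assuming a single namespaced <l> as in the example; witness_text is a hypothetical name. It rebuilds one witness's line by choosing, in each <app>, the reading whose @wit lists that witness.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# The parallel-segmentation example as a standalone, namespaced <l>.
LINE = """<l xmlns="http://www.tei-c.org/ns/1.0" xml:id="_10.85cd"><app>
  <rdg wit="#D">sūryasoma</rdg>
  <rdg wit="#A">somasūrya</rdg>
</app>vibhedena vinayas tatra kāraṇam</l>"""


def witness_text(line, wit):
    """Rebuild the line as witness `wit` (e.g. "#A") reads it."""
    parts = [line.text or ""]
    for child in line:
        if child.tag == TEI + "app":
            # pick the first reading whose @wit names this witness;
            # @wit is a whitespace-separated list, e.g. "#A #C"
            for reading in child:
                if wit in reading.get("wit", "").split():
                    parts.append("".join(reading.itertext()))
                    break
        else:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts)
```

Since each <app> is resolved in place, this only works with the apparatus inside the text, which is why the example declares location="internal".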

example with parallel segmentation method

Cf. example.

exercise 1 (15min)

  1. Go to https://teibyexample.org/tools/TBEvalidator.htm or use your editor with schema support.
  2. Use the minimal template to sketch a text with at least three witnesses and some variants.
  3. Validate your XML against the TEI schema.
  4. Experiment with the validator's messages by adding and removing elements, attributes, and values.

XSLT

= Extensible Stylesheet Language Transformations

  • programming language for manipulating and transforming XML data
  • XPath: expression language for selecting nodes in an XML document
  • typical scenario: automated down-translation of data from strongly modeled formats (like TEI) into more weakly modeled formats (e.g. HTML or plain text)

processing

Figure 2: Kelly 2015, p. 6: The XSLT process.

navigating the XML tree with XPath

  • absolute paths: /root/path/to/some/element
  • relative paths along 13 axes; each step can be filtered by position with [int]:
    1. self:: or .
    2. child:: (default axis, can be omitted)
    3. descendant::
    4. descendant-or-self:: (// is short for /descendant-or-self::node()/)
    5. parent:: or ..
    6. ancestor::
    7. ancestor-or-self::
    8. preceding::
    9. preceding-sibling::
    10. following::
    11. following-sibling::
    12. attribute:: or @
    13. namespace::
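Python's standard xml.etree.ElementTree supports only a small subset of XPath (for full XPath 1.0, use e.g. lxml or the XSLT processor itself). The following minimal sketch therefore emulates a few axes by hand on a toy document, just to make their direction visible; all function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Toy document without namespace, to keep the tags short.
DOC = ET.fromstring(
    "<TEI><text><body>"
    "<l n='1'>a</l><l n='2'>b</l><l n='3'>c</l>"
    "</body></text></TEI>"
)

# ElementTree elements do not know their parent, so build a lookup table once.
PARENT = {c: p for p in DOC.iter() for c in p}


def parent(node):
    """parent:: (abbreviated ..)"""
    return PARENT.get(node)


def ancestors(node):
    """ancestor:: nearest first, the order in which [1], [2] count on this reverse axis."""
    result = []
    while (node := PARENT.get(node)) is not None:
        result.append(node)
    return result


def preceding_siblings(node):
    """preceding-sibling:: nearest first, so preceding-sibling::*[1] is item 0."""
    siblings = list(PARENT[node])
    return siblings[:siblings.index(node)][::-1]


def following_siblings(node):
    """following-sibling:: in document order."""
    siblings = list(PARENT[node])
    return siblings[siblings.index(node) + 1:]


def descendants(node):
    """descendant:: all elements below node in document order (self excluded)."""
    return list(node.iter())[1:]
```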

wildcards and functions

  • * matches any element (on the attribute axis: any attribute), e.g. @attr only matches the attribute named ‘attr’, whereas @* matches all attributes of an element,
  • node() matches any node on the axis; on the default child axis these are elements, text, comments, and processing instructions (i.e. everything except attribute and namespace nodes),
  • text() matches text nodes,
  • last() returns the position of the last node in the current context, i.e. the context size.
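ElementTree's limited path syntax covers some of these: * as element wildcard and positional predicates such as [1] and [last()]; attributes and text are reached through .attrib, .text and .itertext() instead of @* and text(). A minimal sketch on a made-up fragment; in ElementTree it seems safest to combine [last()] with a named step such as tei:l rather than with *.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A toy fragment; the content is made up for illustration only.
LG = ET.fromstring(
    '<lg xmlns="http://www.tei-c.org/ns/1.0">'
    "<head>toy example</head>"
    '<l n="1" rend="indent">first line</l>'
    '<l n="2">second <hi>line</hi></l>'
    "</lg>"
)

# child::*  ->  findall("*"): every child element, whatever its name
child_names = [el.tag.split("}")[-1] for el in LG.findall("*")]

# l[last()]  ->  the last <l> child of <lg>
last_line = LG.find("tei:l[last()]", NS)

# l[1]/@*  ->  no path equivalent in ElementTree; .attrib holds all attributes
first_line_attrs = dict(LG.find("tei:l[1]", NS).attrib)

# itertext() joins the text of an element and all its descendants (its string value);
# .text is only the first text node, the closest match to text()[1]
last_line_string = "".join(last_line.itertext())
last_line_first_text = last_line.text

# last() as a number: the context size, here the number of <l> children
line_count = len(LG.findall("tei:l", NS))
```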

building on identity transformation
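The identity transformation copies every node unchanged; more specific templates then override it for selected nodes, and an empty template such as <xsl:template match="tei:note"/> removes a node. A minimal Python sketch of the same pattern, assuming ElementTree and a toy <note> added to the example line for illustration; identity, copy_element, drop, without_wit and OVERRIDES are hypothetical names, not XSLT API.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def identity(elem, overrides):
    """Apply the most specific rule: an override for this tag, else the plain copy."""
    rule = overrides.get(elem.tag, copy_element)
    return rule(elem, overrides)


def copy_element(elem, overrides):
    """The identity 'template': copy element, attributes and text, then recurse."""
    out = ET.Element(elem.tag, dict(elem.attrib))
    out.text = elem.text
    for child in elem:
        result = identity(child, overrides)
        if result is None:
            # the child was dropped, but the text after it must survive
            _append_text(out, child.tail)
        else:
            result.tail = child.tail
            out.append(result)
    return out


def _append_text(parent, text):
    """Attach `text` after the last child of `parent` (or as its first text)."""
    if not text:
        return
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def drop(elem, overrides):
    """Like an empty XSLT template: output nothing for this element."""
    return None


def without_wit(elem, overrides):
    """Copy as usual, then remove @wit."""
    out = copy_element(elem, overrides)
    out.attrib.pop("wit", None)
    return out


OVERRIDES = {TEI + "note": drop, TEI + "rdg": without_wit}

SOURCE = ('<l xmlns="http://www.tei-c.org/ns/1.0">'
          '<app><rdg wit="#D">sūryasoma</rdg><rdg wit="#A">somasūrya</rdg></app>'
          "vibhedena<note>editorial note</note> vinayas tatra kāraṇam</l>")
```

One difference to XSLT shows up in _append_text: ElementTree stores the text after an element as that element's tail, so dropping an element requires moving its tail text; in XSLT, text nodes are separate nodes and simply keep being copied.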

exercise 2 (20min)

  1. Go to http://xsltransform.net/ or use your editor with XSLT support (or a separate XSLT processor).
  2. Use your XML from the previous exercise and the XSLT templates provided and
  3. try to achieve the following result tree step by step:
    1. Output only <title>, <author>, and <listWit> elements of the <teiHeader>,
    2. output only the last child (<rdg> or <lem>) of each <app> element,
    3. remove the @wit attribute from the resulting <rdg> or <lem> elements.

pick and choose

Cf. TEI example styled with XSLT templates.

exercise 3 (10min)

  1. Use your XML from the previous exercises and the XSLT templates provided and
  2. try to produce the following:
    1. Set a variable for one particular witness reference (e.g. "#C"),
    2. write the value of this variable below the line specifying "Text: …",
    3. output all variants of that witness under "Variants: ". Did you miss anything? Why?

practical examples and outlook

references

  • Flanders 2019: The Shape of Data in Digital Humanities. Modeling Texts and Text-based Resources. London.
  • Jannidis 2017: Digital Humanities. Eine Einführung. Stuttgart.
  • Kelly 2015: XSLT Jumpstarter. Raleigh.
  • Spence 2020: “English language and digital literacies”. In: Adolphs/Knight (eds.): The Routledge Handbook of English Language and Digital Humanities. London, pp. 472–493.
  • Online resources: