Methodology

Начин истраживања

Način istraživanja


Beginning of Research – Text Gathering and Standardization

Any research project in literature and language begins with finding the exact texts for study. In consultation with librarians at the University of Pittsburgh, I found multiple printed and digital sources for written texts (citations can be found on the “bibliography” page). These texts were entered into an XML document, which was then transformed using Regular Expressions (RegEx) so that every word sits in an <origin> element, each line in an <l> element, and each stanza in an <lg> element. For texts from sung sources, I transcribed the text by ear, with some help from Dr. Ljiljana Đurašković, and the transcriptions then underwent the same RegEx-based transformation. All of these texts are written in the XML document in Bosnian Cyrillic in order to make the XSLT transformation to Latin easier. Cyrillic to Latin is always one-to-one or one-to-two (each Cyrillic letter becomes one or two Latin letters), but Latin to Cyrillic is more complex, because a pair of Latin letters such as "nj" may stand for a single Cyrillic letter or for two separate ones.
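The wrapping step described above can be sketched in a few lines. This is only an illustration in Python, not the RegEx actually used in the project: the function names wrap_line and wrap_poem are hypothetical, the sample words are invented placeholders rather than text from the corpus, and the sketch assumes that stanzas are separated by blank lines and that a "word" is any run of letters.

```python
import re
from xml.sax.saxutils import escape


def wrap_line(line: str) -> str:
    """Wrap every word of one line in <origin>, leaving spaces and punctuation as-is."""
    # Splitting on a capturing group keeps the words: odd indices hold the matches.
    parts = re.split(r"(\w+)", line)
    return "".join(
        f"<origin>{part}</origin>" if i % 2 else escape(part)
        for i, part in enumerate(parts)
    )


def wrap_poem(text: str) -> str:
    """Wrap stanzas in <lg> and lines in <l> (assumes blank lines separate stanzas)."""
    out = []
    for stanza in re.split(r"\n\s*\n", text.strip()):
        out.append("<lg>")
        for line in stanza.splitlines():
            if line.strip():
                out.append(f"  <l>{wrap_line(line.strip())}</l>")
        out.append("</lg>")
    return "\n".join(out)


# Invented placeholder text: two stanzas, three lines, four words.
sample = "један два\nтри\n\nчетири"
result = wrap_poem(sample)
print(result)
```

In Python 3, \w matches Cyrillic letters as well as Latin ones, so the same pattern works for either script. A word containing an apostrophe or hyphen would be split into two <origin> elements by this pattern and would need a more careful expression.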

Next, each word was researched to find its linguistic origin and given an attribute value (@lang=””) recording that origin. General information about the text was written in a <meta> element placed before the <body>, which contains the text of the poem. Each XML document is then validated against a Relax NG schema so that all of the documents share the same structure.
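To make the document shape concrete, here is a small Python sketch that checks the constraints described above. It is a stand-in for illustration only and not the project's Relax NG schema: the root element name <poem>, the <title> child of <meta>, and the @lang values "origin-a" and "origin-b" are all hypothetical placeholders, and the sample words carry no real etymological claim.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element names other than meta, body, lg, l, and origin
# are assumptions, and the lang values are placeholders.
SAMPLE = """<poem>
  <meta><title>Sample</title></meta>
  <body>
    <lg>
      <l><origin lang="origin-a">један</origin> <origin lang="origin-b">два</origin></l>
    </lg>
  </body>
</poem>"""


def check_structure(xml_text: str) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    root = ET.fromstring(xml_text)
    problems = []
    # <meta> must come first, followed by <body>.
    if [child.tag for child in root][:2] != ["meta", "body"]:
        problems.append("expected <meta> followed by <body>")
    body = root.find("body")
    if body is not None:
        # Stanzas contain only lines, and lines contain only words.
        for lg in body:
            if lg.tag != "lg":
                problems.append(f"unexpected <{lg.tag}> inside <body>")
            for l in lg:
                if l.tag != "l":
                    problems.append(f"unexpected <{l.tag}> inside <lg>")
    # Every word needs a non-empty @lang.
    for word in root.iter("origin"):
        if not word.get("lang"):
            problems.append(f"<origin> without @lang: {word.text!r}")
    return problems


print(check_structure(SAMPLE))
```

A real schema expresses these rules declaratively and is enforced by the XML editor or a validator; the function above only mirrors the same idea procedurally.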

Creating the Website

A major part of the digital humanities discipline lies in using digital tools not just to examine a text but also to make that text more accessible. In this case, the tool is a website, created and hosted through GitHub. Several elements of this website make its functioning more complex but make the user experience better. The first of these is the use of XSLT to make every part of the text written in Cyrillic also available in Latin letters.
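The Cyrillic-to-Latin direction is simple enough to show as a lookup table. The site does this in XSLT; the Python sketch below only illustrates the same mapping, and the names CYR_TO_LAT and to_latin are hypothetical. It assumes that a capital digraph letter should be written in title case (Љ becomes Lj), which is not right for text set entirely in capitals.

```python
# The 30 letters of the Bosnian Cyrillic alphabet and their Latin equivalents.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def to_latin(text: str) -> str:
    """Transliterate Cyrillic letters one at a time; leave everything else alone."""
    out = []
    for ch in text:
        latin = CYR_TO_LAT.get(ch.lower())
        if latin is None:
            out.append(ch)                  # spaces, punctuation, Latin text
        elif ch.isupper():
            out.append(latin.capitalize())  # assumption: title case, so Љ -> Lj
        else:
            out.append(latin)
    return "".join(out)


print(to_latin("Начин истраживања"))
```

Because each Cyrillic letter is handled on its own, no context is needed. The reverse direction would have to decide, for every "nj", "lj", or "dž", whether it is one letter or two, which is why the source texts are stored in Cyrillic.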

Often, the Western European focus of research fields leads to texts being studied by English, French, or German speakers, and the research is then presented in those languages rather than in the language of the texts examined or of the culture from which the texts come. I want to avoid this problem by writing all of the examination and analysis of the texts bilingually. This way, someone can use the entire website in Bosnian, in either Latin or Cyrillic letters, or in English.

Final Examinations

The final part of the analysis of the texts is the creation of statistics for each poem individually, as well as aggregate statistics, which are then displayed as Scalable Vector Graphics (SVG). These transformations are done through an XSLT document run as a transformation scenario. The resulting graphs are the final step before writing the analysis of the data.
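As a rough illustration of this step, the Python sketch below counts words per @lang value and draws the counts as a horizontal bar chart in SVG. The project itself does this in XSLT; the function names, the layout numbers, the <poem> root element, and the @lang values "origin-a" and "origin-b" are all hypothetical, and the sample words are placeholders.

```python
from collections import Counter
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical sample; the lang values are placeholders, not real origins.
SAMPLE = """<poem><body><lg>
  <l><origin lang="origin-a">један</origin> <origin lang="origin-b">два</origin></l>
  <l><origin lang="origin-a">три</origin></l>
</lg></body></poem>"""


def origin_counts(xml_text: str) -> Counter:
    """Count <origin> elements per @lang value."""
    root = ET.fromstring(xml_text)
    return Counter(w.get("lang", "unknown") for w in root.iter("origin"))


def bar_chart_svg(counts: Counter, bar_height: int = 20, scale: int = 30) -> str:
    """Render the counts as one labelled horizontal bar per @lang value."""
    rows = counts.most_common()  # largest count first
    longest = max((n for _, n in rows), default=0)
    width = 120 + scale * longest + 40
    height = bar_height * len(rows)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for i, (label, n) in enumerate(rows):
        y = i * bar_height
        parts.append(f'<text x="0" y="{y + 15}">{escape(label)}</text>')
        parts.append(f'<rect x="120" y="{y + 2}" width="{n * scale}" height="{bar_height - 4}"/>')
        parts.append(f'<text x="{125 + n * scale}" y="{y + 15}">{n}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


counts = origin_counts(SAMPLE)
svg = bar_chart_svg(counts)
print(svg)
```

Aggregate statistics follow the same pattern: Counter objects from several poems can be added together with + before the chart is drawn.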

GitHub

See the code