C07 - Multi-layered annotation of conversation-like narratives in German
Magdalena Repp
Research interest
In project C07, we investigated German personal pronouns and d-pronouns in longer, more naturalistic texts. To this end, we conducted a corpus analysis of excerpts from two German novels: the coming-of-age novel "Tschick" by Wolfgang Herrndorf (2010) and the crime novel "Auferstehung der Toten" by Wolf Haas (1996). We chose these texts for two reasons. First, they contain a high number of demonstrative pronouns, which makes them suitable for our analysis. Second, both exhibit a conversation-like narration style.
These conversation-like features are intertwined with the narrative perspective, in which the two novels differ. "Tschick" is told by a first-person narrator who is also the main character of the story (an autodiegetic narrator), and it is characterized by its dialogue structure, cf. (1). "Auferstehung der Toten", in contrast, is told by a third-person narrator who is not part of the story (a heterodiegetic narrator). This narrator is nevertheless quite prominent: they frequently evaluate characters and events and often address the reader directly, which creates the impression of a direct interaction between narrator and reader, cf. (2).
(1) [“What are you trying to tell me?] [That the water runs from the bottom to the top?”]
[“You have to suck it in.”]
[“Never heard (zero) of gravity?] [It doesn’t run upwards.”]
(2) [That doesn’t really belong here.] [But it didn’t happen any differently to the Brenner.] [He sits in his hot room] [and is supposed to (zero) think about his work,] [but instead he thinks about his apartment.] [And now pay
Data collection
The annotations were carried out in WebAnno 3.6.7 (Yimam et al., 2014), a web-based, multi-layer annotation tool; see Figure 1 for a screenshot of the annotation window. Before annotation, the data were automatically segmented into sentences, with sentence-final punctuation marking the boundaries. Three linguistically trained annotators, all native speakers of German, worked on the texts simultaneously. Both corpora went through multiple rounds of annotation in which the annotation scheme was gradually refined; for this reason, no inter-annotator agreement was calculated. The annotation procedure was as follows. First, we annotated sentence segments. Second, we annotated all referring expressions (REs) with an animate referent and specified the RE type of each. Third, we added grammatical and thematic roles to each annotated RE. Finally, we marked the referential chains by linking each RE to its previous antecedent. The result is a set of fine-grained, multi-layered annotations suitable for various types of research on REs that refer to animate entities. Table 1 provides an overview of the distribution of the annotated REs.
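To make the layering concrete, the following is a minimal sketch of how one annotated RE and its referential chain could be represented. It is not the WebAnno export format or the project's actual schema: the class name, field names, and the role and type values in the sample are assumptions chosen for illustration, loosely based on the English rendering of example (2).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReferringExpression:
    """One annotated RE with its layers (all field names are hypothetical)."""
    re_id: int
    text: str
    segment: int                      # index of the sentence segment containing the RE
    re_type: str                      # e.g. "PersPron", "D-Pron", "zero"
    gram_role: str                    # grammatical role label (illustrative value)
    thematic_role: str                # thematic role label (illustrative value)
    antecedent: Optional[int] = None  # re_id of the previous mention, if any


def referential_chain(res: List[ReferringExpression], re_id: int) -> List[ReferringExpression]:
    """Follow antecedent links backwards and return the chain in text order."""
    by_id = {r.re_id: r for r in res}
    chain = []
    current = by_id.get(re_id)
    while current is not None:
        chain.append(current)
        if current.antecedent is None:
            break
        current = by_id.get(current.antecedent)
    return list(reversed(chain))


# Illustrative annotations; the labels are assumptions, not corpus data.
sample = [
    ReferringExpression(1, "the Brenner", 2, "defDP", "object", "experiencer"),
    ReferringExpression(2, "He", 3, "PersPron", "subject", "agent", antecedent=1),
    ReferringExpression(3, "(zero)", 4, "zero", "subject", "agent", antecedent=2),
]

chain_texts = [r.text for r in referential_chain(sample, 3)]
```

Storing only the link to the previous antecedent, as in the procedure described above, is enough to recover the full chain for any RE by walking the links backwards.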
Data accessibility
Both corpora are stored on the Open Science Framework (OSF) and are available for educational, research, and non-profit purposes with appropriate attribution. A dataframe containing only the annotated REs and additional information is freely accessible for download. Because the corpus consists of several chapters from each novel, the publisher has not allowed us to make the full corpus freely accessible, for copyright reasons. The publisher has, however, permitted us to place the complete corpus behind a password-protected link; colleagues and other interested individuals can obtain the password by contacting us. We provide our data as a CSV file, which makes it interoperable and facilitates exchange and reuse across researchers, institutions, organizations, and countries. The corpus is described in Repp, Schumacher, and Same (2023) and was used for the empirical investigation in Repp and Schumacher (2023).
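Because the RE data are distributed as CSV, they can be read with standard tooling. The sketch below uses only Python's standard library and an in-memory sample; the column names (`re_id`, `text`, `re_type`, `gram_role`) and the three sample rows are hypothetical and will need to be adapted to the headers of the actual file.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the spirit of the RE dataframe; not real corpus rows.
SAMPLE = """re_id,text,re_type,gram_role
1,der Brenner,defDP,subject
2,er,PersPron,subject
3,der,D-Pron,subject
"""


def count_re_types(handle, type_column="re_type"):
    """Count rows per RE type; `type_column` is an assumed column name."""
    reader = csv.DictReader(handle)
    return Counter(row[type_column] for row in reader)


type_counts = count_re_types(io.StringIO(SAMPLE))
```

For the downloaded file, the same function can be called on a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved.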
Table 1: Distribution of the annotated REs by RE type
RE type (RF) | Freq | % |
PersPron | 1390 | 42.59 |
Proper Name | 390 | 11.95 |
defDP | 350 | 10.72 |
zero | 306 | 9.38 |
PossPron | 250 | 7.66 |
D-Pron | 152 | 4.66 |
IndefPron | 133 | 4.07 |
IndefDP | 109 | 3.34 |
other | 184 | 5.64 |
Total | 3264 | 100.00 |
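The percentages in Table 1 follow directly from the frequencies. As a quick check, this short sketch recomputes each share as frequency divided by the total, rounded to two decimals; the counts are copied from the table, and the variable names are ours.

```python
# Frequencies copied from Table 1.
counts = {
    "PersPron": 1390,
    "Proper Name": 390,
    "defDP": 350,
    "zero": 306,
    "PossPron": 250,
    "D-Pron": 152,
    "IndefPron": 133,
    "IndefDP": 109,
    "other": 184,
}

total = sum(counts.values())

# Share of each RE type in percent, rounded to two decimals as in the table.
shares = {re_type: round(100 * freq / total, 2) for re_type, freq in counts.items()}

for re_type, share in shares.items():
    print(f"{re_type:12s} {counts[re_type]:5d} {share:6.2f}")
```

Personal pronouns and d-pronouns, the two forms at the center of the project, together account for 1542 of the 3264 annotated REs.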
References:
Repp, Magdalena, Petra B. Schumacher & Fahime Same. 2023. Multi-layered annotation of conversation-like narratives in German. In Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII), 61–72. Toronto: Association for Computational Linguistics.
Repp, Magdalena & Petra B. Schumacher. 2023. What naturalistic stimuli tell us about pronoun resolution in real-time processing. Frontiers in Artificial Intelligence 6. 1058554.
Yimam, Seid Muhie, Chris Biemann, Richard Eckart de Castilho & Iryna Gurevych. 2014. Automatic Annotation Suggestions and Custom Annotation Layers in WebAnno. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 91–96. Baltimore, Maryland: Association for Computational Linguistics.
Herrndorf, Wolfgang. 2010. Tschick. Reinbek bei Hamburg: Rowohlt.
Haas, Wolf. 1996. Auferstehung der Toten. Reinbek bei Hamburg: Rowohlt.