Authorship Attribution: Specifics for Slovene

  • Ana Zwitter Vitez, Trojina, Zavod za uporabno slovenistiko

Abstract

The paper shows why a high-quality analysis of linguistic features matters for authorship attribution and author profiling in forensic, literary and economic contexts (anonymous threat letters, plagiarism, literary works of unknown authorship, client profiling). It also points out how few such analyses have been carried out for Slovene and outlines a methodology for detecting syntactic, lexical, semantic and character-level features in order to quantify an author’s personal style.
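
As a rough illustration of what quantifying personal style can mean in practice, the following Python sketch computes two of the feature families named in the abstract: character n-gram frequencies and a few simple lexical measures. It is not the paper's method. The function names, the chosen measures and the short list of Slovene function words are assumptions made for this example, and syntactic and semantic features are left out because they would need a tagger or parser.

```python
import re
from collections import Counter


def char_ngrams(text, n=3):
    """Relative frequencies of overlapping character n-grams (lowercased)."""
    text = text.lower()
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    return {gram: count / total for gram, count in Counter(grams).items()}


def lexical_features(text, function_words):
    """Simple lexical style measures for one text.

    `function_words` is a caller-supplied list; which words to use for
    Slovene is an assumption here, not something taken from the paper.
    """
    # \w is Unicode-aware in Python 3, so letters such as č, š, ž are kept.
    tokens = re.findall(r"\w+", text.lower())
    total = len(tokens)
    if total == 0:
        return {"type_token_ratio": 0.0,
                "mean_word_length": 0.0,
                "function_word_rate": {w: 0.0 for w in function_words}}
    counts = Counter(tokens)
    return {
        # Distinct word forms divided by all word tokens.
        "type_token_ratio": len(counts) / total,
        "mean_word_length": sum(len(t) for t in tokens) / total,
        # Share of all tokens taken up by each function word.
        "function_word_rate": {w: counts[w] / total for w in function_words},
    }


if __name__ == "__main__":
    sample = "Danes je lep dan in sonce je visoko."
    # Illustrative function-word list (hypothetical selection).
    features = lexical_features(sample, ["je", "in", "da"])
    print(features["type_token_ratio"])
    print(features["function_word_rate"])
    print(len(char_ngrams(sample, n=3)))
```

Feature vectors of this kind, computed for texts of known authorship, could then be compared with those of a disputed text. Slovene is highly inflected, so in a real study one would probably count lemmas rather than surface word forms.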

Author Biography

Ana Zwitter Vitez, Trojina, Zavod za uporabno slovenistiko

Ljubljana, Slovenia. Email: ana.zwitter@guest.arnes.si

Published
2020-10-21
How to Cite
Zwitter Vitez A. (2020). Authorship Attribution: Specifics for Slovene. Slavia Centralis, 5(1), 75–85. https://doi.org/10.18690/scn.5.1.75-85.2012
Section
Articles