Authorship Attribution: Specifics for Slovene

  • Ana Zwitter Vitez, Trojina, Institute for Applied Slovene Studies
Keywords: authorship attribution, author profiling, linguistic parameters, language technologies, forensic linguistics

Abstract

Authorship Attribution for Slovene

The paper highlights the importance of high-quality analysis of the linguistic parameters that enable authorship attribution or author profiling in forensic, literary-historical, or business contexts (anonymous threatening letters, plagiarism detection, literary texts of unknown origin, customer profiling). Because such analyses are hard to find for Slovene, we propose a methodology for extracting syntactic, lexical, semantic, and character-level parameters for the quantitative treatment of an author's personal style.
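
As a rough illustration of what extracting lexical and character-level parameters can look like in practice, the following minimal Python sketch computes a few common stylometric measures. It is not the methodology of the paper: the function names, the short function-word list, and the choice of measures are assumptions made for this example. Syntactic and semantic parameters are left out because they would require a part-of-speech tagger or parser.

```python
import re
from collections import Counter

# Hypothetical, illustrative list of frequent Slovene function words.
# This list is an assumption for the example, not taken from the paper.
FUNCTION_WORDS = ("in", "je", "se", "da", "na", "v", "za", "ki", "pa", "so")


def tokenize(text):
    """Lowercase the text and return its word tokens (runs of letters)."""
    # [^\W\d_]+ matches Unicode letters, so č, š and ž are handled.
    return re.findall(r"[^\W\d_]+", text.lower())


def char_ngrams(text, n=3):
    """Count overlapping character n-grams, spaces and punctuation included."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def style_features(text, n=3, top_k=5):
    """Return a small dictionary of lexical and character-level measures."""
    tokens = tokenize(text)
    total = len(tokens)
    counts = Counter(tokens)
    # Naive sentence split on ., ! and ?; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "type_token_ratio": len(counts) / total if total else 0.0,
        "mean_word_length": sum(map(len, tokens)) / total if total else 0.0,
        "mean_sentence_length": total / len(sentences) if sentences else 0.0,
        "function_word_freq": {
            w: (counts[w] / total if total else 0.0) for w in FUNCTION_WORDS
        },
        "top_char_ngrams": char_ngrams(text, n).most_common(top_k),
    }


if __name__ == "__main__":
    sample = "Pes je lajal. Pes je spal in se je zbudil."
    for name, value in style_features(sample).items():
        print(name, value)
```

Feature vectors of this kind, computed for texts of known authorship, could then be compared with those of a disputed text; the comparison or classification method is a separate choice that this sketch does not make.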

Author Biography

Ana Zwitter Vitez, Trojina, Institute for Applied Slovene Studies

Ljubljana, Slovenia. E-mail: ana.zwitter@guest.arnes.si

References

Corpus Gigafida (http://demo.gigafida.net, 25 May 2011).
Slovenska leposlovna klasika (http://sl.wikisource.org/wiki/Glavna_stran, 25 May 2011).
Slovenian part-of-speech tagger (http://oznacevalnik.slovenscina.eu, 25 May 2011).
Slovenian parser (http://razclenjevalnik.slovenscina.eu/, 25 May 2011).
__________________________
Shlomo ARGAMON, Shlomo LEVITAN, 2005: Measuring the usefulness of function words for authorship attribution. Proceedings of the Joint Conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing.
Harald BAAYEN, Hans VAN HALTEREN, Fiona TWEEDIE, 1996: Outside the cave of shadows: Using syntactic annotation to enhance authorship attribution. Literary and Linguistic Computing 11/3, 121–131.
John F. BURROWS, 1987: Computation into Criticism: A Study of Jane Austen’s Novels and an Experiment in Method. Oxford: Clarendon Press.
Scott DEERWESTER et al., 1990: Indexing by latent semantic analysis. Journal of the American Society for Information Science 41(6), 391–407.
Joachim DIEDERICH et al., 2003: Authorship attribution with support vector machines. Applied Intelligence 19/1–2, 109–123.
Marijan DOVIĆ, 2002: Podbevšek in Cvelbar: Poskus empirične preverbe namigov o plagiatorstvu. Slavistična revija 50, 233–249.
Maciej EDER, 2010: Does Size Matter? Authorship Attribution, Small Samples, Big Problem. London: Proceedings of the Digital Humanities Conference 2010.
Neil GRAHAM et al., 2005: Segmenting documents by stylistic character. Journal of Natural Language Engineering, 11(4), 397–415.
Jack GRIEVE, 2007: Quantitative authorship attribution: An evaluation of techniques. Literary and Linguistic Computing, 22(3), 251–270.
Graeme HIRST, Ol’ga FEIGUINA, 2007: Bigrams of syntactic labels for authorship discrimination of short texts. Literary and Linguistic Computing 22/4, 405–417.
Moshe KOPPEL et al., 2002: Automatically categorizing written texts by author gender. Literary and Linguistic Computing, 17(4), 401–412.
Marko LIMBEK, 2008: Usage of Multivariate Analysis in Authorship Attribution: Did Janez Mencinger Write the Story “Poštena Bohinčeka”? Metodološki zvezki, 5/1, 81–93.
Kim LUYCKX, Walter DAELEMANS, 2005: Shallow text analysis and machine learning for authorship attribution. Proceedings of the Fifteenth Meeting of Computational Linguistics in the Netherlands.
Philip MCCARTHY et al., 2006: Analyzing writing styles with coh-metrix. Proceedings of the Florida Artificial Intelligence Research Society International Conference, 764–769.
Sven MEYER ZU EISSEN et al., 2007: Plagiarism detection without reference collections. Advances in Data Analysis, 359–366.
Frederick MOSTELLER, David L. WALLACE, 1964: Inference and Disputed Authorship: the Federalist Papers. Reading, Mass.: Addison-Wesley.
Fuchun PENG et al., 2003: Language independent authorship attribution using character level language models. Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics, 267–274.
Joseph RUDMAN, 1998: The state of authorship attribution studies: Some problems and solutions. Computers and the Humanities, 31, 351–365.
Fabrizio SEBASTIANI, 2002: Machine learning in automated text categorization. ACM Computing Surveys, 34(1).
Michael SHAW et al., 2001: Knowledge management and data mining for marketing. Decision Support Systems, 31(1), 127–137.
Efstathios STAMATATOS et al., 2000: Automatic text categorization in terms of genre and author. Computational Linguistics, 26(4), 471–495.
Efstathios STAMATATOS et al., 2001: Computer-based authorship attribution without lexical measures. Computers and the Humanities, 35(2), 193–214.
Efstathios STAMATATOS et al., 2006: Ensemble-based author identification using character n-grams. Proceedings of the 3rd International Workshop on Text-based Information Retrieval, 41–46.
Efstathios STAMATATOS, 2009: A Survey of Modern Authorship Attribution Methods. Journal of the American Society for Information Science and Technology, 60(3), 538–556.
Ozlem UZUNER, Boris KATZ, 2005: A comparative study of language models for book and author recognition. Proceedings of the 2nd International Joint Conference on Natural Language Processing, 969–980.
Published
2020-10-21
How to Cite
Zwitter Vitez A. (2020). Authorship Attribution: Specifics for Slovene. Slavia Centralis, 5(1), 75–85. https://doi.org/10.18690/scn.5.1.75–85.2012
Section
Articles