Authorship Attribution: Specifics for Slovene
Abstract
Authorship attribution of texts in Slovene (Ugotavljanje avtorstva besedil za slovenščino)
The paper stresses the importance of high-quality analysis of the linguistic parameters that make it possible to attribute authorship or to profile the author of a text in forensic, literary-historical, or commercial contexts (anonymous threatening letters, plagiarism detection, literary texts of unknown origin, customer profiling). Because such analyses are hard to find for Slovene, we propose a methodology for extracting syntactic, lexical, semantic, and character-level parameters for the quantitative treatment of an author's personal style.
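As a rough illustration of what lexical and character-level parameters can look like in practice, the following minimal sketch computes relative frequencies of function words and of character n-grams for a text. It is not the methodology of the paper: the function names, the tiny `FUNCTION_WORDS` list, and the choice of trigrams are assumptions made only for this example. Syntactic and semantic parameters would additionally require a tagger or parser and are not shown.

```python
import re
from collections import Counter

# Illustrative assumption: a tiny hand-picked list of frequent Slovene
# function words; a real study would use a much larger, motivated list.
FUNCTION_WORDS = ("in", "je", "se", "da", "na", "v", "za", "ki")


def tokenize(text):
    """Lowercase the text and return runs of Unicode letters as tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def function_word_profile(text):
    """Relative frequency of each listed function word among all tokens."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    return {w: (counts[w] / total if total else 0.0) for w in FUNCTION_WORDS}


def char_ngram_profile(text, n=3):
    """Relative frequency of each character n-gram in the lowercased text."""
    s = text.lower()
    grams = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}


if __name__ == "__main__":
    sample = "Bil je dan in noč."
    print(function_word_profile(sample))
    print(char_ngram_profile(sample, n=3))
```

Profiles of this kind are simple numeric vectors, so texts by different authors can be compared with any standard distance or classifier. Which features actually discriminate between Slovene authors is an empirical question that this sketch does not address.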
Copyright (c) 2012 Univerzitetna založba Univerze v Mariboru
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright
Authors of accepted papers retain the copyright to their text while granting the editorial board of the journal Slavia Centralis the right to distribute the paper electronically. Authors may republish their text (in print or electronic form) only with a citation of the original publication in Slavia Centralis. Authors may also post the published text on a personal website, a departmental website, or in institutional repositories.