Authorship Attribution: Specifics for Slovene
Abstract
The paper shows the importance of a high-quality analysis of the linguistic features that enable authorship attribution or author profiling in forensic, literary or economic contexts (anonymous threatening letters, plagiarism, literary works of unknown authorship, client profiling). It also highlights the lack of such analyses for Slovene and outlines a methodology for detecting syntactic, lexical, semantic and character-level features in order to quantify an author’s personal style.
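The abstract does not spell out an algorithm. The Python below is a minimal sketch, under stated assumptions, of how lexical and character-level features can quantify style. It builds a profile from function-word frequencies and character trigram frequencies, then assigns a disputed text to the candidate author with the most similar profile (cosine similarity). The function-word list, the trigram size, the helper names `style_profile` and `attribute`, and the nearest-profile rule are illustrative assumptions, not the paper's method. Syntactic and semantic features would additionally require a part-of-speech tagger or parser and are omitted here.

```python
import math
import re
from collections import Counter

# Assumed, illustrative subset of frequent Slovene function words; a real
# study would derive the list from a reference corpus of Slovene texts.
FUNCTION_WORDS = ["in", "je", "da", "se", "na", "v", "ne", "pa", "za",
                  "so", "ki", "z", "s", "bi", "kot", "tudi", "ali", "po"]


def tokenize(text: str) -> list[str]:
    # In Python 3, \w is Unicode-aware, so č, š and ž stay inside words.
    return re.findall(r"\w+", text.lower())


def style_profile(text: str, n: int = 3) -> dict[str, float]:
    """Hypothetical helper: map a text to relative-frequency features."""
    tokens = tokenize(text)
    total = len(tokens) or 1
    counts = Counter(tokens)

    # Lexical features: share of all tokens taken by each function word.
    profile = {f"fw:{w}": counts[w] / total for w in FUNCTION_WORDS}

    # Character features: relative frequencies of character n-grams over
    # the lowercased word stream (punctuation is dropped by tokenize).
    chars = " ".join(tokens)
    grams = Counter(chars[i:i + n] for i in range(len(chars) - n + 1))
    gram_total = sum(grams.values()) or 1
    for gram, c in grams.items():
        profile[f"ch:{gram}"] = c / gram_total
    return profile


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(disputed: str, candidates: dict[str, str]) -> str:
    """Hypothetical helper: pick the candidate whose profile is most similar."""
    target = style_profile(disputed)
    return max(candidates,
               key=lambda author: cosine(target, style_profile(candidates[author])))


if __name__ == "__main__":
    # Toy samples for illustration only; real attribution needs far longer texts.
    samples = {
        "author_a": "Dež je padal in veter je pihal čez polje. Kmet je stal na pragu.",
        "author_b": "Kaj bi rekel, ko bi vedel? Tudi on ali ona ne ve za to.",
    }
    print(attribute("Veter je pihal in kmet je stal na polju.", samples))
```

Because every feature is a relative frequency and cosine similarity ignores vector length, texts of different sizes stay comparable, although very short samples give unstable profiles.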
Copyright (c) 2012 University of Maribor Press
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyrights
Authors of accepted manuscripts will retain copyright in their work, but will assign to Slavia Centralis the permanent right to electronically distribute their article. Authors may republish their work (in print and/or electronic format) as long as they acknowledge Slavia Centralis as the original publisher. Authors may also share the published version on their own websites, departmental webpages, or institutional repositories.
Plagiarism Policy
Slavia Centralis is a non-commercial, open access, electronic research journal. As such it pledges to uphold certain ethical principles regarding confidentiality, originality and intellectual fair play. Slavia Centralis takes copyright infringement and plagiarism very seriously and all submissions may be checked with duplication detection software.
Authors must:
- Ensure that all submitted work is original and fully referenced, and that all authors are represented accurately. The submission must be exclusive and not under consideration elsewhere.
- Obtain all permissions from copyright owners for third-party material (e.g. quotations, illustrations, tables).