Finding Efficient Linguistic Feature Set for Authorship Verification

Sandaruwan Prabath Kumara Ranatunga

Abstract


Authorship verification is the task of determining whether a given document was written by a particular author. It centers on analyzing the document itself for variations in the author's writing style and on identifying the author's own idiolect. Detection performance depends mainly on the feature set used to cluster the documents. Linguistic and stylistic features have been used to identify authors by their writing style. Revealing the shallow changes in an author's writing style is the major problem that must be addressed in authorship verification. This problem motivates computer science researchers to study authorship verification in the field of computer forensics, and it is also the focus of this research. The contributions of this research are twofold: the first is a new feature extraction method based on Natural Language Processing (NLP), and the second is a new, more efficient linguistic feature set for verifying the author of a given document. Experiments were conducted on a corpus of freely downloadable, genuine 19th-century English books, and Self-Organizing Maps were used as the classifier to cluster the documents. Proper word segmentation is also introduced in this work, and it helps to demonstrate that the proposed strategy can produce promising results. Finally, the proposed strategy with the extracted linguistic feature set is found to yield more accurate classification.



DOI: https://doi.org/10.31357/jcs.v1i1.1616

DOI (PDF): https://doi.org/10.31357/jcs.v1i1.1616.g952
