Siamese Hybrid Network Approach for Sentence Similarity
DOI: https://doi.org/10.31357/vjs.v27i02.7833
Abstract
This paper presents a novel Siamese Hybrid Network approach, namely Siamese Bidirectional Long Short-Term Memory with Convolutional Neural Network (SiBiLConv), for measuring similarity between natural-language sentences. The model integrates a Siamese neural network architecture with similarity metrics, including Manhattan Distance and Cosine Similarity, to improve the accuracy with which semantic relationships between sentences are measured. Evaluations were performed on Sinhala, a complex and under-resourced language spoken in Sri Lanka, which poses unique challenges due to its morphological richness and syntactic variability. With the Cosine Distance metric, the SiBiLConv model achieved an accuracy of 89.80%, an F1 score of 0.9041, and a mean squared error (MSE) of 0.0281, outperforming baseline models such as MaLSTM, which achieved an accuracy of 78.99% and an F1 score of 0.7797. While existing methods for sentence similarity focus primarily on resource-rich languages, this work addresses the need for tailored approaches in low-resource language contexts, where pre-trained models and annotated datasets are often limited. The novelty lies in SiBiLConv's hybrid architecture and metric integration, designed specifically to handle the syntactic and semantic complexities of Sinhala. This research bridges a gap in the application of sentence similarity models to low-resource languages and establishes a framework adaptable to other morphologically rich languages.
Keywords: Siamese Hybrid Network, Sentence similarity, Sinhala sentence similarity, Morphologically Rich Language Processing
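
The two similarity metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes each branch of the Siamese network has already encoded a sentence as a fixed-length vector; the function names and example vectors are hypothetical and are not taken from the paper. The Manhattan variant uses the exp(-L1) form associated with MaLSTM, and whether SiBiLConv uses exactly this form is an assumption.

```python
import math


def manhattan_similarity(u, v):
    """Map the L1 (Manhattan) distance to a similarity in (0, 1].

    Assumption: uses exp(-||u - v||_1), the form associated with MaLSTM.
    """
    l1_distance = sum(abs(a - b) for a, b in zip(u, v))
    return math.exp(-l1_distance)


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Cosine is undefined for a zero vector; return 0.0 by convention here.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sentence encodings produced by the two Siamese branches.
encoding_a = [0.2, 0.7, 0.1]
encoding_b = [0.25, 0.6, 0.05]
print(manhattan_similarity(encoding_a, encoding_b))
print(cosine_similarity(encoding_a, encoding_b))
```

Both functions return higher values for more similar encodings, so either can be compared against a gold similarity label when computing an error such as MSE.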