International Conference on Recent Progresses in Science, Engineering and Technology

Dr. Vaskar Deka

Biography

Department of Information Technology
Gauhati University, Assam, India

Title of the Invited Talk: Knowledge Graph-based Question-answering System and the Assamese Language

Abstract: A question-answering system (QAS) processes natural language questions (NLQs) and provides answers by extracting information from various sources. Here, a question is a sentence, phrase, or word intended to elicit information, clarification, or an answer. In Natural Language Processing (NLP), an answer is the response generated or retrieved by a system in reaction to a question, query, or prompt. The characteristics of the answer depend on the specific application and the type of input, and answers play a crucial role in tasks such as question answering (QA), chatbots, and information retrieval. Based on the information source from which the answer is derived, QASs can be categorized into text-based and knowledge-based systems. Traditional QASs primarily focus on document retrieval rather than pinpointing exact answers. Research on QASs that utilize Knowledge Graphs (KGs) has expanded significantly in recent years. A KG is a structured representation of information that connects entities (e.g., people, places, things, concepts) and their relationships in a graph-like structure. Mathematically, a KG can be defined as G = {E, R, F}, where E, R, and F denote the sets of entities, relations, and facts, respectively. Each fact is expressed as a triplet (head entity/subject, relation, tail entity/object), i.e., (h, r, t), where h, t ∈ E and r ∈ R. A knowledge graph question-answering (KGQA) system uses the structured and semantic relationships in the KG to retrieve accurate and contextually relevant answers to natural language questions. KGQA for Indic languages is a research area that aims to use KGs to answer queries formulated in languages of the Indian subcontinent. Although the fundamental principles of KGQA remain the same as for English, Indic languages pose distinct challenges and call for techniques tailored to their unique linguistic characteristics and available resources.
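The triple-based definition of a KG given in the abstract can be illustrated with a minimal Python sketch. The class names, the `located_in` relation label, and the three tourism facts are illustrative assumptions, not material from the talk; the sets E and R are simply derived from the facts F.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One fact (h, r, t): head entity, relation, tail entity."""
    head: str
    relation: str
    tail: str


class KnowledgeGraph:
    """Toy KG G = {E, R, F}; E and R are derived from the facts F."""

    def __init__(self, facts):
        self.facts = set(facts)                                   # F
        self.entities = ({f.head for f in self.facts}
                         | {f.tail for f in self.facts})          # E
        self.relations = {f.relation for f in self.facts}         # R

    def tails(self, head, relation):
        """Return every t such that (head, relation, t) is in F."""
        return sorted(f.tail for f in self.facts
                      if f.head == head and f.relation == relation)


# Illustrative toy data for a tourism domain (assumed, not from the talk).
kg = KnowledgeGraph([
    Fact("Kaziranga National Park", "located_in", "Assam"),
    Fact("Kamakhya Temple", "located_in", "Guwahati"),
    Fact("Guwahati", "located_in", "Assam"),
])

# Answering "Where is Kamakhya Temple?" reduces to a lookup on (h, r, ?).
print(kg.tails("Kamakhya Temple", "located_in"))
```

In this sketch, answering a question amounts to identifying the head entity and the relation and then reading off the matching tail entities; how a real system maps a question to (h, r) depends on the KGQA methodology.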
Assamese is a scheduled Indian language, spoken by the native inhabitants of the state of Assam, India.
The language is known for its highly inflected forms and its use of pronouns and noun plural markers in both honorific and non-honorific constructions. Because Assamese is a resource-poor language in digital form, the initial step toward developing knowledge graph question answering (KGQA) systems for a language like Assamese is to construct a KG in a particular domain (e.g., tourism). This process can be divided into three NLP subtasks: preprocessing, Named Entity Recognition (NER), and Relation Extraction (RE). The next phase is question processing, in which NER plays a vital role in identifying the entities mentioned in the question. The subsequent stages of the KGQA process vary depending on the methodology employed. Accordingly, the talk considers fine-tuning a transformer-based language model for the Assamese NER task, followed by designing an RE framework to facilitate the creation of the KG for a QAS in the Assamese language.
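The stages described above (preprocessing, NER, RE, then question processing) can be sketched end to end in Python. This is only a toy outline under stated assumptions: a hypothetical gazetteer lookup stands in for the fine-tuned transformer NER model, a single hand-written pattern stands in for the RE framework, and English sentences stand in for Assamese text. None of these names or rules come from the talk.

```python
import re

# Hypothetical gazetteer standing in for a fine-tuned transformer NER model.
GAZETTEER = {"Kamakhya Temple": "PLACE", "Guwahati": "CITY", "Assam": "STATE"}

# Hypothetical surface pattern standing in for a learned RE framework.
PATTERNS = [(re.compile(r"\bis (?:located|situated) in\b"), "located_in")]


def preprocess(text):
    """Split raw text into sentences; the danda (।) ends Assamese sentences."""
    return [s.strip() for s in re.split(r"[.?!।]", text) if s.strip()]


def recognize_entities(sentence):
    """Return (position, mention, label) tuples, ordered by position."""
    found = []
    for name, label in GAZETTEER.items():
        idx = sentence.find(name)
        if idx != -1:
            found.append((idx, name, label))
    return sorted(found)


def extract_relations(sentence):
    """Link the entity just before a pattern match to the first one after it."""
    entities = recognize_entities(sentence)
    triples = []
    for pattern, relation in PATTERNS:
        match = pattern.search(sentence)
        if match is None:
            continue
        heads = [e for e in entities if e[0] < match.start()]
        tails = [e for e in entities if e[0] >= match.end()]
        if heads and tails:
            triples.append((heads[-1][1], relation, tails[0][1]))
    return triples


def build_kg(text):
    """KG construction: preprocessing, then NER and RE on each sentence."""
    kg = set()
    for sentence in preprocess(text):
        kg.update(extract_relations(sentence))
    return kg


def answer(question, kg, relation="located_in"):
    """Question processing: NER on the question, then a triple lookup.

    The relation is fixed here; choosing it is methodology dependent.
    """
    mentions = [name for _, name, _label in recognize_entities(question)]
    return sorted(t for (h, r, t) in kg if h in mentions and r == relation)


corpus = "Kamakhya Temple is located in Guwahati. Guwahati is situated in Assam."
kg = build_kg(corpus)
print(answer("Where is Kamakhya Temple?", kg))
```

The same NER component serves both KG construction and question processing, which is why the abstract singles it out as the first model to build for a resource-poor language.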