Abstract More than 30 million biomedical research articles have been published, and this information is widely distributed and plays a significant role in the advancement of biomedical science. Mining relations between medical entities and understanding their interactions is essential for a variety of tasks, including fundamental biomedical research, drug discovery, and precision medicine. In the proposed work, two models are introduced, both aimed at solving the relation extraction task. The first approach uses pre-trained language models, rather than traditional feature extraction techniques, to obtain word embeddings of the input text, and then feeds the extracted embeddings to machine learning classifiers to perform the classification. This classification model was tested and evaluated on two benchmark datasets, ChemProt and DDI, showing that traditional machine learning techniques can compete with recent approaches: the model achieves an F1-score of 74.6% on ChemProt and 73.7% on DDI, higher than competing models. The second proposed model aims to solve a real-world problem. Its pipeline first retrieves articles, mainly abstracts, from PubMed (a free search engine accessing primarily the MEDLINE database of references and abstracts on life sciences and biomedical topics); the resulting abstracts are filtered and preprocessed to extract predefined relations, and the generated dataset is then used to fine-tune a Bidirectional Encoder Representations from Transformers (BERT) model to carry out relation extraction. With this method, our model achieved an F1-score of 85.8% on the ChemProt dataset and 88.5% on the DDI dataset.
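The first approach described above, feeding pre-trained language model embeddings into a conventional classifier, can be sketched as follows. This is a minimal illustration, not the thesis implementation: the `embed` function stands in for a pre-trained encoder such as BERT (which would return contextual sentence embeddings) and here produces fixed random vectors purely so the sketch is self-contained and runnable; the sentences and labels are likewise placeholders for ChemProt/DDI relation instances.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

def embed(sentences):
    """Stand-in for a pre-trained language model encoder.

    The real pipeline would run each sentence through BERT and take a
    contextual embedding (e.g. the [CLS] vector, 768 dimensions); here
    we return fixed random vectors only to keep the sketch runnable."""
    return rng.normal(size=(len(sentences), 768))

# Placeholder relation-extraction instances: in the thesis these would be
# ChemProt/DDI sentences with marked entity pairs and relation labels.
sentences = [f"sentence {i}" for i in range(200)]
labels = rng.integers(0, 2, size=200)  # 0 = no relation, 1 = relation

# Extract embeddings once, then train a traditional ML classifier on them.
X = embed(sentences)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
preds = clf.predict(X_test)
print(f"F1-score: {f1_score(y_test, preds):.3f}")
```

The design point is the separation of concerns: the language model is used only as a frozen feature extractor, so any scikit-learn classifier (logistic regression, SVM, random forest) can be swapped in without touching the embedding step.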