publications | Fenia Christopoulou

* denotes equal contribution

Up-to-date publications available on Google Scholar.

2024

SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks

Fenia Christopoulou, Ronald Cardenas*, Gerasimos Lampouras, and 2 more authors

arXiv preprint arXiv:2410.05102, 2024

Preference Optimization (PO) has proven an effective step for aligning language models to human-desired behaviors. Current variants, following the offline Direct Preference Optimization objective, have focused on a strict setting where all tokens are contributing signals of KL divergence and rewards to the loss function. However, human preference is not affected by each word in a sequence equally but is often dependent on specific words or phrases, e.g. existence of toxic terms leads to non-preferred responses. Based on this observation, we argue that not all tokens should be weighted equally during PO and propose a flexible objective termed SparsePO, that aims to automatically learn to weight the KL divergence and reward corresponding to each token during PO training. We propose two different variants of weight-masks that can either be derived from the reference model itself or learned on the fly. Notably, our method induces sparsity in the learned masks, allowing the model to learn how to best weight reward and KL divergence contributions at the token level, learning an optimal level of mask sparsity. Extensive experiments on multiple domains, including sentiment control, dialogue, text summarization and text-to-code generation, illustrate that our approach assigns meaningful weights to tokens according to the target task, generates more responses with the desired preference and improves reasoning tasks by up to 2 percentage points compared to other token- and response-level PO methods.

Human-like episodic memory for infinite context LLMs

Zafeirios Fountas, Martin A Benfeghoul, Adnan Oomerjee, and 4 more authors

arXiv preprint arXiv:2407.09450, 2024

Large language models (LLMs) have shown remarkable capabilities, but still struggle with processing extensive contexts, limiting their ability to maintain coherence and accuracy over long sequences. In contrast, the human brain excels at organising and retrieving episodic experiences across vast temporal scales, spanning a lifetime. In this work, we introduce EM-LLM, a novel approach that integrates key aspects of human episodic memory and event cognition into LLMs with no fine-tuning, enabling them to handle practically infinite context lengths while maintaining computational efficiency. EM-LLM organises sequences of tokens into coherent episodic events using a combination of Bayesian surprise and graph-theoretic boundary refinement in an online fashion. When needed, these events are retrieved through a two-stage memory process, combining similarity-based and temporally contiguous retrieval for efficient and human-like access to relevant information. Experiments on the LongBench and InfiniteBench benchmarks demonstrate EM-LLM’s superior performance, consistently outperforming the state-of-the-art retrieval model InfLLM across various baseline LLMs. In addition, EM-LLM outperforms its popular counterpart, RAG, in a wide range of tasks, while requiring similar resources. Notably, EM-LLM’s performance even surpasses full-context models in most tasks, while successfully performing retrieval across 10 million tokens, a scale computationally infeasible for such models. Finally, our analysis reveals strong correlations between EM-LLM’s event segmentation and human-perceived events, suggesting a bridge between this artificial system and its biological counterpart, thereby offering a novel computational framework for exploring human memory mechanisms.

EACL

Text-to-Code Generation with Modality-relative Pre-training

Fenia Christopoulou, Guchun Zhang, and Gerasimos Lampouras

In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), Mar 2024

Large pre-trained language models have recently been expanded and applied to programming language tasks with great success, often through further pre-training of a strictly natural language model, where training sequences typically contain both natural and (linearised) programming language. Such approaches effectively map both modalities of the sequence into the same embedding space. However, programming language keywords (e.g. “while”) often have very strictly defined semantics. As such, transfer learning from their natural language usage may not necessarily be beneficial to their code application and vice versa. Assuming an already pre-trained language model, in this work we investigate how sequence tokens can be adapted and represented differently, depending on which modality they belong to, and to the ultimate benefit of the downstream task. We experiment with separating embedding spaces between modalities during further model pre-training with modality-relative training objectives. We focus on text-to-code generation and observe consistent improvements across two backbone models and two test sets, measuring pass@k and a novel incremental variation.

2022

PanGu-Coder: Program Synthesis with Function-level Language Modeling

Fenia Christopoulou, Gerasimos Lampouras, Milan Gritta, and 8 more authors

arXiv preprint arXiv:2207.11280, Jul 2022

Knowledge graph enrichment of a semantic search system for construction safety

Emrah Inan, Paul Thompson, Fenia Christopoulou, and 2 more authors

In Proceedings of SAI Intelligent Systems Conference, Mar 2022

EMNLP

EntityCS: Improving Zero-Shot Cross-lingual Transfer with Entity-Centric Code Switching

Chenxi Whitehouse, Fenia Christopoulou, and Ignacio Iacobacci

In Findings of the Association for Computational Linguistics: EMNLP 2022, Dec 2022

Accurate alignment between languages is fundamental for improving cross-lingual pre-trained language models (XLMs). Motivated by the natural phenomenon of code-switching (CS) in multilingual speakers, CS has been used as an effective data augmentation method that offers language alignment at word- or phrase-level, in contrast to sentence-level via parallel instances. Existing approaches either use dictionaries or parallel sentences with word-alignment to generate CS data by randomly switching words in a sentence. However, such methods can be suboptimal as dictionaries disregard semantics, and syntax might become invalid after random word switching. In this work, we propose EntityCS, a method that focuses on Entity-level Code-Switching to capture fine-grained cross-lingual semantics without corrupting syntax. We use Wikidata and the English Wikipedia to construct an entity-centric CS corpus by switching entities to their counterparts in other languages. We further propose entity-oriented masking strategies during intermediate model training on the EntityCS corpus for improving entity prediction. Evaluation of the trained models on four entity-centric downstream tasks shows consistent improvements over the baseline with a notable increase of 10% in Fact Retrieval. We release the corpus and models to assist research on code-switching and enriching XLMs with external knowledge.

EMNLP

Training Dynamics for Curriculum Learning: A Study on Monolingual and Cross-lingual NLU

Fenia Christopoulou, Gerasimos Lampouras, and Ignacio Iacobacci

In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Dec 2022

Curriculum Learning (CL) is a technique of training models via ranking examples in a typically increasing difficulty trend with the aim of accelerating convergence and improving generalisability. Current approaches for Natural Language Understanding (NLU) tasks use CL to improve in-distribution data performance, often via heuristic-oriented or task-agnostic difficulties. In this work, instead, we employ CL for NLU by taking advantage of training dynamics as difficulty metrics, i.e., statistics that measure the behavior of the model at hand on specific task-data instances during training, and propose modifications of existing CL schedulers based on these statistics. Unlike existing work, we focus on evaluating models on in-distribution (ID), out-of-distribution (OOD) as well as zero-shot (ZS) cross-lingual transfer datasets. We show across several NLU tasks that CL with training dynamics can result in better performance, mostly on zero-shot cross-lingual transfer and OOD settings, with improvements of up to 8.5% in certain cases. Overall, experiments indicate that training dynamics can lead to better performing models with smoother training compared to other difficulty metrics while being 20% faster on average. In addition, through analysis we shed light on the correlations of task-specific versus task-agnostic metrics.

2021

NAACL

Distantly Supervised Relation Extraction with Sentence Reconstruction and Knowledge Base Priors

Fenia Christopoulou, Makoto Miwa, and Sophia Ananiadou

In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jun 2021

We propose a multi-task, probabilistic approach to facilitate distantly supervised relation extraction by bringing closer the representations of sentences that contain the same Knowledge Base pairs. To achieve this, we bias the latent space of sentences via a Variational Autoencoder (VAE) that is trained jointly with a relation classifier. The latent code guides the pair representations and influences sentence reconstruction. Experimental results on two datasets created via distant supervision indicate that multi-task learning results in performance benefits. Additional exploration of employing Knowledge Base priors into the VAE reveals that the sentence space can be shifted towards that of the Knowledge Base, offering interpretability and further improving results.

2020

JAMIA

Adverse Drug Events and Medication Relation Extraction in Electronic Health Records with Ensemble Deep Learning Methods

Fenia Christopoulou, Thy Thy Tran*, Sunil Kumar Sahu, and 2 more authors

Journal of the American Medical Informatics Association, Aug 2020

Identification of drugs, associated medication entities, and interactions among them are crucial to prevent unwanted effects of drug therapy, known as adverse drug events. This article describes our participation in the n2c2 shared-task in extracting relations between medication-related entities in electronic health records. We proposed an ensemble approach for relation extraction and classification between drugs and medication-related entities. We incorporated state-of-the-art named-entity recognition (NER) models based on bidirectional long short-term memory (BiLSTM) networks and conditional random fields (CRF) for end-to-end extraction. We additionally developed separate models for intra- and inter-sentence relation extraction and combined them using an ensemble method. The intra-sentence models rely on bidirectional long short-term memory networks and attention mechanisms and are able to capture dependencies between multiple related pairs in the same sentence. For the inter-sentence relations, we adopted a neural architecture that utilizes the Transformer network to improve performance in longer sequences. Our team ranked third with a micro-averaged F1 score of 94.72% and 87.65% for relation and end-to-end relation extraction, respectively (Tracks 2 and 3). Our ensemble effectively takes advantage of our proposed models. Analysis of the reported results indicated that our proposed approach is more generalizable than the top-performing system, which employs additional training data- and corpus-driven processing techniques. We proposed a relation extraction system to identify relations between drugs and medication-related entities. The proposed approach is independent of external syntactic tools. Analysis showed that by using latent Drug-Drug interactions we were able to significantly improve the performance of non–Drug-Drug pairs in EHRs.

Textual Relation Extraction with Edge-oriented Graph Neural Models

Fenia Christopoulou

Aug 2020

2019

EMNLP

Connecting the Dots: Document-level Neural Relation Extraction with Edge-oriented Graphs

Fenia Christopoulou, Makoto Miwa, and Sophia Ananiadou

In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Nov 2019

Document-level relation extraction is a complex human process that requires logical inference to extract relationships between named entities in text. Existing approaches use graph-based neural models with words as nodes and edges as relations between them, to encode relations across sentences. These models are node-based, i.e., they form pair representations based solely on the two target node representations. However, entity relations can be better expressed through unique edge representations formed as paths between nodes. We thus propose an edge-oriented graph neural model for document-level relation extraction. The model utilises different types of nodes and edges to create a document-level graph. An inference mechanism on the graph edges enables the model to learn intra- and inter-sentence relations using multi-instance learning internally. Experiments on two document-level biomedical datasets for chemical-disease and gene-disease associations show the usefulness of the proposed edge-oriented approach.

ACL

Inter-sentence Relation Extraction with Document-level Graph Convolutional Neural Network

Sunil Kumar Sahu, Fenia Christopoulou, Makoto Miwa, and 1 more author

In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Jul 2019

Inter-sentence relation extraction deals with a number of complex semantic relationships in documents, which require local, non-local, syntactic and semantic dependencies. Existing methods do not fully exploit such dependencies. We present a novel inter-sentence relation extraction model that builds a labelled edge graph convolutional neural network model on a document-level graph. The graph is constructed using various inter- and intra-sentence dependencies to capture local and non-local dependency information. In order to predict the relation of an entity pair, we utilise multi-instance learning with bi-affine pairwise scoring. Experimental results show that our model achieves comparable performance to the state-of-the-art neural models on two biochemistry datasets. Our analysis shows that all the types in the graph are effective for inter-sentence relation extraction.

2018

ACL

A Walk-based Model on Entity Graphs for Relation Extraction

Fenia Christopoulou, Makoto Miwa, and Sophia Ananiadou

In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Jul 2018

We present a novel graph-based neural network model for relation extraction. Our model treats multiple pairs in a sentence simultaneously and considers interactions among them. All the entities in a sentence are placed as nodes in a fully-connected graph structure. The edges are represented with position-aware contexts around the entity pairs. In order to consider different relation paths between two entities, we construct up to l-length walks between each pair. The resulting walks are merged and iteratively used to update the edge representations into longer walk representations. We show that the model achieves performance comparable to the state-of-the-art systems on the ACE 2005 dataset without using any external tools.

ICSC

Mixture of Topic-Based Distributional Semantic and Affective Models

Fenia Christopoulou, Eleftheria Briakou, Elias Iosif, and 1 more author

In Proceedings of the 12th IEEE International Conference on Semantic Computing (ICSC), Jan 2018

Typically, Distributional Semantic Models (DSMs) estimate semantic similarity between words using a single model, where the multiple senses of polysemous words are conflated in a single representation. Similarly, in textual affective analysis tasks, ambiguous words are usually not treated differently when estimating word affective scores. In this work, a semantic mixture model is proposed, enabling the combination of word similarity scores estimated across multiple topic-specific DSMs (TDSMs). Based on the assumption that semantic similarity implies affective similarity, we extend this model to perform sentence-level affect estimation. The proposed model outperforms the baseline approach, achieving state-of-the-art results for semantic similarity estimation and sentence-level polarity detection.

2016

SemEval

Tweester at SemEval-2016 Task 4: Sentiment Analysis in Twitter Using Semantic-Affective Model Adaptation

Elisavet Palogiannidi, Athanasia Kolovou, Fenia Christopoulou, and 6 more authors

In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval), Jun 2016

We describe our submission to SemEval-2016 Task 4: Sentiment Analysis in Twitter. The proposed system ranked first for sub-task B. Our system comprises multiple independent models such as neural networks, semantic-affective models and topic modeling that are combined in a probabilistic way. The novelty of the system is the employment of a topic modeling approach in order to adapt the semantic-affective space for each tweet. In addition, significant enhancements were made in the main system dealing with the data pre-processing and feature extraction, including the employment of word embeddings. Each model is used to predict a tweet’s sentiment (positive, negative or neutral) and a late fusion scheme is adopted for the final decision.

Sentence-level Sentiment Analysis using Topic Modeling

Fenia Christopoulou

Jun 2016
