publications
Check my Google Scholar profile for an updated list.
2025
- Increasing trust in AI through privacy preservation and model explainability: Federated Learning of Fuzzy Regression Trees. Information Fusion, 2025.
Federated Learning (FL) lets multiple data owners collaborate in training a global model without any violation of data privacy, which is a crucial requirement for enhancing users’ trust in Artificial Intelligence (AI) systems. Despite the significant momentum recently gained by the FL paradigm, most of the existing approaches in the field neglect another key pillar for the trustworthiness of AI systems, namely explainability. In this paper, we propose a novel approach for FL of fuzzy regression trees (FRTs), which are generally acknowledged as highly interpretable by-design models. The proposed FL procedure is designed for the scenario of horizontally partitioned data and is based on the transmission of aggregated statistics from the clients to a central server for the tree induction procedure. It is shown that the proposed approach faithfully approximates the ideal case in which the tree induction algorithm is applied on the union of all local datasets, while still ensuring privacy preservation. Furthermore, the FL approach brings benefits, in terms of generalization capability, compared to the local learning setting in which each participant learns its own FRT based only on the private, local, dataset. The adoption of linear models in the leaf nodes ensures a competitive level of performance, as assessed by an extensive experimental campaign on benchmark datasets. The analysis of the results covers both the aspects of accuracy and interpretability of FRTs. Finally, we discuss the application of the proposed federated FRT to the task of Quality of Experience forecasting in an automotive case study.
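To give a concrete flavour of tree induction from aggregated statistics only, here is a minimal sketch (names ours, with a crisp split standing in for the paper's fuzzy partitions): each client sends per-split count/sum/sum-of-squares summaries, and the server recovers the pooled child-node variance needed to score candidate splits, without ever seeing raw records.

```python
import numpy as np

def local_split_stats(X, y, feature, threshold):
    """Client side: summaries for one candidate split.
    Only counts and sums leave the client, never raw records.
    (Crisp split for brevity; the paper works with fuzzy partitions.)"""
    mask = X[:, feature] <= threshold
    stats = {}
    for side, idx in (("left", mask), ("right", ~mask)):
        stats[side] = (idx.sum(), y[idx].sum(), (y[idx] ** 2).sum())
    return stats

def pooled_child_variance(client_stats, side):
    """Server side: variance of a child node from the clients' summaries,
    identical to what centralized induction would compute."""
    n = sum(s[side][0] for s in client_stats)
    if n == 0:
        return float("inf")  # empty child: rule the split out
    s1 = sum(s[side][1] for s in client_stats)
    s2 = sum(s[side][2] for s in client_stats)
    return s2 / n - (s1 / n) ** 2  # E[y^2] - (E[y])^2
```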
2024
- Trustworthy AI in Heterogeneous Settings: Federated Learning of Explainable Classifiers. In IEEE International Conference on Fuzzy Systems, 2024.
Trustworthy Artificial Intelligence (AI) has gained significant relevance worldwide. Federated Learning (FL) and eXplainable Artificial Intelligence (XAI) are two among the most relevant paradigms for accomplishing the requirements of trustworthy AI-based applications. On the one hand, FL guarantees data privacy throughout the collaborative learning of an AI model from decentralized data. On the other hand, XAI models ensure transparency, accountability, and trust in AI-based systems by providing understandable explanations for their predictions and decisions. To the best of our knowledge, only a few works have explored the combination of FL with inherently explainable models, especially for classification tasks. In this work, we investigate FL of explainable classifiers, namely Fuzzy Rule-based Classifiers. In the proposed FL scheme, each participant creates its own set of classification rules from its own local training data, resorting to a simple procedure that generates a rule for each training instance. Local rules are sent to a central server which is in charge of aggregating them by removing duplicates and solving conflicts. The aggregated set of rules is then forwarded to the single participants for inference purposes. In our experimental analysis we consider two real-world case studies focusing on heterogeneous settings, namely non-IID (Independent and Identically Distributed) scenarios. Our FL scheme offers significant advantages in terms of classification performance to the participants in the federation, while preserving data privacy.
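A minimal sketch of the server-side step described above, assuming each rule is an (antecedent, class label, weight) triple with the antecedent encoded as a hashable tuple of fuzzy-set indices. The conflict-resolution policy shown (keep the class with the largest total weight) is one plausible choice; the abstract does not fix it.

```python
from collections import defaultdict

def aggregate_rules(local_rule_sets):
    """Server side: merge client rule sets.

    Duplicate rules collapse into one; antecedents claimed by several
    classes are resolved by keeping the class with the highest total
    weight (an illustrative policy, not necessarily the paper's)."""
    votes = defaultdict(lambda: defaultdict(float))
    for rules in local_rule_sets:
        for antecedent, label, weight in rules:
            votes[antecedent][label] += weight
    merged = []
    for antecedent, by_class in votes.items():
        label, weight = max(by_class.items(), key=lambda kv: kv[1])
        merged.append((antecedent, label, weight))
    return merged
```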
- Consistent Post-Hoc Explainability in Federated Learning through Federated Fuzzy Clustering. In IEEE International Conference on Fuzzy Systems, 2024.
Ensuring trustworthiness of AI systems by enforcing, for instance, data privacy and model explainability, has become urgent in our society. Recently, the Federated Learning (FL) paradigm has been proposed to preserve data privacy during collaborative model learning. Unfortunately, FL poses critical challenges in the application of post-hoc explanation methods which are used to explain opaque models such as neural networks. In this paper we present an approach for enhancing the explainability of opaque models generated according to the FL paradigm. We focus on one of the most popular methods, namely the SHapley Additive exPlanations (SHAP) method. Given an input instance, SHAP can explain why an opaque model generated that specific output prediction from the input values. To provide the explanation SHAP needs access to a background dataset, typically consisting of representative training instances. In the FL setting, however, the training data are scattered over multiple participants and cannot be shared due to privacy constraints. On the other hand, the background dataset should be representative of the overall training set. To this aim, we propose to adopt a federated Fuzzy C-Means clustering for the generation of a common background dataset made up of cluster centers. The resulting background dataset is representative of the actual distribution of the data and can be made available to all participants without violating privacy, thus ensuring accuracy and consistency of the explanations. A thorough experimental analysis shows the validity of the proposed approach also in comparison with baseline and alternative approaches.
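A minimal sketch of the idea, assuming the federated Fuzzy C-Means run has already produced the shared cluster centers: they are fed to the standard shap.KernelExplainer as the common background dataset. Because every participant uses the same background, the resulting explanations are consistent across the federation.

```python
import numpy as np
import shap  # pip install shap

def explain_with_shared_background(predict_fn, cluster_centers, instance):
    """Explain one prediction of an opaque model using federated cluster
    centers as the SHAP background dataset.

    `cluster_centers` is an (n_clusters, n_features) array, e.g. the output
    of a federated Fuzzy C-Means run; `predict_fn` is the opaque model's
    prediction function."""
    explainer = shap.KernelExplainer(predict_fn, cluster_centers)
    return explainer.shap_values(np.atleast_2d(instance))
```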
- Federated Learning of XAI models in healthcare: a case study on Parkinson's Disease. Cognitive Computation, 2024.
Artificial intelligence (AI) systems are increasingly used in healthcare applications, although some challenges have not been completely overcome to make them fully trustworthy and compliant with modern regulations and societal needs. First of all, sensitive health data, essential to train AI systems, are typically stored and managed in several separate medical centers and cannot be shared due to privacy constraints, thus hindering the use of all available information in learning models. Further, transparency and explainability of such systems are becoming increasingly urgent, especially at a time when “opaque” or “black-box” models are commonly used. Recently, technological and algorithmic solutions to these challenges have been investigated: on the one hand, federated learning (FL) has been proposed as a paradigm for collaborative model training among multiple parties without any disclosure of private raw data; on the other hand, research on eXplainable AI (XAI) aims to enhance the explainability of AI systems, either through interpretable by-design approaches or post-hoc explanation techniques. In this paper, we focus on a healthcare case study, namely predicting the progression of Parkinson’s disease, and assume that raw data originate from different medical centers and data collection for centralized training is precluded due to privacy limitations. We aim to investigate how FL of XAI models can allow achieving a good level of accuracy and trustworthiness. Cognitive and biologically inspired approaches are adopted in our analysis: FL of an interpretable by-design fuzzy rule-based system and FL of a neural network explained using a federated version of the SHAP post-hoc explanation technique. We analyze accuracy, interpretability, and explainability of the two approaches, also varying the degree of heterogeneity across several data distribution scenarios. Although the neural network is generally more accurate, the results show that the fuzzy rule-based system achieves competitive performance in the federated setting and presents desirable properties in terms of interpretability and transparency.
- Federated c-means and Fuzzy c-means Clustering Algorithms for Horizontally and Vertically Partitioned Data. IEEE Transactions on Artificial Intelligence, 2024.
Federated clustering lets multiple data owners collaborate in discovering patterns from distributed data without violating privacy requirements. The federated versions of traditional clustering algorithms proposed so far are, however, “lossy” since they fail to identify exactly the same clusters as the original versions executed on the merged data stored in a centralized server, as would happen if no privacy constraint occurred. In this paper, we propose federated procedures for losslessly executing the C-Means (CM) and the Fuzzy C-Means (FCM) algorithms in both horizontally and vertically partitioned data scenarios, while preserving data privacy. We formally prove that the proposed federated procedures identify the same clusters determined by applying the algorithms to the union of all local data. Further, we present an extensive experimental analysis for characterizing the behavior of the proposed approach in a typical federated learning scenario, that is, as the fraction of participants in the federation changes. We focus on the federated FCM and the horizontally partitioned data, which is the most interesting scenario. We show that the proposed procedure is effective and is able to achieve competitive performance with respect to two recently proposed versions of federated FCM for horizontally partitioned data.
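For the horizontally partitioned FCM case, the lossless property follows from the form of the centroid update: each client returns only membership-weighted sums, and their aggregation reproduces the centralized update exactly. A compact sketch of one iteration (Euclidean distance, names ours; initialization and stopping logic omitted):

```python
import numpy as np

def client_round(X, centers, m=2.0):
    """Client side: compute fuzzy memberships for the local data and return
    only the aggregated numerator/denominator of the centroid update
    (no raw points are disclosed)."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)                       # avoid division by zero
    u = 1.0 / (d ** (2.0 / (m - 1.0)))
    u /= u.sum(axis=1, keepdims=True)           # memberships, rows sum to 1
    um = u ** m
    return um.T @ X, um.sum(axis=0)             # (c, f) numerators, (c,) denominators

def server_update(partials):
    """Server side: summing the clients' statistics yields exactly the
    centralized FCM centroid update, hence the lossless property."""
    num = sum(p[0] for p in partials)
    den = sum(p[1] for p in partials)
    return num / den[:, None]
```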
2023
- Enabling federated learning of explainable AI models within beyond-5G/6G networks. Bárcena, J. L. Corcuera, Ducange, P., Marcelloni, F., Nardini, G., Noferi, A., Renda, A., Ruffini, F., Schiavo, A., Stea, G., and Virdis, A. Computer Communications, 2023.
The quest for trustworthiness in Artificial Intelligence (AI) is increasingly urgent, especially in the field of next-generation wireless networks. Future Beyond 5G (B5G)/6G networks will connect a huge amount of devices and will offer innovative services empowered with AI and Machine Learning tools. Nevertheless, private user data, which are essential for training such services, are not an asset that can be unrestrictedly shared over the network, mainly because of privacy concerns. To overcome this issue, Federated Learning (FL) has recently been proposed as a paradigm to enable collaborative model training among multiple parties, without any disclosure of private raw data. However, the initiative to natively integrate FL services into mobile networks is still far from being accomplished. In this paper we propose a novel FL-as-a-Service framework that provides the B5G/6G network with flexible mechanisms to allow end users to exploit FL services, and we describe its applicability to a Quality of Experience (QoE) forecasting service based on a vehicular networking use case. Specifically, we show how FL of eXplainable AI (XAI) models can be leveraged for the QoE forecasting task, and induces a benefit in terms of both accuracy, compared to local learning, and trustworthiness, thanks to the adoption of inherently interpretable models. Such considerations are supported by an extensive experimental analysis on a publicly available simulated dataset. Finally, we assess how the learning process is affected by the system deployment and the performance of the underlying communication and computation infrastructure, through system-level simulations, which show the benefits of deploying the proposed framework in edge-based environments.
- OpenFL-XAI: Federated learning of explainable artificial intelligence models in Python. SoftwareX, 2023.
Artificial Intelligence (AI) systems play a significant role in manifold decision-making processes in our daily lives, making trustworthiness of AI more and more crucial for its widespread acceptance. Among others, privacy and explainability are considered key requirements for enabling trust in AI. Building on these needs, we propose a software tool for Federated Learning (FL) of Rule-Based Systems (RBSs): on the one hand, FL prioritizes user data privacy during collaborative model training; on the other hand, RBSs are deemed interpretable-by-design models and ensure high transparency in the decision-making process. The proposed software, developed as an extension to the Intel® OpenFL open-source framework, offers a viable solution for developing AI applications balancing accuracy, privacy, and interpretability.
- The Hexa-X Project Vision on Artificial Intelligence and Machine Learning-Driven Communication and Computation Co-Design for 6G. Merluzzi, Mattia, Borsos, Tamás, Rajatheva, Nandana, Benczúr, András A., Farhadi, Hamed, Yassine, Taha, Müeck, Markus Dominik, Barmpounakis, Sokratis, Strinati, Emilio Calvanese, Dampahalage, Dilin, Demestichas, Panagiotis, Ducange, Pietro, Filippou, Miltiadis C., Baltar, Leonardo Gomes, Haraldson, Johan, Karaçay, Leyli, Korpi, Dani, Lamprousi, Vasiliki, Marcelloni, Francesco, Mohammadi, Jafar, Rajapaksha, Nuwanthika, Renda, A., and Uusitalo, Mikko A. IEEE Access, 2023.
This paper provides an overview of the most recent advancements and outcomes of the European 6G flagship project Hexa-X, on the topic of in-network Artificial Intelligence (AI) and Machine Learning (ML). We first present a general introduction to the project and its ambitions in terms of use cases (UCs), key performance indicators (KPIs), and key value indicators (KVIs). Then, we identify the key challenges to realize, implement, and enable the native integration of AI and ML in 6G, both as a means for designing flexible, low-complexity, and reconfigurable networks (learning to communicate), and as an intrinsic in-network intelligence feature (communicating to learn, or 6G as an efficient AI/ML platform). We present a high-level description of down-selected technical enablers and their implications for the identified Hexa-X UCs, KPIs, and KVIs. Our solutions cover lower layer aspects, including channel estimation, transceiver design, power amplifier and distributed MIMO related challenges, and higher layer aspects, including AI/ML workload management and orchestration, as well as distributed AI. The latter entails Federated Learning and explainability as means for privacy-preserving and trustworthy AI. To bridge the gap between the technical enablers and the 6G targets, some representative numerical results accompany the high-level description. Overall, the methodology of the paper starts from the UCs and KPIs/KVIs, to then focus on the proposed technical solutions able to realize them. Finally, a brief discussion of the ongoing regulation activities related to AI is presented, to close our vision towards an AI and ML-driven communication and computation co-design for 6G.
- Federated TSK Models for Predicting Quality of Experience in B5G/6G Networks. In 2023 IEEE International Conference on Fuzzy Systems (FUZZ), 2023.
Real-time applications based on streaming data collected from remote devices, such as smartphones and vehicles, are commonly developed using Artificial Intelligence (AI). Such applications must fulfill different requirements: on one hand, they must ensure good performance and must deliver results in a timely manner; on the other hand, with the objective of being compliant with the AI-specific regulations, they shall preserve data privacy and guarantee a certain level of explainability. In this paper, we describe an AI-based application to predict the Quality of Experience (QoE) for videos acquired by moving vehicles from Beyond 5G and 6G (B5G/6G) network data. To this aim, we exploit a Takagi-Sugeno-Kang (TSK) fuzzy model learned by employing a federated approach, thus meeting, simultaneously, the requests for explainability and data privacy preservation. A thorough experimental analysis, involving also the comparison with an opaque baseline (i.e., a neural network model), is presented and shows that the TSK model can be regarded as a viable solution which guarantees, on the one hand, an optimal trade-off between interpretability and accuracy and, on the other hand, preserves data privacy.
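For context, a first-order TSK model produces its output as a firing-strength-weighted average of per-rule linear models; a generic inference sketch follows (Gaussian antecedents are an assumption here, and the paper's federated training procedure is not reproduced).

```python
import numpy as np

def tsk_predict(x, rules):
    """Generic first-order TSK inference.

    Each rule is (centers, sigmas, coeffs, bias): Gaussian antecedents,
    one per feature, paired with a linear consequent. The output is the
    firing-strength-weighted average of the local linear models."""
    num = den = 0.0
    for centers, sigmas, coeffs, bias in rules:
        w = np.prod(np.exp(-0.5 * ((x - centers) / sigmas) ** 2))  # firing strength
        num += w * (coeffs @ x + bias)
        den += w
    return num / den
```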
- An Application for Federated Learning of XAI Models in Edge Computing Environments. In 2023 IEEE International Conference on Fuzzy Systems (FUZZ), 2023.
The next generation of wireless networks will feature an increasing number of connected devices, which will produce an unprecedented volume of data. Knowledge extraction from decentralized data imposes the exploitation of computing and learning paradigms able to tame the complexity of the network and meet the growing requirement of trustworthiness. In this regard, edge computing overcomes the limitations of cloud computing by moving virtualized computing and storage resources closer to data sources. Furthermore, Federated Learning has been recently proposed as a means to let multiple parties collaboratively train an ML model without disclosing private data. In this paper, we propose an application that enables Federated Learning of eXplainable AI models (Fed-XAI) in an edge computing environment. The proposal represents a step forward towards the adoption of trustworthy AI in next generation wireless networks, ensuring both privacy preservation and explainability. The application components are described, along with the workflow for the training and inference stages. Finally, we discuss the application deployment, in a simulated setting, for addressing a task of video streaming Quality of Experience forecasting in a vehicular network case study.
- Federated Learning of Explainable Artificial Intelligence Models for Predicting Parkinson's Disease Progression. In Explainable Artificial Intelligence, 2023.
Services based on Artificial Intelligence (AI) are becoming increasingly pervasive in our society. At the same time, however, we are also witnessing a growing awareness towards the ethical aspects and the trustworthiness of AI tools, especially in high-stakes domains, such as healthcare. In this paper, we propose the adoption of AI techniques for predicting Parkinson's Disease progression with the overarching aim of accommodating the urgent need for trustworthiness. We address two key requirements towards trustworthy AI, namely privacy preservation in learning AI models and their explainability. As for the former aspect, we consider the (rather common) case of medical data coming from different health institutions, assuming that they cannot be shared due to privacy concerns. To address this shortcoming, we leverage federated learning (FL) as a paradigm for collaborative model training among multiple parties without any disclosure of private raw data. As for the latter aspect, we focus on highly interpretable models, i.e., those for which humans are able to understand how decisions have been taken. An extensive experimental analysis carried out on a well-known Parkinson Telemonitoring dataset highlights how the proposed approach based on FL of fuzzy rule-based systems allows achieving, simultaneously, data privacy and interpretability. Results are reported for different data partitioning scenarios, also comparing the interpretable-by-design model with an opaque neural network model.
- Federated Learning of Explainable Artificial Intelligence Models: A Proof-of-Concept for Video-streaming Quality Forecasting in B5G/6G networks. Bárcena, J. L. Corcuera, Daole, M., Ducange, P., Marcelloni, F., Nardini, G., Renda, A., and Stea, G. In xAI-2023 Late-breaking Work, Demos and Doctoral Consortium Joint Proceedings, 2023.
The next generation of mobile networks is poised to rely extensively on Artificial Intelligence (AI) to deliver innovative services. However, it is crucial for AI systems to fulfill key requirements such as trustworthiness, inclusiveness, and sustainability. Starting from these requirements, we proposed Federated Learning of eXplainable AI (Fed-XAI) models within the Hexa-X EU Flagship Project for 6G. This paper focuses on the implementation of a real-time testbed, serving as a proof of concept for the Fed-XAI paradigm. The testbed utilizes genuine applications and real devices that interact with a mobile network, emulated using the Simu5G simulator. Its primary objective is to provide explainable predictions regarding video-streaming quality in an automotive scenario.
- Trustworthy AI for Next Generation Networks: the Fed-XAI innovative paradigm from the Hexa-X EU Flagship Project. Ducange, P., Marcelloni, F., Micheli, D., Nardini, G., Renda, A., Sabella, D., Stea, G., and Virdis, A. In Proceedings of the Italia Intelligenza Artificiale - Thematic Workshops co-located with the 3rd CINI National Lab AIIS Conference on Artificial Intelligence (Ital-IA 2023), 2023.
This work presents the joint research activities on AI in and for 6G carried out by University of Pisa, Intel Corporation Italia s.p.a. and Telecom Italia s.p.a., within the Hexa-X EU project. Specifically, we focus on Federated Learning of Explainable Artificial Intelligence (Fed-XAI), which has been recently awarded as a key innovation by the EU Innovation Radar. We present the main recent achievements, which can be summarised as: algorithms for generating federated XAI models in a privacy-preserving environment, a communication framework for Federated-Learning-as-a-Service, and algorithms for orchestrating federated learning participants. Finally, we discuss a proof of concept that showcases the aforementioned Fed-XAI components.
- Experimental Assessment of Heterogeneous Fuzzy Regression Trees. 2023.
Fuzzy Regression Trees (FRTs) are widely acknowledged as highly interpretable ML models, capable of dealing with noise and/or uncertainty thanks to the adoption of fuzziness. The accuracy of FRTs, however, strongly depends on the polynomial function adopted in the leaf nodes. Indeed, their modelling capability increases with the order of the polynomial, albeit at the cost of greater complexity and reduced interpretability. In this paper we introduce the concept of Heterogeneous FRT: the order of the polynomial function is selected for each leaf node and can lead either to a zero-order or a first-order approximation. In our experimental assessment, the percentage of the two approximation orders is varied to cover the whole spectrum from pure zero-order to pure first-order FRTs, thus allowing an in-depth analysis of the trade-off between accuracy and interpretability. We present and discuss the results in terms of accuracy and interpretability obtained by the corresponding FRTs on nine benchmark datasets.
2022
- Increasing Accuracy and Explainability in Fuzzy Regression Trees: An Experimental Analysis. In 2022 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2022.
Regression Trees (RTs) have been widely used in the last decades in various domains, also thanks to their inherent explainability. Fuzzy RTs (FRTs) extend RTs by using fuzzy sets and have proven to be particularly suitable for dealing with noisy and/or uncertain environments. The modelling capability of FRTs depends, among other factors, on the model used in the leaves for determining the output, and on the inference strategy. Nevertheless, the impact of such factors on FRTs' accuracy and explainability has not been adequately investigated. In this paper, we extend a recently proposed learning scheme for FRTs by employing both linear models in the leaves and the maximum matching inference strategy. The former extension aims to increase accuracy, and the latter to improve explainability. We carried out an extensive experimental analysis by comparing the four FRT versions corresponding to any possible combination of the two extensions introduced in the paper. The results show that the best trade-off between accuracy and explainability is obtained by employing both of them.
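The two inference strategies can be contrasted in a few lines, assuming the tree has already produced, for a given input, a membership degree and a leaf prediction (e.g., the output of the leaf's linear model) for each leaf; variable names are ours.

```python
import numpy as np

def weighted_average_inference(memberships, leaf_predictions):
    """Blend all activated leaves, weighting by membership degree."""
    w = np.asarray(memberships)
    return w @ np.asarray(leaf_predictions) / w.sum()

def maximum_matching_inference(memberships, leaf_predictions):
    """Answer with the single most activated leaf, so the prediction is
    explained by exactly one rule path (the explainability gain)."""
    return leaf_predictions[int(np.argmax(memberships))]
```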
- An Approach to Federated Learning of Explainable Fuzzy Regression Models. In 2022 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2022.
Federated Learning (FL) has been proposed as a privacy-preserving paradigm for collaboratively training AI models: in an FL scenario data owners learn a shared model by aggregating locally-computed partial models, with no need to share their raw data with other parties. Although FL is extensively studied today, only a few works have discussed federated approaches to generate explainable AI (XAI) models. In this context, we propose an FL approach to learn Takagi-Sugeno-Kang Fuzzy Rule-based Systems (TSK-FRBSs), which can be considered as XAI models in regression problems. In particular, a number of independent data owner nodes participate in the learning process, where each of them generates its own local TSK-FRBS by exploiting an ad-hoc defined procedure. Then, these models are forwarded to a server that is responsible for aggregating them and generating a global TSK-FRBS, which is sent back to the nodes. An appropriate aggregation strategy is proposed to preserve the explainability of the global TSK-FRBS. A thorough experimental analysis highlights that the proposed approach brings benefits, in terms of accuracy, to data owners participating in the federation, while preserving the privacy of the data. Indeed, the accuracy achieved by the global TSK-FRBS is higher than that of the TSK-FRBSs learned by exploiting only local training data.
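One plausible reading of the aggregation step, sketched below: client rule bases are merged, and rules sharing an antecedent are fused by support-weighted averaging of their linear consequents. This is an assumption for illustration; the paper's actual strategy, designed to preserve explainability, may differ in detail.

```python
from collections import defaultdict
import numpy as np

def aggregate_tsk_rules(local_rule_sets):
    """Server side: merge client TSK rule bases.

    A rule is (antecedent, consequent, support), with the antecedent a
    hashable tuple of fuzzy-set indices and the consequent the linear
    model's coefficient vector. Rules sharing an antecedent are fused by
    support-weighted averaging of consequents (illustrative policy)."""
    groups = defaultdict(list)
    for rules in local_rule_sets:
        for antecedent, consequent, support in rules:
            groups[antecedent].append((np.asarray(consequent), support))
    merged = []
    for antecedent, items in groups.items():
        total = sum(s for _, s in items)
        consequent = sum(c * s for c, s in items) / total
        merged.append((antecedent, consequent, total))
    return merged
```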
- Fed-XAI: Federated Learning of Explainable Artificial Intelligence Models. Bárcena, J. L. Corcuera, Daole, M., Ducange, P., Marcelloni, F., Renda, A., Ruffini, F., and Schiavo, A. In XAI.it 2022, the 3rd Italian Workshop on Explainable Artificial Intelligence, 2022.
The current era is characterized by an increasing pervasiveness of applications and services based on data processing and often built on Artificial Intelligence (AI) and, in particular, Machine Learning (ML) algorithms. In fact, extracting insights from data has become so common in the daily life of individuals, companies, and public entities, and so relevant for market players, that it is now an important matter of interest for institutional organizations. The theme is so relevant that ad hoc regulations have been proposed. One important aspect is given by the capability of the applications to tackle the data privacy issue. Additionally, depending on the specific application field, paramount importance is given to the possibility for humans to understand why a certain AI/ML-based application is providing that specific output. In this paper, we discuss the concept of Federated Learning of eXplainable AI (XAI) models, in short FED-XAI, purposely designed to address these two requirements simultaneously. AI/ML models are trained with the simultaneous goals of preserving the data privacy (Federated Learning (FL) side) and ensuring a certain level of explainability of the system (XAI side). We first introduce the motivations at the foundation of FL and XAI, along with their basic concepts; then, we discuss the current status of this field of study, providing a brief survey regarding approaches, models, and results. Finally, we highlight the main future challenges.
- Federated Learning of Explainable AI Models in 6G Systems: Towards Secure and Automated Vehicle Networking. Renda, A., Ducange, P., Marcelloni, F., Sabella, D., Filippou, M.C., Nardini, G., Stea, G., Virdis, A., Micheli, D., Rapone, D., and Baltar, L.G. Information, 2022.
This article presents the concept of federated learning (FL) of eXplainable Artificial Intelligence (XAI) models as an enabling technology in advanced 5G towards 6G systems and discusses its applicability to the automated vehicle networking use case. Although the FL of neural networks has been widely investigated exploiting variants of stochastic gradient descent as the optimization method, it has not yet been adequately studied in the context of inherently explainable models. On the one hand, XAI permits improving the user experience of the offered communication services by helping end users trust (by design) that in-network AI functionality issues appropriate action recommendations. On the other hand, FL ensures security and privacy of both vehicular and user data across the whole system. These desiderata are often ignored in existing AI-based solutions for wireless network planning, design and operation. In this perspective, the article provides a detailed description of relevant 6G use cases, with a focus on vehicle-to-everything (V2X) environments: we describe a framework to evaluate the proposed approach involving online training based on real data from live networks. FL of XAI models is expected to bring benefits as a methodology for achieving seamless availability of decentralized, lightweight and communication efficient intelligence. Impacts of the proposed approach (including standardization perspectives) consist in a better trustworthiness of operations, e.g., via explainability of quality of experience (QoE) predictions, along with security and privacy-preserving management of data from sensors, terminals, users and applications.
- Pervasive Artificial Intelligence in Next Generation Wireless: The Hexa-X Project Perspective. M.C. Filippou, et al. In First International Workshop on Artificial Intelligence in beyond 5G and 6G Wireless Networks (AI6G 2022), 2022.
The European 6G flagship project Hexa-X has the objective of conducting exploratory research on the next generation of mobile networks, with the intention of connecting human, physical and digital worlds with a fabric of technology enablers. Within this scope, one of the main research challenges is the ambition for beyond 5G (B5G)/6G systems to support, enhance and enable real-time trustworthy control by transforming Artificial Intelligence (AI)/Machine Learning (ML) technologies into a vital and trusted tool for large-scale deployment of interconnected intelligence available to the wider society. Hence, the study and development of concepts and solutions enabling AI-driven communication and computation co-design for a B5G/6G communication system is required. This paper focuses on describing the possibilities that emerge with the application of AI/ML mechanisms to 6G networks, identifying the resulting challenges and proposing some potential solution approaches.
- Towards Trustworthy AI for QoE prediction in B5G/6G Networks. Bárcena, J. L. Corcuera, Ducange, P., Marcelloni, F., Nardini, G., Noferi, A., Renda, A., Stea, G., and Virdis, A. In First International Workshop on Artificial Intelligence in beyond 5G and 6G Wireless Networks (AI6G 2022), 2022.
The ability to forecast Quality of Experience (QoE) metrics will be crucial in several applications and services offered by the future B5G/6G networks. However, QoE time-series forecasting has not been adequately investigated so far, mainly due to the lack of available realistic datasets. In this paper, we first present a novel QoE forecasting dataset obtained from realistic 5G network simulations and characterized by Quality of Service (QoS) and QoE metrics for a video-streaming application; then, we embrace the topical challenge of trustworthiness in the adoption of AI systems for tackling the QoE prediction task. We show how an eXplainable Artificial Intelligence (XAI) model, namely Decision Tree, can be effectively leveraged for addressing the forecasting problem. Finally, we identify federated learning as a suitable paradigm for privacy-preserving collaborative model training and outline the related challenges from both an algorithmic and 6G network support perspective.
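As an illustration of the forecasting setup (synthetic data and hyperparameters are ours, not the paper's dataset or configuration), a decision tree can be trained on lagged values of a QoE/QoS series as follows.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def lagged_features(series, n_lags=5):
    """Turn a time series into a supervised forecasting problem:
    predict the next value from the previous n_lags values."""
    X = np.column_stack(
        [series[i:len(series) - n_lags + i] for i in range(n_lags)]
    )
    y = series[n_lags:]
    return X, y

# Hypothetical usage on a synthetic series standing in for QoS/QoE metrics.
series = np.sin(np.linspace(0, 20, 500)) + 0.1 * np.random.randn(500)
X, y = lagged_features(series)
model = DecisionTreeRegressor(max_depth=4).fit(X[:400], y[:400])
print(model.score(X[400:], y[400:]))  # R^2 on the held-out tail
```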
- Online Monitoring of Stance from Tweets: The case of Green Pass in Italy. In 2022 IEEE International Conference on Evolving and Adaptive Intelligent Systems (EAIS), 2022.
- A News-Based Framework for Uncovering and Tracking City Area Profiles: Assessment in Covid-19 Setting. ACM Transactions on Knowledge Discovery from Data, 2022.
In recent years, there has been ever-increasing interest in profiling various aspects of city life, especially in the context of smart cities. This interest has become even more relevant recently, when we have realised how dramatic events, such as the Covid-19 pandemic, can deeply affect city life, producing drastic changes. Identifying and analyzing such changes, both at the city level and within single neighborhoods, may be a fundamental tool to better manage the current situation and provide sound strategies for future planning. Furthermore, such fine-grained and up-to-date characterization can represent a valuable asset for other tools and services, e.g., web mapping applications or real estate agency platforms. In this paper, we propose a framework featuring a novel methodology to model and track changes in areas of the city by extracting information from online newspaper articles. The problem of uncovering clusters of news at specific times is tackled by means of the joint use of state-of-the-art language models to represent the articles, and of a density-based streaming clustering algorithm, properly shaped to deal with high-dimensional text embeddings. Further, we propose a method to automatically label the obtained clusters in a semantically meaningful way, and we introduce a set of metrics aimed to track the temporal evolution of clusters. A case study focusing on the city of Rome during the Covid-19 pandemic is illustrated and discussed to evaluate the effectiveness of the proposed approach.
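A static stand-in for the pipeline described above: a sentence-level language model embeds the articles and a density-based algorithm groups them. The paper uses a streaming clustering algorithm shaped for high-dimensional embeddings and applies it over time; plain DBSCAN and the model/parameters chosen here are assumptions for illustration.

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import DBSCAN

def cluster_articles(texts):
    """Embed news articles with a sentence-level language model, then
    group them by density; label -1 marks noise/outlier articles."""
    embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(texts)
    return DBSCAN(eps=0.4, min_samples=3, metric="cosine").fit_predict(embeddings)
```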
- Responsible Artificial Intelligence as a Driver of Innovation in Society and Industry. 2022.
We describe the most recent research activities of the Artificial Intelligence R&D (AI-RD) group of the Department of Information Engineering of the University of Pisa. The group includes the authors of this contribution (one full professor, two associate professors, and two researchers) along with some PhD students. We also give a glimpse of some of the projects in which the group is involved. Specifically, we focus on those projects where AI is exploited as a driver of innovation in society and industry, especially pointing out some features that led us to label AI as "Responsible".
2021
- A federated fuzzy c-means clustering algorithm. In International Workshop on Fuzzy Logic and Applications, 2021.
Traditional clustering algorithms require data to be centralized on a single machine or in a datacenter. Due to privacy issues and traffic limitations, in several real applications data cannot be transferred, thus hampering the effectiveness of traditional clustering algorithms, which can operate only on locally stored data. In the last years a new paradigm has been gaining popularity: Federated Learning (FL). FL enables the collaborative training of data mining models and, at the same time, preserves data locally at the data owners’ places, decoupling the ability to perform machine learning from the need to transfer data. In this context, we propose the federated version of the popular fuzzy c-means clustering algorithm. We first describe this version through pseudo-code and then demonstrate that the clusters obtained by the federated approach coincide with those generated by the classical algorithm executed on the union of all the local datasets. We also present an analysis on how privacy is preserved. Finally, we show some experimental results on the performance of the federated version when only a subset of the clients is involved in the clustering process.
- XAI Models for Quality of Experience Prediction in Wireless Networks. In 2021 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2021.
Explainable Artificial Intelligence (XAI) is expected to play a key role in the design phase of next generation cellular networks. As 5G is being implemented and 6G is just in the conceptualization stage, it is increasingly clear that AI will be essential to manage the ever-growing complexity of the network. However, AI models will not only be required to deliver high levels of performance, but also high levels of explainability. In this paper we show how fuzzy models may be well suited to address this challenge. We compare fuzzy and classical decision tree models with a Random Forest (RF) classifier on a Quality of Experience classification dataset. The comparison suggests that, in our setting, fuzzy decision trees are easier to interpret and perform comparably to, or even better than, classical ones in identifying stall events in a video streaming application. The accuracy drop with respect to the RF classifier, which is considered to be a black-box ensemble model, is counterbalanced by a significant gain in terms of explainability.
- Addressing Event-Driven Concept Drift in Twitter Stream: A Stance Detection Application. IEEE Access, 2021.
The content posted by users on Social Networks represents an important source of information for a myriad of applications in the wide field known as ‘social sensing’. The Twitter platform in particular hosts the thoughts, opinions and comments of its users, expressed in the form of tweets: as a consequence, tweets are often analyzed with text mining and natural language processing techniques for relevant tasks, ranging from brand reputation and sentiment analysis to stance detection. In most cases the intelligent systems designed to accomplish these tasks are based on a classification model that, once trained, is deployed into the data flow for online monitoring. In this work we show how this approach turns out to be inadequate for the task of stance detection from tweets. In fact, the sequence of tweets that are collected every day represents a data stream. As it is well known in the literature on data stream mining, classification models may suffer from concept drift, i.e., a change in the data distribution that can potentially degrade performance. We present a broad experimental campaign for the case study of the online monitoring of the stance expressed on Twitter about the vaccination topic in Italy. We compare different learning schemes and propose a novel one, aimed at addressing the event-driven concept drift.
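A generic scheme in the family the paper compares is sliding-window retraining with test-then-train evaluation, sketched below; `train` is an abstract classifier factory (e.g., fitting a scikit-learn pipeline), `batches` yields (feature_matrix, labels) pairs, and the paper's event-driven variant is not reproduced here.

```python
import numpy as np

def windowed_monitoring(batches, train, window=4):
    """Sliding-window learning scheme for a drifting stream: evaluate the
    current model on each incoming labelled batch (test-then-train), then
    retrain on the most recent `window` batches so old concepts fade."""
    history, scores, model = [], [], None
    for X, y in batches:
        if model is not None:
            scores.append(model.score(X, y))  # accuracy before retraining
        history.append((X, y))
        recent = history[-window:]
        Xw = np.vstack([b[0] for b in recent])
        yw = np.concatenate([b[1] for b in recent])
        model = train(Xw, yw)
    return scores
```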
- Integration of Web-Scraped Data in CPM Tools: The Case of Project Sibilla. In Proceedings of Fifth International Congress on Information and Communication Technology, 2021.
Modern corporate performance management (CPM) systems are crucial tools for enterprises, but they typically lack a seamless integration with solutions in the Industry 4.0 domain for the exploitation of large amounts of data originated outside the enterprise boundaries. In this paper, we propose a solution to this problem, according to lessons learned in the development of project “Sibilla,” aimed at devising innovative tools in the business intelligence area. A proper software module is introduced with the purpose of enriching existing predictive analysis models with knowledge extracted from the Web and social networks. In particular, we describe how to support two functionalities: identification of planned real-world events and monitoring of public opinion on topics of interest to the company. The effectiveness of the proposed solution has been evaluated by means of a long-term experimental campaign.
- Mining the Stream of News for City Areas Profiling: a Case Study for the City of Rome. In 2021 IEEE International Conference on Smart Computing (SMARTCOMP), 2021.
Tracking and profiling changes in the occurrence of notable events in a city, in terms of what happens in the different areas and how possible changes are perceived, is an important issue in the context of smart cities: in fact, it may be helpful in developing applications to help administrations and citizens alike. In this paper, we propose an approach to provide time-sensitive snapshots of events within the different areas of a city, and the city as a whole. To probe inside neighborhoods and communities, we propose to use articles in online newspapers, as they represent an accessible source of information on what notable events actually happen, and on the most relevant topics at a given moment in time. We adopt an approach to group up articles by means of clustering, and to automatically assign labels to clusters by analyzing their content. The outcomes of this procedure, repeated along a certain timespan, are able to describe the temporal evolution of notable events in specific city areas. In this paper we show the effectiveness of the proposed methodology by reporting a case study for the city of Rome, over an investigation span of a few years, which also includes the Covid-19 pandemic period.
2020
- Stance Analysis of Twitter Users: the Case of the Vaccination Topic in Italy. IEEE Intelligent Systems, 2020.
People's opinion around social and political issues is currently witnessed by messages ordinarily posted on social media. With reference to a specific case study, namely the vaccination topic in Italy, this work discusses a crucial aspect in structuring the data processing pipeline in intelligent systems aimed at monitoring the public opinion through Twitter messages: a plain analysis of tweet contents is not sufficient to grasp the diversity of behavior across users. To get a sharper picture of the public opinion as expressed on social media, user-related information must be incorporated in the analysis. Relying on a dataset of tweets about vaccination and on an established text classification system, we present the results of a stance monitoring campaign with advanced analysis on temporal and spatial scales. The overall methodological workflow provides a sound solution for public opinion assessment from Twitter data.
- TSF-DBSCAN: a Novel Fuzzy Density-based Approach for Clustering Unbounded Data Streams. IEEE Transactions on Fuzzy Systems, 2020.
In recent years, several clustering algorithms have been proposed with the aim of mining knowledge from streams of data generated at a high speed by a variety of hardware platforms and software applications. Among these algorithms, density-based approaches have proved to be particularly attractive, thanks to their capability of handling outliers and capturing clusters with arbitrary shapes. The streaming setting poses additional challenges that need to be addressed as well: data streams are potentially unbounded and affected by concept drift, i.e., a modification over time in the underlying data generation process. In this paper, we propose Temporal Streaming Fuzzy DBSCAN (TSF-DBSCAN), a novel fuzzy clustering algorithm for streaming data. TSF-DBSCAN is an extension of the well-known DBSCAN algorithm, one of the most popular density-based clustering approaches. Fuzziness is introduced in TSF-DBSCAN to model the uncertainty about the distance threshold that defines the neighborhood of an object. As a consequence, TSF-DBSCAN identifies clusters with fuzzy overlapping borders. A fading model, which makes objects less relevant as they become more remote in time, endows TSF-DBSCAN with the capability of adapting to evolving data streams. The integration of the model in a two-stage approach ensures computational and memory efficiency: during the online stage continuously arriving objects are organized in proper data structures that are later exploited in the offline stage to determine a fine-grained partition. An extensive experimental analysis on synthetic and real-world datasets shows that TSF-DBSCAN yields competitive performance when compared to other clustering algorithms recently proposed for streaming data.
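Two ingredients of the approach lend themselves to a compact sketch: the temporal fading of object relevance and the fuzzy distance threshold that yields fuzzy overlapping borders. The exponential decay form and the linear membership below are common choices used here for illustration, not necessarily the paper's exact definitions.

```python
import numpy as np

def fading_weights(timestamps, now, lam=0.01):
    """Fading model: relevance decays with age, w = 2^(-lam * age).
    Objects whose weight drops below a threshold can be discarded,
    bounding memory on an unbounded stream."""
    return 2.0 ** (-lam * (now - np.asarray(timestamps)))

def fuzzy_neighborhood(distances, eps_min, eps_max):
    """Fuzzy distance threshold: full membership within eps_min, linearly
    decreasing to zero at eps_max, so cluster borders overlap fuzzily."""
    d = np.asarray(distances)
    return np.clip((eps_max - d) / (eps_max - eps_min), 0.0, 1.0)
```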
- FDBSCAN-APT: A Fuzzy Density-based Clustering Algorithm with Automatic Parameter Tuning. In 2020 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2020.
Density-based clustering algorithms represent a convenient approach when the number of clusters is not known in advance and their shapes are arbitrary. Nevertheless, they are highly sensitive to the input parameter setting, especially when clusters' borders are close to each other, or even overlap. In this paper we propose FDBSCAN-APT, a fuzzy extension of the DBSCAN algorithm. FDBSCAN-APT is able to discover clusters with fuzzy overlapping borders and relies on the automatic setting of input parameters thanks to the definition of a novel heuristic based on the statistical modelling of the density distribution of objects. An extensive experimental analysis carried out on synthetic datasets shows that FDBSCAN-APT always finds reasonable parameter configurations and produces good clustering results in a variety of challenging scenarios.
2019
- A Fuzzy Density-based Clustering Algorithm for Streaming Data. In 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2019.
The exploitation of data streams, nowadays provided nonstop by a myriad of diverse applications, asks for specific analysis methods. In this paper, we propose SF-DBSCAN, a fuzzy version of the DBSCAN algorithm, aimed at performing unsupervised analysis of streaming data. Fuzziness is introduced through fuzzy borders of density-based clusters. We describe and discuss the proposed algorithm, which evolves the clusters at each occurrence of a new object. Three synthetic datasets are used to show the ability of SF-DBSCAN to successfully track changes of data distribution, thus properly addressing concept drift. SF-DBSCAN is compared with a basic, crisp streaming version of DBSCAN with regard to modelling effectiveness.
- Monitoring the public opinion about the vaccination topic from tweets analysis. Expert Systems with Applications, 2019.
The paper presents an intelligent system to automatically infer trends in the public opinion regarding the stance towards the vaccination topic: it enables the detection of significant opinion shifts, which can possibly be explained by the occurrence of specific social context-related events. The Italian setting has been taken as the reference use case. The source of information exploited by the system is represented by the collection of vaccine-related tweets, fetched from Twitter according to specific criteria; subsequently, tweets undergo textual processing and a final classification step to detect the expressed stance towards vaccination (i.e., in favor, not in favor, and neutral). In tuning the system, we tested multiple combinations of different text representations and classification approaches: the best accuracy was achieved by the scheme that adopts the bag-of-words, with stemmed n-grams as tokens, for text representation and the support vector machine model for the classification. By presenting the results of a monitoring campaign lasting 10 months, we show that the system may be used to track and monitor the public opinion about vaccination decision making, in a low-cost, real-time, and quick fashion. Finally, we also verified that the proposed scheme for continuous tweet classification does not seem to suffer particularly from concept drift, considering the time span of the monitoring campaign.
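The best-performing scheme reported above, bag-of-words over stemmed n-grams plus an SVM, maps naturally onto a scikit-learn pipeline. The tokenizer, n-gram range, and SVM variant below are assumptions, not the paper's exact configuration.

```python
from nltk.stem.snowball import SnowballStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

stemmer = SnowballStemmer("italian")  # Italian tweets in this case study

def stem_tokens(text):
    """Whitespace tokenization plus stemming (simplified preprocessing)."""
    return [stemmer.stem(tok) for tok in text.split()]

# Bag-of-words over stemmed uni- and bi-grams, classified with a linear SVM.
pipeline = make_pipeline(
    CountVectorizer(tokenizer=stem_tokens, ngram_range=(1, 2)),
    LinearSVC(),
)
# Usage: pipeline.fit(train_tweets, train_stances); pipeline.predict(new_tweets)
```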
- Comparing ensemble strategies for deep learning: An application to facial expression recognition. Expert Systems with Applications, 2019.
Recent works have shown that Convolutional Neural Networks (CNNs), because of their effectiveness in feature extraction and classification tasks, are suitable tools to address the Facial Expression Recognition (FER) problem. Further, it has been pointed out how ensembles of CNNs allow improving classification accuracy. Nevertheless, a detailed experimental analysis on how ensembles of CNNs could be effectively generated in the FER context has not been performed yet, although it would have considerable value for improving the results obtained in the FER task. This paper aims to present an extensive investigation on different aspects of the ensemble generation, focusing on the factors that influence the classification accuracy in the FER context. In particular, we evaluate several strategies for the ensemble generation, different aggregation schemes, and the dependence upon the number of base classifiers in the ensemble. The final objective is to provide some indications for building up effective ensembles of CNNs. Specifically, we observed that exploiting different sources of variability is crucial for the improvement of the overall accuracy. To this aim, pre-processing and pre-training procedures are able to provide a satisfactory variability across the base classifiers, while the use of different seeds does not appear as an effective solution. Bagging ensures a high ensemble gain, but the overall accuracy is limited by poor-performing base classifiers. The impact of increasing the ensemble size specifically depends on the adopted strategy, but even in the best case the performance gain obtained by involving additional base classifiers becomes insignificant beyond a certain size, suggesting that very large ensembles should be avoided. Finally, the classic averaging voting proves to be an appropriate aggregation scheme, achieving accuracy values comparable to or slightly better than the other experimented operators.
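The averaging aggregation that the paper singles out is simply the mean of the members' class-probability outputs followed by an argmax; a minimal sketch:

```python
import numpy as np

def average_vote(prob_matrices):
    """Classic averaging aggregation for an ensemble: average the
    (n_samples, n_classes) probability matrices produced by the base
    CNNs, then pick the highest-probability class per sample."""
    return np.mean(prob_matrices, axis=0).argmax(axis=1)
```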
- Opinion and Job & Business Opportunity Mining. 2019.
The Data Mining and Machine Learning group of the Department of Information Engineering of the University of Pisa is currently working on the application of Artificial Intelligence techniques to various systems supporting corporate Business Intelligence. This document presents some of the project activities currently carried out by the group, focused on extracting knowledge from large amounts of data and making it available to software tools used by companies to support their business and to define the related marketing and development strategies. The data considered are mainly textual and can also be extracted from the Web and social networks.
- Assessing Accuracy of Ensemble Learning for Facial Expression Recognition with CNNs. 2019.
Automatic facial expression recognition has recently attracted the interest of researchers in the field of computer vision and deep learning. Convolutional Neural Networks (CNNs) have proved to be an effective solution for feature extraction and classification of emotions from facial images. Further, ensembles of CNNs are typically adopted to boost classification performance. In this paper, we investigate two straightforward strategies adopted to generate error-independent base classifiers in an ensemble: the first strategy varies the seed of the pseudo-random number generator for determining the random components of the networks; the second one combines the seed variation with different transformations of the input images. The comparison between the strategies is performed under two different scenarios, namely, training from scratch an ad-hoc architecture and fine-tuning a state-of-the-art model. As expected, the second strategy, which adopts a higher level of variability, yields a more effective ensemble in both scenarios. Furthermore, training from scratch an ad-hoc architecture allows achieving on average a higher classification accuracy than fine-tuning a very deep pretrained model. Finally, we observe that, in our experimental setup, the increase of the ensemble size does not guarantee an accuracy gain.