
National Key Discipline in Statistics (2007)

Academic Lectures
[Academic Lecture] ParKShap: A new algorithm for computing Shapley values that can identify sub-populations on which features have different effects on outcomes
2026-03-18 14:29

Topic: ParKShap: A new algorithm for computing Shapley values that can identify sub-populations on which features have different effects on outcomes

Speaker: Georgios Aivaliotis, Associate Professor, School of Mathematics, University of Leeds, UK

Host: Zhang Yaojun, School of Statistics and Data Science

Time: Friday, 27 March 2026, 10:00-11:00

Venue: Conference Room 1320, Chengzheng Building, Liulin Campus

Organizer: School of Statistics and Data Science


About the speaker:

Dr Georgios Aivaliotis's research interests revolve around probability, statistics, stochastic processes and machine learning. Applications of his research can be found in the broad areas of financial and actuarial mathematics, data analytics and survival analysis, as well as financial technology.

He has worked on stochastic control for mean-variance type problems, with applications in portfolio selection and agent remuneration. Currently, he is applying ideas from probability and statistics to the field of data analytics; in particular, he is working on robust temporal pattern mining from time-stamped data. This is part of the EPSRC-funded project QuantiCode, on which he is a Co-Investigator.

He is an Alan Turing Fellow and a Co-Investigator on the Alan Turing Project "Modelling the joint effects of temporal, heterogeneous datasets".

Throughout his career, he has kept close links with several industrial partners and government organisations with which he has collaborated on projects (CallCredit, Leeds City Council, Jet2 and others). He is an associate member of the Leeds Institute for Data Analytics (LIDA).


Abstract:

In this work we address the question of identifying sub-populations within a dataset on which the effects of the features/covariates on the outcome appear to differ. We consider this equivalent to the statement that the instances in these sub-populations come from a different data-generating process. Another explanation is that there exists an unobservable variable that interacts with the observed features. An example of such a situation is a medication that is generally found to be effective for treating a disease but is ineffective for a small number of patients. We approach this question using Shapley values, a concept traditionally used to attribute model predictions to features. Our approach is model-free in the sense that we do not try to interpret a specific model. Instead, we use the underlying idea of Shapley values to create clusters of data instances that exhibit the property of interest and then attribute cluster membership to features. In this way we aim to shed light on the mechanism that causes particular features to have a different effect on the outcome of interest.
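For readers less familiar with Shapley values, the following is a minimal sketch of the textbook exact computation: each player (here, a feature) is credited with its marginal contribution averaged over all coalitions, using the weight |S|!(n-|S|-1)!/n!. It is not ParKShap, whose details the announcement does not give. The function name `shapley_values` and the toy value functions are hypothetical and chosen only for illustration; in the setting of the talk the value function might measure how well a feature subset explains the outcome or cluster membership, but that choice is an assumption here. Because it enumerates every coalition, the cost grows as 2^n, so it is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List


def shapley_values(n_players: int,
                   value: Callable[[FrozenSet[int]], float]) -> List[float]:
    """Exact Shapley values by enumerating all coalitions (hypothetical helper).

    `value` maps a coalition (a frozenset of player indices) to a real number.
    Cost is O(n * 2^(n-1)) evaluations of `value`, so keep n small.
    """
    phi = [0.0] * n_players
    for i in range(n_players):
        others = [j for j in range(n_players) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without i
            weight = (factorial(size) * factorial(n_players - size - 1)
                      / factorial(n_players))
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # marginal contribution of player i when it joins coalition S
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    # Hypothetical toy game: the "outcome" changes only when features 0 and 1
    # are both present, a crude stand-in for an interaction between features.
    def interaction(s: FrozenSet[int]) -> float:
        return 1.0 if {0, 1} <= s else 0.0

    print(shapley_values(3, interaction))
```

In this toy game features 0 and 1 each receive approximately 0.5 and feature 2 receives 0, because adding feature 2 never changes the value; the attributions also sum to the value of the full coalition minus that of the empty one, as the Shapley efficiency property requires.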
