Data organized as graphs have become omnipresent in machine learning and, more generally, in data science. Graphs make it possible to represent both entities (variables, individuals, etc.) and the complex interaction relationships between them. Combined with machine learning (ML) methods, this view of the data has enabled spectacular advances in many fields, such as bioinformatics (e.g., prediction of the 3D structure of proteins by the AlphaFold algorithm), brain imaging, genomics, and computational social sciences. These data raise specific issues: they do not admit a simple vector representation, and they are often heterogeneous and high-dimensional. Dealing with them thus requires advanced analysis methods.
We have therefore witnessed a recent explosion of scientific work aimed at modeling and manipulating graphs, leading to various methodologies and approaches (graph learning, graph signal processing, optimal transport on graphs, graph kernels, graph neural networks, etc.).
The objective of this course is threefold. First, it presents the essential tools of graph theory for data science; second, it introduces classical machine learning methods for dealing with graphs; and finally, it presents some of the most recent ML methods for structured data.
Particular emphasis will be placed on the practical implementation of the various algorithms seen during the course (using Python).
[1] Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart. Exploring network structure, dynamics, and function using NetworkX. https://networkx.org/
[2] Michaël Defferrard, Lionel Martin, Rodrigo Pena, and Nathanaël Perraudin. PyGSP: Graph Signal Processing in Python. https://github.com/epfl-lts2/pygsp/
[3] Giannis Siglidis, Giannis Nikolentzos, Stratis Limnios, Christos Giatsidis, Konstantinos Skianis, Michalis Vazirgiannis. GraKeL: A Graph Kernel Library in Python. https://github.com/ysig/GraKeL
[4] Matthias Fey and Jan E. Lenssen. Fast Graph Representation Learning with PyTorch Geometric. https://github.com/pyg-team/pytorch_geometric
[5] Rémi Flamary, Nicolas Courty, Alexandre Gramfort et al. POT: Python Optimal Transport. https://github.com/PythonOT/POT
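As a first taste of the hands-on sessions, here is a minimal sketch (assuming NetworkX [1] and NumPy are installed) that builds a small graph and inspects two objects used throughout graph-based ML: the adjacency matrix and the combinatorial graph Laplacian.

```python
import networkx as nx
import numpy as np

# A 4-node cycle graph: edges 0-1, 1-2, 2-3, 3-0.
G = nx.cycle_graph(4)

# Adjacency matrix A and combinatorial Laplacian L = D - A,
# where D is the diagonal matrix of node degrees.
A = nx.to_numpy_array(G)
L = nx.laplacian_matrix(G).toarray()

degrees = A.sum(axis=1)          # every node of a cycle has degree 2
D = np.diag(degrees)
print(degrees)                   # [2. 2. 2. 2.]
print(np.allclose(L, D - A))     # True: L is indeed D - A
```

The Laplacian computed here is the starting point for spectral methods (graph signal processing, spectral clustering, spectral graph neural networks) covered later in the course.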
Some notions of linear algebra, statistics, and Python programming. Notions of machine learning theory are a plus, but an introduction will be given at the beginning of the course.
The evaluation consists of a report and an oral presentation (15th November) about an article from the list below. It can be done in pairs of students (if you agree; not required). The work has to be sent to the lecturers.
The report must consist of two parts:
1) Answer the following questions, so as to provide a general overview of the article and of some of its technical points (an indication of the expected length, in pages, is given where relevant).
a) What is the main scientific question raised in this article? What is the field of the question, and the main field of the authors or their group/laboratory?
b) Describe the main contribution of the article; what are the novelties (ideas, techniques)?
c) Write a state of the art on the scientific question asked: summarize the alternative solutions in the scientific literature and explain how the proposed method differs from existing ones.
d) State the model or the question put forward in this article.
e) Explain a technical point that is important in finding the solution (note: the objective is not to paraphrase the article, but to find a point where additional explanations are useful and to develop them). (1 p.)
f) Describe the significance of the experimental results.
g) Give what you think are some strong points and some weak points of the article.
2) Choose an aspect of the article that can be implemented and studied. Then, code a method proposed or described in the article and conduct numerical experiments. In addition to the code, provide a report (with equations, figures, tables, ...) about what you did (2 to 4 pages). The code must run in Python, preferably as a notebook. It should have an identified main program, and the data used should be appended (or clearly identified, with a direct download link), so that we can run it and reproduce the results obtained.
On Valid Optimal Assignment Kernels and Applications to Graph Classification, Nils M. Kriege, Pierre-Louis Giscard, Richard C. Wilson, NeurIPS, 2016. (taken)
Wasserstein Weisfeiler-Lehman Graph Kernels, Matteo Togninalli, Elisabetta Ghisu, Felipe Llinares-López, Bastian Rieck, and Karsten Borgwardt, NeurIPS 2019.
The following two articles actually constitute one work (parts of them will be presented in the course). If you choose these articles, we ask you to compare them from a theoretical point of view (different architectures, purposes, limitations, ...).
How Powerful are Graph Neural Networks?, Keyulu Xu, Weihua Hu, Jure Leskovec, Stefanie Jegelka, ICLR 2019. (taken)
Template based Graph Neural Network with Optimal Transport Distances, Cédric Vincent-Cuaz, Rémi Flamary, Marco Corneli, Titouan Vayer, Nicolas Courty, NeurIPS 2022. (taken)
Invariant and Equivariant Graph Networks, Haggai Maron, Heli Ben-Hamu, Nadav Shamir, Yaron Lipman, ICLR 2019.
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering, Michaël Defferrard, Xavier Bresson, Pierre Vandergheynst, NeurIPS 2016. (taken)
Hierarchical Graph Representation Learning with Differentiable Pooling, Rex Ying, Jiaxuan You, Christopher Morris, Xiang Ren, William L. Hamilton, Jure Leskovec, NeurIPS 2019.
Position-aware Graph Neural Networks, Jiaxuan You, Rex Ying, Jure Leskovec, ICML 2019. (taken)
For all GNN articles in the supervised part, there is a bonus if the experiments are done taking into account:
A Fair Comparison of Graph Neural Networks for Graph Classification, Federico Errica, Marco Podda, Davide Bacciu, Alessio Micheli, ICLR 2020.
Open Graph Benchmark: Datasets for Machine Learning on Graphs, Hu, Weihua and Fey, Matthias and Zitnik, Marinka and Dong, Yuxiao and Ren, Hongyu and Liu, Bowen and Catasta, Michele and Leskovec, Jure, 2020.
Learning graphs from data: A signal representation perspective, Xiaowen Dong, Dorina Thanou, Michael Rabbat, Pascal Frossard, IEEE Signal Processing Magazine, 2019.
Compressive Spectral Clustering, Nicolas Tremblay, Gilles Puy, Rémi Gribonval, Pierre Vandergheynst, Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:1002-1011, 2016. (taken)
Simultaneous Graph Signal Clustering and Graph Learning, Abdullah Karaaslanli, Selin Aviyente, Proceedings of the 39th International Conference on Machine Learning, PMLR 162:10762-10772, 2022.
Learning Structural Node Embeddings via Diffusion Wavelets. Claire Donnat, Marinka Zitnik, David Hallac, and Jure Leskovec. KDD 2018. (taken)