Data organized as graphs have become ubiquitous in machine learning and, more generally, in data science. Graphs make it possible to represent both entities (variables, individuals, etc.) and the complex interactions between them. Combined with machine learning (ML) methods, this view of the data has enabled spectacular advances in many fields, such as bioinformatics (e.g. the prediction of the 3D structure of proteins by the AlphaFold algorithm), brain imaging, genomics, and computational social sciences. Such data raise specific issues: they do not admit a simple vector representation, and they are often heterogeneous and high-dimensional. Dealing with them thus requires advanced analysis methods.
Recent years have therefore seen an explosion of scientific work on modeling and manipulating graphs, leading to a variety of methodologies and approaches (graph learning, graph signal processing, optimal transport on graphs, graph kernels, graph neural networks, etc.).
The objective of this course is threefold. First, it presents the essential tools of graph theory for data science; second, it introduces the classical machine learning methods for dealing with graphs; finally, it presents some of the most recent ML methods for structured data.
Particular emphasis will be placed on the practical implementation of the various algorithms seen during the course (using Python).
[1] Aric A. Hagberg, Daniel A. Schult and Pieter J. Swart, Exploring network structure, dynamics, and function using NetworkX. https://networkx.org/
[2] Michaël Defferrard, Lionel Martin, Rodrigo Pena, and Nathanaël Perraudin. PyGSP: Graph Signal Processing in Python. https://github.com/epfl-lts2/pygsp/
[3] Giannis Siglidis, Giannis Nikolentzos, Stratis Limnios, Christos Giatsidis, Konstantinos Skianis, Michalis Vazirgiannis. GraKeL: A Graph Kernel Library in Python. https://github.com/ysig/GraKeL
[4] Matthias Fey and Jan E. Lenssen. Fast Graph Representation Learning with PyTorch Geometric. https://github.com/pyg-team/pytorch_geometric
[5] Rémi Flamary, Nicolas Courty, Alexandre Gramfort et al. POT: Python Optimal Transport. https://github.com/PythonOT/POT
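The libraries above all build on a few basic objects from graph theory, such as adjacency matrices, degrees, and Laplacians. As a minimal illustration of the kind of Python practice the course involves, the sketch below assembles the combinatorial Laplacian L = D - A of an undirected graph directly from its edge list, in pure Python (NetworkX [1] offers the same object via `nx.laplacian_matrix`):

```python
def laplacian(n, edges):
    """Return the n x n combinatorial Laplacian L = D - A
    of an undirected graph with nodes 0..n-1."""
    L = [[0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1   # off-diagonal entries: -A[i][j]
        L[j][i] -= 1
        L[i][i] += 1   # diagonal entries: node degrees
        L[j][j] += 1
    return L

# A 4-cycle: 0-1-2-3-0. Every node has degree 2,
# and every row of a Laplacian sums to zero.
L = laplacian(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
assert all(sum(row) == 0 for row in L)
```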
Some notions of linear algebra, statistics, and Python programming. Prior knowledge of machine learning is advisable; a short introduction/recap of ML will be given at the beginning of the course.
The evaluation consists of a report and an oral presentation (6 November) on an article from the list proposed below. The work can be done in pairs of students (if you agree, not required), and has to be sent to the lecturers.
The report has to be in two parts:
1) Answer the following questions, so as to give a general overview of the article and to discuss some of its technical points (expected lengths are indicated where relevant).
a) What is the main scientific question raised in this article? What is the field of the question, and the main field of the authors or their group/laboratory?
b) Describe the main contribution of the article: what are the novelties (ideas, techniques)?
c) Write a state of the art on the scientific question, summarizing the alternative solutions in the scientific literature and explaining how the proposed method differs from existing ones.
d) State the model or the question put forward in this article.
e) Explain a technical point that is important for obtaining the solution (note: the objective is not to paraphrase the article, but to find a point where additional explanations are useful, and to develop them). (1 p.)
f) Describe the significance of the experimental results.
g) Give what you consider to be some strong points, and some weak points, of the article.
2) Choose an aspect of the article that can be implemented and studied. Then, implement a method proposed or described in the article and conduct numerical experiments. In addition to a report (with equations, figures, tables, etc.) on what you did (2 to 4 pages), provide code that can be run in Python, preferably as a notebook. The code should have a clearly identified main program, and the data used should be attached (or clearly identified, with a direct download link), so that we can run it and reproduce the results obtained.
Wasserstein Weisfeiler-Lehman Graph Kernels, Matteo Togninalli, Elisabetta Ghisu, Felipe Llinares-López, Bastian Rieck, and Karsten Borgwardt, NeurIPS 2019.
Inductive Representation Learning on Large Graphs, William L. Hamilton, Rex Ying, Jure Leskovec, NeurIPS 2017.
Simplifying Graph Convolutional Networks, Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, Kilian Weinberger, ICML 2019.
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks, Christopher Morris, Martin Ritzert, Matthias Fey, William L. Hamilton, Jan Eric Lenssen, Gaurav Rattan, Martin Grohe, AAAI 2019.
Invariant and Equivariant Graph Networks, Haggai Maron, Heli Ben-Hamu, Nadav Shamir, Yaron Lipman, ICLR 2019.
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering, Michaël Defferrard, Xavier Bresson, Pierre Vandergheynst, NeurIPS 2016.
Hierarchical Graph Representation Learning with Differentiable Pooling, Rex Ying, Jiaxuan You, Christopher Morris, Xiang Ren, William L. Hamilton, Jure Leskovec, NeurIPS 2018.
MSGNN: A Spectral Graph Neural Network Based on a Novel Magnetic Signed Laplacian, Yixuan He, Michael Perlmutter, Gesine Reinert, Mihai Cucuringu, Learning on Graphs Conference, 2022.
For all GNN articles in the supervised part, a bonus will be given if the experiments are conducted taking into account:
A Fair Comparison of Graph Neural Networks for Graph Classification, Federico Errica, Marco Podda, Davide Bacciu, Alessio Micheli, ICLR 2020
Open Graph Benchmark: Datasets for Machine Learning on Graphs, Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, Jure Leskovec, 2020.
Time-varying graph learning with constraints on graph temporal variation, Koki Yamada, Yuichi Tanaka, Antonio Ortega, arXiv preprint arXiv:2001.03346, 2020.
Simultaneous Graph Signal Clustering and Graph Learning, Abdullah Karaaslanli, Selin Aviyente, Proceedings of the 39th International Conference on Machine Learning, PMLR 162:10762-10772, 2022.