Paris-Saclay #JDSE2018

Speakers

Christine Balagué
Professor at Institut Mines Telecom Business School, holder of the Social Networks Chair, France

Enjeux éthiques et responsabilité des technologies (Ethical Issues and the Responsibility of Technologies)

Abstract:

Artificial intelligence technologies and the growing use of algorithmic systems affect individuals' daily lives and our societies.

In 2018, the digital revolution found itself at the heart of many societal debates, with a sharper divide between very positive representations of technology on the one hand and more firmly negative ones on the other.

These debates stem from the ethical issues raised by the massive development of technologies and their uses in our societies. The dominant models are driven by the United States and China and carry values profoundly different from those on which Europe was founded. In this talk, we will discuss the various ethical issues of technologies, from research through to applications, as well as future directions for developing a more responsible model of technology.

Bio:

Christine Balagué is a Professor and holder of the Social Networks and Connected Objects Chair at Institut Mines-Télécom Business School, and served as Vice-President of the Conseil National du Numérique (the French Digital Council) from 2013 to 2015. Her research focuses on modeling the behavior of connected individuals, particularly on social networks and with connected objects. She is also a member of CERNA (Comité d’Ethique de la Recherche sur le Numérique d’Allistène, a research ethics committee for digital sciences) and of the DATAIA Convergence Institute for data science and artificial intelligence. As Vice-President of the Conseil National du Numérique, she contributed to several reports submitted to the French government on major digital issues (net neutrality, platform neutrality, e-inclusion, e-education, e-health, national consultation). She is also the author of numerous books on the development of the Internet in France and on social networks. Accredited to supervise research (Habilitation à Diriger des Recherches), Christine Balagué holds a doctorate in Management Science, is a graduate of ESSEC, and holds a Master's degree in econometrics from ENSAE.

Juliana Freire
Professor of Computer Science and Engineering and Data Science Executive Director, NYU Moore-Sloan Data Science Environment, United States of America

Democratizing Urban Data Exploration

Abstract:

The large volumes of urban data, along with vastly increased computing power, open up new opportunities to better understand cities. Encouraging success stories show that data can be leveraged to make operations more efficient, inform policies and planning, and improve the quality of life for residents. However, analyzing urban data often requires a staggering amount of work, from identifying relevant data sets, cleaning and integrating them, to performing exploratory analyses and creating predictive models that must take into account spatio-temporal processes. Our long-term goal is to enable domain experts to crack the code of cities by freely exploring the vast amounts of urban data. In this talk, we will present methods and systems that combine data management, analytics, and visualization to increase the level of interactivity, scalability, and usability for urban data exploration.

Bio:

Juliana Freire is a Professor of Computer Science and Engineering and Data Science at New York University. She holds an appointment at the Courant Institute of Mathematical Sciences and is a faculty member at the NYU Center for Urban Science and at the NYU Center for Data Science. She is the executive director of the NYU Moore-Sloan Data Science Environment, chair of ACM SIGMOD, and a council member of the Computing Community Consortium (CCC). Her recent research has focused on big-data analysis and visualization, large-scale information integration, web crawling and domain discovery, provenance management, and computational reproducibility. Prof. Freire is an active member of the database and Web research communities, with over 170 technical papers, several open-source systems, and 12 U.S. patents. She is an ACM Fellow and a recipient of an NSF CAREER award, two IBM Faculty awards, and a Google Faculty Research award. She has chaired or co-chaired workshops and conferences, and participated as a program committee member in over 70 events. Her research grants are from the National Science Foundation, DARPA, Department of Energy, National Institutes of Health, Sloan Foundation, Gordon and Betty Moore Foundation, W. M. Keck Foundation, Google, Amazon, AT&T, the University of Utah, New York University, Microsoft Research, Yahoo! and IBM.

Patrice Simard
Microsoft Research AI Lab, Redmond, United States of America

Machine Learning -- What’s next?

Abstract:

For many Machine Learning (ML) problems, labeled data is readily available. When this is the case, algorithms and training time are the performance bottleneck. This is the ML researcher’s paradise! Vision and Speech are good examples of such problems because they have a stable distribution and additional human labels can be collected each year. Problems that extract their labels from history, such as click prediction, data analytics, and forecasting are also blessed with large numbers of labels. Unfortunately, there are only a few problems for which we can rely on such an endless supply of free labels. They receive a disproportionally large amount of attention from the media.

We are interested in tackling the much larger class of ML problems where labeled data is sparse. For example, consider a dialog system for a specific app to recognize specific commands such as “lights on first floor off”, “increase spacing between 2nd and 3rd paragraph”, “make doctor appointment after Hawaii vacation”. Anyone who has attempted building such a system has soon discovered that generalizing to new instances from a small custom set of labeled instances is far more difficult than they originally thought. Each domain has its own generalization challenges, data exploration and discovery, custom features, and decomposition structure. Creating labeled data to communicate custom knowledge is inefficient. It also leads to embarrassing errors resulting from over-training on small sets. ML algorithms and processing power are not a bottleneck when labeled data is scarce. The bottleneck is the teacher and the teaching language.

To address this problem, we change our focus from the learning algorithm to teachers. We define “Machine Teaching” as improving human productivity given a learning algorithm. If ML is the science and engineering of extracting knowledge from data, Machine Teaching is the science and engineering of extracting knowledge from teachers. A similar shift of focus has happened in computer science. While computing is revolutionizing our lives, systems sciences (e.g., programming languages, operating systems, networking) have shifted their focus to human productivity. We expect a similar trend will shift science from Machine Learning to Machine Teaching.

The aim of this talk is to convince the audience that we are asking the right questions. We provide some answers and some spectacular results. The most exciting part, however, is the research opportunities that come with the emergence of a new field.

Bio:

Patrice Simard is a Distinguished Engineer in the Microsoft Research AI Lab in Redmond. He is passionate about finding new ways to combine engineering and science in the field of machine learning. Simard’s research is currently focused on human teachers. His goal is to extend the teaching language, science, and engineering, beyond the traditional (input, label) pairs. Simard completed his PhD thesis in Computer Science at the University of Rochester in 1991. He then spent 8 years at AT&T Bell Laboratories working on neural networks. He joined Microsoft Research in 1998. In 2002, he started MSR’s Document Processing and Understanding research group. In 2006, he left MSR to become the Chief Scientist and General Manager of Microsoft’s Live Labs Research. In 2009, he became the Chief Scientist of Microsoft’s AdCenter (the organization that monetizes Bing search). In 2012, he returned to Microsoft Research to work on his passion, Machine Learning research. Specifically, he founded the Computer-Human Interactive Learning (CHIL) group to study Machine Teaching and to make machine learning accessible to everyone.

Program

Thursday 13 September 2018

8:15 Coffee and conference registration
8:45 Opening

9:20 - 10:20 Keynote Patrice Simard (Microsoft Research AI Lab) Machine Learning -- What’s next?

10:20 - 10:50 Coffee break

Machine Learning

10:50 - 11:10 Thermodynamics of Restricted Boltzmann Machines. Giancarlo Fissore, Aurelien Decelle and Cyril Furtlehner. Inria

11:10 - 11:30 Wasserstein regularization for sparse multi-task regression. Hicham Janati, Marco Cuturi and Alexandre Gramfort. Inria

11:30 - 11:50 Automated machine learning with Monte Carlo Tree Search. Herilalaina Rakotoarison and Michèle Sebag. CNRS, Inria, LRI

11:50 - 12:10 Infinite-Task Learning with Vector-Valued RKHSs. Alex Lambert, Romain Brault, Zoltan Szabo, Maxime Sangnier and Florence d'Alché-buc. Télécom ParisTech

12:10 - 12:30 Predictive Data Mining for Multi-Variate Time Series in a Distributed Environment. Jingwei Zuo, Karine Zeitouni, Yehia Taher and Raef Mousheimish. UVSQ

12:30 - 13:50 Lunch at Proto 204

13:50 - 14:50 Keynote Juliana Freire (NYU Moore-Sloan Data Science Environment) Democratizing Urban Data Exploration

14:50 - 15:20 Coffee break

Statistical theory for data science

15:20 - 15:40 Generalizing Point Embeddings using the Wasserstein Space of Elliptical Distributions. Boris Muzellec and Marco Cuturi. ENSAE

15:40 - 16:00 Finite Sample Bounds for Superquantile Linear Prediction. Evgenii Chzhen, Joseph Salmon and Zaid Harchaoui. Télécom ParisTech

Deep learning

16:00 - 16:20 Optimizing deep video representation to match brain activity. Hugo Richard, Ana Luisa Pinho, Bertrand Thirion and Guillaume Charpiat. Inria

16:20 - 16:40 Generative Neural Networks for Global Optimization with Gradients. Louis Faury and Olivier Fercoq. Criteo

16:40 - 17:00 Neuroevolution with CMA-ES for the tuning of a PID controller of nonholonomic car-like mobile robot. Mohamed Outahar and Eric Lucet. CEA

17:00 - 19:00 Poster session and cocktail at Proto 204

Improving the generalization capacity of nonlinear regression and classification models with generative models. Nabil Benmerad. DCBrain - Microsoft AI Factory

H_/ H_infinity Robust Observer for Actuator Fault Detection and Diagnosis of a Quadrotor. Eslam Abouselima, Said Mammar and Dalil Ichalal. CEA

Identification of causal factors leading people to choose high protein food. Irène Demongeot, Antoine Cornuejols and Nicolas Darcel. AgroParisTech & INRA

Logical approach to identify Boolean networks modelling cell differentiation. Stéphanie Chevalier, Andrei Zinovyev, Christine Froidevaux and Loïc Paulevé. LRI

A Probabilistic Theory of Supervised Similarity Learning for Pointwise ROC Curve Optimization. Robin Vogel, Aurélien Bellet and Stéphan Clémençon. Télécom ParisTech

A Latent Model for Representation Learning on Networks. Abdulkadir Celikkanat and Fragkiskos Malliaros. CentraleSupelec

Graph Matching and Transfer Learning: How to learn a new model from an existing one. Jiang You. AgroParisTech & INRA

Dimension Reduction Adapted to Paleogenomics. Séverine Liegeois, Olivier François and Flora Jay. LRI

Memory Bandits: Toward The Switching Bandit Problem Best Resolution. Reda Alami. Inria & Orange Labs

Transcriptomics data explorations to decipher iron homeostasis in the pathogenic yeast Candida glabrata. Thomas Denecker and Gaëlle Lelandais. CEA

In addition, all contributed speakers listed in the program are invited to present a poster.

Friday 14 September 2018

8:30 Coffee and conference registration

Optimization

8:45 - 9:05 Stochastic algorithms for ICA. Pierre Ablin, Alexandre Gramfort, Jean-François Cardoso and Francis Bach. Inria

9:05 - 9:25 Celer: dual extrapolation for the Lasso. Mathurin Massias, Alexandre Gramfort and Joseph Salmon. Inria

9:25 - 9:45 A Stochastic Fixed Point Method for Empirical Risk Minimization. Rui Yuan, Robert M. Gower and Olivier Fercoq. Télécom ParisTech

9:45 - 10:05 Optimal mini-batch size for stochastic variance reduced methods. Nidham Gazagnadou and Robert M. Gower. Télécom ParisTech

10:05 - 10:35 Coffee break

Text processing

10:35 - 10:55 Automated extraction of food-drug interactions from scientific articles. Tsanta Randriatsitohaina. LIMSI

10:55 - 11:15 User modelling for the characterisation of eating behaviors. Sema Akkoyunlu. AgroParisTech

11:15 - 11:30 Talk and poster award

11:30 - 12:30 Keynote Christine Balagué (Institut Mines Telecom Business School) Enjeux éthiques et responsabilité des technologies (Ethical Issues and the Responsibility of Technologies)

Lunchbox to go

Friday afternoon will be reserved for visits to company labs:

Microsoft Research: Presentation of AI use cases at the Microsoft Research Center (places limited to 20 people)
Dataiku: Hands-on session using DSS for a data science use case (places limited to 40 people)

Call for submissions

2nd call for contributions: Posters and Demos

We invite Master M2 and PhD students from Université Paris-Saclay to submit an extended abstract of up to 3 pages describing new or preliminary results of their work as a poster or poster-demo (in English). Master's students are encouraged to submit posters even if they do not have substantial results at the time of submission. Submissions should be formatted according to the Springer Lecture Notes in Computer Science style.

We will award a prize for the best presentation.

Extended Abstract Instructions:

Please follow this simplified template: PDF, LaTeX source, or Word template

Language = English

Number of pages = 1 to 3

Maximum number of tables or figures = 1

Mandatory paragraphs = Abstract, Keywords, Motivation, References (max 10 ref.)

Each extended abstract must be submitted online via the Easychair submission system


The topics of the conference are listed below:

data mining

databases

big data analytics

machine learning

statistics

semantic web

scientific workflows

distributed data and computing

applications of data science (biomedical and biological data, physics, chemistry, smart cities, image, documents, audio, video, on-line advertisement, ...)

The extended abstracts will be reviewed by the scientific program committee, possibly including one junior PC member (PhD student or postdoc in data science). They will be selected for oral (15 min) or poster presentation (flash talks, poster and poster-demo sessions) according to their originality and relevance to the conference topics. All the presentations should be in English. Electronic versions of the extended abstracts will be accessible on the conference web site. The book of abstracts will not be published and the extended abstracts will not constitute a formal publication.

Note: Only Master M2 and PhD students from Université Paris-Saclay are invited to contribute.
