The third edition of the Paris-Saclay Junior Conference on Data Science and Engineering is aimed at first-year PhD students, M2 students and third-year engineering school students at Paris-Saclay. It will offer these students the opportunity to present scientific work carried out during internships or in the first year of their thesis, and to sharpen their critical thinking at a professional conference hosting prestigious invited speakers, academics and industry scientists.
The conference aims to bring together a large audience of master's, engineering school and PhD students, and is an excellent way to discover the research world in Data Science and Engineering.
PhD students are also involved in the conference organization as reviewers, session chairs and organizers of networking events.
Please contact us if you are interested in joining the team.
#JDSE2018
Follow us on Twitter!
Photo credits: I. Manolescu, M2 BIBS students, L. Paulevé
Congratulations to Boris Muzellec (ENSAE ParisTech, “Generalizing Point Embeddings using the Wasserstein Space of Elliptical Distributions”), who won the best talk award, and to Séverine Liegeois (LRI Paris-Sud, “Dimension Reduction Adapted to Paleogenomics”), who won the best poster award.
Contact us as early as you like if you would like to be part of the junior or senior program committees or the local committee of JDSE2018, i.e. to take part in one or more of the following tasks: planning the event beforehand, spreading the word, organizing the student Friday activity, reviewing scientific papers, chairing sessions, helping with logistics during the conference, and so on.
The conference is free but registration is mandatory. Please fill in the form. Places are limited for Friday afternoon's student activity, so please do not wait until the last moment.
The talks will be broadcast during the conference on the webcast page.
The booklet with all extended abstracts can be found here.
Christine Balagué
Professor at Institut Mines-Télécom Business School, holder of the Social Networks Chair, France
Enjeux éthiques et responsabilité des technologies (Ethical Issues and the Responsibility of Technologies)
Abstract:
Artificial intelligence technologies and the growing use of algorithmic systems are affecting the daily lives of individuals and our societies.
In 2018, the digital revolution found itself at the heart of many societal debates, with a sharper divide between very positive representations of technology on the one hand and more firmly negative ones on the other.
These debates are tied to the ethical issues raised by the massive development of technologies and their uses in our societies. The dominant models are driven by the United States and China and carry values profoundly different from those on which Europe was built. In this talk, we will discuss the various ethical issues of technologies, from research through to applications, as well as future avenues for developing a more responsible model of technology.
Bio:
Christine Balagué is a Professor and holder of the Social Networks and Connected Objects Chair at Institut Mines-Télécom Business School, and was Vice-President of the Conseil National du Numérique (the French Digital Council) from 2013 to 2015. Her research focuses on modeling the behavior of connected individuals, particularly on social networks and with connected objects. She is also a member of CERNA (Comité d’Ethique de la Recherche sur le Numérique d’Allistène, the Allistène committee on the ethics of digital research) and of the DATAIA Convergence Institute on data science and artificial intelligence. As Vice-President of the Conseil National du Numérique, she contributed to several reports submitted to the French government on major digital issues (net neutrality, platform neutrality, e-inclusion, e-education, e-health, national consultation). She is also the author of numerous books on the development of the Internet in France and on social networks. Accredited to supervise research (Habilitation à Diriger des Recherches), Christine Balagué holds a doctorate in Management Science, is a graduate of ESSEC and holds a Master's degree in econometrics from ENSAE.
Juliana Freire
Professor of Computer Science and Engineering and Data Science Executive Director, NYU Moore-Sloan Data Science Environment, United States of America
Democratizing Urban Data Exploration
Abstract:
The large volumes of urban data, along with vastly increased computing power, open up new opportunities to better understand cities. Encouraging success stories show that data can be leveraged to make operations more efficient, inform policies and planning, and improve the quality of life for residents. However, analyzing urban data often requires a staggering amount of work, from identifying relevant data sets, cleaning and integrating them, to performing exploratory analyses and creating predictive models that must take into account spatio-temporal processes. Our long-term goal is to enable domain experts to crack the code of cities by freely exploring the vast amounts of urban data. In this talk, we will present methods and systems that combine data management, analytics, and visualization to increase the level of interactivity, scalability, and usability for urban data exploration.
Bio:
Juliana Freire is a Professor of Computer Science and Engineering and Data Science at New York University. She holds an appointment at the Courant Institute of Mathematical Sciences and is a faculty member at the NYU Center for Urban Science and at the NYU Center for Data Science. She is the executive director of the NYU Moore-Sloan Data Science Environment, chair of ACM SIGMOD and a council member of the Computing Community Consortium (CCC). Her recent research has focused on big-data analysis and visualization, large-scale information integration, web crawling and domain discovery, provenance management, and computational reproducibility. Prof. Freire is an active member of the database and Web research communities, with over 170 technical papers, several open-source systems, and 12 U.S. patents. She is an ACM Fellow and a recipient of an NSF CAREER award, two IBM Faculty awards, and a Google Faculty Research award. She has chaired or co-chaired workshops and conferences, and participated as a program committee member in over 70 events. Her research grants are from the National Science Foundation, DARPA, Department of Energy, National Institutes of Health, Sloan Foundation, Gordon and Betty Moore Foundation, W. M. Keck Foundation, Google, Amazon, AT&T, the University of Utah, New York University, Microsoft Research, Yahoo! and IBM.
Patrice Simard
Microsoft Research AI Lab, Redmond, United States of America
Machine Learning -- What’s next?
Abstract:
For many Machine Learning (ML) problems, labeled data is readily available. When this is the case, algorithms and training time are the performance bottleneck. This is the ML researcher’s paradise! Vision and Speech are good examples of such problems because they have a stable distribution and additional human labels can be collected each year. Problems that extract their labels from history, such as click prediction, data analytics, and forecasting are also blessed with large numbers of labels. Unfortunately, there are only a few problems for which we can rely on such an endless supply of free labels. They receive a disproportionally large amount of attention from the media.
We are interested in tackling the much larger class of ML problems where labeled data is sparse. For example, consider a dialog system for a specific app to recognize specific commands such as “lights on first floor off”, “increase spacing between 2nd and 3rd paragraph”, “make doctor appointment after Hawaii vacation”. Anyone who has attempted building such a system has soon discovered that generalizing to new instances from a small custom set of labeled instances is far more difficult than they originally thought. Each domain has its own generalization challenges, data exploration and discovery, custom features, and decomposition structure. Creating labeled data to communicate custom knowledge is inefficient. It also leads to embarrassing errors resulting from over-training on small sets. ML algorithms and processing power are not a bottleneck when labeled data is scarce. The bottleneck is the teacher and the teaching language.
To address this problem, we change our focus from the learning algorithm to teachers. We define “Machine Teaching” as improving human productivity given a learning algorithm. If ML is the science and engineering of extracting knowledge from data, Machine Teaching is the science and engineering of extracting knowledge from teachers. A similar shift of focus has happened in computer science. While computing is revolutionizing our lives, systems sciences (e.g., programming languages, operating systems, networking) have shifted their foci to human productivity. We expect a similar trend will shift science from Machine Learning to Machine Teaching.
The aim of this talk is to convince the audience that we are asking the right questions. We provide some answers and some spectacular results. The most exciting part, however, is the research opportunities that come with the emergence of a new field.
Bio:
Patrice Simard is a Distinguished Engineer in the Microsoft Research AI Lab in Redmond. He is passionate about finding new ways to combine engineering and science in the field of machine learning. Simard’s research is currently focused on human teachers. His goal is to extend the teaching language, science, and engineering, beyond the traditional (input, label) pairs. Simard completed his PhD thesis in Computer Science at the University of Rochester in 1991. He then spent 8 years at AT&T Bell Laboratories working on neural networks. He joined Microsoft Research in 1998. In 2002, he started MSR’s Document Processing and Understanding research group. In 2006, he left MSR to become the Chief Scientist and General Manager of Microsoft’s Live Labs Research. In 2009, he became the Chief Scientist of Microsoft’s AdCenter (the organization that monetizes Bing search). In 2012, he returned to Microsoft Research to work on his passion, Machine Learning research. Specifically, he founded the Computer-Human Interactive Learning (CHIL) group to study Machine Teaching and to make machine learning accessible to everyone.
8:15 Coffee and conference registration
8:45 Opening
10:20 - 10:50 Coffee break
10:50 - 11:10 Thermodynamics of Restricted Boltzmann Machines.
Giancarlo Fissore, Aurelien Decelle and Cyril Furtlehner.
11:10 - 11:30 Wasserstein regularization for sparse multi-task regression.
Hicham Janati, Marco Cuturi and Alexandre Gramfort.
11:30 - 11:50 Automated machine learning with Monte Carlo Tree Search.
Herilalaina Rakotoarison and Michèle Sebag.
11:50 - 12:10 Infinite-Task Learning with Vector-Valued RKHSs.
Alex Lambert, Romain Brault, Zoltan Szabo, Maxime Sangnier and Florence d'Alché-buc.
12:10 - 12:30 Predictive Data Mining for Multi-Variate Time Series in a Distributed Environment.
Jingwei Zuo, Karine Zeitouni, Yehia Taher and Raef Mousheimish.
12:30 - 13:50 Lunch at Proto 204
14:50 - 15:20 Coffee break
15:20 - 15:40 Generalizing Point Embeddings using the Wasserstein Space of Elliptical Distributions.
Boris Muzellec and Marco Cuturi.
15:40 - 16:00 Finite Sample Bounds for Superquantile Linear Prediction.
Evgenii Chzhen, Joseph Salmon and Zaid Harchaoui.
16:00 - 16:20 Optimizing deep video representation to match brain activity.
Hugo Richard, Ana Luisa Pinho, Bertrand Thirion and Guillaume Charpiat.
16:20 - 16:40 Generative Neural Networks for Global Optimization with Gradients.
Louis Faury and Olivier Fercoq.
16:40 - 17:00 Neuroevolution with CMA-ES for the tuning of a PID controller of nonholonomic car-like mobile robot.
Mohamed Outahar and Eric Lucet.
17:00 - 19:00 Poster session and cocktail at Proto 204
Improving the generalization capacity of nonlinear regression and classification models with generative models.
Nabil Benmerad.
H−/H∞ Robust Observer for Actuator Fault Detection and Diagnosis of a Quadrotor.
Eslam Abouselima, Said Mammar and Dalil Ichalal.
Identification of causal factors leading people to choose high protein food.
Irène Demongeot, Antoine Cornuejols and Nicolas Darcel.
Logical approach to identify Boolean networks modelling cell differentiation.
Stéphanie Chevalier, Andrei Zinovyev, Christine Froidevaux and Loïc Paulevé.
A Probabilistic Theory of Supervised Similarity Learning for Pointwise ROC Curve Optimization.
Robin Vogel, Aurélien Bellet and Stéphan Clémençon.
A Latent Model for Representation Learning on Networks.
Abdulkadir Celikkanat and Fragkiskos Malliaros.
Graph Matching and Transfer Learning: How to learn a new model from an existing one.
Jiang You.
Dimension Reduction Adapted to Paleogenomics.
Séverine Liegeois, Olivier François and Flora Jay.
Memory Bandits: Toward The Switching Bandit Problem Best Resolution.
Reda Alami.
Transcriptomics data explorations to decipher iron homeostasis in the pathogenesis yeast Candida glabrata.
Thomas Denecker and Gaëlle Lelandais.
In addition, all contributed speakers listed in the program are invited to present a poster.
8:30 Coffee and conference registration
8:45 - 9:05 Stochastic algorithms for ICA.
Pierre Ablin, Alexandre Gramfort, Jean-François Cardoso and Francis Bach.
9:05 - 9:25 Celer: dual extrapolation for the Lasso.
Mathurin Massias, Alexandre Gramfort and Joseph Salmon.
9:25 - 9:45 A Stochastic Fixed Point Method for Empirical Risk Minimization.
Rui Yuan, Robert M. Gower and Olivier Fercoq.
9:45 - 10:05 Optimal mini-batch size for stochastic variance reduced methods.
Nidham Gazagnadou and Robert M. Gower.
10:05 - 10:35 Coffee break
10:35 - 10:55 Automated extraction of food-drug interactions from scientific articles.
Tsanta Randriatsitohaina.
10:55 - 11:15 User modelling for the characterisation of eating behaviors.
Sema Akkoyunlu.
11:15 - 11:30 Talk and poster award
Lunchbox to go
Friday afternoon will be reserved for visits to company labs:
Microsoft Research: Presentation of AI use cases at the Microsoft Research Center (places limited to 20 people)
Dataiku: Hands-on session using DSS for a data science use case (places limited to 40 people)
We invite Master M2 and PhD students from Université Paris-Saclay to submit an extended abstract of up to 3 pages describing new or preliminary results of their work as a poster or poster-demo (in English). Master students are encouraged to submit posters even if they do not have substantial results at the time of submission. Submissions should be formatted according to the Springer Lecture Notes in Computer Science style.
We will award a prize to the best communication.
Extended Abstract Instructions:
Each extended abstract must be submitted online via the EasyChair submission system.
The topics of the conference are listed below:
The extended abstracts will be reviewed by the scientific program committee, possibly including one junior PC member (PhD student or postdoc in data science). They will be selected for oral (15 min) or poster presentation (flash talks, poster and poster-demo sessions) according to their originality and relevance to the conference topics. All the presentations should be in English. Electronic versions of the extended abstracts will be accessible on the conference web site. The book of abstracts will not be published and the extended abstracts will not constitute a formal publication.
Note: Only Master M2 and PhD students from Université Paris-Saclay are invited to contribute.
First call for contributions (closed): June 4th, 2018
Second call (posters and demos only) - deadline: July 23rd, 2018
Acceptance notification: August 10th, 2018
Registration closes: September 9th, 2018
Conference: September 13-14th, 2018
This third edition of the Paris-Saclay Junior Conference on Data Science and Engineering is open only to students and researchers at Université Paris-Saclay. Registration is free but mandatory.
The organization committee takes care of the logistics and communication aspects of the conference. It also assembles the scientific committee.
Fatiha Saïs
Chair, Université Paris-Sud, UPsay
Olivier Fercoq
Co-chair, Télécom ParisTech, UPsay
Isabelle Huteau
Conference Communication Leader, Digicosme
Antoine Naulet
M2 DataScience
Pierre Andrieu
PhD student @LRI
Stéphanie Chevalier
Master Student - intern @LRI
Houssem-Eddine Gharbi
Master Student - intern @Danone
Thomas Denecker
PhD student @I2BC
Jocelyn Vernay
Master Student
Issa Memari
M2 Data & Knowledge - intern @EDF
The role of the steering committee is to help sustain the conference through its involvement in choosing the scientific content, proposing and designating the future organizers, and selecting the location of the conference. It is also involved in building the scientific program by taking part in the selection of invited speakers and in the scientific committee.
Florence d'Alché-Buc
Télécom ParisTech, UPsay
Sylvain Arlot
Université Paris-Sud, UPsay
Albert Bifet
Télécom ParisTech, UPsay
Sarah Cohen-Boulakia
Université Paris-Sud, UPsay
Flora Jay
CNRS, UPsay
Joseph Salmon
Télécom ParisTech, UPsay
Karine Zeitouni
Université de Versailles-Saint-Quentin-en-Yvelines, UPsay
The scientific committee chooses the keynote speakers, and builds and supervises the program committee. It decides on the rules for the prize for the best communication. It also helps the organization committee apply for funding and writes the call for contributions.
Florence d'Alché-Buc
Télécom ParisTech, UPsay
Zacharie Ales
ENSTA ParisTech, UPsay
Sarah Cohen-Boulakia
Université Paris-Sud, UPsay
Marco Cuturi
ENSAE ParisTech, UPsay
Olivier Fercoq
Télécom ParisTech, UPsay
Alexandre Gramfort
Inria Saclay, UPsay
Flora Jay
CNRS, UPsay
Cristina Manfredotti
AgroParisTech, UPsay
Nicoleta Preda
Université de Versailles-Saint-Quentin-en-Yvelines, UPsay
Gianluca Quercini
Centrale-Supélec, UPsay
Fatiha Saïs
Université Paris-Sud, UPsay
Michaël Thomazo
Inria Saclay, UPsay
Paola Tubaro
CNRS, UPsay
The program committee reviews the submitted extended abstracts and selects those to be presented at the conference. It is also in charge of reviewing and selecting the student posters and demos to be presented during the conference.
Laboratoire de l'Accélérateur Linéaire, Centre Scientifique d'Orsay, Bâtiment 200 - BP 34, 91898 ORSAY Cedex