Junior Conference on Data Science and Engineering
Paris-Saclay, 13th-14th September 2018




About


The third edition of the Paris-Saclay Junior Conference on Data Science and Engineering is aimed at first-year PhD students, M2 students and third-year engineering school students at Paris-Saclay. It gives these students the opportunity to present the scientific work they carried out during internships or in the first year of their thesis, and to sharpen their critical sense at a professional conference hosting prestigious invited speakers, academics and industry scientists.


The conference aims to gather a large audience of master's, engineering school and PhD students, and is an excellent way to discover the research world in Data Science and Engineering.


PhD students are also involved in organizing the conference as reviewers, session chairs and organizers of networking events.
Please contact us if you are interested in joining the team.



2016 program 2017 program


Follow us on Twitter! #JDSE2018


Photo credits: I. Manolescu, M2 BIBS students, L. Paulevé

News


Congratulations to Boris Muzellec (ENSAE ParisTech, “Generalizing Point Embeddings using the Wasserstein Space of Elliptical Distributions”), who won the best talk award, and to Séverine Liegeois (LRI Paris-Sud, “Dimension Reduction Adapted to Paleogenomics”), who won the best poster award.


Feel free to contact us early if you would like to join the junior or senior program committee or the local committee of JDSE2018,
i.e. to take part in one or more of the following tasks: planning the event beforehand, spreading the word, organizing the Friday student activity, reviewing scientific papers, chairing sessions, helping with logistics during the conference, and so on.

The conference is free but registration is mandatory. Please fill in the form. Places for the Friday afternoon student activity are limited, so please do not wait until the last moment.

The talks will be broadcast during the conference on the webcast page.

The booklet with all extended abstracts can be found here.

Speakers



Christine Balagué
Professor at Institut Mines-Télécom Business School, Holder of the Social Networks Chair, France


Enjeux éthiques et responsabilité des technologies (Ethical Challenges and the Responsibility of Technologies)


Abstract:

Artificial intelligence technologies and the growing use of algorithmic systems affect the daily lives of individuals and our societies.

In 2018, the digital revolution found itself at the heart of many societal debates, with a sharpening divide between highly positive representations of technology on the one hand and more firmly negative ones on the other.

These debates are tied to the ethical challenges raised by the massive development of technologies and their uses in our societies. The dominant models are driven by the United States and China and carry values profoundly different from those on which Europe was built. In this talk, we will discuss the various ethical challenges of technologies, from research to applications, as well as future directions for developing a more responsible model of technology.


Bio:

Christine Balagué is a Professor and holds the Chair in Social Networks and Connected Objects at Institut Mines-Télécom Business School. She was Vice-President of the Conseil National du Numérique (French Digital Council) from 2013 to 2015. Her research focuses on modelling the behaviour of connected individuals, in particular on social networks and with connected objects. She is also a member of CERNA (the Allistène committee on the ethics of research in digital sciences) and of the DATAIA Institut de Convergences on data science and artificial intelligence. As Vice-President of the Conseil National du Numérique, she contributed to several reports submitted to the French government on major digital issues (net neutrality, platform neutrality, e-inclusion, e-education, e-health, national consultation). She is also the author of numerous books on the development of the Internet in France and on social networks. Holder of a Habilitation à Diriger des Recherches, Christine Balagué has a PhD in Management Science, is a graduate of ESSEC and holds a Master's degree in econometrics from ENSAE.




Juliana Freire
Professor of Computer Science and Engineering and Data Science, and Executive Director of the NYU Moore-Sloan Data Science Environment, United States of America


Democratizing Urban Data Exploration


Abstract:

The large volumes of urban data, along with vastly increased computing power, open up new opportunities to better understand cities. Encouraging success stories show that data can be leveraged to make operations more efficient, inform policies and planning, and improve the quality of life for residents. However, analyzing urban data often requires a staggering amount of work, from identifying relevant data sets, cleaning and integrating them, to performing exploratory analyses and creating predictive models that must take into account spatio-temporal processes. Our long-term goal is to enable domain experts to crack the code of cities by freely exploring the vast amounts of urban data. In this talk, we will present methods and systems that combine data management, analytics, and visualization to increase the level of interactivity, scalability, and usability for urban data exploration.


Bio:

Juliana Freire is a Professor of Computer Science and Engineering and Data Science at New York University. She holds an appointment at the Courant Institute of Mathematical Sciences and is a faculty member at the NYU Center for Urban Science and at the NYU Center for Data Science. She is the executive director of the NYU Moore-Sloan Data Science Environment, chair of ACM SIGMOD and a council member of the Computing Community Consortium (CCC). Her recent research has focused on big-data analysis and visualization, large-scale information integration, web crawling and domain discovery, provenance management, and computational reproducibility. Prof. Freire is an active member of the database and Web research communities, with over 170 technical papers, several open-source systems, and 12 U.S. patents. She is an ACM Fellow and a recipient of an NSF CAREER award, two IBM Faculty awards, and a Google Faculty Research award. She has chaired or co-chaired workshops and conferences, and has served on the program committees of over 70 events. Her research has been funded by the National Science Foundation, DARPA, the Department of Energy, the National Institutes of Health, the Sloan Foundation, the Gordon and Betty Moore Foundation, the W. M. Keck Foundation, Google, Amazon, AT&T, the University of Utah, New York University, Microsoft Research, Yahoo! and IBM.




Patrice Simard
Microsoft Research AI Lab, Redmond, United States of America


Machine Learning -- What’s next?


Abstract:

For many Machine Learning (ML) problems, labeled data is readily available. When this is the case, algorithms and training time are the performance bottleneck. This is the ML researcher’s paradise! Vision and Speech are good examples of such problems because they have a stable distribution and additional human labels can be collected each year. Problems that extract their labels from history, such as click prediction, data analytics, and forecasting, are also blessed with large numbers of labels. Unfortunately, there are only a few problems for which we can rely on such an endless supply of free labels, and they receive a disproportionately large amount of media attention.

We are interested in tackling the much larger class of ML problems where labeled data is sparse. For example, consider a dialog system for a specific app that must recognize specific commands such as “lights on first floor off”, “increase spacing between 2nd and 3rd paragraph”, or “make doctor appointment after Hawaii vacation”. Anyone who has attempted to build such a system has quickly discovered that generalizing to new instances from a small custom set of labeled instances is far more difficult than they originally thought. Each domain has its own generalization challenges, data exploration and discovery, custom features, and decomposition structure. Creating labeled data to communicate custom knowledge is inefficient. It also leads to embarrassing errors resulting from over-training on small sets. ML algorithms and processing power are not a bottleneck when labeled data is scarce. The bottleneck is the teacher and the teaching language.

To address this problem, we change our focus from the learning algorithm to teachers. We define “Machine Teaching” as improving human productivity given a learning algorithm. If ML is the science and engineering of extracting knowledge from data, Machine Teaching is the science and engineering of extracting knowledge from teachers. A similar shift of focus has happened in computer science. While computing is revolutionizing our lives, systems sciences (e.g., programming languages, operating systems, networking) have shifted their focus to human productivity. We expect a similar trend to shift science from Machine Learning to Machine Teaching.

The aim of this talk is to convince the audience that we are asking the right questions. We provide some answers and some spectacular results. The most exciting part, however, is the research opportunities that come with the emergence of a new field.


Bio:

Patrice Simard is a Distinguished Engineer in the Microsoft Research AI Lab in Redmond. He is passionate about finding new ways to combine engineering and science in the field of machine learning. Simard’s research is currently focused on human teachers. His goal is to extend the teaching language, science, and engineering, beyond the traditional (input, label) pairs. Simard completed his PhD thesis in Computer Science at the University of Rochester in 1991. He then spent 8 years at AT&T Bell Laboratories working on neural networks. He joined Microsoft Research in 1998. In 2002, he started MSR’s Document Processing and Understanding research group. In 2006, he left MSR to become the Chief Scientist and General Manager of Microsoft’s Live Labs Research. In 2009, he became the Chief Scientist of Microsoft’s AdCenter (the organization that monetizes Bing search). In 2012, he returned to Microsoft Research to work on his passion, Machine Learning research. Specifically, he founded the Computer-Human Interactive Learning (CHIL) group to study Machine Teaching and to make machine learning accessible to everyone.



Program


Thursday 13 September 2018


8:15 Coffee and conference registration
8:45 Opening

9:20 - 10:20 Keynote Patrice Simard (Microsoft Research AI Lab) Machine Learning -- What’s next?


10:20 - 10:50 Coffee break

Machine Learning

10:50 - 11:10 Thermodynamics of Restricted Boltzmann Machines. Giancarlo Fissore, Aurelien Decelle and Cyril Furtlehner. Inria

11:10 - 11:30 Wasserstein regularization for sparse multi-task regression. Hicham Janati, Marco Cuturi and Alexandre Gramfort. Inria

11:30 - 11:50 Automated machine learning with Monte Carlo Tree Search. Herilalaina Rakotoarison and Michèle Sebag. CNRS, Inria, LRI

11:50 - 12:10 Infinite-Task Learning with Vector-Valued RKHSs. Alex Lambert, Romain Brault, Zoltan Szabo, Maxime Sangnier and Florence d'Alché-Buc. Télécom ParisTech

12:10 - 12:30 Predictive Data Mining for Multi-Variate Time Series in a Distributed Environment. Jingwei Zuo, Karine Zeitouni, Yehia Taher and Raef Mousheimish. UVSQ


12:30 - 13:50 Lunch at Proto 204

13:50 - 14:50 Keynote Juliana Freire (NYU Moore-Sloan Data Science Environment) Democratizing Urban Data Exploration


14:50 - 15:20 Coffee break

Statistical theory for data science

15:20 - 15:40 Generalizing Point Embeddings using the Wasserstein Space of Elliptical Distributions. Boris Muzellec and Marco Cuturi. ENSAE

15:40 - 16:00 Finite Sample Bounds for Superquantile Linear Prediction. Evgenii Chzhen, Joseph Salmon and Zaid Harchaoui. Télécom ParisTech

Deep learning

16:00 - 16:20 Optimizing deep video representation to match brain activity. Hugo Richard, Ana Luisa Pinho, Bertrand Thirion and Guillaume Charpiat. Inria

16:20 - 16:40 Generative Neural Networks for Global Optimization with Gradients. Louis Faury and Olivier Fercoq. Criteo

16:40 - 17:00 Neuroevolution with CMA-ES for the tuning of a PID controller of nonholonomic car-like mobile robot. Mohamed Outahar and Eric Lucet. CEA


17:00 - 19:00 Poster session and cocktail at Proto 204

Improving the generalization capacity of nonlinear regression and classification models with generative models. Nabil Benmerad. DCBrain - Microsoft AI Factory

H−/H∞ Robust Observer for Actuator Fault Detection and Diagnosis of a Quadrotor. Eslam Abouselima, Said Mammar and Dalil Ichalal. CEA

Identification of causal factors leading people to choose high protein food. Irène Demongeot, Antoine Cornuejols and Nicolas Darcel. AgroParisTech & INRA

Logical approach to identify Boolean networks modelling cell differentiation. Stéphanie Chevalier, Andrei Zinovyev, Christine Froidevaux and Loïc Paulevé. LRI

A Probabilistic Theory of Supervised Similarity Learning for Pointwise ROC Curve Optimization. Robin Vogel, Aurélien Bellet and Stéphan Clémençon. Télécom ParisTech

A Latent Model for Representation Learning on Networks. Abdulkadir Celikkanat and Fragkiskos Malliaros. CentraleSupelec

Graph Matching and Transfer Learning: How to learn a new model from an existing one. Jiang You. AgroParisTech & INRA

Dimension Reduction Adapted to Paleogenomics. Séverine Liegeois, Olivier François and Flora Jay. LRI

Memory Bandits: Toward The Switching Bandit Problem Best Resolution. Reda Alami. Inria & Orange Labs

Transcriptomics data explorations to decipher iron homeostasis in the pathogenic yeast Candida glabrata. Thomas Denecker and Gaëlle Lelandais. CEA

In addition, all contributed speakers listed in the program are invited to present a poster.


Friday 14 September 2018


8:30 Coffee and conference registration

Optimization

8:45 - 9:05 Stochastic algorithms for ICA. Pierre Ablin, Alexandre Gramfort, Jean-François Cardoso and Francis Bach. Inria

9:05 - 9:25 Celer: dual extrapolation for the Lasso. Mathurin Massias, Alexandre Gramfort and Joseph Salmon. Inria

9:25 - 9:45 A Stochastic Fixed Point Method for Empirical Risk Minimization. Rui Yuan, Robert M. Gower and Olivier Fercoq. Télécom ParisTech

9:45 - 10:05 Optimal mini-batch size for stochastic variance reduced methods. Nidham Gazagnadou and Robert M. Gower. Télécom ParisTech

10:05 - 10:35 Coffee break

Text processing

10:35 - 10:55 Automated extraction of food-drug interactions from scientific articles. Tsanta Randriatsitohaina. LIMSI

10:55 - 11:15 User modelling for the characterisation of eating behaviors. Sema Akkoyunlu. AgroParisTech

11:15 - 11:30 Talk and poster award

11:30 - 12:30 Keynote Christine Balagué (Institut Mines-Télécom Business School) Enjeux éthiques et responsabilité des technologies (Ethical Challenges and the Responsibility of Technologies)


Lunchbox to go

Friday afternoon will be reserved for visits to company labs:

  • Microsoft Research: presentation of AI use cases at the Microsoft Research Center (limited to 20 participants)

  • Dataiku: hands-on session using DSS on a data science use case (limited to 40 participants)

Call for submissions


2nd call for contributions: Posters and Demos



We invite Master M2 and PhD students from Université Paris-Saclay to submit an extended abstract of up to 3 pages, in English, describing new or preliminary results of their work as a poster or poster-demo. Master students are encouraged to submit posters even if they do not have substantial results at the time of submission. Submissions should be formatted according to the Springer Lecture Notes in Computer Science style.

We will award a prize to the best contribution.


Extended Abstract Instructions:

  • Please follow this simplified template: PDF, LaTeX source or Word template
  • Language = English
  • Number of pages = 1 to 3
  • Maximum number of tables or figures = 1
  • Mandatory paragraphs = Abstract, Keywords, Motivation, References (max. 10 references)

Each extended abstract must be submitted online via the EasyChair submission system.


The topics of the conference are listed below:

  • data mining
  • databases
  • big data analytics
  • machine learning
  • statistics
  • semantic web
  • scientific workflows
  • distributed data and computing
  • applications of data science (biomedical and biological data, physics, chemistry, smart cities, image, documents, audio, video, online advertising, ...)


The extended abstracts will be reviewed by the scientific program committee, possibly including one junior PC member (PhD student or postdoc in data science). They will be selected for oral (15 min) or poster presentation (flash talks, poster and poster-demo sessions) according to their originality and relevance to the conference topics. All the presentations should be in English. Electronic versions of the extended abstracts will be accessible on the conference web site. The book of abstracts will not be published and the extended abstracts will not constitute a formal publication.

Note: Only Master M2 and PhD students from Université Paris-Saclay are invited to contribute.

Important Dates


First call for contributions (closed): June 4th, 2018

Second call (posters and demos only) - deadline: July 23rd, 2018

Acceptance notification: August 10th, 2018

Registration is open

Registration closes: September 9th, 2018

Conference: September 13-14, 2018

Registration


This third edition of the Paris-Saclay Junior Conference on Data Science and Engineering is open only to students and researchers at Université Paris-Saclay. Registration is free but mandatory.

Conference registration form



Organizers

The organizing committee handles the logistics and communication aspects of the conference. It also builds the scientific committee.


Fatiha Saïs
Chair, Université Paris-Sud, UPsay

Olivier Fercoq
Co-chair, Télécom ParisTech, UPsay

Isabelle Huteau
Conference Communication Leader, Digicosme



Antoine Naulet
M2 DataScience

Pierre Andrieu
PhD student @LRI

Stéphanie Chevalier
Master Student - intern @LRI

Houssem-Eddine Gharbi
Master Student - intern @Danone

Thomas Denecker
PhD student @I2BC

Jocelyn Vernay
Master Student

Issa Memari
M2 Data & Knowledge - intern @EDF

Steering Committee

The role of the steering committee is to ensure the continuity of the conference through its involvement in choosing the scientific content, proposing and designating future organizers, and choosing the location of the conference. It is also involved in building the scientific program of the conference by taking part in the selection of invited speakers and of the scientific committee.


Florence d'Alché-Buc
Télécom ParisTech, UPsay

Sylvain Arlot
Université Paris-Sud, UPsay

Albert Bifet
Télécom ParisTech, UPsay

Sarah Cohen-Boulakia
Université Paris-Sud, UPsay

Flora Jay
CNRS, UPsay

Joseph Salmon
Télécom ParisTech, UPsay

Karine Zeitouni
Université de Versailles-Saint-Quentin-en-Yvelines, UPsay

Scientific Committee

The scientific committee chooses the keynote speakers and builds and supervises the program committee. It sets the rules for the best contribution award. It also helps the organizing committee seek funding and writes the call for contributions.


Florence d'Alché-Buc
Télécom ParisTech, UPsay

Zacharie Ales
ENSTA ParisTech, UPsay

Sarah Cohen-Boulakia
Université Paris-Sud, UPsay

Marco Cuturi
ENSAE ParisTech, UPsay

Olivier Fercoq
Télécom ParisTech, UPsay

Alexandre Gramfort
Inria Saclay, UPsay

Flora Jay
CNRS, UPsay

Cristina Manfredotti
AgroParisTech, UPsay

Nicoleta Preda
Université de Versailles-Saint-Quentin-en-Yvelines, UPsay

Gianluca Quercini
CentraleSupélec, UPsay

Fatiha Saïs
Université Paris-Sud, UPsay

Michaël Thomazo
Inria Saclay, UPsay

Paola Tubaro
CNRS, UPsay

Program Committee

The program committee reviews the submitted extended abstracts and selects those to be presented at the conference. It is also in charge of reviewing and selecting the student posters and demos to be presented during the conference.



Partners and sponsors






This work was partially supported by a public grant as part of the Investissement d'avenir project (ANR-11-LABX-0056-LMH), and by Labex DigiCosme (ANR-11-LABEX-0045-DIGICOSME), operated by ANR as part of the program Investissement d'Avenir Idex Paris Saclay (ANR-11-IDEX-0003-02).

Venue


Laboratoire de l'Accélérateur Linéaire
Centre Scientifique d'Orsay
Bâtiment 200 - BP 34
91898 ORSAY Cedex