Junior Conference on Data Science and Engineering
Paris-Saclay, 12th-13th September 2019



About


The fourth edition of the Paris-Saclay Junior Conference on Data Science and Engineering (JDSE) is addressed to first-year Ph.D. students, M2 students and third-year students at engineering schools at Paris-Saclay and Institut Polytechnique de Paris. It offers these students the opportunity to present scientific work developed during their internships or the first year of their Ph.D. thesis, and to sharpen their critical sense at a professional conference hosting prestigious invited speakers, academics and industry scientists.


The conference aims at gathering a vast audience and is an excellent means of discovering research activities in Data Science and Engineering.


Ph.D. students are also involved in the conference organization as reviewers, session chairs, and organizers of networking events.
Please contact us if you are interested in joining the team.

Past editions


#JDSE2019
Follow us on Twitter! #JDSE2019


News


11/09/2019. The conference will be broadcast live HERE

05/09/2019. The registration is now closed.

18/06/2019. The poster submission system is now open until Monday the 15th of July, 2019, midnight.

27/05/2019. The paper submission deadline has been extended until Tuesday the 11th of June, 2019, midnight.

22/04/2019. The paper submission system is now open until Tuesday the 4th of June, 2019, midnight (see the call for submissions below). We invite M2 and Ph.D. students from the Université Paris-Saclay to submit an extended abstract of up to 3 pages describing new or preliminary results for a 15-minute talk or a poster at https://easychair.org/conferences/?conf=jdse2019.

01/02/2019. Contact us as early as you like if you would like to be part of the junior or senior program committee or the local committee of JDSE2019, i.e. to participate in one or several of the following tasks: planning the event beforehand, spreading the word, organizing the student Friday activity, reviewing scientific papers, chairing sessions, helping with logistics during the conference, and so on.

Speakers



Aapo Hyvarinen
Professor of Computer Science and Machine Learning, University of Helsinki and University College London


Nonlinear independent component analysis. A principled framework for unsupervised deep learning


Abstract:

Unsupervised learning, in particular learning general nonlinear representations, is one of the deepest problems in machine learning. Estimating latent quantities in a generative model provides a principled framework, and has been successfully used in the linear case, e.g. with independent component analysis (ICA) and sparse coding. However, extending ICA to the nonlinear case has proven to be extremely difficult. A straightforward extension is unidentifiable, i.e. it is not possible to recover those latent components that actually generated the data. Here, we show that this problem can be solved by using additional information either in the form of temporal structure or an additional, auxiliary variable. We start by formulating two generative models in which the data is an arbitrary but invertible nonlinear transformation of time series (components) which are statistically independent of each other. Drawing from the theory of linear ICA, we formulate two distinct classes of temporal structure of the components which enable identification, i.e. recovery of the original independent components. We show that in both cases, the actual learning can be performed by ordinary neural network training where only the input is defined in an unconventional manner, making software implementations trivial. We further generalize the framework to the case where instead of temporal structure, an additional auxiliary variable is observed (e.g. audio in addition to video). Our methods are closely related to "self-supervised" methods heuristically proposed in computer vision, and also provide a theoretical foundation for such methods. The talk is based on the following papers: http://www.cs.helsinki.fi/u/ahyvarin/papers/NIPS16.pdf, http://www.cs.helsinki.fi/u/ahyvarin/papers/AISTATS17.pdf, https://arxiv.org/pdf/1805.08651


Bio:

Aapo Hyvarinen studied undergraduate mathematics at the universities of Helsinki (Finland), Vienna (Austria), and Paris (France), and obtained a Ph.D. degree in Information Science at the Helsinki University of Technology in 1997. In 2008, he was appointed Professor at the University of Helsinki. From 2016 to 2019, he was Professor of Machine Learning at the Gatsby Computational Neuroscience Unit, University College London, UK. Aapo Hyvarinen is the main author of the books "Independent Component Analysis" (2001) and "Natural Image Statistics" (2009), and author or coauthor of more than 200 scientific articles. Google Scholar gives him approximately 40,000 citations. His current work concentrates on unsupervised machine learning and its applications to neuroscience.




Oscar Romero
Associate Lecturer at Universitat Politècnica de Catalunya (UPC), Barcelona, Spain


Data Management for Data Science.


Abstract:

The role of data management for data science is gaining relevance as more complex analytical systems are being developed. Techniques such as deep learning rely heavily on the efficient management of data (e.g., distribution of data) and on specialised hardware to improve their training time.

Indeed, any data-intensive system can drastically benefit from the well-grounded theory of data management. In this talk we will explore the current role of data management for data science and focus on the main research lines in this aspect. We will pay special attention to data integration and data governance issues, while providing an overall view of the main challenges and proposed solutions in this field.


Bio:

Oscar Romero is an associate lecturer at Universitat Politècnica de Catalunya (UPC).

He obtained his PhD in Computing from UPC in 2010. Since then, he has been a member of the Database Technologies and Information Management (DTIM) and Information Modeling and Processing (IMP) research groups. His research mainly focuses on complex information systems that automate the data management lifecycle, especially in the Business Intelligence and Big Data fields. More specifically, his main interests are OLAP and data warehousing, NoSQL (and any technology beyond relational databases), data integration, self-tuning database systems and semantic-aware systems (based on semantic formalisms such as ontology languages or RDF(S)).

Currently, he is the UPC coordinator of the Erasmus Mundus Joint Master in Big Data Management and Analytics (BDMA), the Data Science track of the Master in Innovation and Research in Informatics (MIRI-DS) and the life-long learning master in Big Data Management, Technologies and Analytics (BDMTA). He also participates in the Erasmus Mundus Joint PhD in Information Technologies for Business Intelligence - Doctoral Consortium (IT4BI-DC), where he has supervised 3 successfully finalised PhD theses and 8 additional ongoing theses.

He has also participated in several technology transfer projects with relevant companies or organisations such as the World Health Organisation (WHO), SAP, HP Labs, Siemens, Atos and Zurich Insurance among others. Similarly, he has participated in more than 10 competitive projects.




Oana Goga
Researcher at the French National Centre for Scientific Research (CNRS), Laboratoire d'Informatique de Grenoble


Challenges in making social media advertising more transparent


Abstract:

In this talk, I will share my research efforts in understanding and tackling security and privacy threats in social media targeted advertising. Despite a number of recent controversies regarding privacy violations, lack of transparency, or vulnerability to discrimination or propaganda by dishonest actors, users still have little understanding of what data targeted advertising platforms have about them and why they are shown the ads they see. To address such concerns, Facebook recently introduced the “Why am I seeing this?” button that provides users with an explanation of why they were shown a particular ad. I first investigate the level of transparency provided by this mechanism by empirically measuring whether it satisfies a number of key properties and what the consequences of the current design choices are. To provide a better understanding of the Facebook advertising ecosystem, we developed a tool called AdAnalyst that collects the ads users receive and provides aggregate statistics. I will then share our findings from analyzing data from over 600 real-world AdAnalyst users, in particular on who is advertising on Facebook and how these advertisers are targeting users and customizing ads via the platform.


Bio:

Oana Goga has been a CNRS research scientist at the Laboratoire d'Informatique de Grenoble (France) since October 2017. Prior to this, she was a postdoc at the Max Planck Institute for Software Systems and obtained a Ph.D. in 2014 from Pierre et Marie Curie University in Paris. She is the recipient of a young researcher award from the French National Research Agency (ANR). Her research interests are in security and privacy issues that arise in online systems that have user-provided data at their core.



Program


Thursday 12 September 2019


8:45 - 9:30: Coffee and registration
9:30 - 9:45: Welcome message and opening talk. To be announced

9:45 - 10:45: Oana Goga (SLIDE team at LIG, Grenoble). Challenges in making social media advertising more transparent
Chair: Gianluca Quercini

10:45 - 11:15: Coffee break

11:20 - 12:40: Morning Session
User-guided algorithms and privacy
Chair: Anderson Carlos Ferreira da Silva

11:20 - 11:40: Privacy preserving algorithm. Marie Garin, ENS Paris-Saclay
11:40 - 12:00: DAS3H: Modeling Student Learning and Forgetting for Optimally Scheduling Distributed Practice of Skills. Benoît Choffin, LRI, CentraleSupélec

Generative and parsimonious models
Chair: Pierre-Louis Guhur

12:00 - 12:20: Convolutional Capsule Networks. Max Cohen, Samovar, Telecom SudParis
12:20 - 12:40: Labeled text generation for natural language understanding. Hugo Boulanger, LIMSI, ENSIIE



12:40 - 14:00: Lunch in room E101


14:00 - 15:00: Aapo Hyvarinen (University of Helsinki). Nonlinear independent component analysis. A principled framework for unsupervised deep learning
Chair: Sylvain Le Corff

15:00 - 15:30: Coffee break

15:30 - 16:50: Afternoon Session
Time series
Chair: Thomas Moreau

15:30 - 15:50: Manifold-regression to predict from MEG/EEG brain signals without source modeling. David Sabbagh, INRIA
15:50 - 16:10: Seq2VAR: a multivariate time series representation learning framework. Edouard Pineau, Safran Tech, Signal and Information Technologies
16:10 - 16:30: Learning representations from neural time series with self-supervision. Hubert Banville, INRIA, InteraXon Inc.
16:30 - 16:50: Convergence and Dynamical Behavior of the ADAM Algorithm for Non Convex Stochastic Optimization. Anas Barakat, Telecom Paris


17:00 - 19:00: Poster Session and Cocktail

Posters will be presented in room E101.



Friday 13 September 2019


8:30 - 9:00: Coffee and registration

9:00 - 10:00: Oscar Romero (Universitat Politècnica de Catalunya). Big Data Variety: The Challenge of On-demand Data Integration.
Chair: Gianluca Quercini

10:00 - 10:30: Coffee break

10:30 - 11:50: Morning Session
Complex and high dimensional data
Chair: Marie Perrot-Dockes

10:30 - 10:50: Non-Stationary Thompson Sampling For Stochastic Bandits with Graph-Structured Feedback. Reda Alami, Université Paris-Sud, Orange Labs
10:50 - 11:10: Encoding high-cardinality string categorical data. Patricio Cerda, INRIA, Parietal team
11:10 - 11:30: On Binary Classification in Extreme Regions. Hamid Jalalzai, Telecom Paris
11:30 - 11:50: A MQTT OPCUA Configuration Tool. Zepeng Liu, Telecom Paris

11:50 - 12:30: Lunchbox to go

Afternoon Student Activity


Google visit

14:00 - 15:00: Welcome talk. Local team.
15:00 - 15:15: General introduction. Bertrand Rondepierre.
15:15 - 16:00: SWE and OR at Google. Bertrand LeCun.
16:00 - 16:45: Reinforcement learning and robustness. Leonard Hussenot.
16:45 - 17:30: Genomics and machine learning. Felix Raimundo.

Call for submissions


We invite Master M2 and Ph.D. students from the Université Paris-Saclay to submit an extended abstract of up to 3 pages describing new or preliminary results of their scientific work. The following paper categories are welcome:

  • Full paper. The author will be given the chance to present the paper in a 15-minute talk.
  • Poster. The author will be given the chance to present the paper in the poster session.
  • Poster + demo. The author will be given the chance to present the paper in the poster session and to showcase a demo.

Master students are especially encouraged to submit posters even if they do not have substantial results at the time of submission.

Submissions must be in PDF and must adopt the style of the Springer Publications format for Lecture Notes in Computer Science (LNCS):

  • A sample submission can be found here. Your paper must be in English, have between 1 and 3 pages, contain no more than one figure or table, and include the following paragraphs: abstract, keywords, motivation, references (up to 10).
  • You can use either this LaTeX template or this Word template, at your choice.

We will award a prize to the best full paper and to the best poster.

Submission URL

All papers must be submitted via EasyChair at the following URL: https://easychair.org/conferences/?conf=jdse2019

List of topics

Topics of interest include, but are not limited to:

  • Data mining
  • Databases
  • Big Data analytics
  • Machine learning
  • Statistics
  • Semantic web
  • Scientific workflows
  • Distributed data and computing
  • Applications of data science (biomedical and biological data, physics, chemistry, smart cities, image, documents, audio, video, on-line advertisement, ...)

The extended abstracts will be reviewed by the scientific program committee, including one junior PC member (PhD student or postdoc in data science). All the presentations must be in English. Electronic versions of the extended abstracts will be accessible on the conference web site. The book of abstracts will not be published and the extended abstracts will not constitute a formal publication.

Note: Only Master M2 and Ph.D. students from Université Paris-Saclay are invited to contribute.

Important Dates


Contributions submission deadline: Tuesday the 11th of June, 2019, midnight (extended from Tuesday the 4th of June, 2019, midnight)

Acceptance notification: Thursday the 11th of July, 2019 (postponed from Wednesday the 3rd of July, 2019)

Second call (posters only) - deadline: Monday the 15th of July, 2019, midnight

Conference: the 12th-13th of September, 2019

Registration


This fourth edition of the Paris-Saclay Junior Conference on Data Science and Engineering (JDSE) is addressed only to students and researchers at the Université Paris-Saclay. Registration is free but mandatory.

The registration is now closed.



Steering Committee

The role of the steering committee is to help make the conference durable through its involvement in choosing the scientific content of the conference, proposing and designating the future organizers, and selecting the location of the conference. It is also involved in building the scientific program of the conference by participating in the selection of invited speakers and in the scientific committee.


Florence d'Alché-Buc
Télécom ParisTech, UPsay

Sylvain Arlot
Université Paris-Sud, UPsay

Albert Bifet
Télécom ParisTech, UPsay

Sarah Cohen-Boulakia
Université Paris-Sud, UPsay

Flora Jay
CNRS, UPsay

Organizers

The organization committee manages the logistical and communication aspects of the conference.


Gianluca Quercini
Chair, CentraleSupélec, UPsay

Sylvain Le Corff
Co-chair, Télécom SudParis, UPsay

Isabelle Huteau
Conference Communication Leader, Digicosme

Junior Organizing Committee

Molood Arman, Ph.D. student
CentraleSupélec, UPsay

Eloïse Berthier, Ph.D. student
Université Paris-Saclay

Max Cohen, M2 student
Télécom SudParis

Redouane Bouhamoum, Ph.D. student
Université de Versailles-Saint-Quentin-en-Yvelines, UPsay

Hafsa El Hafyani, Ph.D. student
Université de Versailles-Saint-Quentin-en-Yvelines, UPsay

Anderson Carlos Ferreira da Silva, Research engineer
LRI, UPsay

Pierre-Louis Guhur, M2 student
École normale supérieure Paris-Saclay, UPsay

Minh Huong Le Nguyen, M2 student
Télécom ParisTech, UPsay

Armita Khajeh Nassiri, M2 student
LRI, UPsay

Souheir Mehanna, Ph.D. student
Université de Versailles-Saint-Quentin-en-Yvelines, UPsay

Yassine Ouali, Ph.D. student
CentraleSupélec, UPsay

Marie Perrot-Dockes, Ph.D. student
AgroParisTech, UPsay

Mathilde Veron, Research engineer
LIMSI, UPsay

Adnan Zeddoun, Engineering student
CentraleSupélec, UPsay

Jingwei Zuo, Ph.D. student
Université de Versailles-Saint-Quentin-en-Yvelines, UPsay

Senior Scientific Committee

The senior scientific committee is composed of senior researchers and assistant professors at Université Paris-Saclay. Members of the scientific committee review submitted papers and decide the criteria for awarding the prizes for the best paper and the best poster.


Florence d'Alché-Buc
Télécom ParisTech, UPsay

Albert Bifet
Télécom ParisTech, UPsay

Anna Bonnet
Université Pierre et Marie Curie

Sarah Cohen-Boulakia
Université Paris-Sud, UPsay

Olivier Fercoq
Télécom ParisTech, UPsay

Pierre Gloaguen
AgroParisTech, UPsay

Alexandre Gramfort
Inria Saclay, UPsay

Céline Hudelot
CentraleSupélec, UPsay

Flora Jay
CNRS, UPsay

Sylvain Le Corff
SAMOVAR, Télécom SudParis

Camille Coron
Université Paris-Sud, UPsay

Matthieu Lerasle
Université Paris-Sud, UPsay

Anne-Laure Ligozat
ENSIIE

Cristina Manfredotti
AgroParisTech, UPsay

Adelaïde Olivier
Université Paris-Sud, UPsay

Gianluca Quercini
CentraleSupélec, UPsay

Fatiha Saïs
Université Paris-Sud, UPsay

Karine Zeitouni
Université de Versailles-Saint-Quentin-en-Yvelines, UPsay

Junior Scientific Committee

The junior scientific committee is composed of second- and third-year Ph.D. students and postdoctoral researchers, and assists the senior scientific committee in reviewing submitted papers.


Marie Al-Ghossein, Postdoctoral researcher
Télécom ParisTech, UPsay

Redouane Bouhamoum, Ph.D. student
Université de Versailles-Saint-Quentin-en-Yvelines, UPsay

Victor Bouvier, Ph.D. student
CentraleSupélec, UPsay

Benoît Choffin, Ph.D. student
CentraleSupélec, UPsay

Hafsa El Hafyani, Ph.D. student
Université de Versailles-Saint-Quentin-en-Yvelines, UPsay

Julien Hay, Ph.D. student
CentraleSupélec, UPsay

Luc Lehericy, Ph.D. student
Université Paris-Sud, UPsay

Souheir Mehanna, Ph.D. student
Université de Versailles-Saint-Quentin-en-Yvelines, UPsay

Heitor Murilo Gomes, Postdoctoral researcher
Télécom ParisTech, UPsay

Thomas Moreau, Postdoctoral researcher
Inria Saclay, UPsay

Ana Luísa Pinho, Postdoctoral researcher
Inria Saclay, UPsay

Augustin Touron, Ph.D. student
Université Paris-Sud, UPsay

Mathilde Veron, Research engineer
LIMSI, UPsay

Jingwei Zuo, Ph.D. student
Université de Versailles-Saint-Quentin-en-Yvelines, UPsay

Partners and sponsors





This work was partially supported by a public grant as part of the Investissement d'avenir project (ANR-11-LABX-0056-LMH), and by Labex DigiCosme (ANR-11-LABEX-0045-DIGICOSME), operated by ANR as part of the program Investissement d'Avenir Idex Paris Saclay (ANR-11-IDEX-0003-02).

Venue


CentraleSupélec
Paris-Saclay campus
Bâtiment Bouygues, 9 rue Joliot Curie
91190 Gif-sur-Yvette

Directions

By public transport

From Paris, take the RER B line towards Saint-Rémy-lès-Chevreuse. You have three options:

By car