By Charles Ollion, Head of Research at Heuritech
I just returned from ICML 2015 (July 6-11) and wanted to share some of my impressions. Along with NIPS, ICML is one of the best-known machine learning conferences, where you can meet the best scientists in the field. I was mostly interested in Deep Learning, especially text and image analysis, as advances in those fields directly influence my work at Heuritech.
Deep Learning was huge at ICML: the biggest lecture hall was almost always full (sometimes with a thousand people). Many interesting papers were presented by Google (Google and/or DeepMind) or Facebook, but academia also brought some very interesting advances. Deep Learning seems more mature than ever for several industry applications, which explains why so many companies were sponsoring or simply attending the conference. There were web giants (Google, Facebook, Amazon, Microsoft Research, Twitter, …) as well as a few startups.
Panel Discussion on the Future of Deep Learning
At the end of the workshop on Deep Learning, Yoshua Bengio (University of Montreal), Neil Lawrence (University of Sheffield), Juergen Schmidhuber (IDSIA), Demis Hassabis (Google DeepMind), Yann LeCun (Facebook, NYU) and Kevin Murphy (Google) were invited to talk about the future of Deep Learning (and, to a lesser extent, AI). They also discussed the DL/AI hype and a possible AI winter, as well as academic versus industry research.
This blog post by Kyunghyun Cho (organizer of the DL Workshop) gives a very interesting summary of the discussion.
A few papers worth reading
In the following, I have selected a few presentations that deal with text and image analysis. Feel free to comment!
Word and Phrases Embeddings
BilBOWA / presented by Stephan Gouws – github code
Jointly trained multilingual embeddings (e.g. English/French word embeddings).
The model is trained on an English-only dataset, a French-only dataset (both are easy to find) and a small parallel English/French dataset (here Europarl).
Tricks to make it work efficiently: sampling of aligned phrases, subsampling of overly frequent words, and asynchronous learning on the different datasets (2 monolingual and 1 bilingual).
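To make the subsampling trick concrete, here is a minimal sketch. It assumes the word2vec-style rule, where an occurrence of a word with relative frequency f is kept with probability sqrt(t / f); I am not certain BilBOWA uses exactly this formula. The function names and the threshold value are illustrative only.

```python
import math
import random


def keep_probability(word_freq, threshold=1e-5):
    """Probability of keeping one occurrence of a word.

    word_freq is the word's relative frequency in the corpus.
    Assumption: word2vec-style rule sqrt(threshold / word_freq), capped at 1.
    """
    if word_freq <= threshold:
        return 1.0
    return math.sqrt(threshold / word_freq)


def subsample(tokens, threshold=1e-5, seed=0):
    """Randomly drop occurrences of frequent words from a token list."""
    rng = random.Random(seed)
    counts = {}
    for tok in tokens:
        counts[tok] = counts.get(tok, 0) + 1
    total = len(tokens)
    kept = []
    for tok in tokens:
        if rng.random() < keep_probability(counts[tok] / total, threshold):
            kept.append(tok)
    return kept
```

Rare words are always kept, while a word that makes up 0.1% of the corpus is kept only about one time in ten with the default threshold, which speeds up training and reduces the weight of stop words.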
Modeling Order in Neural Word Embeddings at Scale / presented by David Gilmore
Two different models (which are then combined):
– letter2vec for a morphological/syntax analysis of words
– a new skipgram model involving word order
Interesting model, but the representation of words seems quite impractical (more than 5K dimensions for a single word, 160 billion parameters…)
From Word Embeddings to Document Distances / presented by Matt Kusner
New method to compare documents based on their wordvectors. Instead of averaging the wordvectors of each document, they compute a Word Mover's Distance between the documents.
Quite simple and easy to implement, and very nice improvement over the averaging of wordvectors.
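To illustrate why this beats averaging, here is a minimal pure-Python sketch. It does not solve the full optimal transport problem of the Word Mover's Distance; it computes the cheaper relaxed lower bound, where each word simply travels to its nearest word in the other document. The toy 2-D vectors and all function names are made up for the example.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two vectors of the same length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_vector(doc, vectors):
    """Average of the word vectors of a tokenized document."""
    dim = len(vectors[doc[0]])
    total = [0.0] * dim
    for word in doc:
        for i, x in enumerate(vectors[word]):
            total[i] += x
    return [x / len(doc) for x in total]


def averaged_distance(doc_a, doc_b, vectors):
    """Baseline: distance between the averaged word vectors."""
    return euclidean(mean_vector(doc_a, vectors), mean_vector(doc_b, vectors))


def one_sided_relaxed(doc_a, doc_b, vectors):
    """Each word of doc_a (weight 1/len) moves to its nearest word in doc_b."""
    cost = 0.0
    for a in doc_a:
        cost += min(euclidean(vectors[a], vectors[b]) for b in doc_b)
    return cost / len(doc_a)


def relaxed_wmd(doc_a, doc_b, vectors):
    """Relaxed lower bound on the Word Mover's Distance (not the exact value)."""
    return max(one_sided_relaxed(doc_a, doc_b, vectors),
               one_sided_relaxed(doc_b, doc_a, vectors))


# Hypothetical toy embedding: four words placed on the unit circle.
toy_vectors = {
    "left": [-1.0, 0.0], "right": [1.0, 0.0],
    "up": [0.0, 1.0], "down": [0.0, -1.0],
}
doc_1 = ["left", "right"]
doc_2 = ["up", "down"]

avg = averaged_distance(doc_1, doc_2, toy_vectors)   # 0.0: both averages are the origin
rwmd = relaxed_wmd(doc_1, doc_2, toy_vectors)        # about 1.414: the words do differ
```

In this toy case the two documents share no words, yet averaging makes them look identical, while the transport-based distance still separates them.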
Jointly train the following two components:
– an embedding with PCA (skip gram or other would be possible) for predicting the wordvectors
– a regularizer so that the sum of the wordvectors should be close to the distribution probability of the context words
A short sentence can then be accurately represented by averaging its wordvectors, which can be very useful.
Skip-thought Vectors / discussion with Kyunghyun Cho – github code
I had the opportunity to discuss Skip-Thought Vectors with Kyunghyun Cho, even though he isn’t one of the authors (Ryan Kiros wasn’t at the conference).
More generally, we discussed how to build efficient sentence and even document representations with little or no supervision. He had some great insights, and I will be following these researchers' upcoming work very closely. I’ll post more on that.
Multimodal approaches: Images and Captions
Show Attend and Tell: Neural Image Caption Generation with Visual Attention / presented by Kelvin Xu – github code
The model is trained to generate captions from images (Microsoft COCO dataset). It uses a pretrained CNN encoder and an LSTM decoder (this is the current trend).
On top of that, an attention mechanism is added, which specifies which areas of the image are being processed. There are two kinds of attention tested:
– soft (deterministic): multiply image input by the attention layer (like gating units)
– hard (stochastic): maximizing a variational lower bound to select which places to ‘attend’
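As a rough illustration of the soft (deterministic) variant, here is a minimal sketch in pure Python. It assumes the attention scores are already given; in a real model they would be computed from the CNN region features and the LSTM state, and that part is left out here. Names and shapes are illustrative only.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(region_features, scores):
    """Weighted average of image region features.

    region_features: one feature vector per image region.
    scores: one raw attention score per region (assumed given).
    Returns the attention weights and the resulting context vector.
    """
    weights = softmax(scores)
    dim = len(region_features[0])
    context = [0.0] * dim
    for weight, feature in zip(weights, region_features):
        for i, x in enumerate(feature):
            context[i] += weight * x
    return weights, context
```

With equal scores the context vector is just the mean of the regions; as one score grows, the context approaches that single region's features. The hard variant would instead sample one region according to these weights, which is why it needs a different training objective.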
A few others: embeddings of programming languages!
Bimodal Modeling of Source Code and Natural Language / by Miltos Allamanis MSR
Learning Program Embeddings to Propagate Feedback on Student Code / by Chris Piech
Interesting new Autoencoder approach
MADE: Masked Autoencoder for Distribution Estimation / by Mathieu Germain
I had a really great time at ICML, met a lot of interesting people, and discussed my ideas with many researchers. I came back to Heuritech more motivated than ever, with a lot of insights that I am now using in my new models!