DATE
November 22, 2016
TIME
3pm
LOGISTICS
- Hangouts
AGENDA
- Report progress on all tasks. Discuss the barcamp and the hackathon (aka data competition).
PARTICIPANTS
Jean-Pierre Lorre, Sarah Zribi, Tom Jorquera, Maxence Bunel, Zied Sellami (LINAGORA); Michalis Vazirgiannis, Polykarpos Meladianos, Antoine Tixier (LIX)
MINUTES
- Linagora is working on the speech-to-text model for French. The corpus enrichment strategies did not significantly improve performance; the main finding was that a bigger corpus was required. The old 100-hour corpus has now doubled in size. In parallel, Ilyes is working on improving the ASR model itself, from both the architecture and the parameter-tuning standpoints. In particular, he is investigating autoencoders with bottleneck features.
- Linagora has also constructed a large corpus of French text from Wikipedia to build a French language model (based on RNNs).
- LIX has extended the offline summarizer to French (using the corpus sent by Tom to build the custom list of stopwords), pushed the code to the SP5 GitHub repo, and deployed the updated web app. LIX also continues to look into NLG to develop the next version of the offline summarizer.
- LIX has started working on the design of the APIs in accordance with Tom's initial proposal. The offline system's API will be based on simple REST, while the real-time system's API will need both REST and sockets.
- Preliminary versions of Deliverables 5.2 and 5.3 have been delivered by LIX.
- An email recommendation data challenge (for an M1 or M2 course, to be determined) will be launched in January. There will be two outreach events: one at the beginning of the competition and one at the end. Linagora will tell LIX whether an internal data set or the Enron data set should be used.
- The barcamp for SP5 will be held in Palaiseau in January and will focus on finalizing the API communication. To minimize travel, the barcamp and the first outreach event for the data competition will be held together.