DATE
November 22, 2016
TIME
3pm
LOGISTICS
- Hangouts
AGENDA
- report progress for all tasks
PARTICIPANTS
Jean-Pierre Lorre, Sarah Zribi, Tom Jorquera, Maxence Bunel, Zied Sellami (LINAGORA); Michalis Vazirgiannis, Polykarpos Meladianos, Antoine Tixier (LIX)
MINUTES
- Linagora is working on the speech-to-text model for French. The corpus enrichment strategies did not significantly improve performance; the main finding was that a bigger corpus was required. The old 100-hour corpus has now doubled in size. In parallel, Ilyes is working on improving the ASR model itself, in terms of both architecture and tuning. In particular, he is investigating autoencoders and bottleneck features.
- Linagora has also constructed a large corpus of French text from Wikipedia to build a French language model (based on an RNN).
- LIX has extended the offline summarizer to French (using the corpus sent by Tom to build a custom list of stopwords), has pushed the code to the dedicated repo, and has deployed the updated web app. LIX also keeps looking into NLG to improve the offline summarizer.
- LIX has started working on the design of the APIs in accordance with Tom's initial proposal. The offline system will be based on simple REST, while the real-time system will need both REST and sockets.
- Preliminary versions of deliverables 5.2 and 5.3 have been delivered by LIX.
- An email recommendation data challenge (for an M1 or M2 course, to be determined) will be launched in January. There will be two outreach events: one at the beginning of the competition and one at the end. Linagora will tell LIX whether an internal data set or the Enron data set should be used.
- The barcamp for SP5 will be held in Palaiseau in January, and will be about finalizing the API communication. To minimize travel, the barcamp and the first outreach event for the data competition will be synchronized.