User action sequences are among the strongest inputs in recommender systems: your next click, read, watch, play, or purchase is likely at least somewhat related to what you've clicked on, read, watched, played, or purchased minutes, hours, days, months, or even years ago.
Historically, the status quo for modeling such user engagement sequences has been pooling: for example, a classic 2016 YouTube paper describes a system that takes the latest 50 watched videos, collects their embeddings from an embedding table, and pools these into a single feature vector with sum pooling. To save memory, the embedding table for these sequence videos is shared with the embedding table for the candidate videos themselves.
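A minimal sketch of this pooling approach, assuming PyTorch and made-up sizes (the paper's only concrete detail used here is the 50-video history; the table size and embedding dimension below are illustrative):

```python
import torch
import torch.nn as nn

# Illustrative sizes; only the 50-video history length comes from the paper.
NUM_VIDEOS = 1_000_000
EMBED_DIM = 64
HISTORY_LEN = 50

# A single embedding table, shared between history videos and candidate videos.
video_embeddings = nn.Embedding(NUM_VIDEOS, EMBED_DIM)

def pooled_user_vector(watched_ids: torch.Tensor) -> torch.Tensor:
    """Sum-pool the embeddings of a user's watched videos into one
    fixed-size feature vector, regardless of history length."""
    # watched_ids: (batch, HISTORY_LEN) integer video IDs
    embs = video_embeddings(watched_ids)  # (batch, HISTORY_LEN, EMBED_DIM)
    return embs.sum(dim=1)                # (batch, EMBED_DIM)

# Example: a batch of 2 users, each with their last 50 watched video IDs.
watch_history = torch.randint(0, NUM_VIDEOS, (2, HISTORY_LEN))
user_vec = pooled_user_vector(watch_history)
print(user_vec.shape)  # torch.Size([2, 64])
```

Note that the output shape is independent of the order of the IDs in `watch_history`, which is exactly the limitation discussed next.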
This simplistic approach corresponds roughly to a bag-of-words approach in the NLP domain: it works, but it's far from ideal. Pooling takes into account neither the sequential nature of the inputs, nor the relevance of the items in the user history with respect to the candidate item we need to rank, nor any of the temporal information: an…