Putting a Hybrid Recommendation System into Production

There aren’t many code examples on the internet about recommendation systems, and even fewer about how to put a hybrid (collaborative filtering and content-based) recommendation system into production. With this blog post, we aim to fill that gap. For this example, we will be using LightFM, a fast and scalable Python implementation of hybrid recommender algorithms, and the MovieLens dataset. You can find the complete code on GitHub here.

We’ll assume that you are familiar with recommendation systems, collaborative filtering, content-based filtering, and hybrid methods based on matrix factorization. So, let’s get started!

There are many challenges when deploying a recommendation system into production. Here are some of the most frequent ones, which you may well have to solve in your own system:

  • Making real-time recommendations
  • Quickly incorporating new interactions into the model
  • Predicting for new users / items
  • Measuring quality beyond accuracy, such as how surprising the recommendations are
  • Popularity bias
  • Ranking bias
  • Self-fulfilling prophecy
  • Robustness against fraud

For the purpose of this article, we will focus on the first three challenges, which, once solved, allow you to offer ‘fast’ recommendations to users: how to make real-time recommendations, how to incorporate new interactions rapidly, and how to predict for new users / items. We will also illustrate each of them with an example that uses the LightFM package.

How to make Real-time Recommendations

You can achieve real-time recommendations by choosing the right architecture for your problem. You have two options:

  1. After each model update, generate scores for all user-item pairs and save them to a database.
  2. Host the recommender model on a server and serve recommendations on demand via an API, scoring user-item pairs at the time they are needed.

Choose the first option if you have a small to medium amount of data, because:

  • Generating scores for all user-item pairs will be relatively fast
  • It uses computation resources only while training and batch-predicting for the users. After that, the virtual machine can be shut down, so you don’t need to keep a machine live to receive requests at any time
  • It is simpler: it is enough to run batch prediction as a job, and you don’t need to deal with APIs or the complexities of updating the model without downtime

Choose the second option if you have a large amount of data, because:

  • It is inefficient or impossible to generate scores for all user-item pairs
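
To make the difference between the two options concrete, here is a minimal sketch in plain Python. It is not taken from our notebook: the names top_n, precompute_all, OnDemandRecommender and toy_score are hypothetical, the scoring function stands in for a trained model, and a dict stands in for the database.

```python
import heapq
from typing import Callable, Dict, Iterable, List

# Hypothetical stand-in for a trained model: any callable that maps a
# (user_id, item_id) pair to a relevance score.
ScoreFn = Callable[[int, int], float]


def top_n(score: ScoreFn, user_id: int, item_ids: Iterable[int], n: int) -> List[int]:
    """Return the n highest-scoring item ids for one user."""
    return heapq.nlargest(n, item_ids, key=lambda item_id: score(user_id, item_id))


def precompute_all(score: ScoreFn, user_ids: Iterable[int],
                   item_ids: Iterable[int], n: int) -> Dict[int, List[int]]:
    """Option 1: score every user-item pair in one batch job.

    The returned dict stands in for the database table that the
    application reads from once the job has finished.
    """
    items = list(item_ids)
    return {user_id: top_n(score, user_id, items, n) for user_id in user_ids}


class OnDemandRecommender:
    """Option 2: keep the model in memory and score only when a request arrives."""

    def __init__(self, score: ScoreFn, item_ids: Iterable[int], n: int) -> None:
        self._score = score
        self._items = list(item_ids)
        self._n = n

    def recommend(self, user_id: int) -> List[int]:
        # In a real deployment this method would sit behind an API endpoint.
        return top_n(self._score, user_id, self._items, self._n)


def toy_score(user_id: int, item_id: int) -> float:
    """Toy scoring rule for the demo: users prefer items with nearby ids."""
    return -abs(user_id - item_id)


if __name__ == "__main__":
    table = precompute_all(toy_score, user_ids=range(3), item_ids=range(5), n=2)
    service = OnDemandRecommender(toy_score, item_ids=range(5), n=2)
    print(table[2])              # batch result, read back from the "database"
    print(service.recommend(2))  # same ranking, computed at request time
```

Both paths return the same ranking. What differs is when the work is done: once per model update for every user in the first option, or once per request for a single user in the second.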

 

The Data

Going forward, we will use the MovieLens dataset to illustrate how to productionize a hybrid recommendation system with LightFM. We took this version of the dataset from Kaggle. It includes user and item features, and explicit ratings for the movies between 1 and 5. However, the LightFM setup we use here (a ranking loss, WARP) is designed for implicit feedback data, so we will treat the ratings as implicit feedback: if the user rated the movie, it counts as positive feedback, and if not, as negative. We apply one more trick: if the user rated the movie below 3 stars, we also treat it as negative feedback, because we know that the user didn’t like the movie.
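
The rating conversion described above can be sketched in a few lines. This is an illustrative version on plain tuples rather than the data frames used in the notebook; the function name to_implicit_positives and the row layout are assumptions made for this sketch.

```python
from typing import Iterable, List, Tuple

# (user_id, movie_id, rating), with the rating on the 1-5 scale.
# This row layout is an assumption made for the sketch.
Rating = Tuple[int, int, int]

MIN_POSITIVE_RATING = 3  # ratings below this are treated as negative feedback


def to_implicit_positives(ratings: Iterable[Rating],
                          min_rating: int = MIN_POSITIVE_RATING) -> List[Tuple[int, int]]:
    """Keep only the (user_id, movie_id) pairs that count as positive feedback.

    Unrated pairs and pairs rated below min_rating are simply absent from
    the result, which is how an implicit-feedback model sees negatives.
    """
    return [(user_id, movie_id)
            for user_id, movie_id, rating in ratings
            if rating >= min_rating]


if __name__ == "__main__":
    sample = [(1, 10, 5), (1, 11, 2), (2, 10, 3)]
    print(to_implicit_positives(sample))
```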

The dataset is split into two parts: the “current” interactions, representing the data that is available when training the complete model, and the “new” interactions, representing those that happened after training but that we still want to consider for recommendations. The split is done using a threshold on the time of the interaction. As a result, the “new” dataset contains completely new users who weren’t present in the “current” set, as well as new interactions from existing users. One of the challenges is to incorporate the completely new users without having to retrain the complete model.

Figure 1. User interactions in time. Users are ordered by their first interaction. Every dot represents an interaction, and the red line marks the split between ‘current’ and ‘new’.

 

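Then the “current” set is split into a training set and a validation set. In this post, we present only the highlights of the code; the complete code, including the helper functions used below, is in the GitHub repository linked above.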

interactions_current, interactions_new = current_new_split(interactions, 0.9)
interactions_current_train, interactions_current_val = train_test_split_randomly(interactions_current, 0.3)
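
The two split helpers are not shown in this post. As a rough idea of what they do, here is a dependency-free sketch on plain tuples. The names split_by_time and split_randomly are hypothetical stand-ins, and we assume here that the second argument of the time-based split is the fraction of interactions that should end up in the “current” part.

```python
import random
from typing import List, Sequence, Tuple

# (user_id, movie_id, timestamp); a hypothetical row layout for this sketch.
Interaction = Tuple[int, int, int]


def split_by_time(interactions: Sequence[Interaction],
                  current_fraction: float) -> Tuple[List[Interaction], List[Interaction]]:
    """Split at a timestamp threshold so that roughly `current_fraction`
    of the interactions fall into the "current" part."""
    ordered = sorted(interactions, key=lambda row: row[2])
    if not ordered:
        return [], []
    cut = min(int(len(ordered) * current_fraction), len(ordered) - 1)
    threshold = ordered[cut][2]
    # Compare against the threshold, so equal timestamps stay together.
    current = [row for row in ordered if row[2] < threshold]
    new = [row for row in ordered if row[2] >= threshold]
    return current, new


def split_randomly(interactions: Sequence[Interaction], test_fraction: float,
                   seed: int = 0) -> Tuple[List[Interaction], List[Interaction]]:
    """Shuffle the interactions and hold out `test_fraction` of them."""
    shuffled = list(interactions)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    rows = [(user, 100 + user, timestamp) for timestamp, user in enumerate(range(10))]
    current, new = split_by_time(rows, 0.9)
    train, val = split_randomly(current, 0.3)
    print(len(current), len(new), len(train), len(val))
```

Because the first split is based on time, users whose first interaction falls after the threshold appear only in the “new” part, which is exactly the situation we want to simulate.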

Training the complete model

First, we need to transform the data into a matrix format required for the LightFM model, using the Dataset helper class.

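from lightfm.data import Dataset

# Reserve more ids than currently exist, to leave room for new users / items
user_ids_buffered = range(1000)
item_ids_buffered = range(1700)

# Every column except the id column is used as a feature name
user_feature_names = user_features.columns.tolist()
user_feature_names.remove("user_id")

item_feature_names = item_features.columns.tolist()
item_feature_names.remove("movie_id")

# Build the id and feature mappings that the model matrices are based on
dataset = Dataset()
dataset.fit(
    users=user_ids_buffered,
    items=item_ids_buffered,
    user_features=user_feature_names,
    item_features=item_feature_names
)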

Here, we make use of the first trick. Even though there are only 943 users and 1682 movies in the training data, we initialize the dataset with more users / items (1000 and 1700 respectively) to leave room for new users / items. The reason is that once a LightFM model has been trained, new users / items can’t be added unless the complete model is retrained. Hence, when training the complete model, we anticipate that some new users / items will join, and we add dummy users / items to the training set. These dummy users / items are empty: they don’t have any interactions or features, but they ensure that the matrices in the model have the right size.
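
The effect of the padding can be illustrated without LightFM. The sketch below builds a plain id-to-row mapping, which is conceptually what decides how many rows the model matrices get. The function build_id_mapping is hypothetical, and we assume for illustration that the known user ids run from 1 to 943.

```python
from typing import Dict, Iterable


def build_id_mapping(ids: Iterable[int]) -> Dict[int, int]:
    """Map each external id to a matrix row index, in order of first appearance."""
    mapping: Dict[int, int] = {}
    for external_id in ids:
        mapping.setdefault(external_id, len(mapping))
    return mapping


# Without a buffer: only users seen at training time get a row.
# (Assumption for this sketch: the 943 known user ids are 1..943.)
tight = build_id_mapping(range(1, 944))

# With a buffer: ids 0..999 are reserved up front, so later arrivals fit.
buffered = build_id_mapping(range(1000))

new_user_id = 950  # hypothetical user who joins after training

if __name__ == "__main__":
    print(new_user_id in tight)     # no row available for this user
    print(new_user_id in buffered)  # a (so far unused) row already exists
```

The price of the buffer is a handful of unused rows. Once the reserved ids run out, the complete model has to be retrained with a larger buffer.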

Then, we build the interaction and feature matrices using the Dataset class and some helper functions that transform the data frames into the formats that build_interactions, build_user_features and build_item_features expect.

interaction_matrix_current, _ = dataset.build_interactions(
    transform_interactions(interactions_current))
interaction_matrix_current_train, _ = dataset.build_interactions(
    transform_interactions(interactions_current_train))
interaction_matrix_current_val, _ = dataset.build_interactions(
    transform_interactions(interactions_current_val))

user_features_matrix_current = dataset.build_user_features(
    transform_features(user_features_current, "user_id"))
item_features_matrix_current = dataset.build_item_features(
    transform_features(item_features_current, "movie_id"))
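
The helper functions transform_interactions and transform_features are not shown in this post. As a rough sketch of what such helpers need to produce: build_interactions accepts an iterable of (user id, item id) pairs, and build_user_features / build_item_features accept an iterable of (id, {feature name: weight}) pairs. The version below works on rows given as plain dicts instead of data frames; the function names and the choice to skip zero-valued features are assumptions made for this sketch.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]  # one data frame row as a plain dict (assumption)


def to_interaction_tuples(rows: Iterable[Row], user_col: str = "user_id",
                          item_col: str = "movie_id") -> List[Tuple[object, object]]:
    """Return (user_id, item_id) pairs, one per interaction row."""
    return [(row[user_col], row[item_col]) for row in rows]


def to_feature_tuples(rows: Iterable[Row],
                      id_col: str) -> List[Tuple[object, Dict[str, object]]]:
    """Return (id, {feature_name: weight}) pairs, one per row.

    Every column other than id_col is treated as a feature whose weight is
    the column value; zero-valued features are left out.
    """
    return [
        (row[id_col],
         {name: value for name, value in row.items()
          if name != id_col and value != 0})
        for row in rows
    ]


if __name__ == "__main__":
    interactions = [{"user_id": 1, "movie_id": 10}, {"user_id": 2, "movie_id": 10}]
    users = [{"user_id": 1, "age": 0.3, "gender_F": 1, "gender_M": 0}]
    print(to_interaction_tuples(interactions))
    print(to_feature_tuples(users, "user_id"))
```

The feature names in the dicts have to match the names passed to dataset.fit, which is why the column names are used directly.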

After that, we perform hyperparameter tuning. For the sake of simplicity, we tune only the most important parameter, the number of epochs.

from lightfm import LightFM

# Evaluator is a helper class from the notebook that records the metrics per run
model = LightFM(loss="warp")
evaluator = Evaluator()
for epochs in [100, 200, 300, 500, 1000, 2000]:
    print(f"Epochs: {epochs}")

    # fit() discards any previous state, so every run starts from scratch
    model.fit(
        interaction_matrix_current_train,
        epochs=epochs,
        user_features=user_features_matrix_current,
        item_features=item_features_matrix_current
    )

    evaluator.evaluate(
        epochs, model,
        interaction_matrix_current_train, interaction_matrix_current_val,
        user_features_matrix_current, item_features_matrix_current)

The best model was the one trained for 2000 epochs, which reached a validation ROC AUC of 0.9216. So, we train the model with this setting on the complete “current” dataset.

best_epochs = evaluator.get_best_epochs()
model.fit(
    interaction_matrix_current,
    epochs=best_epochs,
    user_features=user_features_matrix_current,
    item_features=item_features_matrix_current
)

In our imagined scenario, this is the model that we deploy into production.

How to Quickly incorporate new interactions into the model

In an ideal situation, the recommendation system would immediately incorporate every new interaction into the model: when the user is browsing the website and makes the next click, the recommendations already take all the previous clicks into account. Because the hybrid recommendation system has a collaborative filtering component, this would in principle require retraining the entire model for every new interaction. However, LightFM has a method called fit_partial, which resumes training from the current model state using only the new interactions. This makes it possible to quickly incorporate new interactions into the model, but potentially at some cost to model stability.

Hence, a good approach would be the following:

  1. Use fit_partial to frequently incorporate new interactions to the model
  2. From time to time retrain the entire model from scratch
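
One simple way to combine the two steps above is a counter-based policy: update partially for every batch of new interactions, and retrain fully after a fixed number of batches. The sketch below is only one possible policy, not the one from our notebook. UpdateScheduler and its parameters are hypothetical, and the two callables stand in for model.fit_partial on the new batch and for a full retraining job that reads all stored data.

```python
from typing import Callable, List, Sequence


class UpdateScheduler:
    """Route each batch of new interactions to a cheap partial update, and
    run a full retrain on every `full_retrain_every`-th batch instead."""

    def __init__(self, partial_update: Callable[[Sequence], None],
                 full_retrain: Callable[[], None],
                 full_retrain_every: int) -> None:
        if full_retrain_every < 1:
            raise ValueError("full_retrain_every must be at least 1")
        self._partial_update = partial_update
        self._full_retrain = full_retrain
        self._full_retrain_every = full_retrain_every
        self._batches_since_retrain = 0

    def on_new_batch(self, batch: Sequence) -> str:
        """Handle one batch and report which kind of update was run."""
        self._batches_since_retrain += 1
        if self._batches_since_retrain >= self._full_retrain_every:
            # Assumption: the retraining job loads all data, including this
            # batch, from storage, so the batch is not passed in.
            self._full_retrain()
            self._batches_since_retrain = 0
            return "full"
        self._partial_update(batch)
        return "partial"


if __name__ == "__main__":
    log: List[str] = []
    scheduler = UpdateScheduler(
        partial_update=lambda batch: log.append(f"partial({len(batch)})"),
        full_retrain=lambda: log.append("full"),
        full_retrain_every=3,
    )
    for batch in ([1, 2], [3], [4, 5, 6], [7], [8, 9]):
        scheduler.on_new_batch(batch)
    print(log)
```

A time-based trigger (for example, a nightly retrain) would work just as well; the point is only that the expensive step runs rarely and the cheap step runs often.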

Let’s see how fit_partial works. Once again, we can use cross-validation to decide for how many epochs we should train. However, the number of new interactions might be small, so it is a good idea to repeat the train-validation split multiple times and average the performance metrics. Calculating the metrics and averaging them over multiple train-validation splits is implemented in the Evaluator helper class, which you can find in the notebook. Notice that every time we run a test, we make a copy of the original model and run fit_partial on the copy.

from copy import deepcopy

evaluator = Evaluator()
num_fold = 5
for epochs in [100, 500, 1000, 2000]:
    print(f"Epochs: {epochs}")

    for fold in range(num_fold):
        print(f"fold: {fold}")

        # Draw a fresh random train / validation split of the new interactions
        interactions_new_train, interactions_new_val = train_test_split_randomly(interactions_new, 1/num_fold)
        interaction_matrix_new_train, _ = dataset.build_interactions(
            transform_interactions(interactions_new_train))
        interaction_matrix_new_val, _ = dataset.build_interactions(
            transform_interactions(interactions_new_val))

        # Train a copy, so that the deployed model stays untouched during tuning
        model_fold = deepcopy(model)
        model_fold.fit_partial(
            interaction_matrix_new_train,
            user_features=user_features_matrix_new,
            item_features=item_features_matrix_new,
            epochs=epochs
        )

        evaluator.evaluate(
            epochs, model_fold,
            interaction_matrix_new_train, interaction_matrix_new_val,
            user_features_matrix_new, item_features_matrix_new)
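
The Evaluator class itself lives in the notebook and is not reproduced here. To show just the bookkeeping part (averaging the validation scores over several splits and picking the best epoch setting), here is a simplified, hypothetical stand-in called EpochScoreTracker. It does not compute any metric itself; in practice the score passed to record would come from something like the mean of lightfm.evaluation.auc_score on the validation interactions.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


class EpochScoreTracker:
    """Collect validation scores per epoch setting and pick the best one."""

    def __init__(self) -> None:
        self._scores: Dict[int, List[float]] = defaultdict(list)

    def record(self, epochs: int, validation_score: float) -> None:
        """Store the score of one run (one fold) for the given epoch setting."""
        self._scores[epochs].append(validation_score)

    def mean_scores(self) -> Dict[int, float]:
        """Average the recorded scores over all runs of each epoch setting."""
        return {epochs: mean(scores) for epochs, scores in self._scores.items()}

    def get_best_epochs(self) -> int:
        """Return the epoch setting with the highest average score."""
        if not self._scores:
            raise ValueError("no scores recorded yet")
        averages = self.mean_scores()
        return max(averages, key=averages.get)


if __name__ == "__main__":
    tracker = EpochScoreTracker()
    # Made-up scores for two folds per setting, purely for illustration.
    tracker.record(100, 0.80)
    tracker.record(100, 0.90)
    tracker.record(500, 0.88)
    tracker.record(500, 0.90)
    print(tracker.mean_scores())
    print(tracker.get_best_epochs())
```

Averaging over folds matters most when the set of new interactions is small, because a single split can easily favor the wrong setting by chance.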

Once we have found the best number of epochs, we can call fit_partial on the complete set of “new” interactions.

best_epochs = evaluator.get_best_epochs()
model.fit_partial(
    interaction_matrix_new,
    user_features=user_features_matrix_new,
    item_features=item_features_matrix_new,
    epochs=best_epochs
)

fit_partial is able to incorporate completely new users that weren’t present when training the complete model, because at that time we added some dummy users to the dataset.

Now, let’s have a look at one of these new users, freshly added to the model. This user rated the following movies (with 3 stars or more):

movie_title feature_list
Dead Man Walking (1995) [‘Drama’]
People vs. Larry Flynt, The (1996) [‘Drama’]
Fargo (1996) [‘Crime’, ‘Drama’, ‘Thriller’]
Ransom (1996) [‘Drama’, ‘Thriller’]
Independence Day (ID4) (1996) [‘Action’, ‘Sci-Fi’, ‘War’]
Star Wars (1977) [‘Action’, ‘Adventure’, ‘Romance’, ‘Sci-Fi’, ‘War’]
Scream (1996) [‘Horror’, ‘Thriller’]
English Patient, The (1996) [‘Drama’, ‘Romance’, ‘War’]
Chasing Amy (1997) [‘Drama’, ‘Romance’]
Mary Reilly (1996) [‘Drama’, ‘Thriller’]
Twister (1996) [‘Action’, ‘Adventure’, ‘Thriller’]
Face/Off (1997) [‘Action’, ‘Sci-Fi’, ‘Thriller’]
Breakdown (1997) [‘Action’, ‘Thriller’]
River Wild, The (1994) [‘Action’, ‘Thriller’]
Ghost and the Darkness, The (1996) [‘Action’, ‘Adventure’]
Boot, Das (1981) [‘Action’, ‘Drama’, ‘War’]
Unforgettable (1996) [‘Sci-Fi’, ‘Thriller’]
Smilla’s Sense of Snow (1997) [‘Action’, ‘Drama’, ‘Thriller’]
Trees Lounge (1996) [‘Drama’]
Frighteners, The (1996) [‘Comedy’, ‘Horror’]

And these are our top recommendations for the user, excluding those movies that were already rated:

movie_title feature_list
Twelve Monkeys (1995) [‘Drama’, ‘Sci-Fi’]
Contact (1997) [‘Drama’, ‘Sci-Fi’]
Time to Kill, A (1996) [‘Drama’]
Leaving Las Vegas (1995) [‘Drama’, ‘Romance’]
Jerry Maguire (1996) [‘Drama’, ‘Romance’]
Star Trek: First Contact (1996) [‘Action’, ‘Adventure’, ‘Sci-Fi’]
Trainspotting (1996) [‘Drama’]
Titanic (1997) [‘Action’, ‘Drama’, ‘Romance’]
U.S. Marshalls (1998) [‘Action’, ‘Thriller’]
Air Force One (1997) [‘Action’, ‘Thriller’]
Lost Highway (1997) [‘Mystery’]
Rock, The (1996) [‘Action’, ‘Adventure’, ‘Thriller’]
Screamers (1995) [‘Sci-Fi’]
Mission: Impossible (1996) [‘Action’, ‘Adventure’, ‘Mystery’]
Men in Black (1997) [‘Action’, ‘Adventure’, ‘Comedy’, ‘Sci-Fi’]
Courage Under Fire (1996) [‘Drama’, ‘War’]
Absolute Power (1997) [‘Mystery’, ‘Thriller’]
Primal Fear (1996) [‘Drama’, ‘Thriller’]
Juror, The (1996) [‘Drama’, ‘Thriller’]
Godfather, The (1972) [‘Action’, ‘Crime’, ‘Drama’]

From this, we can see that this user is mainly interested in drama, flavored with some sci-fi, thriller and romance, but definitely not comedy. We see this reflected in the recommended movies, which suggests that the model is working as intended.
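
A visual comparison like this can be backed up with a quick tally of genres. The sketch below computes, for a list of movies, the fraction that carries each genre. To keep it short, it uses only the first five rows of each of the two tables above; genre_shares is a hypothetical helper written for this post.

```python
from collections import Counter
from typing import Dict, List, Tuple

Movie = Tuple[str, List[str]]  # (title, genres)

# First five rows of the "rated" table above.
rated: List[Movie] = [
    ("Dead Man Walking (1995)", ["Drama"]),
    ("People vs. Larry Flynt, The (1996)", ["Drama"]),
    ("Fargo (1996)", ["Crime", "Drama", "Thriller"]),
    ("Ransom (1996)", ["Drama", "Thriller"]),
    ("Independence Day (ID4) (1996)", ["Action", "Sci-Fi", "War"]),
]

# First five rows of the "recommended" table above.
recommended: List[Movie] = [
    ("Twelve Monkeys (1995)", ["Drama", "Sci-Fi"]),
    ("Contact (1997)", ["Drama", "Sci-Fi"]),
    ("Time to Kill, A (1996)", ["Drama"]),
    ("Leaving Las Vegas (1995)", ["Drama", "Romance"]),
    ("Jerry Maguire (1996)", ["Drama", "Romance"]),
]


def genre_shares(movies: List[Movie]) -> Dict[str, float]:
    """Fraction of the given movies that carry each genre."""
    counts = Counter(genre for _, genres in movies for genre in genres)
    return {genre: count / len(movies) for genre, count in counts.items()}


if __name__ == "__main__":
    for name, movies in (("rated", rated), ("recommended", recommended)):
        shares = sorted(genre_shares(movies).items(), key=lambda kv: -kv[1])
        print(name, shares)
```

Run on the full tables, a tally like this gives a slightly more objective view than eyeballing the lists, although it remains a sanity check and not a replacement for the validation metrics.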

Summary

In this article, we learned about two architectures that allow you to make real-time recommendations, and when to use each of them. We used the MovieLens dataset to train a LightFM model. We learned how to quickly incorporate new interactions into the model without completely retraining it, using the fit_partial function. Finally, we showed how to prepare the model for new users / items by padding the training data with some dummy users / items.

Are you curious to learn how you can solve the other challenges we mentioned in the beginning? Contact us to learn more or subscribe to our newsletter where we frequently share technical blogs, upcoming events, podcasts, and more.
