Photos Are All You Need for Reciprocal Recommendation in Online Dating

by James Neve, et al.
University of Bristol

Recommender Systems are algorithms that predict a user's preference for an item. Reciprocal Recommenders are a subset of recommender systems where the items in question are people, and the objective is therefore to predict a bidirectional preference relation. They are used in settings such as online dating services and social networks. In particular, images provided by users are a crucial part of user preference, and one that has been little exploited in the literature. We present a novel method of interpreting a user's image preference history and using it to make recommendations. We train a recurrent neural network to learn a user's preferences and predict reciprocal preference relations, which can be used to make recommendations that satisfy both users. We show that our proposed system achieves an F1 score of 0.87 when using only photographs to produce reciprocal recommendations on a large real-world online dating dataset. Our system significantly outperforms the state of the art in both content-based and collaborative filtering systems.




1. Introduction

Recommender Systems (RS) are personalisation tools that are used by online services to recommend items to users (Pizzato et al., 2013). RSs usually make recommendations by computing a preference score between a user and an item that represents the extent to which that user would like that item. This is done by using explicit preference information (for instance, a user's profile where they have specified their preferences) or implicit preference information, such as a user's purchase history. RSs have become increasingly advanced over the last decade, and most popular online services such as Amazon and Netflix use them to enhance their users' experience (Bell and Koren, 2007).

Reciprocal Recommender Systems (RRSs) are a subtype of RSs that recommend people to other people. They are commonly used in online dating and social services. While RSs make recommendations based on a unidirectional preference relation involving an inanimate item, RRSs are inherently more complex in that they must make recommendations based on both sides of a bidirectional preference relation. Applying a conventional recommender system to a reciprocal environment results in recommendations that are only satisfying to one of the two users involved in the interaction. RRS design also involves a number of considerations that are not involved in unidirectional recommendation. For instance, a popular product being repeatedly recommended is not usually a problem, but a popular user appearing in everyone’s recommendations often represents a negative experience for that user (Kleinerman et al., 2018). RRSs are often adapted from RSs, where two unidirectional preferences are computed and then combined into a single preference score that represents the preference of two users for each other.

RSs (and therefore RRSs) are often categorised as content-based or collaborative filtering systems. Content-based systems make recommendations based on a user’s preference for specific aspects of an item. These preferences are sometimes explicit, but are more usually inferred implicitly from a user’s preference for previous items (Aggarwal, 2016). Collaborative filtering algorithms use correlations between multiple users to make recommendations, often by extracting latent factors from a preference matrix of users and items, and inferring preference for those latent factors through historical preferences. Historically, collaborative filtering algorithms have been more effective than content-based algorithms (Aggarwal, 2016). However, in content-rich environments, the reverse can be true. Content-based filtering algorithms also tend to be more effective at solving the Cold-Start Problem (Lam et al., 2008; Lin et al., 2013), where it is difficult for the system to make effective recommendations for new users because of the lack of preference history.

Online dating services and social networks often provide a content-rich environment, with users deciding whom to express preference for based on a great many factors, including image data, free-text profiles and categorical data such as age and job. In particular, image data is extremely important to modern social services, with many, such as Instagram, using images as the basis for interactions. This has been demonstrated by informal research from industry. In this paper, we present a novel recommender system, Temporal Image-Based Reciprocal Recommender (TIRR), that uses a Recurrent Neural Network (RNN) to interpret a user's history of preferences for images and make predictions about their future preferences in order to make recommendations. This is a significant improvement on the only previous image-based RRS, ImRec (Neve and McConville, 2020), in the sense that it outperforms both ImRec (previously the state of the art in content-based reciprocal recommendation) and the current state-of-the-art collaborative filtering solutions.

In addition to its improvement in the ROC curve on cross-validation, TIRR also advances the field by providing a unified system that predicts matches directly, as opposed to making two separate predictions of unidirectional preference followed by an aggregation. There is some doubt as to how to combine two unidirectional scores into a single bidirectional score in a way that fully represents two users' preference for each other; TIRR resolves this by predicting the bidirectional relation end to end.

The system was tested using a popular online dating service. We used users and approximately expressions of preference combined split across train and test sets.

The original contributions of this paper are therefore threefold:

  1. We present a content-based RRS, TIRR, that makes recommendations based on historical sequences of images utilizing Siamese networks and LSTMs.

  2. Previous RRSs predict two unidirectional preferences and then aggregate them; the end-to-end algorithm detailed in this paper predicts the probability of a match directly.

  3. Based on tests using real-world data, TIRR outperforms not only other content-based RRSs but also the state-of-the-art collaborative filtering RRS.

2. Related Works

This section contains a review of other academic works that formed the background for this study. This includes works on RRSs, content-based recommendation and on recurrent neural networks for understanding image-based histories.

2.1. Reciprocal Recommender Systems

RRSs are recommender systems used for person-to-person matching, in settings such as online dating, social networks (He and Chu, 2010) and job recommendation (Siting et al., 2012). They are complex in the sense that they need to consider the preferences of both sides.

The earliest RRS in the literature is RECON (Pizzato et al., 2010). This is a content-based recommender system designed by Pizzato et al. that makes recommendations using categorical data such as age and hobbies. For two users $x$ and $y$, the algorithm calculates the preferences $P_x$ and $P_y$ of the two users as vectors based on their historical preferences for individual attributes. Using these historical preferences, RECON estimates the unidirectional preferences $s_{x \to y}$ and $s_{y \to x}$ and combines them using the harmonic mean into a single bidirectional preference relation that represents the projected preference of the two users for each other:

$$s_{x,y} = \frac{2}{s_{x \to y}^{-1} + s_{y \to x}^{-1}}$$
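RECON's final aggregation step can be sketched in Python as below. This is a minimal illustration, not RECON's implementation: the arguments `s_xy` and `s_yx` are hypothetical placeholders for the two unidirectional scores in [0, 1], whose computation from attribute histories is not shown. The example shows why the harmonic mean suits reciprocal settings: it is dominated by the smaller of the two scores, so a pair in which only one user is interested scores low.

```python
def harmonic_mean(s_xy: float, s_yx: float) -> float:
    """Combine two unidirectional preference scores in [0, 1] into one reciprocal score."""
    if s_xy <= 0.0 or s_yx <= 0.0:
        # If either user has no interest at all, the reciprocal score is zero.
        return 0.0
    return 2.0 / (1.0 / s_xy + 1.0 / s_yx)


# Mutual interest keeps a high score; one-sided interest is pulled towards the lower score.
print(harmonic_mean(0.9, 0.9))  # about 0.9
print(harmonic_mean(0.9, 0.1))  # about 0.18, whereas the arithmetic mean would be 0.5
```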

RCF (Xia et al., 2015) extended reciprocal recommendation with an implementation of a collaborative filtering system. RCF uses nearest-neighbour-based recommendation: for candidate users $x$ and $y$, it calculates the similarity between $x$ and the other users who have liked $y$, and vice versa. RCF demonstrated improvements in a number of areas over RECON, and was at the time considered the best-in-class reciprocal recommender system.
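A rough sketch of this nearest-neighbour idea follows. RCF's exact similarity measure and combination rule are not given here, so both are assumptions: the similarity between two users is taken as the Jaccard overlap of the sets of users they have Liked, and the two unidirectional scores are combined with a harmonic mean as in RECON. The dictionary `likes` (user id to the set of user ids they Liked) is a hypothetical input format.

```python
from statistics import mean


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two sets (0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def invert(likes: dict) -> dict:
    """Map each user to the set of users who have Liked them."""
    liked_by: dict = {}
    for sender, targets in likes.items():
        for target in targets:
            liked_by.setdefault(target, set()).add(sender)
    return liked_by


def unidirectional(x, y, likes: dict, liked_by: dict) -> float:
    """Average similarity between x and the other users who have Liked y."""
    neighbours = liked_by.get(y, set()) - {x}
    if not neighbours:
        return 0.0
    return mean(jaccard(likes.get(x, set()), likes.get(z, set())) for z in neighbours)


def reciprocal(x, y, likes: dict) -> float:
    """Reciprocal score for x and y (harmonic-mean combination is assumed)."""
    liked_by = invert(likes)
    s_xy = unidirectional(x, y, likes, liked_by)
    s_yx = unidirectional(y, x, likes, liked_by)
    if s_xy == 0.0 or s_yx == 0.0:
        return 0.0
    return 2.0 / (1.0 / s_xy + 1.0 / s_yx)


likes = {"a": {"p", "q"}, "b": {"p", "q"}, "p": {"a", "b"}, "q": {"b"}}
print(reciprocal("a", "q", likes))  # about 0.667
```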

Subsequently, a number of systems have made improvements to both collaborative and content-based systems, in addition to hybrid systems that exploit the best of both subtypes. For example, Kleinerman et al. improved on RCF by reducing the bias of user popularity on recommendations (Kleinerman et al., 2018). ImRec demonstrated that image-based recommendation was more effective than recommendation based on categorical profile data (Neve and McConville, 2020).

2.2. Content-Based Recommender Systems

Content-based recommender systems make recommendations based on users’ preferences for specific content on a service. This might be structured content such as the category of an item, or it might be unstructured content such as images and free text description.

Recommendations based on unstructured data appear most often in news recommendation (Lang, 1995; van Meteren and van Someren, 2000). This is a rich area for research because news articles are often written with set structures that make them easier to process, and also because of concerns about serendipity and about recommendations reinforcing existing echo chambers. Work such as that of Bansal et al. (Bansal et al., 2016) demonstrates the capacity of deep learning systems based on free-text information to make effective recommendations.

Content-based recommender systems that base their results on images are much less common. Lei et al. used user preferences to train a model based on ImageNet (Deng et al., 2009) that predicted user preference for one of two images (Lei et al., 2016). This trains a network to map both users and images into the same space by generating embeddings for both, with images that the user preferred lying close to the user in the space and images the user did not like lying further away. User preference for subsequent images can then be predicted by their relative distance from the user.
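The prediction step described for Lei et al. can be illustrated with a small sketch that ranks candidate images by their distance from the user in the shared embedding space. The embeddings are made-up two-dimensional vectors standing in for learned ones, and the function name is hypothetical.

```python
import math


def rank_by_distance(user_vec, image_vecs):
    """Return image indices ordered from most to least preferred (closest first)."""
    return sorted(range(len(image_vecs)), key=lambda i: math.dist(user_vec, image_vecs[i]))


user = (0.0, 0.0)
images = [(3.0, 4.0), (1.0, 0.0), (0.0, 2.0)]  # hypothetical embeddings
print(rank_by_distance(user, images))  # [1, 2, 0]
```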

Another example is DeepStyle (Liu et al., 2017), which uses a Siamese Network to predict user preference for clothes based on images. DeepStyle uses pairs of positive and negative samples with user preference as the output to differentiate between the two images. This can then be used to make predictions about whether a user might like a new image by comparing it to an existing liked image.

2.3. Recurrent Neural Networks

This paper uses Recurrent Neural Networks (RNNs) to interpret time series data for recommendation. RNNs contain loops, which feed the output of the network at one step back into the network at the next step. This means that they implement a concept of memory: they store computed results, and these results have an impact on subsequent predictions. Each step therefore incorporates information from the previous steps into the prediction.

Standard RNNs are particularly good at processing short sequences, but their memory is short-term: when they are trained on longer sequences, the early items in the sequence have very little impact on the final prediction. This is known as the vanishing gradient problem (Hochreiter, 1998). It also exists in deep neural networks, where early layers learn very slowly when trained with backpropagation.

Various architectures have been proposed to overcome this limitation. One that has been particularly successful in allowing RNNs to hold and use information for longer is the Long Short-Term Memory network (LSTM) (Hochreiter and Schmidhuber, 1997). An LSTM uses a forget gate comprising a sigmoid function that determines whether information is kept or not: a value close to 0 results in the information being forgotten by the network, whereas a value close to 1 results in the information being stored. This allows much longer sequences to be processed, which is particularly useful in the field of recommendation, where long sequences of user behaviour are common. RNNs have been used successfully in recommender systems to incorporate time series data into recommendations (Twardowski, 2016; Wu et al., 2017), but not in reciprocal recommender systems.
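The effect of the forget gate on memory can be shown with a short calculation. Ignoring new writes to the cell state, a value stored early in the sequence is multiplied by the forget gate value at every later step, so the fraction that survives is the product of those values. The pre-activation values below are arbitrary illustrations, not learned gates.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def retained_fraction(forget_preactivations):
    """Fraction of an early cell-state value that survives a run of forget gates,
    ignoring any new information written to the cell along the way."""
    fraction = 1.0
    for z in forget_preactivations:
        fraction *= sigmoid(z)  # each forget gate value lies in (0, 1)
    return fraction


print(retained_fraction([4.0] * 50))  # gates close to 1: about 0.40 survives 50 steps
print(retained_fraction([-4.0] * 3))  # gates close to 0: almost nothing survives 3 steps
```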

3. Methodology

In this section, we describe a model that produces predictions about user preferences based on the RNN interpretation of user history. The RNN-based model uses a pre-trained Siamese network at its core, so we first describe that network independently, and then its use in the context of the RNN.

3.1. Problem Formulation

The online dating service we used currently only supports heterosexual relationships. We can therefore assume that for a set of users $U$ of one gender there is a set of candidate users $V$ for recommendation. A user $x \in U$ may have an ordered history of preference expressions for users in $V$ of length $n$, for example $H_x = (h_{x,y_1}, h_{x,y_2}, \ldots, h_{x,y_n})$, where $h_{x,y_t}$ represents the expression of positive or negative preference of user $x$ for the user $y_t \in V$ at time $t$.

In our reciprocal system, our objective is to estimate $r_{x,y}$, the reciprocal preference score that represents the projected degree of preference of two users $x$ and $y$ for each other. We consider that $r_{x,y}$ is a function of the historical preferences $H_x$ and $H_y$ as well as the two users themselves, and train a model to predict it using all of this information:

$$r_{x,y} = f(x, y, H_x, H_y; \theta)$$

where $\theta$ represents the parameters of the model. Note that, contrary to most previous approaches to RRSs, our approach trains a single model to predict reciprocal preference using all of this information, as opposed to combining the results of two models predicting unidirectional preference. Also note that the reciprocal preference is symmetric, i.e. $r_{x,y} = r_{y,x}$.
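The formulation does not state how symmetry is enforced. As a purely hypothetical illustration, any learned scorer `f(x, y, h_x, h_y)` can be made symmetric by averaging it over both argument orders; the toy scorer below is deliberately asymmetric and is not TIRR.

```python
from typing import Callable, Sequence

Scorer = Callable[[int, int, Sequence[int], Sequence[int]], float]


def symmetrise(f: Scorer) -> Scorer:
    """Wrap a scorer so that r(x, y, h_x, h_y) == r(y, x, h_y, h_x)."""
    def r(x, y, h_x, h_y):
        return 0.5 * (f(x, y, h_x, h_y) + f(y, x, h_y, h_x))
    return r


def toy_scorer(x, y, h_x, h_y):
    # Hypothetical and asymmetric: depends on the first user's history length only.
    return 0.1 * len(h_x) + 0.01 * y


r = symmetrise(toy_scorer)
print(r(1, 2, [5, 6], [7]) == r(2, 1, [7], [5, 6]))  # True
```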

3.2. Service & Data

The data for our model was provided by a popular online dating service with several million registered users. On this service, the user experience is streamlined so that everyone goes through the same process of interaction.

A user finds other users by searching, or by viewing recommendations on a list page. From the list page, they can view profiles with images, text and categorical data. If a user $x$ finds a user $y$ that they want to interact with, they can send a Like. In our algorithm, this is used as a unidirectional indicator of preference.

User $y$ can choose whether or not to reciprocate this Like. If they do reciprocate, this is considered a Match; if not, it is considered a Dislike. These represent bidirectional indicators of preference and negative preference respectively. Users who have Matched can subsequently message each other, and potentially agree to meet in person.
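The mapping from Likes to the Match and Dislike labels can be sketched as follows. The input format, a set of `(sender, receiver)` pairs, and the function name are hypothetical, and the sketch assumes that every unreciprocated Like counts as a Dislike; a real pipeline would also need to handle Likes that have not yet been answered.

```python
def label_interactions(likes: set) -> dict:
    """Label each pair of users connected by at least one Like as a Match or a Dislike."""
    labels = {}
    for sender, receiver in likes:
        pair = frozenset((sender, receiver))
        if pair not in labels:
            # A Like that was returned is a Match; otherwise it is treated as a Dislike.
            labels[pair] = "Match" if (receiver, sender) in likes else "Dislike"
    return labels


labels = label_interactions({("a", "b"), ("b", "a"), ("a", "c")})
print(labels[frozenset({"a", "b"})])  # Match
print(labels[frozenset({"a", "c"})])  # Dislike
```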

As we wanted to focus on an algorithm that measured the personal attractiveness of users to each other, we used automatic face detection to exclude images that did not include the user's face. It is common in deep learning based on faces to crop images and also apply an affine transformation to align facial features, but preliminary experiments showed that the affine transformation did not improve our results.

We also made a number of exclusions in order to increase the reliability of the dataset. We excluded users who had been removed from the service for any reason (often these users are not using the service correctly, which implies that they are not expressing preferences based on their own intuition). Although for privacy reasons we are unable to release the dataset used in our experiments, we do hope that the algorithm will be reproduced on other services.

3.3. Siamese Network Unidirectional User Preference Learning


Figure 1. The architecture of the Siamese network used to learn unidirectional user preferences, which forms a component of TIRR.

In this section, we briefly describe the Siamese network (Koch et al., 2015) that we used to learn unidirectional user preferences. It is a building block of our proposed method and was originally included as a component of ImRec (Neve and McConville, 2020); here we use it in a novel way, different from ImRec, which yields superior performance.

3.3.1. Siamese Network Concept

Siamese networks are commonly used in object recognition (Vinyals et al., 2016) and tracking (Bertinetto et al., 2016), and they have excelled at face verification in scenarios with relatively little training data, known as one-shot or few-shot learning. As shown in Figure 1, a Siamese network consists of two symmetrical CNNs with shared weights, and a loss function based on the outputs of these two networks and a ground-truth label.

The network is trained from tuples of the form $(i_a, i_p, i_n)$ drawn from the preference history $H_x$ of a user $x$, where $i_a$ and $i_p$ are photos of users that have been Liked by $x$ and $i_n$ is a photo of a user that has been Disliked by $x$. Using $i_a$ as an anchor, two pairs are made from the tuple: $(i_a, i_p)$ is a positive pair where the expected output is 1, and $(i_a, i_n)$ is a negative pair where the expected output is 0. From these pairs, the network is trained to differentiate between a Liked and a Disliked image, given another Liked image.
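The pairing scheme can be sketched as below: each sampled triplet yields one positive pair labelled 1 and one negative pair labelled 0. The image identifiers, the function name and the use of uniform random sampling are assumptions for illustration.

```python
import random


def sample_pairs(liked: list, disliked: list, n_triplets: int, rng: random.Random):
    """Build labelled pairs from one user's Liked and Disliked photos."""
    pairs = []
    for _ in range(n_triplets):
        anchor, positive = rng.sample(liked, 2)  # two distinct Liked photos
        negative = rng.choice(disliked)          # one Disliked photo
        pairs.append(((anchor, positive), 1))    # positive pair: expected output 1
        pairs.append(((anchor, negative), 0))    # negative pair: expected output 0
    return pairs


pairs = sample_pairs(["l1", "l2", "l3"], ["d1", "d2"], n_triplets=2, rng=random.Random(0))
print(len(pairs))  # 4
```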

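The key to this is that the two branches share their weight parameters during both training and inference. Using the shared CNN $f_\theta$, we map two images $i_1$ and $i_2$ to $f_\theta(i_1)$ and $f_\theta(i_2)$, which are two points in a 128-dimensional space. We calculate the difference between the two points component-wise as follows:

$$d(i_1, i_2) = \lvert f_\theta(i_1) - f_\theta(i_2) \rvert$$

Relating the Siamese network back to our original problem formulation in Section 3.1, this gives us a basis for estimating a unidirectional preference relation based on two images, one image $i_h$ from $x$'s preference history $H_x$ and one image $i_y$ of the current candidate user $y$, solving the problem:

$$\hat{p}_{x \to y} = g\big(d(i_h, i_y)\big)$$

where $g$ is the network that maps the difference to the probability of a Like.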
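A minimal sketch of the prediction head follows, assuming the component-wise absolute difference used by Koch et al. and a single logistic layer standing in for the network that maps the difference to a Like probability. The two-dimensional embeddings and the weights are hypothetical values chosen so that a smaller difference gives a higher probability; the real embeddings have 128 dimensions and learned weights.

```python
import math


def abs_difference(e1, e2):
    """Component-wise absolute difference between two embeddings (a vector, not a scalar)."""
    return [abs(a - b) for a, b in zip(e1, e2)]


def like_probability(e_anchor, e_candidate, weights, bias):
    """Logistic layer on the difference vector, giving the probability of a Like."""
    d = abs_difference(e_anchor, e_candidate)
    z = sum(w * d_i for w, d_i in zip(weights, d)) + bias
    return 1.0 / (1.0 + math.exp(-z))


weights, bias = [-1.0, -1.0], 2.0  # hypothetical learned parameters
print(like_probability([0.0, 0.0], [0.1, 0.1], weights, bias))  # similar: about 0.86
print(like_probability([0.0, 0.0], [3.0, 3.0], weights, bias))  # dissimilar: about 0.02
```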


3.3.2. Network Layers

Layer            Size-in      Size-out     Kernel     Param
input            100x100x3    -            -          0
conv1            100x100x3    100x100x3    7x7x3      444
maxpooling1      100x100x3    34x34x3      3x3        0
normalization1   34x34x3      34x34x3      -          12
conv2            34x34x3      34x34x64     3x3x64     1792
maxpooling2      34x34x64     12x12x64     3x3        0
normalization2   12x12x64     12x12x64     -          256
conv3            12x12x64     12x12x192    2x2x192    49344
maxpooling3      12x12x192    4x4x192      3x3        0
conv4            4x4x192      4x4x384      2x2x384    295296
maxpooling4      4x4x384      2x2x384      3x3        0
conv5            2x2x384      2x2x256      1x1x256    98560
conv6            2x2x256      2x2x256      3x3x256    590080
maxpooling5      2x2x256      1x1x256      3x3        0
flatten          1x1x256      256          -          0
dense1           256          256          -          65792
dense2           256          128          -          32896
Table 1. The layers of the CNN used as the symmetrical part of the Siamese Network.
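The Param column of Table 1 follows from standard formulas, which the short script below reproduces: a convolution with a k_h x k_w kernel has k_h * k_w * c_in * c_out weights plus c_out biases, a dense layer has n_in * n_out weights plus n_out biases, and the normalisation counts are consistent with batch normalisation's four parameters per channel. The pooled sizes are consistent with 3x3 pooling at stride 3 with "same" padding; that setting is inferred from the table rather than stated. The counts also confirm that conv3 takes the 64-channel output of maxpooling2 as its input.

```python
import math


def conv_params(k_h: int, k_w: int, c_in: int, c_out: int) -> int:
    return k_h * k_w * c_in * c_out + c_out  # weights plus one bias per filter


def dense_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out  # weights plus biases


def batchnorm_params(channels: int) -> int:
    return 4 * channels  # gamma, beta, moving mean, moving variance


def pooled_size(n: int, stride: int = 3) -> int:
    return math.ceil(n / stride)  # "same" padding (assumed)


print(conv_params(7, 7, 3, 3), conv_params(3, 3, 3, 64), conv_params(2, 2, 64, 192))
# 444 1792 49344
print([pooled_size(n) for n in (100, 34, 12, 4, 2)])
# [34, 12, 4, 2, 1]
```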

Table 1 shows the architecture of the CNN that makes up the two symmetrical branches of the Siamese network. The small convolution kernels used have been shown to effectively identify facial features in deep convolutional networks (Parkhi et al., 2015). The network was trained using an Adam optimiser, with a learning rate of .

The output of the network is a value between 0 and 1, produced by a sigmoid function, representing whether $x$ is more likely to Like or Dislike $y$ based on the two images. In the next section, we extend this to use the whole of $x$'s preference history and show how this becomes an even more effective predictor of preference.

3.3.3. Loss Function

The Siamese network was trained using binary crossentropy. This is a standard loss function for training neural networks, the formula for which is given below. In the following equation, $\ell \in \{0, 1\}$ is the binary variable representing Like ($\ell = 1$) and Dislike ($\ell = 0$), $d$ is the embedded distance between two images, $g$ is a neural network and $p = g(d)$ is the predicted probability of the pair resulting in a Like:

$$\mathcal{L}_{\mathrm{BCE}} = -\big[\ell \log p + (1 - \ell)\log(1 - p)\big]$$


Binary crossentropy was shown experimentally to result in the highest effectiveness metrics for the network.

We note that it is also common to train Siamese networks using a contrastive loss function, which uses a margin to increase the network's error when it misclassifies two very similar images. In most situations where a Siamese network is applied, such as face verification, misclassifying two very similar images is as incorrect as misclassifying two very different images. However, this is not true in our application, where user preferences are not necessarily categorical, and similar images are more likely to be liked by a user than very different images. This may explain why binary crossentropy was more effective in our tests.

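The contrastive loss function is defined below, where $\ell$ and $d$ are as defined for the binary crossentropy loss, $D = \lVert d \rVert_2$ is the Euclidean distance between the two embeddings, and $m$ is the margin:

$$\mathcal{L}_{\mathrm{C}} = \ell\, D^2 + (1 - \ell)\max(0,\, m - D)^2$$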
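Both losses are sketched below for a single pair, with `label` equal to 1 for a Liked pair and 0 otherwise. The clipping constant `eps` and the default margin of 1.0 are arbitrary illustrative choices, not values used in this work.

```python
import math


def binary_crossentropy(label: int, p: float, eps: float = 1e-7) -> float:
    """Binary crossentropy for one predicted Like probability p."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def contrastive_loss(label: int, distance: float, margin: float = 1.0) -> float:
    """Contrastive loss on the Euclidean distance between two embeddings."""
    return label * distance ** 2 + (1 - label) * max(0.0, margin - distance) ** 2


print(binary_crossentropy(1, 0.9))  # about 0.105
print(contrastive_loss(0, 0.25))    # 0.5625: a close Disliked pair is penalised
print(contrastive_loss(0, 1.5))     # 0.0: beyond the margin there is no penalty
```

The margin term means a Disliked pair contributes nothing once its embeddings are at least m apart, while binary crossentropy applies a shrinking but non-zero penalty to any non-zero Like probability for that pair.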


3.4. Incorporating RNNs for Learning User Preference History


Figure 2. TIRR: the architecture to predict matches using an LSTM to interpret historical preference data on user photos.

The Siamese network described above, when trained on unidirectional preference, is an effective if elementary model. In this section, we describe the RNN we use to interpret the user history based on the results of the Siamese network.

The output of the Siamese network is a point in 128-dimensional space that represents the preference of a user for an image based on comparison with an anchor image. Based on initial experimental work, we chose an LSTM-based RNN architecture to interpret the time series of images. The forget gate of the LSTM is particularly intuitive in this case. Given the cell state $c_{t-1}$ at time $t-1$, a forget gate $f_t$, a write gate $i_t$ and a candidate write $\tilde{c}_t$, all derived from the input and the previous state, the next state is described by the equation:

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

where $\odot$ denotes element-wise multiplication.
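A single step of this cell-state update can be sketched for a one-unit LSTM using plain Python floats. The parameter names in `p` are hypothetical, and the output gate and hidden state follow the standard LSTM formulation rather than anything specific to TIRR.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t: float, h_prev: float, c_prev: float, p: dict):
    """One step of a single-unit LSTM; returns the new hidden and cell states."""
    f_t = sigmoid(p["w_f"] * x_t + p["u_f"] * h_prev + p["b_f"])      # forget gate
    i_t = sigmoid(p["w_i"] * x_t + p["u_i"] * h_prev + p["b_i"])      # write gate
    c_hat = math.tanh(p["w_c"] * x_t + p["u_c"] * h_prev + p["b_c"])  # candidate write
    o_t = sigmoid(p["w_o"] * x_t + p["u_o"] * h_prev + p["b_o"])      # output gate
    c_t = f_t * c_prev + i_t * c_hat  # keep part of the old state, add part of the new
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t


# With all parameters zero, every gate is 0.5 and the candidate write is 0,
# so the cell state halves at each step.
zeros = {k: 0.0 for k in ("w_f", "u_f", "b_f", "w_i", "u_i", "b_i",
                          "w_c", "u_c", "b_c", "w_o", "u_o", "b_o")}
h, c = 0.0, 1.0
for x in (0.3, -0.2, 0.8):
    h, c = lstm_step(x, h, c, zeros)
print(c)  # 0.125
```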


We might intuitively expect the preferences expressed by users to change over time, and the forget behaviour of the LSTM allows us to model this. The input at step $t$ of the LSTM modelling the preferences of user $x$ is the user $y_t$ from $x$'s history, and the final input is the candidate user $y$ for whom we wish to estimate $x$'s preference.

The LSTM is visualised in Figure 2. Because users have preference histories of variable length, we pad shorter histories with dummy images and use a masking layer to filter them out. The LSTM and subsequent dense neural network form a representation in 256-dimensional space of the user's preference as a time series.

Layer Size-in Size-out Kernel Param
input 128x15 0
LSTM 128x15 128 128
dense1 128 128
concat 128x2 256
dense2 256 128 128
output 128 1
Table 2. The layers of TIRR following the mapping of images into 128-dimensional space by the pre-trained Siamese network

Specifically, the network consists of an input layer, which accepts a maximum of outputs from Siamese networks in 128-dimensional space concatenated together. Experiments determined that using more than this did not significantly alter the performance of the network. The layers are described in Table 2. If a user has expressed fewer preferences than this, the earlier positions are filled with zeroes, and the network learns to interpret these as dummy data. Following the LSTM, the network consists of a single dense layer of 128 neurons, and then a dropout layer with a dropout rate of . The network was trained with an Adam optimiser with a learning rate of .
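The zero-filling of short histories can be sketched as below. The function name and the choice to keep the most recent entries when a history is too long are assumptions; the text states only that histories are capped and that shorter ones are filled with zeroes at the start.

```python
def pad_history(embeddings: list, max_len: int, dim: int):
    """Left-pad a history of embeddings with zero vectors up to max_len.

    Returns the padded sequence and a mask marking which positions hold real data.
    Histories longer than max_len are cut to their last max_len entries (assumed)."""
    recent = embeddings[-max_len:] if max_len > 0 else []
    n_pad = max_len - len(recent)
    padded = [[0.0] * dim for _ in range(n_pad)] + [list(e) for e in recent]
    mask = [False] * n_pad + [True] * len(recent)
    return padded, mask


padded, mask = pad_history([[0.5, 0.2]], max_len=3, dim=2)
print(padded)  # [[0.0, 0.0], [0.0, 0.0], [0.5, 0.2]]
print(mask)    # [False, False, True]
```

In Keras, a `Masking(mask_value=0.0)` layer placed before the LSTM would skip every timestep whose features are all zero, which matches this scheme provided no real embedding is exactly the zero vector.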

3.5. Training and Match Prediction




Figure 3. The process by which TIRR is trained: images are preprocessed and the Siamese network is pretrained; images are preprocessed again and the LSTM network is trained; finally, TIRR is evaluated on test matches. The three independent datasets used are represented by different colours.

This section describes training the network to predict matches between two users. As described in Section 3.2, our objective is to differentiate between interactions consisting of bidirectional expressions of preference, Matches, and unidirectional expressions of negative preference, Dislikes.

The full training process is visualised in Figure 3. Our experiments determined that the network trained extremely slowly when trained in its full form from a randomly initialised state, and we therefore pre-trained the Siamese network segment using one dataset, shown in green. The subsequent training of the full system on matches was done using a separate dataset, shown in red. The final evaluation was done using a third dataset, shown in blue. In addition, Neve and McConville demonstrated that Siamese network training was more effective when two networks were trained separately on male and female data (Neve and McConville, 2020). As the service providing our data currently only supports heterosexual dating, this split does not reduce the usefulness of the application in this case.

Training for the Siamese networks was based on triplets sampled from users, split evenly between male and female images. Images were cropped and centred on the faces of users before training. Other methods of preprocessing, such as affine transformations, which have been shown to improve the predictive power of other networks (Lewenberg et al., 2016), did not have any impact on performance. The Siamese networks were trained to predict unidirectional preferences, i.e. the positive image was of a user whom the anchor user had Liked (but not necessarily with reciprocity) and the negative image was of a user they had Disliked.

Following convergence of the Siamese network, the LSTM network was trained on the preference histories of users to predict Matches and Like-Dislike Tuples. This dataset was separate from the dataset used to train the Siamese network. Histories were capped at one year, because of concerns that changes to the service's design and search algorithm over time might affect user preferences. They were also capped to a maximum of preferences, because initial experiments showed that longer sequences did not improve accuracy, and because some outlier users express thousands of preferences, which results in an unreasonable increase in training and prediction times.

Finally, the LSTM was validated on a separate dataset of Matches and Like-Dislike Tuples. There was no overlap in preference expression between the three datasets. There was overlap between the users contained in these datasets, but as in a real-world situation the system would be trained based on users on the service and subsequently used to make predictions for those users in addition to new users, testing in this way is valid and representative.

4. Results & Evaluation

In this section, we present the results for TIRR compared to the current state of the art in both content-based and collaborative filtering.

4.1. Evaluation Metrics for Reciprocal Environments

RRSs generally use similar metrics for success as standard machine learning models: evaluation via the ROC curve and the related metrics Precision, Recall and their combined metric, the F1 score. However, because of the requirement for reciprocal success, their definitions are a little different in this scenario, so we present them here as defined by Pizzato et al. (Pizzato et al., 2013). In the following equations, $R$ is the set of recommended users, $M \subseteq R$ is the set of recommended users who matched with each other, and $N \subseteq R$ is the set of recommended users where at least one expressed negative preference for the other.


As the models predict a value between 0 and 1 that represents the strength of the mutual preference relation, the ROC curves in this section are drawn by moving a threshold between these two values and plotting the true and false positive rates.
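The threshold sweep can be sketched as follows, treating a Match as the positive class and a Dislike as the negative class. The number of thresholds and the trapezoidal AUC are illustrative choices, and the scores and labels in the example are made up.

```python
def roc_points(scores, labels, n_thresholds: int = 101):
    """(false positive rate, true positive rate) for thresholds swept from 0 to 1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for k in range(n_thresholds):
        t = k / (n_thresholds - 1)
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    pts = sorted(points)
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))


scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]  # hypothetical predicted match probabilities
labels = [1, 1, 1, 0, 0, 0]               # 1 = Match, 0 = Dislike
print(auc(roc_points(scores, labels)))    # about 0.889: 8 of 9 Match/Dislike pairs ranked correctly
```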

4.2. Siamese Network Results

We first describe the results generated by the pretrained Siamese network. This network provides a basis for the main user preference prediction model, as the output embeddings from this network provide an input for the RNN.


Figure 4. Pretrained Siamese Network ROC curve. This forms a building block of both ImRec and our proposed TIRR model.

The ROC curve for the Siamese network is displayed in Figure 4 as the blue line (the green dotted line is the diagonal reference line). This curve was drawn based on a test set of samples not in the original training dataset. In general, the network is capable of differentiating between a single Liked image and a single Disliked image based on an anchor image. The curve itself is slightly erratic, but this is not entirely unexpected: a single anchor image is unlikely to provide enough information to differentiate between positive and negative preference.

As the model by itself is not directly the source of the recommendations, it would not be appropriate to compare it to other recommender systems. For this reason, we present this model without a point of comparison. However, in Section 4.3 we will compare two approaches that use this model as a building block for reciprocal recommendation.


Figure 5. UMAP embeddings of the pretrained Siamese network forming part of TIRR. The red points represent Liked images while black points represent Disliked images.

The 128-dimensional embeddings output by the Siamese network form the input of the RNN, so it is useful to visualise them. To do this, we use Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP) to reduce the 128-dimensional vectors to two-dimensional vectors for visualisation. This visualisation is displayed in Figure 5.

In this visualisation, the black datapoints represent Disliked images and the red datapoints represent Liked images. It is clear from the visualisation that the embeddings are separable to some extent, even in two dimensions. The anomalous black cluster in the top right of the image represents heavily distorted or very poor quality images, or images misclassified by the face detection algorithm (i.e. images that do not contain a face). These tend to be almost universally Disliked.

4.3. TIRR vs Content-Based Algorithms

As described in Section 1, recommender systems are divided into content-based algorithms and collaborative filtering algorithms.


Figure 6. Content Based Algorithm ROC Curves demonstrating the significant improvement in AUC with TIRR.

Figure 6 displays a comparison of TIRR with other content-based algorithms. As described in Section 2, RECON (Pizzato et al., 2010) is an algorithm that identifies a user's implicit preferences for categorical data, and ImRec (Neve and McConville, 2020) is an algorithm that uses images to make predictions without the RNN-based component of TIRR, instead using a Random Forest and an aggregation function.

RECON struggled to generate effective recommendations on our dataset. As RECON was also evaluated on a private dataset, it is difficult to establish why this is without comparing the datasets directly, but one possibility is that modern dating services place a higher emphasis on visual content than services did ten years ago, when RECON was developed. ImRec performs better than RECON, but significantly worse than our proposed method TIRR. The key difference between TIRR and ImRec is the RNN-based process that allows TIRR to interpret historical and time-series data in order to make predictions, whereas ImRec treats user preferences globally, with no ability to capture individual users' preferences.

Algorithm   F1 Score   Precision   Recall   AUC
RECON       0.61       0.56        0.68     0.51
ImRec       0.71       0.60        0.88     0.65
TIRR        0.87       0.86        0.88     0.91

Table 3. Results based on best F1 score for content-based algorithms. Here we can see that our proposed method TIRR significantly outperforms the other approaches.

The AUC and maximum F1 score for the three algorithms are listed in Table 3. The scores were computed on the test set using the decision threshold that gave the best F1 score on the training set. We attribute the significant improvement of our proposed method, TIRR, to its ability to interpret a user’s history of image preferences over time and to account for potentially shifting preferences, whereas ImRec provides a global model across all users that distinguishes no more than one preference per user at a time, and RECON does not use images at all.
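The threshold protocol described above can be sketched as follows: candidate thresholds are taken from the training-set scores, the one that maximises F1 on the training set is kept, and that fixed threshold is then applied unchanged to the test set. The helper names f1_at and best_threshold are hypothetical, and predicting Liked when the score is greater than or equal to the threshold is an assumed convention.

```python
def f1_at(labels, scores, threshold):
    """F1 score when every score >= threshold is predicted as Liked (label 1)."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(train_labels, train_scores):
    """Choose the threshold that maximises F1 on the training set only."""
    candidates = sorted(set(train_scores))
    return max(candidates, key=lambda t: f1_at(train_labels, train_scores, t))
```

Calling t = best_threshold(train_labels, train_scores) and then f1_at(test_labels, test_scores, t) yields a test-set F1 score whose threshold was never tuned on the test data.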

The table also lists the precision and recall at the threshold where the best F1 score was recorded. While F1 is a useful summary of an algorithm's overall performance, the individual precision and recall values, and the balance between them, are particularly important in RS research, because precision tends to influence the trust users place in an RS, which in turn affects how much they use it (Herlocker et al., 2004). It is noteworthy that ImRec achieved the same recall as TIRR (0.88), identifying most of the images a user would like, but its precision was much lower (0.60 against 0.86). TIRR's high precision makes it more likely to be trusted and used.
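The effect of this imbalance on F1, the harmonic mean of precision P and recall R, can be checked directly against Table 3. ImRec and TIRR have the same recall, so the entire gap in F1 comes from precision:

```latex
\begin{aligned}
F_1 &= \frac{2PR}{P + R} \\
F_1^{\text{TIRR}} &= \frac{2 \times 0.86 \times 0.88}{0.86 + 0.88} = \frac{1.5136}{1.74} \approx 0.87 \\
F_1^{\text{ImRec}} &= \frac{2 \times 0.60 \times 0.88}{0.60 + 0.88} = \frac{1.056}{1.48} \approx 0.71
\end{aligned}
```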

4.4. TIRR vs Collaborative Filtering

In addition to comparing TIRR to other content-based RRSs, we also ran tests comparing it to the current best-in-class collaborative filtering algorithms, RCF and LFRR.

Figure 7. ROC curves for the collaborative filtering RRS algorithms LFRR and RCF and for TIRR, showing the performance of the content-based TIRR against the current state-of-the-art collaborative filtering algorithm LFRR.

LFRR is a collaborative filtering algorithm based on latent factor models trained by stochastic gradient descent, and RCF is a neighbourhood-based collaborative filtering algorithm. TIRR outperformed both algorithms on our test dataset, although by a slimmer margin than its lead over current content-based algorithms. Nonetheless, this represents a significant advance in the field of reciprocal recommendation: in services where images are prominently used, our algorithm is likely to be more effective than current collaborative filtering methods.
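For readers less familiar with latent factor collaborative filtering, the sketch below shows the general technique in a reciprocal setting: each user receives one factor vector as a sender of Likes and one as a receiver, both are fitted by stochastic gradient descent on observed (sender, receiver, liked) decisions, and the two directional predictions are then combined into a single mutual score. This is a generic illustration, not LFRR's published formulation; the hyperparameters, the clipping, the harmonic-mean aggregation and the function names are all assumptions chosen for brevity.

```python
import random


def train_latent_factors(interactions, n_users, k=4, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Fit sender factors P and receiver factors Q by SGD on (sender, receiver, liked) triples."""
    rnd = random.Random(seed)
    P = [[rnd.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]  # user as sender
    Q = [[rnd.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]  # user as receiver
    data = list(interactions)
    for _ in range(epochs):
        rnd.shuffle(data)
        for u, v, liked in data:
            err = liked - sum(P[u][f] * Q[v][f] for f in range(k))
            for f in range(k):
                pu, qv = P[u][f], Q[v][f]
                # Gradient step on squared error with L2 regularisation.
                P[u][f] += lr * (err * qv - reg * pu)
                Q[v][f] += lr * (err * pu - reg * qv)
    return P, Q


def reciprocal_score(P, Q, u, v):
    """Harmonic mean of the two directional predictions, each clipped to [1e-6, 1]."""
    def directional(a, b):
        s = sum(x * y for x, y in zip(P[a], Q[b]))
        return min(max(s, 1e-6), 1.0)

    s_uv, s_vu = directional(u, v), directional(v, u)
    return 2 * s_uv * s_vu / (s_uv + s_vu)
```

The harmonic mean is chosen here because it stays low unless both directional predictions are high, which matches the goal of recommending only pairs in which both users are likely to be interested.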

Algorithm   F1 Score   Precision   Recall   AUC
LFRR        0.86       0.86        0.85     0.90
TIRR        0.87       0.86        0.88     0.91

Table 4. Results based on best F1 score for the TIRR and LFRR algorithms. Here we can see that the content-based TIRR improves upon the collaborative filtering-based LFRR.

Table 4 lists the peak performance metrics for the two algorithms. In addition to its higher F1 score, TIRR matches LFRR's precision (0.86) while achieving higher recall (0.88 against 0.85).

5. Conclusions

In this paper, we presented a novel algorithm that interprets user preference history using only photos and predicts future preferences for reciprocal recommendation. We demonstrated that it can be used effectively as a predictor of the probability of mutual preference between two users, and therefore forms the basis of an effective recommender system. We also demonstrated that our algorithm outperforms state-of-the-art reciprocal recommender systems in offline tests on a large dataset from a dating service with real users.

This research demonstrates the value of including historical preferences in reciprocal recommendation. Previous research has demonstrated the value of using RNNs to interpret sequences of preferences in user-item recommendation, but to our knowledge this is the first time the approach has been applied to reciprocal recommendation. The improvement over a similar algorithm that does not use sequential data (ImRec) shows the value of this approach.

Finally, the model itself represents a significant advance in content-based reciprocal recommendation. Its success also allows us to draw interesting conclusions about the significance of photos in online dating, given their strong predictive power on this dataset. It also offers insight into the potential of content-based algorithms in online dating: while in many domains content-based algorithms are outperformed by collaborative filtering, the algorithm presented in this paper performs better on our evaluation metrics than the current state-of-the-art collaborative filtering algorithm.


  • C. Aggarwal (2016) Recommender systems: the textbook. 1st edition, Springer, London, England. ISBN 3319296574.
  • T. Bansal, D. Belanger, and A. McCallum (2016) Ask the GRU: multi-task learning for deep text recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, RecSys ’16, New York, NY, pp. 107–114.
  • R. Bell and Y. Koren (2007) Lessons from the Netflix prize challenge. ACM SIGKDD Explorations Newsletter - Special issue on visual analytics 9 (2), pp. 75–79.
  • L. Bertinetto, J. Valmadre, J. Henriques, A. Vedaldi, and P. Torr (2016) Fully-convolutional siamese networks for object tracking. In Proceedings of the 2016 European Conference on Computer Vision, ECCV 2016, pp. 850–865.
  • J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) ImageNet: a large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, Miami, FL, pp. 248–255.
  • J. He and W. Chu (2010) A social network-based recommender system (SNRS). Data Mining for Social Network Data 12, pp. 47–74.
  • J. Herlocker, J. A. Konstan, L. Terveen, and J. Riedl (2004) Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems 22 (1).
  • S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural Computation 9 (8), pp. 1735–1780.
  • S. Hochreiter (1998) The vanishing gradient problem during learning recurrent neural nets and problem solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 6 (2), pp. 107–116.
  • A. Kleinerman, A. Rosenfeld, F. Ricci, and S. Kraus (2018) Optimally balancing receiver and recommended users’ importance in reciprocal recommender systems. In Proceedings of the 12th ACM Conference on Recommender Systems, RecSys ’18, New York, NY, pp. 131–139.
  • G. Koch, R. Zemel, and R. Salakhutdinov (2015) Siamese neural networks for one-shot image recognition. In Proceedings of the 2015 ICML Deep Learning Workshop.
  • X. N. Lam, T. Vu, T. D. Le, and A. D. Duong (2008) Addressing cold-start problem in recommendation systems. In Proceedings of the 2nd International Conference on Ubiquitous Information Management and Communication, ICUIMC ’08, New York, NY, pp. 208–211.
  • K. Lang (1995) NewsWeeder: learning to filter netnews. In Proceedings of the 12th International Conference on Machine Learning, ICML 1995, pp. 331–339.
  • C. Lei, D. Liu, W. Li, Z. Zha, and H. Li (2016) Comparative deep learning of hybrid representations for image recommendations. In The IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, pp. 2545–2553.
  • Y. Lewenberg, Y. Bachrach, S. Shankar, and A. Criminisi (2016) Predicting personal traits from facial images using convolutional neural networks augmented with facial landmark information. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI 2016, pp. 4365–4366.
  • J. Lin, K. Sugiyama, M. Kan, and T. Chua (2013) Addressing cold-start in app recommendation: latent user models constructed from Twitter followers. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’13, New York, NY, pp. 283–292.
  • Q. Liu, S. Wu, and L. Wang (2017) DeepStyle: learning user preferences for visual recommendation. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2017, New York, NY, pp. 841–844.
  • J. Neve and R. McConville (2020) ImRec: learning reciprocal preferences using images. In Proceedings of the Fourteenth ACM Conference on Recommender Systems, RecSys ’20, New York, NY, pp. 170–179.
  • O. Parkhi, A. Vedaldi, and A. Zisserman (2015) Deep face recognition. BMVA, pp. 1–12.
  • L. Pizzato, T. Rej, J. Akehurst, I. Koprinska, K. Yacef, and J. Kay (2013) Recommending people to people: the nature of reciprocal recommenders with a case study in online dating. User Modeling and User-Adapted Interaction 23 (5), pp. 447–488.
  • L. Pizzato, T. Rej, T. Chung, I. Koprinska, and J. Kay (2010) RECON: a reciprocal recommender for online dating. In Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys ’10, New York, NY, pp. 207–214.
  • Z. Siting, H. Wenxing, Z. Ning, and Y. Fan (2012) Job recommender systems: a survey. In Proceedings of the 7th International Conference on Computer Science & Education, ICCSE ’12, Melbourne, VIC, Australia, pp. 920–924.
  • B. Twardowski (2016) Modelling contextual information in session-aware recommender systems with neural networks. In Proceedings of the 10th ACM Conference on Recommender Systems, RecSys ’16, New York, NY, pp. 273–276.
  • R. van Meteren and M. van Someren (2000) Using content-based filtering for recommendation. In Proceedings of the ECML 2000 Workshop: Machine Learning in Information Age, ECML 2000, pp. 47–56.
  • O. Vinyals, C. Blundell, T. Lillicrap, K. Kavukcuoglu, and D. Wierstra (2016) Matching networks for one shot learning. Advances in Neural Information Processing Systems 29.
  • C. Wu, A. Ahmed, A. Beutel, A. Smola, and H. Jing (2017) Recurrent recommender networks. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, WSDM ’17, New York, NY, pp. 495–503.
  • P. Xia, B. Liu, Y. Sun, and C. Chen (2015) Reciprocal recommendation system for online dating. In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM ’15, New York, NY, pp. 234–241.