The realm of human-computer interaction has expanded greatly as technologies for immersive experiences have made rapid strides. At the same time, there has been a major shift in media consumption, with a large population moving online for personalized consumption of media content such as video or music. Hence, there is a growing need for innovation in the design of human-computer interaction techniques that provide a seamless, immersive experience for media consumption [2, 3].
A challenging design problem in this context is social/collaborative viewing, which aims to allow remotely located users to enjoy shared viewing of media content as if they were seated together, as in conventional group viewing. The impact of group viewing on the viewing experience has been well studied in television research [4, 5]. Later work formalized the concept of remote social viewing: a SocialTV experiment  investigated how groups behave when watching a program together, while CollaboraTV  incorporated user collaboration while watching television through messaging and shared interest profiles. In a large-scale study of the online sports viewing experience, Ko et al.  demonstrated the effectiveness of sharing thoughts and information, and of the desire to belong to a group, in improving the watching experience. Further, McGill et al.  built a synchronous shared at-a-distance smart TV system and analyzed the adoption of the system and the nature of communication; they also built a VR prototype for shared viewing and showed its effectiveness in enhancing the viewing experience. Commercially, rabb.it and togethertube.com support synchronized viewing of broadcast content. More generally, most online video platforms support some form of social interaction: Facebook Live allows users to "like" a live video, whereas Hulu enables users to edit and share video clips with others. Such social functionality also helps users discover content on these platforms.
However, the design of an interface that meaningfully enables remote viewers to explore and decide on the video content they would like to watch together has not been studied extensively. While previous work enables remote users in a collaborative viewing session to communicate through chat, voice or video, there has been little focus on developing interfaces that enhance the content discovery experience in such scenarios. To this end, we present VoCoG, an intelligent system for voice-based collaborative group viewing. The proposed system addresses several challenges in achieving a seamless content discovery experience in collaborative viewing settings.
Firstly, VoCoG incorporates voice as the medium of interaction between users. This is non-intrusive, as users are not required to type or click, and is particularly suited for immersive interfaces [9, 10]. Further, natural user conversations allow VoCoG to extract rich user feedback (such as movie or star affinity and expressed sentiment) using advanced natural language processing techniques. Moreover, even though popular personal assistants like Siri or Alexa are built around voice-based interfaces, there has been limited work on voice-driven, feedback-based recommendations in multi-user interaction systems.
Secondly, VoCoG deploys an online recommendation algorithm that can efficiently update user preferences based on the complex feedback extracted from conversations. Conversation-based , critique-based  and online [13, 14, 15] recommendation methods have started gaining attention recently. We exploit insights from this recent work to build an online recommendation system, which computes recommended movies for each individual based on the feedback from their conversation.
Finally, there is the challenge of combining the recommendations for each individual into a final watch list for the group as a whole. VoCoG uses concepts from group behavior modeling in social networks  as well as group-based recommendations [17, 18], and accounts for user-user agreements/disagreements, individual affinities towards movies, shows or stars, and user behavioral traits to arrive at meaningful recommendations for the group. VoCoG can also detect whether the group has reached a consensus on watching a video.
The paper is organized into six sections. Section II describes the related work in the area. The design of the proposed interface, VoCoG, is detailed in Section III, while Section IV describes the final prototype. A comprehensive user evaluation of the system is discussed in Section V, followed by conclusions in Section VI.
II Related Work
In this section, we describe related work in the design of multi-user interfaces, recommendations, conversation analysis and multi-user interaction modeling.
Multi-user Interfaces: There has been extensive research in the design of multi-user interfaces [19, 20]. Virtual presence  as well as the design of virtual worlds using avatars  have been studied, as have the use of voice for human-machine interaction [21, 22] and for immersive media . There has also been related work on the design of interactive shared viewing experiences [6, 3, 24]. However, these approaches provide restrictive interaction mechanisms between users, through chats or avatars. The proposed approach, in contrast, is designed to account for rich conversation between users and to provide a seamless, non-intrusive experience.
Recommendations: A comprehensive survey of recommendation algorithms has been given by . There has been work on group-based recommendations [26, 17, 18], and conversation-based [11, 27] as well as critique-based recommendations  have been studied. Online recommendation techniques, such as Bayesian methods , bandits  and latent analysis , have been proposed recently. Our approach is motivated by  in using user-user clustering to update individual preferences. This allows VoCoG to account for complex updates from user conversation while producing relevant recommendations.
Conversation Analysis: There has been extensive work on analyzing natural user conversation. Existing methods address language parsing [29, 30], text tagging  and entity recognition . There has also been considerable work on sentiment analysis [33, 34] and intention mining [35, 36] from text. Further, Mikolov et al.  have looked into robust semantic representations of words. Commercially, applications like luis.ai provide services for entity and intent extraction. VoCoG requires comprehensive parsing of conversation data, including entity extraction, sentiment analysis and the resolution of direct/indirect references in the conversation sequence. Prior work does not address this sufficiently; hence, we build upon existing work to analyze user conversation and extract the required information.
User-User Interaction Modeling: There has been work on group behavior modeling in social networks . However, the area of small-group conversation is relatively unexplored. Prior work has looked at the problems of conflict resolution , identifying the speaker  and the addressee , and modeling face-to-face conversations . We address the challenge of conflict modeling in multi-user conversations through a novel user-user graph.
III System Implementation
In this section, we describe the modeling behind VoCoG, the proposed intelligent assistant interface. The workflow of the approach is shown in Figure 1. The essential components of the system are an online recommendation algorithm, a module to understand the voice conversation between users, and inter-user interaction modeling. Each of these modules is described in detail below.
Iii-a Recommendation System
VoCoG combines a novel incremental collaborative filtering technique with content-filtering techniques to arrive at a robust ranking of show preferences for individual users. Thereafter, the algorithm to update ratings based on user conversations is discussed. How the group recommendations are arrived at is described later in Section III-C.
III-A1 Movie Database
We used the MovieLens  dataset for training the recommendation system. The dataset contains millions of ratings for thousands of movies from a large user base. We pruned out users who had rated fewer than a threshold number of movies, and movies with fewer than a threshold number of ratings. This was done to reduce the movie search space when updating VoCoG recommendations; moreover, relatively unknown movies, not rated by enough users, would not generate conversation among users. The dataset was further enriched, by crawling the web, with the genre terms, actors and directors for each of the remaining movies. This enriched data was used for training the VoCoG recommendation models.
III-A2 Collaborative Filtering
We chose to deploy a simple probabilistic latent analysis (probLat)-based method for collaborative filtering, combined with a method inspired by  to efficiently incorporate complex feedback from user conversations (Section III-A4). For a user $u$, the rating $r$ for a movie $m$ was modeled as a function of $u$ and $m$. A latent variable $z$ was introduced to decouple the probabilistic dependency between users and movie ratings. Different user interest groups were captured in $z$, and hence the rating of a movie for a user was calculated as:

$$p(r \mid u, m) = \sum_{z} p(z \mid u)\, p(r \mid m, z)$$

Each user belonged to a cluster $z$ with probability $p(z \mid u)$, and the distribution of ratings across clusters was given by $p(r \mid m, z)$, modeled as a Gaussian with cluster-specific mean $\mu_{m,z}$ and variance $\sigma_{m,z}^2$:

$$p(r \mid m, z) = \frac{1}{\sqrt{2\pi}\,\sigma_{m,z}} \exp\!\left(-\frac{(r - \mu_{m,z})^2}{2\sigma_{m,z}^2}\right)$$
The Expectation-Maximization (EM) algorithm was used to fit the model. The E-step calculated the posterior probability of $z$, given the user $u$, movie $m$ and rating $r$, as:

$$p(z \mid u, m, r) = \frac{p(z \mid u)\, p(r \mid m, z)}{\sum_{z'} p(z' \mid u)\, p(r \mid m, z')}$$

Once the posterior probabilities were computed, the M-step computed the probability of each user belonging to different clusters and the parameters of the rating distributions:

$$p(z \mid u) = \frac{\sum_{(m, r) \in \mathcal{R}_u} p(z \mid u, m, r)}{|\mathcal{R}_u|}, \qquad \mu_{m,z} = \frac{\sum_{(u, r) \in \mathcal{R}_m} r \, p(z \mid u, m, r)}{\sum_{(u, r) \in \mathcal{R}_m} p(z \mid u, m, r)}$$

where $\mathcal{R}_u$ is the set of (movie, rating) pairs observed for user $u$, $\mathcal{R}_m$ is the set of (user, rating) pairs observed for movie $m$, and $\sigma_{m,z}^2$ is updated analogously.
The log-likelihood was used to measure convergence: the algorithm was terminated when the change in log-likelihood fell below a small fraction of the log-likelihood at that step.
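As an illustration, the EM loop described above can be sketched as follows. This is a minimal sketch, not the paper's implementation: the dataset sizes, random ratings, step counts and convergence tolerance are invented, and the variance update is omitted for brevity.

```python
# Sketch of the probLat EM training loop: a Gaussian rating model with
# K latent user-interest clusters. All sizes and data are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, K = 50, 40, 4

# Sparse ratings stored as (user, movie, rating) triples.
triples = [(rng.integers(n_users), rng.integers(n_movies),
            rng.integers(1, 6)) for _ in range(500)]

p_z_u = rng.dirichlet(np.ones(K), size=n_users)   # p(z|u)
mu = rng.uniform(1, 5, size=(n_movies, K))        # mean rating per (m, z)
sigma = np.ones((n_movies, K))                    # std-dev per (m, z)

def log_likelihood():
    ll = 0.0
    for u, m, r in triples:
        dens = np.exp(-(r - mu[m]) ** 2 / (2 * sigma[m] ** 2)) / \
               (np.sqrt(2 * np.pi) * sigma[m])
        ll += np.log(max(p_z_u[u] @ dens, 1e-300))
    return ll

prev = -np.inf
for _ in range(100):
    # E-step: posterior p(z | u, m, r) for every observed rating.
    post = np.zeros((len(triples), K))
    for i, (u, m, r) in enumerate(triples):
        dens = np.exp(-(r - mu[m]) ** 2 / (2 * sigma[m] ** 2)) / \
               (np.sqrt(2 * np.pi) * sigma[m])
        w = p_z_u[u] * dens
        post[i] = w / w.sum()
    # M-step: re-estimate p(z|u) and the per-(movie, cluster) means.
    num_u = np.zeros((n_users, K)); cnt_u = np.zeros(n_users)
    num_m = np.zeros((n_movies, K)); den_m = np.zeros((n_movies, K))
    for i, (u, m, r) in enumerate(triples):
        num_u[u] += post[i]; cnt_u[u] += 1
        num_m[m] += r * post[i]; den_m[m] += post[i]
    p_z_u = num_u / np.maximum(cnt_u[:, None], 1)
    mu = np.where(den_m > 1e-12, num_m / np.maximum(den_m, 1e-12), mu)
    # (variance update omitted for brevity; it mirrors the mean update)
    cur = log_likelihood()
    if abs(cur - prev) < 1e-3 * abs(cur):  # convergence on log-likelihood
        break
    prev = cur
```

After training, the per-user mixture weights `p_z_u` are exactly the quantities that the conversation feedback step (Section III-A4) later perturbs.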
III-A3 Content Filtering
Content filtering was done through a nearest-neighbor approach. Based on the movie ratings $r_{u,m}$ given by user $u$, scores for a genre $g$ and a star $st$ were calculated as follows:

$$G_u(g) = \frac{1}{|M_g|} \sum_{m \in M_g} r_{u,m}, \qquad S_u(st) = \frac{1}{|M_{st}|} \sum_{m \in M_{st}} r_{u,m}$$

where $M_g$ is the set of movies containing genre $g$ and $M_{st}$ is the set of movies in which star $st$ has acted. Both $G_u$ and $S_u$ were normalized with respect to the list of genres and stars respectively. The content-based score of a movie for the user was then calculated by averaging the scores of the genres and the stars present in that movie.
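The content-filtering step above can be illustrated with a small sketch. The ratings, genre and cast tables below are made up for the example, and the sum-to-one normalization is one plausible reading of the normalization the paper describes.

```python
# Illustrative content filtering: per-user genre and star scores are
# averaged from rated movies, then a movie score is the average over
# its genres and cast. All data here is invented for the example.
from collections import defaultdict

ratings = {("alice", "Heat"): 5, ("alice", "Up"): 2, ("alice", "Se7en"): 4}
movie_genres = {"Heat": ["crime", "thriller"], "Up": ["animation"],
                "Se7en": ["crime", "thriller"], "Ronin": ["crime", "action"]}
movie_stars = {"Heat": ["De Niro", "Pacino"], "Se7en": ["Pitt"],
               "Ronin": ["De Niro"], "Up": []}

def profile(user):
    g_tot, g_cnt = defaultdict(float), defaultdict(int)
    s_tot, s_cnt = defaultdict(float), defaultdict(int)
    for (u, m), r in ratings.items():
        if u != user:
            continue
        for g in movie_genres[m]:
            g_tot[g] += r; g_cnt[g] += 1
        for s in movie_stars[m]:
            s_tot[s] += r; s_cnt[s] += 1
    g = {k: g_tot[k] / g_cnt[k] for k in g_tot}
    s = {k: s_tot[k] / s_cnt[k] for k in s_tot}
    # Normalize so scores sum to 1 over the user's known genres/stars.
    gz, sz = sum(g.values()) or 1.0, sum(s.values()) or 1.0
    return {k: v / gz for k, v in g.items()}, {k: v / sz for k, v in s.items()}

def content_score(user, movie):
    g, s = profile(user)
    parts = [g.get(x, 0.0) for x in movie_genres[movie]] + \
            [s.get(x, 0.0) for x in movie_stars[movie]]
    return sum(parts) / len(parts) if parts else 0.0
```

With these toy ratings, an unseen crime film featuring a liked actor ("Ronin") scores higher for alice than an unseen-genre film ("Up" was rated low), matching the intended behavior.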
III-A4 Incorporating User Preferences
Here, we describe the model for updating recommendations based on conversation feedback. The model can account for feedback on movies, stars or genres from the user.
1. Ranked Movie List: Once the probLat model was trained, the list of movies was ranked in descending order of predicted rating for each user interest group $z$. The mean rating $\mu_{m,z}$ and variance $\sigma_{m,z}^2$ of a movie $m$ varied with the cluster $z$. Hence, as shown in Figure 2, each cluster has different movies at the top. The top movies from each cluster were used for the next step.
2. Calculating Genre Scores: For each cluster $z$, scores for different genre terms were calculated by averaging the predicted ratings of movies in the ranked list that contain the genre term. The list of genres was created from the tagged MovieLens data. The genre terms were then ranked in descending order of score for each cluster to get a cluster-specific ranked list $L_z$. Figure 3 shows the variation of genre scores across clusters. This cluster-specific genre scoring was used for updating user preferences.
3. Incorporating feedback using $p(z \mid u)$: We exploited the difference in movie and genre preferences across clusters to incorporate conversation feedback by modifying the interest group probability $p(z \mid u)$. Different distributions of $p(z \mid u)$ led the model to generate different movies as recommendations. We extracted keywords such as genres, movie names and stars from the user conversations, along with the attached sentiment, as described in Section III-B. Here, we describe how to update $p(z \mid u)$ using the genre preference of the user; the method extends to movies and stars as well.
Let $(g, s)$ be a (genre term, sentiment value) pair extracted from a particular user conversation. For example, for a sentence like "Right now, I am in the mood for action movies", the pair would be (action, $+1$). Then, $p(z \mid u)$ is updated as follows:

$$p(z \mid u) \leftarrow p(z \mid u)\, \exp\!\big(\eta \, s \, f(g, z)\big)$$

where $f(g, z)$ is a bounded factor that grows with the rank of genre $g$ in the cluster-specific list $L_z$, and $\eta$ is a step size. The multiplicative exponential form ensured that updates could be applied serially. The update works as follows: if a genre, such as action, ranks higher in some clusters than in others and the user expresses a positive sentiment about it, then the probability of the user belonging to those clusters is increased; the higher the rank of action in a cluster, the greater the update factor for that cluster. Similar equations were used to update $p(z \mid u)$ using movie and star keywords. This was repeated for each extracted keyword-sentiment pair, and $p(z \mid u)$ was normalized after all the terms had been processed.
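A small sketch of this multiplicative feedback update follows. The rank-based factor, the step size `eta` and the cluster genre rankings are all illustrative assumptions, not the paper's exact choices.

```python
# Hedged sketch of the feedback update: p(z|u) is multiplied by
# exp(eta * sentiment * rank_factor(genre, z)) and renormalized.
import math

# Cluster-specific genre rankings L_z (best genre first) - made-up data.
genre_rank = {
    0: ["action", "thriller", "comedy", "drama"],
    1: ["drama", "romance", "comedy", "action"],
    2: ["comedy", "action", "drama", "romance"],
}
p_z_u = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
eta = 1.0  # illustrative step size

def rank_factor(genre, z):
    """Bounded factor in [0, 1]: higher-ranked genres give larger updates."""
    ranks = genre_rank[z]
    if genre not in ranks:
        return 0.0
    return 1.0 - ranks.index(genre) / len(ranks)

def apply_feedback(p, genre, sentiment):
    updated = {z: q * math.exp(eta * sentiment * rank_factor(genre, z))
               for z, q in p.items()}
    total = sum(updated.values())
    return {z: q / total for z, q in updated.items()}

# "Right now, I am in the mood for action movies" -> (action, +1)
p_z_u = apply_feedback(p_z_u, "action", +1)
```

Because the update is multiplicative and the result is renormalized, several keyword-sentiment pairs can be applied one after another, as the text describes.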
4. Updating Content Filtering: We updated the content-based preference scores based on the genre feedback from conversation in an analogous manner; other keyword types were handled similarly.
III-A5 Implementation and Results
For the implementation, the number of clusters in the probLat model and the step-size hyperparameter were set empirically. The final recommendations were arrived at by a simple average of the probLat and content-filtering scores. The probLat model was also compared with different methods from the literature (Table I). For testing, the rating of one random movie among those rated by each user was removed; the model was then trained on the reduced dataset and tested on the removed ratings. The probLat method shows performance comparable to some of the previous methods.
III-B Natural Language Understanding
It is challenging to process human language, more so when people are conversing. To incorporate non-intrusive feedback in VoCoG, we needed a workflow that could update viewer preferences solely based on their conversation. For this, we broke the complex conversation down into simpler keyword-sentiment pairs, which could then be used to update user preferences as discussed in Section III-A. The keywords included movie mentions, named entities such as stars or directors, and genre terms. The process is described in detail below.
III-B1 Speech-to-text Conversion
The user conversations were first converted to text using existing APIs . Though the accuracy of speech-to-text APIs has increased considerably, challenges remain in the further processing, as described below. The conversations were analyzed sentence by sentence.
III-B2 Conversation Database
A database of user conversations (MovieForum) was curated from a movie discussion forum (movieforums.com). The dataset consisted of multiple discussion threads, each involving several comments and users on average. The conversations were manually labeled with movie, genre and actor mentions. We further tagged the conversations with mentions of the users involved in the discussion; these tags covered both direct mentions of a user/star and indirect references through pronouns. Each conversation was additionally labeled manually with a sentiment value.
For the purpose of evaluating the different tagging and sentiment detection approaches, we used a train/test split of the corresponding dataset. The hyper-parameters were tuned using cross-validation. In cases where no training was required, the full dataset was used for evaluation.
III-B3 Sentence-Level Keyword Extraction
We describe below the proposed methods for extracting the different types of keywords: genres, movies and actors.
1. Genre Terms: Most of the genre terms in the curated MovieLens database from Section III-A (like action, drama) were single words, so genre terms were extracted using a simple word search. The look-up list of genres was compiled from the movie database. The method's F-score was evaluated on the MovieForum database.
2. Movie Terms: Movie names were more complex, like "One Flew over the Cuckoo's Nest". Also, there was a comprehensive movie list to search over (around 10K movies in our MovieLens dataset). Hence, a two-step process was used for extraction:
a. Movie Tagging: Alternate methods of tagging potential movie mentions were compared for this purpose.
Baseline approach: An existing, state-of-the-art POS tagging method  was used to detect nouns in the sentences, and the detected spans were then used as the tagging for movie mentions.
Learning-based approach: The training data from the MovieForum dataset was IOB-tagged, and a classifier was trained to predict the tags using features such as the words around the target word, the position of the word (e.g., first or last word), a vector representation of the word given by a word2vec model, and whether the word is among the most frequent words in the movie name list (MovieLens data). The output of this classifier was smoothed using an HMM-based sequence analyzer, trained on the MovieForum data with I, O, B as hidden states, to weed out some of the unlikely classifications made by the classifier. The method overcame the challenge of unreliable capitalization and could detect long names as well. The performance of the tagging approaches is summarized in Table II.
b. Movie Name Search: The tagged output was then matched against the movie names in the MovieLens database using a string search based on the Levenshtein distance measure. The top-ranked movies were then re-ranked on the basis of the context of the conversation, where context included genres or actors detected in the preceding conversation: the scores of movies related to these mentions were increased before re-ranking. Table III shows the overall performance of the movie extractor.
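As a rough illustration, the two-step search might look like the following sketch. The tiny movie catalogue, the context-boost weights and the handwritten Levenshtein routine are all assumptions made for the example.

```python
# Illustrative two-step movie-name search: Levenshtein-based fuzzy
# matching of a tagged span against a movie list, followed by a
# context boost for recently mentioned genres/actors.

def levenshtein(a, b):
    """Standard edit distance via dynamic programming (case-insensitive)."""
    a, b = a.lower(), b.lower()
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

movies = {
    "Inception": {"genres": {"sci-fi", "thriller"}, "stars": {"DiCaprio"}},
    "Inside Out": {"genres": {"animation"}, "stars": set()},
    "Insidious": {"genres": {"horror"}, "stars": set()},
}

def search(tagged_span, context_genres=frozenset(),
           context_stars=frozenset(), top=2):
    scored = []
    for name, meta in movies.items():
        # Similarity in [0, 1] from normalized edit distance.
        sim = 1 - levenshtein(tagged_span, name) / max(len(tagged_span),
                                                       len(name))
        # Context boost: recently mentioned genres/actors raise the score.
        boost = 0.2 * len(meta["genres"] & context_genres) + \
                0.2 * len(meta["stars"] & context_stars)
        scored.append((sim + boost, name))
    return [n for _, n in sorted(scored, reverse=True)[:top]]
```

For instance, a slightly misspelled span still resolves to the right title, while an ambiguous span is tipped by conversational context, mirroring the re-ranking step described above.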
3. Movie Star Terms: Movie stars were detected following the method used for movies. The star tagging method was compared to the Stanford named-entity tagger . As Table IV shows, the proposed approach outperforms the recall of the Stanford tagger, with only a small decrease in precision.
4. Indirect References: References to a movie or a star using pronouns such as "it" or "him/her" were attached to the last mention of a movie or a star detected in the conversation.
III-B4 Sentence-Level Sentiment Analysis
The existing sentiment analysis methods were found insufficient for our case. They did not classify sentiment for intent well; e.g., "We should be watching Inception" was classified as neutral. They also did not handle sentences framed as questions, e.g., "Why shouldn't we watch Inception?", to which the baseline approaches assigned a negative sentiment. There were also cases where a negative sentiment was assigned due to the movie name itself, e.g., "Let us watch Wrong Turn". Hence, a modified sentiment analyzer was trained. Features included: 1. whether the sentence is a question; 2. the presence of words indicating intention, and of positive or negative words; 3. the average vector representation of sentiment and intention words given by a word2vec model ; and 4. the scores of existing sentiment classifiers [33, 34]. Also, to avoid the problem of a keyword (movie or actor name) altering the sentiment, such positive or negative keywords were removed before classification. A performance comparison of the developed sentiment analyzer on the MovieForum data is provided in Table V.
III-B5 Keyword-Sentiment Pairing
The last step was to attach sentiment to the extracted keywords. The direct approach was to pair the sentence sentiment with the corresponding keywords. However, in conversations, people can mention multiple movies in one sentence with contrasting sentiments. Hence, we used a set of linguistic rules to improve the pairing, as described below.
The sentence was parsed using a constituency parser , and a set of rules was created to attach the sentiment to the keyword.
A total of 20 rules were created for comparative words like "but", "and", "or", "yet", "although", "both ... and", "instead", "as ... as", "than". E.g., the rule for "but" was: in the constituency parse tree, if the parent of the "but" conjunction is a noun phrase, attach the reverse of the sentiment of the part containing the verb phrase to the part which does not contain the verb phrase.
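As a toy illustration of the rule idea (not the actual implementation, which operates on a constituency parse tree), a much-simplified token-level version of the "but" rule might look like:

```python
# Simplified sketch: split a sentence at "but", keep the sentence
# sentiment for keywords in the clause that carries it, and flip the
# polarity for keywords in the contrasted clause. The keyword list and
# the example sentence are made up; a real system would use the parse
# tree and all 20 rules.

KEYWORDS = {"Inception", "Titanic"}

def pair_keywords(sentence, sentence_sentiment):
    """Return {keyword: sentiment}, reversing polarity across 'but'."""
    if " but " in sentence:
        left, right = sentence.split(" but ", 1)
        # Assumption for this sketch: the clause after "but" carries
        # the overall sentence sentiment; the other clause is reversed.
        clauses = [(left, -sentence_sentiment), (right, sentence_sentiment)]
    else:
        clauses = [(sentence, sentence_sentiment)]
    pairs = {}
    for clause, sent in clauses:
        for kw in KEYWORDS:
            if kw in clause:
                pairs[kw] = sent
    return pairs

pairs = pair_keywords("I did not enjoy Titanic but Inception was great", +1)
```

Here the overall positive sentiment attaches to "Inception", while "Titanic", sitting in the contrasted clause, receives the reversed sentiment.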
The final results for keyword-sentiment pairing are shown in Table VI.
III-C Inter-User Influence Modeling
In this section, we describe the modeling of inter-user influence from conversation: how the ratings of users vary due to agreement or conflict during the conversation. We create a user-user graph based on related work in social networks . The algorithm assumes knowledge of the user names of the people present in the conversation, and takes as input the keyword-sentiment pairs extracted in Section III-B.
1. Dependency Parsing: First, part-of-speech (POS) tags were detected using dependency parsing , and these were used to detect the subject of the conversation. The keywords, such as movies and actors, and the corresponding sentiments were extracted as explained in Section III-B. The detected (subject, keyword, sentiment) tuples were output.
2. Keyword Pruning: In conversation, there are cases in which a reference to a movie or star cannot be linked to another user. An example would be "I want to watch The Prestige": "I-The Prestige" would be the user-keyword pair obtained here, but if The Prestige had not been referred to before by any user, the pair would not convey agreement or disagreement with any other user. Such keywords were pruned out.
3. Inter-User Sentiment: We now find the expressed sentiment for interactions between users. There are two possible cases:
If the subject was not detected, the user who last used the particular keyword was taken as the referred user. Agreement or disagreement (i.e., the sentiment of the interaction) was determined by whether the expressed sentiments of the two users matched.
If neither the keywords nor the subject were detected in the sentence, the following method was used. We assumed that people responding to what someone else has said tend to bring up similar topics. Hence, we found the overlap of noun words between the user's sentence and the recent sentences spoken before it. The user who last spoke the sentence with the maximum overlap was assigned as the referred user. If there was no overlap, the speaker of the sentence immediately preceding the current one was taken as the referred user.
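A hedged sketch of this referred-user resolution follows. The naive "content word" heuristic below stands in for the real POS-based noun detection, and the stopword list and conversation history are invented for the example.

```python
# Referred-user resolution via noun overlap: pick the earlier speaker
# whose sentence shares the most content words with the current one,
# falling back to the previous speaker when nothing overlaps.

STOPWORDS = {"i", "we", "the", "a", "to", "was", "is", "it", "that",
             "watch", "so", "yes"}

def content_words(sentence):
    """Crude stand-in for noun extraction: non-stopword tokens."""
    return {w.strip(".,!?").lower() for w in sentence.split()
            if w.strip(".,!?").lower() not in STOPWORDS}

def referred_user(current_speaker, current_sentence, history):
    """history: list of (speaker, sentence) pairs, oldest first."""
    words = content_words(current_sentence)
    best, best_overlap = None, 0
    for speaker, sentence in history:
        if speaker == current_speaker:
            continue
        overlap = len(words & content_words(sentence))
        if overlap > 0 and overlap >= best_overlap:
            best, best_overlap = speaker, overlap  # later ties win
    if best is None and history:
        best = history[-1][0]  # fall back to the previous speaker
    return best
```

For example, echoing another user's movie mention resolves to that user, while a contentless reply like "Hmm maybe" falls back to the previous speaker.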
5. User-User Influence Graph: The sentiment for an ordered user pair $(u, v)$ extracted from conversation was used to update the graph: the edge weight $w_{uv}$ was assigned the extracted sentiment. Note that the graph is not symmetric, as user $u$ agreeing or disagreeing with user $v$ changes $w_{uv}$ but not $w_{vu}$. Over multiple conversations, the sentiment of each interaction was added to the corresponding weight.
6. User Rating Matrix Update: The rating $r_{u,m}$ for a movie $m$ by user $u$ was updated using the user-user influence graph as follows:

$$r_{u,m} \leftarrow r_{u,m} + \frac{1}{\lambda} \sum_{v \,:\, w_{uv} > 0} w_{uv}\,\big(r_{v,m} - r_{u,m}\big)$$

where $\lambda$ is a regularization parameter; in our tests, $\lambda$ was set to the number of users. The update pulled the ratings of users in agreement closer together, so that consensus was reached more quickly. Negative-weight edges in the graph were not used in the update. However, the negative weights were maintained so that users who were in prior disagreement had to come to agreement before the corresponding edge weight was taken into account.
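The influence-graph update can be sketched as follows. The rating-averaging rule shown is a reconstruction of the update described above, and the asymmetric edge weights and ratings are made-up example data; $\lambda$ is set to the number of users, as in the text.

```python
# Sketch of the influence-graph rating update: each user's rating moves
# toward the ratings of users they agree with (positive edges only),
# damped by the regularization parameter lam.

users = ["alice", "bob", "carol"]
lam = len(users)  # regularization set to the number of users

# w[u][v]: accumulated sentiment of u's reactions to v (not symmetric).
w = {"alice": {"bob": 2.0, "carol": -1.0},
     "bob":   {"alice": 1.0, "carol": 0.0},
     "carol": {"alice": -1.0, "bob": 0.0}}

ratings = {"alice": {"Heat": 4.0},
           "bob":   {"Heat": 2.0},
           "carol": {"Heat": 5.0}}

def influence_update(ratings, w, movie):
    new = {u: dict(r) for u, r in ratings.items()}
    for u in ratings:
        delta = 0.0
        for v, weight in w[u].items():
            if weight > 0:  # negative edges are kept but not used
                delta += weight * (ratings[v][movie] - ratings[u][movie])
        new[u][movie] = ratings[u][movie] + delta / lam
    return new

ratings = influence_update(ratings, w, "Heat")
```

In this toy run, alice and bob (mutually positive edges) are pulled toward each other's ratings, while carol, with no positive outgoing edge, is unaffected.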
7. Limitations: Our subject analysis method may fail for complex movie names, for example "Who Is Harry Kellerman and Why Is He Saying Those Terrible Things About Me?". If this title is part of a sentence, "He" will naturally be detected as a subject (being a pronoun), leading the system to infer an inter-user interaction where there may be none.
The user influence modeling system showed good performance on the MovieForum dataset: for a set of users with agreements, disagreements or neutral exchanges between pairs of users, the algorithm achieved good precision and recall.
| Questions | Strongly Disagree (1) | Disagree (2) | Neutral (3) | Agree (4) | Strongly Agree (5) | p-value |
| Overall VoCoG provided good experience | | | | | | |
| Final recommendations were good | | | | | | |
| Updates to recommendations were appropriate | | | | | | |
| System took care of your preferences | | | | | | |
| Response time of the system was fast enough | | | | | | |
| System was non-intrusive | | | | | | |
III-D Group Consensus Function
Finally, to arrive at the recommendations for the whole group, a group consensus function was used. A variety of group consensus functions, such as maximum pleasure, average satisfaction  and least misery, have been explored in the group recommendation literature [17, 18].
1. Average Without Misery Function: We used the "average without misery" function for our case. This function first eliminates movies on the basis of "misery": if any user has rated a movie below a threshold, the movie is eliminated. For the surviving movies, an average rating is computed, based on which the top movies are decided. In our experiments, the misery threshold was set empirically. The final decision used a weighted average rating, with weights determined by user behavior in the conversation, such as the number of sentences spoken and the users influenced.
2. Consensus Detection: The system decided whether the users had reached a consensus by comparing the top-rated movie with the lower-ranked ones. If the overall group rating of the top movie exceeded that of the next movie in the list by a specified multiplicative threshold (set empirically in our experiments), consensus was deemed to have been reached.
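The consensus stage above can be sketched compactly. The misery threshold, consensus ratio and per-user weights below are illustrative values, not the paper's.

```python
# Hedged sketch of "average without misery" plus the consensus check:
# drop any movie a user rated below a threshold, rank survivors by
# weighted average rating, and declare consensus when the top movie
# beats the runner-up by a fixed ratio.

MISERY_THRESHOLD = 2.0   # drop a movie if any user rates it below this
CONSENSUS_RATIO = 1.2    # top movie must beat the runner-up by this factor

def group_ranking(ratings, weights):
    """ratings: {user: {movie: score}}; weights: conversation-derived."""
    movies = set.intersection(*(set(r) for r in ratings.values()))
    surviving = [m for m in movies
                 if all(r[m] >= MISERY_THRESHOLD for r in ratings.values())]
    total_w = sum(weights.values())
    scored = [(sum(weights[u] * ratings[u][m] for u in ratings) / total_w, m)
              for m in surviving]
    return sorted(scored, reverse=True)

def consensus_reached(ranked):
    if len(ranked) < 2:
        return bool(ranked)
    return ranked[0][0] >= CONSENSUS_RATIO * ranked[1][0]

ratings = {"alice": {"Heat": 5, "Up": 1, "Se7en": 4},
           "bob":   {"Heat": 4, "Up": 5, "Se7en": 2.5},
           "carol": {"Heat": 5, "Up": 4, "Se7en": 3}}
weights = {"alice": 1.0, "bob": 1.5, "carol": 1.0}  # e.g. sentences spoken

ranked = group_ranking(ratings, weights)
```

In this example, "Up" is eliminated by the misery rule despite two high ratings, illustrating why this function protects minority preferences better than a plain average.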
IV Working Prototype
Figure 5 shows snapshots of a working prototype of the system. As shown in the figure, the users are first asked to log in to the system. VoCoG waits until all the users have joined the session, to enable a synchronized experience. Once all the users have logged in, VoCoG generates initial recommendations based on the users' previous histories, and outputs a voice message as well as text on the screen. In the prototype, the number of movie options shown was kept small so as to generate conversation about each option.
The users can then converse among themselves using "Record" and "Send" buttons, similar to the interfaces of many voice-based assistant systems. The interface also helps in sequencing the user conversation seamlessly. The sentences spoken by the users are sent to the other users and to the back-end server in real time. The users are represented by avatars, and as each one speaks, the corresponding avatar lights up.
After a fixed time interval, VoCoG refreshes the recommendations; an "Updating recommendations" message is played and shown on the screen. The next set of recommendations, computed by processing the user conversation as described in Section III, is then displayed. The process continues until consensus is detected, as shown in the figure, where the users have converged on the movie "The Dead Zone". The movie is then played when all the users click the video icon. If consensus is not reached after five rounds of updates, the top-rated movie is shown as the final output.
V System Evaluation
For a comprehensive evaluation of the proposed system, VoCoG was made to interact with real users. Participants, drawn from a range of ages and with varied movie-watching preferences, took part in a survey to judge the performance of VoCoG. The system was measured on different parameters following the methods for user-centric recommender system evaluation [46, 47].
V-A Survey Design
The survey was conducted as follows:
Phase 1: In the first phase, the participants were asked to rate a set of popular movies from the MovieLens database, chosen to be representative of different genres. These ratings were combined with the movie dataset (Section III-A1) to train the model, giving VoCoG the data to create an initial profile of each subject. The subjects were also asked whether they were frequent movie watchers (more than 2 times a week), and whether they were usually active in conversation.
Phase 2: In the second phase, the subjects were divided into groups of three, which were then called upon to interact with the system. Thereafter, they rated the system on six parameters, as shown in Table VII, on a Likert scale of 1 to 5, where 1 represents the worst rating and 5 the best. Arrangements were made to replicate the environment viewers would experience in a remote collaborative viewing deployment: the three viewers in each group sat in different rooms and could interact only through the system. VoCoG listened to their conversations and updated the recommendations periodically.
Questionnaire: After the interaction, the participants filled out a questionnaire, rating the different aspects of their interaction with the system (shown in Table VII) on a Likert scale of 1 (Strongly Disagree) to 5 (Strongly Agree).
V-B Survey Analysis
Here, we analyze different aspects of the participants' interaction with VoCoG.
1. Questionnaire Response: A summary of the questionnaire responses is shown in Table VII. As can be seen, VoCoG received a strongly positive response (most participants agreed or strongly agreed) for all the parameters (recommendation quality, interactivity, non-intrusiveness) except response time. This is because VoCoG searches through a large movie dataset for recommendations; we hope to improve the response time in future implementations.
2. Conversation Analysis: Table VIII shows the average number of mentions of different entities in the survey. The participants conversed most about movies, followed by genres and actors/stars. There were also considerable agreements/disagreements between the participants while interacting. Overall, they participated well in the survey, speaking a substantial number of sentences per update.
3. Group Recommendation Response: Table VIII also shows the statistics of user responses to the recommendations provided by VoCoG. As users provide feedback through conversation, different aspects of the response need to be captured (unlike in click-based systems). As shown in Table VIII, a considerable share of the movie mentions per update cycle concerned the recommended movies, while many of the movies mentioned per update were unique. This shows that while the users discussed the recommended movies, they also looked for diverse recommendations. The statistics for genre-term mentions indicate that the users found it most convenient to express their preferences in terms of genres. Actors and directors were mentioned only a few times. Also, VoCoG was able to reach consensus for only a subset of the groups, which calls for better modeling of group consensus and user dynamics; we intend to study these as future directions of this work.
4. Variations Due to User Differences: We also studied how the nature of the participants, viz. frequent/non-frequent watchers and active/non-active conversationalists (as collected in Phase 1), affected their interaction with the system. As shown in Table IX, frequent and active participants rated VoCoG highly on overall experience, while some lower ratings came from non-frequent and non-active participants.
| Entities | Avg. number per update |
| Unique movies recommended | |
| Recommended movie mentions | |
| Recommended genre mentions | |
| Recommended actor mentions | |
VI Conclusion and Future Directions
In this paper, we have described the framework of VoCoG, an intelligent, non-intrusive interface for a collaborative group-viewing experience. We have described the technology behind each component of VoCoG, viz. an online recommendation system, a robust conversation analyzer and a user-user interaction modeling algorithm.
In the future, we plan to optimize the system for a more efficient response time. We also intend to extend the algorithms to update user preferences beyond a single session, enabling longer-term optimization of the viewing experience, and to incorporate better features for modeling user dynamics and consensus. We further plan to build a richer GUI, using avatars and augmented sound, to improve the experience.
-  J. Blascovich and J. Bailenson, Infinite reality: Avatars, eternal life, new worlds, and the dawn of the virtual revolution. William Morrow & Co, 2011.
-  P. Cesar and K. Chorianopoulos, “The evolution of tv systems, content, and users toward interactivity,” Foundations and Trends in Human-Computer Interaction, vol. 2, no. 4, pp. 373–95, Apr 2009.
-  M. Nathan, C. Harrison, S. Yarosh, L. Terveen, L. Stead, and B. Amento, “Collaboratv: making television viewing social again,” in Proceedings of the 1st international conference on Designing interactive user experiences for TV and video. ACM, 2008, pp. 85–94.
-  J. Lull, “The social uses of television,” Human communication research, vol. 6, no. 3, pp. 197–209, 1980.
-  J. G. Webster and J. J. Wakshlag, “The impact of group viewing on patterns of television program choice,” Journal of Broadcasting & Electronic Media, vol. 26, no. 1, pp. 445–455, 1982.
-  N. Ducheneaut, R. J. Moore, L. Oehlberg, J. D. Thornton, and E. Nickell, “Social tv: Designing for distributed, sociable television viewing,” Intl. Journal of Human-Computer Interaction, vol. 24, no. 2, pp. 136–154, 2008.
-  M. Ko, S. Choi, J. Lee, U. Lee, and A. Segev, “Understanding mass interactions in online sports viewing: Chatting motives and usage patterns,” ACM Trans. Comput.-Hum. Interact., vol. 23, no. 1, pp. 6:1–6:27, Jan. 2016.
-  M. McGill, J. H. Williamson, and S. Brewster, “Examining the role of smart tvs and vr hmds in synchronous at-a-distance media consumption,” ACM Trans. Comput.-Hum. Interact., vol. 23, no. 5, pp. 33:1–33:57, Nov. 2016.
-  J. Steuer, “Defining virtual reality: Dimensions determining telepresence,” Journal of communication, vol. 42, no. 4, pp. 73–93, 1992.
-  M. V. Sanchez-Vives and M. Slater, “From presence to consciousness through virtual reality,” Nature Reviews Neuroscience, vol. 6, no. 4, pp. 332–339, 2005.
-  K. Christakopoulou, F. Radlinski, and K. Hofmann, “Towards conversational recommender systems,” in Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’16. New York, NY, USA: ACM, 2016, pp. 815–824.
-  L. Chen and P. Pu, “Critiquing-based recommenders: survey and emerging trends,” User Modeling and User-Adapted Interaction, vol. 22, no. 1, pp. 125–150, 2012.
-  G. Bresler, G. H. Chen, and D. Shah, “A latent source model for online collaborative filtering,” in Advances in Neural Information Processing Systems, 2014, pp. 3347–3355.
-  X. Zhao, W. Zhang, and J. Wang, “Interactive collaborative filtering,” in Proceedings of the 22nd ACM international conference on Conference on information & knowledge management. ACM, 2013, pp. 1411–1420.
-  J. Kawale, H. H. Bui, B. Kveton, L. Tran-Thanh, and S. Chawla, “Efficient thompson sampling for online matrix-factorization recommendation,” in Advances in Neural Information Processing Systems, 2015, pp. 1297–1305.
-  W. Sherchan, S. Nepal, and C. Paris, “A survey of trust in social networks,” ACM Computing Surveys (CSUR), vol. 45, no. 4, p. 47, 2013.
-  J. Masthoff, “Group recommender systems: Combining individual models,” in Recommender systems handbook. Springer, 2011, pp. 677–702.
-  K. McCarthy, M. Salamó, L. Coyle, L. McGinty, B. Smyth, and P. Nixon, “Group recommender systems: a critiquing based approach,” in Proceedings of the 11th international conference on Intelligent user interfaces. ACM, 2006, pp. 267–269.
-  C. Carlsson and O. Hagsand, “Dive a multi-user virtual reality system,” in Virtual Reality Annual International Symposium, 1993., 1993 IEEE. IEEE, 1993, pp. 394–400.
-  P. Curtis and D. A. Nichols, “Muds grow up: Social virtual reality in the real world,” in Compcon Spring’94, Digest of Papers. IEEE, 1994, pp. 193–200.
-  T. Igarashi and J. F. Hughes, “Voice as sound: using non-verbal voice input for interactive control,” in Proceedings of the 14th annual ACM symposium on User interface software and technology. ACM, 2001, pp. 155–156.
-  P. R. Cohen and S. L. Oviatt, “The role of voice input for human-machine communication,” Proceedings of the National Academy of Sciences, vol. 92, no. 22, pp. 9921–9927, 1995.
-  M. C. Salzman, C. Dede, R. B. Loftin, and J. Chen, “A model for understanding how virtual reality aids complex conceptual learning,” Presence: Teleoperators and Virtual Environments, vol. 8, no. 3, pp. 293–316, 1999.
-  V. Becker, “Interactive television experience in convergent environment: Models, reception and business,” in Proceedings of the ACM International Conference on Interactive Experiences for TV and Online Video. ACM, 2016, pp. 119–122.
-  G. Adomavicius and A. Tuzhilin, “Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions,” IEEE transactions on knowledge and data engineering, vol. 17, no. 6, pp. 734–749, 2005.
-  S. B. Roy, S. Thirumuruganathan, S. Amer-Yahia, G. Das, and C. Yu, “Exploiting group recommendation functions for flexible preferences,” in 2014 IEEE 30th International Conference on Data Engineering. IEEE, 2014, pp. 412–423.
-  H. Wu, Y. Wang, and X. Cheng, “Incremental probabilistic latent semantic analysis for automatic question recommendation,” in Proceedings of the 2008 ACM Conference on Recommender Systems, ser. RecSys ’08. New York, NY, USA: ACM, 2008, pp. 99–106.
-  D. H. Stern, R. Herbrich, and T. Graepel, “Matchbox: large scale online bayesian recommendations,” in Proceedings of the 18th international conference on World wide web. ACM, 2009, pp. 111–120.
-  D. Chen and C. Manning, “A fast and accurate dependency parser using neural networks,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Oct. 2014, pp. 740–750. [Online]. Available: http://www.aclweb.org/anthology/D14-1082
-  M. Zhu, Y. Zhang, W. Chen, M. Zhang, and J. Zhu, “Fast and accurate shift-reduce constituent parsing.” in ACL (1), 2013, pp. 434–443.
-  L. A. Ramshaw and M. P. Marcus, “Text chunking using transformation-based learning,” arXiv preprint cmp-lg/9505040, 1995.
-  C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, “The stanford corenlp natural language processing toolkit.” in ACL (System Demonstrations), 2014, pp. 55–60.
-  S. Bird, “Nltk: the natural language toolkit,” in Proceedings of the COLING/ACL on Interactive presentation sessions. Association for Computational Linguistics, 2006, pp. 69–72.
-  S. Loria, “TextBlob: Simplified text processing,” https://textblob.readthedocs.io/en/dev/, 2013.
-  B. Liu, M. Hu, and J. Cheng, “Opinion observer: analyzing and comparing opinions on the web,” in Proceedings of the 14th International conference on World Wide Web. ACM, 2005, pp. 342–351.
-  Q. Liu, Z. Gao, B. Liu, and Y. Zhang, “Automated rule selection for aspect extraction in opinion mining,” in Proceedings of the 24th International Conference on Artificial Intelligence, ser. IJCAI’15, 2015, pp. 1291–1297.
-  T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Advances in Neural Information Processing Systems, 2013.
-  A. Pesarin, M. Cristani, V. Murino, and A. Vinciarelli, “Conversation analysis at work: detection of conflict in competitive discussions through semi-automatic turn-organization analysis,” Cognitive processing, vol. 13, no. 2, pp. 533–540, 2012.
-  O. Vinyals and G. Friedland, “Towards semantic analysis of conversations: A system for the live identification of speakers in meetings,” in Semantic Computing, 2008 IEEE International Conference on. IEEE, 2008, pp. 426–431.
-  N. Jovanović et al., “Towards automatic addressee identification in multi-party dialogues.” Association for Computational Linguistics, 2004.
-  D. Wyatt, T. Choudhury, J. A. Bilmes, and H. A. Kautz, “A privacy-sensitive approach to modeling multi-person conversations.” in IJCAI, vol. 7, 2007, pp. 1769–1775.
-  F. M. Harper and J. A. Konstan, “The movielens datasets: History and context,” ACM Trans. Interact. Intell. Syst., vol. 5, no. 4, pp. 19:1–19:19, Dec. 2015.
-  T. Hofmann, “Unsupervised learning by probabilistic latent semantic analysis,” Machine Learning, vol. 42, no. 1-2, pp. 177–196, 2001.
-  B. Mehta, T. Hofmann, and W. Nejdl, “Robust collaborative filtering,” in Proceedings of the 2007 ACM conference on Recommender systems. ACM, 2007, pp. 49–56.
-  J. Schalkwyk, D. Beeferman, F. Beaufays, B. Byrne, C. Chelba, M. Cohen, M. Kamvar, and B. Strope, ““your word is my command”: Google search by voice: A case study,” in Advances in Speech Recognition. Springer, 2010, pp. 61–90.
-  B. P. Knijnenburg, M. C. Willemsen, Z. Gantner, H. Soncu, and C. Newell, “Explaining the user experience of recommender systems,” User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, Oct. 2012.
-  P. Pu, L. Chen, and R. Hu, “A user-centric evaluation framework for recommender systems,” in Proceedings of the Fifth ACM Conference on Recommender Systems, ser. RecSys ’11. New York, NY, USA: ACM, 2011, pp. 157–164.