Appendix of "Discovering discussion topics about development of cross-platform mobile applications using a cross-compiler development framework"
A cross-platform mobile application is an application that runs on multiple mobile platforms (e.g., Android and iOS). One strategy for developing this kind of application involves developing, using platform-specific toolkits, a native application for each chosen platform. Several frameworks have been proposed to simplify the development of cross-platform mobile applications and, therefore, to reduce development and maintenance costs. Among them, cross-compiler mobile development frameworks transform the application's code, written in an intermediate (aka non-native) language, into native code for each platform. However, to the best of our knowledge, there is little research on the advantages and disadvantages of using cross-compiler frameworks during the development and maintenance phases of mobile applications. This paper aims to contribute one of the first bricks in that research direction. We study what mobile developers who use cross-compiler frameworks ask about when they develop and maintain cross-platform mobile applications. In particular, we focus on one framework: Xamarin from Microsoft. For that, we first created two datasets of questions and answers (QA) related to the development of mobile applications using Xamarin by mining two QA sites: Xamarin Forum and Stack Overflow. We analyzed and compared the number of questions, views, and accepted answers. Then, we applied LDA on Xamarin-related questions to discover the main topics asked about by developers who use Xamarin. Finally, we compared the discovered topics with topics about general mobile development. Our findings show that Xamarin Forum has a larger number of questions than Stack Overflow; however, the latter has more answers per question. Moreover, both sites share most of the main topics, which mainly concern user interface (UI), formatting, design, and navigation.
Nowadays, there are billions of smartphones around the world. Smartphones are mobile devices that run software applications (apps) such as games, social networking, and banking apps. A native mobile application is an app built to run on a particular mobile platform. Currently, two platforms dominate the smartphone market: Android (from Google) and iOS (from Apple), with 99.7% of the market share as of the first quarter of 2017 mobileshare .
A cross-platform mobile application is an application that targets more than one mobile platform. To reach a large number of users and, thus, to increase their market impact and revenues, companies and developers aim to release their mobile apps on both the Android and iOS platforms. A traditional approach for developing this kind of app is to build, for each platform, a native application using a particular programming language (e.g., Java for Android, Objective-C or Swift for iOS), SDK (Software Development Kit), and IDE (e.g., Android Studio for Android, Xcode for iOS). Unfortunately, this approach increases the cost of development and maintenance. For example, a company needs developers with different competencies to develop an app for two platforms, resulting in two native apps. Moreover, as shown by previous works, those native apps may differ in quality. For example, Hu et al. hu2016crossconsistency found that 68% of the studied cross-platform apps have different star ratings on the App Store and Google Play. Furthermore, Ali et al. Ali2017SAD analyzed 80,000 cross-platform apps from those app stores and found that the Android versions of app pairs receive higher user-perceived ratings than the iOS versions.
In recent years, researchers (e.g., perchat2014common ) and companies (e.g., Microsoft, Facebook Inc.) have focused on proposing development frameworks that make the development and maintenance of cross-platform mobile apps easier. The earliest frameworks focused on producing hybrid mobile applications: apps built by combining non-native code (e.g., HTML for Phonegap/Cordova, https://cordova.apache.org) and native code. The non-native code is shared across all the platforms' implementations, whereas the native code is written for a particular platform. Nonetheless, despite the good results of some of these frameworks for developing simple apps (Joorabchi2013Challenges ; heitkotter2012evaluating ), companies such as Facebook found that the resulting applications do not offer the same user experience as purely native applications Martinez:2017:TQI .
However, beyond the advantages of sharing code, we wonder whether developing cross-platform apps with cross-compiler frameworks has any drawbacks compared to native development during the life cycle of a mobile application. To our knowledge, as also mentioned by Nagappan2016:TrendsMobile , no previous work has studied this yet. Our long-term goal is to study the differences between the development process that uses cross-compiler frameworks and the process that uses traditional development toolkits.
To encourage the study of cross-platform mobile applications, this paper presents two datasets of questions and answers (Q&A) related to one cross-platform development framework: Xamarin from Microsoft.
The two datasets consist of questions and answers extracted from Stack Overflow and Xamarin Forum, respectively. The latter is a Q&A site exclusively dedicated to Xamarin technology (https://forums.xamarin.com/). The main reasons that motivate us to study Xamarin are: a) the availability of documentation and guidelines (https://developer.xamarin.com/guides/); b) at least five books published in recent years (e.g., hermes2015xamarin ; snider2016mastering ; peppers2015xamarin ); c) the availability of development toolkits (https://docs.microsoft.com/visualstudio/); d) the availability of a testing environment for cross-platform apps called Xamarin Test Cloud (https://www.xamarin.com/test-cloud); and e) Xamarin is one of the top 10 most popular development frameworks and libraries according to the Stack Overflow Survey 2018 (https://insights.stackoverflow.com/survey/2018/#technology-frameworks-libraries-and-tools).
Previous works have analyzed mobile-related Q&A from Stack Overflow for different purposes, for instance, to discover the main topics related to mobile development (Linares-Vasquez2013EAM ; Rosen2016MDA ; Beyer2014 ). To show the utility of the two Xamarin-related Q&A datasets, in this paper we replicate one of those studies, i.e., Rosen and Shihab Rosen2016MDA , focusing exclusively on Xamarin technology instead of on general mobile development. Our study discovers the main discussion topics present in Xamarin-related questions by applying Latent Dirichlet Allocation (LDA). Finally, to learn about the particularities of cross-platform apps developed using Xamarin, we compare the topics we discovered with the general mobile development topics discovered by Rosen and Shihab.
Our research is guided by the following research questions:
RQ 1: What is the number of a) Xamarin-related questions, b) answers and accepted answers, and c) views on Stack Overflow and Xamarin Forum?
RQ 2: What are the main topics discussed about Xamarin in Q&A sites?
RQ 3: How many of the main topics from Stack Overflow and Xamarin Forum are also among the general mobile development topics discovered by Rosen and Shihab?
RQ 4: What are the most relevant questions from Xamarin-related topics?
The contributions of this paper are:
A dataset of Q&A related to Xamarin technology filtered from Stack Overflow.
A dataset of Q&A extracted from Xamarin forum.
An analysis of the two datasets (e.g., number of questions, answers, views).
Discussion topics discovered from questions related to Xamarin technology.
A study of the relation between the topics from Stack Overflow and Xamarin Forum and the topics discussed in questions related to the development of native mobile applications from Rosen2016MDA .
A study of the relevant questions from 3 Xamarin-related topics.
The paper is organized as follows. Section 2 discusses the related work. Section 3 presents two datasets of Q&As extracted from Stack Overflow and Xamarin Forum. Section 4 analyzes both datasets in terms of the number of questions, answers, accepted answers, and views. Section 5 replicates the study of Rosen and Shihab Rosen2016MDA to discover topics from Stack Overflow and Xamarin Forum. Section 6 presents the discussion. Section 7 concludes the paper.
All data discussed in this paper, including the two Q&A datasets, is publicly available in our web appendix: http://anonymous.4open.science/repository/3ac646ee-03f9-40d0-a7fc-d0a6124af979/
In recent years, several works have studied Q&A from Stack Overflow MSRChallenge2013 ; Bazelli2013PTS ; Pinto2014MQS ; Pinto2015SMP . Some of them focused on questions about the Android platform Linares-Vasquez:2014:ACT ; Wang:2013:DAU ; Stevens:2013:APU ; Beyer2014 ; Beyer2016GAT ; Abdalkareem2017Reuse ; Treude2017CF ; Syer2015 . For example, Linares-Vasquez et al. Linares-Vasquez:2014:ACT studied questions and activity on Stack Overflow when changes to Android APIs occur, finding that deleting public methods from APIs triggers questions that are discussed more. Wang et al. Wang:2013:DAU analyzed Stack Overflow posts related to iOS and Android APIs to find API usage obstacles, and then applied a topic modeling technique to discover several recurring scenarios in which API usage obstacles occur. Stevens et al. Stevens:2013:APU studied questions about Android permission use on Stack Overflow. Other works have focused on mobile-related tags from Stack Overflow: Beyer et al. Beyer2014 investigated 450 Android-related posts from Stack Overflow to gain insights into the issues of Android app development. To our knowledge, no work has studied Stack Overflow questions and problems related to either cross-compiler frameworks or Xamarin technology.
In this paper, we analyze Q&A related to Xamarin technology, and then we apply a topic modeling technique called Latent Dirichlet Allocation (LDA) to obtain the main topics from those questions. As reported by Chen et al. Chen2016survey and Sun et al. Sun2016Survey , several works have already applied topic modeling techniques such as LDA. For instance, Barua et al. Barua2014DTA analyzed Stack Overflow data to automatically discover the main topics present in developer discussions. Yang et al. Yang2016 focused on discovering security-related topics by analyzing security-related posts. Bajaj et al. Bajaj:2014 discovered topics from web developers' discussions and focused on how prevalent web-related topics are in discussions related to mobile web development. Other works have focused on topic modeling for native mobile technologies. For example, Linares-Vasquez et al. Linares-Vasquez2013EAM used LDA to extract hot topics from questions related to mobile development. Their findings suggest that most of the questions include topics related to general questions and compatibility issues, whereas the most specific topics are present in a reduced set of questions. Rosen and Shihab Rosen2016MDA applied LDA on mobile-related Stack Overflow posts to determine what mobile developers are asking. Across all native platforms studied, they found that questions related to app distribution, user interface, and input are among the most popular. Pinto et al. Pinto2016Swift analyzed questions about Swift, the programming language (successor of Objective-C) for building native iOS mobile apps. They applied LDA to find common problems faced by Swift developers, finding that the language is easy to understand and adopt, but that there are many questions about problems in the toolkit (IDE, SDK). In our work, we focus on topics related to the Xamarin development framework.
Other works have studied both Stack Overflow and other sources of information. For example, Wang et al. Wang2017Linking studied the mutual knowledge sharing between the Android Issue Tracker and Stack Overflow. Their goal is to bridge the two communities by automatically linking related issues to posts. Ye et al. Ye2017DK studied how users share URLs on Stack Overflow to understand how the knowledge diffusion process takes place on that site. They found that 31% of the shared URLs on Stack Overflow reference information that can help to solve a complex problem. Other works focus on analyzing developer forums. For example, Venkatesh et al. Venkatesh2016 mined both developer forums and Stack Overflow to find the common challenges encountered by developers when using Web APIs. Unlike our work, they put all posts (from both the Web API developer forums and Stack Overflow) into a single dataset, which is later used for extracting topics using LDA. Lee et al. Lee2017 studied the similarity in developer interests within and across GitHub and Stack Overflow. They found that 39% of the GitHub repositories and Stack Overflow questions that a developer had participated in fall within common interests. Zagalsky et al. Zagalsky2016:RCurates focused on the R language by analyzing questions and answers from two channels: the R tag on Stack Overflow and the R-users mailing list. They found that knowledge is constructed differently in each channel: on Stack Overflow, participants contribute knowledge independently of each other, whereas on the R-users mailing list they are more likely to build on other answers. In this work, we create two datasets of Q&A related to Xamarin technology that could be used to replicate some of these works (e.g., Zagalsky2016:RCurates ).
Other works focus on classifying, comparing, and evaluating cross-platform mobile application development tools for building hybrid and native mobile apps (heitkotter2012evaluating ; francese2013supporting ; dalmasso2013survey ; palmieri2012comparison ; desruelle2012challenges ). For example, Ciman et al. Ciman2016 analyzed the energy consumption of cross-platform mobile development: their results showed that adopting cross-platform frameworks as development tools always implies an increase in energy consumption, even if the final application is a real native application. That work did not evaluate the Xamarin framework. To the best of our knowledge, no work has studied cross-platform mobile applications created with the Xamarin framework. In this work, we study questions from Q&A sites to understand the main problems faced by developers when they develop cross-platform mobile apps using the Xamarin framework.
In this section, we present two datasets with questions and answers (Q&A) related to Xamarin technology. In Section 3.1, we present the first dataset, which is composed of posts extracted from Stack Overflow (https://stackoverflow.com/). In Section 3.2, we present the second dataset, named XamForumDB, which is composed of posts extracted from the official Xamarin forum (https://forums.xamarin.com/). Then, in Section 4, we briefly describe the data from both datasets, and in Section 5.2 we use both datasets to extract the main discussion topics related to Xamarin technology.
We used the Stack Overflow data dump provided by Stack Exchange, Inc. (https://archive.org/details/stackexchange). The data is organized in 8 XML files, each representing one data entity: Posts, Users, Comments, Tags, Votes, PostLinks, PostHistory, and Badges (https://ia800500.us.archive.org/22/items/stackexchange/readme.txt). To facilitate the manipulation of the data, we migrated it to a relational database (https://gist.github.com/gousiosg/7600626).
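As an illustration of this migration step, the following minimal Python sketch streams the `<row>` elements of the dump's Posts.xml file into an SQLite table. It is not the MySQL import script referenced above: SQLite, the `posts` table, its columns, and the `load_posts` helper are assumptions chosen to keep the example self-contained, and only the attributes needed for later filtering (Id, PostTypeId, ParentId, Title, Tags, ViewCount) are kept.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical minimal table: only the Posts attributes needed later for
# filtering by tags and titles. Column names are illustrative.
SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    id           INTEGER PRIMARY KEY,
    post_type_id INTEGER,  -- 1 = question, 2 = answer
    parent_id    INTEGER,  -- question id for answers, NULL for questions
    title        TEXT,     -- questions only
    tags         TEXT,     -- questions only, e.g. '<c#><xamarin.android>'
    view_count   INTEGER
);
"""
INSERT = "INSERT INTO posts VALUES (?, ?, ?, ?, ?, ?)"


def _int_or_none(value):
    return int(value) if value is not None else None


def load_posts(xml_source, conn, batch_size=10_000):
    """Stream the <row> elements of Posts.xml into SQLite; return the row count."""
    conn.executescript(SCHEMA)
    batch, count = [], 0
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "row":
            continue
        a = elem.attrib
        batch.append((int(a["Id"]), int(a["PostTypeId"]),
                      _int_or_none(a.get("ParentId")), a.get("Title"),
                      a.get("Tags"), _int_or_none(a.get("ViewCount"))))
        elem.clear()  # free the element: the real file is far too large for memory
        if len(batch) >= batch_size:
            conn.executemany(INSERT, batch)
            count += len(batch)
            batch.clear()
    conn.executemany(INSERT, batch)  # remaining rows (may be empty)
    conn.commit()
    return count + len(batch)


if __name__ == "__main__":
    sample = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Title="Xamarin Android Save sms"
       Tags="&lt;c#&gt;&lt;android&gt;&lt;datetime&gt;" ViewCount="42" />
  <row Id="2" PostTypeId="2" ParentId="1" />
</posts>"""
    conn = sqlite3.connect(":memory:")
    print(load_posts(io.BytesIO(sample), conn))  # 2
```

Streaming with `iterparse` and clearing each element keeps memory usage flat, which matters because Posts.xml cannot reasonably be loaded as a single tree.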
A post from the Stack Overflow data dump corresponds to a question or an answer made by a user. A question can have up to five tags. A post can also have votes, each representing a mark such as ‘favorite’, ‘spam’, ‘inform moderator’, or ‘offensive’. The Stack Overflow data dump contains 14,458,875 questions and 22,668,556 answers (data dump released on August 31, 2017).
The first challenge is to filter posts related to Xamarin technology among all posts present in the Stack Overflow data dump. For that, we apply two techniques. The first one filters posts according to the tags associated to each post. The second technique matches keywords from the posts’ titles. Let us detail each strategy.
We used the technique proposed by Rosen and Shihab Rosen2016MDA , who filtered posts related to mobile technologies. Their technique consists of three steps. The first step filters posts using an initial set of tags (in that work, the authors used mobile-related tags such as ‘Android’ and ‘iOS’). The second step identifies the most representative and significant tags among the posts retrieved in the first step. Finally, the last step filters posts from Stack Overflow using the most representative tags for the mobile technology (i.e., the tags discovered in the previous step).
Table 1: The 10 tags with the highest number of posts (columns: tag, all posts, Xamarin posts, TRT, TST).
We applied the technique as follows. In the first step, we defined the initial set of tags: as we aim to filter Xamarin-related posts, our initial set comprised all tags matching the pattern “%xamarin%”, where % matches zero or more characters. In total, 28 tags include the word ‘xamarin’ (e.g., ‘xamarin.ios’) and form the initial set of tags. We then retrieved 39,855 questions tagged with at least one tag from the initial set. Those 39,855 questions have, in total, 4,143 distinct tags. Note that the retrieved posts may also carry tags that are not in the initial set but are related to Xamarin technology, such as ‘monodevelop’. To obtain such tags, the technique by Rosen and Shihab defines two measures, TRT (tag relevance threshold) and TST (tag significance threshold) Rosen2016MDA , which help us to: a) obtain the tags representative of Xamarin technology, and b) discard the tags that are related to Xamarin technology but are, at the same time, too general. For instance, a significant portion of Xamarin posts are tagged with ‘Android’. However, most of the questions tagged with ‘Android’ are not related to Xamarin technology.
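The first step can be sketched as follows, assuming the questions are available as a mapping from question id to its list of tags (the dump stores the tags of a question as one string such as ‘<c#><xamarin.ios>’). The helper names are hypothetical; the case-insensitive substring test mirrors the SQL pattern “%xamarin%” used above.

```python
import re

TAG_RE = re.compile(r"<([^>]+)>")


def parse_tags(tag_field):
    """Split a dump-style tag string such as '<c#><xamarin.ios>' into a list."""
    return TAG_RE.findall(tag_field or "")


def initial_tag_set(all_tags, keyword="xamarin"):
    """Python counterpart of the SQL pattern LIKE '%xamarin%' (case-insensitive)."""
    return {t for t in all_tags if keyword in t.lower()}


def questions_with_initial_tags(questions, initial):
    """questions: dict question_id -> list of tags. Keep ids with >= 1 initial tag."""
    return {qid for qid, tags in questions.items() if initial.intersection(tags)}
```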
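Given a tag $t$, let $P_t$ be the set of posts tagged with $t$, and let $X$ be the set of Xamarin posts, i.e., posts tagged with at least one tag from the initial set. TRT($t$) is the ratio between the number of posts tagged with $t$ that also contain at least one tag from the initial set and the total number of posts tagged with $t$. TST($t$) is the ratio between the number of Xamarin posts tagged with $t$ and the number of Xamarin posts tagged with the most popular Xamarin tag:
$$\mathrm{TRT}(t) = \frac{|P_t \cap X|}{|P_t|}, \qquad \mathrm{TST}(t) = \frac{|P_t \cap X|}{\max_{t'} |P_{t'} \cap X|}$$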
In the second step, we kept the tags with a TRT of at least 25% and a TST of at least 0.1%. In total, 38 tags form the final set of tags related to Xamarin technology. Table 1 shows the 10 tags with the highest number of posts and, for each tag, its number of occurrences in a) all posts and b) Xamarin posts, as well as its TRT and TST values. For example, the final tag set includes the tag ‘monotouch.dialog’, which has a TRT of 92.84%: almost all posts tagged with it are also tagged with a tag from the initial set, such as ‘xamarin’. Moreover, those posts represent 2.34% (i.e., the TST value) of all posts tagged with at least one initial tag. Conversely, the tag ‘Android’ is not included in the final tag set because its TRT is lower than the threshold we set: only 1.07% of the posts tagged with ‘Android’ are also tagged with a Xamarin tag.
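A sketch of the second step under the TRT and TST definitions given above. `tag_metrics` and `final_tag_set` are hypothetical helpers; the input is a mapping from question id to tags over the whole dump, TST is normalized by the most popular tag among Xamarin posts as in that definition, and the thresholds default to 25% (TRT) and 0.1% (TST).

```python
from collections import Counter


def tag_metrics(questions, initial):
    """Return {tag: (TRT, TST)} for every tag that co-occurs with an initial tag.

    questions: dict question_id -> list of tags, over the whole dump.
    initial:   set of initial tags (e.g., every tag containing 'xamarin').
    """
    total = Counter()    # |P_t|: questions tagged with t
    xamarin = Counter()  # |P_t ∩ X|: those that also carry an initial tag
    for tags in questions.values():
        tagset = set(tags)
        in_x = not tagset.isdisjoint(initial)
        for t in tagset:
            total[t] += 1
            if in_x:
                xamarin[t] += 1
    most_popular = max(xamarin.values())  # count of the most popular Xamarin tag
    return {t: (n / total[t], n / most_popular) for t, n in xamarin.items()}


def final_tag_set(metrics, initial, trt_min=0.25, tst_min=0.001):
    """Initial tags plus every tag passing both thresholds (TRT >= 25%, TST >= 0.1%)."""
    selected = {t for t, (trt, tst) in metrics.items()
                if trt >= trt_min and tst >= tst_min}
    return set(initial) | selected
```

A very general tag such as ‘Android’ fails the TRT test in this sketch for the reason given above: only a small fraction of its posts carry an initial tag, however many Xamarin posts it appears on.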
Finally, in the third step of the technique, we retrieved 43,988 posts related to Xamarin technology: 39,855 tagged with at least one initial tag (i.e., a tag containing the keyword ‘xamarin’) and 4,133 tagged with at least one of the remaining tags of the final set.
The second technique for retrieving Xamarin-related posts filters posts by matching keywords in the post's title. This strategy aims to retrieve the posts that are not tagged with any tag from the final set of tags. We define a keyword for each tag included in the final set of tags presented before.
Using this keyword-based strategy, we retrieved 446 additional posts related to Xamarin technology. For instance, one of them is the post "Xamarin Android Save sms" (post id 29405420, https://stackoverflow.com/questions/29405420/xamarin-android-save-sms), which was tagged with 3 tags, "C#", "android", and "datetime", none of which is included in our final set of tags.
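The keyword-based strategy can be sketched as follows. How each keyword is derived from its tag is not specified here, so the sketch simply takes the keyword list as input; `keyword_matches` is a hypothetical helper, and whole-word, case-insensitive matching is an assumption.

```python
import re


def keyword_matches(titles, keywords, already_found):
    """Ids of questions whose title contains a keyword and that the tag-based
    strategy did not retrieve. Matching is case-insensitive and whole-word."""
    # Look-arounds instead of \b so keywords with punctuation ('xamarin.ios') work.
    patterns = [re.compile(r"(?<!\w)" + re.escape(k) + r"(?!\w)", re.IGNORECASE)
                for k in keywords]
    return {qid for qid, title in titles.items()
            if qid not in already_found and any(p.search(title) for p in patterns)}
```

The final Stack Overflow dataset is then the union of the tag-based and keyword-based results, which is why the two counts add up (43,988 + 446 = 44,434).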
Using the two described techniques, we retrieved 44,434 questions from the Stack Overflow dump: 43,988 were found using the tag-based strategy, whereas 446 were found using the keyword-based strategy.
The Xamarin Forum is an online web platform where mobile developers post questions or start discussions about the Xamarin framework and its ecosystem. Moreover, it is a communication channel between the developers of the framework (a.k.a. the Xamarin Team) and its users (i.e., the developers of mobile apps that use Xamarin). For instance, new versions of the framework or of particular components are announced in the forum (e.g., https://forums.xamarin.com/discussion/85747/xamarin-forms-feature-roadmap#latest). All posts are publicly available. However, to create a new post, users must register on the site, which is free.
The Xamarin Forum site is composed of seven general forums: Community, General, Pre-release & Betas, Tools and Libraries, Graphics & Games, Xamarin Platform, and Xamarin Products. For instance, the Xamarin Platform forum contains questions related to the development of an application for different target platforms, while the Xamarin Products forum focuses on discussions about products related to Xamarin technology, such as TestCloud, a cloud-based platform for testing mobile apps built using Xamarin. Each of those forums has one or more sub-forums. For instance, the Xamarin Platform forum has five sub-forums: Android, iOS, Cross-Platform, Mac, and Xamarin.Forms. The forums are created by the forum administrators, which means that users cannot define new ones.
We identify two types of forums: 1) those related to technological topics (platforms, libraries, tools, IDEs, etc.) and 2) those related to non-technological topics related to Xamarin (events, conferences, jobs, etc.).
The main page of each forum shows a paged list of posts and two buttons, one for creating a question and the other for creating a discussion. Each post in the list shows the post's title, author, number of views, number of answers, and zero or more labels. Those labels are colored boxes located near the title that indicate, for instance, whether a question was answered or whether the answer was accepted by the user who wrote the question.
A user can create a post in a given category. Although there are two types of posts, i.e., questions and discussions, their creation forms are similar: both have the same fields. The user can also select a list of tags associated with the new post. However, once a post is created, the forum does not show the list of tags associated with it (last visit: June 2018).
A registered user can answer an existing question or ‘like’ existing answers. Moreover, as in other Q&A sites such as Stack Overflow, the post's author can accept answers; on Xamarin Forum, the author can accept one or more of them. The accepted answers are labeled with a green box reading “Accepted answer”.
The Xamarin Forum provides forms for registering new users, who have the right to create new posts and to write comments. Users who belong to the organization that develops the Xamarin framework also participate in the forum.
The Xamarin Forum site does not have an API to programmatically access and retrieve its data, such as those provided by Stack Overflow (https://api.stackexchange.com/) and GitHub (https://developer.github.com/v3/). For this reason, we developed a web crawler to retrieve and store all pages from Xamarin Forum. Our engine has two main phases: 1) page fetching and 2) page parsing.
The first phase fetches (i.e., downloads) all HTML pages from the forum website https://forums.xamarin.com/ over HTTP. The Xamarin Forum has two kinds of web pages. A page of the first kind corresponds to a single post. It contains the post's title, the question (or discussion topic), the author's data (name, location, roles), the posting date, and a list of comments. Each comment has the name of the user who wrote it, the date, and one or more labels such as “Answered question”. The second kind of page corresponds to the main page of each forum. A “main” page shows, in a paged list, all posts made in that forum (on those main pages, posts are called ‘Threads’), ordered by decreasing creation date. For each post, the list shows its title, the number of views (i.e., the number of visits the post has received), the number of comments, and zero or more labels that indicate whether the post is a question or an announcement, whether it has an accepted answer, etc.
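The fetching phase can be sketched with Python's standard library as below. The URL scheme ‘/categories/<forum>/p<N>’ and the ‘ItemDiscussion’ marker used to detect a page without threads are hypothetical placeholders, not the forum's documented structure, and `crawl_main_pages` is an illustrative name. The page downloader is passed in as a parameter so that it can be replaced, for example by one that waits between requests.

```python
import urllib.request


def fetch(url, timeout=30):
    """Download one HTML page over HTTP(S) and return it as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_main_pages(base_url, forum, fetch_page=fetch, max_pages=100_000):
    """Fetch the paged thread list of one forum until a page lists no threads.

    Both the '/categories/<forum>/p<N>' URL scheme and the 'ItemDiscussion'
    marker are hypothetical placeholders for the real site structure.
    """
    pages = []
    for n in range(1, max_pages + 1):
        html = fetch_page(f"{base_url}/categories/{forum}/p{n}")
        if "ItemDiscussion" not in html:  # no threads listed: past the last page
            break
        pages.append(html)
    return pages
```

The thread URLs listed on the main pages would then be downloaded with the same `fetch` function to obtain the single-post pages.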
In the second phase, our engine extracts and formats the data of each fetched page and stores it in a relational database. Our engine is implemented in Python and uses the BeautifulSoup library (https://www.crummy.com/software/BeautifulSoup/bs4/doc/) to parse the HTML pages of Xamarin Forum. The structured data is then stored in a MySQL database.
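The parsing of a forum's main page might look like the sketch below. It uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without third-party packages; the markup it expects (an `li` element with class ‘ItemDiscussion’, a link with class ‘Title’, and `span` elements for views, comments, and labels) is hypothetical and would have to be adapted to the real pages.

```python
from html.parser import HTMLParser


class ThreadListParser(HTMLParser):
    """Collect one record per thread from a forum 'main' page.

    The class names (ItemDiscussion, Title, ViewCount, CommentCount, Tag) are
    hypothetical placeholders; the real markup must be inspected first.
    """

    def __init__(self):
        super().__init__()
        self.threads = []
        self._field = None  # field that the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "li" and "ItemDiscussion" in classes:
            self.threads.append({"title": "", "url": None, "views": 0,
                                 "comments": 0, "labels": []})
        elif not self.threads:
            return  # markup before the first thread is ignored
        elif tag == "a" and "Title" in classes:
            self.threads[-1]["url"] = attrs.get("href")
            self._field = "title"
        elif tag == "span" and "ViewCount" in classes:
            self._field = "views"
        elif tag == "span" and "CommentCount" in classes:
            self._field = "comments"
        elif tag == "span" and "Tag" in classes:
            self._field = "labels"

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if self._field is None or not text:
            return
        thread = self.threads[-1]
        if self._field == "title":
            thread["title"] = f"{thread['title']} {text}".strip()
        elif self._field == "labels":
            thread["labels"].append(text)
        else:  # numeric counters such as '1,204'
            thread[self._field] = int(text.replace(",", ""))


def parse_thread_list(html):
    """Return a list of thread records (title, url, views, comments, labels)."""
    parser = ThreadListParser()
    parser.feed(html)
    parser.close()
    return parser.threads
```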
We store all data extracted from the Xamarin forum in a relational database. We now briefly describe the main entities of the database schema. The first entity is Post, which stores all posts written in the Xamarin forums. Each post belongs to one Forum and has zero or more Comments. Note that the Post entity stores both questions and discussions: we decided to model both in the same entity because they share most of their properties. Both the Comment and Post entities have a many-to-one relation with the entity User, which stores information about the users who write or answer a post. A User can have different Roles. A registered user has the role ‘Member’ by default, but there are other roles exclusively assigned to users belonging to the Xamarin organization, such as ‘Forum Administrator’, or ‘XamUProfessor’, assigned to professionals involved in the Xamarin training program called Xamarin University (https://www.xamarin.com/university).
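The entities described above can be rendered as a relational schema. The following sketch uses SQLite rather than MySQL so that it is self-contained; all table and column names (including the `technological` flag on forums and the `accepted` flag on comments) are illustrative assumptions, not the authors' actual schema.

```python
import sqlite3

# Hypothetical SQLite rendering of the XamForumDB entities; table and column
# names are illustrative, not the authors' MySQL schema.
SCHEMA = """
CREATE TABLE forums (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    technological INTEGER NOT NULL          -- 1 = technological, 0 = events, jobs, ...
);
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    location TEXT
);
CREATE TABLE roles (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE               -- 'Member', 'Forum Administrator', ...
);
CREATE TABLE user_roles (                   -- a user can hold several roles
    user_id INTEGER NOT NULL REFERENCES users(id),
    role_id INTEGER NOT NULL REFERENCES roles(id),
    PRIMARY KEY (user_id, role_id)
);
CREATE TABLE posts (                        -- questions and discussions together
    id         INTEGER PRIMARY KEY,
    forum_id   INTEGER NOT NULL REFERENCES forums(id),
    user_id    INTEGER NOT NULL REFERENCES users(id),   -- many-to-one
    kind       TEXT NOT NULL CHECK (kind IN ('question', 'discussion')),
    title      TEXT NOT NULL,
    body       TEXT,
    created_at TEXT,
    views      INTEGER DEFAULT 0
);
CREATE TABLE comments (                     -- a post has zero or more comments
    id         INTEGER PRIMARY KEY,
    post_id    INTEGER NOT NULL REFERENCES posts(id),
    user_id    INTEGER NOT NULL REFERENCES users(id),   -- many-to-one
    body       TEXT,
    created_at TEXT,
    accepted   INTEGER NOT NULL DEFAULT 0   -- 1 if labeled 'Accepted answer'
);
"""


def create_schema(conn):
    """Enable foreign-key checks and create all tables."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
```

Keeping questions and discussions in one `posts` table with a `kind` column reflects the modeling choice described above, since both share most of their properties.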
In our opinion, both datasets can be used by the research community to understand the main concerns about developing cross-platform mobile applications using the Xamarin framework. In this paper, we use them to discover the main discussion topics from their questions.
In this section, we describe and compare the two datasets with questions and answers related to Xamarin technology presented in Section 3, with the goal of answering the first research question: What is the number of a) Xamarin-related questions, b) answers and accepted answers, and c) views on Stack Overflow and Xamarin Forum?
The XamForumDB has 91,838 questions (in this paper, we use the terms ‘question’, ‘post’, and ‘thread’ interchangeably). The first dates from January 29, 2013, and the last one from September 6, 2017. There are 85,908 (93.5%) questions in technological forums and 5,930 (6.5%) in non-technological forums, such as Events, with 1,079 questions (1.17%), and Job Listings, with 465 (0.51%). In this work, we are interested in the technological questions, so we discard the non-technological ones from further analyses.
On Xamarin Forum, 72.3% (62,114 out of 85,908) of the technological questions have at least one answer. On average, each question has 2.99 answers. The remaining 27.7% (23,794) have not been answered. Moreover, 15.2% (13,025 out of 85,908) of the questions have at least one accepted answer. The proportion of answered questions is similar on Stack Overflow and Xamarin Forum: 79% vs. 72.3%. However, the proportion of questions with an accepted answer on Stack Overflow is three times that of Xamarin Forum: 46.1% vs. 15.2%.
On Stack Overflow, 79% (35,100 out of 44,434) of the questions have at least one answer. On average, each question has 1.13 answers. Of the Xamarin questions, 46.1% (20,495 out of 44,434) have one accepted answer, whereas 32.9% (14,605) have at least one answer but none of them was accepted. The remaining 21% (9,334) have not been answered.
Xamarin-related questions from Stack Overflow were viewed 36,523,008 times, i.e., 822.0 views per question on average, whereas those from Xamarin Forum were viewed 35,032,568 times, i.e., 407.8 views per question on average. In conclusion, questions from Xamarin Forum have received almost the same total number of views as Xamarin-related questions from Stack Overflow (>35 million). However, Stack Overflow questions are viewed, on average, twice as often as those from Xamarin Forum.
Response to RQ 1: What is the number of a) Xamarin-related questions, b) answers and accepted answers, and c) views on Stack Overflow and Xamarin Forum? a) Xamarin Forum has 85,908 (technological) questions related to Xamarin, and Stack Overflow has 44,434. b) The proportion of answered questions is similar on Stack Overflow and Xamarin Forum (79% vs. 72.3%); however, the proportion of questions with an accepted answer on Stack Overflow is three times as large (46.1% vs. 15.2%). c) Questions on both sites have received almost the same total number of views (>35 million); however, those from Stack Overflow are viewed, on average, twice as often.
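As a quick arithmetic check, the percentages and per-question view averages reported in this section can be recomputed from the raw counts. The `pct` and `summarize` helpers are illustrative.

```python
def pct(part, total):
    """Percentage of `total` represented by `part`, rounded to one decimal."""
    return round(100 * part / total, 1)


# Raw counts reported in this section (Xamarin Forum: technological questions only).
XF = {"questions": 85_908, "answered": 62_114, "accepted": 13_025, "views": 35_032_568}
SO = {"questions": 44_434, "answered": 35_100, "accepted": 20_495, "views": 36_523_008}


def summarize(site):
    """Answered, unanswered, and accepted shares (%), plus mean views per question."""
    q = site["questions"]
    return {
        "answered_pct": pct(site["answered"], q),
        "unanswered_pct": pct(q - site["answered"], q),
        "accepted_pct": pct(site["accepted"], q),
        "views_per_question": round(site["views"] / q, 1),
    }


if __name__ == "__main__":
    print(summarize(XF))
    # {'answered_pct': 72.3, 'unanswered_pct': 27.7, 'accepted_pct': 15.2, 'views_per_question': 407.8}
    print(summarize(SO))
    # {'answered_pct': 79.0, 'unanswered_pct': 21.0, 'accepted_pct': 46.1, 'views_per_question': 822.0}
    print(pct(SO["answered"] - SO["accepted"], SO["questions"]))  # 32.9
```

For Stack Overflow, the share of questions with an accepted answer (46.1%) plus the share answered without an accepted answer (32.9%) equals the share of answered questions (79%).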
In this section, we present a study that replicates the study of Rosen and Shihab Rosen2016MDA . That work focuses on discovering mobile development topics (i.e., not related to any particular platform or framework) from Stack Overflow. In contrast, our study aims to discover topics from Xamarin-related questions from Stack Overflow and Xamarin Forum. In addition, our replication study compares those Xamarin-related topics with those found by Rosen and Shihab. This section is organized as follows: in Section 5.1, we first introduce the methodology to discover topics (based on the one by Rosen and Shihab), and in Section 5.2 we present the results. In the rest of this paper, when we mention Stack Overflow questions, we refer to the dataset of Xamarin-related questions from Stack Overflow built in Section 3.1, whereas Xamarin Forum questions are those from the XamForumDB built in Section 3.2.
In this section, we present three methodologies that we apply to answer the research questions. First, in Section 5.1.1, we present a methodology for discovering the discussion topics from questions from Stack Overflow and Xamarin Forum (i.e., from the datasets presented in Sections 3.1 and 3.2). The results allow us to answer the second research question. Then, in Section 5.1.2, we present a methodology for relating topics discovered from different sources, which we use to answer the third research question. Finally, in Section 5.1.3, we introduce a methodology for retrieving relevant questions from a topic, which we use to answer the fourth research question.
This section presents a methodology for discovering the main topics from a dataset of questions. We applied this procedure for obtaining two sets of main topics: one from Stack Overflow questions, another from Xamarin Forum questions.
The methodology consists of the following steps.
For each question from a dataset (Xamarin Forum or Stack Overflow), we created a document that includes all words contained in the question's title, as done by Rosen and Shihab Rosen2016MDA . In the case of Xamarin Forum, we only considered the technological posts (i.e., 93.5% of all posts), as discussed in Section 4.
We pre-processed all documents by first removing stop words (e.g., "at", "the") and then applying a customized word-stemming process. During the setup of our experiment, we noted that traditional stemming (as applied by, for example, Linares-Vasquez2013EAM ) altered domain-specific words such as “iOS” or “Mono” and, consequently, caused a loss of information. To minimize the impact of traditional stemming, we defined a customized stemming process composed of the following steps. We first created a list of words related to Xamarin technology: the tags related to Xamarin posts presented in Section 3.1.2. This list, called the technology-specific word list, is available in our appendix. Words from a post title that belong to that list are not stemmed and are included in the document without any transformation. To the words that are not in the list, we applied the first step of the stemming process described by Porter Porter1997ASS , which transforms words into their singular form.
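The customized stemming can be sketched as follows. We assume that “the first step” of Porter's algorithm refers to its step 1a, which handles plural suffixes; the stop-word and technology-specific word lists below are short hypothetical stand-ins for the real ones, and the tokenization regular expression is also an assumption.

```python
import re

# Hypothetical, abbreviated word lists for illustration only: the paper does not
# list its stop words, and its technology-specific list (the Xamarin-related
# tags) is published in the web appendix.
STOP_WORDS = {"a", "an", "and", "at", "for", "how", "in", "is", "of", "on",
              "the", "to", "with"}
TECH_WORDS = {"ios", "mono", "monodevelop", "xamarin", "xamarin.android",
              "xamarin.forms", "xamarin.ios"}


def porter_step_1a(word):
    """Step 1a of Porter's stemmer, which strips plural suffixes."""
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress
    if word.endswith("s"):
        return word[:-1]   # cats -> cat
    return word


def title_to_document(title):
    """Turn a question title into the token list used as an LDA document."""
    doc = []
    for token in re.findall(r"[a-z0-9#+.]+", title.lower()):
        token = token.strip(".")
        if not token or token in STOP_WORDS:
            continue
        # Technology-specific words are kept verbatim; the rest are singularized.
        doc.append(token if token in TECH_WORDS else porter_step_1a(token))
    return doc
```

Without the protection list, step 1a alone would already reduce ‘ios’ to ‘io’, which illustrates the kind of information loss described above.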
We applied the Latent Dirichlet Allocation (LDA) algorithm, proposed by Blei et al. Blei2003LDA , to discover topics from a set of documents, where each document corresponds to a pre-processed question title. Several works have used LDA to discover main topics in domains such as web development Bajaj:2014 , software maintenance Sun2015MSR , and mobile development (Rosen2016MDA ; Linares-Vasquez2013EAM ). As output, LDA produces a set of topics, each composed of a list of word-probability pairs. Previous works typically represent a topic by its 15 or 20 most probable words (Linares-Vasquez2013EAM and Rosen2016MDA , respectively). We used the LDA implementation provided by Mallet mccallum2002mallet .
One of the challenges of LDA is choosing a configuration (i.e., values for the input parameters) that produces meaningful topics. There are four main parameters to configure: a) alpha, b) beta, c) the number of topics to generate, and d) the number of iterations. In this experiment, we used a configuration similar to the one proposed by Rosen and Shihab Rosen2016MDA , one of the most closely related works, which studied what mobile developers ask about on Stack Overflow. Their configuration is: 40 topics to generate (nrtopics), alpha = 0.025 (i.e., 1/nrtopics), beta = 0.1, and 1000 iterations. Using a configuration similar to Rosen and Shihab's allows us to compare the topics we discovered from Xamarin-related questions with those they discovered from native mobile app questions (Section 5.2.2).
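To make the roles of alpha, beta, the number of topics, and the number of iterations concrete, the sketch below implements LDA with collapsed Gibbs sampling in plain Python. It is an illustrative stand-in, not Mallet: Mallet also trains LDA by Gibbs sampling but adds many optimizations and has its own parameter conventions, so results from this sketch are not comparable with the paper's. The constants reproduce the configuration reported above; the toy corpus in the usage example is hypothetical.

```python
import random
from collections import defaultdict

# Configuration reported by Rosen and Shihab and reused in this study.
NUM_TOPICS = 40
ALPHA = 1 / NUM_TOPICS  # 0.025
BETA = 0.1
ITERATIONS = 1000


def lda_gibbs(docs, num_topics, alpha, beta, iterations, seed=0):
    """Fit LDA on tokenized documents with collapsed Gibbs sampling.

    Returns (phi, theta): phi[k] maps each word to p(word | topic k) and
    theta[d][k] is p(topic k | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]                 # n(d, k)
    topic_word = [defaultdict(int) for _ in range(num_topics)]   # n(k, w)
    topic_total = [0] * num_topics                               # n(k)
    z = []  # current topic assignment of every token

    # Randomly assign an initial topic to each token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + v * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed point estimates of the topic-word and document-topic distributions.
    phi = [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + v * beta) for w in vocab}
        for k in topics
    ]
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]
    return phi, theta


def top_words(phi_k, n=20):
    """Represent a topic by its n most probable words (ties broken alphabetically)."""
    return sorted(phi_k, key=lambda w: (-phi_k[w], w))[:n]


if __name__ == "__main__":
    # Hypothetical pre-processed titles; a real run would use the full corpus
    # with NUM_TOPICS, ALPHA, BETA and ITERATIONS.
    titles = [["listview", "xaml", "binding"], ["listview", "xaml"],
              ["memory", "leak"], ["memory", "leak", "image"]]
    phi, theta = lda_gibbs(titles, num_topics=2, alpha=0.5, beta=BETA, iterations=200)
    for k, topic in enumerate(phi):
        print(k, top_words(topic, n=3))
```

Each row of `theta` is the per-document topic distribution from which dominant topics are later derived.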
In the remainder of the paper, we refer to topics by their identifiers, specifying whether each identifier refers to a Stack Overflow or a Xamarin Forum topic.
As each topic discovered by LDA is a list of words, previous works labeled each topic with a human-readable label. In this work, we decided to reuse, when possible, the labels related to mobile technology defined by Rosen and Shihab Rosen2016MDA , who discovered and labeled topics related to mobile development. Some of their labels are “Input”, “Tools”, “Application Store”, “Parsing”, and “Map and Location”. When none of their labels correctly described a topic we discovered, we defined a new label by taking into account the words of the topic and their probabilities. In some cases, we used the same label for two related topics and added a sub-category in parentheses to highlight the differences between them.
In our experiment, we need to determine whether a topic from one dataset (e.g., Xamarin Forum) is also present in another dataset (e.g., Stack Overflow, or the topics reported by Rosen and Shihab).
We related two topics from different datasets if they have: a) similar topic labels; or b) similar most significant words. The significance of a word is the probability that the generated LDA model assigns to that word within the topic. For example, we related Stack Overflow topic 28 with Xamarin Forum topic 40 because they have the same label: “User Interface (Table)”. As labels do not always match, we related other topics using their words. For example, Stack Overflow topic 8 was labeled “Graphic and memory”, and the most similar Xamarin Forum topic we found is 16, labeled “Video memory”. We related both topics because they share significant words such as “memory” and “leak”. The lists of topics (words and their probabilities) and the related topics of each are available in our appendix.
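The matching in this study was done manually. As a hypothetical aid for that manual step, the helper below ranks candidate topics from another dataset by how many of their top-n most probable words they share with a given topic; a human would still need to confirm each proposed match. The function names and the choice of n are our own assumptions.

```python
def top_word_set(topic, n):
    """The n most probable words of a topic given as {word: probability}."""
    return set(sorted(topic, key=lambda w: (-topic[w], w))[:n])


def best_match(topic, candidates, n=20):
    """Find the candidate topic sharing the most top-n words with `topic`.

    `candidates` maps a topic id to a {word: probability} dict.
    Returns (candidate_id, shared_words), or (None, set()) if no word is shared.
    """
    mine = top_word_set(topic, n)
    best_id, best_shared = None, set()
    for cand_id, cand in candidates.items():
        shared = mine & top_word_set(cand, n)
        # Keep the first candidate with the largest number of shared words.
        if len(shared) > len(best_shared):
            best_id, best_shared = cand_id, shared
    return best_id, best_shared
```

The shared-word set is returned alongside the id so that a reviewer can judge, as in the “memory”/“leak” example above, whether the overlap is meaningful.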
In this paper, we define the relevant questions of a topic as those questions of the topic with: a) the highest number of views, or b) the highest score, i.e., number of upvotes (only for Stack Overflow). Note that all questions we consider when analyzing a topic have that topic as their dominant topic. The dominant topic of a document is its most related topic (i.e., the one with the highest probability) according to the generated LDA topic model.
The intuition is that if a question asks about a relevant subject or a recurrent problem with Xamarin technology, it is frequently visited and upvoted by other developers. For each topic, we selected the questions with at least 10,000 views and a score of at least 10 upvotes.
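The selection of relevant questions can be sketched as a filter over questions annotated with their LDA topic distribution. This is a hypothetical illustration: the `Question` record is ours, and we assume the two thresholds apply jointly to Stack Overflow questions, while Xamarin Forum questions, which have no score, are filtered by views only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Question:
    post_id: int
    views: int
    score: Optional[int]       # upvotes; None for Xamarin Forum (no score)
    topic_probs: List[float]   # the question's row of the LDA document-topic matrix


def dominant_topic(topic_probs):
    """Index of the topic with the highest probability for a document."""
    return max(range(len(topic_probs)), key=lambda k: topic_probs[k])


def relevant_questions(questions, topic, min_views=10_000, min_score=10):
    """Questions whose dominant topic is `topic` and that pass the thresholds."""
    selected = [
        q for q in questions
        if dominant_topic(q.topic_probs) == topic
        and q.views >= min_views
        and (q.score is None or q.score >= min_score)
    ]
    # Most viewed first; ties broken by score (missing scores count as 0).
    return sorted(selected, key=lambda q: (-q.views, -(q.score or 0)))
```

Restricting to questions whose dominant topic is the analyzed one mirrors the note above that every analyzed question has that topic as its dominant topic.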
| Id (Stack Overflow) | Label (Stack Overflow) | Id (Xamarin Forum) | Label (Xamarin Forum) |
|---|---|---|---|
| 11, 32 | List (Forms) | 7, 10 | View Controllers/Navigations |
| 28 | User Interface (Table) | 4, 8 | Lists (Forms) |
| 23 | User Interface (Layout) | 3 | Forms (WebView) |
| 3 | Web Services | 5 | Android (Activity) |
| 5 | Forms (XAML) | 17 | User Interface (Style) |
| 4 | Library/Native | 6 | Map & Location |
| 10 | HTTP Request | 28 | IDE (Visual Studio) |
| 21 | Inputs (Event) | 32 | IDE (Simulator-for-Mac) |
| 20 | Android (Activity) | 15 | Application Store |
| 39 | User Interface (Style) | 35 | User Interface (Layout) |
| 18 | Android (Debug/Device) | 9 | Error (Unified/Insight) |
| 16 | Notifications | 40 | User Interface (Table) |
| 33 | Location | 33 | Language Questions (OOP) |
| 31 | Input (Text) | 36 | Phone Orientation/Media-images |
| 34 | Threading | 27 | Questions (Forms, Samples) |
| 38 | Windows Platform | 29 | Error (Code/Compilation) |
Table 2 shows the topics from Stack Overflow and Xamarin Forum questions discovered using the methodology described in Section 5.1.1. The left side displays the Stack Overflow topics and the right side displays the Xamarin Forum topics. The column Id corresponds to the topic identifier assigned by Mallet; the column Label corresponds to the label we assigned manually using the method presented in Section 5.1.1. The topics are sorted by decreasing NDDT. As explained by Linares-Vasquez et al. Linares-Vasquez2013EAM , NDDT counts, for each topic, the number of documents that have that topic as their dominant topic. In our appendix we present, for each discovered topic, the words that compose it, their probabilities, and its NDDT. We first present some of the discovered topics and then compare the topics discovered from both datasets.
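NDDT can be computed directly from the document-topic matrix produced by LDA. The sketch below assumes that matrix is available as a list of per-document probability lists (for instance, the `theta` rows of an LDA model); the function names are illustrative.

```python
from collections import Counter


def dominant_topic(theta_row):
    """Topic with the highest probability in one document's topic distribution."""
    return max(range(len(theta_row)), key=lambda k: theta_row[k])


def nddt(theta):
    """For each topic id, the number of documents having it as dominant topic."""
    num_topics = len(theta[0])
    counts = Counter(dominant_topic(row) for row in theta)
    # Counter returns 0 for topics that dominate no document.
    return {k: counts[k] for k in range(num_topics)}


def topics_by_nddt(theta):
    """Topic ids sorted by decreasing NDDT (ties broken by id)."""
    counts = nddt(theta)
    return sorted(counts, key=lambda k: (-counts[k], k))
```

Sorting topics with `topics_by_nddt` gives the kind of ordering used in Table 2.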
Let us first focus on Stack Overflow: the top-4 topics (according to NDDT) are related to the view: 11 and 32, both labeled “List (Forms)”; 28, labeled “User Interface (Table)”; and 23, “User Interface (Layout)”. The next two topics are not directly related to the user interface: 22 “Emulator/Device/Simulator” and 3 “Web Services”.
For Xamarin Forum, the top-5 topics are related to view or controller development: 7 and 10, both labeled “View Controllers/Navigation”; 4 and 8, both labeled “Lists (Forms)”; and 3, labeled “Forms (WebView)”. After them come two topics not related to the user interface: 1 “Resources/Files” and 5 “Android (Activity)”.
When we analyzed the remaining topics, we observed that the topics discussed on the Q&A sites are diverse. For instance, they cover resources and files (6 and 1), web services (3 and 37), language questions (9 and 33), databases (24 and 30), notifications (16 and 18), packages (7 and 31), architecture (19 and 19), social (35 and 21), maps and locations (33 and 6), and IDEs (17 and 28); in each pair, the first identifier is a Stack Overflow topic and the second a Xamarin Forum topic.
We observe that Xamarin Forum and Stack Overflow share 33 of the 40 (82.5%) discovered topics. For instance, there are questions about “Notifications”: on Stack Overflow those questions have topic 16 as their dominant topic, and on Xamarin Forum, topic 18.
On each site, there are 7 discovered main topics (17.5%) that we could not manually relate to any topic of the other Q&A site. For instance, we discovered that one of the main topics from Stack Overflow, 34, is about “Threading”. However, we could not find any Xamarin Forum topic that discusses threading. Conversely, we discovered a Xamarin Forum topic (15) about the Application Store, but none of the main topics from Stack Overflow is related to it. The lists of unrelated topics are available in our appendix.
Response to RQ 2: What are the main topics discussed about Xamarin in Q&A sites? The top main topics discovered from Stack Overflow and Xamarin Forum concern user interfaces: tables, layouts, controllers, and forms. 82.5% of the discovered main topics are present on both Stack Overflow and Xamarin Forum.
Since this paper replicates the study of Rosen and Shihab (which focuses on general mobile development) on Xamarin-related Q&A datasets, we now compare the results (i.e., the discovered topics) of both studies. In this way, we can detect topics discovered from Xamarin Forum and Stack Overflow that are: a) general, i.e., related to mobile development but not particularly to Xamarin technology, or b) closely related to the development of cross-platform mobile applications using Xamarin.
We compared the topics previously discovered in Section 5.2.1 with those related to mobile application development presented by Rosen and Shihab Rosen2016MDA . As the authors analyzed questions about three mobile platforms (Android, iOS, and Windows Phone), we call their topics ‘general’ and refer to each as N, where N is the id of the topic (as the topics from Rosen2016MDA do not include any id, we consider their ids to be the row numbers of the table that presents them). To compare topics, we used the same manual matching methodology presented in Section 5.1.2. Note that, to carry out a fair comparison, we discovered topics from our corpora of questions using the same technique (LDA) and configuration (values for alpha, beta, number of iterations, and number of topics) that Rosen and Shihab used in their work.
We found that 27 topics from Stack Overflow and 30 topics from Xamarin Forum are related to at least one general topic from Rosen and Shihab. For instance, Stack Overflow topic 35 is related to general topic 8 because both share the same label: “Social/APIs”. Other topics were related using the topics’ words instead of their labels. For instance, Stack Overflow topic 21, labeled “Inputs (Event)”, was related to general topic 1, labeled “Input” (a more general label), because both topics have almost the same significant words (i.e., those with the highest probability): ‘button’, ‘event’, ‘click’, ‘android’, and ‘keyboard’.
Response to RQ 3: How many main topics from Stack Overflow and Xamarin Forum are also topics related to general mobile development discovered by Rosen and Shihab? There are 27 and 30 topics discovered from Stack Overflow and Xamarin Forum (67.5% and 75%, resp.) that are also main topics in questions about mobile applications. The remaining topics, i.e., 13 (32.5%) and 10 (25%) topics from Stack Overflow and Xamarin Forum, resp., are particularly related to cross-platform app development using Xamarin.
This overlap between Xamarin and general topics makes sense since: a) the Xamarin framework produces native code for different platforms, which can be maintained during the app life cycle in the same way as any native app developed using traditional tools; b) a portion of a Xamarin application is usually written in a native language (Java for Android, Objective-C or Swift for iOS) rather than in the common language (Xamarin uses C# as its common language). For example, the Evolve application for the Android platform has 90% of its code written in C# and the remaining 10% in native code (https://github.com/xamarinhq/app-evolve). Thus, in both cases, when writing or maintaining the native portion of the code, Xamarin developers could have the same kinds of questions as developers of native apps; c) the general development of mobile apps includes the development of cross-platform mobile apps.
Next, we analyze the main topics that are not among the general topics, and the general topics that are not among the main topics discovered from Stack Overflow and Xamarin Forum.
Let us start with the topics discovered from Stack Overflow and Xamarin Forum that were not reported by Rosen and Shihab. There are 13 and 10 discovered topics from Stack Overflow and Xamarin Forum, respectively, that could not be related to any general topic. Topic 19, labeled “MVVM”, is one of them. As its label indicates, 19 represents documents that discuss the Model-View-ViewModel (MVVM) design pattern, which was introduced by Microsoft to facilitate the design of multi-tier applications on Microsoft’s .NET platforms. This pattern is recommended by the Xamarin documentation for implementing large and complex applications on the Xamarin platform (https://developer.xamarin.com/guides/xamarin-forms/enterprise-application-patterns/mvvm/). Another such topic is 7, “Packages/Nuget/PCL”, which refers to two concepts from Microsoft technology. The first is the “Portable Class Library (PCL)”, a type of project in the .NET framework for writing and building portable .NET assemblies that are then referenced from, for example, cross-platform apps (https://docs.microsoft.com/en-us/dotnet/standard/cross-platform/cross-platform-development-with-the-portable-class-library). The second is “Nuget”, a package manager for .NET (https://www.nuget.org/). In summary, it makes sense to find this topic only in Xamarin-related questions from Stack Overflow and Xamarin Forum: it covers questions about reusing functionality on the .NET platform during the development of cross-platform mobile applications with the Xamarin framework. Similarly, in Xamarin Forum we discovered two topics, each related to one of these concepts: 31 “PCL/Library” and 14 “Nuget/Package”.
Furthermore, we also discovered topics from Stack Overflow and Xamarin Forum that are related neither to any general topic nor directly to Xamarin technology. Two of them are topics 37 and 22, both labeled “Testing”, which include words such as ‘uitest’, ‘testing’, ‘unit’, and ‘test’. Note that no topic from Rosen and Shihab discusses testing: none of these words appears in any of their topics.
Among the 40 general mobile topics discovered by Rosen and Shihab, 9 are not present among the main topics from Stack Overflow, and 10 are not related to any topic from Xamarin Forum. Seven of them could not be related to any topic from either Stack Overflow or Xamarin Forum: 6, labeled “Phone/Sensors”; 12 “HTML5/Browser”; 16 “App Distribution”; 19 “Processes/Activities”; 20 “Data Structures”; 24 “Data Formatting”; and 30 “Contacts”. For instance, general topic 6 includes words that appear in no topic discovered from Stack Overflow or Xamarin Forum, such as ‘time’, ‘alarm’, ‘voice’, ‘speech’, ‘incoming’, and ‘number’.
In Section 5.2.2 we discovered topics about Xamarin that were not previously reported by Rosen and Shihab as main topics of general mobile development. Now, we focus on three of those topics to understand the main concerns and problems raised by Xamarin developers. For each of them, we analyze the most relevant questions, according to the methodology presented in Section 5.1.3.
MVVM (Model-View-ViewModel) is a pattern that helps separate the business and presentation logic of an application from its user interface (UI). The pattern was introduced by Microsoft for designing apps for its different platforms, including Xamarin, Windows Forms, WPF, Silverlight, and Windows Phone (https://msdn.microsoft.com/en-us/library/hh848246.aspx).
The most relevant questions from the MVVM topic are related to MvvmCross.
MvvmCross is a framework built to ease the development of Xamarin applications that provides, for instance, an easier way to implement the MVVM pattern (https://www.mvvmcross.com/).
The Stack Overflow question with the highest score in the MVVM topic (25 upvotes) is about asynchronous programming:
“How can I use async in an MVVMCross view model?” (id: 17187113; this number corresponds to the ID of a Stack Overflow post. A post with id …)
Other relevant questions about MvvmCross focus on the communication between the pattern’s components, e.g., “Passing on variables from ViewModel to another View [..]” (id: 10192505) and “MvvMCross bind command with parameter [..]” (id: 17492742). Another relevant question is about the differences between MvvmCross and ReactiveUI (https://reactiveui.net/), a model-view-viewmodel framework for all .NET platforms, including Xamarin.
Xamarin.Forms is an API of the Xamarin framework for building native apps for iOS, Android, and Windows entirely in C#, or using XAML, an XML-based markup language (https://www.xamarin.com/forms). Xamarin.Forms pages represent single screens within an app and support layouts, buttons, labels, lists, and other common controls. Each page and its controls are mapped to platform-specific native user interface elements. Xamarin.Forms is best suited for apps that a) require little platform-specific functionality, or b) for which code sharing is more important than a custom UI.
The relevant questions from topic 11 are about Xamarin.Forms or its components. For instance, “How to correctly use the Image Source property with Xamarin.Forms?” (id: 30850510) is the most viewed question. There are also relevant questions about the ListView component, e.g., “[..] ListView inside StackLayout: How to set height?” (id: 24598261) and “[..] untappable ListView (remove selection ripple effect)” (id: 35586243). The highest-scored questions of topic 11 are related to IDE support for XAML development, such as code completion (IntelliSense), e.g., “Is it possible to use a Xaml designer or intellisense with Xamarin.Forms?” (id: 24158201) (36 upvotes). Other relevant questions are also about XAML, e.g., “[..] ListView ItemTapped/ItemSelected Command Binding on XAML” (id: 24792991).
The Library/Portability topic contains relevant questions related to the architecture of Xamarin-based cross-platform apps. Xamarin provides three alternative architectures for sharing code between cross-platform applications: 1) Shared Projects, 2) Portable Class Libraries (PCL), and 3) .NET Standard Libraries (https://docs.microsoft.com/fr-fr/xamarin/cross-platform/app-fundamentals/code-sharing).
Relevant questions ask about these architectures, especially the second one. Some questions ask for clarification of those architectures, e.g., “What is a Portable Class Library?” (id: 5238955) and “Portable Class Library vs. library project” (id: 28746609), or about the differences between two architectures, e.g., “Xamarin Shared Projects vs Portable class libraries” (id: 23990307). Other relevant questions focus on problems with using the PCL architecture, for instance: “Unable to resolve assemblies that use Portable Class Libraries” (id: 13871267) or “Portable Class library and reflection” (id: 14061291). Finally, another group of relevant questions from this topic relates PCL and concurrency (threads): “Update UI thread from portable class library” (id: 14427340) and “Thread.Sleep() in a Portable Class Library” (id: 9251917).
In Section 3, we applied a technique based on tags to retrieve posts related to Xamarin technology, as done by Rosen and Shihab Rosen2016MDA . This introduces a threat when a developer a) mislabels a post, i.e., its tags do not represent the real topic of the post, or b) omits tags altogether. Moreover, we used post titles to capture Xamarin-related posts that were not labeled with Xamarin-related tags. A threat to validity arises when the title does not represent the content of the post. To alleviate this threat, we manually analyzed a sample of the retrieved posts and verified the concordance between the title and the post’s content.
We applied the LDA algorithm for modeling topics. The input of LDA is a corpus of documents (see Section 5.1.1), where each document contains information from a single post. As done by the study we replicated, we decided to consider only the title of each question because, according to its authors: 1) titles summarize and identify the main concepts being asked about in the post, 2) the body of the question adds non-relevant information rather than the main idea being asked about, and 3) we are interested in what issues developers are asking about, so adding the answer posts would not make sense Rosen2016MDA .
LDA needs four configuration arguments (alpha, beta, number of topics, and number of iterations). Choosing optimal values for those arguments is a difficult task. To alleviate this threat, as done by Hindle2015TMS ; Linares-Vasquez2013EAM ; Rosen2016MDA , we tried different configurations and chose, to the best of our judgment, the best one. The selection criteria used by those works were: a) inspection of the average dominant topic probabilities given by the resulting model Rosen2016MDA , and b) assurance that topics do not have much overlap in top terms, are not copies of each other, and do not share excessive disjoint concepts or ideas Hindle2015TMS . In our experiment, we decided to use the same configuration as the study by Rosen and Shihab that we replicated. Reusing their configuration allowed us to compare the topics discovered from Xamarin-related questions against those discovered by Rosen and Shihab from mobile-related questions.
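The two selection criteria cited above can be expressed as simple scores over a fitted model. The sketch below is a hypothetical illustration: it computes criterion a) as the mean dominant-topic probability and approximates only the top-term part of criterion b) with the maximum Jaccard overlap between topics' top-n words. The remaining parts of b) (copies, disjoint concepts) still require manual inspection, and the 0.5 overlap threshold is an arbitrary assumption. As stated above, this study ultimately reused Rosen and Shihab's configuration rather than selecting one this way.

```python
from itertools import combinations


def avg_dominant_topic_probability(theta):
    """Criterion a): mean over documents of the dominant topic's probability."""
    return sum(max(row) for row in theta) / len(theta)


def max_top_term_overlap(phi, n=10):
    """Part of criterion b): largest Jaccard overlap between the top-n word
    sets of any two topics (phi is a list of {word: probability} dicts)."""
    tops = [set(sorted(t, key=lambda w: (-t[w], w))[:n]) for t in phi]
    return max((len(a & b) / len(a | b) for a, b in combinations(tops, 2)),
               default=0.0)


def pick_configuration(results, max_overlap=0.5, n=10):
    """Choose among candidate configurations {name: (phi, theta)}.

    Configurations whose topics overlap too much are discarded; among the
    rest, the one with the highest average dominant-topic probability wins.
    Returns None if every configuration is discarded.
    """
    admissible = {name: r for name, r in results.items()
                  if max_top_term_overlap(r[0], n) <= max_overlap}
    return max(admissible,
               key=lambda name: avg_dominant_topic_probability(admissible[name][1]),
               default=None)
```

Filtering on overlap before ranking prevents a configuration with near-duplicate topics from winning only because its documents are confidently assigned.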
As done by the mentioned related works (e.g., Rosen2016MDA ; Linares-Vasquez2013EAM ), we manually analyzed the results produced by LDA to label each topic with a human-readable label, based on the words of the topic and their probabilities. To the best of our knowledge, there is no tool for automatically labeling topics. To alleviate the threat of mislabeling topics or mismatching topics from different sources, we carried out those tasks using a peer-review process, and the results are publicly available.
One potential threat is that the sources of information used for studying Xamarin-related Q&A are not representative. To mitigate this threat, we selected two sources: Stack Overflow and Xamarin Forum. The former is one of the most popular Q&A sites among software developers, and the latter is the official forum of the Xamarin technology.
For future work, we plan to continue exploring the two datasets of Xamarin-related questions. Future research could focus on:
By inspecting the relevant questions of this topic, we noticed that many of their accepted answers include a link to the official Xamarin documentation website. For instance, the most viewed post mentioned above (id: 30850510) has an accepted answer that cites a page from the official documentation as the source of its response. This finding raises some research questions for future work: a) How many Stack Overflow posts link to the official documentation in their questions or answers? b) How often do (Xamarin) developers ask questions whose solutions are already included in the documentation? Moreover, we also observed that answers on Stack Overflow include links to the Xamarin Forum, the official Q&A site of Xamarin. For example, question 24598261 from Stack Overflow has an accepted answer based on a post from the Xamarin Forum (id: 66248; https://forums.xamarin.com/discussion/comment/66248). Other papers have focused on combining different sources of information (e.g., Zagalsky2016:RCurates ; Lee2017 ; Wang2017Linking ; Ye2017DK ; Venkatesh2016 ). However, to our knowledge, no work has linked questions from two Q&A sites or focused on Xamarin sources of information.
As discussed in Section 5.2.3, relevant questions are about mobile app architectures and frameworks. Future work can study and compare apps developed using different architectures. For instance, research could analyze which is the better option between the Shared Projects and Portable Class Libraries architectures in terms of, e.g., the expertise developers need to develop and maintain apps with those architectures, or the ease of evolving an application as the mobile platforms are upgraded. Moreover, future work could study the benefits and drawbacks of developing and maintaining cross-platform apps that use such frameworks (e.g., MvvmCross) with respect to apps developed without them, i.e., using only the Xamarin framework.
Cross-compiler mobile development frameworks allow mobile developers to create native cross-platform mobile applications, with the promise of simplifying the development and maintenance phases by reusing source code across platforms. To study and characterize the development and maintenance of cross-platform apps built with cross-compiler frameworks, in this paper we present two datasets of questions and answers (Q&A): one with Q&A from the official Xamarin Forum, and another with Xamarin-related Q&A from Stack Overflow. We then use them in a replication study of Rosen2016MDA to discover the main discussion topics in questions related to Xamarin technology. We compared the discovered topics against the main mobile development topics discovered by Rosen2016MDA . We found that a portion of the topics from Xamarin-related questions had not been previously identified in discussions about general mobile applications. To promote further studies about Xamarin and cross-platform mobile frameworks, the two Xamarin-related Q&A datasets are publicly available.
A. K. McCallum. MALLET: A Machine Learning for Language Toolkit. 2002.
2016 17th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), pages 357–362, May 2016.