1 Introduction
The number of high school students who use e-cigarettes increased by 78% to 3.05 million, and the number of middle school students who use e-cigarettes increased by 48% to 570,000, between 2017 and 2018 [2018NYTSData]. According to the 2018 National Youth Tobacco Survey (NYTS), this brings the total to about 3.6 million middle and high school students using e-cigarettes [2018NYTSData]. The problem with youth e-cigarette use is that exposure to nicotine at an early age can cause addiction and harm the developing brain [ECIGFACT]. Also, Bunnell et al. [bunnell2015intentions] found that e-cigarette use is associated with smoking cigarettes. Because e-cigarette use is a relatively recent issue, only a small amount of research has been done on adolescent e-cigarette use [dawkins2013vaping; palazzolo2013electronic; miech2017cigarette; raloff2015dangers; dutra2014electronic; kosmider2014carbonyl; littlefield2015electronic; pepper2017risk; huey2017escape; schneider2015vaping; mccabe2017associations; penzes2018bidirectional; arnold2014vaping; miech2017kids; schripp2013does; westling2017electronic; mccabe2017smoking; tsai2018reasons]. Dawkins et al. [dawkins2013vaping] discuss how vaping is used to decrease dependence on cigarettes and argue that e-cigarettes are better than nicotine replacement therapy; some reported side effects of vaping were throat and mouth irritation, but only a small percentage of users experienced them. In another study, Palazzolo [palazzolo2013electronic] states that the harmful effects of vaping remain uncertain due to a lack of empirical data, although some cases of negative effects have been reported. Also, McCabe et al. [mccabe2017smoking] state that students who use e-cigarettes early on tend to smoke as they get older, so e-cigarettes act as a gateway to other, more harmful drugs.
Chapman [chapman2014cigarettes] states that vaping could have a positive impact by helping people transition from smoking cigarettes to vaping, moving them away from smoking. However, a negative impact for non-smokers would be taking up vaping, which could lead to other, more harmful drugs. Another study, by Miech [miech2017cigarette], states that people who vaped were 4 times more likely to smoke cigarettes, and that vaping does not predict a decrease in smoking. The study by Dutra [dutra2014electronic] states that adolescents who have vaped before have a higher chance of smoking cigarettes; moreover, most of the vapers smoked cigarettes as well. According to [huey2017escape], more than 250,000 teens who never smoked a regular cigarette have vaped, and those who did are twice as likely to smoke regular cigarettes in the future.
It is evident from multiple research papers that vaping leads to smoking cigarettes later on. However, there is a methodological limitation among these studies. Most of them used statistical methods to analyze the relationship between causes and effects for e-cigarette use and/or smoking habits. For example, one study [bunnell2015intentions] discusses the association between e-cigarette use and smoking intention among US youths who have never smoked a traditional cigarette. The article implemented chi-squared tests, multivariate logistic regression, and other models to analyze the data. Although analyzing data with a logistic regression model was appropriate for their case study, it may not be the most practical approach for an anti-smoking campaign aimed at preventing smoking habits. Prediction models developed from machine learning (ML) algorithms would be more useful, because they can predict whether a person will have the intention to smoke cigarettes based on his or her e-cigarette use as well as race, ethnicity, gender, and environment. Additionally, a prediction can directly help individuals stay away from the path of smoking cigarettes, which is more helpful than simply analyzing data.
In this paper, we use the NYTS results from 2018 and construct multiple prediction models (e.g., Gradient Boosting Classifier and Decision Tree Classifier) that predict whether a person will have an intention to smoke cigarettes. After data analysis, the Gradient Boosting Classifier had the highest accuracy, 93%, of all the models tested. We divide the data into two sets: never-smokers and smokers of cigarettes. The group of never-smokers was analyzed to find the best-fitting model to predict the intention to smoke cigarettes for both e-cigarette users and non-users. In addition, we create a website based on the Gradient Boosting Classifier model that allows the public to input factors (e.g., sex, race, and age) and receive a prediction of whether or not they will have a high intention to smoke cigarettes. This will give the general public more awareness of whether they are likely to smoke cigarettes and possibly steer them away from the path of smoking.
Consequently, there are two contributions in this paper: (i) finding the best-fitting model to predict smoking intention from the NYTS data, and (ii) creating a website to help students, especially e-cigarette users, avoid e-cigarette use given the associated chance of smoking cigarettes.
This paper is organized as follows: Section 2 introduces the background of ML techniques; Section 3 presents the analysis methods; Section 4 explains the prediction system for smoking intention based on electronic cigarette usage; Section 5 includes the conclusion and future study.
2 Background of Machine Learning Techniques
In this section, we introduce the CRISP-DM (Cross-Industry Standard Process for Data Mining) process model [wirth2000crisp]. The CRISP-DM process model is a comprehensive and simple method for data mining and analysis. In Section 2.1, we describe the six steps of the CRISP-DM process model. In addition, the five machine learning algorithms used in this research are introduced in Section 2.2.
2.1 CRISP-DM
CRISP-DM [wirth2000crisp] (see Figure 1) is a cross-industry standard process for data mining and analysis. CRISP-DM helps plan out a solid method for running a data mining project and aims at making data mining and analysis projects more efficient, cheaper, and faster. CRISP-DM comprises six steps: (1) business understanding, (2) data understanding, (3) data preparation, (4) modeling, (5) evaluation, and (6) deployment.
2.1.1 Business Understanding
Business Understanding introduces the objectives of the project and establishes what the goals will be. Knowing the goals and objectives, a data mining problem is formulated and a plan to achieve these goals is made. Figure 1 shows the CRISP-DM process, and the first box covers the business understanding step.
2.1.2 Data Understanding
Data Understanding includes finding an initial data set, becoming familiar with its contents, and making observations from it in order to form a hypothesis. The initial data set must be reliable and accurate, meaning that it is not outdated and is internally consistent. The second box in Figure 1 indicates the data understanding step.
2.1.3 Data Preparation
Data Preparation is the process of converting the initial, or raw, data set into the final data set. This step could alter the data to become more applicable to achieve the goal. The third box from Figure 1 covers the data preparation step.
2.1.4 Modeling
In Modeling, artificial intelligence models are created using various machine learning algorithms. The data set prepared in the data preparation step is applied in this modeling step.
2.1.5 Evaluation
Evaluation is reviewing the performance (e.g., accuracy) of the models and deciding on the best model. This is the process in which the models are improved based on a goal performance. Success criteria can be based on speed of the algorithm, memory usage, or prediction accuracy. Depending on the analysis purpose, different performance metrics are used. For example, for a regression problem, the $R^2$ score can be used. For a classification problem, the accuracy ($Acc$) of Equation 1, the sum of correct classifications divided by the total number of classifications, can be used. Since this paper addresses classification, some criteria for classification are introduced in the following.

$Acc = \frac{\sum_{i=1}^{C} n_{ii}}{N}$  (1)

where $C$ denotes the number of classes, $n_{ii}$ denotes the total number of cases in which the values of the $i$-th prediction and the $i$-th observation are identical, and $N$ denotes the total number of classifications.
In this study, we used four performance metrics: Acc (Equation 1), Precision (Equation 2), Recall (Equation 3), and F1-score (Equation 4). These metrics can be easily calculated using the following four indicators.

- True Positive (TP): the number of cases predicted positive that are observed positive
- False Positive (FP): the number of cases predicted positive that are observed negative
- False Negative (FN): the number of cases predicted negative that are observed positive
- True Negative (TN): the number of cases predicted negative that are observed negative
These four indicators can be used to define the equations for Precision and Recall as shown.

$Precision = \frac{TP}{TP + FP}$  (2)

$Recall = \frac{TP}{TP + FN}$  (3)

Precision is commonly used to measure the influence of False Positives, while Recall is used to measure the influence of False Negatives. F1-score is defined as the harmonic mean of Precision and Recall.

$F1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}$  (4)

Precision, Recall, and F1-score have a score of one when the prediction is perfect. For total prediction failure, they yield a score of zero.
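As a concrete illustration, the four metrics of Equations 1-4 can be computed directly from the four indicators; a minimal sketch in Python, with made-up confusion-matrix counts:

```python
# Sketch: the metrics of Equations 1-4 from confusion-matrix counts.
# The counts passed in at the bottom are made up for illustration.
def classification_metrics(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total                          # Equation 1
    precision = tp / (tp + fp)                            # Equation 2
    recall = tp / (tp + fn)                               # Equation 3
    f1 = 2 * precision * recall / (precision + recall)    # Equation 4
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = classification_metrics(tp=80, fp=20, fn=20, tn=80)
print(acc, prec, rec, f1)  # all 0.8 for these symmetric counts
```

With a perfect prediction (fp = fn = 0), precision, recall, and F1-score all evaluate to one, matching the statement above.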
2.1.6 Deployment
Deployment is the process of organizing the information gained, such as the model, so that it is understandable for the model user. This step is carried out by the user rather than the analyst, so it is essential for the user to understand how to use the models. The result of this step is a final report. The last step in Figure 1 represents the deployment step.
2.2 Machine Learning Models
In this subsection, we introduce the machine learning models that this paper utilizes. Since the Linear Regression model is a basic model in machine learning, we start with it. Then, the other models (i.e., Logistic Regression Classifier, Gaussian Naive Bayes Classifier, Decision Tree Classifier, Random Forest Classifier, and Gradient Boosting Classifier) are introduced.

2.2.1 Linear Regression Model
The Linear Regression model represents a linear trend of the data and is used to predict a continuous response $y$ using predictor variables $x_1, \ldots, x_d$. We define the Linear Regression model formally.

$\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_d x_d$  (5)

where $w$ denotes the weights (or coefficients) $\{w_0, w_1, \ldots, w_d\}$, $x$ denotes the features $\{x_1, x_2, \ldots, x_d\}$, and $d$ denotes the number of weights.
In this setting, machine learning of the Linear Regression model consists of estimating better weights (or parameters) $w$ that fit the data. A simple approach is the least squares approach, in which the parameters are found by minimizing the residual sum of squares (RSS).

$RSS = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$  (6)

where $e_i$ denotes the $i$-th residual error for the $i$-th data point (or row), consisting of a pair $(x_i, y_i)$, and $n$ denotes the number of data points.
Using the RSS (Equation 6), a gradient descent algorithm can be applied to find the weights best fitting the training data. The gradient descent algorithm proceeds as follows: (1) initial weights are randomly selected; (2) the weights are updated according to

$w_j = w_j^{prev} - \alpha \frac{\partial RSS}{\partial w_j}$  (7)

where $w_j^{prev}$ denotes the previous value of the weight $w_j$ and $\alpha$ is a learning rate; (3) if $|w_j - w_j^{prev}| < \epsilon$, where $\epsilon$ denotes a threshold for convergence, then stop.
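For illustration, the three-step procedure above can be sketched for a one-feature Linear Regression model; the toy data, learning rate, and convergence threshold below are illustrative choices, not values used in this paper.

```python
# Gradient descent (Equation 7) minimizing the RSS (Equation 6) for a
# one-feature Linear Regression model y = w0 + w1*x.
def fit_linear(xs, ys, lr=0.01, eps=1e-8, max_iter=100000):
    w0, w1 = 0.0, 0.0                        # (1) initial weights
    for _ in range(max_iter):
        # residuals e_i = y_i - (w0 + w1 * x_i)
        res = [y - (w0 + w1 * x) for x, y in zip(xs, ys)]
        g0 = -2 * sum(res)                                  # dRSS/dw0
        g1 = -2 * sum(r * x for r, x in zip(res, xs))       # dRSS/dw1
        new_w0, new_w1 = w0 - lr * g0, w1 - lr * g1         # (2) update step
        if abs(new_w0 - w0) + abs(new_w1 - w1) < eps:       # (3) convergence
            return new_w0, new_w1
        w0, w1 = new_w0, new_w1
    return w0, w1

w0, w1 = fit_linear([0, 1, 2, 3], [1, 3, 5, 7])   # toy data on the line y = 1 + 2x
print(round(w0, 3), round(w1, 3))  # close to 1.0 and 2.0
```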
2.2.2 Logistic Regression Classifier
The Logistic Regression Classifier is an extension of the Linear Regression model in which the response is discrete (e.g., True and False). In other words, given predictor variables $x_1, \ldots, x_d$, a discrete value is predicted. A simple case of Logistic Regression is the binary Logistic Regression model, whose output is Boolean. The binary Logistic Regression model can be defined formally as shown in Equation 8.

$P(y = 1 \mid x) = \frac{1}{1 + e^{-z}}$  (8)

where $e$ is the natural exponential function and $z$ is identical to the right-hand side of Equation 5.
For general Logistic Regression, whose output can be one of multiple classes, a softmax function can be applied to represent a probability distribution over the classes. Equation 9 shows the general Logistic Regression.

$P(y = k \mid x) = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}$  (9)

where $K$ denotes the number of classes and $z_k$ is the linear score of Equation 5 for class $k$.
For machine learning, the Logistic Regression model requires a loss function to measure the similarity between the learned model and the data, analogous to the residual sum of squares in the Linear Regression model. Equation 10 shows the loss function of the general Logistic Regression.

$L = -\sum_{i=1}^{n} \sum_{k=1}^{K} \mathbb{1}[y_i = k] \log P(y_i = k \mid x_i)$  (10)

where $\mathbb{1}[\cdot]$ is a function that returns one if its argument is true and zero otherwise.
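A minimal sketch of the softmax of Equation 9 and a one-case version of the loss of Equation 10, assuming the linear scores $z_k$ have already been computed; the score values are made up for illustration.

```python
import math

# Softmax (Equation 9): turns linear class scores into a probability
# distribution over K classes.
def softmax(z):
    exps = [math.exp(v) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# One-case version of the loss (Equation 10): the indicator selects the
# observed class, so the loss is the negative log-probability of that class.
def nll_loss(z, true_class):
    return -math.log(softmax(z)[true_class])

p = softmax([2.0, 1.0, 0.1])                    # made-up scores for three classes
print([round(v, 3) for v in p])                 # probabilities summing to 1
print(round(nll_loss([2.0, 1.0, 0.1], 0), 3))   # small loss: class 0 is likeliest
```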
2.2.3 Gaussian Naive Bayes Classifier
The Naive Bayes (NB) Classifier [maron1961automatic] is a classification method based on probability theory, by which one can classify class labels using (discrete or continuous) inputs. Basically, NB represents a joint distribution over the input random variables ($X_1, \ldots, X_d$) and a class random variable ($Y$) under the assumption that the inputs are conditionally independent given $Y$. NB can be written as Equation 11.

$P(Y = y_k \mid X_1, \ldots, X_d) \propto P(Y = y_k) \prod_{i=1}^{d} P(X_i \mid Y = y_k)$  (11)

where $d$ is the number of input random variables and $y_k$ denotes a class label in $Y$.
The Gaussian Naive Bayes Classifier uses continuous inputs under a Gaussian (or Normal) distribution assumption.

$P(X_i = x \mid Y = y_k) = \frac{1}{\sqrt{2\pi}\,\sigma_{ik}} \exp\!\left(-\frac{(x - \mu_{ik})^2}{2\sigma_{ik}^2}\right)$  (12)

where $\mu_{ik}$ and $\sigma_{ik}$ denote the mean and standard deviation of the input $X_i$ for the class label $y_k$, respectively. For machine learning of the Gaussian Naive Bayes Classifier, Maximum Likelihood Estimation (MLE), Maximum a Posteriori (MAP) estimation, and the Bayesian approach can be used [murphy2012machine].
2.2.4 Decision Tree Classifier
The Decision Tree Classifier [quinlan1986induction; morgan1963problems; safavian1991survey; magerman1995statistical] consists of a tree structure containing a set of hierarchical nodes (a root node, internal nodes, and leaf nodes). The root node and the internal nodes represent features or variables, while the leaf nodes denote values of a target variable. The main challenge of machine learning for a decision tree is to construct these nodes and their hierarchy so that the tree can effectively classify classes using input data of the predictor variables. The basic approach to building a decision tree, called ID3 (Iterative Dichotomiser 3), was introduced by Quinlan [quinlan1986induction]. For example, suppose that there is a target variable with two classes (Positive and Negative); the expected information for this variable can be written as $I(p, n)$. Note that the following equations are taken from [quinlan1986induction].

$I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}$  (13)

where $p$ and $n$ denote the numbers of positive and negative cases, respectively. The expected information for a parent node of the target variable can be derived as the weighted average.

$E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\, I(p_i, n_i)$  (14)

where $v$ denotes the number of values of the parent node $A$ and $I(p_i, n_i)$ denotes the expected information for the $i$-th value of the parent node. The information gain for the node can be obtained with the following equation.

$gain(A) = I(p, n) - E(A)$  (15)

Thus, machine learning for a decision tree is to find a tree that maximizes the information gain at all of the root and internal nodes.
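Equations 13-15 can be computed directly; the counts in the example below correspond to the well-known 14-case weather data set used in [quinlan1986induction] (9 positive and 5 negative cases, split on an attribute with three values).

```python
import math

# Entropy I(p, n) of a binary target (Equation 13)
def entropy(p, n):
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:
            frac = count / total
            result -= frac * math.log2(frac)
    return result

# Information gain of splitting on one attribute (Equations 14 and 15).
# branches: list of (p_i, n_i) counts for each value of the attribute.
def gain(p, n, branches):
    total = p + n
    expected = sum((pi + ni) / total * entropy(pi, ni) for pi, ni in branches)
    return entropy(p, n) - expected

print(round(entropy(9, 5), 3))                          # 0.94
print(round(gain(9, 5, [(2, 3), (4, 0), (3, 2)]), 3))   # 0.247
```

ID3 greedily chooses, at each node, the attribute with the largest gain, which is exactly the quantity computed here.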
2.2.5 Random Forest Classifier
A set of ML models can often perform better than a single ML model. Such an integration of ML models is called ensemble learning. The Random Forest Classifier [ho1995random; ho2002data] uses ensemble learning by forming a set of decision trees and producing an output voted on by the individual decision trees. Random Forest draws random samples from the training data and learns a decision tree model from each sample, so that it obtains a set of decision trees (i.e., a forest). After machine learning, in the prediction (or application) stage, the class voted for by the majority of the learned decision trees is chosen as the final result. The following equation expresses such majority voting.

$\hat{y} = \mathrm{mode}\{T_1(x), T_2(x), \ldots, T_B(x)\}$  (16)

where $T_b$ is a single decision tree and the $\mathrm{mode}$ function yields as output the class label that is the most frequent among the set of classification results.
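The majority vote of Equation 16 reduces to taking the mode of the individual tree outputs; in this sketch the trees themselves are stubbed out as fixed votes.

```python
from collections import Counter

# Majority vote of Equation 16: the most frequent label among the
# predictions of the individual trees wins.
def majority_vote(predictions):
    return Counter(predictions).most_common(1)[0][0]

tree_outputs = ["yes", "no", "yes", "yes", "no"]   # votes from 5 stubbed trees
print(majority_vote(tree_outputs))  # yes
```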
2.2.6 Gradient Boosting Classifier
The Gradient Boosting Classifier [breiman1996arcing] uses an ensemble model consisting of a set of simple models (e.g., decision tree stumps: trees containing only one root and its immediately connected leaf nodes). By adding such simple models, the resulting ensemble model is sequentially improved and finally fitted to the data. In other words, after applying a simple model, the samples it misclassifies are emphasized when fitting the next simple model, and this process is repeated until convergence (or until achieving better predictive performance). The Gradient Boosting Classifier is a generalization of boosting (e.g., [schapire1990strength; freund1996experiments]) that uses the gradient of a loss function.
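The sequential-improvement idea can be sketched with decision stumps fitted to residuals, which is gradient boosting under squared loss; the one-feature toy data, learning rate, and 0.5 decision threshold are illustrative simplifications, not this paper's actual classifier.

```python
# Fit one decision stump: try every threshold, keep the split with the
# lowest squared error against the current residuals.
def fit_stump(xs, residuals):
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv

# Boosting loop: each stump is fitted to the residuals (the negative
# gradient of squared loss) of the ensemble built so far.
def fit_boosted(xs, ys, n_stumps=50, lr=0.5):
    pred = [0.0] * len(xs)
    stumps = []
    for _ in range(n_stumps):
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 1, 1, 1]          # labels encoded as 0/1
f = fit_boosted(xs, ys)
print([1 if f(x) > 0.5 else 0 for x in xs])  # [0, 0, 0, 1, 1, 1]
```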
3 Analysis Methods
In this section, we introduce the specific processes of our analysis and the results from the analysis.
3.1 Business Understanding
In this paper, our goal is twofold. (1) The first goal is to determine the most accurate machine learning model among Logistic Regression, Gaussian Naive Bayes, Decision Tree, Random Forest, and Gradient Boosting. (2) The second is to construct a public web page that will allow adolescents to know whether they will have the intention to smoke.
We utilize the data from NYTS for 2018 in order to construct ML models, and the models are analyzed to choose the most accurate one in predicting the intention for a nonsmoker to smoke cigarettes.
3.2 Data Understanding
We found reliable data in the National Youth Tobacco Survey 2018 [2018NYTS_DataSet], a nationwide survey in the US of middle school (grades 6-8) and high school (grades 9-12) youths' tobacco-related beliefs, attitudes, behaviors, and exposure to pro- and anti-tobacco influences. NYTS implements a three-stage cluster sampling design in order to obtain nationally representative data for students in grades 6-12 in all 50 states and the District of Columbia [NYTS]. However, the data covers only youths who are currently attending middle or high school, which means that the results may not apply to youths who are not.
There are a total of 88 questions in the survey (Table 8), and we examined all of them and picked the questions necessary for the model to predict whether a person has the intention to smoke. Table 1 describes the format of the questions and the answer choices as written on the questionnaire. For example, in the first row of Table 1, Q1 represents question 1, and the next column gives the actual question, which is "How old are you?" The possible answer choices are displayed in the last column and range from 9 to 19 years old.





Number  Question  Answer Choices
Q1  How old are you?  9, 10, …, 19 years old
Q2  What is your sex?  Male/Female
Q3  What grade are you in?  6, 7, …, 12, ungraded or other grade
Q4  Are you Hispanic, Latino, Latina, or of Spanish origin?  Select one or more
…  …  …
Q88  Because of a physical…making decisions?  No/Yes
3.3 Data Preparation
Our original data was taken from the National Youth Tobacco Survey 2018. This data set was composed of questions (Q1, Q2, …) and answers represented by numbers (e.g., 1, 2, 3, and 4) or words (e.g., Yes and No). We filtered every question to prepare for machine learning. For example, a null answer meant that the question was not answered, so we replaced all nulls with 0's, which represent unanswered choices.
The next process involved classifying related data. When we first obtained the data, it was a set of 20,189 rows x 195 columns. As mentioned in Section 1, our goal is to construct a prediction model and a public web page. To achieve this goal, we needed to divide the data into two groups: those who had never smoked a cigarette and those who had. All rows for individuals who had ever smoked in their lives were then deleted, because we only needed to analyze the youths who had never smoked cigarettes. The purpose of this first split between cigarette users and non-users is to form a predictive model. Next, we extracted our target questions and the questions pertaining to them. There are 88 questions in the survey, but not every question was related to our prediction; after examining all of them, we chose specific questions, since redundancies and only indirect correlations were present in the rest. Out of the 88 questions, 47 were used. Specifically, questions 15, 16, 17, 43, 44, and 45 ("do you think that you will try a cigarette soon?", "do you think you will smoke a cigarette in the next year?", "if one of your best friends were to offer you a cigarette, would you smoke it?", "do you think that you will try smoking tobacco in a hookah or waterpipe soon?", "do you think you will smoke tobacco in a hookah or waterpipe in the next year?", and "if one of your best friends were to offer you a hookah or waterpipe with tobacco, would you try it?", respectively) were our target questions, because they are all related to someone's intention to smoke, no matter how small the intention. We decided that the answer choices for the target questions (definitely yes, probably yes, probably no, definitely no) could be simplified to two choices by treating definitely yes and probably yes as "yes" and definitely no and probably no as "no".
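The preparation steps above (nulls to 0, dropping ever-smokers, binarizing the target) can be sketched on a made-up miniature of the survey rows; the dictionary layout is illustrative, not the actual NYTS file format.

```python
# Three made-up respondents; Q7 = "Have you ever tried cigarette smoking?",
# Q16 = the intention target question, Q1 = age.
rows = [
    {"Q7": "No",  "Q16": "Definitely no", "Q1": 15},
    {"Q7": "Yes", "Q16": "Probably yes",  "Q1": 16},    # ever-smoker: dropped
    {"Q7": "No",  "Q16": "Probably yes",  "Q1": None},  # unanswered age
]

# 1. replace nulls with 0 (unanswered choice)
for row in rows:
    for key, value in row.items():
        if value is None:
            row[key] = 0

# 2. keep never-smokers only
never_smokers = [r for r in rows if r["Q7"] == "No"]

# 3. collapse the four-level target answers to yes/no
def binarize(answer):
    return "yes" if answer in ("Definitely yes", "Probably yes") else "no"

targets = [binarize(r["Q16"]) for r in never_smokers]
print(len(never_smokers), targets)  # 2 ['no', 'yes']
```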
3.4 Modeling
We used five ML algorithms to generate ML models from training data and evaluated the accuracy of each model. The models include (1) Decision Tree Classifier, (2) Gaussian NB Classifier, (3) Logistic Regression Classifier, (4) Gradient Boosting Classifier, and (5) Random Forest Classifier. From the whole data set of Subsection 3.3, we assigned a training data set (80 percent of the original data) and a test data set (20 percent of the original data). The training data set is used to learn each prediction model, while the test data set is used to test the learned model by evaluating its accuracy. After examining the prediction accuracy, we can identify the model that shows the highest accuracy, as described in Subsection 3.5.
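A sketch of this modeling step using scikit-learn, which this paper's implementation also uses (Section 4); the synthetic data stands in for the real NYTS features, so the scores it produces are not the paper's results.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Synthetic stand-in for the prepared survey data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 80/20 train/test split, as described above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Gaussian NB": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)            # learn on the training set
    scores[name] = model.score(X_test, y_test)   # accuracy on the test set
    print(f"{name}: {scores[name]:.3f}")
```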
3.5 Evaluation
Table 2 to Table 6 show the prediction results obtained from the five machine learning models. Note that the terms precision, recall, and F1-score used in the tables are explained in Subsection 2.1.5. The first column of each table lists the answer choices for question 16, "Do you think you will smoke a cigarette in the next year?". This step examines how the factors relate to our target question, Q16, and we used the machine learning models to find which factors affect people's answers to Q16, i.e., their intention to smoke cigarettes.







Table 2: Decision Tree Classifier
  Precision  Recall  F1-Score  Cases
Yes  0.60  0.42  0.49  201
No  0.94  0.97  0.96  2049
Macro Avg  0.77  0.70  0.72  2250
Weighted Avg  0.91  0.92  0.92  2250

Table 3: Gaussian NB Classifier
  Precision  Recall  F1-Score  Cases
Yes  0.25  0.77  0.38  201
No  0.97  0.77  0.86  2049
Macro Avg  0.61  0.77  0.62  2250
Weighted Avg  0.91  0.77  0.82  2250

Table 4: Logistic Regression Classifier
  Precision  Recall  F1-Score  Cases
Yes  0.66  0.30  0.41  201
No  0.93  0.98  0.96  2049
Macro Avg  0.80  0.64  0.69  2250
Weighted Avg  0.91  0.92  0.91  2250

Table 5: Random Forest Classifier
  Precision  Recall  F1-Score  Cases
Yes  0.65  0.35  0.46  201
No  0.94  0.98  0.96  2049
Macro Avg  0.79  0.67  0.71  2250
Weighted Avg  0.91  0.92  0.91  2250

Table 6: Gradient Boosting Classifier
  Precision  Recall  F1-Score  Cases
Yes  0.71  0.37  0.49  201
No  0.94  0.99  0.96  2049
Macro Avg  0.83  0.68  0.73  2250
Weighted Avg  0.92  0.93  0.92  2250
The precision, recall, F1-score, and case counts are shown in the tables above. The Gradient Boosting Classifier showed the most accurate results of all the models. Although the F1-score of the answer choice "Yes" (0.49) for the Gradient Boosting Classifier is among the highest of the models, it is still not a high value. A possible reason for the low F1-score is that the NYTS data was insufficient and too limited to create an accurate model. In addition, the ML models chosen might not have been the best, and there might be better algorithms to fit the NYTS data. Finally, the questions, or x variables, given and chosen might not be the most fitting ones, and better options may exist. Potential questions, or x variables, that could have been added to the NYTS data include detailed family history, the respondent's location, and possibly the personality or qualities of the respondent.
Table 7 and Figure 2 show the training-score and test-score accuracy of each machine learning model.








  Decision Tree  Gaussian NB  Logistic Regression  Random Forest  Gradient Boosting
Training Score  0.9298  0.7695  0.9352  0.9298  0.9366
Test Score  0.9226  0.7728  0.9235  0.9248  0.9306
The training scores resulted from Cross Validation (CV). The training scores for Decision Tree, Gaussian NB, Logistic Regression, Random Forest, and Gradient Boosting were 0.9298, 0.7695, 0.9352, 0.9298, and 0.9366, respectively. Also, the test scores were 0.9226, 0.7728, 0.9235, 0.9248, and 0.9306, respectively.
In Figure 2, the x-axis represents the ML models and the y-axis represents the training score, test score, macro average, and weighted average. The figure shows that the Gradient Boosting Classifier has the highest accuracy among the models.
3.6 Deployment
Our deployment process is through a public web page in which users (especially adolescents) enter their information to predict their possibility of smoking cigarettes. The public web page (http://nyts.pythonanywhere.com) gives users instant access and a quick response with the predicted result. The architecture and structure of the web page are introduced in Section 4.
4 Prediction System for Future Smoking
The best prediction model (i.e., Gradient Boosting) introduced in Section 3 is utilized in the public web page (http://nyts.pythonanywhere.com) for the purpose of an anti-smoking campaign for teenagers. This public web page aims to inform teenagers of the possibility of future smoking, so that it may help prevent harmful consequences for adolescents' health. In this section, we introduce the public web page we developed.
Figure 3 shows a sequence diagram representing how the user (e.g., an e-cigarette-using or non-smoking teenager) interacts with the website. The user, web server, and ML model reasoner, represented by the boxes, denote the main entities. The ML model reasoner contains the Gradient Boosting model learned in Section 3 using the NYTS data. The path of each entity runs from top to bottom, and the vertical rectangle indicates the life cycle of the entity. The arrows across the vertical rectangles represent event flows. The following list summarizes the event flows.

1. A user enters the web server.
2. The web server shows the 47 questions to the user.
3. The user inputs answers to all the questions.
4. The web server receives the answers from the user and sends them to the ML model reasoner.
5. The ML model reasoner performs prediction using the answers and sends the results back to the web server.
6. The web server displays the prediction results in a chart.
The web server was developed in the Python environment. We used Flask v1.1.1, a Python-based web server, and "pythonanywhere.com" for web hosting. The Gradient Boosting model was learned using Scikit-Learn v0.21.3 and stored using the Pickle Python module to serialize and deserialize the learned model. For reasoning with the model, Scikit-Learn v0.21.3 was again used. A web page example can be found in Figure 4.
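A minimal sketch of this Flask-plus-pickle arrangement; the /predict route, the JSON payload shape, and the fixed stand-in probability are illustrative assumptions, not the actual interface of nyts.pythonanywhere.com.

```python
import pickle
from flask import Flask, request, jsonify

# Serialize and deserialize with pickle, as done for the learned model.
# A trivial stand-in (a fixed probability) replaces the real fitted
# Gradient Boosting classifier here.
blob = pickle.dumps({"fixed_probability": 0.3})
model = pickle.loads(blob)

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    answers = request.get_json()["answers"]   # the 47 form answers
    # a real deployment would call the classifier's predict_proba here
    prob = model["fixed_probability"]
    return jsonify({"smoking_intention_probability": prob})

# exercise the endpoint with Flask's built-in test client
client = app.test_client()
resp = client.post("/predict", json={"answers": [0] * 47})
print(resp.get_json())  # {'smoking_intention_probability': 0.3}
```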
5 Conclusion
E-cigarette use has increased among adolescents. This is a worldwide problem, because, as stated in many of the studies mentioned in the introduction, e-cigarette use can lead to future use of cigarettes. Since e-cigarette use is a recently rising issue, little research has been done on this topic compared to smoking cigarettes. Even among the studies done, few implement prediction models, which are more practical for preventing adolescents from using (e-)cigarettes. Thus, we used the 2018 NYTS data and developed multiple prediction models to predict an adolescent's intention to smoke cigarettes.
The most accurate prediction model was the Gradient Boosting Classifier, with an overall accuracy of 93%. This model was applied in the website we designed to allow the public to input their information with respect to tobacco products, including e-cigarettes, cigarettes, and cigars. With this information, the algorithm can predict the respondent's probability of future smoking. This will help the public become more aware of certain factors in their lives and be attentive to how their drug use or environment can affect their intention to smoke cigarettes.
Further research could include a wider range of ages, since our research focused mainly on adolescents rather than adults. In order to improve the accuracy of the prediction model, it is essential to increase the amount of data or to choose better-fitting variables.
Appendix A
A.1
This is the list of questions that were used for data analysis and what they were used for. The "Number" column specifies the question number from the 2018 NYTS questionnaire. The "Question" column lists the exact question from the questionnaire, and the "Where to Use" column specifies the purpose of the question. The term "Predictor Variable" means that the question was considered a factor, or x variable, in predicting the intention for a person to smoke. "Data Selection for Non-Smoker" means that the question was utilized to separate the data into non-smoker and smoker groups. "Data Selection for Smoking Intention" means that the question was used as a target question, or y variable, because the goal is to predict intention to smoke cigarettes. "Data Selection for Non-E-Smoker" means that the question was used to separate the non-e-cigarette users from the e-cigarette users.
Number  Question  Where to Use
Q1  How old are you?  Predictor Variable
Q2  What is your sex?  Predictor Variable
Q3  What grade are you in?  Predictor Variable
Q4  Are you Hispanic, Latino, Latina, or of Spanish origin?  Predictor Variable
Q5  What race or races do you consider yourself to be?  Predictor Variable
Q6  Have you ever been curious about smoking a cigarette?  Predictor Variable
Q7  Have you ever tried cigarette smoking, even one or two puffs?  Data Selection for Non-Smoker
Q15  Do you think that you will try a cigarette soon?  Data Selection for Smoking Intention
Q16  Do you think you will smoke a cigarette in the next year?  Data Selection for Smoking Intention
Q17  If one of your best friends were to offer you a cigarette, would you smoke it?  Data Selection for Smoking Intention
Q18  …  Predictor Variable
Q19  …  …
Q23  …  Predictor Variable
Q24  …  …
Q27  …  Predictor Variable
Q28  …  …
Q29  …  Predictor Variable
Q30  …  Predictor Variable
Q31  …  Predictor Variable
Q32  …  Predictor Variable
Q33  …  Predictor Variable
Q34  …  Predictor Variable
Q35  …  Predictor Variable
Q36  …  Predictor Variable
Q37  …  Predictor Variable
Q38  …  Predictor Variable
Q39  …  …
Q43  Do you think that you will try smoking tobacco in a hookah or waterpipe soon?  Data Selection for Smoking Intention
Q44  Do you think you will smoke tobacco in a hookah or waterpipe in the next year?  Data Selection for Smoking Intention
Q45  If one of your best friends were to offer you a hookah or waterpipe with tobacco, would you try it?  Data Selection for Smoking Intention
Q59  …  …
Q61  …  Predictor Variable
Q62  …  Predictor Variable
Q63  …  Predictor Variable
Q64  …  Predictor Variable
Q65  …  Predictor Variable
Q66  …  Predictor Variable
Q67  …  Predictor Variable
Q68  …  Predictor Variable
Q69  …  Predictor Variable
Q70  …  Predictor Variable
Q71  …  Predictor Variable
Q72  …  Predictor Variable
Q73  …  Predictor Variable
Q74  …  Predictor Variable
Q75  …  Predictor Variable
Q76  …  Predictor Variable
Q77  …  Predictor Variable
Q78  …  Predictor Variable
Q79  …  Predictor Variable
Q80  …  Predictor Variable
Q81  …  Predictor Variable
Q82  …  Predictor Variable
Q83  …  Predictor Variable
Q84  …  Predictor Variable
Q85  …  Predictor Variable
Q86  …  Predictor Variable
Q87  …  Predictor Variable
Q88  …  Predictor Variable
A.2
This is a glimpse of the website. After answering a total of 47 questions and clicking submit, the bubble fills up and shows the probability that the user will smoke in the future.