An influence-based fast preceding questionnaire model for elderly assessments

11/22/2017 ∙ by Tong Mo, et al. ∙ IBM, Peking University

To improve the efficiency of elderly assessments, an influence-based fast preceding questionnaire model (FPQM) is proposed. Compared with traditional assessments, the FPQM optimizes questionnaires by reordering their attributes so that the values of low-ranking attributes can be predicted from the values of high-ranking attributes. Therefore, the number of attributes that must be asked can be reduced without redesigning the questionnaires. A new function for calculating the influence of the attributes is proposed based on probability theory, and reordering and reducing algorithms are given based on the attributes' influences. The model is verified through a practical application: the practice in an elderly-care company shows that the FPQM can reduce the number of attributes by 90.56%. Compared with the Expert Knowledge, Rough Set and C4.5 methods, the FPQM achieves the best performance. In addition, the FPQM can also be applied to other questionnaires.

1 Introduction

Questionnaires have been widely used in various fields, including elderly assessments. Several questionnaires have been developed and are in extensive use to assess health-related quality of life (HRQOL)[20]. Aging is an increasingly serious social phenomenon in China, and there is a strong need for care services. Assessments of the elderly are essential for providing personalized services. Existing assessment methods are usually based on the Barthel Index[18] and the national industry standard for the ability assessment of elderly adults[28]. Many investigation attributes are needed to obtain information about the elderly systematically, and the elderly are asked about these attributes one after another. These assessment methods are inefficient, and the order in which the attributes are asked is not reasonable. When attributes are related, some unknown attributes can be predicted from known ones, and a more reasonable order should be determined[8, 9, 19].

Classical Test Theory (CTT), Rasch Analysis (RA), decision rules, and experts [22, 10, 15, 25, 4, 21] have been applied to reduce the length of health questionnaires. However, the removed attributes carry additional information, and a more reasonable order of the attributes could also be considered. Correlation, multiple regression, factor analysis, cluster analysis, structural equation modelling, and hierarchical multiple regression [7, 2, 24, 27] can be used to determine the relationships among the attributes of health questionnaires. Certain attributes can indeed be predicted from other attributes using hierarchical logistic regression, correlation analysis, and binary logistic stepwise regression [14, 1, 3, 11, 6]. However, each of these studies predicts only one attribute rather than multiple attributes simultaneously, and the attributes involved in each study are incomplete.

A solid mathematical definition of the problem is given, and the fast preceding questionnaire model (FPQM) is proposed to solve it in five steps. First, the influence of one attribute on all other attributes is defined and calculated. Second, we traverse every investigation attribute and choose the attribute with the largest influence as the best attribute to split. Third, we create the FPQM with the best attribute: we traverse every value of that attribute, and the sub-dataset corresponding to each value is used to obtain a sub-model recursively. Each sub-model is attached to the full model, and the full model is obtained when the recursion ends. Fourth, the created FPQM is used for the real investigation. At the beginning of the real investigation, values are asked for directly because there is no prior information about the respondent. Once sufficient information has been accumulated, certain investigation attributes can be inferred: when the confidence level is greater than the given threshold, the question about that attribute does not need to be asked. Fifth, we calculate the evaluation metrics and evaluate the FPQM.

This paper is organized as follows. Section 2 reviews related work. The fast preceding questionnaire model (FPQM) is introduced in Section 3: first, a solid mathematical definition of the problem is given; then, the influence calculation formula, the best attribute to split choosing algorithm (BASCA), the fast preceding questionnaire model creating algorithm (FPQMCA), the model used for real investigation algorithm (MURIA), and the model evaluation algorithm (MEA) are presented. Section 4 shows the experimental results, presenting the experimental data, the evaluation metrics, the overall results of the FPQM, the comparison with Expert Knowledge, Rough Set, and C4.5, and the factor analysis, which covers the number of elderly, the number of investigation attributes, and the threshold. Section 5 concludes the paper.

2 Related work

2.1 Attribute reduction

Luis Prieto [22] presents a parallel reduction of a 38-attribute questionnaire, the Nottingham Health Profile (NHP), to empirically compare Classical Test Theory (CTT) and Rasch Analysis (RA) results. CTT results in 20 attributes (4 dimensions), whereas RA results in 22 attributes (2 dimensions). Moreover, the attribute-total correlation ranges from 0.45-0.75 for NHP20 and from 0.46-0.68 for NHP22, while the reliability ranges from 0.82-0.93 and from 0.87-0.94, respectively.

Ephrem Fernandez [10] reduces and reorganizes the McGill Pain Questionnaire (MPQ) using a 3-step decision rule for affective and evaluative descriptors of pain. With a minimum absolute frequency of 17 and a minimum relative frequency of 1/2 as the threshold values, the words of the MPQ are reduced from 78 to fewer than 20 on average, with a negligible loss of transmitted information. Moreover, Wasuwat Kitisomprayoonkul [15] develops the Thai Short-Form McGill Pain Questionnaire (Th-SFMPQ).

RC Rosen [25] develops an abridged five-attribute version (IIEF-5) of the 15-attribute International Index of Erectile Function (IIEF) to diagnose the presence and severity of erectile dysfunction (ED). The five attributes are selected based on their ability to identify the presence or absence of ED and on adherence to the National Institutes of Health’s definition of ED. The IIEF-5 possesses favorable properties for detecting the presence and severity of ED.

X. Badia [4] achieves a qualitative and quantitative reduction of the 179 expressions of the bone metastasis quality of life questionnaire (BOMET-QOL) with respect to clarity, frequency and importance, with the help of 15 experts. This phase, which is performed in two steps, results in a 35-attribute version of the BOMET-QOL. A first reduction via factor analysis then yields a 25-attribute questionnaire, and the BOMET-QOL-25 is in turn reduced to an integrated 10-attribute version using a sample of 263 oncology patients. The resulting BOMET-QOL is an accurate, reliable and precise 10-attribute instrument for assessing HRQOL.

Tamar E.C. Nijsten [21] tests and reduces Skindex-29 to Skindex-17 using Rasch Analysis. The Rasch Analysis of the combined emotion and social functioning subscale of Skindex-29 results in a 12-attribute psychosocial subscale. A total of five of the seven attributes are retained in a symptom subscale. Classical psychometric properties, such as the response distribution, attribute–rest correlation, attribute complexity, and internal consistency, of the two subscales of Skindex-17 are at least adequate. Skindex-17 is a Rasch-reduced version of Skindex-29, with two independent scores that can be used for the measurement of health-related quality of life (HRQOL) for dermatological patients.

[22, 10, 15, 25, 4, 21] remove some attributes directly, achieving qualitative and quantitative reductions of health questionnaires using Classical Test Theory (CTT), Rasch Analysis (RA), decision rules, or experts. These questionnaires include the Nottingham Health Profile (NHP), the McGill Pain Questionnaire (MPQ), the 15-attribute International Index of Erectile Function (IIEF), the bone metastasis quality of life questionnaire (BOMET-QOL), and Skindex-29. However, the removed attributes can provide additional information, and their values could instead be predicted from the remaining attributes rather than discarded. Meanwhile, a more reasonable order of the attributes is not considered.

2.2 Relationships among attributes

Alexandra-Lelia Dima [7] studies the interrelations between acceptance, emotions, illness perceptions and health status. The confirmatory analysis (employing a variety of statistical procedures, from correlation to multiple regression, factor analysis, cluster analysis and structural equation modelling) largely confirms the expected relations within and between domains and is also informative regarding the most suitable data reduction methods. An additional exploratory analysis focuses on identifying the comparative characteristics of acceptance, emotions, and illness perceptions in predicting health status metrics.

Arnow, Bruce A [2] provides estimates of the prevalence and strength of association between major depression and chronic pain in a primary care population and examines the clinical burden associated with the two conditions alone and together. Data are collected by questionnaires assessing major depressive disorder (MDD), chronic pain, pain-related disability, somatic symptom severity, panic disorder, other anxiety, probable alcohol abuse, and health-related quality of life (HRQL). The instruments include the Patient Health Questionnaire, SF-8, and the Graded Chronic Pain Questionnaire. The conclusions are that chronic pain is common among those with MDD and that comorbid MDD and disabling chronic pain are associated with a greater clinical burden than MDD alone.

A. Elizabeth Rippentrop [24] seeks to better understand the relationships between religion/spirituality and physical health, mental health, and pain in 122 patients with chronic musculoskeletal pain. Hierarchical multiple regression analyses reveal significant associations between components of religion/spirituality and physical and mental health. Forgiveness, negative religious coping, daily spiritual experiences, religious support, and self-rankings of religious/spiritual intensity significantly predict mental health status. Religion/spirituality is unrelated to pain intensity and to life interference due to pain, and it may have both costs and benefits for the health of those with chronic pain.

Susan W. Vines [27] determines the relationships between pain perceptions, immune function, depression and health behaviors and examines the effects of chronic pain on immune function using depression and health behaviors as covariates. Pain perceptions show positive significant correlations with depression (P = 0.01) and the total percent of NK cells (P = 0.04). Depression and health behaviors are negatively correlated (P = 0.01). Positive associations are observed for depression and 2 PHA mitogen levels (P < 0.05). The immune function of patients with chronic pain is significantly higher than that of the no-pain comparison group. Pain perceptions may have a deleterious effect on enumerative NK cell measures and depression levels.

Most of the attributes mentioned in [7, 2, 24, 27] are included in Table 4, such as acceptance, emotions, illness perceptions, and health status; depression, chronic pain, and clinical burden; religion/spirituality, physical health, mental health, and pain; and pain perceptions, immune function, depression and health behaviors. The only difference between the investigation attributes in Table 4 and the attributes mentioned in the literature lies in how they are expressed: the attributes in the literature are more conceptual. The methods applied include correlation, multiple regression, factor analysis, cluster analysis, structural equation modelling, and hierarchical multiple regression. The literature shows that relationships among these attributes do exist. However, the attributes covered by the relationships in each study are incomplete, and the relationships have not been exploited for results of interest such as prediction.

2.3 Prediction

Kersh, BC [14] uses psychosocial and health status variables independently to predict health care seeking for fibromyalgia. Subjects are administered 14 measures, which produce six domains of variables: background demographics and pain duration; psychiatric morbidity; and personality, environmental, cognitive, and health status factors. These domains are entered into 4 different hierarchical logistic regression analyses to predict status as patient or non-patient. The full regression model is statistically significant (P < 0.0001) and correctly identifies 90.7% of the subjects, with a sensitivity of 92.4% and a specificity of 87.2%.

Maki Aoyama [1] uses physical and functional factors in activities of daily living to predict falls in community-dwelling older women. Correlation analysis of the scores of assessment scales and actual measurements of muscle strength and balance shows significant correlations between handgrip strength and the Falls Efficacy Scale, Functional Reach test, Timed Up and Go test, Berg Balance Scale, Motor Fitness Scale, and Motor Functional Independence Measure in fallers and non-fallers. A binary logistic stepwise regression analysis reveals that only inability on the Motor Fitness Scale item “being able to go up and down the staircase” remains a significant predictor of falls.

Aydeniz, Ali [3] predicts falls in the elderly using physical, functional and sociocultural parameters. Falls are common in patients with weakness, fatigue, dizziness, and swelling in the legs and in subjects with appetite loss. Fallers have a lower functional status than non-fallers (p=0.028) and more depressive symptoms than non-fallers (p=0.019). The quality-of-life (NHP) subgroups, especially physical activity, energy level and emotional reaction, also differ (p=0.016, 0.015, and 0.005, respectively). Disability and mental status are similar in groups (p=0.006). Musculoskeletal problems, functional status and social status might contribute to falls.

Jennifer L. Gatz [11] uses depressive symptoms to predict Alzheimer’s disease (AD) and dementia. The total Center for Epidemiologic Studies Depression (CES-D) score is a significant predictor of AD and dementia when categorized as a dichotomous variable according to the cutoff scores of 16 and 17; a CES-D cutoff of 21 is a significant predictor of AD and a marginally significant predictor of dementia. When analyzed as a continuous variable, the CES-D score is marginally predictive of AD and dementia. Neither participant-reported history of depression nor participant-reported duration of depression is significant in predicting AD or dementia.

Madeline Cruice [6] predicts social participation in older adults with personal factors, communication and vision. Assessments are individually conducted in a face-to-face interview situation with the primary researcher, who is a speech pathologist. Social participation is shown to be associated with vision, communication activities, age, education and emotional health. Naming and hearing impairments are not reliable predictors of social participation. It is concluded that professionals interested in maintaining and improving the social participation of older people should strongly consider these predictors in community-directed interventions.

Most of the attributes mentioned in [14, 1, 3, 11, 6] are also included in Table 4, such as psychosocial and health status variables and health care; physical, functional and sociocultural parameters and falls; depressive symptoms, Alzheimer’s disease and dementia; and personal factors, communication, vision, and social participation. Health care, falls, Alzheimer’s disease, dementia, and social participation are predicted using methods such as hierarchical logistic regression, correlation analysis, and binary logistic stepwise regression. These studies show that certain attributes can indeed be predicted from other attributes. However, each study predicts only one attribute, not multiple attributes simultaneously, and the attributes involved in each study are incomplete. Prediction across the complete set of attributes therefore remains to be studied.

The relation between the related works and this research, and the motivation they provide, are explained here. [22, 10, 15, 25, 4, 21] show that certain attributes in Table 4 can be reduced directly, because the information in these attributes is redundant and contained in other attributes. [7, 2, 24, 27] provide further evidence of an inherent relationship among these attributes. [14, 1, 3, 11, 6] further show that, because of this underlying relationship, certain attributes can be predicted from other attributes. Together, these works form the foundation of this research. The proposed fast preceding questionnaire model (FPQM) can achieve state-of-the-art performance only when there is an inherent relationship between attributes; if no relationship exists, no method can predict certain attributes from others. The relationship is thus the foundation of all possible methods, including the FPQM. Whereas [22, 10, 15, 25, 4, 21] remove some attributes directly, the FPQM predicts the values of these attributes and preserves them. In addition, each study in [14, 1, 3, 11, 6] predicts only one attribute, while the FPQM can predict multiple attributes simultaneously.

3 Fast preceding questionnaire model (FPQM)

A solid mathematical definition of the problem is given, and the fast preceding questionnaire model (FPQM) is proposed to solve it in five steps.

Step 1: Calculate the influence

We calculate, in order: the confidence level that an attribute takes a value given that another attribute takes a value; the influence of an attribute taking a value on another attribute; the influence of an attribute on another attribute; the influence of an attribute on all other attributes; and, finally, the attribute that has the largest influence on all other attributes.

Step 2: Choose the best attribute to split

We traverse every investigation attribute, every other attribute, every value of the attribute, and every value of the other attribute to calculate every influence. This yields the influence of each investigation attribute on all other attributes, and we choose the attribute with the largest influence as the best attribute to split.

Step 3: Create the FPQM

After the best attribute to split is chosen, we traverse every value of that attribute and recursively obtain a sub-model from the sub-dataset corresponding to each value. Then, we attach each sub-model to the full model; the full model is obtained when the recursion ends.

Step 4: Use the FPQM for real investigation

Now, the FPQM can be used to investigate a new respondent. At the beginning of the real investigation, there is no prior information about the respondent; therefore, we ask for the value directly. After sufficient information has been accumulated, some investigation attributes can be inferred. If the confidence level is larger than the given threshold, then the attribute does not need to be asked about.

Step 5: Evaluate the model

After the FPQM is used to investigate the new respondent, the evaluation metrics can be calculated. The FPQM can be evaluated based on these metrics.

The five steps are also shown graphically in Figure 1. Step 1: Calculate the influence with Eqs. (1)-(5). Step 2: Choose the best attribute to split with Algorithm 1, using the influence calculated in Step 1. Step 3: Create the FPQM with Algorithm 2, calling Algorithm 1 at every recursion step to choose the best attribute to split. Step 4: Use the FPQM created in Step 3 for the real investigation with Algorithm 3. Step 5: Evaluate the model with Algorithm 4.

Figure 1: Illustration of the five steps.

3.1 Problem definition

Definition 3.1.

The training set is the collection of all individuals who are investigated in the training dataset; its i-th element is the i-th individual being investigated.

Definition 3.2.

The testing set is the collection of all individuals in the testing dataset; its i-th element is the i-th testing individual.

Definition 3.3.

The attribute set is the collection of investigation attributes; its j-th element is the j-th investigation attribute.

Definition 3.4.

The value set is the collection of all possible values on all investigation attributes; its j-th element is the collection of all possible values on the j-th investigation attribute.

Definition 3.5.

, where is the number of . will be used in the time complexity analysis of the following four algorithms.

Definition 3.6.

The training dataset is the matrix of the real values of all training individuals; each entry is the real value of an individual on an investigation attribute.

Definition 3.7.

The testing dataset is the matrix of the real values of all testing individuals; each entry is the real value of a testing individual on an investigation attribute. It can also be represented row by row, where each row is the vector of real values of one testing individual.

Definition 3.8.

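The final-value matrix contains the final values of all testing individuals; each entry is the final value of a testing individual on an investigation attribute. It can also be represented row by row, where each row is the vector of final values of one testing individual.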

Definition 3.9.

The accuracy-indication matrix shows whether each final value is equal to the corresponding real value: an entry is 1 when the final value equals the real value and 0 otherwise.

Definition 3.10.

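The confidence matrix contains the confidence levels with which the final values are taken; each entry is the confidence level of a testing individual taking its final value on an investigation attribute. It can also be represented row by row, where each row is the vector of confidence levels of one testing individual.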

Definition 3.11.

The prediction-indication matrix shows whether each value is predicted from other, already known attributes; each entry denotes either that the value is predicted or that it is obtained by asking the individual directly. A value is predicted when its confidence level is greater than the given threshold and is asked for otherwise. The matrix can also be represented row by row, where each row is the vector of indication values of one testing individual.

Definition 3.12.

Let be the collection of all reasonable orders in which individuals should be investigated, where is a reasonable order in which should be investigated. is a single substitution of . is different from (which it most likely is) when is different from .

Definition 3.13.

Let be the fast preceding questionnaire model. is a tree structure, and will determine , , and .

Definition 3.14.

Let be the collection of all appearing confidence levels when creating , where will determine .

Definition 3.15.

Let be the system of the fast preceding questionnaire model.

Definition 3.16.

Let be the space of all possible questionnaire models.

Definition 3.17.

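Average accuracy rate: the mean of the accuracy rates of all testing individuals, where the accuracy rate of an individual is the fraction of investigation attributes whose final value equals the real value. It describes how accurate the model can be.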

Definition 3.18.

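Average reduction rate: the mean of the reduction rates of all testing individuals, where the reduction rate of an individual is the fraction of investigation attributes whose values are predicted rather than asked. It describes how well the model can accelerate the questionnaire.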

Definition 3.19.

Average F-measure: a combination of the average accuracy rate and the average reduction rate controlled by a given parameter; it describes a balance between the two.

Definition 3.20.

The problem is defined as the following.

3.2 Influence calculation formula

To create the model from the training dataset, the influence of each investigation attribute on all others must be calculated. The influence calculation formula is given in Definitions 3.21-3.25 for the case in which the depth of the created model reaches t. The influence of an investigation attribute depends on the influences of its values.

Definition 3.21.

The confidence level of one investigation attribute taking a given value under the condition that another investigation attribute takes a given value, when the values of the previous k-1 layers are already known.

(1)

where the layer values are the values of the investigation attributes that are already known from the previous layers. The numerator is the number of individuals in the dataset who take the two given values on the two investigation attributes, respectively, and the denominator is the number of individuals who take the conditioning value on the conditioning attribute.
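Definition 3.21 describes the confidence level as a ratio of counts. The Python sketch below computes that ratio on the Table 1 data from Section 3.7, assuming a dataset is a list of dictionaries mapping attribute names to values. The function name `confidence` and the `known` argument (standing in for the values already fixed by the previous k-1 layers) are hypothetical, as is returning 0 when no individual satisfies the condition.

```python
from typing import Dict, List, Optional

Row = Dict[str, int]


def confidence(rows: List[Row], target: str, target_value: int,
               given: str, given_value: int,
               known: Optional[Dict[str, int]] = None) -> float:
    """Eq. (1) read as a ratio of counts (sketch).

    Only individuals consistent with the already-known values of the previous
    layers (`known`) are counted. The numerator counts those who take
    `target_value` on `target` and `given_value` on `given`; the denominator
    counts those who take `given_value` on `given`.
    """
    known = known or {}
    pool = [r for r in rows if all(r[a] == v for a, v in known.items())]
    denom = sum(1 for r in pool if r[given] == given_value)
    if denom == 0:
        return 0.0  # assumption: an empty condition yields confidence 0
    num = sum(1 for r in pool if r[given] == given_value and r[target] == target_value)
    return num / denom


ATTRS = ["Education", "Income", "Social Skills", "Work Ability", "Communication"]
TABLE1 = [dict(zip(ATTRS, v)) for v in
          [(0, 1, 0, 1, 1), (1, 2, 0, 0, 1), (1, 0, 1, 0, 1), (1, 0, 1, 1, 0)]]

print(confidence(TABLE1, "Work Ability", 0, "Education", 1))                 # no prior values
print(confidence(TABLE1, "Work Ability", 0, "Education", 1, {"Income": 0}))  # Income already known
```

With no prior values, two of the three individuals with Education = 1 have Work Ability = 0, giving 2/3; once Income = 0 is known, only two individuals remain and the confidence drops to 1/2, which shows how earlier layers change the conditional estimate.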

Definition 3.22.

The influence of an investigation attribute taking a given value on another investigation attribute, when the values of the previous k-1 layers are already known.

(2)

where is defined in Definition 3.21. Notice that is not defined as because is always true.

Definition 3.23.

The influence of one investigation attribute on another, when the values of the previous k-1 layers are already known.

(3)

where the weight is the confidence level of the conditioning attribute taking each of its values.

Definition 3.24.

The influence of an investigation attribute on all other investigation attributes, when the values of the previous k-1 layers are already known.

(4)

where .

Definition 3.25.

The investigation attribute that has the largest influence on all other investigation attributes.

(5)

3.3 Best attribute to split choosing algorithm (BASCA)

When creating the FPQM, it is necessary to choose the best attribute to split. When the depth of the created model reaches t, we traverse every investigation attribute; for each one, we traverse every other investigation attribute and calculate the influence of the investigation attribute on all other attributes. Lines 5, 8, 10, 12, and 14 are calculated with Eqs. (1)-(5), respectively. Then, we choose the attribute that has the largest influence on all other attributes as the best attribute to split. The pseudocode is shown in Algorithm 1.

1:for (do
2:     for ( & do
3:         for each value of  do
4:              for each value of  do
5:                  Calculate
6:                  Add to
7:              end for
8:              Calculate
9:         end for
10:         Calculate
11:     end for
12:     Calculate
13:end for
14:Calculate
15:return
Algorithm 1 Best attribute to split choosing algorithm (BASCA)
Input: ,
Output:

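The BASCA returns the investigation attribute that has the largest influence on all other investigation attributes. The collection of confidence levels in Definition 3.14 is a global variable that is filled in at line 6, so calling the BASCA also yields it.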
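The loop structure of Algorithm 1 can be sketched in Python as follows. Only the outer shape is taken from the text: Eq. (3) weights each value by how often it occurs, Eq. (4) accumulates over the other attributes, and Eq. (5) takes the maximum. Because the exact forms of Eqs. (2) and (4) were lost, `value_influence` (the highest confidence with which one value determines another attribute) and the plain sum in `total_influence` are assumptions, and all function names are hypothetical. `rows` is taken to be the sub-dataset already consistent with the known layer values.

```python
from collections import Counter
from typing import Dict, List

Row = Dict[str, int]


def value_influence(rows: List[Row], given: str, w: int, target: str) -> float:
    """Assumed Eq. (2): highest confidence with which `given = w` determines `target`."""
    matching = [r[target] for r in rows if r[given] == w]
    if not matching:
        return 0.0
    return max(Counter(matching).values()) / len(matching)


def attribute_influence(rows: List[Row], given: str, target: str) -> float:
    """Eq. (3): value influences weighted by the relative frequency of each value of `given`."""
    freq = Counter(r[given] for r in rows)
    return sum(cnt / len(rows) * value_influence(rows, given, w, target)
               for w, cnt in freq.items())


def total_influence(rows: List[Row], given: str, attrs: List[str]) -> float:
    """Assumed Eq. (4): sum of the influences on every other attribute."""
    return sum(attribute_influence(rows, given, t) for t in attrs if t != given)


def choose_best_attribute(rows: List[Row], attrs: List[str]) -> str:
    """Eq. (5) / Algorithm 1: the attribute with the largest total influence."""
    return max(attrs, key=lambda a: total_influence(rows, a, attrs))


ATTRS = ["Education", "Income", "Social Skills", "Work Ability", "Communication"]
TABLE1 = [dict(zip(ATTRS, v)) for v in
          [(0, 1, 0, 1, 1), (1, 2, 0, 0, 1), (1, 0, 1, 0, 1), (1, 0, 1, 1, 0)]]

best = choose_best_attribute(TABLE1, ATTRS)
print(best, total_influence(TABLE1, best, ATTRS))
```

On the Table 1 data this sketch selects Income, whose total influence is 3.5. Under this assumed form, attributes with more distinct values tend to score higher, much as information gain does in decision trees, so a real implementation may need to account for that bias.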

3.4 Fast preceding questionnaire model creating algorithm (FPQMCA)

Now, the FPQM can be created with the above groundwork. After the best attribute to split is chosen, we traverse every value of it and recursively obtain a sub-model from the sub-dataset corresponding to each value. Then, we attach each sub-model to the full model. Algorithm 2 gives the pseudocode.

1:Create a node
2:Let be the number of 0s in
3:if  then
4:     Let be the investigation attribute whose corresponding index in is 1
5:     return as a leaf node labeled with
6:else
7:     
8:     Label node with
9:     Let be the corresponding index in of
10:     
11:     for each value of  do
12:         Let be the set of data tuples in satisfying on
13:         
14:         Attach the node to node
15:     end for
16:end if
17:return
Algorithm 2 Fast preceding questionnaire model creating algorithm (FPQMCA)
Input: , , Index list: ,
Output: Fast preceding questionnaire model:

The index list is initialized with a zero vector, and the FPQMCA can then be called to obtain the fast preceding questionnaire model.
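A recursive Python sketch of Algorithm 2 follows. It repeats an assumed influence function (frequency-weighted best-guess confidence, since the exact Eq. (2) was lost), represents the model as a tree of hypothetical `Node` objects, and stops when a single attribute remains. A list of remaining attribute names plays the role of the index list. It branches only on values that actually occur in the current sub-dataset, which is a simplification; the stored value counts (`dist`) are what an investigation step could later use as confidence levels.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List

Row = Dict[str, int]


@dataclass
class Node:
    """Hypothetical tree node: the attribute handled here and its value counts."""
    attr: str
    dist: Counter  # value counts of `attr` among training rows reaching this node
    children: Dict[int, "Node"] = field(default_factory=dict)


def influence(rows: List[Row], given: str, candidates: List[str]) -> float:
    """Assumed Eqs. (2)-(4): frequency-weighted best-guess confidence, summed over targets."""
    n = len(rows)
    total = 0.0
    for target in candidates:
        if target == given:
            continue
        for w, cnt in Counter(r[given] for r in rows).items():
            sub = Counter(r[target] for r in rows if r[given] == w)
            total += (cnt / n) * (max(sub.values()) / cnt)  # weight * confidence
    return total


def build_fpqm(rows: List[Row], remaining: List[str]) -> Node:
    """Sketch of Algorithm 2 (FPQMCA): split on the most influential attribute, then recurse."""
    if len(remaining) == 1:  # only one attribute left: leaf node
        attr = remaining[0]
        return Node(attr, Counter(r[attr] for r in rows))
    best = max(remaining, key=lambda a: influence(rows, a, remaining))
    node = Node(best, Counter(r[best] for r in rows))
    rest = [a for a in remaining if a != best]
    for value in sorted(node.dist):  # only values present in this sub-dataset
        subset = [r for r in rows if r[best] == value]
        node.children[value] = build_fpqm(subset, rest)
    return node


ATTRS = ["Education", "Income", "Social Skills", "Work Ability", "Communication"]
TABLE1 = [dict(zip(ATTRS, v)) for v in
          [(0, 1, 0, 1, 1), (1, 2, 0, 0, 1), (1, 0, 1, 0, 1), (1, 0, 1, 1, 0)]]

model = build_fpqm(TABLE1, ATTRS)
print(model.attr, sorted(model.children))
```

On the Table 1 data the root of this sketched model is Income, with one branch for each of its three values 0, 1, and 2; ties between equally influential attributes are broken by list order.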

3.5 Model used for real investigation algorithm (MURIA)

With the created FPQM, a new individual in the testing dataset can be investigated quickly. At the beginning of the real investigation, there is no information about the respondent; therefore, the attribute must be asked about directly. After sufficient information has been accumulated, some investigation attributes can be inferred; the others are still asked about.

An index indicates whether the current investigation attribute is the top attribute, taking one value for the top attribute and another for all other attributes. If the current investigation attribute is the top attribute, there is no information about the respondent, so the attribute cannot be predicted.

The final values, prediction indications, reasonable orders, and confidence levels are global variables, defined in Definitions 3.8, 3.11, 3.12, and 3.10, respectively. The pseudocode is shown in Algorithm 3.

1:Let be the top investigation attribute index of . Let be the top investigation attribute of . Let be the real value of on . Let be the final value of on V(M(1)). Let be the indication value of on . Let be the confidence level of on . Let be the current attribute of .
2:if  then
3:     
4:     
5:     
6:     
7:end if
8:
9:Let be the sub-model of when takes .
10:Let be the top investigation attribute index of . Let be the top investigation attribute of . Let be the real value of on . Let be the final value of on . Let be the indication value of on . Let be the confidence level of on . Let be the next attribute of . Let be possible values on . Let be the confidence level of taking under the condition of taking .
11:
12:if  then
13:     
14:     
15:     
16:     
17:else
18:     
19:     
20:     
21:     
22:end if
23:if  then
24:     return
25:else
26:     return
27:end if
Algorithm 3 Model used for real investigation algorithm (MURIA)
Input: , , , threshold of confidence level: , Index:
Output: Last node:

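The top-attribute index starts at its initial value. We traverse every individual in the testing dataset and call the algorithm for each one to obtain the final values, prediction indications, reasonable orders, and confidence levels.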
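A Python sketch of the investigation loop in Algorithm 3 is given below, assuming a tree whose nodes store the value counts of their attribute among the training individuals that reach them. The node's most frequent value is used as the prediction and its relative frequency as the confidence level; answers that are asked receive confidence 1; the top attribute is always asked; and attributes the tree never reaches are asked at the end. These choices, the `Node` class, the function name `investigate`, and the hand-built three-attribute tree are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Row = Dict[str, int]


@dataclass
class Node:
    """Hypothetical tree node, as a model-building step might produce it."""
    attr: str
    dist: Counter
    children: Dict[int, "Node"] = field(default_factory=dict)


def investigate(root: Node, respondent: Row, threshold: float):
    """Sketch of Algorithm 3 (MURIA) for one respondent.

    Returns final values, predicted flags, confidence levels, and the order followed.
    `respondent` holds the person's real answers and is read only when a question
    is actually asked.
    """
    final: Dict[str, int] = {}
    predicted: Dict[str, bool] = {}
    conf: Dict[str, float] = {}
    order: List[str] = []
    node: Optional[Node] = root
    is_top = True
    while node is not None:
        attr = node.attr
        order.append(attr)
        value, count = node.dist.most_common(1)[0]
        c = count / sum(node.dist.values())
        if not is_top and c > threshold:  # confident enough: predict, do not ask
            final[attr], predicted[attr], conf[attr] = value, True, c
        else:                             # top attribute or low confidence: ask directly
            final[attr], predicted[attr], conf[attr] = respondent[attr], False, 1.0
        node = node.children.get(final[attr])  # descend along the final value
        is_top = False
    for attr in respondent:  # assumption: attributes the tree never reached are asked
        if attr not in final:
            final[attr], predicted[attr], conf[attr] = respondent[attr], False, 1.0
            order.append(attr)
    return final, predicted, conf, order


# Hypothetical three-attribute model, built by hand for illustration only.
tree = Node("A", Counter({0: 3, 1: 1}), {
    0: Node("B", Counter({1: 3}), {1: Node("C", Counter({0: 2, 1: 1}))}),
    1: Node("B", Counter({0: 1}), {0: Node("C", Counter({1: 1}))}),
})
final, predicted, conf, order = investigate(tree, {"A": 0, "B": 0, "C": 1}, threshold=0.9)
print(final, predicted, order)
```

Here A is asked because it is the top attribute, B is predicted as 1 with confidence 1 (wrongly, since the real answer is 0), and C is asked because its confidence of 2/3 does not exceed 0.9. A wrong prediction like this one is exactly what the accuracy rate is meant to capture.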

3.6 Model evaluation algorithm (MEA)

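Now, the FPQM should be evaluated to determine its performance. Various evaluation metrics are calculated by the model evaluation algorithm (MEA). First, the indication matrix of Definition 3.9 is calculated from the final values and the real values. Then, the average accuracy rate, the average reduction rate, and the average F-measure are obtained from this matrix and the prediction indications of Definition 3.11. A larger average F-measure indicates a better FPQM. Algorithm 4 is the pseudocode.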

1:for  to  do
2:     for  to  do
3:         if  then
4:              
5:         else
6:              
7:         end if
8:     end for
9:end for
10:Let Sum1, Sum2, Sum3, and Sum4 be temporary variables
11:
12:
13:
14:for  to  do
15:     
16:     
17:     for  to  do
18:         
19:         
20:     end for
21:     
22:     
23:     
24:     
25:     
26:     
27:end for
28:
29:
30:
31:return , ,
Algorithm 4 Model evaluation algorithm (MEA)
Input: , , ,
Output: , ,

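Once the parameter of the F-measure is given, the average accuracy rate, the average reduction rate, and the average F-measure can be obtained by calling the MEA.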
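Algorithm 4 can be sketched in Python as follows. The per-individual accuracy and reduction rates follow Definitions 3.9, 3.11, 3.17 and 3.18. The F-measure is assumed to be the usual weighted harmonic mean F_β = (1 + β²)·A·R / (β²·A + R), computed per individual and then averaged, because the exact formula of Definition 3.19 was lost. The function name `evaluate` and the row-per-individual layout are hypothetical.

```python
from typing import Sequence, Tuple


def evaluate(real: Sequence[Sequence[int]], final: Sequence[Sequence[int]],
             predicted: Sequence[Sequence[bool]], beta: float = 1.0) -> Tuple[float, float, float]:
    """Sketch of Algorithm 4 (MEA): average accuracy rate, average reduction rate, average F-measure.

    Rows are testing individuals and columns are investigation attributes.
    """
    b2 = beta * beta
    acc_sum = red_sum = f_sum = 0.0
    for r_row, f_row, p_row in zip(real, final, predicted):
        m = len(r_row)
        acc = sum(r == f for r, f in zip(r_row, f_row)) / m  # share of correct final values
        red = sum(p_row) / m                                 # share of attributes not asked
        denom = b2 * acc + red
        f_sum += (1 + b2) * acc * red / denom if denom > 0 else 0.0
        acc_sum += acc
        red_sum += red
    n = len(real)
    return acc_sum / n, red_sum / n, f_sum / n


aar, arr, af = evaluate(
    real=[[0, 0, 1], [1, 1, 0]],
    final=[[0, 1, 1], [1, 1, 0]],
    predicted=[[False, True, False], [False, True, True]],
    beta=1.0,
)
print(aar, arr, af)
```

For these two hypothetical individuals the first has accuracy 2/3 and reduction 1/3, the second accuracy 1 and reduction 2/3, so the averages are 5/6, 1/2, and 28/45. A larger β weights the reduction rate more heavily.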

3.7 An example

The following example illustrates Definitions 3.1-3.25 and Algorithms 1-4.

Individual  Education  Income  Social Skills  Work Ability  Communication
1           0          1       0              1             1
2           1          2       0              0             1
3           1          0       1              0             1
4           1          0       1              1             0
Table 1: Example Training Dataset
Individual  Education  Income  Social Skills  Work Ability  Communication
1           1          1       0              1             0
2           0          1       1              0             1
Table 2: Example Testing Dataset
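Because the numbers in the example equations did not survive extraction, the following Python sketch shows how depth-1 confidence levels can be read off Table 1 when no values are known yet, reading the confidence level as the ratio of counts described in Definition 3.21. The attribute pair (Communication given Education) and the function name `depth1_confidences` are chosen only for illustration and are not necessarily what the original equations compute.

```python
from collections import Counter

# Table 1 (Example Training Dataset), one tuple per individual:
# (Education, Income, Social Skills, Work Ability, Communication)
TABLE1 = [(0, 1, 0, 1, 1), (1, 2, 0, 0, 1), (1, 0, 1, 0, 1), (1, 0, 1, 1, 0)]
COLS = ["Education", "Income", "Social Skills", "Work Ability", "Communication"]


def depth1_confidences(target: str, given: str):
    """Confidence of `target = v` given `given = w` with no previously known values."""
    ti, gi = COLS.index(target), COLS.index(given)
    joint = Counter((row[gi], row[ti]) for row in TABLE1)  # counts of (w, v) pairs
    marginal = Counter(row[gi] for row in TABLE1)          # counts of w alone
    target_values = sorted({row[ti] for row in TABLE1})
    return {(w, v): joint[(w, v)] / marginal[w]
            for w in sorted(marginal) for v in target_values}


print(depth1_confidences("Communication", "Education"))
```

For Education = 0 the only individual has Communication = 1 (confidence 1), while for Education = 1 two of the three individuals have Communication = 1 (confidence 2/3) and one has Communication = 0 (confidence 1/3).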

When , no values of the investigation attributes are known yet. By Eq. (3.2),

(6)
(7)
(8)

By Eq. (3.2),

(9)

Similar to Eqs (6)-(3.7),

(10)

By Eq. (3.2),