Similarity between Learning Outcomes from Course Objectives using Semantic Analysis, Blooms taxonomy and Corpus statistics

04/17/2018 ∙ by Atish Pawar, et al.

The course description provided by instructors is an essential piece of information as it defines what is expected from the instructor and what he/she is going to deliver during a particular course. One of the key components of a course description is the Learning Objectives section. The contents of this section are used by program managers who are tasked to compare and match two different courses during the development of Transfer Agreements between various institutions. This research introduces the development of semantic similarity algorithms to calculate the similarity between two learning objectives of the same domain. We present a novel methodology which deals with the semantic similarity by using a previously established algorithm and integrating it with the domain corpus utilizing domain statistics. The disambiguated domain corpus serves as supervised learning data for the algorithm. We also introduce the Bloom Index to calculate the similarity between action verbs in the Learning Objectives with reference to Bloom’s taxonomy.




1 Introduction

The Learning Outcomes or Learning Objectives (LOs) of a course define what the student is expected to learn by taking the course. LOs form a crucial part of any course description; hence, they are considered a base criterion for comparing two courses. If a student is transferring from institution A to institution B and is also attempting to transfer credits from institution A, then an accurate comparison of courses is essential in deciding whether the student is eligible to receive credit at institution B. This task of examining the LOs from two courses is currently completed by personnel called Program Managers/Coordinators.

Program managers are also responsible for developing Transfer Program Agreements between institutes. Comparing the learning outcomes from the two course objectives is a practice followed by program managers when they are asked to compare two courses or programs [12]. This process requires human intelligence and expertise to evaluate the course objectives. In addition, Program Managers depend on domain experts, i.e., persons who have knowledge of a particular field, to finalize the decision. Because this process depends on human intervention throughout, it is resource and time consuming.

Our intelligent system aims to automate the process of deciding whether a given student is eligible to receive credits by comparing Learning Objectives semantically. Course instructors are usually asked to follow Bloom’s Taxonomy [2] when structuring the learning outcomes. Bloom’s Taxonomy provides general keyword guidelines and a hierarchical structure to be used when defining the learning outcomes [11]; see Figure 1 [7]. In practice, however, we found that instructors usually do not follow these guidelines. So, in our methodology, we limit the influence of Bloom’s taxonomy by analyzing only the verbs. We use the hierarchical distribution of verbs in Bloom’s taxonomy to compare learning objectives. Each layer in Bloom’s taxonomy, as depicted in Figure 1, has a list of verbs associated with it [8].

Figure 1: Hierarchical Structure of Bloom’s Taxonomy

The main contributions of this research are:

  • Development of LO similarity measures using semantic analysis

  • Utilizing Bloom’s Taxonomy to determine the difficulty level of LOs

  • Demonstrating the effect and the usage of corpus statistics

The next section reviews some related work. Section 3 elaborates the methodology step by step. Section 4 describes the implementation in detail. Section 5 analyses the experimental results and Section 6 discusses the performance of the system. Finally, Section 7 summarizes the results and draws the conclusion.

2 Related work

Extensive research in the area of natural language processing has contributed valuable solutions in the field of semantic analysis. In this section, we review some of the existing algorithms along with their strengths and weaknesses. The related work can be roughly classified into the following categories:

  • Similarity based on lexical databases

  • Relatedness based on web search engine results

  • Grammar-based methods

2.0.1 Similarity based on lexical databases

Various methods have been developed previously which use a lexical database. These methods use the hierarchical distribution of the words in the database [3][20][16]. Some techniques also integrate external corpus statistics with the lexical database to influence the final semantic similarity [14][10]. These methods have the following general limitations:

  • The appropriate meaning of the word is not considered while computing the similarity between words which introduces inaccuracies during the earlier stage of semantic similarity calculations.

  • The corpus statistics differ for each corpus. Thus, the similarity is different for every corpus.

  • The grammar of the sentence is not considered.

But they have the following advantages:

  • Well-indexed lexical databases such as WordNet keep the computational cost low.

  • The similarity algorithm can be exploited to restrict the domain of operation.

2.0.2 Relatedness based on web search engine results

This methodology uses the number of search results from an internet search engine to establish the relatedness between words [5]. This technique does not necessarily give the similarity between words, because the number of pages indexed by the search engine is huge and words with opposite meanings frequently occur together on the web. We have implemented the methodology to calculate the Google Similarity Distance [6], but the results were not encouraging.
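
As an illustration of how such a search-count measure behaves, the following is a minimal sketch of the Normalized Google Distance formula from [6]. The hit counts passed in are hypothetical placeholders (no search engine is queried), and the function name is our own; it is not the implementation used in this study.

```python
import math

def normalized_google_distance(fx, fy, fxy, n):
    """Normalized Google Distance computed from page counts.

    fx, fy -- number of pages containing term x / term y
    fxy    -- number of pages containing both terms
    n      -- total number of pages indexed (a normalizing constant)
    """
    if fx <= 0 or fy <= 0 or n <= 0:
        raise ValueError("page counts must be positive")
    if fxy <= 0:
        return math.inf  # the terms never co-occur
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))

# Hypothetical counts, for illustration only.
print(normalized_google_distance(fx=1000, fy=100, fxy=10, n=1_000_000))
```

A distance of 0 means the terms always occur together. Because antonyms also co-occur frequently, a small distance signals relatedness rather than similarity, which is the limitation noted above.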

2.0.3 Grammar-based methods

Grammar-based methods are more useful for analyzing general language sentences. Such methods ultimately depend on some measure of semantic similarity between words [9][13]. The Sentence Text Similarity method [9] focuses on the semantic similarity between words as well as the string similarity, and also considers the order of occurrence of words. These methods work suitably for analyzing day-to-day life scenarios such as tweets, textual content from books/articles or speeches. For well-formed phrases such as LOs, the grammar remains the same throughout. Hence, such methods do not give an advantage over other methods when it comes to LOs.

3 Methodology

The proposed methodology uses a semantic similarity algorithm [18] and extends it to work with Bloom’s taxonomy and a corpus related to the specific domain. Figure 2 shows the modules for computing the similarity between two learning objectives.

Figure 2: The proposed methodology

The semantic similarity algorithm, shown in Figure 2 as a process, was developed by the authors. This algorithm uses synsets from WordNet to calculate the semantic similarity between sentences. It aims to identify the correct synsets according to the meaning of each word in the sentence using corpus statistics [18]. The methodology is divided into the following subsections:

  • Semantic similarity algorithm

  • Bloom’s taxonomy

  • Corpus statistics

3.1 Semantic similarity algorithm in brief

The semantic similarity algorithm used in this method is an edge-based approach which uses WordNet, a lexical database. The method to calculate the semantic similarity between two sentences is divided into three parts:

  • Word similarity

  • Sentence similarity

  • Word order similarity

This method first calculates the semantic similarity between words considering the meaning of the word in the context of the statement. The best result is then used to form a semantic vector for both the sentences separately. The semantic vectors formed are used to calculate the semantic similarity. The word order vector is constructed by considering the syntactic structure of the sentences, i.e., the occurrences of words concerning each other. A suite of algorithms are reported elsewhere and the interested readers of this publication can ask the authors for a copy of the paper under review.
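
The exact formulas of the authors’ algorithm are given in [18] and are not reproduced here. As a rough illustration of how semantic vectors and word order vectors can be compared and combined, the following sketch follows the general scheme of Li et al. [14]. The exact-match word order vector, the weight `delta` and all function names are assumptions made for this example.

```python
import math

def cosine_similarity(v1, v2):
    """Cosine of the angle between two equal-length semantic vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

def word_order_vector(sentence, joint_words):
    """1-based position of each joint word in the sentence, 0 if absent."""
    positions = {}
    for index, word in enumerate(sentence, start=1):
        positions.setdefault(word, index)  # keep the first occurrence
    return [positions.get(word, 0) for word in joint_words]

def word_order_similarity(sentence_a, sentence_b):
    """1 - |r1 - r2| / |r1 + r2| over the joint word list of both sentences."""
    joint_words = list(dict.fromkeys(sentence_a + sentence_b))
    r1 = word_order_vector(sentence_a, joint_words)
    r2 = word_order_vector(sentence_b, joint_words)
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    return 1.0 - diff / total if total else 1.0

def combined_similarity(semantic, order, delta=0.85):
    """Weighted mix of semantic and word order similarity (delta is an assumed weight)."""
    return delta * semantic + (1 - delta) * order

a = "write configurations of atoms and ions".split()
b = "write configurations of ions and atoms".split()
print(word_order_similarity(a, b))
```

In this sketch, two sentences with the same words in a different order receive a word order similarity below 1 even though their semantic vectors would be identical.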

3.2 Bloom’s taxonomy

As discussed in section 1, a well-structured course objective describes what students will be able to learn and to do as a result of the course [1]. Bloom’s taxonomy is a well-known, established, hierarchically structured model which contains action verbs at multiple levels of the hierarchy. As we move up the hierarchy, the difficulty level of the action verbs increases. The upper three layers, Analysis, Synthesis and Evaluation, contain the verbs that involve critical thinking. We implement Bloom’s taxonomy as a separate part of the methodology and restrain its influence on the main sentence similarity methodology. We explain the reason in the following subsection.

3.2.1 Problem with integrating Bloom’s taxonomy with the principal method

Though Bloom’s taxonomy is the suggested standard for designing the course outline, we have found that a considerable number of course drafts differ significantly from the norm. Hence, treating such LOs as well structured and integrating the taxonomy into the primary methodology would defeat the purpose. Therefore, to use Bloom’s taxonomy, we establish the “Bloom Index”. The Bloom Index represents the learning gap between two learning outcomes according to the verbs in the LOs.

We start by identifying the action verbs in the learning outcomes. Two lists are formed containing the action verbs from each LO respectively. We use the Stanford POS Tagger [15] to tag the words and identify action verbs. Each layer in the hierarchy is given a numerical value starting from 1 and going up to 6 as we move up the hierarchy. The absolute difference between the numerical values of the layers of two verbs yields the distance between them. We use this distance to calculate the index for each pair of verbs. The absolute Bloom index for each pair is given by:

$$\text{absolute\_bloom\_index} = \alpha \cdot d + \beta \qquad (1)$$

where $d$ is the distance between the two verbs, $\alpha = -0.20$ and $\beta = 1$. The absolute_bloom_index represents the absolute similarity between two verbs according to Bloom’s hierarchy. If both verbs fall into the same category, then they represent the same learning level; hence, for such verb pairs it is logical to assign a similarity index which represents maximum similarity. Since most similarity algorithms use the range from 0 to 1 for the similarity index, we follow the same standard and establish the maximum Bloom similarity as 1. Since the hierarchy is divided uniformly into 6 levels, the largest possible distance is 5; with the Bloom index ranging from 0 to 1, we therefore set the increment or decrement per level to 0.2.

We add the absolute Bloom indices of all the verb pairs to get the Total Bloom Index. To limit the value of the Bloom Index to the range 0 to 1, we divide by the total number of verb-pair comparisons. Finally, the Bloom Index is given by:

$$\text{Bloom Index} = \frac{\text{Total Bloom Index}}{\text{number of comparisons}} \qquad (2)$$
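
The Bloom Index computation described above can be sketched as follows. The constants -0.20 and 1 come from the text; the names `ALPHA`, `BETA` and `BLOOM_LEVEL`, and the three verb levels in the lookup table, are assumptions chosen so that the distances match the worked example in section 4.4.1. A real system would load the full verb list from [8].

```python
from itertools import product

ALPHA = -0.20  # change in similarity per level of distance (from the text)
BETA = 1.0     # similarity of two verbs on the same level (from the text)

# Hypothetical excerpt of a verb-to-level table (1 = lowest, 6 = highest).
BLOOM_LEVEL = {"identify": 1, "discuss": 2, "develop": 5}

def absolute_bloom_index(verb_a, verb_b, levels=BLOOM_LEVEL):
    """Similarity of two action verbs based on their distance in the hierarchy."""
    distance = abs(levels[verb_a] - levels[verb_b])
    return ALPHA * distance + BETA

def bloom_index(verbs_a, verbs_b, levels=BLOOM_LEVEL):
    """Average absolute Bloom index over every cross pair of verbs."""
    pairs = list(product(verbs_a, verbs_b))
    if not pairs:
        raise ValueError("both learning objectives need at least one action verb")
    total = sum(absolute_bloom_index(a, b, levels) for a, b in pairs)
    return total / len(pairs)

# Verbs tagged in S1 and S2 of section 4.4.1; the result is approximately 0.6.
print(round(bloom_index(["discuss"], ["identify", "develop"]), 2))
```

Averaging over all cross pairs keeps the result between 0 and 1 regardless of how many verbs each LO contains.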


3.3 Corpus statistics

The selection of the corpus affects the similarity index by a considerable amount. Learning objectives contain some peculiar words. A general-purpose corpus does not do justice to such words, as their meaning differs in general-purpose corpora. No single corpus serves the purpose, as the terminology used in LOs is different for every domain. For example, the terminology used in Computer Science is different from that of Economics and Chemistry. Our similarity algorithm uses synsets from WordNet to calculate the semantic similarity between words. A word can have multiple synsets with different meanings. Hence, it is essential to identify the appropriate synset.

This methodology treats the corpus as supervised learning data. The corpus is first “disambiguated”, i.e., we find the appropriate sense for each word in the corpus. Identifying the sense of a word is part of the “word sense disambiguation” research area. We use the ‘max similarity’ algorithm to identify the sense of the words [19], as implemented in Pywsd, an NLTK-based Python library [4].
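
To make the idea behind the ‘max similarity’ algorithm concrete, the following self-contained sketch picks, for a target word, the sense whose summed best-case similarity to the senses of the context words is highest. It does not call Pywsd or WordNet: the sense inventory, the keyword glosses and the overlap-based similarity function are toy assumptions invented for this example.

```python
# Hypothetical sense inventory: sense ids and keyword glosses are toy data,
# not WordNet entries.
SENSES = {
    "table": ["table#array", "table#meal"],
    "periodic": ["periodic#recurring"],
    "element": ["element#chemical", "element#weather"],
}
GLOSS = {
    "table#array": {"data", "arranged", "rows", "columns"},
    "table#meal": {"people", "assembled", "meal", "game"},
    "periodic#recurring": {"recurring", "intervals", "rows"},
    "element#chemical": {"substance", "atoms", "arranged", "data"},
    "element#weather": {"wind", "rain", "weather"},
}

def gloss_overlap(sense_a, sense_b):
    """Toy similarity: Jaccard overlap of gloss keywords (a stand-in for a WordNet measure)."""
    a, b = GLOSS[sense_a], GLOSS[sense_b]
    return len(a & b) / len(a | b)

def max_similarity(target, context, senses=SENSES, similarity=gloss_overlap):
    """Return the sense of `target` that maximizes the summed similarity to the context."""
    best_sense, best_score = None, float("-inf")
    for candidate in senses.get(target, []):
        score = sum(
            max(similarity(candidate, other) for other in senses[word])
            for word in context
            if word != target and word in senses
        )
        if score > best_score:
            best_sense, best_score = candidate, score
    return best_sense

print(max_similarity("table", ["periodic", "table", "element"]))
```

With richer domain context around a word, more of the candidate senses receive non-zero scores, which makes the choice of sense better informed.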


In this stage, we identify the meaning of each word and the synset corresponding to this definition from WordNet. These pieces of information are stored together so that they can be used efficiently in further calculations. The format used is: Word, Synset, Meaning of the word.

This information also serves as a replica of ‘Educational Ontologies’ synchronous with WordNet. Then we run a separate thread to establish the frequencies of the synsets and group them according to the meaning. The process is repeated every time the corpus is changed or updated, and new storage is created for every run. In the rare event that the disambiguation function fails to tag a word, we use the statistics from WordNet, which has a predefined frequency distribution over the definitions of each word. We use this frequency for the failed words.

4 Implementation

We use a previously established Sentence Similarity algorithm, which is currently under review elsewhere, and modify it as explained in section 3. The database used to implement the proposed methodology is WordNet, together with statistical information from WordNet. A compiled corpus of the Chemistry domain is used, containing learning outcomes, definitions of terminologies and textual content from books/articles.

4.1 The Database - WordNet

WordNet is a lexical semantic dictionary available for online and offline use, developed and hosted at Princeton. The WordNet version used for this study is WordNet 3.0, which has 117,000 synonym sets (synsets). The synsets of a word represent the possible meanings of the word in the context of a sentence. The central relationship connecting the synsets is the super-subordinate (ISA-HASA) relationship. We use this connection to find the shortest path distance and use this distance to establish similarity between word pairs.
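
The shortest-path idea can be sketched on a small, hand-made ISA hierarchy. The hierarchy below is a toy assumption rather than actual WordNet data, and the conversion from distance to similarity, 1 / (1 + distance), is one common choice used here for illustration; the formula used by the authors’ algorithm is given in [18].

```python
from collections import deque

# Hypothetical child -> parent (ISA) links; not taken from WordNet.
HYPERNYM = {
    "isotope": "atom",
    "atom": "particle",
    "ion": "particle",
    "particle": "entity",
    "force": "phenomenon",
    "phenomenon": "entity",
}

def shortest_path_length(a, b, hypernym=HYPERNYM):
    """Number of ISA edges on the shortest path between two concepts (breadth-first search)."""
    graph = {}
    for child, parent in hypernym.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    if a not in graph or b not in graph:
        return None
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # the concepts are not connected

def path_similarity(a, b, hypernym=HYPERNYM):
    """Map the path length to (0, 1]; identical concepts score 1."""
    distance = shortest_path_length(a, b, hypernym)
    return None if distance is None else 1.0 / (1.0 + distance)

print(path_similarity("isotope", "ion"))  # path isotope-atom-particle-ion
```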

Word | Synset | Meaning
--- | --- | ---
subatomic | Synset('subatomic.a.01') | of or relating to constituents of the atom or forces within the atom
composition | Synset('composition.n.03') | a mixture of ingredients
atoms | Synset('atom.n.01') | (physics and chemistry) the smallest component of an element having the chemical properties of the element
ions | Synset('ion.n.01') | a particle that is electrically charged (positive or negative); an atom or molecule or group that has lost or gained one or more electrons
isotopes | Synset('isotope.n.01') | one of two or more atoms with the same atomic number but with different numbers of neutrons
Table 1: Disambiguated data for LO1

4.2 Corpus statistics

We present a simulation of the formation of corpus statistics using a small corpus. Consider the following LOs from the corpus:
LO1: Describe the subatomic composition of atoms, ions and isotopes.
LO2: Calculate spectroscopic quantities in relation to electronic transitions.
LO3: Write electronic configurations of atoms and ions and relate to the structure of the Periodic Table.
LO4: An electrical force linking atoms and molecular bonds in chemicals.

Table 1 and Table 2 present the information retrieved from the corpus for LO1 and LO4 respectively. Similarly, all the LOs from the corpus are disambiguated to get the data. We then calculate the frequency distribution of the synsets corresponding to each word. For instance, the frequency of the synset Synset('atom.n.01') is 2. Hence, whenever the word atom occurs in an LO, the synset considered for the semantic similarity calculation will be Synset('atom.n.01'). In the same way, the statistics are formed for all the synsets and words. Having a well-performing word disambiguation function is crucial to get precise information from the corpus.
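
The frequency step can be sketched with the (word, synset) pairs listed in Table 1 and Table 2. The function names and the `fallback` argument, which stands in for the WordNet frequency lookup mentioned in section 3.3, are assumptions made for this example.

```python
from collections import Counter, defaultdict

# (word, synset name) pairs as listed in Table 1 (LO1) and Table 2 (LO4).
DISAMBIGUATED = [
    ("subatomic", "subatomic.a.01"), ("composition", "composition.n.03"),
    ("atoms", "atom.n.01"), ("ions", "ion.n.01"), ("isotopes", "isotope.n.01"),
    ("electrical", "electrical.a.01"), ("force", "power.n.05"),
    ("linking", "connect.v.01"), ("atoms", "atom.n.01"), ("bond", "bond.n.01"),
    ("chemistry", "chemistry.n.02"), ("chemical", "chemical.n.01"),
]

def build_statistics(pairs):
    """Count how often each synset was chosen for each word in the corpus."""
    stats = defaultdict(Counter)
    for word, synset in pairs:
        stats[word.lower()][synset] += 1
    return stats

def preferred_synset(word, stats, fallback=None):
    """Most frequent synset for the word, or the fallback if the corpus never tagged it."""
    counts = stats.get(word.lower())
    if not counts:
        return fallback
    return counts.most_common(1)[0][0]

stats = build_statistics(DISAMBIGUATED)
print(stats["atoms"]["atom.n.01"], preferred_synset("atoms", stats))
```

Because the most frequent sense wins, an occasional wrong tag produced during disambiguation is outvoted as long as the correct sense dominates in the corpus.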

Word | Synset | Meaning
--- | --- | ---
electrical | Synset('electrical.a.01') | relating to or concerned with electricity
force | Synset('power.n.05') | one possessing or exercising power or influence or authority
linking | Synset('connect.v.01') | connect, fasten, or put together two or more pieces
atoms | Synset('atom.n.01') | (physics and chemistry) the smallest component of an element having the chemical properties of the element
bond | Synset('bond.n.01') | an electrical force linking atoms
chemistry | Synset('chemistry.n.02') | the chemical composition and properties of a substance or object
chemical | Synset('chemical.n.01') | material produced by or used in a reaction involving changes in atoms or molecules
Table 2: Disambiguated data for LO4

4.3 Bloom’s Taxonomy

To implement Bloom’s Taxonomy, we consider the traditional hierarchical structure. We use the verbs listed in Bloom’s Taxonomy of Measurable Verbs[8] arranged in the hierarchy. Each level in the hierarchy is assigned a number starting at 1 with the base ‘Remembering’ and going up to 6 with ‘Creating’. To tag the verbs in the LOs, we use the Stanford POS tagger [15].

4.4 Illustrative Example

This section illustrates how the methodology calculates the Bloom Index and how it uses corpus statistics.

4.4.1 Calculating the Bloom Index

Consider the following sentences:
S1: Discuss the application of the scientific method to the study of human thinking, development, disorders, therapy, and social processes
S2: Identify major health informatics applications and develop basic familiarity with healthcare IT products
From S1, we tag one verb, discuss.
From S2, we tag two verbs, viz., identify and develop.
Here we have 2 comparisons: (discuss, identify) and (develop, discuss). The hierarchical distance between discuss and identify is 1; similarly, the hierarchical distance between develop and discuss is 3. Using Eq.(1), we get an absolute_bloom_index of 0.8 and 0.4 respectively. Now using Eq.(2), we get the Bloom Index as (0.8+0.4)/2=0.6.

4.4.2 Corpus Statistics

Consider two LOs listed in section 4.2:
LO1: Describe the subatomic composition of atoms, ions and isotopes.
LO3: Write electronic configurations of atoms and ions and relate to the structure of the Periodic Table.

Word | Synset | Meaning
--- | --- | ---
electronic | Synset('electronic.a.02') | of or concerned with electrons
configurations | Synset('shape.n.01') | any spatial attributes (especially as defined by outline)
atoms | Synset('atom.n.01') | (physics and chemistry) the smallest component of an element having the chemical properties of the element
atoms | Synset('atom.n.01') | (physics and chemistry) the smallest component of an element having the chemical properties of the element
ions | Synset('ion.n.01') | a particle that is electrically charged (positive or negative); an atom or molecule or group that has lost or gained one or more electrons
structure | Synset('structure.n.03') | the complex composition of knowledge as elements and their combinations
Table | Synset('table.n.05') | a company of people assembled at a table for a meal or game
Table 3: Disambiguated data for LO3

First LO | Second LO | Similarity (proposed algorithm)
--- | --- | ---
Acquire knowledge: memorize factual information and laws; assimilate scientific concepts; learn chemical calculations | To predict the physical and chemical properties of organic molecules from structures. | 0.343231930716
Students will develop both problem solving and critical thinking skills, and they will use these skills to solve problems utilizing chemical principles. | use knowledge of intermolecular forces to predict the physical properties of molecular and extended-network elements and compounds; | 0.295240282004
apply chemical knowledge to integrate knowledge gained in other courses and to better understand the connections between the various branches of science; | understand and utilize the terminology and concepts of chemistry to acquire and communicate scientific information and to solve basic chemical problems; | 0.318542368852
To become familiar with the structures of organic molecules, especially those found in nature or those with important biological effects; | To predict the physical and chemical properties of organic molecules from structures | 0.9405819540
solve problems involving the physical properties of matter in the solid, liquid and gaseous states; | Students will gain an appreciation of the scientific discipline of chemistry and the principles used by chemists to solve complex problems. | 0.223101105502
understand the basis of the unique properties of mixtures and perform related calculations; | memorize factual information and laws; assimilate scientific concepts; learn chemical calculations | 0.289648142927
apply knowledge of thermochemistry to calculate enthalpy changes associated with chemical and physical processes; | solve problems involving the physical properties of matter in the solid, liquid and gaseous states; | 0.113466429084
Write electronic configurations of atoms and ions and relate to the structure of the Periodic Table. | Describe the subatomic composition of atoms, ions and isotopes. | 0.852869346717
Students will learn and apply the method of inquiry used by chemists to solve chemical problems. | Describe the role of chemists and chemistry in drug design and methods used by chemists. | 0.994912072273
Examine, integrate, and assess any provided or collected chemical data. | Draw scientific conclusions from experimental results or data. | 0.900301710749
Table 4: Similarity between LOs

Table 1 and Table 3 depict the words, synsets, and meanings for LO1 and LO3 respectively.
From Table 1, considering the meanings of the words, we can conclude that the disambiguation worked well and that we have appropriate synsets for the further calculations, whereas from Table 3 we conclude that there are some inaccuracies for words such as structure and table. The meaning of these words that we get after disambiguation is different from their contextual sense in the sentence. The expected meaning of table here is a tabular array (a set of data arranged in rows and columns), and that of structure is a structure (the manner of construction of something and the arrangement of its parts).
Using the right set of LOs corresponding to the appropriate domain, we get the synset with the correct meaning. Even while disambiguating the corpus, the disambiguation function can identify an inaccurate sense for a word. Using the frequency of the sense in the corpus reduces the effect of this inaccuracy.

5 Experimental Results

To evaluate the algorithm, we used real Learning Objectives from various course outlines. A survey was conducted in which users were asked whether they could make a decision based on the resultant semantic similarity and the Bloom similarity. All the users possessed at least a Bachelor’s degree. Out of 15 users, 10 users agreed that 75% or more of the results were useful; 1 user agreed that 65% or more of the results were useful and 4 users agreed that 55% or more of the results were useful. Table 4 shows the semantic similarity between real LOs.

6 Discussion

The sentence similarity algorithm used for this methodology achieved a good Pearson correlation coefficient of 0.8753 for word similarity with respect to the benchmark standard [21] and 0.8794 for sentence similarity with respect to mean human similarity [17]. The proposed methodology aims to use this algorithm and make it specific to Learning Objectives. We use Bloom’s Taxonomy to determine the comprehensive similarity between the LOs. We achieve this by establishing relative similarity between verbs.
The crucial part of the algorithm is the availability of a domain-specific corpus. During this research, we found no corpus which meets this requirement, so we compiled a small corpus to conduct the study. The contents of the compiled corpus are learning objectives, terminologies and definitions, and parts of books or research belonging to the particular domain. We found that corpus disambiguation works well when the corpus contains more words that are clearly related to the field. This helps the disambiguation function to predict the meaning using the max_similarity algorithm.
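
For readers who want to reproduce this kind of evaluation, the Pearson correlation coefficient quoted above can be computed as in the following minimal sketch. The score lists are hypothetical placeholders and are not the benchmark data of [21] or [17].

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical algorithm scores and mean human ratings for four sentence pairs.
algorithm_scores = [0.12, 0.45, 0.80, 0.95]
human_ratings = [0.10, 0.50, 0.70, 0.90]
print(pearson(algorithm_scores, human_ratings))
```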

7 Conclusion

This paper presented an approach to calculate the semantic similarity between learning objectives using Corpus Statistics and Bloom’s taxonomy. The crucial part of the algorithm is the disambiguation of words in the context of their use. Having fewer datapoints may lead to detecting the wrong meaning of the word. Hence, using a corpus, we make sure that the algorithm always selects the appropriate sense of the word as discussed in the methodology. We use corpus statistics from the disambiguated corpus. The meaning with the highest frequency is considered by the algorithm to find the proper synset from the WordNet. The methodology has been tested on actual learning objectives, and we have achieved very encouraging results.
Future work includes expanding the domains and the corpus, increasing the efficiency of the algorithms by using different file structures, and forming WordNet-like ontologies specifically for the education domain.

8 Acknowledgement

We would like to acknowledge the financial support provided by ONCAT (Ontario Council on Articulation and Transfer) through Project Number 2017-17-LU; without their support this research would not have been possible. We are also grateful to Salimur Choudhury for his insight on different aspects of this project and to Kyle Robinson for reviewing and proofreading the paper.


References

  • [1] Gina M Almerico and Russell K Baker. Bloom’s taxonomy illustrative verbs: Developing a comprehensive list for educator use. Florida Association of Teacher Educators Journal, 1(4):1–10, 2004.
  • [2] Lorin W Anderson, David R Krathwohl, P Airasian, K Cruikshank, R Mayer, P Pintrich, James Raths, and M Wittrock. A taxonomy for learning, teaching and assessing: A revision of Bloom’s taxonomy. New York: Longman Publishing, 2001.
  • [3] Alan D Baddeley. Short-term memory for word sequences as a function of acoustic, semantic and formal similarity. The Quarterly Journal of Experimental Psychology, 18(4):362–365, 1966.
  • [4] Steven Bird. NLTK: the natural language toolkit. In Proceedings of the COLING/ACL on Interactive presentation sessions, pages 69–72. Association for Computational Linguistics, 2006.
  • [5] Danushka Bollegala, Yutaka Matsuo, and Mitsuru Ishizuka. Measuring semantic similarity between words using web search engines. WWW, 7:757–766, 2007.
  • [6] Rudi L Cilibrasi and Paul MB Vitanyi. The Google similarity distance. IEEE Transactions on Knowledge and Data Engineering, 19(3), 2007.
  • [7] Mary Forehand. Bloom’s taxonomy. Emerging perspectives on learning, teaching, and technology, 41:47, 2010.
  • [8] Bloom’s taxonomy of measurable verbs. Center for Teaching & Learning, The University of Georgia, ctl.uga.edu.
  • [9] Aminul Islam and Diana Inkpen. Semantic text similarity using corpus-based word similarity and string similarity. ACM Transactions on Knowledge Discovery from Data (TKDD), 2(2):10, 2008.
  • [10] Jay J Jiang and David W Conrath. Semantic similarity based on corpus statistics and lexical taxonomy. arXiv preprint cmp-lg/9709008, 1997.
  • [11] David R Krathwohl. A revision of Bloom’s taxonomy: An overview. Theory into Practice, 41(4):212–218, 2002.
  • [12] Frankie Santos Laanan. Transfer student adjustment. New Directions for Community Colleges, 2001(114):5–13, 2001.
  • [13] Ming Che Lee, Jia Wei Chang, and Tung Cheng Hsieh. A grammar-based semantic similarity algorithm for natural language sentences. The Scientific World Journal, 2014, 2014.
  • [14] Yuhua Li, David McLean, Zuhair A Bandar, James D O’Shea, and Keeley Crockett. Sentence similarity based on semantic nets and corpus statistics. IEEE Transactions on Knowledge and Data Engineering, 18(8):1138–1150, 2006.
  • [15] Christopher D Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, and David McClosky. The Stanford CoreNLP natural language processing toolkit. In ACL (System Demonstrations), pages 55–60, 2014.
  • [16] George A Miller and Walter G Charles. Contextual correlates of semantic similarity. Language and Cognitive Processes, 6(1):1–28, 1991.
  • [17] James O’Shea, Zuhair Bandar, Keeley Crockett, and David McLean. Pilot short text semantic similarity benchmark data set: Full listing and description. Computing, 2008.
  • [18] Atish Pawar and Vijay Mago. Calculating the similarity between words and sentences using a lexical database and corpus statistics. CoRR, abs/1802.05667, 2018.
  • [19] Ted Pedersen, Satanjeev Banerjee, and Siddharth Patwardhan. Maximizing semantic relatedness to perform word sense disambiguation. University of Minnesota Supercomputing Institute Research Report UMSI, 25:2005, 2005.
  • [20] Philip Resnik et al. Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language. J. Artif. Intell. Res. (JAIR), 11:95–130, 1999.
  • [21] Herbert Rubenstein and John B Goodenough. Contextual correlates of synonymy. Communications of the ACM, 8(10):627–633, 1965.