The use of e-books has been heavily studied over the years [1, 2]. Their use in education has been anticipated due to their flexibility, accessibility, interactivity and extensibility [3, 4]. From the teacher’s perspective, e-books also prove useful for monitoring and analyzing student progress. Through such a platform, teachers can create and present their teaching materials for students to access, while students can use the platform to learn, communicate, take notes, do pre-study and so on. E-books have also been extended to become a platform for e-learning and e-publishing, as well as learning material recommendation [8].
At present, a widely recognized approach to developing e-books is to digitize existing printed books as replacement textbooks. Yet such a system will be useless if all it does is mimic the physical book digitally. The majority of e-book publishers still produce such simple digital versions of the original printed books, yet readers do not find them familiar, which leads to the failure of endeavours that capitalize on them. Besides, studies show that printed books are still preferred, and that e-books are often treated as reference books. Consequently, this leads to a rethinking of what an e-book is, and how technologies can be incorporated into it. This is because people treat e-books differently, for example preferring short texts, and hyperlinks over page flipping. Functionalities unique to the e-book, such as search and navigation, have received favorable responses [13, 15], as have annotation and sharing capabilities that support learning. Recent studies have shown that text, highlighting, bookmarking, multimedia, translation, dictionary and encyclopedia tools are popular components that need to be taken into account in the development of an e-book and/or its supporting platform [17, 18].
As most tools are geared towards search and information organization, an intelligent system that supports the management of information according to how the user interacts with the e-book (such as annotation, bookmarking and highlighting) not only simplifies such interactions, but also augments the ability of the human reader to access and manage ever more information during the course of learning. However, intelligent platforms come at a price: the teacher needs to painstakingly create the teaching material, which can be a huge overhead. Authoring tools need to be easy to use for the technology to be widely adopted.
This work addresses the task of developing an intelligent platform that not only reduces the overhead of creating teaching material in the e-book, but can also serve as a personalized information management system for both teacher and student. This can also induce more collaborative learning, which studies have found to be favorable. Teaching materials like books, slides and notes are normally presented in a way that assists human learners in acquiring knowledge that enables continual learning, where earlier knowledge supports the assimilation of more complicated concepts later in the course. Therefore, these learning materials normally have internal structures that associate the represented knowledge. For example, descriptions that explain a particular topic on computer number systems have an order: the description of binary-decimal conversion is preceded by a description of what the binary number system is. Another example is the topic classes that categorize the descriptions. This shows that teaching materials can be represented as directed graphs that connect various nodes, where the nodes themselves may contain values (such as a chunk of text). Such graphs have been studied for learning purposes, such as in language study [20]. This paper denotes such a graph as a knowledge graph. Many works utilize knowledge graphs for information retrieval purposes, such as query expansion [21, 22], gene identification [23, 24], social networks [25], probabilistic programming [26] and large-scale knowledge-base inference [27]. It has been shown that information retrieval via graphs outperforms retrieval based on textual similarity [28, 24].
The aim of this paper is to present preliminary work on the platform’s development, with emphasis on how teaching materials are created and how queries pertaining to learning information management can be answered. Such queries can be associated with e-book tools such as bookmarking and highlighting. With the use of OWL DL, the knowledge graph can be easily constructed by the teacher. The generated knowledge graph can also be used as a Markov Chain for querying purposes. For clarity, domain experts are the people who construct the book ontology; in this paper, “teacher” is used interchangeably with “domain expert”. The case study concerns queries between questions and descriptions of the teaching material. This paper does not assume the use of any prior knowledge base.
This paper is organized as follows: Section II describes the overall architecture of how the e-book platform is set up and deployed. Section III explains how the knowledge base is set up, as well as the generation of additional facts. Given the generated knowledge graph, Section IV explains how it can be used for information retrieval via graph walk. The platform is then evaluated in Section V. Finally, the conclusion is given in Section VI.
II Platform Architecture
Figure 1 shows the architecture that implements the electronic book platform. The architecture is explained following its process flow. It is assumed that the teacher knows the flow of the teaching material and the topic hierarchy. The architecture leverages this knowledge to build additional knowledge so that more intelligent tasks can be achieved.
For a course that students need to take, teachers who are experts in the field first create the teaching materials following the standards specified in Section III. In this work, the teaching materials are presented in HTML format, with annotations for the topic and description IDs. A description is a small part of the text (and images) in the book, which may be a whole page, a paragraph or a sentence (the teacher is free to decide). A topic, which is often arranged in a hierarchical manner (an example is shown in Figure 3), is a class that contains descriptions. In terms of the OWL DL specification described in Section III-A, a topic is a class object that contains description instances. An example teaching material presentation in HTML is shown in Figure 2. The teaching material is written in HTML in such a way that the descriptions and their respective topics can be extracted. The topic hierarchy can also be obtained from the headers. From Figure 2, it can be observed that there is a description with ID “o:descp:c5mainMemory” under the topic “main memory”, and that the “address space” topic is a sub-topic of “main memory”. Although not shown here, all topics and descriptions are given unique IDs.
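The extraction step described above can be sketched with Python's standard `html.parser`. This is only a minimal illustration under stated assumptions: the exact markup of Figure 2 is not available here, so headers `h1` to `h6` are taken as topics and any element whose `id` starts with `o:descp:` is taken as a description. The class name `MaterialParser`, the sample HTML text and the ID `o:descp:c5addressSpace` are hypothetical.

```python
from html.parser import HTMLParser


class MaterialParser(HTMLParser):
    """Collects topic hierarchy and description-topic links from annotated HTML."""

    def __init__(self):
        super().__init__()
        self.stack = []          # (header level, topic) for the currently open topics
        self.in_header = None    # header level being read, or None
        self.header_text = []
        self.subtopic_of = []    # (child topic, parent topic)
        self.described_by = []   # (description id, topic)

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1].isdigit():
            self.in_header = int(tag[1])
            self.header_text = []
            return
        elem_id = dict(attrs).get("id") or ""
        # Assumed convention: description IDs start with "o:descp:".
        if elem_id.startswith("o:descp:") and self.stack:
            self.described_by.append((elem_id, self.stack[-1][1]))

    def handle_data(self, data):
        if self.in_header is not None:
            self.header_text.append(data)

    def handle_endtag(self, tag):
        if self.in_header is None or tag != f"h{self.in_header}":
            return
        topic = "".join(self.header_text).strip().lower()
        level = self.in_header
        # Close topics at the same or a deeper level before nesting.
        while self.stack and self.stack[-1][0] >= level:
            self.stack.pop()
        if self.stack:
            self.subtopic_of.append((topic, self.stack[-1][1]))
        self.stack.append((level, topic))
        self.in_header = None


# Hypothetical sample; the real teaching material is not reproduced here.
SAMPLE = """
<h2>Main Memory</h2>
<p id="o:descp:c5mainMemory">Main memory is a collection of storage locations.</p>
<h3>Address Space</h3>
<p id="o:descp:c5addressSpace">Each location has a unique address.</p>
"""

parser = MaterialParser()
parser.feed(SAMPLE)
print(parser.subtopic_of)
print(parser.described_by)
```

Keeping a stack of open headers lets the nesting depth of the HTML headers stand in for the topic hierarchy, so the teacher would not need to state sub-topic links separately.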
The extracted description instances, the topic hierarchy and their associations are described via OWL DL. At this stage, only the IDs are used. Every word and its association with the descriptions is also extracted and added to the ontology. Additional axioms are also added to the ontology to ensure that the subsequent reasoner functions properly, as explained in Section III-D.
The facts and ontologies are then loaded into the OWL inference engine to generate all the facts that can be deduced, before all of them are loaded into the MySQL database. Although not implemented yet, the architecture also includes a soft facts generator that generates facts with a certain degree of certainty. The soft facts generator is not used in this paper, as the main purpose is to study information retrieval given the ontologies.
Users then access the website, which shows the teaching material exactly as the HTML is designed, together with the semantic links stored in the database. As with web services and the semantic web, many features and services can be realized with such a setup.
The knowledge graph stored in the database is also converted to a Markov Chain that is used for information retrieval. Users can access this feature via the website, as explained in Section IV. This information retrieval mechanism, which uses the graph represented in OWL DL, is the main emphasis of this paper.
III Teaching Material as a Graph
A graph consists of a set of nodes and a set of labeled directed edges between the nodes. This section describes how the teaching material graph can be constructed. A simple teaching material knowledge graph is shown in Figure 4. Note from the figure that the knowledge graph only handles the relationships between the nodes (right side of the figure). The data associated with the nodes, denoted as the value of the nodes, is accessed through the HTML representation (left side of the figure). Unlike the node itself, the value of the node is not used for reasoning.
III-A Node Definition
The teaching material knowledge graph uses several object types, namely description, topic, question, name, concept and term. All except the term object need to be manually defined by the teacher. These objects are explained as follows:
Description: As shown in Figure 4, a description node is associated with a chunk of text/images from the HTML representation of the teaching material. Every description has a unique ID, and nested descriptions are not allowed. A description is an instance object in the OWL DL representation under the type “book container”.
Topic: A description is an instance, whereas a topic is the class that categorizes descriptions based on the topic hierarchy of the material. One can think of it as the hierarchy in the table of contents of a book. Note that teachers can also categorize a description under two different topics as they see fit. In the OWL DL representation, a topic is a class object, which is a sub-class of “book container”.
Question: A question is an instance object associated with a value containing the actual question. A question is linked to the descriptions it is relevant to. Note that a question can be linked to multiple descriptions, and vice versa. A question is under the type “question container”.
Name: Unlike a description, which stores a chunk of text, a name is an instance object associated with a name that appears within the description. Note that the value of a name is not limited to one word. A name is under the type “name container”.
Concept: A concept is an instance object used to encode an abstract concept pertaining to a description or name. Concept objects are expected to be manually designed by the domain expert; automatic generation of concepts is a subject of future work. A concept is under the type “concept container”.
Term: A name object is associated with manually annotated word(s), whereas a term is automatically extracted. A term is under the type “term container”.
Containers: The purpose of a type similarity query is to return a ranked list of objects of a certain type given some input nodes. The containers described above serve as the filter for such typed output.
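How containers act as a filter for typed output can be illustrated with a small sketch. The `Node` class, the short container labels, the question IDs and the scores below are all hypothetical; only the idea of filtering scored nodes by container type and ranking them comes from the text.

```python
from dataclasses import dataclass

# Short labels standing in for the "book", "question", "name",
# "concept" and "term" containers described above.
CONTAINERS = {"book", "question", "name", "concept", "term"}


@dataclass(frozen=True)
class Node:
    node_id: str
    container: str  # one of CONTAINERS


def filter_by_type(scored_nodes, container):
    """Keep nodes of the requested container and rank them by descending score."""
    if container not in CONTAINERS:
        raise ValueError(f"unknown container: {container}")
    kept = [(node, score) for node, score in scored_nodes if node.container == container]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical similarity scores for three nodes.
scores = [
    (Node("o:descp:c5mainMemory", "book"), 0.30),
    (Node("q:example1", "question"), 0.10),
    (Node("q:example2", "question"), 0.25),
]
ranked = filter_by_type(scores, "question")
print([node.node_id for node, _ in ranked])
```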
III-B Initial Ontology
OWL DL provides a means to generate additional hard facts given an ontology and some initial facts. One can exploit this reasoning capability to enrich the teaching material knowledge graph without requiring teachers to specify every possible link between nodes. The initial ontology is explained as follows:
Class relationship between topic objects: As explained in Section III-A, topics are arranged hierarchically according to how the teaching material is structured. Topics, which are class objects in the OWL DL representation, can be endowed with a property that generates super-class and sub-class relationships.
Description relationship with topics: In the OWL DL representation, descriptions and topics are related via a property which, combined with the super-class and sub-class relationships described previously, can generate more instance associations with topics.
Other properties: Subsequent descriptions are linked via the property , which has an inverse via . Question is linked to description via property , name linked to description through property , description and name to concept through property , and term to description through property , all of which have inverse properties respectively.
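The effect of combining description-topic links with the topic hierarchy can be sketched in a few lines. This is not the OWL reasoner used by the platform, only a hand-written illustration of one kind of fact such a reasoner derives: an instance of a sub-topic is also an instance of every super-topic. The function name, the topic “computer organization” and the ID `o:descp:c5addressSpace` are hypothetical.

```python
def infer_memberships(sub_class_of, instance_of):
    """Propagate instance-of facts up the topic hierarchy.

    sub_class_of: set of (sub_topic, super_topic) pairs
    instance_of:  set of (description, topic) pairs
    """
    # Transitive closure of the sub-class relation.
    closure = set(sub_class_of)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    # Every instance of a sub-topic is also an instance of its super-topics.
    inferred = set(instance_of)
    for desc, topic in instance_of:
        for sub, sup in closure:
            if sub == topic:
                inferred.add((desc, sup))
    return inferred


sub_class_of = {
    ("address space", "main memory"),
    ("main memory", "computer organization"),
}
instance_of = {("o:descp:c5addressSpace", "address space")}
facts = infer_memberships(sub_class_of, instance_of)
print(sorted(facts))
```

One stated link from the teacher yields three membership facts here, which is the kind of densification the reasoning step is meant to provide.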
III-C Creating Initial Facts
The teacher is responsible for creating the initial facts. To avoid imposing a huge overhead on the teacher while still obtaining ample information for reasoning, the process is divided into three procedures.
III-C1 Procedure 1
Topic hierarchy should be specified, which is shown in the region of Figure 5. property is used for topic-topic links. At this stage, the teacher only considers the category and their hierarchy of the teaching material (which is just the structure in the tables of contents), and does not need to consider the actual text of the teaching material.
III-C2 Procedure 2
At this stage, the teacher will separate the teaching material into chunks of description, and link them via , which is shown in the region of the Figure 5. The teacher does not need to consider the topic category of the descriptions at this stage. At the same time, given the description, the teacher can also link question object to their relevant descriptions. During this stage, association between the text chunks in the HTML representation and the question and description object should be made as shown in Figure 4.
III-C3 Procedure 3
At this stage, links between description and topic are made. The teacher can also link a description to multiple topics.
The information required by all three procedures can be extracted automatically from the HTML representation of the teaching material. These procedures also pose minimal overhead, as such information ought to be readily available from the teaching materials of any class. Given the example shown in Figure 5, the following shows the triples (subject, property, object) in OWL DL representation:
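Figure 5 and its property names are not available in this text, so the listing below is only an illustrative sketch built on the main memory example from Section II. The property names `subTopicOf`, `nextDescription`, `aboutDescription` and `underTopic`, the ID `o:descp:c5addressSpace` and the question ID `o:quest:c5q1` are hypothetical, not the platform's identifiers.

```python
# Hypothetical property names; the platform's actual identifiers may differ.
triples = [
    # Procedure 1: topic hierarchy
    ("address space", "subTopicOf", "main memory"),
    # Procedure 2: description sequence and question links
    ("o:descp:c5mainMemory", "nextDescription", "o:descp:c5addressSpace"),
    ("o:quest:c5q1", "aboutDescription", "o:descp:c5addressSpace"),
    # Procedure 3: description-topic links
    ("o:descp:c5mainMemory", "underTopic", "main memory"),
    ("o:descp:c5addressSpace", "underTopic", "address space"),
]


def by_property(triples, prop):
    """Return (subject, object) pairs for one property."""
    return [(s, o) for s, p, o in triples if p == prop]


print(by_property(triples, "underTopic"))
```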
III-D Additional Facts Generation
Given the ontology and facts provided in Sections III-B and III-C, additional facts can be generated. Yet these facts may still not be enough, as they are quite sparse relative to the number of nodes. One way to increase connectivity is to have the domain expert add concept nodes and their relationships as described in Section III-A, but such a task is extremely tedious and does not scale well.
In this work, words are automatically extracted from the descriptions and associated with their respective descriptions, which provides denser relationships between the nodes.
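A minimal sketch of such word extraction follows. The text does not say how words are tokenized or filtered, so the regular expression, the stop-word list, the `term:` prefix and the property name `appearsIn` are all assumptions made for illustration, as are the two sample description texts.

```python
import re

# Assumed stop-word list; the platform's actual filtering is not specified.
STOPWORDS = {"a", "an", "the", "is", "of", "to", "and", "in", "has", "each"}


def extract_term_links(descriptions):
    """Emit (term, property, description) triples for every non-stop word."""
    triples = set()
    for desc_id, text in descriptions.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOPWORDS:
                triples.add((f"term:{word}", "appearsIn", desc_id))
    return triples


# Hypothetical description texts.
descriptions = {
    "o:descp:c5mainMemory": "Main memory is a collection of storage locations.",
    "o:descp:c5addressSpace": "Each location in memory has a unique address.",
}
term_triples = extract_term_links(descriptions)
shared = {t for t, _, d in term_triples if d == "o:descp:c5mainMemory"} & {
    t for t, _, d in term_triples if d == "o:descp:c5addressSpace"
}
print(shared)
```

A term shared by two descriptions, such as `term:memory` here, creates a path between them even if the teacher never linked them explicitly.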
IV Typed Similarity via Graph Walk
Similarity evaluation between two nodes is performed using a random walk, which is explained in this section.
The knowledge graph generated as explained in Section III is converted to a Markov Chain, taking into account all nodes and properties, including their inverses. Note that inverses for and are also generated. A similarity query is likened to a walk that starts from some input nodes, where the nodes in which the random walk is most likely to end up are the most similar items (output), provided that the correct node type is selected. This paper will employ the random walk employed in .
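Given a node $x$ in the Markov Chain, to walk away from the node, one first chooses the edge (property) type to leave through. The next node is then chosen at random among the nodes that the chosen edge type leads to. Let $x \xrightarrow{\ell} y$ denote the directed edge with label $\ell$ going from node $x$ to node $y$. Given the set of edge labels leaving $x$, $L_x = \{\ell : \exists y,\ x \xrightarrow{\ell} y\}$, the probability of choosing $\ell$ is uniform over all label types that extend out of $x$, which is:

$$\Pr(\ell \mid x) = \frac{1}{|L_x|} \tag{1}$$

$y$ is then chosen uniformly given the edge type $\ell$. Let us define the set of nodes that extend from $x$ given $\ell$ as $S_{x,\ell} = \{y : x \xrightarrow{\ell} y\}$. Thus, the probability of choosing node $y$ given node $x$ and edge type $\ell$ is:

$$\Pr(y \mid x, \ell) = \begin{cases} \dfrac{1}{|S_{x,\ell}|} & \text{if } y \in S_{x,\ell} \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

The uniform assumption can be generalized to non-uniform probabilities, but since no model is used to decide the weights in this work, the uniform distribution is used.

Given the Markov Chain and the weights described in Equations 1 and 2, a lazy walk, a variant of the random walk that includes random stopping, is used to traverse the knowledge graph. Let $\Pr(x \xrightarrow{k} y)$ denote the probability that the random walk will end up in $y$ after $k$ steps starting from $x$, and let $M$ denote the one-step transition matrix:

$$M_{xy} = \sum_{\ell \in L_x} \Pr(y \mid x, \ell)\,\Pr(\ell \mid x), \qquad \Pr(x \xrightarrow{k} y) = (M^k)_{xy} \tag{3}$$

Then, given a stopping probability of $\gamma$, the probability that the random walk will stop at $y$ after infinite steps is:

$$Q_{xy} = \gamma \sum_{k=0}^{\infty} (1-\gamma)^k \,(M^k)_{xy} \tag{4}$$

As in the work from which the walk is adopted, the infinite sum is approximated with a finite number of steps, and $\gamma$ is likewise set as in that work.

Therefore, for every knowledge graph, $Q$ is generated to perform information retrieval. During querying, the input is an initial distribution $v_0$ over all nodes, where $\sum_x v_0(x) = 1$. For example, if a question node $q$ is selected as input, then $v_0(q) = 1$ whereas it is $0$ for the others. Similarity can then be calculated via $v_0 Q$. The result is filtered according to the intended output type and ranked.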
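The walk described in this section can be sketched in plain Python as follows. This is a minimal illustration, not the platform's implementation: the triples, the property names and the node IDs are hypothetical, and the stopping probability of 0.5 and the 10 steps are arbitrary example values rather than the settings used in this paper. Inverse edges are listed explicitly so that every node has an outgoing edge.

```python
from collections import defaultdict


def transition_probs(triples):
    """One-step probabilities: pick an edge type uniformly, then a target uniformly."""
    out = defaultdict(lambda: defaultdict(set))   # node -> edge label -> targets
    for s, label, o in triples:
        out[s][label].add(o)
    probs = defaultdict(dict)
    for x, by_label in out.items():
        for label, targets in by_label.items():
            share = 1.0 / len(by_label) / len(targets)
            for y in targets:
                probs[x][y] = probs[x].get(y, 0.0) + share
    return probs


def lazy_walk(probs, start, stop_prob, steps):
    """Truncated lazy walk: sum over k of stop_prob * (1 - stop_prob)^k * (distribution after k steps)."""
    current = dict(start)
    scores = defaultdict(float)
    for k in range(steps + 1):
        for node, mass in current.items():
            scores[node] += stop_prob * (1.0 - stop_prob) ** k * mass
        nxt = defaultdict(float)
        for node, mass in current.items():
            # Nodes without outgoing edges simply drop their mass.
            for target, p in probs.get(node, {}).items():
                nxt[target] += mass * p
        current = nxt
    return dict(scores)


# Hypothetical graph: two descriptions in sequence, one question each.
triples = [
    ("q1", "aboutDescription", "d1"), ("d1", "hasQuestion", "q1"),
    ("q2", "aboutDescription", "d2"), ("d2", "hasQuestion", "q2"),
    ("d1", "next", "d2"), ("d2", "previous", "d1"),
]
probs = transition_probs(triples)
scores = lazy_walk(probs, {"d1": 1.0}, stop_prob=0.5, steps=10)

# Typed output: keep only question nodes and rank them.
questions = sorted((n for n in scores if n.startswith("q")), key=scores.get, reverse=True)
print(questions)
```

Starting from description `d1`, the question attached to `d1` ranks above the question attached to the neighbouring description, which is the behaviour a typed similarity query relies on.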
V Case Study
A case study is performed on the Computer Science foundational course at the National University of Tainan. The course is based on the “Foundations of Computer Science” textbook by Behrouz Forouzan [29].
The e-book is constructed to summarize some of the topics in the original printed book in a way that meets the learning outcomes of the course. The e-book has 11 chapters in total, covering topics ranging from computer architecture to programming languages and artificial intelligence. Domain experts construct the e-book in HTML format, covering both the description texts and the questions for every chapter. Following the procedures described in Section III-C, the topics and descriptions as well as their relationships are constructed. As this is a preliminary study, the intention of the case study is to determine performance under different constructions of the knowledge graph that do not pose a huge overhead for domain experts. To keep the conditions controlled, no external knowledge base facts are used at the moment. Besides, most teaching materials do not have the luxury of a pre-made knowledge base that caters to their needs. After generating the facts, the database has about 1600 nodes and 14K triples. Given the small database, a plain matrix implementation of the random walk is used in this case study. For scalability, a sampling approach can be used.
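One possible form of such a sampling approach is sketched below, as an assumption rather than the method used in the case study: instead of computing matrix powers, many walks are simulated and the stopping nodes are counted. The graph, the property names and the parameter values are hypothetical.

```python
import random
from collections import Counter


def sample_lazy_walk(out_edges, start, stop_prob, n_walks, max_steps, rng):
    """Estimate stopping probabilities by simulating walks.

    out_edges: node -> {edge label -> list of target nodes}
    A walk that reaches max_steps without stopping is counted at its last node.
    """
    stops = Counter()
    for _ in range(n_walks):
        node = start
        for _ in range(max_steps):
            if rng.random() < stop_prob or not out_edges.get(node):
                break                                     # stop at the current node
            label = rng.choice(sorted(out_edges[node]))   # uniform over edge types
            node = rng.choice(out_edges[node][label])     # uniform over targets
        stops[node] += 1
    return {n: c / n_walks for n, c in stops.items()}


# Hypothetical graph: two descriptions in sequence, one question each.
out_edges = {
    "d1": {"hasQuestion": ["q1"], "next": ["d2"]},
    "d2": {"hasQuestion": ["q2"], "previous": ["d1"]},
    "q1": {"aboutDescription": ["d1"]},
    "q2": {"aboutDescription": ["d2"]},
}
estimate = sample_lazy_walk(
    out_edges, "d1", stop_prob=0.5, n_walks=20000, max_steps=10, rng=random.Random(0)
)
print(sorted(estimate, key=estimate.get, reverse=True))
```

The cost of this estimate grows with the number of simulated walks rather than with the size of the full transition matrix, which is why sampling is attractive for larger graphs.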
V-A Evaluation on Different Constructs of Knowledge Graph
The knowledge graph is constructed from an initial specification of the topic hierarchy, description sequence and topic-description links. Such a knowledge graph may be too sparse for information retrieval, while constructing a dense graph manually requires a huge overhead. This is aggravated by the fact that many domain experts, such as teachers, may not be familiar with knowledge graph construction. To alleviate this problem, one can use reasoners to generate more facts and thereby obtain denser graphs. In this work, FaCT++ [30] is used for fact generation given the e-book ontologies provided by the domain experts, after which the facts are sent to the online database. FaCT++ is a reasoner for OWL DL, which is the ontology representation of the e-book. In this work, additional facts are generated pertaining to instance-class relationships, class-class relationships and property inverses. One can refer to [31] for more details about OWL DL and fact construction.
Evaluation is made between knowledge graphs with and without fact generation, as well as with the inclusion of additional links from words. The comparison is done using question queries given some inputs. Random input queries are constructed, and the ranked outputs are evaluated based on Mean Average Precision (MAP). The top 10 outputs are used for the MAP calculation. As there is no benchmark or labeled data to compare against, a number of human evaluators are employed to judge the relevance of the questions returned by each query. The MAP results are shown in Table I. It can be observed that with the facts generated by the OWL reasoner, the query results improve significantly. Such generation is automatic and thus does not add to the e-book construction overhead imposed on the domain expert. Although not shown, tests were also performed with a larger number of steps and a lower stopping probability, and the knowledge graph with the generated facts still fares significantly better. This shows that the paths (which take into account knowledge extracted from instance-class and class-class relationships) can effectively guide the random walk. Note that the low MAP scores are due to some queries having too few relevant questions in the database. Once such questions are exhausted, the remaining outputs in the ranking will likely be deemed irrelevant by the human evaluators. To avoid this, one should have at least 10 questions in store for each possible query.
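For reference, a MAP computation over the top 10 outputs can be sketched as follows. The text does not state how average precision is normalized, so dividing by the number of relevant items found in the top 10 is an assumption, and the relevance judgements below are made-up examples rather than data from the evaluation.

```python
def average_precision(relevance, k=10):
    """AP over the top-k results; relevance is a list of 0/1 judgements in rank order."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            total += hits / rank      # precision at this rank
    # Assumed normalization: number of relevant items found in the top k.
    return total / hits if hits else 0.0


def mean_average_precision(judgements, k=10):
    """Mean of the per-query average precision values."""
    return sum(average_precision(r, k) for r in judgements) / len(judgements)


# Made-up judgements for two queries (1 = relevant, 0 = irrelevant).
judgements = [
    [1, 1, 0, 1, 0, 0, 0, 0, 0, 0],   # AP = (1/1 + 2/2 + 3/4) / 3
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],   # AP = (1/2) / 1
]
map_score = mean_average_precision(judgements)
print(round(map_score, 4))
```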
Word linkages (described in Section III-D) also contribute some improvement, though not a significant one, as the database is currently too small for a conclusive judgement. Nevertheless, word linkages are very important in that they allow students to query with words as input. Another benefit is that, with word linkages, nodes with no explicit links set by the domain expert can also be queried based on the text they contain.
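Table I: MAP results for different knowledge graph constructs

| Knowledge graph construct | MAP | Improvement |
|---|---|---|
| e-book facts + word linkages | 43.27 | 3.57 % |
| e-book facts + generated facts | 67.87 | 62.45 % |
| e-book facts + generated facts + word linkages | 68.62 | 64.24 % |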
V-B e-Book Deployment
The e-book is set up as a website that constantly updates its contents based on the database. As shown in Figure 1, the representation (website) is separated from the fact generation modules, which eases the work of both the domain expert and the website designer, as neither has to consider the technicalities of the other's work. The website is set up so that students can easily scroll through and interact with the texts. The front page of the e-book is shown in Figure 6. Students may query for related questions or book texts given a set of descriptions they have recorded. For example, assume that a student has read about number systems and recorded the paragraphs on the hexadecimal and binary number systems (neither of which mentions conversion between number systems). The student can query for related questions, which returns results such as questions on the conversion between hexadecimal and binary, ranked by relevance. This example query is shown in Figure 7.
VI Conclusion
With the advent of blended and personalized learning, technologies should be applied to reading for more efficient knowledge search, navigation and organization. Instead of solely replacing printed books, e-books should be augmented with such capabilities. This work moves in this direction by endowing the e-book with intelligent functionalities without significant authoring overhead. This work assumes that the book is represented as a knowledge graph. To enable intelligent functionalities supported by the knowledge graph, preliminary work on information retrieval within the e-book has been carried out. A knowledge graph construction method is proposed in which additional facts are generated automatically from reasoners and word linkages. Information retrieval is then realized through a random walk on the knowledge graph. Evaluation shows that the construction method not only has little authoring overhead, but also allows relevant information to be retrieved. The e-book in this work has also been deployed as a website for students.
This research is supported by the Ministry of Science and Technology of Taiwan (MOST 106-2811-E-024-003, MOST 106-3114-E-024-001, MOST 106-2221-E-024-019).
References
[1] T.-H. Liang, “The effects of keyword cues and 3R strategy on children’s e-book reading,” Journal of Computer Assisted Learning, vol. 31, no. 2, pp. 176–187, 2015.
[2] N. N. Chan, C. Walker, and A. Gleaves, “An exploration of students’ lived experiences of using smartphones in diverse learning contexts using a hermeneutic phenomenological approach,” Computers & Education, vol. 82, pp. 96–106, 2015.
[3] D. B. Daniel and W. D. Woody, “E-textbooks at what cost? Performance and use of electronic v. print texts,” Computers & Education, vol. 62, pp. 18–23, 2013.
[4] M. C. Murray and J. Pérez, “E-textbooks are coming: Are we ready?” Issues in Informing Science and Information Technology, vol. 8, no. 6, pp. 49–60, 2011.
[5] A. M. Embong, A. Noor, R. Ali, Z. Bakar, and A. Amin, “Teachers’ perceptions on the use of e-books as textbooks in the classroom,” World Academy of Science, Engineering and Technology, vol. 70, pp. 580–586, 2012.
[6] R. Junco and C. Clem, “Predicting course outcomes with digital textbook usage data,” The Internet and Higher Education, vol. 27, pp. 54–63, 2015.
[7] X. Gu, B. Wu, and X. Xu, “Design, development, and learning in e-textbooks: What we learned and where we are going,” Journal of Computers in Education, vol. 2, no. 1, pp. 25–41, 2015.
[8] C.-S. Lee, M.-H. Wang, J.-L. Yu, K.-H. Lin, T.-T. Lin, S.-C. Yang, and S.-L. Cho, “FML-based intelligent adaptive assessment platform for learning materials recommendation,” in 2015 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2015), 2015, pp. 1–8.
[9] M. Kim, K.-H. Yoo, C. Park, and J.-S. Yoo, “Development of a digital textbook standard format based on XML,” in Advances in Computer Science and Information Technology. Springer, 2010, pp. 363–377.
[10] W. D. Woody, D. B. Daniel, and C. A. Baker, “E-books or textbooks: Students prefer textbooks,” Computers & Education, vol. 55, no. 3, pp. 945–948, 2010.
[11] N. Abdullah and F. Gibb, “Students’ attitudes towards e-books in a Scottish higher education institute: Part 1,” Library Review, vol. 57, no. 8, pp. 593–605, 2008.
[12] M. Kim, K.-H. Yoo, C. Park, J.-S. Yoo, H. Byun, W. Cho, J. Ryu, and N. Kim, “An XML-based digital textbook and its educational effectiveness,” Advances in Computer Science and Information Technology, pp. 509–523, 2010.
[13] D. P. Brunet, M. L. Bates, J. R. Gallo, and E. A. Strother, “Incoming dental students’ expectations and acceptance of an electronic textbook program,” Journal of Dental Education, vol. 75, no. 5, pp. 646–652, 2011.
[14] P. F. Chong, Y. P. Lim, and S. W. Ling, “On the design preferences for ebooks,” IETE Technical Review, vol. 26, no. 3, pp. 213–222, 2009.
[15] D. Butler, “The textbook of the future,” Nature, vol. 458, pp. 568–570, 2009.
[16] E.-L. Lim and K. F. Hew, “Students’ perceptions of the usefulness of an e-book with annotative and sharing capabilities as a tool for learning: a case study,” Innovations in Education and Teaching International, vol. 51, no. 1, pp. 34–45, 2014.
[17] K. A. Sheen and Y. Luximon, “Relationship between academic discipline and user perception of the future of electronic textbooks,” Procedia Manufacturing, vol. 3, pp. 5845–5850, 2015.
[18] K. Sheen and Y. Luximon, “Student perceptions on future components of electronic textbook design,” Journal of Computers in Education, vol. 4, no. 4, pp. 371–393, Dec 2017.
[19] R. McFall, H. Dershem, and D. Davis, “Experiences using a collaborative electronic textbook: bringing the guide on the side home with you,” ACM SIGCSE Bulletin, vol. 38, no. 1, pp. 339–343, 2006.
[20] C.-S. Lee, M.-H. Wang, S. Nohara, K.-Y. Wu, and R. Saga, “FML-based feature similarity assessment agent for Japanese/Taiwanese language learning,” in 2016 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2016), 2016, pp. 1073–1079.
[21] K. Collins-Thompson and J. Callan, “Query expansion using random walk models,” in Proceedings of the 14th ACM International Conference on Information and Knowledge Management. ACM, 2005, pp. 704–711.
[22] K. Toutanova, C. D. Manning, and A. Y. Ng, “Learning random walk models for inducing word dependency distributions,” in Proceedings of the Twenty-First International Conference on Machine Learning, 2004, pp. 1–8.
[23] A. Arnold and W. W. Cohen, “Information extraction as link prediction: Using curated citation networks to improve gene detection,” in International Conference on Wireless Algorithms, Systems, and Applications. Springer, 2009, pp. 541–550.
[24] W. W. Cohen and E. Minkov, “A graph-search framework for associating gene identifiers with documents,” BMC Bioinformatics, vol. 7, no. 1, p. 440, Oct 2006.
[25] D. Liben-Nowell and J. Kleinberg, “The link-prediction problem for social networks,” Journal of the Association for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.
[26] F. Yang, Z. Yang, and W. W. Cohen, “Differentiable learning of logical rules for knowledge base completion,” arXiv preprint arXiv:1702.08367, 2017.
[27] N. Lao, T. Mitchell, and W. W. Cohen, “Random walk inference and learning in a large scale knowledge base,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2011, pp. 529–539.
[28] E. Minkov, W. W. Cohen, and A. Y. Ng, “Contextual search and name disambiguation in email using graphs,” in Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2006, pp. 27–34.
[29] B. Forouzan, Foundations of Computer Science. London, United Kingdom: Cengage Learning EMEA, 2014.
[30] D. Tsarkov and I. Horrocks, “FaCT++ description logic reasoner: System description,” in Automated Reasoning, ser. Lecture Notes in Computer Science, U. Furbach and N. Shankar, Eds. Springer Berlin Heidelberg, 2006, vol. 4130, pp. 292–297.
[31] D. Allemang and J. Hendler, Semantic Web for the Working Ontologist. Morgan Kaufmann, 2012.