A Map of Knowledge
Knowledge representation has gained relevance as data from the ubiquitous digitization of behaviors amass and as academia and industry seek methods to understand and reason about the information these data encode. Success in this pursuit has emerged with data from natural language, where skip-grams and other linear connectionist models of distributed representation have surfaced scrutable relational structures that have also served as artifacts of anthropological interest. Natural language is, however, only a fraction of the big data deluge. Here we show that latent semantic structure, composed of elements from digital records of our interactions, can be informed by behavioral data, and that domain knowledge can be extracted from this structure through visualization and a novel mapping of the literal descriptions of elements onto this behaviorally informed representation. We use the course enrollment behaviors of 124,000 students at a public university to learn vector representations of its courses. From these behaviorally informed representations, a notable 88% of course attribute information was recovered (e.g., department and division), as well as 40% of course relationships constructed from prior domain knowledge and evaluated by analogy (e.g., Math 1B is to Math H1B as Physics 7B is to Physics H7B). To aid in interpretation of the learned structure, we create a semantic interpolation, translating course vectors to a bag-of-words of their respective catalog descriptions. We find that the representations learned from enrollments resolve course vectors to a level of semantic fidelity exceeding that of their catalog descriptions, depicting a vector space of high conceptual rationality. We end with a discussion of the possible mechanisms by which this knowledge structure may be informed and its implications for data science.
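To make the analogy evaluation concrete, the following is a minimal sketch of how a relationship such as "Math 1B is to Math H1B as Physics 7B is to Physics H7B" could be tested with vector arithmetic and cosine similarity. The three-dimensional vectors, the extra distractor course, and the helper names `cosine` and `solve_analogy` are assumptions made purely for illustration; they are not the learned representations or the evaluation code of this work.

```python
import numpy as np

# Toy course vectors, invented for illustration only. Real course vectors
# would be learned from enrollment data and have many more dimensions.
course_vectors = {
    "Math 1B": np.array([1.0, 0.0, 0.0]),
    "Math H1B": np.array([1.0, 0.0, 1.0]),
    "Physics 7B": np.array([0.0, 1.0, 0.0]),
    "Physics H7B": np.array([0.0, 1.0, 1.0]),
    "History 7B": np.array([0.5, 0.5, -1.0]),  # hypothetical distractor
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def solve_analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' by nearest neighbor to b - a + c.

    The three query courses are excluded from the candidate set, as is
    common practice in word-analogy evaluation.
    """
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = {n: v for n, v in vectors.items() if n not in (a, b, c)}
    return max(candidates, key=lambda n: cosine(candidates[n], target))


prediction = solve_analogy("Math 1B", "Math H1B", "Physics 7B", course_vectors)
print(prediction)
```

Under this sketch, an analogy would count as recovered when the predicted course matches the expected one, and the share of recovered analogies would give a figure comparable in kind to the relationship recovery rate reported above.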