Learning from a teacher with partial knowledge
All teaching methods discussed in previous sections assume that the teacher has full access to the true manifold. In reality, however, the teacher often does not know the underlying manifold and often does not have full control over which data can be used to teach. In this section, we consider teaching in a much more practical scenario that allows a teacher, who may have limited knowledge, to teach with unconstrained data. We illustrate how this helps the learner improve their estimate of the relevant topological and geometrical information from the data. Following the standard setting of topological data analysis [carlsson2009topology, chazal2017introduction], we assume that the data is a finite set of points sampled from the true manifold . Using an algorithm in class (Sec LABEL:sec:points) with different ’s, the learner obtains a summary of estimations of in the form of persistent homology (sketched below; see details in [edelsbrunner2010computational, chazal2016structure]). Rather than picking a teaching set directly from , the teacher first selects a subset from , then passes to the learner in a proper sequential format according to algorithm (Sec Document) to demonstrate desired topological features of . Roughly speaking, persistent homology tracks topological changes as the learner’s approximation of varies with . Based on algorithm , for each , the learner builds a union of balls , centered at with radii equal to . Consider the nested family of . Given a non-negative integer , the inclusion , for , naturally induces a linear map between their -th homology groups and . The set of all -th homology groups, together with all the linear maps induced by inclusions, forms a persistence module, which can be intuitively viewed as
It is shown in [chazal2016structure] that when is finite, the persistence module obtained for a fixed can be decomposed into a direct sum of interval modules of the form:
where is the identity map. Recall that each -th homology group is a direct sum of , with each copy of representing a -dim loop. Hence, each interval module essentially records the lifespan of a loop, which can be depicted by an interval from the birth radius to the death radius of the loop. Therefore, each persistence module yields a collection of intervals called the persistence barcode. Conventionally, the longer an interval in the barcode, the more persistent, and thus relevant, is the corresponding topological feature. As discussed before, the difficulty of learning a manifold increases dramatically as the reach of drops. We now illustrate how teaching helps in these situations with the following example. Our example is based on a two-dimensional manifold embedded in , but the method works for all manifolds.
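To make the barcode construction above concrete, the following is a minimal sketch of the simplest case: the barcode for connected components (0-th homology) of a union of balls around a finite point set. It is not the pipeline used in this paper; the function name `h0_barcode` and the sample points are hypothetical. It relies only on the fact that two balls of radius eps intersect once eps reaches half the distance between their centers, so components merge at half the pairwise distances.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Return 0-dim persistence intervals (birth, death) for the
    union-of-balls filtration on `points` (hypothetical helper)."""
    n = len(points)
    if n == 0:
        return []
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Balls of radius eps around p and q meet once eps >= |p - q| / 2.
    merges = sorted(
        (math.dist(points[i], points[j]) / 2.0, i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for eps, i, j in merges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            # One component dies each time two components merge.
            bars.append((0.0, eps))
    # The last surviving component never dies.
    bars.append((0.0, math.inf))
    return bars
```

For the three collinear points `(0, 0)`, `(1, 0)`, `(4, 0)`, the sketch yields two finite bars, ending at radii 0.5 and 1.5, plus one infinite bar. Higher-dimensional bars, such as the 1-dim loops discussed next, require a full simplicial filtration and are better left to a dedicated library.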
Let the true manifold be the blue barbell-shaped annulus shown in Figure Document with reach . Assume that the learner analyzes randomly sampled data by TDA and the teacher knows that contains a 1-dim hole. Based on , three distinct points are required to form a teaching sequence for this hole. When fewer than three data points are observed by the learner, the teacher would simply wait until more data were collected. Suppose that the learner gets three data points as shown. The corresponding persistence barcode of is empty for (no 1-dim loop is ever formed for any choice of ). With , the teacher may teach by marking these points sequentially as, for example, . Comparing the teacher’s demonstration with the barcode, the learner would realize that is homotopy equivalent to a circle and that the points gathered so far are not sufficient to extract any accurate geometrical information. Further suppose that the learner intends to estimate the geometry of , and so more points are sampled. A given data set is called feasible if the learner is able to derive the true geometry of from with some , i.e. if there exists such that is homotopy equivalent to . (Footnote 5: It is possible that is homotopy equivalent to for . However, the top and the bottom of the narrow middle part of will be connected up in such , which leads to the wrong geometry.) To estimate the lower bound on the size of a feasible data set, we randomly sample data sets from with increasing sizes and 20 simulations for each size. Empirically, feasible data sets appear only after and and appear in every simulation for . Figure Document(a) shows the persistence barcode for a data set of size . (Footnote 6: The barcode was constructed using the GUDHI library [maria2014gudhi].) The red bars are the longest four intervals for , which reflect the number of connected components. After , only one red bar remains, which indicates that contains a single component for any .
The green bars are the intervals for (ignoring intervals of length less than 0.05), which represent the number of 1-dim holes. The top green bar spans over and indicates that a 1-dim loop forms at and persists until . The bottom green bar spans over and indicates that another 1-dim loop forms at and persists until . All randomly sampled data sets of size exhibit a similar persistence barcode, with two long intervals for as shown. Focusing on the range of where -dim holes exist, on average 78% of the choices of (with variance 0.0002) indicate two 1-dim loops over all simulations. Thus, without teaching, the learner would likely reach the wrong topological conclusion, , with high confidence. In contrast, with a teaching set of three points, the learner is able not only to infer the correct topology immediately after teaching but also to accurately estimate the geometry of by focusing on with . Figure Document(b) plots the average learning accuracy of ’s geometry for different types of learners. The blue curve shows the learners with a topological teacher, who are assumed to follow a Bernoulli distribution since they are able to infer the correct geometry with every feasible data set. The orange curve corresponds to learners who choose uniformly from the interval where the barcode for is not empty (variances are omitted as their magnitudes are bounded above by ). The green curve shows that learners who approximate by with the most persistent homology stay incorrect on geometry even with increasing data size. Clearly, the learner’s acquisition of geometry is accelerated by teaching topology.
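The share of radii that suggest a given number of loops can be read directly off a barcode. The sketch below assumes a barcode is given as a list of (birth, death) pairs; the function name `betti_fraction` and the two intervals in the usage note are hypothetical and are not the values measured in the experiments above. It computes the fraction of the radius range on which exactly `target` intervals are alive, relative to the range on which at least one interval is alive.

```python
def betti_fraction(intervals, target):
    """Fraction of the radius range with at least one live interval
    on which exactly `target` intervals are alive (hypothetical helper).

    `intervals` is a list of finite (birth, death) pairs, birth < death.
    """
    # Interval endpoints split the radius axis into segments on which
    # the number of live intervals is constant.
    events = sorted({e for b, d in intervals for e in (b, d)})
    alive_len = 0.0
    target_len = 0.0
    for lo, hi in zip(events, events[1:]):
        mid = (lo + hi) / 2.0
        count = sum(1 for b, d in intervals if b <= mid < d)
        if count > 0:
            alive_len += hi - lo
        if count > 0 and count == target:
            target_len += hi - lo
    return target_len / alive_len if alive_len > 0 else 0.0
```

With the hypothetical intervals `[(0.2, 1.0), (0.3, 0.9)]`, `betti_fraction(intervals, 2)` is about 0.75 and `betti_fraction(intervals, 1)` is about 0.25. A learner who samples a radius uniformly from the non-empty range would therefore see two loops roughly three times out of four, which mirrors the failure mode of the untaught learner described above.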
Persistent homology has started to attract attention in machine learning [carlsson2008local, chazal2013persistence, li2014persistence, reininghaus2015stable]. However, leveraging these topological features for learning poses considerable challenges because the relevant topological information is not carried by the whole persistence barcode but is concentrated in a small region of that may not be obvious [hofer2017deep]. Teaching by demonstration resolves these challenges by allowing the learner to extract the most suitable topological information after the correct homology appears in the persistence barcode, and to zoom the analysis of ’s geometry into the most appropriate range of with high data efficiency. More importantly, teaching by demonstration allows accumulation of information across learners, whereas other forms of teaching can only transmit information from an already knowledgeable teacher. As pointed out in Sec LABEL:sec:points, the method of teaching by sampling points essentially assumes that the teacher knows the true manifold . However, given the intractability of manifold learning in general, there is no plausible way for the teacher to have access to . On such accounts, teaching does not resolve the true challenge of learning and instead passes off the problem to a teacher for whom the learning problem does not exist. The key advantage of teaching from demonstrations is that it allows the teacher to convey critical information about without knowing the entire manifold. For example, let be a torus as in Figure Document with . The teacher may only have enough observations to conclude that there is a loop homotopy equivalent to the green circle . With sequential data, the teacher could easily pass on the only loop observed, which allows the learner to focus on the region of where exists. In addition, from a teacher’s perspective, much less data is needed to learn the topology of an irregular manifold than its geometry.
For instance, let be the 1-dim manifold shown in Figure Document(b). Denote the reach of by and the radius of the left arc in by . Note that the teacher only needs -dense data to learn the topology of , whereas -dense data are needed to learn the geometry. In fact, for any manifold , we may define its topological reach to be the largest number such that is homotopy equivalent to for any , where . According to Proposition 3.2 in [niyogi2008finding], for the same confidence level, the number of points needed to achieve -dense increases polynomially with . Therefore, when is irregular, i.e. is significantly less than , the amount of data needed to achieve -dense is much smaller than that needed to achieve -dense. Since the topology of remains the same for data beyond -dense, much less data is required to learn the topology of an irregular manifold than its geometry.
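As a rough illustration of how the required amount of data grows as the density requirement tightens, the sketch below counts the fewest evenly spaced points that make a circle eps-dense. This is a toy covering count, not the probabilistic bound of Proposition 3.2 in [niyogi2008finding]: random samples need more points than an ideal even spacing, and the circle and the function name `min_even_samples` are assumptions made purely for illustration.

```python
import math


def min_even_samples(radius, eps):
    """Fewest evenly spaced points on a circle of the given radius such
    that every point of the circle lies within Euclidean distance eps
    of some sample (hypothetical helper)."""
    if eps >= 2 * radius:
        # A single point already covers the whole circle.
        return 1
    # With n evenly spaced samples, the worst-case point sits at angle
    # pi / n from its nearest sample, i.e. at chord distance
    # 2 * radius * sin(pi / (2 * n)). Solve that distance <= eps for n.
    return math.ceil(math.pi / (2 * math.asin(eps / (2 * radius))))
```

On a unit circle, `min_even_samples(1.0, 0.5)` gives 7 points while `min_even_samples(1.0, 0.05)` gives 63. Here the two eps values merely stand in for a larger topological reach and a smaller reach; the count scales like 1/eps for this 1-dim example, and the exponent grows with the dimension of the manifold.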
There are three main areas of related work: formal approaches to manifold learning, machine teaching, and human learning from teaching. [niyogi2008finding] describe a PAC learning framework for learning the homology of a manifold, which we directly build upon in Section LABEL:sec:points. Extensions have, for example, directly tested the manifold hypothesis [fefferman2016testing] and estimated the reach of a manifold [aamari2017estimating]. This line of work assumes that data are isolated sample points and are not formulated by a teacher. The literatures on machine teaching, algorithmic teaching, and Bayesian teaching investigate the implications of having a teacher for machine learning algorithms. Machine teaching has focused on the problem of teaching standard machine learning algorithms, most commonly formalizing the single best set of teaching points (maximizing the probability of the true hypothesis) [Zhu2015, Liu2016]. Algorithmic teaching similarly investigates the problem of teaching but within the deterministic algorithmic learning framework [doliwa2014recursive]. Bayesian teaching has been investigated with standard probabilistic machine learning algorithms [eaves2016toward, yangexplainable]. All of these assume that the relevant data are points, rather than more structured data, and all require that the teacher knows the correct answer. The literature on human learning emphasizes the structured nature of the data presented by teachers in the forms of pairing data to form comparisons [Shafto2008, Shafto2014] and series data to form demonstrations [brand2002evidence, ho2016showing]. Both the machine teaching and Bayesian teaching approaches listed above have also been applied to teaching human learners in simple cognitive science-style experiments. Bayesian teaching has been used to model more realistic phenomena such as infant-directed speech [eaves2016infant].
We considered the problem of teaching low-dimensional manifolds using structured data, which extends mathematical approaches to manifold learning and research in machine learning toward learning contexts more consistent with the richness of human learning. Building on prior work in manifold learning, we formalize teaching manifolds from data and comparisons, and observe that, contrary to intuition, teaching does not facilitate learning as much as one would expect due to constraints imposed by the reach of the manifold. Considering learning from teaching demonstrations, that is, sequences of data points, we show that learning can be greatly facilitated by teaching. This approach relies on separating teaching the geometry of the manifold itself from teaching the topology of the manifold. Focusing on teaching only the topology, we show that sequences of points can be used to represent the homology groups of the manifold, which compactly capture important abstract structure that can be used to facilitate future learning. Moreover, this relaxes the overly stringent and implausible requirement that the teacher must know the manifold exactly. Instead, the teacher can proceed with only an accurate reconstruction of the topologically-relevant reach, which is almost always less stringent than the true reach. Due to the polynomial increase in data required to achieve reductions in the reach, this is a substantial improvement. Future work may extend this approach toward more naturalistic learning problems faced by humans or solved by machine learning. The approaches are not restricted to manifold teaching, and it would be interesting to explore teaching more general mathematical objects with low-dimensional topological structures, such as graphs, CW-complexes and even groups.