1 Introduction
The shift from mass production to mass customization and personalization [Hu.2013] places high demands on production processes. Despite the high variance between different products and the small batch sizes of the products to be manufactured, the product quality in mass customization has to be comparable to the quality of products from established mass production processes. It is therefore essential to keep process ramp-up times low and to achieve the required product quality as directly as possible. This requires a profound and solid understanding of the dependencies between process parameters and the quality criteria of the final product, even before the start of production (SOP). Various ways exist to gain this kind of process knowledge: for example, by carrying out experiments, setting up simulations, or exploiting available expert knowledge. In production, expert knowledge in particular plays a central role. This is because complex cause-effect relationships operate between the input and output parameters during machining, and these generally have to be set in a result-oriented manner in a short amount of time without recourse to real-time data sets. Indeed, process ramp-up is still commonly carried out by process experts purely based on their knowledge. Furthermore, many processes are controlled by experts during production to ensure consistently high quality.
In the course of digitalization, the acquisition of and access to data in manufacturing have increased significantly in recent years. Sensors, extended data acquisition by the controllers themselves, and the continuous development of low-cost sensors allow for the acquisition of large amounts of data [Wuest.2016]. Accordingly, more and more data-driven approaches, most notably machine learning methods, are used in manufacturing to describe the dependencies between process parameters and quality parameters [Weichert.2019]. In principle, such data-driven methods are suitable for the rapid generation of quality prediction models in production, but the quality of machine learning models crucially depends on the amount and the information content of the available data. The data can be generated from experiments or from simulations. In general, experiments for process development or improvement are expensive, and accordingly the number of experiments to be performed should be kept to a minimum. In this context, design of experiments can be used to obtain maximum information about the process behavior with as few experiments as possible [Montgomery.2017], [Fedorov.2014]. Similarly, the generation of data using realistic simulation models can be expensive as well, because the models must be created and calibrated and – depending on the process – high computing capacities are required to generate the data. In conclusion, the data available in manufacturing before the SOP is typically rather scarce.
This paper introduces a novel and general methodology to leverage expert knowledge in order to compensate for such data sparsity and to arrive at prediction models with good predictive power in spite of small datasets. Specifically, the proposed methodology is dedicated to shape expert knowledge, that is, expert knowledge about the qualitative shape of the input-output relationship to be learned. Simple examples of such shape knowledge are prior monotonicity or prior convexity knowledge. Additionally, the proposed methodology directly involves process experts in capturing their shape knowledge and in incorporating it into the resulting prediction model.
In more detail, the proposed methodology proceeds as follows. In a first step, an initial, purely data-based prediction model is trained. A process expert then inspects selected, particularly informative graphs of this model and specifies in what way these graphs confirm or contradict his shape expectations. In a last step, the thus specified shape expert knowledge is incorporated into a new prediction model which strictly complies with all the imposed shape constraints. In order to compute this new model, the semi-infinite optimization approach to shape-constrained regression is taken, based on the algorithms from [Schmid.2021]. In the following, this approach is referred to as the SIASCOR method for brevity. While a semi-infinite optimization approach has also been pursued in [Kurnatowski.2021], the algorithm used here is superior to the reference-grid algorithm from [Kurnatowski.2021], both from a theoretical and from a practical point of view. Additionally, the paper [Kurnatowski.2021] treats only a single kind of shape constraint, namely monotonicity constraints.
The general methodology is applied to the exemplary process of grinding with brushes. In spite of the small set of available measurement data, the methodology proposed here leads to a high-quality prediction model for the surface roughness of the brushed workpiece.
The paper is organized as follows. Section 2 gives an overview of the related work. In Section 3, the general methodology to capture and incorporate shape expert knowledge is introduced, and its individual steps are explained in detail. Section 4 describes the application example, that is, the brushing process. Section 5 discusses the resulting prediction models applied to the brushing process and compares them to more traditional machine learning models. Section 6 concludes the paper with a summary and an outlook on future research.
2 Related work
In [Weichert.2019] it is shown that machine learning models used for the optimization of production processes are often trained with relatively small datasets. In this context, attempts are often made to represent complex relationships with complex models and small datasets. In other domains as well, such as process engineering [Napoli.2011] or medical applications [Shaikhina.2017], small amounts of data play a role in the use of machine learning methods. Accordingly, quite a few methods to train complex models with small datasets already exist in the literature. These known approaches to sparse-data learning can be categorized as purely data-based methods on the one hand and expert-knowledge-based methods on the other hand. In the following literature review, expert-knowledge-based approaches that typically require large – or, at least, non-sparse – datasets are not included. In particular, the projection-based [Lin.2014, Schmid.2020] and rearrangement-based [Dette.2006, Chernozhukov.2009] approaches to monotonic regression are not reviewed here.
2.1 Purely data-based methods for sparse-data learning in manufacturing
An important method for training machine learning models with small datasets is to generate additional, artificial data. Among these virtual-data methods, the mega-trend-diffusion (MTD) technique is particularly common. It was developed by [Li.2007] using flexible manufacturing system scheduling as an example. In [Li.2013], virtual data is generated using a combination of MTD and a plausibility assessment mechanism. In a second step, the generated data is used to train an artificial neural network (ANN) and a support vector regression model with sample data from the manufacturing of liquid-crystal-display (LCD) panels. Using multilayer ceramic capacitor manufacturing as an example, bootstrapping is used in [Tsai.2008] to generate additional virtual data and then train an ANN. The authors of [Napoli.2011] also use bootstrapping and noise injection to generate virtual data and consequently improve the prediction of an ANN. The methodology is applied to estimate the freezing point of kerosene in a topping unit in chemical engineering. In [Chen.2017], virtual data is generated using particle swarm optimization to improve the prediction quality of an extreme learning machine model.
In addition to the methods for generating virtual data and the use of simple machine learning methods such as linear regression, lasso, or ridge regression [Bishop.2006], other machine learning methods from the literature can also be used in the context of small datasets. For example, the multi-model approaches in [Li.2012], [Chang.2015] can be mentioned here; they are used in the field of LCD panel manufacturing to improve the prediction quality. Other concrete examples are the models described in [Torre.2019], which are based on polynomial chaos expansion. These models are also suitable for learning complex relationships in spite of few data points.
2.2 Expert-knowledge-based methods for sparse-data learning in manufacturing
An extensive general survey about integrating prior knowledge in learning systems is given in [Rueden.2021]. The integration of knowledge depends on the source and the representation of the knowledge: for example, algebraic equations or simulation results represent scientific knowledge and can be integrated into the learning algorithm or the training data, respectively.
Apart from this general reference, recent years have brought about various papers on leveraging expert knowledge in specific manufacturing applications. Among other things, these papers are motivated by the fact that production planning becomes more and more difficult for companies due to mass customization. In order to improve the quality of production planning, [Schuh.2019] show that enriching production data with domain knowledge leads to an improvement in the calculation of the transition time with regression trees.
Another broad field of research is knowledge integration via Bayesian networks. In
[Zhang.2020] domain knowledge is incorporated using a Bayesian network to predict the energy consumption during injection molding. In [Lokrantz.2018] a machine learning framework is presented for root cause analysis of faults and quality deviations, in which knowledge is integrated via Bayesian networks. Based on synthetically generated manufacturing data, an improvement of the inferences was shown compared to models without expert knowledge. In [He.2019] Bayesian networks are used to inject expert knowledge about the manufacturing process of a cylinder head in order to evaluate the functional state of manufacturing on the one hand and to identify causes of functional defects of the final product on the other hand. Another approach to root cause analysis using domain-specific knowledge is described by [Rahm.2018]. Here, knowledge is acquired within an assistance system and combined with machine learning methods to support the diagnosis and elimination of faults occurring at packaging machines.
In [Lu.2017], knowledge of the electrochemical micromachining process is incorporated into the structure of a neural network. It is demonstrated that integrating knowledge achieves better prediction accuracy compared to classical neural networks. Another way to integrate knowledge about the additive manufacturing process into neural networks is based on causal graphs and proposed by [Nagarajan.2019]. This approach leads to a more robust model with better generalization capabilities. In [Ning.2019], a control system for a grinding process is presented in which, among other things, a fuzzy neural network is used to control the surface roughness of the workpiece. Incorporating knowledge into models using fuzzy logic is a well-known and proven method, especially in the field of grinding [Brinksmeier.2006].
3 A methodology to capture and incorporate shape expert knowledge
As has been pointed out in the previous section, there are expert-knowledge-free and expert-knowledge-based methods to cope with small datasets in the training of machine learning models in manufacturing. An obvious advantage of expert-knowledge-based approaches is that they typically yield models with superior predictive power, because they take into account more information than the pure data. Another clear advantage of expert-knowledge-based approaches is that their models tend to enjoy higher acceptance among process experts, because the experts are directly involved in the training of these models.
Therefore, this paper proposes a general methodology to capture and incorporate expert knowledge into the training of a powerful prediction model for certain process output quantities of interest. Specifically, the proposed methodology is dedicated to shape expert knowledge, that is, prior knowledge about the qualitative shape of the considered output quantity as a function
(3.1) 
of relevant process input parameters . Such shape expert knowledge can come in many forms. An expert might know, for instance, that the considered output quantity is monotonically increasing w.r.t. , concave w.r.t. , and monotonically decreasing and convex w.r.t. .
In a nutshell, the proposed methodology to capture and incorporate shape expert knowledge proceeds in the following four steps:
1. Training of an initial, purely data-based prediction model
2. Inspection of the initial model by a process expert
3. Specification of shape expert knowledge by the expert
4. Integration of the specified shape expert knowledge into the training of a new prediction model which strictly complies with the imposed shape knowledge.
This new and shape-knowledge-compliant prediction model is computed with the help of the SIASCOR method [Schmid.2021] and is therefore referred to as the SIASCOR model. After a first run through the steps above, the shape of the SIASCOR model can still be insufficient in some respects, because the shape knowledge specified in the first run might not yet have been complete. In this case, steps two to four can be passed through again, until the expert notices no more shape knowledge violations in the final SIASCOR model. Schematically, this procedure is sketched in Figure 1.
In the remainder of this section, the individual steps of the proposed methodology are explained in detail. The input parameter range on which the models are supposed to make reasonable predictions is always denoted by the symbol . It is further assumed that is a rectangular set, that is,
(3.2) 
with lower and upper bounds and for the th input parameter . Additionally, the – typically small – set of measurement data available for the relationship (3.1) is always denoted by the symbol
(3.3) 
3.1 Training of an initial prediction model
In the first step of the methodology, an initial, purely data-based model is trained for (3.1), using standard polynomial regression with ridge or lasso regularization [Bishop.2006]. So, the initial model is assumed to be a multivariate polynomial
(3.4) 
of some degree , where is the vector consisting of all monomials of degree less than or equal to and where is the vector of the corresponding monomial coefficients. In training, these monomial coefficients are tuned such that optimally fits the data and such that, at the same time, the ridge or lasso regularization term is not too large. In other words, one has to solve the simple unconstrained regression problem
(3.5) 
where and are suitable regularization hyperparameters ( corresponding to lasso and corresponding to ridge regression). As usual, these hyperparameters are chosen such that some cross-validation error becomes minimal.
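As an illustration of this first step, the following sketch fits a low-degree polynomial with lasso regularization to a small synthetic dataset, choosing the regularization strength by cross-validation with scikit-learn. The data, the degree, and the two-input setting are purely illustrative assumptions, not the paper's actual process data:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LassoCV

# Toy data standing in for a small process dataset:
# 2 input parameters, 1 output quantity (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(30, 2))
y = X[:, 0] ** 2 - 0.5 * X[:, 1] + 0.1 * rng.standard_normal(30)

# Polynomial model of low degree with lasso regularization;
# the regularization strength is chosen by cross-validation.
model = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=True),
    LassoCV(cv=5, max_iter=50_000),
)
model.fit(X, y)
print("train R^2:", model.score(X, y))
```

Ridge regression would be obtained analogously by swapping `LassoCV` for `RidgeCV`.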
3.2 Inspection of the initial prediction model
In the second step of the methodology, a process expert inspects the initial model in order to get an overview of its shape. To do so, the expert has to look at one- or two-dimensional graphs of the initial model. Such graphs are obtained by keeping all input parameters except one (two) constant at some fixed value(s) of choice and by then considering the model as a function of the one (two) remaining parameter(s). As soon as the number of inputs is larger than two, there are infinitely many of these graphs, and it is notoriously difficult for humans to piece them together into a clear and coherent picture of the model’s shape [Oesterling.2016]. It is therefore crucial to provide the expert with a small selection of particularly informative graphs, namely graphs with particularly high model confidence and graphs with particularly low model confidence.
A simple method of arriving at such high- and low-fidelity graphs is as follows. Choose those two points , from a given grid
(3.6) 
in with minimal or maximal accumulated distances from the data points, respectively. In other words,
(3.7) 
where the gridpoint indices and are defined by
(3.8)  
(3.9) 
with being the initial model’s prediction at the gridpoint . Starting from the two points and , one then traverses the range of each input dimension. In this manner, one obtains, for each input dimension , a one-dimensional graph of the initial model of particularly high fidelity (namely the function ) and a one-dimensional graph of particularly low fidelity (namely the univariate function ). See Figure 2 for exemplary high- and low-fidelity graphs as defined above.
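The selection of the two anchor points can be sketched as follows, assuming Euclidean distances and a regular grid (both choices are illustrative; the paper leaves the grid and metric unspecified):

```python
import numpy as np

def fidelity_anchor_points(grid, X_data):
    """Return the grid point with minimal accumulated distance to the
    data (high fidelity) and the one with maximal accumulated distance
    (low fidelity)."""
    # dists[i, j] = Euclidean distance from grid point i to data point j
    diffs = grid[:, None, :] - X_data[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    acc = dists.sum(axis=1)            # accumulated distance per grid point
    return grid[np.argmin(acc)], grid[np.argmax(acc)]

# Toy example: regular grid in [0, 1]^2, data clustered near the origin
g = np.linspace(0.0, 1.0, 11)
grid = np.array([[a, b] for a in g for b in g])
X_data = np.array([[0.1, 0.1], [0.2, 0.1], [0.1, 0.2]])
x_hi, x_lo = fidelity_anchor_points(grid, X_data)
print(x_hi, x_lo)   # high-fidelity point lies near the data, low-fidelity far away
```

The one-dimensional graphs are then obtained by varying one input at a time while holding the remaining coordinates of the respective anchor point fixed.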
An alternative method of obtaining low- and high-fidelity input parameters and graphs is to use design-of-experiments techniques [Fedorov.2014], but this alternative approach is not pursued here.
After inspecting particularly informative graphs as defined above, the expert can further explore the initial model’s shape by navigating through and investigating arbitrary graphs of the initial model with the help of commercial software or standard slider tools (from Python Dash or PyQt, for instance).
3.3 Specification of shape expert knowledge
In the third step of the methodology, the process expert specifies his shape expert knowledge about the input-output relationship (3.1) of interest. In this process, the expert can greatly benefit from the initial model and especially from the high- and low-fidelity graphs generated in the second step. Indeed, with the help of these graphs, the expert can, on the one hand, easily detect shape behavior that contradicts his expectations and, on the other hand, identify shape behavior that already matches his expectations for the shape of (3.1). When inspecting the graphs from Figure 2, for instance, the expert might notice that the initial model exceeds or falls below physically meaningful bounds. Similarly, the expert might notice that the initial model
– is convex w.r.t. (as he expects), and
– is not monotonically decreasing w.r.t. (contrary to what he expects).
All the shape knowledge that is noticed and worked out in this manner can then be specified and expressed pictorially in the form of simple schematic graphs like the ones from Figure 3.
3.4 Integration of shape expert knowledge into the training of a new prediction model
In the fourth step, the shape expert knowledge specified in the third step is integrated into the training of a new and shape-knowledge-compliant prediction model, using the SIASCOR method. Similarly to the initial model, the SIASCOR model is assumed to be a multivariate polynomial
(3.10) 
of some degree (not necessarily equal to the degree of the initial model), where and represent the monomials and the corresponding monomial coefficients as in (3.4). In contrast to the initial model training, however, the monomial coefficients are now tuned such that not only optimally fits the data but also strictly satisfies all the shape constraints specified in the third step. In other words, one has to solve the constrained regression problem
(3.11) 
subject to the shape constraints specified in the third step. In order to do so, the core semi-infinite optimization algorithm from [Schmid.2021] is used, which covers a large variety of admissible shape constraints.
Some simple examples of shape constraints covered by the algorithm are boundedness constraints
(3.12) 
with given lower and upper bounds , monotonic increasingness or decreasingness constraints
(3.13)  
(3.14) 
in a given input dimension , as well as convexity or concavity constraints
(3.15)  
(3.16) 
in a specified input dimension . A more complex kind of shape constraint that is also covered by the employed algorithm is the so-called rebound constraint. It constrains the amount by which the model can rise after a descent to be no larger than a given rebound factor . In mathematically precise terms, a rebound constraint in the th input dimension takes the following form:
(3.17) 
for all values of the input parameters in the remaining dimensions , where
(3.18) 
and where is the prescribed rebound factor. Sample graphs of a model that satisfies this rebound constraint with can be seen in Figure 3.
An important asset of the approach to shape-constrained regression taken here is that the core algorithm can handle arbitrary combinations of the kinds of shape constraints mentioned above in an efficient manner. Also, the core algorithm is entirely implemented in Python, which makes it particularly easy to use and interface. Another asset of the proposed approach is that the considered shape-constrained regression problem (3.11) features no hyperparameter except for the polynomial degree . Consequently, no tuning of hard-to-interpret hyperparameters is necessary. Concerning other, more theoretical merits of the employed semi-infinite optimization algorithm, the reader is referred to [Schmid.2021].
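The semi-infinite algorithm itself is beyond the scope of a short snippet, but the effect of such constraints can be illustrated with a much simpler discretized stand-in, closer in spirit to the reference-grid approach mentioned in the introduction than to the SIASCOR algorithm: a univariate polynomial is fitted under a monotonic increasingness constraint enforced on a finite grid. The data and degree are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

# Toy data: noisy samples of an increasing trend (illustrative only)
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 15)
y = x + 0.08 * rng.standard_normal(15)

degree = 4
V = np.vander(x, degree + 1, increasing=True)     # monomial design matrix

# Derivative of the polynomial, evaluated on a fine reference grid:
# d/dx sum_k w_k x^k = sum_{k>=1} k w_k x^(k-1)
x_grid = np.linspace(0.0, 1.0, 101)
D = np.vander(x_grid, degree + 1, increasing=True)[:, :-1] * np.arange(1, degree + 1)

def sse(w):                                       # sum of squared errors
    r = V @ w - y
    return r @ r

# Monotonic increasingness, discretized: p'(x) >= 0 at every grid point
constraints = {"type": "ineq", "fun": lambda w: D @ w[1:]}
res = minimize(sse, np.zeros(degree + 1), constraints=constraints)
print("constraint satisfied on grid:", bool(np.all(D @ res.x[1:] >= -1e-5)))
```

In contrast to this sketch, the semi-infinite approach guarantees the constraints on the whole continuous input range, not merely on the chosen grid.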
4 Application example
4.1 The brushing process
The brushing process is a metalcutting process used for the grinding of metallic surfaces with the help of brushes. Its main applications are the deburring of precision components [Gillespie.1979], the structuring of decorative surfaces of glass [Novotny.2017], and the functional surface preparation of metals for subsequent process steps of joining [Teicher.2018]. Common to all these applications is that the brushing process functions as a finishing process for components with a high inherent added value. Additionally, brushing processes have established themselves in certain highly automated mass production processes [Kim.2012].
While the focus of [DIN8589]
is still on steel wires as brushing filaments, in recent years filaments made of plastic with interstratified abrasive grits have become much more important. Such filaments act only as carrier elements of the machining substrate and, accordingly, the corresponding brushing process can be classified as a process with a geometrically undefined cutting edge. In view of their increased relevance, only brushing filaments with interstratified abrasive grits are considered here. See Figure 4 for a schematic representation of the considered brushing processes.
Apart from the material parameters of the workpiece, the machining process is influenced, on the one hand, by technological parameters of the process and, on the other hand, by a multitude of material parameters of the brush. Important technological parameters are the numbers of revolutions of the brush and of the workpiece, the cutting depth , and the cutting time . The brush parameters relate to the individual filaments (length , diameter , modulus of elasticity, and other technical properties), their arrangement (axial, radial), and their coupling to the base body (cast, plugged). The cutting substrate as an abrasive grain is characterized, among other things, by the grain material, the grain concentration , and the grain diameter . In addition, the shape of the brush is determined by its width and its diameter .
In view of this large variety of technological and material parameters, it is a challenging task to choose the tool and the tool settings such that a prescribed target value for the roughness of the brushed workpiece is reached quickly but also robustly. It is therefore important to have good prediction models for the surface roughness of the brushed workpiece.
In principle, such prediction models can be obtained from a comprehensive simulation of the brushing process [Wahab.2007], [Novotny.2017]. Such simulation-based models are expensive and complex, however, because – in addition to the many process parameters mentioned above – the dynamic behavior of the tool has to be broken down to the filaments and, microscopically, to the individual grain in engagement. In particular, the dynamically changing tool diameter [Matuszak.2015] has to be taken into account. In addition to the challenging modeling procedure, the resulting models are typically expensive to evaluate. Currently, these factors still limit the applicability of simulation-based models in real-world process design and process control. It is therefore important to build good alternative prediction models for brushing, for example by using machine learning.
4.2 Input parameters, output parameter, and dataset
In this paper, such an alternative, machine-learning-based model is built. Specifically, the modeled output quantity is the arithmetic-mean surface roughness of the brushed workpiece,
(4.1) 
It is modeled as a function of particularly important process parameters of the brushing process, namely
(4.2) 
The dataset used for the training of the prediction model consists of measurement points. Table 1 shows the ranges of the process and quality parameters covered by the measurement data.
symbol  process parameter  value range  unit 

diameter of the abrasive grits, expressed in terms of the mesh size  
cutting time, that is, the time the brush is engaged, including contact with the workpiece  
number of revolutions of the brush  
number of revolutions of the workpiece  
cutting depth  
arithmeticmean roughness 
5 Results and discussion
In this section, SIASCOR is applied to the brushing process example. In particular, shape expert knowledge is integrated according to the methodology described in Section 3. Aside from SIASCOR, a purely data-driven Gaussian process regression (GPR) was carried out for the brushing example. In the end, the two regression models are compared and their advantages and shortcomings are discussed.
5.1 Initial model
As a first step, an initial, purely data-based model was trained to visually assist the process expert in specifying shape knowledge for the SIASCOR model. A polynomial model (3.4) with a relatively small degree was used to prevent overfitting to the small dataset. The parameters of the model were computed via lasso regression, with a regularization strength selected by means of cross-validation using scikit-learn [Pedregosa.2011]. Additionally, prior to training, the input variables were transformed with the standard transformation [Kuhn.2013] and then scaled to the unit hypercube. The standard transformation with the square root function led to a better generalization performance.
5.2 Capturing shape expert knowledge
As a second step, for the inspection of the initial model, two points , of particularly high and of particularly low fidelity were computed according to (3.6)–(3.8) (Table 2). The corresponding one-dimensional graphs of the initial model (anchored in these two points) are visualized in Figure 5. When inspecting and analyzing the shape of these graphs, the process expert detected several physical inconsistencies. For example, some of the initial model’s predictions for are significantly lower than the surface roughness that is technologically achievable with the brushing process. Another example is the violation of convexity along the direction. With these observations in mind, the expert specified shape constraints for the SIASCOR model in the form of the schematic graphs from Figure 6. Specifically, the expert imposed the boundedness constraint upon the surface roughness. Along the direction, the expert required monotonic decreasingness and convexity. In the directions of and , the model was required to be convex and to satisfy the rebound constraint (3.17) with . And finally, the model was constrained to be convex w.r.t. and monotonically increasing w.r.t. .
point  

400  106  1964  438  0.84  
800  480  1000  1000  0.25 
5.3 SIASCOR model
With the aforementioned shape constraints and the data described in Section 4.2, the SIASCOR model was trained as explained in Section 3.4. For the degree of the polynomial model, was found to be the best fit. Moreover, the input variables were transformed with the square root function and then scaled to the unit hypercube. Table 3 lists various performance indices, and Figure 8 shows two plots of the final SIASCOR model.
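The preprocessing used for the models can be sketched as follows. The bounds below are borrowed from the two anchor points in Table 2 purely for illustration; in practice, the parameter ranges from Table 1 would be used:

```python
import numpy as np

def preprocess(X, lo, hi):
    """Square-root transform each input, then scale to the unit hypercube."""
    Xt, lo_t, hi_t = np.sqrt(X), np.sqrt(lo), np.sqrt(hi)
    return (Xt - lo_t) / (hi_t - lo_t)

# Illustrative per-dimension bounds (values borrowed from Table 2)
lo = np.array([400.0, 106.0, 1000.0, 438.0])
hi = np.array([800.0, 480.0, 1964.0, 1000.0])
X = np.vstack([lo, hi])
print(preprocess(X, lo, hi))   # the two rows map to corners of the unit hypercube
```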
5.4 GPR model
In addition to the SIASCOR model, a GPR model was trained for the sake of comparison, since GPR with an appropriately chosen kernel is well-suited for small datasets. As the kernel, the sum of an anisotropic Matérn kernel with and a white-noise kernel was chosen:
(5.1)  
where denotes the anisotropic norm of the component vector and where is if and otherwise. As usual, the hyperparameters and were optimized by maximizing the marginal likelihood according to [Williams.2006], using the Python package scikit-learn [Pedregosa.2011]. Due to the anisotropy of the Matérn kernel, a separate length-scale hyperparameter is calculated for each input dimension . As for the SIASCOR model, the input variables were transformed with the square root function and then scaled to the unit hypercube. Table 3 reports the pertinent performance indices and Figure 9 shows two plots of the final GPR model.
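Such a GPR setup can be sketched in scikit-learn as follows. The smoothness parameter `nu`, the initial noise level, and the data are illustrative assumptions; the paper does not report them:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

# Toy data: 5 inputs (as in the brushing example), 1 output (illustrative only)
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(25, 5))
y = X @ np.array([1.0, -0.5, 0.3, 0.0, 0.2]) + 0.05 * rng.standard_normal(25)

# Anisotropic Matérn kernel (one length scale per input) plus white noise;
# nu = 2.5 is an assumed smoothness, not reported in the paper
kernel = Matern(length_scale=np.ones(5), nu=2.5) + WhiteKernel(noise_level=1e-2)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gpr.fit(X, y)   # fitting maximizes the marginal likelihood over the hyperparameters
print("train R^2:", gpr.score(X, y))
```

After fitting, `gpr.kernel_` exposes the optimized per-dimension length scales and the optimized noise level.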
5.5 Comparison of SIASCOR and GPR
Table 3 compares the predictive power of the initial lasso model, the SIASCOR model, and the GPR model on test data obtained by cross-validation. It can be seen that the lasso and the SIASCOR models have similar averaged prediction errors and a similar averaged coefficient of determination on the test data, while the purely data-based GPR model features slightly better prediction errors. This can also be seen from Figure 7.
model  RMSE []  MAE []  [–] 

Lasso  0.0272  0.0205  0.8353 
SIASCOR  0.0260  0.0193  0.8284 
GPR  0.0174  0.0142  0.7410 
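For reference, the three performance indices in Table 3 can be computed as follows; the numbers below are made-up illustrative values, not the paper's measurements:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Hypothetical measured vs. predicted roughness values (illustrative only)
y_true = np.array([0.30, 0.45, 0.52, 0.61, 0.70])
y_pred = np.array([0.32, 0.43, 0.55, 0.58, 0.74])

rmse = mean_squared_error(y_true, y_pred) ** 0.5   # root-mean-square error
mae = mean_absolute_error(y_true, y_pred)          # mean absolute error
r2 = r2_score(y_true, y_pred)                      # coefficient of determination
print(f"RMSE={rmse:.4f}  MAE={mae:.4f}  R^2={r2:.4f}")
```

In the paper, these indices are averaged over the cross-validation folds.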
Figures 8 and 9 juxtapose two plots of the SIASCOR and the GPR model, respectively. As can be seen, in contrast to the SIASCOR model, the GPR model is starkly nonconvex w.r.t.
. In other words, the GPR model is at odds with physical shape expert knowledge, while the SIASCOR model is not. As has been explained in Section
3, the reason is that SIASCOR explicitly incorporates all the shape knowledge provided by the process expert, while the GPR model relies on the scarce data alone.
Another downside of the GPR approach is that the resulting models are typically quite sensitive w.r.t. the selected kernel class and that the selection of this kernel class is typically not very systematic but rather based on heuristic rules of thumb. Accordingly, model selection in GPR is typically quite time-consuming and cumbersome. In the SIASCOR method, by contrast, model selection is simple because the SIASCOR models have only one hyperparameter, namely the polynomial degree . Also, the interpretation of the shape constraints needed for the SIASCOR method is straightforward and, in any case, much clearer than the interpretation and selection of different GPR kernel classes.
As a matter of fact, the solution of the SIASCOR training problem (3.11) with the algorithm from [Schmid.2021] takes a bit more computational time than the hyperparameter optimization in GPR, because semi-infinite optimization problems have a more complex (bilevel) structure than the (unconstrained) marginal likelihood maximization problems used in GPR. Indeed, in the five-dimensional brushing example considered here, the training of the SIASCOR model typically took on the order of minutes on a standard office computer. Yet, this is negligible in view of the aforementioned clear advantages of SIASCOR over GPR in terms of shape-knowledge compliance, model selection, and interpretability.
6 Conclusion and future work
In order to achieve target product qualities quickly and consistently in manufacturing, reliable prediction models for the quality of process outcomes as a function of selected process parameters are essential. Since the datasets available in manufacturing – and especially before the SOP – are typically small, the construction of data-driven prediction models is a challenging task. The present paper addresses this challenge by systematically leveraging expert knowledge. Specifically, this paper introduces a general methodology to capture and incorporate shape expert knowledge into machine learning models for quality prediction in manufacturing.
It consists of four steps: 1. training of an initial, purely data-based prediction model, 2. inspection of the initial model by a process expert, 3. specification of shape expert knowledge, and 4. integration of the specified shape expert knowledge into the training of a new prediction model that complies with the shape knowledge. In the second step, the expert may find inconsistencies between the shape of the initial model and the expected shape behavior. Therefore, in the third step, the expert can constrain the shape of the model to behave as expected. It is possible to define and combine as many shape constraints as desired. In the fourth step, the specified shape constraints are passed to the SIASCOR algorithm.
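Steps 2 and 3 can be sketched in code: expert shape knowledge is recorded as an expected sign for each partial derivative, and the initial model is inspected by finite-difference checks on random points. The parameter indices, signs, and the toy model below are purely illustrative assumptions, not taken from the brushing process:

```python
import numpy as np

# Hypothetical shape knowledge (step 3): expected sign of the partial
# derivative w.r.t. each input (+1 = increasing, -1 = decreasing).
expected = {0: +1, 1: -1}

def count_violations(model, bounds, expected, n=200, h=1e-4, seed=0):
    """Step-2-style inspection: count random points where a finite-difference
    estimate of a partial derivative contradicts the expected sign."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    X = rng.uniform(lo, hi, (n, len(bounds)))
    out = {}
    for j, sign in expected.items():
        Xp = X.copy()
        Xp[:, j] += h
        d = (model(Xp) - model(X)) / h  # forward difference along input j
        out[j] = int(np.sum(sign * d < 0))
    return out

# Toy "initial model": increasing in x0, decreasing in x1 -> no violations.
f = lambda X: X[:, 0] - 0.5 * X[:, 1]
print(count_violations(f, [(0, 1), (0, 1)], expected))  # {0: 0, 1: 0}
```

A nonzero count in some dimension would be exactly the kind of inconsistency that prompts the expert to impose a shape constraint in step 3.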
The resulting SIASCOR model is mathematically guaranteed to satisfy all the shape constraints imposed by the expert. Conventional, purely data-based models, by contrast, do not come with such a guarantee but, on the contrary, often exhibit an unphysical shape behavior in the sparse-data case considered here. Additionally, the direct involvement of process experts in the training of the SIASCOR model increases the acceptance of and the confidence in this model. Another asset of the SIASCOR method is that, in contrast to many conventional machine learning methods, it does not involve a time-consuming and unsystematic hyperparameter tuning or model selection step.
The proposed general methodology was applied to an exemplary brushing process in order to obtain a prediction model for the arithmetic-mean surface roughness of the brushed workpiece as a function of five process parameters. The dataset available in this application consisted of only a small number of measurement points. After inspecting the initial lasso model based solely on these data, the expert defined shape constraints in all five input parameter dimensions. The SIASCOR model trained with these shape constraints was compared to a purely data-based GPR model. As opposed to the SIASCOR model, the GPR model contradicts the physical shape knowledge about the surface roughness in various ways. Also, the selection of an appropriate GPR kernel class is rather heuristic and time-consuming. In any case, the interpretation of the GPR kernel class is certainly less clear than the interpretation of the shape constraints used in the SIASCOR method.
A possible topic of future research is to develop a more sophisticated definition of high- and low-fidelity graphs, using techniques from the optimal design of experiments. Another topic of future research is the further improvement of the SIASCOR algorithm’s runtimes. In addition, a methodology will be developed for assessing the model and for uncovering possible conflicts between the imposed shape constraints and the data. Such conflicts might arise especially as soon as more data is available after SOP, and the model can then be retrained. Finally, a graphical user interface will be implemented allowing the domain experts to apply the proposed methodology completely independently of external support from data scientists or mathematicians. In particular, this user interface will no longer require a manual translation of the shape knowledge specified pictorially by the expert into mathematical constraints in the form expected by the SIASCOR algorithm.
Acknowledgments
We gratefully acknowledge the funding provided by the Fraunhofer Society as part of the lighthouse project “Machine Learning for Production” (ML4P). In addition, we would like to thank Holger Pätzold (Schaeffler Technologies AG & Co. KG, Herzogenaurach) for the valuable discussions regarding the brushing process, Markus Renner (Carl Hilzinger-Thum GmbH & Co. KG, Tuttlingen) for providing the brushing tools, and Konstantin Kusch (Fraunhofer IWU, Chemnitz) for data acquisition and analysis. We would also like to thank Michael Bortz, Jan Schwientek, and Philipp Seufert (Fraunhofer ITWM, Kaiserslautern) for inspiring mathematical discussions.