1 Introduction
The stellar atmospheric parameters are important references for understanding the properties of stars, as well as fundamental information for investigating the formation and evolution of galaxies. Therefore, estimating stellar atmospheric parameters from spectra is an essential problem in a large-scale sky survey. At the same time, with the continuous development of large-scale sky surveys, the amount of observed spectra is increasing, especially from the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST). LAMOST is a typical spectroscopic survey telescope with a wide field of view and the highest spectral acquisition rate in the world, providing abundant observed spectra.
A series of studies have been conducted on estimating stellar atmospheric parameters from LAMOST spectra (Ho et al. 2017; Xiang et al. 2017, 2019; Zhang et al. 2020). However, these studies mainly train models on spectra with medium and high signal-to-noise ratio (SNR). For example, Ho et al. (2017) trained the Cannon on spectra from LAMOST DR2 with SNR > 100. Xiang et al. (2017) trained a multiple-linear regression method on spectra from LAMOST DR2 with SNR > 50. Xiang et al. (2019) trained the DD-Payne on spectra from LAMOST DR5 with SNR > 50. Zhang et al. (2020) trained the SLAM on spectra from LAMOST DR5 with SNR > 100. As a result, the performance of the learned models decreases evidently on spectra with low SNR. For example, Ho et al. (2017) showed (their Figure 8) that in the case of SNR < 30, the uncertainty of $T_{\rm eff}$ is greater than 90 K, that of $\log g$ is greater than 0.17 dex, and that of [Fe/H] is greater than 0.11 dex.

High-SNR spectra contain less noise, and their spectral characteristics are obvious; good results have been obtained in estimating stellar atmospheric parameters from high-SNR LAMOST spectra. Unfortunately, low-SNR spectra contain a lot of noise, and their spectral characteristics are indistinct. It is therefore difficult to extract effective spectral features from them, which results in evidently degraded estimation performance. Comparing the stellar atmospheric parameters provided by LAMOST DR8 with those provided by APOGEE DR12 (Fig. 1), the inconsistencies increase sharply as the SNR decreases, especially in the case of SNR < 30. This phenomenon indicates that it is difficult to estimate stellar atmospheric parameters from low-SNR spectra. Furthermore, spectra with low SNR account for a large proportion of the LAMOST data: in LAMOST DR8, more than 60% of the spectra have SNR < 30 (Fig. 1). Therefore, it is worthwhile to investigate more accurate methods for estimating stellar atmospheric parameters from low-SNR LAMOST spectra.
The difficulty in estimating stellar atmospheric parameters from low-SNR spectra lies in the feature extraction procedure. Bu & Pan (2015) investigated the spectral feature extraction problem based on principal component analysis (PCA). Xiang et al. (2017) extracted spectral features for stellar parameter estimation based on kernel principal component analysis (KPCA). Both PCA and KPCA are global dimension reduction methods, which are sensitive to local noises and distortions. Li et al. (2014) studied the spectral feature extraction problem based on LASSO (Least Absolute Shrinkage and Selection Operator) and local smoothing techniques, and showed that local dimension reduction methods are effective in selecting features from low signal-to-noise ratio spectra. Therefore, this paper uses the local feature extraction method LASSO to select features from LAMOST DR8 low-resolution spectra with 20 ≤ SNR ≤ 30.

After feature selection, we train an approximate model to learn a mapping from the spectral features to a stellar atmospheric parameter, for example,
$T_{\rm eff}$, $\log g$, or [Fe/H]. Bu & Pan (2015) used Gaussian Process Regression (GPR) to estimate stellar atmospheric parameters from SDSS DR10 spectra. Xiang et al. (2017) used a multiple-linear regression method to estimate stellar atmospheric parameters from LAMOST spectra. Unfortunately, for low-resolution and low-SNR spectra this mapping is complex, and it is difficult to fit with only a basic nonlinear regression model. Fortunately, a series of works in the literature show the effectiveness of neural networks (including deep learning) in estimating stellar atmospheric parameters from spectra (Manteiga et al. 2010; Li et al. 2014, 2017). For example, Manteiga et al. (2010) used an artificial neural network (ANN) to estimate stellar atmospheric parameters from Gaia spectra with 5 ≤ SNR ≤ 25. Therefore, this work uses a special neural network, the multilayer perceptron (MLP), to estimate stellar atmospheric parameters from LAMOST DR8 low-resolution spectra with 20 ≤ SNR ≤ 30.

The structure of this paper is as follows: Section 2 introduces the spectra used to train and test the LASSO-MLP model and describes the preprocessing procedures applied to them. Section 3 describes the dimension reduction method LASSO. Section 4 describes the MLP model. Section 5 evaluates the results of the LASSO-MLP model. Section 6 trains an ensemble LASSO-MLP model to estimate [Fe/H] from the spectra of metal-poor stars. Finally, we summarize in Section 7. The estimation catalog, experimental code, trained models, training data and test data are released on the following website for scientific exploration and algorithm study: https://github.com/xrli/LASSOMLP.
2 Data and its preprocessing
This work designs a scheme for estimating the stellar atmospheric parameters from LAMOST low-resolution spectra with SNR ≤ 30. The reference data consist of LAMOST spectra and their reference labels. A reference label is the parameter to be estimated, e.g., $T_{\rm eff}$, $\log g$, or [Fe/H]. Each LAMOST spectrum in the reference set has a unique common-source observation among the APOGEE high-resolution observations. The label comes from the APOGEE catalog estimated by ASPCAP from the common-source high-resolution APOGEE observation. ASPCAP (García Pérez et al. 2016) is a pipeline for estimating stellar parameters from APOGEE high-resolution spectra. The common-source matching is conducted based on a positional (RA, Dec) constraint with a threshold of 3.0 arcseconds. If a LAMOST spectrum has multiple matching observation sources in APOGEE, we remove the spectrum from the reference set. Finally, the matched reference data set consists of 10,773 stellar spectra with SNR ≤ 30 from common stars between APOGEE and LAMOST. Fig. 2 shows the distributions and ranges of the three stellar atmospheric parameters of these stars. The reference data set is randomly divided into a training set and a test set at a ratio of 8:2.
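The positional cross-match described above (keep only sources with exactly one counterpart within 3.0 arcseconds) can be sketched as follows. This is a hypothetical helper for illustration, not the authors' code; a production pipeline would use a KD-tree matcher such as the one in astropy rather than a pairwise separation matrix.

```python
import numpy as np

def crossmatch(ra1, dec1, ra2, dec2, radius_arcsec=3.0):
    """Match each source of catalog 1 (e.g. LAMOST) to catalog 2
    (e.g. APOGEE); return only unique single-match pairs, dropping
    sources with multiple counterparts as in the text above.
    Coordinates are in degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    # Pairwise angular separations via the haversine formula.
    dra = ra2[None, :] - ra1[:, None]
    ddec = dec2[None, :] - dec1[:, None]
    a = (np.sin(ddec / 2) ** 2
         + np.cos(dec1)[:, None] * np.cos(dec2)[None, :] * np.sin(dra / 2) ** 2)
    sep = 2 * np.arcsin(np.sqrt(a))                  # radians
    within = sep <= np.radians(radius_arcsec / 3600.0)
    pairs = []
    for i in range(len(ra1)):
        hits = np.flatnonzero(within[i])
        if hits.size == 1:                           # reject ambiguous matches
            pairs.append((i, int(hits[0])))
    return pairs
```

A source with two counterparts inside the radius is discarded, mirroring the removal of LAMOST spectra with multiple APOGEE matches.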
2.1 Preprocessing of the spectra
The observed spectra are affected by radial velocity and flux calibration. The radial velocity results in a wavelength shift compared with the theoretical spectra. Each of the above-mentioned factors can increase the difficulty of parameter estimation and reduce its accuracy. Therefore, some preprocessing procedures are necessary to eliminate or reduce their potential negative impacts.
The preprocessing procedures are as follows:

1. Transform the observed spectra to their rest frame based on the radial velocity estimated by the LAMOST pipeline.

2. Cut the observed spectra to their common wavelength range in the rest frame and resample them with a step of 0.0001 in the logarithmic wavelength coordinate system; the flux is linearly interpolated from the observed spectrum onto the resampled wavelengths. In this work, the common wavelength range is [3839.5, 8936.7] Å.

3. Estimate the continuum. Firstly, the spectrum is processed with a median filtering algorithm to remove spurious noises and spectral lines; the filtering window is three pixels wide. Secondly, the continuum is estimated using a sixth-order polynomial fitting method.

4. Divide the linearly interpolated flux by the fitted continuum to normalize the spectrum.

An example of the above preprocessing is presented in Fig. 3.
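The four steps above can be sketched in a few lines. This is a minimal illustration, not the authors' released pipeline; the function name and interface are assumptions.

```python
import numpy as np
from scipy.signal import medfilt

def preprocess(wave, flux, rv_kms):
    """Sketch of the four preprocessing steps: rest-frame shift,
    log-wavelength resampling, continuum estimation, normalization."""
    c = 299792.458                                   # speed of light (km/s)
    rest_wave = wave / (1.0 + rv_kms / c)            # step 1: rest frame
    # Step 2: resample with a step of 0.0001 in log10(wavelength) over
    # the common range [3839.5, 8936.7] Angstrom, interpolating linearly.
    log_grid = np.arange(np.log10(3839.5), np.log10(8936.7), 1e-4)
    resampled = np.interp(10 ** log_grid, rest_wave, flux)
    # Step 3: 3-pixel median filter, then a 6th-order polynomial fit as
    # the pseudo-continuum (centered abscissa for numerical stability).
    smoothed = medfilt(resampled, kernel_size=3)
    x = log_grid - log_grid.mean()
    continuum = np.polyval(np.polyfit(x, smoothed, deg=6), x)
    # Step 4: continuum normalization.
    return resampled / continuum
```

For a flat input spectrum the output is close to unity everywhere, which is a quick sanity check of the normalization.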
3 Dimension reduction for the spectra
Each preprocessed spectrum is a vector in a 3670-dimensional space, and these spectra contain a lot of noise and redundant components. The noise and redundancies often lead to masking effects and accuracy degradation in parameter estimation. Although some spectral features are evident and sensitive to the stellar atmospheric parameters in high-quality spectra, their shapes and contributions cannot be recognized by the parameter estimation model in the presence of serious noise; this phenomenon is referred to as the masking effect. Therefore, we reduce the dimension of these spectra to remove ineffective or irrelevant components. By doing so, we reduce both the computational complexity of the model and the influence of noise on the parameter estimation.
Suppose a vector $x \in \mathbb{R}^{3670}$ represents a spectrum, $y$ represents a stellar atmospheric parameter ($T_{\rm eff}$, $\log g$, or [Fe/H]) of the corresponding spectrum, $\{(x_i, y_i)\}_{i=1}^{N}$ is the training set, and $w$ represents the weights that the model needs to learn. The objective function of LASSO is:

$$\hat{w} = \arg\min_{w} \frac{1}{2N} \sum_{i=1}^{N} \left(y_i - w^{\mathrm{T}} x_i\right)^2 + \lambda \|w\|_1, \qquad (1)$$

where $\lambda$ is a preset parameter greater than 0, which controls the number of selected features. A weight $w_j$ with value zero indicates that the corresponding spectral flux $x_j$ is an irrelevant or redundant component. Otherwise, a nonzero $w_j$ indicates that the corresponding $x_j$ is a useful component for stellar atmospheric parameter estimation.
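A small synthetic example of Eq. (1) with scikit-learn (whose `alpha` plays the role of $\lambda$): only two of 200 mock "pixels" carry signal about the label, and LASSO zeroes out most of the rest. The data here are illustrative, not LAMOST spectra.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 200))                  # 300 mock spectra, 200 fluxes
# Only pixels 10 and 50 are informative about the mock label.
y = 2.0 * X[:, 10] - 1.5 * X[:, 50] + 0.1 * rng.normal(size=300)

model = Lasso(alpha=0.1)                         # alpha = lambda in Eq. (1)
model.fit(X, y)
selected = np.flatnonzero(model.coef_)           # indices of retained fluxes
```

Increasing `alpha` shrinks more coefficients to exactly zero, i.e., it selects fewer features, which is the behavior described above.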
To further explore the optimality of LASSO in dimension reduction for the LAMOST DR8 low-resolution spectra with SNR ≤ 30, we compared linear with nonlinear dimension reduction methods, and local with global dimension reduction methods. This paper explored PCA (Principal Component Analysis), KPCA (Kernel Principal Component Analysis), ISOMAP (Isometric Mapping), MDS (Multidimensional Scaling), LLE (Locally Linear Embedding), and LASSO. PCA is a global linear dimension reduction method; LASSO is a local linear dimension reduction method; KPCA, ISOMAP, and MDS are global nonlinear dimension reduction methods; and LLE is a local manifold dimension reduction method.
We compared the scheme without dimension reduction (the first row of Table 1) with the schemes based on dimension reduction (the second to seventh rows of Table 1). The schemes based on PCA, KPCA, and LASSO are better than the scheme without dimension reduction. This phenomenon indicates the sparseness of the high-dimensional spectral space: it is difficult for the model to find the data characteristics of the samples there, so parameter estimation without dimension reduction reduces the efficiency and accuracy of the model. These results indicate the necessity of dimension reduction in estimating stellar atmospheric parameters from spectra.
We compared the linear dimension reduction methods PCA and LASSO (the second and seventh rows of Table 1) with the nonlinear dimension reduction methods KPCA, ISOMAP, LLE, and MDS (the third to sixth rows of Table 1). The estimations based on the linear dimension reduction methods are better than those based on the nonlinear methods. The models used in this work can be represented by X-MLP, where X denotes a dimension reduction method. These results indicate that linear dimension reduction methods are more suitable for the X-MLP stellar atmospheric parameter estimation scheme.
We compared the global dimension reduction methods PCA and KPCA (the second and third rows of Table 1) with the local dimension reduction methods LLE and LASSO (the fifth and seventh rows of Table 1). The estimations of the local linear method LASSO are better than those of the local nonlinear method LLE and the global methods PCA and KPCA. A feature computed by a global dimension reduction method uses almost all of the observed pixels; this characteristic means every extracted feature can be affected by any noise and distortion. Conversely, a feature of a local dimension reduction method is computed from only a small subset of observed fluxes, for example, several fluxes near a spectral line. Furthermore, LASSO can adaptively discard ineffective pixels according to the overall balance between the effects of noises, distortions, and spectral characteristics. Therefore, the features extracted by a local method may be less affected by noise and distortion. These experimental results indicate that the X-MLP parameter estimation schemes based on global dimension reduction are less robust, while those based on local dimension reduction perform well, especially the LASSO-MLP method with its property of discrimination and rejection.
Based on the above-mentioned studies, local and linear dimension reduction methods are more suitable for stellar atmospheric parameter estimation. Therefore, we used the local linear dimension reduction method LASSO to reduce the dimension of the LAMOST DR8 low-resolution spectra with SNR ≤ 30.
Table 1. Test performance of the MLP with different dimension reduction methods. For each parameter, the columns are MAE, μ (mean error), and σ (standard deviation of error); $T_{\rm eff}$ in K, $\log g$ and [Fe/H] in dex.

              $T_{\rm eff}$ (K)          $\log g$ (dex)             [Fe/H] (dex)
Method        MAE     μ        σ         MAE     μ        σ         MAE     μ        σ
MLP           107.5   1.750    203.5     0.611   0.343    6.322     0.128   0.0070   0.566
PCA-MLP       100.1   0.645    2006.1    0.191   0.017    0.726     0.062   0.0012   0.164
KPCA-MLP      91.57   0.118    175.6     0.686   0.011    0.424     0.071   0.0072   0.426
ISOMAP-MLP    243.4   0.0046   397.9     0.619   0.100    3.080     3.680   3.646    3.185
LLE-MLP       127.3   5.108    196.60    0.286   0.019    0.452     3.526   3.526    0.968
MDS-MLP       454.0   230.2    554.8     1.284   0.069    1.801     0.300   0.109    1.006
LASSO-MLP     84.32   0.205    164.8     0.137   0.00084  0.217     0.063   0.00035  0.095
4 The stellar atmospheric parameter estimation model
After dimension reduction, the information of each spectrum is represented by a vector obtained by stacking the selected features. In estimating $T_{\rm eff}$, $\log g$, or [Fe/H], the information of a spectrum is respectively represented by a 141-, 553-, or 833-dimensional vector. From this feature vector, we can estimate the stellar atmospheric parameter using a regression method. This work estimates the stellar atmospheric parameters with an MLP.
An MLP provides a global nonlinear mapping from an input (a spectrum $x$) to an output (a stellar atmospheric parameter of the corresponding star: $T_{\rm eff}$, $\log g$, or [Fe/H]). Each node in a layer of the MLP is fully connected with the nodes in its previous layer. The first layer is referred to as the input layer, the middle ones are the hidden layers, and the last one is the output layer. Except for the input nodes, each node is a neuron with a nonlinear activation function.
Suppose $\{(x_i, y_i)\}_{i=1}^{N}$ is a set of training data, where $N$ is the number of stellar spectra used for learning the MLP model, $x_i$ represents a stellar spectrum, and $y_i$ is the reference value of the atmospheric parameter $T_{\rm eff}$, $\log g$, or [Fe/H] for the spectrum $x_i$. In this work, the reference values of the stellar atmospheric parameters are estimated by ASPCAP. Let $\hat{y}_i$ denote the estimation of $y_i$ from the spectrum $x_i$ by the MLP, and let $W$ and $b$ respectively represent the sets of connection weights and biases in the MLP model. To learn the model parameters $W$ and $b$, a mean squared error loss function can be used:

$$L(W, b) = \frac{1}{N} \sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)^2. \qquad (2)$$

The model parameters $W$ and $b$ are optimized by iteratively minimizing the loss function. The iterations stop when they reach a preset maximum number or the loss function falls below a given threshold.
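The training loop above (iterative minimization of Eq. (2) with an iteration cap and early stopping) can be sketched with scikit-learn's `MLPRegressor`, which minimizes a squared-error loss for regression. The layer sizes and mock data here are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 20))                   # mock feature vectors
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]              # mock nonlinear target

mlp = MLPRegressor(hidden_layer_sizes=(64, 32),  # two fully connected hidden layers
                   activation="relu",            # nonlinear activation
                   max_iter=2000,                # preset maximum iterations
                   early_stopping=True,          # stop when validation score stalls
                   random_state=0)
mlp.fit(X, y)                                    # iteratively minimizes squared error
pred = mlp.predict(X[:5])
```

With `early_stopping=True`, a fraction of the training data is held out internally and optimization halts once the validation score stops improving, which is one common realization of the stopping rule described above.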
To evaluate the optimality of the MLP in estimating stellar atmospheric parameters from the LAMOST DR8 low-resolution spectra with SNR ≤ 30, we investigated the performance of multiple typical regression methods: LR (Linear Regression), Ridge, LASSO, ElasticNet, SVR (Support Vector Regression), KNR (K-Neighbors Regression), DecisionTree, GradientBoosting, XGBoost, lightGBM, and Random Forest. Because $T_{\rm eff}$ (in K) spans a numerical range several orders of magnitude larger than those of $\log g$ and [Fe/H] (in dex), we rescaled $T_{\rm eff}$ before training to improve numerical performance. In addition, we standardized each selected feature to zero mean and unit variance. This standardization helps to improve the stability of most machine learning algorithms.
Table 2. Test performance of different regression methods on the LASSO-selected features. For each parameter, the columns are MAE, μ (mean error), and σ (standard deviation of error); $T_{\rm eff}$ in K, $\log g$ and [Fe/H] in dex.

                  $T_{\rm eff}$ (K)          $\log g$ (dex)             [Fe/H] (dex)
Method            MAE     μ        σ         MAE     μ        σ         MAE     μ        σ
LR                156.3   14.31    223.9     0.188   0.013    0.286     0.092   0.0015   0.146
Ridge             156.3   14.31    223.9     0.188   0.013    0.286     0.101   0.014    0.841
LASSO             157.9   15.03    226.7     0.188   0.013    0.285     0.084   0.00053  0.249
ElasticNet        156.3   14.31    223.9     0.192   0.013    0.291     0.086   0.003    0.332
SVR               132.3   1.066    190.11    0.174   0.003    0.273     0.070   0.002    0.098
KNR               139.9   0.633    197.9     0.194   0.0016   0.289     0.097   0.0043   0.133
DecisionTree      173.2   0.223    247.1     0.299   0.010    0.450     0.156   0.0087   0.211
GradientBoosting  135.7   1.643    199.9     0.189   0.010    0.267     0.084   0.0022   0.116
XGBoost           130.1   4.954    185.4     0.191   0.0090   0.268     0.066   0.65     0.28
lightGBM          123.8   1.079    177.1     0.166   0.0070   0.239     0.072   0.0013   0.101
Random Forest     122.8   0.293    177.0     0.188   0.011    0.273     0.093   0.0037   0.126
MLP               84.32   0.205    164.8     0.137   0.00084  0.217     0.063   0.00035  0.095
LR is one of the most commonly used algorithms for regression tasks. However, naive linear regression is usually replaced by regularized regression methods (LASSO regression, Ridge regression, and ElasticNet). Ridge regression is linear regression with an L2 regularization, LASSO regression is linear regression with an L1 regularization, and ElasticNet is linear regression combined with both an L1 and an L2 regularization. The high dimension of the spectra tends to result in overfitting, and regularization is a technique that penalizes excessive regression coefficients to reduce this risk. We compared ordinary linear regression (the first row of Table 2), Ridge regression (the second row), LASSO regression (the third row), and ElasticNet regression (the fourth row). The regularization can indeed improve the performance of the linear estimation methods on [Fe/H]. However, these linear methods are inferior to the nonlinear regression methods, which are discussed further in the following paragraphs. These experimental results indicate that there exist some nonlinear relationships between the spectral features and the stellar atmospheric parameters. Therefore, it is necessary to investigate the estimation performance of some typical nonlinear regression methods.

The SVR, the instance-based KNR, and the DecisionTree are three typical nonlinear regression methods, and their experimental results are presented in the 5th to 7th rows of Table 2. Although the dimension of the spectra is high, the SVR is robust to overfitting in a high-dimensional space; therefore, it achieves better performance than the linear estimation methods. Owing to the number of reference spectra in the training set, the KNR can find sufficiently similar training spectra for a spectrum to be parameterized, and the experimental results indicate that the KNR also outperforms the linear regression methods. However, the DecisionTree estimations from low-SNR spectra are prone to overfitting, which leads to worse performance than the linear regression models. Therefore, we need to further investigate ways to prevent overfitting in tree-based schemes.
Ensemble learning helps to eliminate or reduce overfitting. It increases the generalization ability and robustness of a model by combining the predictions of multiple basic learners. According to the generation method of the basic learners, ensemble learning methods are roughly divided into two categories: in the first, there are strong dependencies between the basic learners, which must be generated sequentially; in the second, the basic learners are independent from each other and can be generated in parallel. Gradient Boosting, XGBoost, and lightGBM are representatives of the first category, and Random Forest is a representative of the second. The experimental results of these ensemble learning methods are presented in the 8th to 11th rows of Table 2.

The basic idea of Gradient Boosting is to train each newly added weak learner according to the negative gradient of the current model's loss function, and to integrate the trained weak learners into the existing model by accumulation. This process continuously reduces the loss function and the model deviation. Owing to its excessive pursuit of error reduction, Gradient Boosting is prone to overfitting and takes a long time to train. The experimental results show that Gradient Boosting is inferior to the other ensemble learning methods in estimating the stellar atmospheric parameters (the 8th row of Table 2). XGBoost addresses this by adding a regularization term to the cost function, improving the generalization ability by controlling the complexity of the model. From the perspective of the bias-variance tradeoff, it reduces the variance of the model, makes the learned model simpler, and reduces the risk of overfitting. The experimental results show that XGBoost outperforms Gradient Boosting in estimating the stellar atmospheric parameters (the 8th and 9th rows of Table 2). lightGBM mainly optimizes the training speed of the model, and its basic principle is similar to that of XGBoost; therefore, there is no essential difference in accuracy between these two methods (the 9th and 10th rows of Table 2). On the basis of building an ensemble with decision trees as the basic learners, Random Forest further introduces random feature selection in the training process. This randomness gives the model more generalization ability, and the experimental results show that Random Forest outperforms the DecisionTree in estimating the stellar atmospheric parameters (the 7th and 11th rows of Table 2).

Random Forest consists of multiple decision trees that are independent from each other. The MLP, on the other hand, consists of multiple layers, each fully connected with the layer before it. The experimental results show that Random Forest is inferior to the MLP in estimating the stellar atmospheric parameters (the 11th and 12th rows of Table 2). Therefore, we used the MLP to estimate the stellar atmospheric parameters from LAMOST low-resolution spectra with SNR ≤ 30.
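A comparison of the kind summarized in Table 2 can be reproduced in outline with a simple loop over regressors, scoring each by MAE on a held-out split. The mock data below stand in for the LASSO-selected features; only a subset of the methods is shown, and the single-tree vs. forest contrast illustrates the overfitting argument above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(600, 30))                   # mock selected features
y = X[:, 0] ** 2 + X[:, 1] + 0.1 * rng.normal(size=600)  # nonlinear mock label
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

results = {}
for name, reg in [("LR", LinearRegression()),
                  ("Ridge", Ridge()),
                  ("DecisionTree", DecisionTreeRegressor(random_state=0)),
                  ("RandomForest", RandomForestRegressor(random_state=0))]:
    reg.fit(X_tr, y_tr)
    results[name] = mean_absolute_error(y_te, reg.predict(X_te))
```

On data like this, the ensemble's averaging typically gives Random Forest a lower test MAE than a single decision tree, echoing the 7th vs. 11th rows of Table 2.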
5 Performance Evaluation of LASSO-MLP
We evaluated the reliability and accuracy of the proposed LASSO-MLP model from two aspects. Firstly, we computed the consistencies between the LASSO-MLP estimations and the ASPCAP estimations from APOGEE high-resolution spectra (the 1st row of Table 3). Secondly, treating the ASPCAP estimations from APOGEE high-resolution spectra as the benchmark, we compared the statistical characteristics of the LASSO-MLP estimations and the LASP estimations (the 1st and 2nd rows of Table 3). These evaluations are conducted with three statistical indicators: the mean absolute error (MAE), the mean error (μ), and the standard deviation of the error (σ). The MAE is the average of the absolute values of the errors; it avoids the mutual cancellation of errors on various spectra and measures the overall accuracy of an estimation model. The μ is the arithmetic mean of the errors, representing the most likely value of the error and reflecting the systematic bias of a parameter estimation model. The σ describes the fluctuation around the average error and reflects the uncertainty of a model.

All three statistical indicators (MAE, μ, σ) are relatively small in this scenario of low-resolution and low-SNR spectra (the first row of Table 3). This result indicates an excellent and stable consistency between the LASSO-MLP estimations from LAMOST spectra and the ASPCAP estimations from APOGEE high-resolution spectra. At the same time, the LASSO-MLP estimation does not show any obvious systematic shift over the various parameter intervals (Fig. 4). Therefore, the LASSO-MLP model has a strong generalization ability in estimating the stellar atmospheric parameters from low-SNR spectra.
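The three indicators can be computed in a few lines; this small helper (an illustrative name, not from the paper) makes their definitions concrete.

```python
import numpy as np

def error_stats(estimates, benchmarks):
    """Return the three indicators defined above: MAE (mean absolute
    error), mu (mean error), and sigma (standard deviation of error)."""
    err = np.asarray(estimates) - np.asarray(benchmarks)
    return np.abs(err).mean(), err.mean(), err.std()
```

For example, estimates of 5000 K and 5100 K against a benchmark of 5050 K give MAE = 50 K, μ = 0 K (the two errors cancel in the mean but not in the MAE), and σ = 50 K.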
We compared the parameter estimation results of LASSO-MLP from LAMOST low-resolution spectra with those of LASP from LAMOST low-resolution spectra and of ASPCAP from APOGEE high-resolution spectra. More consistency with the ASPCAP estimations is shown by the LASSO-MLP estimations (the 1st and 2nd rows of Table 3). The fundamental principle of LASP is to accumulate the difference between each observed flux in the selected wavelength range [3850, 5500] Å and the corresponding flux of the reference spectra. This accumulation makes the matching result of LASP prone to being affected by any noise and distortion on all pixels in this wavelength range. However, LASSO can adaptively evaluate the combined effects of noise and spectral features on parameter estimation and discard ineffective and redundant components. Therefore, the LASSO-MLP model is less susceptible to noise and distortion, and performs more accurately. In addition, the MLP reduces the overfitting risk through early stopping and an L2 regularization term. Therefore, the LASSO-MLP model has strong robustness and generalization ability (Fig. 5). Furthermore, the experimental results in Fig. 5 show much less systematic bias from LASSO-MLP than from LASP in the case of $T_{\rm eff}$ < 4000 K and [Fe/H] < −1 dex. Fig. 4 shows some comparison results between LASSO-MLP and ASPCAP; the experimental results over the various parameter intervals do not show any obvious bias trend. Therefore, the LASSO-MLP is robust in estimating the stellar atmospheric parameters from low-SNR spectra.
Table 3. Consistency of the LASP and LASSO-MLP estimations from LAMOST low-resolution spectra with the ASPCAP estimations from APOGEE high-resolution spectra (MAE, μ, σ as in Table 1).

            $T_{\rm eff}$ (K)          $\log g$ (dex)             [Fe/H] (dex)
Method      MAE     μ        σ         MAE     μ        σ         MAE     μ        σ
LASP        137.6   49.51    169.6     0.195   0.063    0.257     0.091   0.0018   0.132
LASSO-MLP   84.32   0.205    164.8     0.137   0.00084  0.217     0.063   0.00035  0.095
However, there are still several spectra whose estimations show relatively obvious inconsistencies with the APOGEE catalog (Fig. 4). These spectra are presented in Fig. 6. Fig. 6 (a) shows a spectrum whose parameter is overestimated by the LASSO-MLP model. The fluxes of the fitted continuum of this spectrum are approximately 0 near 4000 Å; therefore, the preprocessing procedure gives some invalid results when dividing the linearly interpolated flux by the fitted continuum to normalize the spectrum. That is to say, this spectrum was not properly calibrated during preprocessing, which results in a large deviation in its estimation by the LASSO-MLP model. Fig. 6 (b) and (c) present two spectra with underestimated and overestimated parameters, respectively; these results are due to large residuals of the sky light emission lines. Fig. 6 (d) shows a spectrum with an underestimated parameter, and Fig. 6 (e) presents a spectrum with an overestimated [Fe/H]; these two spectra are affected by cosmic ray interference. The cases in Fig. 6 (d) and (e) indicate that it is necessary to design methods for detecting cosmic ray interference and removing or masking it. Fig. 6 (f) presents a spectrum with an underestimated [Fe/H]; it shows that a lot of information is missing in the [7500, 8200] Å range of the spectrum. Therefore, there exist some obvious deviations in the estimations from the LASSO-MLP model for these spectra.
6 Improvement on the [Fe/H] estimation for metal-poor stars
Stars that present lower metallicity than that of the Sun, e.g., with [Fe/H] < −1 dex, are referred to as metal-poor stars. They preserve chemical relics of the early generations of stars, and thus are important for studying the early formation history of the Milky Way and the universe. However, due to its limited survey volume and near-infrared wavelength coverage, APOGEE cannot provide a preferable database for metal-poor stars, and does not cover the low-metallicity end of the stellar parameter space. Indeed, the [Fe/H] coverage of the common objects between the LAMOST low-resolution spectra with 20 ≤ SNR ≤ 30 and the APOGEE spectra is [−1.448, 0.429] dex, so the accuracy of the trained LASSO-MLP model is not very good in the case of [Fe/H] < −1.448 dex. Therefore, it is important to find a proper catalogue for low-metallicity stars and design a specific model accordingly. Fortunately, Li et al. (2018) provided the largest catalogue of metal-poor stars, with over 10,000 objects, based on LAMOST data; for about 400 of these objects, high-resolution follow-up observations have been performed with the Subaru Telescope, resulting in the largest uniform high-precision database for metal-poor stars (Li et al., under review). Based on this LAMOST/Subaru sample, a catalogue containing 661 LAMOST spectra has been used to establish our new model for metal-poor stars.
To improve the generalization ability of the [Fe/H] estimation model on metal-poor stellar spectra, a novel reference set is established, denoted reference set 2. Reference set 2 contains not only all of the 661 metal-poor stellar spectra but also 600 spectra with [Fe/H] > −1.448 dex, randomly selected from the LAMOST spectra of the common stars between LAMOST and APOGEE. This reference set is very small: if it were further divided into a training set and a test set, there would be too little data for learning and testing, resulting in a model with poor estimation performance, and a small test set would yield evaluation results with little statistical significance. Therefore, we designed a five-fold cross-validation scheme to build and test the model. In the cross validation, we divided the reference set into five mutually exclusive subsets with equal numbers of spectra and established five LASSO-MLP models from them: each model is trained on the reference spectra of four subsets and tested on the reference spectra of the remaining subset. For a spectrum of a suspected metal-poor star, we give its [Fe/H] estimation by averaging the estimates of the five models. For convenience, this [Fe/H] estimation model for suspected metal-poor stellar spectra is referred to as the ensemble LASSO-MLP. This work also trained a single LASSO-MLP model on the whole of reference set 2 for comparison.
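The five-fold ensemble scheme above can be sketched as follows: five models, each trained on four of the five folds, with a new spectrum's [Fe/H] taken as the average of the five predictions. The data, layer sizes, and function names are illustrative assumptions; the real inputs would be the LASSO-selected features.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 15))                   # mock selected features
y = X[:, 0] - 0.5 * X[:, 1]                      # mock [Fe/H]-like label

models = []
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    # Each model sees four of the five folds, as in the scheme above.
    m = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
    m.fit(X[train_idx], y[train_idx])
    models.append(m)

def ensemble_predict(x):
    """Average the five fold-models' estimates for one spectrum."""
    return float(np.mean([m.predict(x.reshape(1, -1))[0] for m in models]))
```

Averaging over the five fold-models reduces the variance of any single small-sample model, which is the motivation given above for preferring the ensemble on the scarce metal-poor reference data.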
Since LASP provides [Fe/H] estimations for only 255 of the 661 metal-poor stellar spectra, we evaluated the performance of the LASSO-MLP and ensemble LASSO-MLP models from two aspects. First, treating the metal-poor star catalog as the benchmark, we computed the statistical characteristics of the LASP, LASSO-MLP, and ensemble LASSO-MLP estimations on the 255 metal-poor stellar spectra with LASP estimations (Experiment 1). Second, we computed the inconsistency measures of the LASSO-MLP estimations and the ensemble LASSO-MLP estimations against the metal-poor catalog on all 661 metal-poor stellar spectra (Experiment 2).
Experiment 1 shows more consistency between the LASSO-MLP estimations and the benchmark than for LASP (the 1st and 2nd rows of Table 4). However, the standard deviation (σ) of the error of the LASSO-MLP model is larger than that of LASP. This is due to the small number of samples in the training set and the complexity of the LASSO-MLP model: being more complex than the LASP model, the LASSO-MLP is prone to overfitting on a small training set. The performance evaluation results of the ensemble LASSO-MLP are presented in the 3rd and 5th rows of Table 4; the ensemble LASSO-MLP significantly improves the accuracy and stability of the parameter estimation. Therefore, we re-estimated the [Fe/H] using the ensemble LASSO-MLP model for the 222 spectra whose LASSO-MLP [Fe/H] estimation is smaller than −1.448 dex. The final estimation results show that there are 209 spectra with [Fe/H] < −1.5 dex among the LAMOST DR8 stellar spectra with 20 ≤ SNR ≤ 30.
For all of the 661 metal-poor stellar spectra, both the LASSO-MLP model and the ensemble LASSO-MLP model give [Fe/H] estimates. The results of experiment 2 also show that the ensemble LASSO-MLP estimates are more consistent with the metal-poor star catalog (the 4th and 5th rows of Table 4). Therefore, this work proposes the ensemble LASSO-MLP model for estimating [Fe/H] from metal-poor stellar spectra.
In theory, a parameter estimation model should be trained and tested on independent samples. In this work, however, the reference data of metal-poor stars are scarce; therefore, we did not divide reference set 2 into a separate training set and test set. As a result, the evaluation results of the LASSO-MLP and ensemble LASSO-MLP models (Table 4) are probably optimistic to a certain degree.

Table 4. Consistency between the [Fe/H] estimates and the metal-poor star catalog (dex). The rows marked with * are evaluated on all 661 metal-poor stellar spectra.

Model                  MAE     μ       σ
LASP                   0.274   0.270   0.161
LASSO-MLP              0.166   0.052   0.223
Ensemble LASSO-MLP     0.068   0.014   0.092
LASSO-MLP*             0.217   0.023   0.311
Ensemble LASSO-MLP*    0.076   0.0050  0.107
7 Conclusion
The proposed models achieve good results in estimating the stellar atmospheric parameters from LAMOST low-resolution spectra with 20 ≤ SNR < 30. However, some limitations remain to be dealt with in the future. For example, the parameter coverage of the reference spectra is very small; in future work, we should try to expand the parameter coverage of the training set.
In this paper, we estimated the stellar atmospheric parameters from 1,162,760 LAMOST low-resolution spectra with 20 ≤ SNR < 30 (LAMOST DR8) and released the resulting catalog. We also released the model code, the trained models, and the training and test spectra for reference.
The released catalog is organized in a csv file. This file gives the LASP estimates and the proposed models' estimates for all 1,162,760 spectra from LAMOST DR8 with 20 ≤ SNR < 30. Among the columns, Teff_LASP, logg_LASP, and [Fe/H]_LASP represent the stellar atmospheric parameters provided by LASP; Teff_MLP, logg_MLP, and [Fe/H]_MLP represent the stellar atmospheric parameters estimated by the proposed scheme; LAMOST_obsid represents the obsid of the corresponding spectrum. The estimation catalog, experimental code, trained models, training data, and test data are released at the following website for scientific exploration and algorithm study: https://github.com/xrli/LASSOMLP.
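Loading the released csv and comparing the LASP and proposed-model columns might look like the following pandas sketch; the column names used here (obsid, teff_LASP, teff_MLP) are illustrative stand-ins, so check the header of the released file for the exact names.

```python
import io

import pandas as pd

# Toy stand-in for the released catalog; the column names are hypothetical.
csv_text = io.StringIO(
    "obsid,teff_LASP,teff_MLP\n"
    "101,5777,5790\n"
    "102,4800,4770\n"
)
catalog = pd.read_csv(csv_text)

# Difference between the proposed model's Teff and the LASP Teff.
catalog["dteff"] = catalog["teff_MLP"] - catalog["teff_LASP"]
print(catalog["dteff"].abs().mean())  # -> 21.5
```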
Acknowledgements.
The authors thank the reviewer and editor for their instructive comments. This work was supported by the National Natural Science Foundation of China (Grant Nos. 11973022, 11973049, and U1811464), the Natural Science Foundation of Guangdong Province (No. 2020A1515010710), and the Youth Innovation Promotion Association of the CAS (id. Y202017).

References
 Jofré et al. (2010) Jofré, P., Panter, B., Hansen, C. J., et al. 2010, , 517, A57
 Re Fiorentin et al. (2007) Re Fiorentin, P., Bailer-Jones, C. A. L., Lee, Y. S., et al. 2007, , 467, 1373
 Li et al. (2014) Li, X., Wu, Q. M. J., Luo, A., et al. 2014, , 790, 105
 Recio-Blanco et al. (2006) Recio-Blanco, A., Bijaoui, A., & de Laverny, P. 2006, , 370, 141
 Bu & Pan (2015) Bu, Y. & Pan, J. 2015, , 447, 256
 Xiang et al. (2017) Xiang, M.-S., Liu, X.-W., Shi, J.-R., et al. 2017, , 464, 3657
 Li et al. (2015) Li, X., Lu, Y., Comte, G., et al. 2015, , 218, 3
 Bailer-Jones (2000) Bailer-Jones, C. A. L. 2000, , 357, 197
 Katz et al. (1998) Katz, D., Soubiran, C., Cayrel, R., et al. 1998, , 338, 151
 Zhang et al. (2019) Zhang, X., Zhao, G., Yang, C. Q., et al. 2019, , 131, 094202
 Koleva et al. (2009) Koleva, M., Prugniel, P., Bouchard, A., et al. 2009, , 501, 1269
 Manteiga et al. (2010) Manteiga, M., Ordóñez, D., Dafonte, C., et al. 2010, , 122, 608
 Ting et al. (2019) Ting, Y.-S., Conroy, C., Rix, H.-W., et al. 2019, , 879, 69
 Lee et al. (2008) Lee, Y. S., Beers, T. C., Sivarani, T., et al. 2008, , 136, 2022
 Ness et al. (2015) Ness, M., Hogg, D. W., Rix, H.-W., et al. 2015, , 808, 16
 Prugniel & Soubiran (2001) Prugniel, P. & Soubiran, C. 2001, , 369, 1048
 Li et al. (2017) Li, X.-R., Pan, R.-Y., & Duan, F.-Q. 2017, Research in Astronomy and Astrophysics, 17, 036
 Sánchez-Blázquez et al. (2006) Sánchez-Blázquez, P., Peletier, R. F., Jiménez-Vicente, J., et al. 2006, , 371, 703
 Boeche et al. (2018) Boeche, C., Smith, M. C., Grebel, E. K., et al. 2018, , 155, 181
 Zhang et al. (2020) Zhang, B., Liu, C., & Deng, L.-C. 2020, , 246, 9
 Wu et al. (2011) Wu, Y., Luo, A.-L., Li, H.-N., et al. 2011, Research in Astronomy and Astrophysics, 11, 924
 García Pérez et al. (2016) García Pérez, A. E., Allende Prieto, C., Holtzman, J. A., et al. 2016, , 151, 144
 Liu et al. (2014) Liu, C.-X., Zhang, P.-A., & Lu, Y. 2014, Research in Astronomy and Astrophysics, 14, 423-432
 Ho et al. (2017) Ho, A. Y. Q., Ness, M. K., Hogg, D. W., et al. 2017, , 836, 5
 Yang & Li (2015) Yang, T. & Li, X. 2015, , 452, 158
 Chen et al. (2015) Chen, Y.-Q., Zhao, G., Liu, C., et al. 2015, Research in Astronomy and Astrophysics, 15, 1125
 Lee et al. (2015) Lee, Y. S., Beers, T. C., Carlin, J. L., et al. 2015, , 150, 187
 Xiang et al. (2019) Xiang, M., Ting, Y.-S., Rix, H.-W., et al. 2019, , 245, 34
 Zhao et al. (2012) Zhao, G., Zhao, Y.-H., Chu, Y.-Q., et al. 2012, Research in Astronomy and Astrophysics, 12, 723
 Wu et al. (2014) Wu, Y., Du, B., Luo, A., et al. 2014, Statistical Challenges in 21st Century Cosmology, 306, 340
 Taylor (2005) Taylor, M. B. 2005, Astronomical Data Analysis Software and Systems XIV, 347, 29
 Wang et al. (2020) Wang, R., Luo, A.-L., Chen, J.-J., et al. 2020, , 891, 23
 Luo et al. (2015) Luo, A.-L., Zhao, Y.-H., Zhao, G., et al. 2015, Research in Astronomy and Astrophysics, 15, 1095
 Jofré et al. (2019) Jofré, P., Heiter, U., & Soubiran, C. 2019, Annual Review of Astronomy and Astrophysics, 57, 571-616
 Ren et al. (2016) Ren, J.-J., Liu, X.-W., Xiang, M.-S., et al. 2016, Research in Astronomy and Astrophysics, 16, 45
 Ye & Xie (2010) Ye, G.-B. & Xie, X. 2010, arXiv:1006.5086
 Efron et al. (2004) Efron, B., Hastie, T., Johnstone, I., et al. 2004, math/0406456
 Gao & Li (2017) Gao, W. & Li, X.-R. 2017, , 41, 331
 Li et al. (2018) Li, H., Tan, K., & Zhao, G. 2018, , 238, 16. doi:10.3847/1538-4365/aada4a