1 Introduction
The Transiting Exoplanet Survey Satellite (TESS) (Ricker et al., 2014) was launched by NASA on April 18, 2018 with the primary objective of conducting an all-sky survey of more than 200,000 nearby stars in search of transiting exoplanets using high-precision photometry, producing light curves with a 2-minute cadence. The TESS Objects of Interest (TOIs) have been released periodically and archived at the Mikulski Archive for Space Telescopes (MAST, https://archive.stsci.edu/). The TOI list includes planetary candidates, as well as potential planetary candidates and other astrophysical targets, including false positives, and comprises the database used in the search for confirmed exoplanets. As of March 23, 2020, TESS has released 1766 TOIs with 43 confirmed planets and 412 false positives (see https://tess.mit.edu/publications/).
Previously, the Kepler Space Telescope, launched by NASA in 2009, was designed to determine the occurrence frequency of Earth-sized planets. Towards this objective, Kepler observed about 200,000 stars with high photometric precision, discovering thousands of transiting exoplanets and exoplanetary candidates (Borucki et al., 2010; Jenkins et al., 2010a; Koch et al., 2010; Christiansen et al., 2012). During the prime mission (2009 May 2 to 2013 May 11) Kepler pointed at a single field of view of about 115 square degrees in the constellations of Cygnus and Lyra. The many periodic signals detected by Kepler were processed using the Kepler Science Processing Pipeline (Jenkins et al., 2010b) and assembled into a database of threshold crossing events (TCEs). Direct human input was required to remove false positives and instrumental effects from this database. However, the resulting TCE database contains data produced by many possible sources, such as eclipsing binaries, background eclipsing binaries and many other possible false alarm sources, in addition to a small fraction of exoplanetary candidates (EPCs), and still requires considerable analysis for confirmed identification of exoplanets.
Recently, Shallue and Vanderburg (2018) identified transiting exoplanets in Kepler satellite data using a Deep Learning (DL) algorithm based on training convolutional neural networks with the Google-Vizier system (Golovin et al., 2017). Shallue and Vanderburg (2018) trained the neural networks to classify whether a given light curve signal is the signature of a transiting exoplanet with a low false positive rate. Using their algorithm, they identified multi-planet resonant chains around Kepler-80 and Kepler-90. The extended Kepler mission, K2, which started in Nov. 2013, was designed to use the remaining Kepler capabilities after the completion of the prime mission and the failure of the reaction wheels. During this observation phase the photometric accuracy was reduced, and the pointing varied between different regions of the sky. Nevertheless, Dattilo et al. (2019) applied a similar automated technique, based on the Shallue and Vanderburg (2018) study, to K2 mission data and identified two previously unknown exoplanets.
Automated classification methods for transiting exoplanets from TESS data have been developed using machine learning (ML) techniques in several studies (e.g., Ansdell et al., 2018; Zucker and Giryes, 2018; Yu et al., 2019; Osborn et al., 2020) that demonstrate the usefulness and feasibility of this approach with various degrees of improved classification performance. In this paper, we describe an application of novel algorithms that combine several ML approaches and low-rank matrix decomposition, including algorithms that identify anomalies in high-dimensional big data by using an augmentation approach. These methods, which utilize semi-supervised and unsupervised learning, were developed by ThetaRay, Inc. (https://thetaray.com/) for uncovering financial crimes and for cyber and Internet of Things (IoT) security, and are applied in this study to the search for transiting EPCs. Using Kepler data with confirmed exoplanets for the training and validation of the algorithm, the ThetaRay platform was applied to TESS data, yielding 39 new EPCs out of nearly 11,000 TCEs and demonstrating the feasibility and utility of this new platform.
2 Machine Learning Methods
2.1 The ThetaRay Algorithm
In the present study we utilize ThetaRay AI-based Fintech algorithms, commercially developed for anomaly detection (financial crimes) in financial institutions, for cyber security, and for IoT in support of the smooth operation of critical infrastructure installations. Since transiting exoplanet light curves are rare and appear in only a small number of all observed Kepler or TESS stellar light curves, they are classified as ‘anomalies’ in our analysis, and the ThetaRay system uses the strengths of its algorithms to identify transiting EPCs among the large number of TCEs. To identify these ‘anomalies’, or exoplanet light curves, ThetaRay’s algorithms generate a data-driven ‘normal’ profile of the ingested data and simultaneously identify anomalies, also called abnormal events, providing forensics that categorize each event based on its features. This is done autonomously by the algorithm without the need for rules or signatures. ThetaRay’s algorithmic engine utilizes techniques drawn from a wide variety of mathematical disciplines, such as harmonic analysis, diffusion geometry and stochastic processing, low-rank matrix decomposition, randomized algorithms in general and randomized linear algebra in particular, geometric measure theory, manifold learning, neural networks/deep learning, and compact representation by dictionaries. One approach models the data as a diffusion process, using the Brownian motion of a random walk process to geometrize the data. There is no need for any semantic understanding of the processed data, nor are there any predefined rules, heuristics or weights in the system. The diffused collected dataset is then converted into a Markov matrix through a normalized graph Laplacian and modeled as a stochastic process that is applied in many dimensions (possibly thousands); see the Appendix for additional details of the algorithms.
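As a rough illustration of the diffusion idea described above, the following sketch builds a row-stochastic Markov matrix from a Gaussian affinity kernel and extracts diffusion coordinates from it. It is not ThetaRay's implementation, which is not described at this level of detail here: the kernel, the median-distance bandwidth, the function names and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def markov_matrix(X, epsilon=None):
    """Return (P, degree): a row-stochastic matrix from a Gaussian kernel."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X**2, axis=1)
    # Pairwise squared distances; clip tiny negative round-off values.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if epsilon is None:
        epsilon = np.median(d2)      # assumed bandwidth heuristic
    W = np.exp(-d2 / epsilon)        # symmetric affinity matrix
    degree = W.sum(axis=1)           # graph degree of each data point
    P = W / degree[:, None]          # P = D^-1 W, so each row sums to 1
    return P, degree

def diffusion_coordinates(P, degree, n_components=2, t=1):
    """Eigenvalues of P and the leading non-trivial diffusion coordinates."""
    root = np.sqrt(degree)
    # S = D^1/2 P D^-1/2 is symmetric and has the same eigenvalues as P.
    S = P * root[:, None] / root[None, :]
    S = 0.5 * (S + S.T)              # remove floating-point asymmetry
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    psi = eigvecs / root[:, None]    # right eigenvectors of P
    coords = psi[:, 1:n_components + 1] * eigvals[1:n_components + 1] ** t
    return eigvals, coords

# Synthetic data: 200 'normal' points and one isolated point (index 200).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 5)), np.full((1, 5), 8.0)])
P, degree = markov_matrix(X)
eigvals, coords = diffusion_coordinates(P, degree)
most_isolated = int(np.argmin(degree))  # the point with the weakest affinities
```

In this toy setting the isolated point has the smallest graph degree, which is one simple way an anomaly can stand out against the data-driven normal profile.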
2.2 Kepler Satellite data ML training
We have focused on light curves produced by the Kepler space telescope, which collected the light curves of 200,000 stars in our Milky Way galaxy for 4 years with continuous 30-min or 1-min sampling. To train the algorithm to identify planet candidates in Kepler light curves, we used a training set of labeled Threshold Crossing Events (TCEs). We obtained all 15,737 TCEs produced by Kepler and utilized in Google’s deep learning research (Shallue and Vanderburg, 2018; Dattilo et al., 2019), which used a supervised convolutional neural network architecture with 2,202 features: 201 features of the ‘local view’ and 2,001 features of the ‘global view’. The ‘global view’ represents the entire light curve and the ‘local view’ represents a phase-folded window around the identified transit.
We derived our training set of labeled TCEs from the Autovetter Planet Candidate Catalog for Q1-Q17 DR24 (Catanzarite, 2015; Coughlin et al., 2016) hosted at the NASA Exoplanet Archive (https://exoplanetarchive.ipac.caltech.edu/). We obtained the TCE labels from the catalog’s “av_training_set” column, which has three possible values: planet candidate (PC), astrophysical false positive (AFP) and non-transiting phenomenon (NTP). We ignored TCEs with the “unknown” label (UNK). These labels were produced by manual vetting and other diagnostics. We obtained additional data on the TCEs, such as the planet number, the radius of the planet and the interval between consecutive planetary transits, from the MAST TESS archive (https://archive.stsci.edu/missions-and-data/transiting-exoplanet-survey-satellite-tess) for data labeling and use in our analysis.
2.2.1 Features
Feature engineering is the process of using domain knowledge to create features by manipulating the data through mathematical and statistical relations among its components (for examples, see Section 2.2.4) in order to improve the performance of the AI/ML algorithms. The process is iterative: deciding which features to develop, creating them, checking how they work with the model, improving them as needed, and returning to create additional features until the results of the ML/AI algorithm are optimized. We applied this process to our dataset and created new features, in addition to the existing features available in MAST, to quantify further aspects of the data used by the AI/ML algorithm in the present analysis. We produced a total of 424 features for the analysis. We tested the effectiveness of different combinations of features within the limits of ThetaRay’s system and chose the combination that gave the best results, as validated in the training step.
2.2.2 Existing features
Additional TCE data were downloaded from MAST. We narrowed the data down to the fields required for the present task, such as the planet number, the radius of the planet and the interval between consecutive planetary transits, selecting the relevant fields from the “Data Columns in the Kepler TCE Table” (https://exoplanetarchive.ipac.caltech.edu/docs/API_tce_columns.html) with the help of visualizations of the variables (especially KDE plots, see below). The variables and labels used in our analysis are described below.

Unique key – concatenation of Kepler ID and Planet Number. Kepler ID is a target identification number, as listed in the Kepler Input Catalog (KIC). The KIC was derived from a ground-based imaging survey of the Kepler field conducted prior to launch. The survey’s purpose was to identify stars for the Kepler exoplanet survey by magnitude and color. The full catalog of 13 million sources can be searched at the MAST archive. The subset of 4 million targets found upon the Kepler CCDs can be searched via the Kepler Target Search form.

Kepler Input Catalog (KIC) – Brown et al. (2011).

MAST archive – http://archive.stsci.edu/kepler/kic10/search.php.

Kepler Target Search form – http://archive.stsci.edu/kepler/kepler_fov/search.php.


av_training_set – Autovetter Training Set Label. If the TCE was included in the training set, the training label encodes what is believed to be the “true” classification, and takes a value of either PC, AFP or NTP. The TCEs in the UNKNOWN class sample are marked UNK. Training labels are given a value of NULL for TCEs not included in the training set. For more detail about how the training set is constructed, see Autovetter Planet Candidate Catalog for Q1-Q17 Data Release 24 (KSCI-19091): https://exoplanetarchive.ipac.caltech.edu/docs/KSCI-19091-001.pdf.

tce_prad – Planetary Radius (Earth radii). The radius of the planet obtained from the product of the planet-to-star radius ratio and the stellar radius.

tce_max_mult_ev – Multiple Event Statistic (MES). The maximum calculated value of the MES. TCEs that meet the maximum MES threshold criterion and other criteria listed in the TCE release notes are delivered to the Data Validation (DV) module of the data analysis pipeline for transit characterization and the calculation of statistics required for disposition. A TCE exceeding the maximum MES threshold is removed from the time-series data, and the Single Event Statistic (SES) and MES are recalculated. If a second TCE exceeds the maximum MES threshold, it is also propagated through the DV module, and the cycle is iterated until no more events exceed the criteria. Candidate multi-planet systems are found in this way. Users of the TCE table can exploit the maximum MES statistic to help filter and sort samples of TCEs for the purposes of discerning the event quality, determining the likelihood of planet candidacy, or assessing the risks of observational follow-up. DV module – http://archive.stsci.edu/kepler/manuals/KSCI-19081-001_Data_Processing_Handbook.pdf

tce_period – Orbital Period (days). The interval between consecutive planetary transits.

tce_time0bk – Transit Epoch (BJD − 2,454,833.0). The time corresponding to the center of the first detected transit in Barycentric Julian Day (BJD) minus a constant offset of 2,454,833.0 days. The offset corresponds to 12:00 on Jan 1, 2009 UTC.

tce_duration – Transit Duration (hrs). The duration of the observed transits. Duration is measured from first contact between the planet and star until last contact. Contact times are typically computed from a best-fit model produced by a Mandel and Agol (2002) model fit to a multi-quarter Kepler light curve, assuming a linear orbital ephemeris.

tce_model_snr – Transit Signal-to-Noise (SNR). Transit depth normalized by the mean uncertainty in the flux during the transits.

av_pred_class – Autovetter Predicted Classification. Predicted classifications, which are the ‘optimum MAP classifications.’ Values are either PC, AFP, or NTP.

tce_depth – Transit Depth (ppm). The fraction of stellar flux lost at the minimum of the planetary transit. Transit depths are typically computed from a best-fit model produced by the Mandel and Agol (2002) model fit to a multi-quarter Kepler light curve, assuming a linear orbital ephemeris.

tce_impact – Impact Parameter. The sky-projected distance between the center of the stellar disc and the center of the planet disc at conjunction, normalized by the stellar radius.

local_view – vector of length 201: a ‘local view’ of the TCE. It shows the shape of the transit in detail (close-up of the transit event).
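Several of the definitions above amount to simple arithmetic, which the following sketch spells out. The key separator, the helper names, the example identifiers and the approximate solar-to-Earth radius ratio are assumptions for illustration; only the 2,454,833.0-day offset, the radius-ratio product and the linear ephemeris come from the descriptions above.

```python
BKJD_OFFSET_DAYS = 2_454_833.0        # constant offset stated for tce_time0bk
SOLAR_RADIUS_IN_EARTH_RADII = 109.2   # approximate conversion (assumption)

def unique_key(kepler_id, planet_number):
    """Concatenate Kepler ID and planet number (the '_' separator is assumed)."""
    return f"{kepler_id}_{planet_number}"

def time0bk_to_bjd(tce_time0bk):
    """Convert an offset transit epoch back to full Barycentric Julian Day."""
    return tce_time0bk + BKJD_OFFSET_DAYS

def planet_radius_earth(radius_ratio, stellar_radius_solar):
    """tce_prad: planet-to-star radius ratio times the stellar radius."""
    return radius_ratio * stellar_radius_solar * SOLAR_RADIUS_IN_EARTH_RADII

def transit_times(tce_time0bk, tce_period, n_transits):
    """Transit centers under a linear orbital ephemeris."""
    return [tce_time0bk + k * tce_period for k in range(n_transits)]

# Hypothetical values, chosen only to exercise the helpers.
key = unique_key(1234567, 2)
epoch_bjd = time0bk_to_bjd(0.0)
radius = planet_radius_earth(0.01, 1.0)
times = transit_times(100.0, 10.0, 3)
```

A planet-to-star radius ratio of 0.01 around a Sun-sized star corresponds to roughly one Earth radius, which is why transit depths for Earth-sized planets are of order 100 ppm.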
2.2.3 Visualization of Kepler Data
We investigated the Kepler data and visualized the variables with the Pandas package in Python. For example, we visualized the distributions of the numerical variables per class using KDE (Kernel Density Estimation) plots. In Figure 1 we show several interesting examples with a gap between the curves labeled ‘Planets’ and ‘Not planets’, as identified by the ThetaRay system and validated by the Kepler training set. We conclude that these features are significant in candidate exoplanet identification and have therefore included them in the model. If both curves coincide, the behavior is the same for the labels ‘Planets’ and ‘Not planets’, and so we chose not to include such features in the model.
Another example of our analysis is the ‘heat map’, which is a color-coded matrix in which the correlation value between each pair of features is used to color the corresponding cell. If there is a high correlation between any variables, the dimension of the data can be reduced. The various features are labeled on the axes. The cells on the main diagonal, which indicate identity correlation, are light colored. It is evident from the ‘heat map’ shown in Figure 2 that most off-diagonal features are weakly correlated. The only significant off-diagonal correlation is between av_training_set (the training labels, i.e., if the TCE was included in the training set, the training label encodes what is believed to be the “true” classification) and av_pred_class (the predicted classifications, which are the optimum MAP (maximum a posteriori) classifications). In fact, this field does not provide analysis information for the data but is used as a forensic feature. The forensic features are not included directly in the analysis but provide supplementary information about the data that is useful for investigating the analysis results. Some artificial correlation is also evident between tce_time0bk (transit epoch, BJD) and tce_period (orbital period, days).
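The two checks described above can be sketched numerically: a per-class kernel density estimate with an overlap measure, and a correlation matrix of the kind a heat map displays. The analysis itself used Pandas plots; this sketch uses NumPy only to stay self-contained, and the synthetic features, the rule-of-thumb bandwidth and the overlap measure are assumptions rather than the procedure used in the study.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Gaussian kernel density estimate evaluated on a grid."""
    samples = np.asarray(samples, dtype=float)
    # Normal-reference rule-of-thumb bandwidth (an assumed choice).
    bandwidth = 1.06 * samples.std(ddof=1) * samples.size ** (-0.2)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    norm = samples.size * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm

def class_overlap(a, b, n_grid=400):
    """Area shared by the two class densities: near 1 if they coincide."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pad = 3.0 * max(a.std(), b.std())
    grid = np.linspace(min(a.min(), b.min()) - pad,
                       max(a.max(), b.max()) + pad, n_grid)
    lower = np.minimum(gaussian_kde_1d(a, grid), gaussian_kde_1d(b, grid))
    # Trapezoidal integration of the pointwise minimum of the densities.
    return float(np.sum(0.5 * (lower[1:] + lower[:-1]) * np.diff(grid)))

rng = np.random.default_rng(1)
# Hypothetical feature whose distribution differs between the classes.
overlap_separated = class_overlap(rng.normal(0.0, 1.0, 300),
                                  rng.normal(4.0, 1.0, 1500))
# Hypothetical feature that behaves the same for both classes.
overlap_same = class_overlap(rng.normal(0.0, 1.0, 300),
                             rng.normal(0.0, 1.0, 1500))

# Correlation matrix underlying a heat map (columns are features).
x = rng.normal(size=1000)
features = np.column_stack([x, x + 0.1 * rng.normal(size=1000),
                            rng.normal(size=1000)])
corr = np.corrcoef(features, rowvar=False)
```

A small overlap corresponds to a visible gap between the two KDE curves (a feature worth keeping), while a strongly correlated pair in `corr` marks a candidate for dimension reduction.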
2.2.4 New Features
New features were developed from the original Kepler dataset obtained from MAST in order to optimize the analysis with the ThetaRay algorithm. These features were constructed from the original dataset as described below, using the phase-folded “Local View” light curves (see, e.g., Shallue and Vanderburg, 2018).

global_view – the original vector of length 2001, a ‘global view’ of the TCE that shows the characteristics of the light curve over an entire orbital period. Because of the size limitations of ThetaRay’s system, we performed dimension reduction: we represented each group of 20 columns in the ‘global view’ by the average and standard deviation of those columns, giving a total of 200 new “global_view” features.

spline_bkspace – the breakpoint spacing, in time units, used for the best-fit spline. We chose the optimal spacing of spline breakpoints for each light curve by fitting splines with different breakpoint spacings, calculating the Bayesian Information Criterion (BIC; Schwarz, 1978) for each spline, and choosing the breakpoint spacing that minimized the BIC.

Below is a brief description of the new features that were computed for the “Global View” and “Local View” light curves of each TCE:

loc_mean – average of the “Local View” light curve.

loc_std – standard deviation of the “Local View” light curve.

loc_25% – 25th percentile of the “Local View” light curve.

loc_75% – 75th percentile of the “Local View” light curve.

loc_max – maximum value of the “Local View” light curve.

glob_mean – average of the original “Global View” light curve.

glob_std – standard deviation of the original “Global View” light curve.

glob_25% – 25th percentile of the original “Global View” light curve.

glob_75% – 75th percentile of the original “Global View” light curve.

glob_max – maximum value of the original “Global View” light curve.

zScore_loc_min – minimum value of the Z-score on the “Local View” light curve with a window of 10.

zScore_loc_max – maximum value of the Z-score on the “Local View” light curve with a window of 10.

zScore_glob_min – minimum value of the Z-score on the “Global View” light curve with a window of 100.

zScore_glob_max – maximum value of the Z-score on the “Global View” light curve with a window of 100.
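The feature definitions above can be sketched as follows. Several details are not specified in the text and are assumed here: the single sample left over after forming 100 groups of 20 from 2001 columns is dropped, and the windowed Z-score compares each sample with the mean and standard deviation of the trailing window that ends at that sample. The function names and the synthetic light curves are hypothetical.

```python
import numpy as np

def reduce_global_view(global_view, group=20):
    """Replace each group of 20 columns by its mean and standard deviation."""
    g = np.asarray(global_view, dtype=float)
    n_groups = g.size // group  # 2001 // 20 = 100; last sample dropped (assumed)
    blocks = g[: n_groups * group].reshape(n_groups, group)
    return np.concatenate([blocks.mean(axis=1), blocks.std(axis=1)])

def summary_features(curve, prefix):
    """Mean, std, quartiles and maximum, named like loc_mean or glob_max."""
    c = np.asarray(curve, dtype=float)
    return {
        f"{prefix}_mean": float(c.mean()),
        f"{prefix}_std": float(c.std()),
        f"{prefix}_25%": float(np.percentile(c, 25)),
        f"{prefix}_75%": float(np.percentile(c, 75)),
        f"{prefix}_max": float(c.max()),
    }

def zscore_extremes(curve, window):
    """Minimum and maximum trailing-window Z-score (definition assumed)."""
    c = np.asarray(curve, dtype=float)
    # Requires NumPy >= 1.20 for sliding_window_view.
    windows = np.lib.stride_tricks.sliding_window_view(c, window)
    mean, std = windows.mean(axis=1), windows.std(axis=1)
    z = np.zeros_like(mean)
    # Flat windows (std == 0) are assigned a Z-score of zero.
    np.divide(windows[:, -1] - mean, std, out=z, where=std > 0)
    return float(z.min()), float(z.max())

# Synthetic stand-ins for one TCE: a ramp and a flat curve with a box dip.
global_view = np.linspace(0.0, 1.0, 2001)
local_view = np.zeros(201)
local_view[95:106] = -1.0

reduced = reduce_global_view(global_view)
loc = summary_features(local_view, "loc")
z_min, z_max = zscore_extremes(local_view, window=10)
```

For the box-shaped dip, the most negative Z-score occurs at ingress and the most positive at egress, so the pair of extremes encodes how sharply the transit starts and ends.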
2.2.5 Working on ThetaRay’s System
On the ThetaRay platform we built an “analysis chain”, a multi-staged flowchart composed of three main stages: Data Source, Data Frame and Analysis. The data are organized into data sources and uploaded to ThetaRay’s platform. We created data frames in the system with the wrangling method (data wrangling is the process of cleaning, structuring and enriching raw data into a desired format, with the intent of making it more appropriate and valuable for modeling) and split the data randomly in the ThetaRay system such that 80% was allocated for training and 20% for testing. The training procedure generates a profile, which was fed into different types of analyses using ThetaRay augmented and unsupervised algorithms in order to find the parameters that maximize the Area Under the ROC Curve (AUC) in each chain, where the Receiver Operating Characteristic (ROC) curve is a standard evaluation metric for the performance of a classification model. After the analysis and review of these results were completed, the data were processed again following modification and fine tuning of the internal parameters of the system to improve the results. The identification was then executed again.
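The random 80/20 split and the AUC metric mentioned above can be sketched independently of the ThetaRay platform, whose internals are not described here. The function names and the seed are hypothetical; the AUC is computed through its rank interpretation, the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half.

```python
import numpy as np

def random_split(n_samples, train_fraction=0.8, seed=0):
    """Randomly partition sample indices into training and test sets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return perm[:n_train], perm[n_train:]

def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (pos.size * neg.size))

# The index count matches the 15,737 Kepler TCEs quoted earlier.
train_idx, test_idx = random_split(15737)

# Tiny worked example: 3 of the 4 positive/negative pairs are ordered
# correctly, so the AUC is 0.75.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Parameter tuning of the kind described above then reduces to evaluating `roc_auc` on the held-out indices for each candidate setting and keeping the setting with the largest value.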
2.3 TESS Satellite Data Analysis
2.3.1 Preprocessing the Data
We obtained 10,803 light curves of TCEs produced by the TESS mission from MAST (http://archive.stsci.edu/). To find potential exoplanets (anomalies) in the new TESS data, we wanted to use the same model that we built from the Kepler data. Using the same model for the two satellites requires converting the TESS data to the same structure as the Kepler data, so we performed additional steps to prepare the light curves as inputs to our system. We generated a set of TFRecord files for the TCEs. Each file contains the global_view, local_view and spline_bkspace representations, as for Kepler. We also created the following data files in Python:

global_view – vector of length 2001 that shows the characteristics of the light curve over an entire orbital period.

local_view – vector of length 201 that shows the shape of the transit in detail (phase-folded close-up of the transit event).

more_features – includes:

ticid – TESS ID of the target star.

planetNumber – TCE number within the target star.

planetRadiusEarthRadii – same meaning as tce_prad in Kepler data.

spline_bkspace – as defined in Section 2.2.4.

mes – same meaning as tce_max_mult_ev in Kepler data.

orbitalPeriodDays – same meaning as tce_period in Kepler data.

transitEpochBtjd – same meaning as tce_time0bk in Kepler data.

transitDurationHours – same meaning as tce_duration in Kepler data.

transitDepthPpm – same meaning as tce_depth in Kepler data.

minImpactParameter – same meaning as tce_impact in Kepler data.
The TESS data are unlabeled, so the av_training_set and av_pred_class fields do not exist in the TESS data; we therefore filled these fields with zeros. The tce_model_snr feature exists in the Kepler data but not in the TESS data, so we calculated its value as the ratio of transitDepthPpm to transitDepthPpm_err.


Describe files – include count, mean, std, min, max, 25th percentile, median (50th percentile) and 75th percentile. These quantities were computed on each original data row from the global_view and local_view files and on each scaling row of these files.
Following the generation of the dataset as comma-separated values (CSV) files, we applied the same manipulation to global_view as for the Kepler data in order to reduce the dimensions, and used the 424 features produced from the TESS data, analogous to those from the Kepler data, for the analysis on ThetaRay’s system. We then applied the Detection algorithm to the TESS data using the model saved from the Kepler training, and used the results for the classification and mapping of the TESS light curve TCE data.
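The conversion of TESS fields to the Kepler structure described in this section can be sketched as a simple renaming plus two derived fields. The field names are those listed above; the function name, the handling of a non-positive depth uncertainty and the example row are assumptions made for illustration.

```python
# Field correspondence listed above (TESS name -> Kepler name).
TESS_TO_KEPLER = {
    "planetRadiusEarthRadii": "tce_prad",
    "mes": "tce_max_mult_ev",
    "orbitalPeriodDays": "tce_period",
    "transitEpochBtjd": "tce_time0bk",
    "transitDurationHours": "tce_duration",
    "transitDepthPpm": "tce_depth",
    "minImpactParameter": "tce_impact",
}

def tess_row_to_kepler_schema(row):
    """Rename TESS fields and add the fields that TESS lacks."""
    out = {kepler: row[tess] for tess, kepler in TESS_TO_KEPLER.items()}
    out["ticid"] = row["ticid"]
    out["planetNumber"] = row["planetNumber"]
    # tce_model_snr is absent in TESS: depth divided by its uncertainty.
    err = row["transitDepthPpm_err"]
    out["tce_model_snr"] = (row["transitDepthPpm"] / err if err > 0
                            else float("nan"))  # guard is an assumption
    # TESS data are unlabeled, so both label fields are filled with zeros.
    out["av_training_set"] = 0
    out["av_pred_class"] = 0
    return out

# Hypothetical TCE row, used only to exercise the mapping.
example = {
    "ticid": 123456789, "planetNumber": 1,
    "planetRadiusEarthRadii": 2.5, "mes": 9.1,
    "orbitalPeriodDays": 3.5, "transitEpochBtjd": 1400.25,
    "transitDurationHours": 2.0, "transitDepthPpm": 1000.0,
    "transitDepthPpm_err": 125.0, "minImpactParameter": 0.3,
}
converted = tess_row_to_kepler_schema(example)
```

Keeping the correspondence in one dictionary makes it easy to check that every Kepler feature expected by the saved model has a TESS counterpart before detection is run.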
3 Results: Transiting Exoplanet Detection
The first results of the ThetaRay algorithm produced around 90 preliminary identifications of EPCs, which were then manually vetted, reducing the number of EPCs by about a factor of two. Local view light curves were used together with planetary candidate parameters to vet the algorithm’s output. The manual vetting rejected candidates on physical grounds, such as non-typical ‘local view’ light curves (i.e., V-shapes and other non-planetary periodic features), extremely large planetary radius, and very low signal-to-noise. The parameters of the remaining 39 EPCs identified by the ThetaRay system from the TESS database of 10,803 TCEs are given in Table 1. In Figure 3 we show the Local View light curves of eight selected exoplanetary candidates identified using the ThetaRay algorithm. The TESS Input Catalog ID number (TIC_ID), along with several parameters (tce_prad, tce_period, tce_depth, defined in Section 2.2.2) of the identified EPC, is indicated on each panel. Of the 39 validated cases, Table 1 lists only two with a planetary radius (tce_prad) below 1.4 Earth radii (TIC_ID 307210830 and 259377017), and a total of eight EPCs with tce_prad below 3 Earth radii. Another 15 identified EPCs were similar in size to or larger than Jupiter. We find the following properties of the 39 cases:

The orbital periods (tce_period) of the identified EPCs range from 0.38d to just under 23d.

The transit depth (tce_depth) varied by about an order of magnitude in the range ppm with the signaltonoise in the range ppm.

The impact parameter (tce_impact) was in the range 0.01 to 0.99 (Table 1).

The duration of the transits (tce_duration) was in the range 0.03 to 0.45 d (Table 1).

In four cases the identified EPCs suggest multiple planetary systems with 2 and 3 planets.
TIC_ID  # p  tce_prad  tce_max_mult_ev  tce_period  tce_time0bk  tce_duration  tce_model_snr  tce_depth  tce_impact
150162739  1  15.70639992  8.687669754  14.63549995  1335.199951  0.312352091  8.506078927  2042.819946  0.200270995 
167603396  1  2.751130104  8.034460068  14.37919998  1365.140015  0.324110419  8.00441099  1444.5  0.269908011 
254700590  1  2.881239891  7.790110111  11.77639961  1625.609985  0.193634167  7.599164669  543.3380127  0.651623011 
259377017  3  1.372750044  9.26651001  3.359859943  1387.089966  0.057147499  8.651269232  1034.959961  0.463200003 
279201188  1  4.806849957  7.558539867  14.4708004  1417.109985  0.112282082  7.839735967  652.2369995  0.375441998 
303051566  1  3.854789972  8.216239929  15.53929996  1328.569946  0.06302125  6.706231497  569.2410278  0.0374793 
307210830  3  0.869957983  11.3927002  2.253309965  1598.23999  0.042161249  10.71902329  723.9060059  0.501681983 
355509914  1  10.77110004  7.868070126  1.738260031  1326.119995  0.052042082  8.795540018  18978.40039  0.200622007 
370228465  2  4.152969837  7.710509777  12.32479954  1357.910034  0.057042085  7.614795753  4594.089844  0.0166821 
401889161  1  5.161489964  7.100709915  16.49230003  1417.01001  0.126625001  6.860909397  323.1549988  0.495678991 
422280868  1  2.566740036  8.417449951  3.133980036  1544.25  0.046455417  6.119703991  628.3099976  0.386207998 
447061717  1  2.608789921  21.75650024  9.204919815  1569.719971  0.135038748  18.51823286  4082.899902  0.0298279 
453767182  1  2.725820065  7.798190117  10.76249981  1626.119995  0.122235835  7.410331196  13611  0.027590601 
101948569  1  3.049010038  11.03339958  19.47240067  1360.109985  0.145037085  11.74638202  1645.949951  0.303943992 
102195674  1  20.62459946  68.38349915  4.378769875  1547.459961  0.159099996  67.04560811  30425.69922  0.00999983 
120916706  1  5.22453022  10.40649986  0.556737006  1386.170044  0.035527959  11.14050062  63679.10156  0.00999983 
141663326  1  24.71339989  65.67880249  6.65583992  1601.439941  0.119643748  55.47800308  6468.180176  0.985625029 
167418903  1  10.36499977  20.16550064  21.96240044  1599.300049  0.073109999  17.57117062  8278.05957  0.844699979 
170849515  1  10.83209991  16.08620071  1.941280007  1438.150024  0.074110419  20.93755124  37131.69922  0.00999983 
172464366  1  19.09110069  79.50800323  2.921689987  1470.050049  0.13241291  77.20788698  17562.40039  0.552250981 
178155732  1  2.372940063  10.47840023  5.971879959  1415.630005  0.11720375  12.1434367  316.0220032  0.226411998 
200591694  1  4.385819912  8.523739815  13.58699989  1470.150024  0.103903331  9.010778093  4702.040039  0.00999983 
206412587  1  3.578089952  7.95663023  16.51129913  1417.02002  0.106479168  7.899363151  524.2609863  0.475097001 
218524525  1  3.137619972  7.899419785  16.88459969  1494.859985  0.177322909  8.048498208  986.6010132  0.00999983 
219379012  1  4.340690136  15.25220013  1.546159983  1469.709961  0.055939998  17.6102274  1064.26001  0.692296982 
219403686  1  5.821829796  22.26140022  0.380145997  1468.579956  0.028838458  29.63871326  1336.619995  0.679122984 
235009317  1  23.10919952  73.6289978  7.456830025  1329.209961  0.158223748  49.59156193  21885.40039  0.903016984 
264537668  1  24.19919968  141.2850037  4.03110981  1469.109985  0.129590005  128.3034072  40075.69922  0.571915984 
270677759  1  9.437470436  14.45300007  9.129110336  1597.199951  0.196854994  14.20179783  8185.049805  0.805234015 
306735585  1  8.496970177  12.65380001  4.816760063  1414.079956  0.131257921  11.34648808  5228.529785  0.893122017 
307467401  1  4.158410072  44.08229828  9.587329865  1475.709961  0.281197071  25.5473955  1375.969971  0.83335799 
308994098  1  5.175449848  19.70980072  10.51659966  1552.050049  0.446095824  18.35474766  991.1049805  0.707704008 
309619055  1  10.75909996  20.72480011  10.55350018  1604.969971  0.195166245  11.93659993  9440.740234  0.894083023 
322900369  1  7.504670143  95.60189819  3.126100063  1493.890015  0.124930002  92.18038804  5151.629883  0.570958972 
335452175  1  15.31970024  58.07910156  15.49790001  1601.26001  0.134304583  60.38542514  8295.75  0.990104973 
410214984  3  4.706439972  12.51220036  8.135899544  1332.349976  0.043714583  6.290396653  4277.52002  0.144591004 
422655579  1  15.61709976  27.75729942  2.903460026  1413.140015  0.210315004  45.99784581  4657.879883  0.00999983 
423275733  1  17.97360039  24.79450035  2.052979946  1518.689941  0.110506669  36.4699389  10176.90039  0.745383978 
455278250  1  7.306509972  36.73529816  15.60929966  1521.51001  0.238732085  28.64738963  2824.179932  0.82368201 
4 Discussion and Conclusions
The TESS satellite provides high-precision photometric light curves for a large number (200,000) of stars over the whole sky, divided into observing sectors, with the aim of detecting transiting Earth-sized planets. The target stars were selected to be among the brightest and closest to our solar system. The large dataset of nearly 27 gigabytes per day is processed in the science data pipeline, which had provided nearly 11,000 TCEs at the time of writing this paper. Further analysis of the TCEs is required to find confirmed exoplanets, or exoplanetary candidates for more in-depth processing. This formidable data analysis task is difficult, if not impossible, to carry out manually. A feasible approach to the TESS data analysis is based on recently developed automated identification techniques customized for transiting exoplanetary candidates, utilizing AI/ML methods based on DL neural networks combined with the anomaly identification methods reported in the present study. These EPCs could then be vetted further with targeted observations and data analysis.
In this study we apply a novel algorithm developed by ThetaRay, Inc. for cybersecurity and anomaly identification in financial systems. The advantage of this AI/ML system over other machine learning methods is the combination of several algorithms, as described in this paper and the Appendix, and the direct application to any large dataset that may contain a small number of target data points (‘anomalies’). We apply the system to TESS TCEs in search of transiting exoplanet signatures in the large TCE dataset. For the training set of the ML algorithm we used the Kepler TCEs validated with the confirmed exoplanet dataset. By applying the trained ThetaRay algorithm to TESS TCEs we report 39 new planetary candidates with a wide range of sizes, from below Earth’s radius to super-Jupiter radii, and planetary periods ranging from 0.38d to just under 23d. We demonstrate that the combination of DL neural networks with mathematical anomaly identification techniques provides an efficient AI/ML algorithm for the rapid automated search for light curves of transiting exoplanet candidates. Although we find that manual vetting is needed to reduce the number of false positives, the total number of EPC identifications is small enough for secondary manual vetting of the light curves, and this approach provides the desired identification results. In future applications, ThetaRay’s algorithm could be further optimized for the identification of transiting exoplanets by including, for example, informed ML steps, potentially reducing the false-positive rate further in this application and providing a new tool for analyzing TESS TCE data.
Acknowledgment
The resources for this research were provided by ThetaRay, Inc. LO would like to acknowledge the hospitality of the Department of Geosciences, Tel Aviv University.
Appendix
The classification of light curves as exoplanetary candidates in this paper is achieved by using the analytic platform of ThetaRay, which is described in this appendix. This platform processes high-dimensional big data to identify anomalous behavior in comparison to a normal profile. This anomaly detection tool is used in the present application for the classification of EPCs in the TESS TCE database. The normal profile is driven by the training data, and its generation is explained below. In the present study we used Kepler TCE data as the training dataset, as described in Section 2.2. This appendix describes some of the algorithms that were used in the study to identify anomalies in big data using augmentation, semi-supervised and unsupervised algorithms. The same core algorithms for anomaly identification are capable of identifying anomalies in cyber (malware), industrial malfunction (IoT) and financial (crimes) data. The algorithms were applied for the first time to astrophysical data in this study. These algorithms are part of the ThetaRay (www.thetaray.com) core technology portfolio to fight financial crimes (Shabat et al., 2018a). The algorithms are housed in the ThetaRay Computational Platform, which enables efficient data manipulation and processing. The reported results were obtained by executing these algorithms on the ThetaRay platform.
Appendix A Semi-supervised processing via augmentation: Introduction
For background and context, we briefly describe the current commercial applications of the ThetaRay system, which have now been extended to astrophysical datasets. The ThetaRay system is designed to provide fast and accurate analytic solutions for identifying emerging risk/crime (classified as anomalies) in financial data, discovering new opportunities, and exposing blind spots within these large, complex, high-dimensional data sets. These AI-based algorithms radically reduce false positives and are uniquely able to uncover “unknown unknowns” (threats that one is not aware of, and does not even know that one is not aware of).
ThetaRay addresses anomaly detection challenges via its analytic platform, which is designed for big data, uncovers previously unknown risks, and does so with industry-low false positive rates and in real time, enabling fast forensics.
In this project, we assume that some labels are given for the Kepler TCE data, a dataset related to the TESS TCEs, but that no labels are given for the TESS data. An augmentation algorithm, which is considered a learning method, generates a new data frame based on the provided labels. The new data frame then serves as the input to the unsupervised algorithms. In this project, we apply three unsupervised algorithms to the augmented data, together with a hybrid of two of them: a geometric-based algorithm denoted by NY (see section C.1), an algebraic-based algorithm denoted by LU (see section C.2), a hybrid of LU and NY, and a neural-network-based algorithm denoted by AE.
The augmentation method is based on a neural network. The default network (which can be user-adjusted) consists of one input layer (the analysis data frame), three hidden layers and one output layer. All the layers are connected through “weights” that are automatically tuned during the learning (optimization) process until the values of the network output layer are close to the values of the provided labels. After optimization, the output of the third hidden layer becomes the new data frame, which is the input to the unsupervised algorithms that are outlined in section B, some of which are described in detail in section C.
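As a rough illustration of this augmentation step, the sketch below trains a small fully connected network on binary labels and then reads out the third hidden layer as the new data frame. It is not the ThetaRay implementation: the layer widths, the tanh and sigmoid activations, the plain gradient-descent optimizer, the learning rate and the synthetic data are all assumptions chosen only to make the idea concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "analysis data frame": 200 points, 6 features, rare positive labels.
X = rng.standard_normal((200, 6))
y = (X[:, 0] + X[:, 1] > 1.8).astype(float).reshape(-1, 1)

# Input layer, three hidden layers, output layer (widths are hypothetical).
sizes = [6, 16, 8, 4, 1]
W = [rng.standard_normal((a, c)) * np.sqrt(1.0 / a)
     for a, c in zip(sizes[:-1], sizes[1:])]
b = [np.zeros(s) for s in sizes[1:]]

def forward(data):
    """Return the activations of all layers (tanh hidden layers, sigmoid output)."""
    acts = [data]
    for i in range(3):
        acts.append(np.tanh(acts[-1] @ W[i] + b[i]))
    acts.append(1.0 / (1.0 + np.exp(-(acts[-1] @ W[3] + b[3]))))
    return acts

def loss(p):
    """Binary cross-entropy between network output p and the labels y."""
    eps = 1e-12
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

loss_before = loss(forward(X)[-1])
lr = 0.1
for _ in range(1000):
    acts = forward(X)
    delta = (acts[-1] - y) / len(X)          # gradient at the output pre-activation
    for i in range(3, -1, -1):
        grad_W = acts[i].T @ delta
        grad_b = delta.sum(axis=0)
        if i > 0:                            # back-propagate through tanh
            delta = (delta @ W[i].T) * (1.0 - acts[i] ** 2)
        W[i] -= lr * grad_W
        b[i] -= lr * grad_b
loss_after = loss(forward(X)[-1])

# The third hidden layer is taken as the new (augmented) data frame.
augmented = forward(X)[3]
```

In this toy setting `augmented` has one row per input point and four columns, and it would be handed to the unsupervised algorithms in place of `X`.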
ThetaRay’s platform covers detection and monitoring in several verticals, with a current emphasis on financial crimes, by supplying an end-to-end solution. ThetaRay provides an unsupervised and semi-supervised, real-time, agnostic, AI-based financial crime detection platform that is based on algorithms for detecting anomalies that are “unknown unknowns”.
Rule-based technology, which is very popular among anomaly detection tools, is intended for what is known, that is, for cases in which one knows what to look for. ThetaRay’s detection is achieved by unsupervised and semi-supervised automatic methods that are not based on rules, patterns, signatures, heuristics, data semantics of the features or any prior domain expertise, and that provide a high detection rate with very low false positives. The methodologies within ThetaRay’s Analytics Platform are based on unbiased detection through a series of randomized advanced AI-based algorithms that can process any number of data features. Their results can be explained and justified, and anomalies can be traced back to the features that triggered them; therefore the platform is not classified as a black box. Thus, the platform enables back-tracking of the events and features that trigger the occurrence of anomalies. ThetaRay’s system operates under the assumption that it is not known what to look for or what to ask. This potentially allows the technology to detect automatically every type of anomaly before the corresponding rules are discovered. For efficient processing of the algorithms the system uses off-the-shelf hardware components. The inherent parallelism in the algorithms is implemented with GPU utilization. The platform contains advanced and interactive visualization of the input and output phases of the data analysis. The detection approach is data driven; thus, no pre-existing models are assumed to exist. This makes the approach universal and generic, and opens the way for different applications without the introduction of bias, limitations, and unfounded preconceptions into the processing, a property well suited for large astrophysical datasets. Mathematical and physical justification for most of the available algorithms in the system is given below.
The input training data can be enriched by a limited set of given labels. This increases the detection rate and reduces the false alarm rate, and it is the semi-supervised part of the processing. Both semi-supervised and unsupervised algorithms are used. Currently, the detection engine of the platform contains eight different unsupervised algorithms for data without labels and three different semi-supervised algorithms for data with partial labels. The results are fused to produce one solution. ThetaRay combines the strengths of unsupervised and semi-supervised techniques to identify anomalies in the data. Unsupervised learning assumes that there are no labels for the various data components. Semi-supervised learning frameworks have made significant progress in training machine learning models with limited labeled data in the image domain. Augmented unsupervised learning can be used side by side with semi-supervised learning. The augmentation algorithms generate a new data frame based on the analysis data frame and the provided labels. The new data frame is then the input for all the selected unsupervised algorithms. Labels are binary, with the minority of the labels (known anomalies) marked as “1” and the remainder, which are the majority of unknown cases, assigned “0”.
The augmentation process enables covering both the known and the unknown with a relative balance between them. The ThetaRay system allows for configuration of the underlying input features, algorithms and detection logic for each application. Technically, it is a neural-network-based process that generates a new data frame based on the input data frame and the binary labels provided by the application (in the present case, stellar light-curve data).
Appendix B Unsupervised algorithms: General description
 NY:

This algorithm (see Figure 4) is based on the diffusion maps (DM) methodology (Coifman and Lafon, 2006a) and is primarily a nonlinear dimensionality reduction process. The anomaly identification procedure takes place inside the lower dimensional space (manifold) that is determined automatically during the training phase. In the identification phase, an out-of-sample extension procedure (Coifman and Lafon, 2006b) is applied to each multidimensional data point that did not participate in the training phase, to determine whether it belongs to the manifold (low dimensional space; classified as normal) or deviates from it (classified as anomalous).
The NY algorithm, which is based on DM, geometrizes the input training data. DM analyzes the ambient space (training data) and determines automatically where the data actually resides in the embedded space. We can visualize the input training data (ambient space) as a matrix of size $m \times n$, where $m$ is the number of multidimensional data points (the number of rows in the matrix) and each row is of dimension $n$, the number of columns in the matrix. The input data is assumed to be sampled from a low dimensional manifold (embedded space) that captures the dependencies between the observable parameters. DM reduces, in a nonlinear way, the dimension of the ambient space, which is the training data. The dimensionality reduction by DM is based on local affinities between multidimensional data points and on a nonlinear embedding of the ambient space into a lower dimensional space, described as a manifold, by using a low-rank matrix decomposition. The nonparametric nature of this analysis uncovers the important underlying factors of the input data and reveals the intrinsic geometry of the data represented by the embedded manifold. This manifold describes geometrically what we classify as the normal profile of the ambient data. Newly arrived multidimensional data points, which did not participate in the training procedure, are embedded into the lower dimensional space by the application of an out-of-sample extension algorithm. If the embedded multidimensional data point falls into the manifold, it is classified as normal; otherwise it is classified as abnormal (anomalous). See section C.1 for more details.
 LU:

Based on a randomized low-rank matrix decomposition (Shabat et al., 2018b). This algorithm builds a dictionary from the training data. Then, each newly arrived multidimensional data point that is not well described (not spanned well) by the dictionary is classified as an anomalous data point.
The randomized LU (RLU) algorithm is an algebraic approach applied to an input matrix $A$ of size $m \times n$ with an intrinsic dimension $k$ smaller than $\min(m, n)$. $k$ can be computed automatically or given. RLU is a low-rank matrix decomposition that enables the identification of anomalies using a dictionary constructed from the training data. RLU forms a low-rank matrix approximation of $A$ such that $PAQ \approx LU$, where $P$ and $Q$ are orthogonal permutation matrices, and $L$ and $U$ are the lower and upper triangular matrices, respectively. A dictionary $D$ is then constructed according to $D = P^{T}L$ ($P^{T}$ is the transpose of the matrix $P$). Thus, $D$ is a linear combination of the input matrix and a representation of the normal data. It is also used in the identification step to classify newly arrived multidimensional data points that did not participate in the training phase. A new incoming multidimensional data point $x$ that satisfies $\|DD^{\dagger}x - x\| \le \tau$ is classified as normal; otherwise, it is classified as anomalous. Here, $D^{\dagger}$ is the pseudo-inverse of $D$ and $\tau$ is a quantity defined in the training phase. When applied to a matrix $A$ of size $m \times n$, the RLU decomposition reduces the number of multidimensional data points, resulting in a reduced-measurements matrix of size $k \times n$ where $k \ll m$. Although the algorithm is randomized, it has been proven in Shabat et al. (2018b) that the probability that the RLU approximation produces a large error is very small. See section C.2 for more details.
 DK:

The DK algorithm relies on successive applications of LU and NY. Assume that a given training matrix $A$ consists of $m$ data points (rows) by $n$ features (columns). RLU (described in section C.2) is applied to $A$. The size of $A$ is reduced substantially through the application of random projection (Johnson and Lindenstrauss, 1984). Then, NY (described in section C.1) is applied to the reduced matrix, which is embedded into a lower dimensional space, and the NY anomaly identification procedure is invoked in this embedded space.
 AE:

This is a variational autoencoder (AE) algorithm. An AE is a machine learning tool designed to generate complex models of data after careful modeling of the distribution of example data. In neural network terms, an AE consists of an encoder component and a decoder component. We assume that the input data set is generated from an underlying unobserved (latent) representation. Given an input data set, the encoder part of the AE approximates the distribution of the latent variables. The algorithm then sets the distribution parameters of the latent layers in a manner that maximizes the likelihood of generating or reconstructing the input data in the decoder section. Once the distribution of the latent variables is approximated, we can sample from this distribution to generate an approximate representation of the input data. Since normality is defined by the majority of the data points, those points will be well approximated by the AE, while anomalies will be poorly modeled. The goal is to use the AE as a denoising autoencoder: it allows us to encode a sample into the latent space and then reconstruct it. By comparing the original sample with the reconstructed (generated) data, we can calculate a similarity score that enables us to classify a data point as anomalous. Since we plan to use the AE for anomaly detection, we have to calculate the scores for the input and the output.
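The reconstruction-based scoring used by AE can be illustrated with a deliberately simplified stand-in. The sketch below replaces the variational autoencoder with a linear autoencoder fitted by SVD (equivalent to PCA), which is an assumption made to keep the example short and deterministic. The encode, decode and score steps mirror the description above, while the data, the latent dimension and the threshold rule are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: "normal" points lie close to a 2-D subspace of a 5-D feature space.
basis = rng.standard_normal((2, 5))
normal = rng.standard_normal((300, 2)) @ basis + 0.01 * rng.standard_normal((300, 5))

# Linear autoencoder fitted by SVD (stand-in for a trained variational AE).
mu = normal.mean(axis=0)
_, _, Vt = np.linalg.svd(normal - mu, full_matrices=False)
V = Vt[:2]                                   # 2-D latent space (assumed size)

def encode(x):
    """Map a sample (or a batch of samples) to latent coordinates."""
    return (x - mu) @ V.T

def decode(z):
    """Map latent coordinates back to the feature space."""
    return z @ V + mu

def score(x):
    """Reconstruction error; large values mean the sample is poorly modeled."""
    return np.linalg.norm(x - decode(encode(x)), axis=-1)

# A sample displaced along a direction orthogonal to the normal subspace.
off_direction = np.linalg.svd(basis)[2][-1]
anomaly = 3.0 * off_direction

threshold = score(normal).max()              # hypothetical decision rule
is_anomalous = bool(score(anomaly) > threshold)
```

Points that follow the bulk of the training data reconstruct almost perfectly, whereas the displaced sample does not, so its score exceeds every training score.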
Appendix C Unsupervised algorithms: Mathematical description
C.1 Diffusion geometry: Background
DM are a kernel-based method for manifold learning that can reveal the intrinsic structures in data and embed them in a low dimensional space. The DM-based approach computes the diffusion geometry. A spectral embedding of the data points provides coordinates that are used to interpolate and approximate the pointwise diffusion map embedding of the data.
Manifold learning approaches are often used for modeling and uncovering intrinsic low dimensional structure in high dimensional data. DM is a method that captures data manifolds with random walks that propagate through nonlinear pathways in the data. Transition probabilities of a Markovian diffusion process (whose computation is explained later) define an intrinsic diffusion distance metric that is amenable to a low dimensional embedding. By arranging the transition probabilities in a row-stochastic diffusion operator, and taking its leading eigenvalues and eigenvectors, one can derive a small set of coordinates where diffusion distances are approximated as Euclidean distances and intrinsic manifold structures are revealed.
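The statement that diffusion distances become Euclidean distances in spectral coordinates can be checked numerically on a small example. The sketch below is only an illustration: the Gaussian kernel, its bandwidth, the number of time steps and the random data are assumptions. It computes a diffusion distance directly from the transition matrix and again from the eigen-decomposition, and then truncates to the leading coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 3))                     # 30 toy samples, 3 features

sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
K = np.exp(-sq_dists / 2.0)                          # assumed Gaussian kernel
d = K.sum(axis=1)                                    # degrees
P = K / d[:, None]                                   # row-stochastic operator
d_hat = d / d.sum()                                  # stationary distribution

t = 3
Pt = np.linalg.matrix_power(P, t)
# Diffusion distance between samples 0 and 1, straight from the definition.
d_def = np.sqrt((((Pt[0] - Pt[1]) ** 2) / d_hat).sum())

# Spectral route: eigen-decompose the symmetric conjugate of P.
S = K / np.sqrt(np.outer(d, d))
lam, V = np.linalg.eigh(S)                           # ascending eigenvalues
psi = V / np.sqrt(d_hat)[:, None]                    # right eigenvectors of P
embedding = psi * lam ** t                           # diffusion map at time t

d_spec = np.linalg.norm(embedding[0] - embedding[1])            # all coordinates
d_top = np.linalg.norm(embedding[0, -5:] - embedding[1, -5:])   # leading five only
```

With all coordinates the two numbers agree up to round-off; keeping only the leading coordinates gives a lower bound that approaches the full distance as the eigenvalues decay.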
In more detail, the NY algorithm uncovers the internal geometry of the input training data, denoted as $A$. The use of geometric considerations significantly reduces the computational time of anomaly detection. The theory that supports this approach is as follows. The goal is to detect anomalies in $A$ and in newly arrived $n$-dimensional data points that did not participate in the training data $A$. During the training procedure, the size of $A$, which is also called the dimension of $A$, is automatically reduced. This procedure is called dimensionality reduction. As explained later, dimensionality reduction is achieved without damaging the quality and the coherency of the data in $A$; moreover, there is no loss of data. Dimensionality reduction is just a different representation of the training data, one that reduces the dimension according to the data automatically, without any human intervention, and uncovers the real dimension where the training data actually resides.
In general, anomaly detection is based on the notion of similarities (or affinities) between the high dimensional data points (the rows of the matrix $A$). How can we detect anomalies in this big data efficiently, without introducing bias and without damaging the data? Dimensionality reduction of $A$ is needed. How can this reduction be achieved? The following provides the rationale for why geometrization of the training data and tracking the movement of newly arrived data points identify a low dimensional manifold for learning. The approach is founded mathematically on the preservation of the quality and the integrity (completeness) of the data in $A$.
The assumption is that the processed data is imbalanced: regions with a high density of $n$-dimensional samples (rows of the matrix $A$) represent normal data, since the majority of the data is normal; data outside these regions is classified as anomalous (abnormal).
Theory: How can the low dimensional space (manifold) be found? It is proved that if $A$ is sampled from a low intrinsic dimensional manifold then, as the number of samples $m$ (the row dimension of $A$) tends to infinity, the defined random walk, which travels between all the data samples, converges to a diffusion process over the manifold. This is the key to processing $A$ as a diffusion process, which guarantees an efficient scan of the data through randomization without the introduction of bias. Three complementary approaches for dimensionality reduction (diffusion distances between $n$-dimensional samples, randomization and manifold learning) emerge from this observation (theorem):
1. A kernel matrix $K$ of size $m \times m$ (huge) is constructed from the distances among all the $n$-dimensional samples (rows). The distances are diffusion distances.
2. A random walk is applied to the entries in $K$. This random walk guarantees that there is no bias in the utilization of the distances in $K$.
3. Diffusion Maps (DM) links the matrix $A$ and a lower dimensional space (manifold) via diffusion processing. The dimension of the embedded manifold represents the reduction of $A$.
Geometrization of the training data, an outline of the approach: The NY algorithm is based on a geometric uncovering of a low dimensional manifold in the ambient space (the original space represented by $A$) by the application of DM to this ambient space. The input data is assumed to be sampled from a low intrinsic dimensional manifold that captures the dependencies between the observable parameters ($n$-dimensional features). DM reduces the dimension of the training data. It is based on local affinities between multidimensional data points and on a nonlinear embedding of the ambient space into a lower dimensional space, described as a manifold, by using a low-rank matrix decomposition. The nonparametric nature of this analysis uncovers the important underlying factors of the input data and reveals the intrinsic geometry of the data represented by the embedded manifold. This manifold describes geometrically what we classify as the normal profile in the ambient data. Newly arrived $n$-dimensional data points, which did not participate in the training procedure, are embedded into the lower dimensional space by the application of an out-of-sample extension algorithm. If the embedded $n$-dimensional data point falls into the manifold where most of the normal data reside, it is classified as normal; otherwise it is classified as abnormal (anomalous). The exchange of data between the ambient space and the manifold, where the detection takes place, does not degrade the coherency and the completeness of the data and preserves the geometrical relations (affinities) between the two spaces, ambient and embedded (manifold).
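The training and identification phases outlined above can be sketched on a toy manifold. The code below is a minimal illustration, not the NY implementation: it assumes a Gaussian kernel with a hand-picked bandwidth, uses a Nyström-style formula as a stand-in for the geometric-harmonics out-of-sample extension, and scores a new point by its distance to the nearest embedded training point, which is a hypothetical decision rule. All function names are illustrative.

```python
import numpy as np

def gaussian_kernel(Y, X, eps):
    """Pairwise affinities exp(-||y - x||^2 / eps) between rows of Y and X."""
    sq = ((Y[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / eps)

def fit_diffusion_map(X, eps, dim):
    """Leading non-trivial eigenpairs of the random walk P = D^{-1} K."""
    K = gaussian_kernel(X, X, eps)
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))            # symmetric conjugate of P
    lam, V = np.linalg.eigh(S)                 # ascending eigenvalues
    idx = np.argsort(lam)[::-1][1:dim + 1]     # skip the trivial eigenvalue 1
    return lam[idx], V[:, idx] / np.sqrt(d)[:, None]

def extend(Y, X, eps, lam, psi):
    """Nystrom-style extension: psi_j(y) = (1 / lam_j) * sum_i p(y, x_i) psi_j(x_i)."""
    Ky = gaussian_kernel(Y, X, eps)
    Py = Ky / Ky.sum(axis=1, keepdims=True)
    return (Py @ psi) / lam

def distance_to_manifold(E_new, E_train):
    """Distance from each embedded new point to its nearest embedded training point."""
    sq = ((E_new[:, None, :] - E_train[None, :, :]) ** 2).sum(axis=2)
    return np.sqrt(sq.min(axis=1))

# Training data: 100 equally spaced points on the unit circle (a 1-D manifold).
theta = 2 * np.pi * np.arange(100) / 100
X_train = np.column_stack([np.cos(theta), np.sin(theta)])
eps = 0.05
lam, psi = fit_diffusion_map(X_train, eps, dim=2)
E_train = psi * lam                            # diffusion map at t = 1

# New points: one on the circle between two samples, one at the center.
t_new = np.pi / 100
X_new = np.array([[np.cos(t_new), np.sin(t_new)], [0.0, 0.0]])
E_new = extend(X_new, X_train, eps, lam, psi) * lam
scores = distance_to_manifold(E_new, E_train)
```

In this construction the on-manifold point lands next to the embedded training curve, while the off-manifold point is mapped far from it, so `scores[1]` is much larger than `scores[0]`.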
C.1.1 Diffusion geometry: outline
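Let $X = \{x_1, \dots, x_m\} \subset \mathbb{R}^n$ be a dataset and let $k : X \times X \to \mathbb{R}$ be a symmetric pointwise positive kernel that defines a connected, undirected and weighted graph over $X$. Then, a random walk over $X$ is defined by the row-stochastic transition probabilities matrix $P = D^{-1}K$, where $K$ is an $m \times m$ matrix whose entries are $K_{ij} = k(x_i, x_j)$ and $D$ is the diagonal degrees matrix whose $i$th element is $d_i = \sum_{j=1}^{m} k(x_i, x_j)$. The vector $d = (d_1, \dots, d_m)^{T}$ is referred to as the degrees vector of the graph defined by $K$.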
The associated time-homogeneous random walk $X(t)$ is defined via the conditional probabilities on its state-space $X$: assuming that the process starts at time $t = 0$, then for any time point $t \in \mathbb{N}$, $\Pr\left(X(t) = x_j \mid X(0) = x_i\right) = (P^t)_{ij}$, where $(P^t)_{ij}$ is the $(i,j)$th entry of the $t$th power of the matrix $P$. As long as the process is aperiodic, it has a unique stationary distribution $\hat{d}$, which is the steady state of the process, i.e. $\hat{d}_j = \lim_{t \to \infty} (P^t)_{ij}$, regardless of the initial state $X(0)$. This steady state is the probability distribution resulting from the $\ell_1$ normalization of the degrees vector $d$, i.e.,
$$\hat{d} = \frac{d}{\|d\|_1}, \tag{1}$$
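where $\|d\|_1 = \sum_{i=1}^{m} |d_i|$. The diffusion distances at time $t$ are defined by the metric $D^{(t)} : X \times X \to \mathbb{R}$,
$$D^{(t)}(x_i, x_j) = \left\| (P^t)_{i,:} - (P^t)_{j,:} \right\|_{\ell^2(\hat{d}^{-1})} = \sqrt{\sum_{l=1}^{m} \frac{\left((P^t)_{il} - (P^t)_{jl}\right)^2}{\hat{d}_l}}. \tag{2}$$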
By definition, $(P^t)_{i,:}$, the $i$th row of $P^t$, is the probability distribution over $X$ after $t$ time steps given that the initial state is $x_i$. Therefore, the diffusion distance from Eq. 2 measures the difference between two propagations along $t$ time steps: the first originates in $x_i$ and the second in $x_j$. Weighting the metric by the inverse of the steady state ascribes a high weight to similar probabilities on rare states and vice versa. Thus, a family of diffusion geometries is defined by Eq. 2, each corresponding to a single time step $t$.
Due to the above interpretation, the diffusion distances are naturally utilized for multiscale clustering since they uncover the connectivity properties of the graph across time. In Bérard et al. (1994); Coifman and Lafon (2006a) it has been proven that, under some conditions, if $X$ is sampled from a low intrinsic dimensional manifold then, as $m$ tends to infinity, the defined random walk converges to a diffusion process over the manifold.
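The quantities in this outline can be computed directly on a handful of points. The sketch below builds a row-stochastic transition matrix from a kernel, normalizes the degrees vector to obtain the steady state, and evaluates the weighted distance between rows of the $t$-step transition matrix. The Gaussian kernel, its bandwidth and the five sample points are assumptions made only for illustration.

```python
import numpy as np

# Five toy samples in the plane (hypothetical data).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.5]])

sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
K = np.exp(-sq_dists / 2.0)          # symmetric, pointwise positive kernel
d = K.sum(axis=1)                    # degrees vector
P = K / d[:, None]                   # P = D^{-1} K
d_hat = d / np.abs(d).sum()          # l1-normalized degrees (steady state)

def diffusion_distance(i, j, t):
    """Weighted l2 distance between rows i and j of P^t (weights 1 / d_hat)."""
    Pt = np.linalg.matrix_power(P, t)
    return np.sqrt((((Pt[i] - Pt[j]) ** 2) / d_hat).sum())

row_sums = P.sum(axis=1)                                    # each should equal 1
stationary_gap = np.abs(d_hat @ P - d_hat).max()            # d_hat P = d_hat
limit_gap = np.abs(np.linalg.matrix_power(P, 100) - d_hat).max()
```

Here every row of `P` sums to one, `d_hat` is left unchanged by one step of the walk, and all rows of a high power of `P` approach `d_hat`. The distance between two fixed samples shrinks as `t` grows, reflecting the multiscale behavior described in the text.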
C.2 Randomized LU decomposition: An algorithm for dictionary construction
A dictionary construction algorithm is presented. It is based on a low-rank matrix factorization achieved by applying the randomized LU decomposition (Shabat et al., 2018b) to the training data. This method is fast, scalable, parallelizable, consumes little memory, outperforms SVD in these categories and also works extremely well on large sparse matrices. In contrast to existing methods, the randomized LU decomposition constructs an undercomplete dictionary, which simplifies both the construction and the classification of newly arrived multidimensional data points. The dictionary construction is generic and fits different applications.
The randomized LU algorithm, which is applied to a given training data matrix $A$ of $m$ multidimensional data points and $n$ features, decomposes $A$ into two matrices $L$ and $U$. The size of $k$ is determined by the decaying spectrum of the singular values of the matrix $A$, and is bounded by $\min\{m, n\}$. Both $L$ and $U$ are linearly independent.
The randomized LU decomposition algorithm (see Figure 5) computes the rank $k$ LU approximation of a full matrix (Algorithm 1). The main building blocks of the algorithm are random projections and Rank Revealing LU (RRLU) (Pan, 2000), which are used to obtain a stable low-rank approximation for an input matrix $A$ that is classified as training data. In Figure 5, item ‘II’ generates the dictionaries by calling item ‘I’, which describes the flow of the randomized LU decomposition. The end of the execution of ‘I’ means that the training is completed. The dictionaries are the input of ‘II’, which performs the identification. A newly arrived data point that did not participate in the training is either spanned by the dictionary (classified as normal) or not spanned by it (classified as anomalous).
The RRLU algorithm, used in Algorithm 1, reveals the connection between the LU decomposition of a matrix and its singular values. Similar algorithms exist for rank-revealing QR decompositions (see, for example, Gu and Eisenstat (1996)).
Theorem C.1 (Pan (2000)).
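Let $A$ be an $m \times n$ matrix ($m \ge n$). Given an integer $1 \le k < n$, then the following factorization
$$PAQ = \begin{pmatrix} L_{11} & 0 \\ L_{21} & I_{n-k} \end{pmatrix} \begin{pmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{pmatrix} \tag{3}$$
holds, where $L_{11}$ is a lower triangular matrix with ones on the diagonal, $U_{11}$ is an upper triangular matrix, and $P$ and $Q$ are orthogonal permutation matrices. Let $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n \ge 0$ be the singular values of $A$, then:
$$\sigma_k \ge \sigma_{\min}(L_{11}U_{11}) \ge \frac{\sigma_k}{k(n-k)+1} \tag{4}$$
and
$$\sigma_{k+1} \le \|U_{22}\|_2 \le \left(k(n-k)+1\right)\sigma_{k+1}. \tag{5}$$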
Based on Theorem C.1, we have the following definition:
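Definition C.1 (RRLU Rank $k$ Approximation denoted RRLU$_k$).
Given an RRLU decomposition (Theorem C.1) of a matrix $A$ with an integer $k$ (as in Eq. 3) such that $PAQ = LU$, the RRLU rank $k$ approximation is defined by taking $k$ columns from $L$ and $k$ rows from $U$ such that
$$\mathrm{RRLU}_k(PAQ) = \begin{pmatrix} L_{11} \\ L_{21} \end{pmatrix} \begin{pmatrix} U_{11} & U_{12} \end{pmatrix}. \tag{6}$$
Lemma C.2 (Shabat et al. (2018b), RRLU Approximation Error).
The error of the RRLU$_k$ approximation of $A$ is
$$\left\| PAQ - \mathrm{RRLU}_k(PAQ) \right\| \le \left(k(n-k)+1\right)\sigma_{k+1}. \tag{7}$$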
Algorithm 1 describes the flow of the RLU decomposition algorithm.
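As a rough illustration of such a flow (random projection, pivoted LU of the sketch, pivoted LU of the projected matrix), the code below gives a simplified randomized LU. It is a sketch under stated assumptions and not a transcription of Algorithm 1: a plain partial-pivoting LU stands in for the rank-revealing LU, and the function names, the Gaussian test matrix and the oversampling parameter `l` are illustrative choices.

```python
import numpy as np

def lu_partial_pivot(Y):
    """Row-pivoted LU of a tall matrix: Y[perm] = L @ U, L unit lower trapezoidal."""
    m, l = Y.shape
    work = Y.astype(float).copy()
    L = np.zeros((m, l))
    perm = np.arange(m)
    for j in range(l):
        p = j + int(np.argmax(np.abs(work[j:, j])))      # pivot row
        work[[j, p]] = work[[p, j]]
        L[[j, p]] = L[[p, j]]
        perm[[j, p]] = perm[[p, j]]
        L[j, j] = 1.0
        if work[j, j] != 0.0:
            L[j + 1:, j] = work[j + 1:, j] / work[j, j]
            work[j + 1:] -= np.outer(L[j + 1:, j], work[j])
    return perm, L, np.triu(work[:l])

def randomized_lu(A, k, l, rng):
    """Return perm, qperm, L, U with A[perm][:, qperm] approximately L @ U (rank k)."""
    n = A.shape[1]
    G = rng.standard_normal((n, l))            # random projection, l >= k
    perm, Ly, _ = lu_partial_pivot(A @ G)      # LU of the sketch, row pivoting
    Ly = Ly[:, :k]                             # keep the first k columns
    B = np.linalg.pinv(Ly) @ A[perm]           # k x n projected matrix
    # Column-pivoted LU of B, obtained from the row-pivoted LU of B^T.
    qperm, Lt, Ut = lu_partial_pivot(B.T)
    L = Ly @ Ut.T                              # m x k lower trapezoidal factor
    U = Lt.T                                   # k x n upper trapezoidal, unit diagonal
    return perm, qperm, L, U

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 30))   # exact rank 3
perm, qperm, L, U = randomized_lu(A, k=3, l=5, rng=rng)
residual = np.abs(A[perm][:, qperm] - L @ U).max()
```

For this exactly rank-3 test matrix the residual is at the level of round-off. Note that in this sketch the unit diagonal ends up on `U` rather than on `L`; rescaling by a diagonal matrix would move it without changing the product.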
C.2.1 Randomized LU Based Classification Algorithm
Based on Section C.2, we apply the randomized LU decomposition (Algorithm 1) to the matrix $A$, yielding $PAQ \approx LU$. The outputs $P$ and $Q$ are orthogonal permutation matrices. Theorem C.3 shows that $P^{T}L$ forms (up to a certain accuracy) a basis to $A$. This is the key property of the classification algorithm.
Theorem C.3 (Shabat et al. (2018b)).
Given a matrix $A$. Its randomized LU decomposition is $PAQ \approx LU$. Then, the error of representing $A$ by $P^{T}L$ satisfies:
(8) 
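Let $x$ be a multidimensional data point and let $D$ be a dictionary. The distance between $x$ and the dictionary $D$ is defined by $\mathrm{dist}(x, D) = \|DD^{\dagger}x - x\|$, where $D^{\dagger}$ is the pseudo-inverse of the matrix $D$. If $\mathrm{dist}(x, D) \le \tau$ then $x$ is normal; otherwise it is anomalous.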
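The classification rule can be sketched in a few lines. Several assumptions are made loudly here: training points are stored as the columns of the matrix so that dictionary columns and data points live in the same space, the dictionary is built from the leading left singular vectors as a stand-in for the randomized LU dictionary, and the threshold rule is hypothetical. Since $DD^{\dagger}$ is the orthogonal projector onto the column space of $D$, the distance depends only on that column space, so any basis of the same space gives the same decisions.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 20, 200, 3            # features, training points, dictionary size

# Assumption: training points are the COLUMNS of A (near a k-dimensional subspace).
B = rng.standard_normal((m, k))
A = B @ rng.standard_normal((k, n)) + 0.01 * rng.standard_normal((m, n))

# Stand-in dictionary: leading left singular vectors (not the randomized LU).
D = np.linalg.svd(A, full_matrices=False)[0][:, :k]
D_pinv = np.linalg.pinv(D)

def dist(x):
    """dist(x, D) = || D D^+ x - x ||_2."""
    return np.linalg.norm(D @ (D_pinv @ x) - x)

# Hypothetical threshold: twice the largest distance seen in training.
tau = 2.0 * max(dist(A[:, j]) for j in range(n))

def classify(x):
    return "normal" if dist(x) <= tau else "anomalous"

x_normal = B @ np.array([0.5, -0.3, 0.2])     # spanned by the dictionary
x_anomalous = 5.0 * rng.standard_normal(m)    # mostly outside its span
labels = (classify(x_normal), classify(x_anomalous))
```

The first point is a combination of the directions that generated the training data and is labeled normal; the second has most of its energy outside the span of the dictionary and is labeled anomalous.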
References
 Ansdell et al. (2018) Ansdell, M., Ioannou, Y., Osborn, H.P., Sasdelli, M., 2018 NASA Frontier Development Lab Exoplanet Team, Smith, J.C., Caldwell, D., Jenkins, J.M., Räissi, C., Angerhausen, D., NASA Frontier Development Lab Exoplanet Mentors, 2018. Scientific Domain Knowledge Improves Exoplanet Transit Classification with Deep Learning. Astrophys. J. Lett. 869, L7. doi:10.3847/2041-8213/aaf23b, arXiv:1810.13434.
 Bérard et al. (1994) Bérard, P., Besson, G., Gallot, S., 1994. Embedding Riemannian manifolds by their heat kernel. Geometric and Functional Analysis GAFA 4, 373–398.
 Borucki et al. (2010) Borucki, W.J., Koch, D., Basri, G., Batalha, N., Brown, T., Caldwell, D., Caldwell, J., Christensen-Dalsgaard, J., Cochran, W.D., DeVore, E., Dunham, E.W., Dupree, A.K., Gautier, T.N., Geary, J.C., Gilliland, R., Gould, A., Howell, S.B., Jenkins, J.M., Kondo, Y., Latham, D.W., Marcy, G.W., Meibom, S., Kjeldsen, H., Lissauer, J.J., Monet, D.G., Morrison, D., Sasselov, D., Tarter, J., Boss, A., Brownlee, D., Owen, T., Buzasi, D., Charbonneau, D., Doyle, L., Fortney, J., Ford, E.B., Holman, M.J., Seager, S., Steffen, J.H., Welsh, W.F., Rowe, J., Anderson, H., Buchhave, L., Ciardi, D., Walkowicz, L., Sherry, W., Horch, E., Isaacson, H., Everett, M.E., Fischer, D., Torres, G., Johnson, J.A., Endl, M., MacQueen, P., Bryson, S.T., Dotson, J., Haas, M., Kolodziejczak, J., Van Cleve, J., Chandrasekaran, H., Twicken, J.D., Quintana, E.V., Clarke, B.D., Allen, C., Li, J., Wu, H., Tenenbaum, P., Verner, E., Bruhweiler, F., Barnes, J., Prsa, A., 2010. Kepler Planet-Detection Mission: Introduction and First Results. Science 327, 977. doi:10.1126/science.1185402.
 Brown et al. (2011) Brown, T.M., Latham, D.W., Everett, M.E., Esquerdo, G.A., 2011. Kepler Input Catalog: Photometric Calibration and Stellar Classification. Astron. J. 142, 112. doi:10.1088/0004-6256/142/4/112, arXiv:1102.0342.
 Catanzarite (2015) Catanzarite, J.H., 2015. Autovetter Planet Candidate Catalog for Q1-Q17 Data Release 24. KSCI-19091-001, NASA Ames Research Center, Moffett Field, CA.
 Christiansen et al. (2012) Christiansen, J.L., Jenkins, J.M., Caldwell, D.A., Burke, C.J., Tenenbaum, P., Seader, S., Thompson, S.E., Barclay, T.S., Clarke, B.D., Li, J., Smith, J.C., Stumpe, M.C., Twicken, J.D., Cleve, J.V., 2012. The derivation, properties, and value of Kepler’s combined differential photometric precision. Publications of the Astronomical Society of the Pacific 124, 1279–1287. URL: https://doi.org/10.1086%2F668847, doi:10.1086/668847.
 Coifman and Lafon (2006a) Coifman, R.R., Lafon, S., 2006a. Diffusion maps. Applied and Computational Harmonic Analysis 21, 5–30.
 Coifman and Lafon (2006b) Coifman, R.R., Lafon, S., 2006b. Geometric harmonics: a novel tool for multiscale out-of-sample extension of empirical functions. Applied and Computational Harmonic Analysis 21, 31–52.
 Coughlin et al. (2016) Coughlin, J.L., Mullally, F., Thompson, S.E., Rowe, J.F., Burke, C.J., Latham, D.W., Batalha, N.M., Ofir, A., Quarles, B.L., Henze, C.E., Wolfgang, A., Caldwell, D.A., Bryson, S.T., Shporer, A., Catanzarite, J., Akeson, R., Barclay, T., Borucki, W.J., Boyajian, T.S., Campbell, J.R., Christiansen, J.L., Girouard, F.R., Haas, M.R., Howell, S.B., Huber, D., Jenkins, J.M., Li, J., Patil-Sabale, A., Quintana, E.V., Ramirez, S., Seader, S., Smith, J.C., Tenenbaum, P., Twicken, J.D., Zamudio, K.A., 2016. Planetary Candidates Observed by Kepler. VII. The First Fully Uniform Catalog Based on the Entire 48-month Data Set (Q1-Q17 DR24). Astrophys. J. Supp. 224, 12. doi:10.3847/0067-0049/224/1/12, arXiv:1512.06149.
 Dattilo et al. (2019) Dattilo, A., Vanderburg, A., Shallue, C.J., Mayo, A.W., Berlind, P., Bieryla, A., Calkins, M.L., Esquerdo, G.A., Everett, M.E., Howell, S.B., Latham, D.W., Scott, N.J., Yu, L., 2019. Identifying Exoplanets with Deep Learning. II. Two New Super-Earths Uncovered by a Neural Network in K2 Data. Astron. J. 157, 169. doi:10.3847/1538-3881/ab0e12, arXiv:1903.10507.
 Golovin et al. (2017) Golovin, D., Solnik, B., Moitra, S., Kochanski, G., Karro, J., Sculley, D., 2017. Google Vizier: A Service for Black-Box Optimization. ACM ISBN 978-1-4503-4887-4/17/08, 1487. doi:10.1145/3097983.3098043.
 Gu and Eisenstat (1996) Gu, M., Eisenstat, S.C., 1996. Efficient algorithms for computing a strong rank-revealing QR factorization. SIAM Journal on Scientific Computing 17, 848–869.
 Jenkins et al. (2010a) Jenkins, J.M., Caldwell, D.A., Chandrasekaran, H., Twicken, J.D., Bryson, S.T., Quintana, E.V., Clarke, B.D., Li, J., Allen, C., Tenenbaum, P., Wu, H., Klaus, T.C., Cleve, J.V., Dotson, J.A., Haas, M.R., Gilliland, R.L., Koch, D.G., Borucki, W.J., 2010a. INITIAL CHARACTERISTICS OF KEPLER LONG CADENCE DATA FOR DETECTING TRANSITING PLANETS. Astrophys. J. Lett. 713, L120–L125. URL: https://doi.org/10.1088%2F2041-8205%2F713%2F2%2Fl120, doi:10.1088/2041-8205/713/2/l120.
 Jenkins et al. (2010b) Jenkins, J.M., Caldwell, D.A., Chandrasekaran, H., Twicken, J.D., Bryson, S.T., Quintana, E.V., Clarke, B.D., Li, J., Allen, C., Tenenbaum, P., Wu, H., Klaus, T.C., Middour, C.K., Cote, M.T., McCauliff, S., Girouard, F.R., Gunter, J.P., Wohler, B., Sommers, J., Hall, J.R., Uddin, A.K., Wu, M.S., Bhavsar, P.A., Cleve, J.V., Pletcher, D.L., Dotson, J.A., Haas, M.R., Gilliland, R.L., Koch, D.G., Borucki, W.J., 2010b. OVERVIEW OF THE KEPLER SCIENCE PROCESSING PIPELINE. Astrophys. J. Lett. 713, L87–L91. URL: https://doi.org/10.1088%2F2041-8205%2F713%2F2%2Fl87, doi:10.1088/2041-8205/713/2/l87.
 Johnson and Lindenstrauss (1984) Johnson, W.B., Lindenstrauss, J., 1984. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics 26, 1.
 Koch et al. (2010) Koch, D.G., Borucki, W.J., Basri, G., Batalha, N.M., Brown, T.M., Caldwell, D., Christensen-Dalsgaard, J., Cochran, W.D., DeVore, E., Dunham, E.W., Gautier, T.N., Geary, J.C., Gilliland, R.L., Gould, A., Jenkins, J., Kondo, Y., Latham, D.W., Lissauer, J.J., Marcy, G., Monet, D., Sasselov, D., Boss, A., Brownlee, D., Caldwell, J., Dupree, A.K., Howell, S.B., Kjeldsen, H., Meibom, S., Morrison, D., Owen, T., Reitsema, H., Tarter, J., Bryson, S.T., Dotson, J.L., Gazis, P., Haas, M.R., Kolodziejczak, J., Rowe, J.F., Cleve, J.E.V., Allen, C., Chandrasekaran, H., Clarke, B.D., Li, J., Quintana, E.V., Tenenbaum, P., Twicken, J.D., Wu, H., 2010. KEPLER MISSION DESIGN, REALIZED PHOTOMETRIC PERFORMANCE, AND EARLY SCIENCE. Astrophys. J. Lett. 713, L79–L86. URL: https://doi.org/10.1088%2F2041-8205%2F713%2F2%2Fl79, doi:10.1088/2041-8205/713/2/l79.
 Mandel and Agol (2002) Mandel, K., Agol, E., 2002. Analytic Light Curves for Planetary Transit Searches. Astrophys. J. Lett. 580, L171–L175. doi:10.1086/345520, arXiv:astro-ph/0210099.
 Osborn et al. (2020) Osborn, H.P., Ansdell, M., Ioannou, Y., Sasdelli, M., Angerhausen, D., Caldwell, D., Jenkins, J.M., Räissi, C., Smith, J.C., 2020. Rapid classification of TESS planet candidates with convolutional neural networks. Astron. Astrophys. 633, A53. doi:10.1051/0004-6361/201935345, arXiv:1902.08544.
 Pan (2000) Pan, C.T., 2000. On the existence and computation of rank-revealing LU factorizations. Linear Algebra and its Applications 316, 199–222.
 Ricker et al. (2014) Ricker, G.R., Winn, J.N., Vanderspek, R., Latham, D.W., Bakos, G.Á., Bean, J.L., Berta-Thompson, Z.K., Brown, T.M., Buchhave, L., Butler, N.R., Butler, R.P., Chaplin, W.J., Charbonneau, D., Christensen-Dalsgaard, J., Clampin, M., Deming, D., Doty, J., De Lee, N., Dressing, C., Dunham, E.W., Endl, M., Fressin, F., Ge, J., Henning, T., Holman, M.J., Howard, A.W., Ida, S., Jenkins, J., Jernigan, G., Johnson, J.A., Kaltenegger, L., Kawai, N., Kjeldsen, H., Laughlin, G., Levine, A.M., Lin, D., Lissauer, J.J., MacQueen, P., Marcy, G., McCullough, P.R., Morton, T.D., Narita, N., Paegert, M., Palle, E., Pepe, F., Pepper, J., Quirrenbach, A., Rinehart, S.A., Sasselov, D., Sato, B., Seager, S., Sozzetti, A., Stassun, K.G., Sullivan, P., Szentgyorgyi, A., Torres, G., Udry, S., Villasenor, J., 2014. Transiting Exoplanet Survey Satellite (TESS). volume 9143 of Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series. p. 914320. doi:10.1117/12.2063489.
 Schwarz (1978) Schwarz, G., 1978. Estimating the dimension of a model. Ann. Statist. 6, 461–464. URL: https://doi.org/10.1214/aos/1176344136, doi:10.1214/aos/1176344136.
 Shabat et al. (2018a) Shabat, G., Segev, D., Averbuch, A., 2018a. Uncovering unknown unknowns in financial services big data by unsupervised methodologies: Present and future trends, in: Proceedings of Machine Learning Research, KDD 2017 Workshop on Anomaly Detection in Finance, pp. 8–19.
 Shabat et al. (2018b) Shabat, G., Shmueli, Y., Aizenbud, Y., Averbuch, A., 2018b. Randomized LU decomposition. Applied and Computational Harmonic Analysis 44, 246–272.
 Shallue and Vanderburg (2018) Shallue, C.J., Vanderburg, A., 2018. Identifying Exoplanets with Deep Learning: A Five-planet Resonant Chain around Kepler-80 and an Eighth Planet around Kepler-90. Astron. J. 155, 94. doi:10.3847/1538-3881/aa9e09, arXiv:1712.05044.
 Yu et al. (2019) Yu, L., Vanderburg, A., Huang, C., Shallue, C.J., Crossfield, I.J.M., Gaudi, B.S., Daylan, T., Dattilo, A., Armstrong, D.J., Ricker, G.R., Vanderspek, R.K., Latham, D.W., Seager, S., Dittmann, J., Doty, J.P., Glidden, A., Quinn, S.N., 2019. Identifying Exoplanets with Deep Learning. III. Automated Triage and Vetting of TESS Candidates. Astron. J. 158, 25. doi:10.3847/1538-3881/ab21d6, arXiv:1904.02726.
 Zucker and Giryes (2018) Zucker, S., Giryes, R., 2018. Shallow Transits—Deep Learning. I. Feasibility Study of Deep Learning to Detect Periodic Transits of Exoplanets. Astron. J. 155, 147. doi:10.3847/1538-3881/aaae05, arXiv:1711.03163.