The Benefits of Automated Machine Learning in Hospitality: A Step-By-Step Guide and AutoML Tool

The manuscript presents a tool to estimate and predict data accuracy in hospitality by means of automated machine learning (AutoML). It uses a tree-based pipeline optimization tool (TPOT) as a methodological framework. The TPOT is an AutoML framework based on genetic programming, and it is particularly useful for generating classification models, performing regression analysis, and determining the most accurate algorithms and hyperparameters in hospitality. To demonstrate the tool's practical usefulness, we apply the TPOT to a real-world dataset to convert key hospitality variables (customer satisfaction, loyalty) to revenue, with up to 93% prediction accuracy on unseen data.


2-Literature Review
AutoML is an AI tool tailored to consumer behavior research and is well-established as an efficient computer science research tool due to its relative effectiveness compared to computational effort [2]. However, few studies have relied on AutoML to estimate consumer behavior outcomes. The work of Ferreira [4] is an exception and uses unsupervised ML algorithms, such as self-organizing maps and agglomerative hierarchical clustering, to construct descriptive models capable of finding clusters of over-indebted consumers. AutoML enabled the authors to test and compare 32,730 predictive models to produce an algorithm that predicted an external dataset's indebtedness with up to 89.5% accuracy. AutoML offers hospitality scholars and practitioners several benefits. It helps gauge out-of-sample accuracy to provide a hedge against overfitting. It also provides high levels of predictive accuracy [4], building simplified models that provide viable alternatives to logit models [21]. Although ML research in general has grown recently in business research and practice (for a detailed review, please see [4]), researchers have rarely explored these benefits of AutoML [22].
We aim to illustrate a step-by-step approach to using AutoML in consumer and market prediction studies; the illustration offers insights into how to convert loyalty and customer satisfaction into revenue. AutoML requires only a few steps and limited programming knowledge from researchers. The tool we discuss can be widely applied with only basic familiarity with ML algorithms. Researchers can download conveniently referenced packages in R or Python to implement and compare multiple ML algorithms including decision trees, random forests, artificial neural networks, LASSO (least absolute shrinkage and selection operator), and regression models. However, although we acknowledge that R and Python are widely applied in data science, this manuscript focuses on how to produce results through Python for two reasons: (a) Python provides a more general approach to data science, and it (b) provides a TPOT library for AutoML, based on genetic programming (GP) (e.g., [23]). GP [24,25] is the most advanced technique in the area of evolutionary computation and since its inception, it has successfully addressed optimization tasks in various domains. The use of Python as the programming language allows for the integration of the AutoML tool with other packages that are commonly used in the ML field (i.e., NumPy, Pandas, Matplotlib, Scikit-learn, and Tensorflow). Therefore, the combination of an advanced evolutionary-based optimization process and the availability of a complete package ecosystem make the tool a suitable option in the AutoML context.

2-1-From ML to AutoML
Four basic steps characterize a standard ML workflow: (1) ingestion, (2) cleaning, (3) feature engineering, and (4) modeling. Ingestion consists of collecting data to be used in the subsequent phases. The ingestion phase can rely on a specific platform (e.g., collecting data related to a Twitter user) or specific devices (such as sensors or video cameras), or it can be as simple as storing data in a spreadsheet. Ingestion does not benefit from the use of AutoML and is therefore not a focus of the paper. The TPOT tool only presumes that the user has collected data for a problem that can potentially be addressed by an ML algorithm.

2-1-1-Data Cleaning
Cleaning is a fundamental task in ML that may significantly impact model performance. In particular, the cleaning phase aims at removing or modifying data (collected in the ingestion phase) that is incorrect, irrelevant, or duplicated. In other words, the cleaning phase prepares the data for subsequent steps, ensuring that ML models are built only with high-quality data. Despite this step's apparent fundamental role, ML users who fully rely on the ML algorithm often ignore cleaning. Not even the best ML algorithm for a given instance of a problem can produce a satisfactory performance if fed poor-quality data ("garbage in, garbage out"). The cleaning phase presents an important and challenging task. Several data cleaning techniques can be considered for various purposes and according to the available data [26]. However, these steps could be considered a standard cleaning procedure: (1) removing irrelevant data, (2) detecting and fixing structural errors, (3) outlier detection, and (4) handling missing values.
In the first step, two analyses should be performed: removal of duplicate observations and removal of irrelevant observations. Duplicate observations are quite common because big data are typically ingested from multiple sources. Removing irrelevant observations reduces the dataset's size and prevents data unrelated to the problem at hand from influencing the model. For instance, if the ML task aims at analyzing the behavior of a supermarket's female clients, all observations related to male clients must be screened out.
The second step (detecting and fixing structural errors) aims at increasing data consistency. For instance, a variable reporting country may contain "Portugal" or "PT." Routines need to search for and resolve such inconsistencies. Other factors may render a variable structurally irrelevant. For instance, a variable that is invariant across all observations provides no information and can be dropped. The resulting reduction in data volume further facilitates subsequent ML steps.
Third, outlier detection aims to identify abnormal data patterns. Data analysts have multiple tools at their disposal for outlier detection [27]. A review is beyond this study's scope but is presented in [28]. Removing outliers is critical because most ML techniques are sensitive to their presence.
Fourth, several strategies can be used for handling missing values [29]. The nature of the data (e.g., text, numbers, point in a time series) becomes a strong consideration in how to proceed. Many approaches to data cleaning exist, so selecting an appropriate procedure can be difficult [30,31]. Many data cleaning techniques can be easily automated using the Scikit-learn ML tool [32].
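The four cleaning steps above can be sketched in a few lines of pandas. This is a minimal illustration, not the procedure used in the study: the data frame, its column names, the interquartile-range rule for outliers, and the median imputation are all assumptions chosen for brevity.

```python
import pandas as pd

# Hypothetical survey extract; column names and values are illustrative only.
raw = pd.DataFrame({
    "guest_id": [1, 2, 2, 3, 4, 5, 6],
    "country": ["Portugal", "PT", "PT", "Spain", "Portugal", "Spain", "PT"],
    "chain": ["H"] * 7,                     # invariant column, carries no information
    "nights": [2, 3, 3, 1, 40, 2, None],    # 40 is implausible, None is missing
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    # (1) Remove duplicate observations.
    df = df.drop_duplicates().copy()
    # (2) Fix structural errors: map inconsistent labels to one spelling,
    #     then drop columns that are invariant across all observations.
    df["country"] = df["country"].replace({"PT": "Portugal"})
    constant = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
    df = df.drop(columns=constant)
    # (3) Outlier detection with the interquartile-range rule (one option of many).
    q1, q3 = df["nights"].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = df["nights"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    df = df[in_range | df["nights"].isna()].copy()
    # (4) Handle missing values: median imputation for a numeric column.
    df["nights"] = df["nights"].fillna(df["nights"].median())
    return df.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Which rule is appropriate at each step depends on the data; the IQR threshold and the median imputation in particular are placeholders for whatever the analyst judges suitable.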

2-1-2-Feature Engineering
Feature engineering aims to improve the performance of ML models by creating new and possibly more informative variables, combining existing ones, or representing them in a more convenient manner. Feature engineering is a complex task that influences the resulting ML model's performance. The most commonly used techniques are binning, logarithmic transformation, one-hot encoding, grouping operations, feature split, and scaling. A description of the feature-engineering techniques is beyond this paper's scope, but we refer the reader to Kuhn and Johnson [29] for a complete review. What should be clear is that feature engineering is characterized by a large number of methods and that each method may have its own parameters. Feature engineering is more of an art than a science, and it is perhaps the most common source of mistakes in deploying the ML workflow.
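As a rough illustration of four of the techniques named above (binning, logarithmic transformation, scaling, and one-hot encoding), the following sketch uses pandas and NumPy. The data frame, column names, and bin edges are hypothetical and only serve to show the mechanics.

```python
import numpy as np
import pandas as pd

# Hypothetical guest records; names and values are illustrative only.
guests = pd.DataFrame({
    "age": [23, 37, 45, 61],
    "spend": [80.0, 150.0, 1200.0, 300.0],
    "room_type": ["standard", "suite", "standard", "deluxe"],
})

features = guests.copy()
# Binning: turn a continuous variable into ordered categories.
features["age_band"] = pd.cut(features["age"], bins=[0, 30, 50, 120],
                              labels=["young", "middle", "senior"])
# Logarithmic transformation: compress a right-skewed variable.
features["log_spend"] = np.log1p(features["spend"])
# Scaling: min-max rescaling to the [0, 1] interval.
spend = features["spend"]
features["spend_scaled"] = (spend - spend.min()) / (spend.max() - spend.min())
# One-hot encoding: one indicator column per category.
features = pd.get_dummies(features, columns=["room_type"])

print(features.head())
```

Each transformation has its own parameters (bin edges, scaling range, and so on), which is part of why the search space that AutoML explores grows so quickly.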

2-1-3-Modeling
Modeling is the final task in the ML workflow. It involves the choice of an algorithm and its hyperparameter configuration. Modeling requires some experience in the ML area because no formal procedure is available for selecting the "best" algorithm to address a specific optimization problem. In fact, the no free lunch theorem states that any two ML algorithms are equivalent when their performance is averaged across all possible problems [33]. Additionally, algorithm selection is only as good as the optimization of its hyperparameters. That is, algorithms are characterized by the presence of specific parameters whose values are fundamental for achieving a robust model. For this reason, the modeling phase is a critical step that, due to the choice of the ML algorithm and its parameterization, is characterized by a high level of complexity.
Fortunately, ML practitioners can take advantage of the existing AutoML tools for building a successful ML pipeline without being an ML expert. The scientific literature reports several definitions of AutoML. One describes it as a process designed "to reduce the demand for data scientists by enabling domain experts to build ML applications automatically without extensive knowledge of statistics and ML" [34]. In Yao et al. [35], AutoML is defined as the process that allows for the automated construction of an ML pipeline. AutoML is therefore particularly promising considering the hardware that is available today and the need to extract insights from a vast amount of data. Domain experts without extensive knowledge of ML can easily build a pipeline that takes care of all the steps previously discussed, from data cleaning to the definition of the final model.

2-2-Tree-Based Pipeline Optimization Tool (TPOT)
Among the AutoML packages, we adopted the TPOT to build an ML model able to achieve satisfactory performance in a complex task. The TPOT is an open-source library for performing AutoML in Python. It is based on GP [24,25,36] and provides a stochastic global search procedure to discover a top-performing model pipeline efficiently for a given dataset. In particular, one of the main advantages of the TPOT is its focus on the cleaning and feature engineering steps, which other tools often ignore because they focus only on the model's optimization. The choice of the TPOT in this study is related to the analysis presented in Balaji and Allen [37]. In that study, the authors benchmarked the most commonly used open-source AutoML tools, including auto-sklearn [2], TPOT, Auto ML (https://github.com/ClimbsRocks/auto_ml), and H2O [38]. In particular, they compared the resulting models' performance on regression and classification datasets and found that auto-sklearn and the TPOT are the best performers. Therefore, we selected the TPOT for its advanced optimization process and the possibility of specifying many parameters. Both expert ML practitioners and non-experts can use the tool proficiently; non-experts may rely on the default parameters, whereas ML experts may set each parameter's value. In particular, the TPOT automates the most challenging part of the ML workflow by intelligently exploring thousands of possible pipelines to find the best one for the data at hand. Figure 1 (taken from http://epistasislab.github.io/tpot/) shows the parts of the ML workflow the TPOT automates.

2-3-Advantages of Automated ML
The previous section outlined the process that data scientists need to implement whenever a business identifies a problem that can be solved with ML. Hundreds of algorithms exist for every step of data cleaning, feature engineering, and modeling. Typically, each algorithm depends on a large set of parameters, and finding the correct setting for those parameters becomes crucial for an algorithm to perform accurately. The "No Free Lunch" theorem [33] guarantees that no formal process can exist to make an optimal choice among this huge variety of algorithms and configurations. Data scientists face an extremely difficult and time-consuming process in which they must consider many algorithms and configurations. Consequently, turnaround can be slow and often take too long to capitalize on a business opportunity. AutoML is a technique that will revolutionize ML-based solutions and the way they are obtained, enabling business analysts and developers to generate ML models that can address complex scenarios [8]. The AutoML platform will abstract the steps after data ingestion. Essentially, the AutoML tool allows users to upload data, identify the labels, and "push the play button" to generate a thoroughly trained and optimized model capable of accurate prediction. When dealing with AutoML, business analysts stay focused on the business problem instead of dwelling on the ML process and workflow. AutoML handles all the steps involved in preparing the data "behind the scenes", choosing the right algorithms and optimizing and tuning the hyperparameters.
For each phase of the ML pipeline, AutoML identifies a large set of (algorithm, hyperparameter configuration) pairs. Conceptually, the pairs can then be combined in all possible ways, as in an exhaustive grid search, although search methods such as GP explore this space selectively rather than enumerating it. The outcome is the pipeline configuration that returns the best results. Put simply, AutoML automates the process that data scientists perform manually to generate a predictive model. By doing so, AutoML can test many more combinations than any human could consider manually. The big benefit of AutoML is that it is easy to use and does not require advanced data science skills. The main drawback is the typically long running time of the search, which makes the use of powerful computational resources particularly appropriate for AutoML. AutoML represents one of the most concrete and effective efforts toward the democratization of ML, and arguably the future of AI. It puts the power of AI into the hands of business analysts and technology decision makers.

3-AutoML Tool Using TPOT
In this tutorial, we explore how to use the TPOT to search for the optimal ML pipeline automatically and how to use the resulting model to predict target variables. The TPOT is described as a "genetic search algorithm" and enables researchers to find the best parameters of model ensembles [39]. As previously mentioned, the TPOT relies on GP, a method that mimics the Darwinian theory of evolution and allows us to "evolve" (and thereby improve in a stepwise refinement fashion) the ML configuration until the most suitable pipeline (for instance, the one that produces the most accurate model) is found. The process is repeated for a specified number of generations before settling on a final pipeline that produces the highest accuracy found while keeping complexity low (e.g., the smallest number of pipeline operators [39]). The TPOT allows behavioral science researchers to predict targets using AI. In addition, it is open source (free) and runs on Python, which makes it accessible to a wide audience. The tool allows researchers to perform sophisticated analyses without the need to install Python locally. Furthermore, the main advantage of using the proposed tool is that researchers do not need to write Python code, as the tool is already coded for data prediction. In this section, we illustrate the AutoML methodology by applying it to a field dataset. The section demonstrates how to use the AutoML methodology and interpret the estimates. First, the section describes the input data the AutoML methodology requires and demonstrates the additional benefits of using AutoML as an alternative and effective approach to data analysis. In our applications, the TPOT library is programmed to minimize the error of the ML pipeline. The application is publicly available as a "Google Colab" Python notebook. Google Colab allows users to write and execute Python code through the browser and is especially well suited for ML, data analysis, and education (e.g., [40]).
This setup is easy to use, is user-friendly and requires only a few steps as described below. In this tutorial, we will explore how to use the TPOT to tune ML models automatically to the considered data. In the first part of the tutorial, we develop classification models, and in the second part, we develop a model for regression.
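To make the idea of evolving a configuration over generations more concrete, the following toy sketch runs a simple evolutionary search (selection, crossover, mutation) over flat configurations. It is deliberately simplified and is not the TPOT's tree-based genetic programming: the search space, the `score` function (a stand-in for a cross-validated metric), and all names are hypothetical.

```python
import random

# Hypothetical search space: each "pipeline" is a flat configuration.
SEARCH_SPACE = {
    "scaler": ["none", "standard", "minmax"],
    "model": ["tree", "forest", "linear"],
    "depth": [2, 4, 8, 16],
}


def score(config):
    """Stand-in fitness; a real run would return a cross-validated score."""
    points = {"none": 0.0, "standard": 0.1, "minmax": 0.05,
              "tree": 0.6, "forest": 0.8, "linear": 0.5}
    depth_bonus = 0.1 - abs(config["depth"] - 8) / 100
    return points[config["scaler"]] + points[config["model"]] + depth_bonus


def random_config(rng):
    return {key: rng.choice(values) for key, values in SEARCH_SPACE.items()}


def mutate(config, rng):
    # Re-draw one randomly chosen setting.
    child = dict(config)
    key = rng.choice(list(SEARCH_SPACE))
    child[key] = rng.choice(SEARCH_SPACE[key])
    return child


def crossover(a, b, rng):
    # Take each setting from one of the two parents.
    return {key: rng.choice([a[key], b[key]]) for key in SEARCH_SPACE}


def evolve(generations=10, population_size=10, seed=0):
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the better half unchanged as parents.
        population.sort(key=score, reverse=True)
        parents = population[: population_size // 2]
        # Variation: refill the population with mutated crossovers.
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=score)


best = evolve()
print(best, round(score(best), 3))
```

Because the better half of each generation is carried over unchanged, the best score never decreases from one generation to the next; the search only evaluates a sample of the space rather than every combination.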

3-1-Using AutoML to Predict Customer Loyalty, Satisfaction and Revenue
We will use AutoML to predict customer loyalty (classification tool), satisfaction (regression tool), and revenue (regression tool) (Table 1). We provide a step-by-step guide on how to use the AutoML tool in Appendix I. With the field data, we aim to investigate consumers' evaluations of a major European hotel chain, including key variables, such as overall evaluation and loyalty. The hotel chain has several branches operating across Europe and America and offers high-end hospitality services. The survey data represent all branches and consist of 284,229 samples from consumers who stayed in the hotel between 2009 and 2019. Consumers rated their satisfaction with multiple attributes on a 5-point Likert scale (i.e., "staff friendliness", "cleanliness", "location", "room quality", "sports facilities", "price", "comfort", "food quality", and "room amenities"). The attribute evaluation of the hotel experience shows good reliability (α=.80). The dataset also includes overall satisfaction in a single item ("Overall Satisfaction": 5 ⭐, 4 ⭐, 3 ⭐, 2 ⭐, 1 ⭐, N/A). For confidentiality purposes, we cannot display actual revenue results, as they are covered by our non-disclosure agreement (NDA) with the company.
Following the steps described in Section 2-1, the flowchart in Figure 2 summarizes the research methodology.

Figure 2. Flowchart of the methodology
We started by retrieving the data from the surveys provided by the European hotel chain. The dataset contains all the available features and the target variables (i.e., loyalty, customer satisfaction, and estimated revenue). Subsequently, in the data cleaning step, we analyzed the data to detect outliers and handle missing values. The data did not present outliers or missing values, so we maintained all the samples. In the experimental phase, we employed the TPOT AutoML framework to construct a machine-learning pipeline automatically. In particular, as described in Section 4, we used the TPOT to build a classification model and two regression models. Finally, we validated the best pipeline the TPOT returned, following a 3-fold cross-validation procedure. We discuss all the details concerning the experiments performed with the TPOT, the results achieved, and the respective validation in Section 4.
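A TPOT search of the kind described here could be launched roughly as follows. This is a sketch that assumes the classic TPOT interface (releases before 1.0, where `TPOTClassifier` accepts `generations`, `population_size`, `cv`, `scoring`, `random_state`, and `verbosity`, and offers `export`); newer releases changed parts of this interface. The number of generations, population size, and 3-fold cross-validation follow the settings reported in this paper, whereas the random seed, verbosity level, train/test split, and file name are illustrative assumptions.

```python
# Search settings: generations, population size, and folds follow the paper;
# random_state and verbosity are assumptions added for reproducible output.
TPOT_SETTINGS = {
    "generations": 10,
    "population_size": 10,
    "cv": 3,
    "random_state": 42,
    "verbosity": 2,
}


def run_tpot_classification(X, y, export_path="best_pipeline.py"):
    """Search for a classification pipeline and export it as a Python script."""
    # Imported here so this module still loads where TPOT is not installed.
    from sklearn.model_selection import train_test_split
    from tpot import TPOTClassifier  # classic (pre-1.0) interface assumed

    # Hold out a test set (25% is an arbitrary choice for this sketch).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42, stratify=y)

    search = TPOTClassifier(scoring="accuracy", **TPOT_SETTINGS)
    search.fit(X_train, y_train)
    print("Held-out accuracy:", search.score(X_test, y_test))

    # Write the best pipeline found as standalone scikit-learn code.
    search.export(export_path)
    return search.fitted_pipeline_
```

For the regression tasks, the same pattern would apply with `TPOTRegressor` and an error-based scoring function instead of accuracy.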

4-Interpreting the AutoML Tool Results
In this section, we present and analyze the ML models that the TPOT produced. In particular, a classification model for predicting customer loyalty (described in Section 4.1) and two regression models (Section 4.2), one for predicting customer satisfaction and the other for revenue prediction, were built. Given that this work represents the first effort to apply AutoML to this type of application, no comparison with previous results is possible. However, we aimed to obtain the generated models' greatest possible predictive accuracy.

4-1-AutoML Classification Tool Results
Here, we describe the ML model the AutoML tool produced as a result of the TPOT's evolutionary search process. The target variable represents the consumer's loyalty to the hotel. The TPOT allows us to export the best model directly as a Python script. The Python script includes the whole pipeline for the dataset it was trained on and all the hyperparameters for the model. Focusing on the analysis of the best model, the AutoML returned the following best ML pipeline: ExtraTreesClassifier(input_matrix, bootstrap=False, criterion=entropy, max_features=0.3, min_samples_leaf=20, min_samples_split=16, n_estimators=100).
The analyses of the model reveal that the best modeling algorithm to predict customer loyalty is an ExtraTrees classifier, which is an ensemble of decision trees. A decision tree [41] is a tree-like structure composed of multiple nodes. Each of the tree's internal nodes contains a decision variable and has as many branches as the number of values the decision variable can assume. Leaf nodes (also called external nodes) contain the class labels. A decision tree works by sequentially splitting the decision variable space into a set of partitions and subpartitions to form classes that are homogeneous in terms of the target variable. Practically, decision trees have roots in familiar tools, such as chi-square automatic interaction detection (CHAID). Instead of relying on a single predictor's output, the ensemble of decision trees considers the predictions of multiple models (called "weak learners") to obtain a more robust and reliable prediction. The ensemble predicts the class label through a majority voting procedure: the class that the majority of the weak learners predicts is the ensemble model's prediction. The weak learners the ExtraTrees ensemble model uses (i.e., the decision trees) are built by considering various sub-samples of the training set, thus ensuring diversity among the weak learners and controlling overfitting. To build a decision tree, the following iterative process is applied starting at the root node: (1) let "A" be the best decision attribute for the next node; (2) use "A" as the decision attribute for the current node. To select the "best" decision attribute for the split, Gini impurity or information gain can be used [42]. Regardless of the selected splitting criterion, the main idea is to choose the attribute that best separates the samples belonging to different classes.
The set of hyperparameters includes the splitting criterion, the tree's maximum depth, the minimum number of samples required to be at a leaf, and the minimum number of samples required to split an internal node. For full reference, we refer the reader to the Scikit-learn documentation [32]. Once the decision tree is built, a new sample can be classified following the path from the root of the tree to a leaf node and assigning the new sample the corresponding label.
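The two splitting criteria mentioned above can be computed by hand for a small node. The sketch below uses only the Python standard library; the class labels and the two candidate splits are hypothetical and chosen so that the results are easy to check.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def information_gain(parent, partitions):
    """Entropy of the parent minus the size-weighted entropy of its partitions."""
    total = len(parent)
    weighted = sum(len(p) / total * entropy(p) for p in partitions)
    return entropy(parent) - weighted


# Hypothetical node: 4 loyal and 4 non-loyal guests.
parent = ["loyal"] * 4 + ["not"] * 4
clean_split = [["loyal"] * 4, ["not"] * 4]             # separates the classes perfectly
noisy_split = [["loyal", "loyal", "not", "not"]] * 2   # separates nothing

print(information_gain(parent, clean_split))  # 1.0
print(information_gain(parent, noisy_split))  # 0.0
print(gini(parent))                           # 0.5
```

An attribute whose split resembles `clean_split` would be preferred at this node, since it removes all uncertainty about the class, whereas the `noisy_split` attribute leaves the node as mixed as before.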
To classify customer loyalty, the model obtained consists of 100 decision trees (i.e., estimators). The model's parameters require at least 20 samples per leaf and at least 16 samples per split. The number of features to consider when looking for the best split (max_features parameter) is 30% of the attributes. Entropy was the splitting criterion (i.e., each split maximizes the information gain). The model yielded an accuracy of 0.82 and an F1 score of 0.78. The accuracy means that the model correctly classified 82% of the samples. The F1 score (the harmonic mean of precision and recall) indicates that the precision and recall values are satisfactory. Taken together, the two metrics suggest that the model is precise and robust in predicting loyalty.
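For readers who want to see what this configuration looks like in Scikit-learn, the following sketch instantiates an `ExtraTreesClassifier` with the reported hyperparameters and evaluates it with 3-fold cross-validation. Because the survey data are not public, a synthetic dataset from `make_classification` stands in for it, so the printed scores will not match the 0.82 and 0.78 reported above; the seed values are also assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_validate

# Synthetic stand-in for the survey data, which is not public.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           random_state=0)

# Hyperparameters as reported for the best loyalty pipeline.
model = ExtraTreesClassifier(bootstrap=False, criterion="entropy",
                             max_features=0.3, min_samples_leaf=20,
                             min_samples_split=16, n_estimators=100,
                             random_state=0)

# 3-fold cross-validation, reporting the two metrics discussed above.
scores = cross_validate(model, X, y, cv=3, scoring=["accuracy", "f1"])
print("accuracy:", scores["test_accuracy"].mean())
print("F1:", scores["test_f1"].mean())
```

Reporting accuracy and F1 side by side is useful because accuracy alone can look high on imbalanced classes even when one class is predicted poorly.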

4-2-AutoML Regression Tool Results
In this section, we describe the best pipeline the TPOT produced for the problem of predicting customer satisfaction. In this case, the considered task is a regression problem. The pipeline's fitness (i.e., objective function) is defined by its complexity (i.e., number of steps) and a metric that quantifies the difference between the predicted and actual satisfaction (given that more than one criterion is optimized at a time, we say that GP is working as a multi-objective optimization framework). We applied the TPOT to the data for 10 generations (with a population size of 10) to find the best-fitted ML pipeline. We subsequently evaluated the optimal pipeline the TPOT suggested on a validation dataset, using 3-fold cross-validation. Using the AutoML tool on the data considered, the best pipeline returned is the following: ExtraTreesRegressor(LinearSVR (GradientBoostingRegressor(input_matrix, alpha=0. In this case, the pipeline consists of an ExtraTree regressor, a linear SVR, and a gradient boosting regressor (GB) [43]. The first ML model applied to the training set is the GB, which is a particular type of ensemble model. The working principle is simple yet effective: the idea is to combine multiple models' predictions so that each subsequent model, when combined with the previously defined models, minimizes the prediction error. To implement this idea, the GB builds an additive model in a forward stage-wise manner, allowing for the optimization of arbitrary differentiable loss functions [32]. In particular, in each iteration, the GB fits a regression tree on the negative gradient of the given loss function. Therefore, at each iteration, the newly created model aims to minimize the error resulting from the application of the previously created models. The construction of the regression trees follows the same procedure explained in Section 4.1 for the construction of a decision tree.
However, in this case, the sequence of splits stops when a further sub-partition is not expected to decrease the mean squared error of the target variable significantly. The leaf nodes contain the decision rules on which the target variable predictions are based. The GB model uses 100 predictors, and the specified loss (quantile) refers to quantile regression (i.e., an extension of linear regression used when the conditions of linear regression are not met), with alpha=0.85 (i.e., the model targets the 85th percentile of the target's conditional distribution). The learning rate, which shrinks each tree's contribution, is 0.01 (this value interacts with the choice of using 100 estimators), the maximum depth of each estimator (i.e., regression tree) is two, and the fraction of samples used to fit each regression tree is 0.8. For a complete description of the parameters, we refer the reader to the Scikit-learn documentation [32].
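The role of alpha in the quantile loss can be illustrated with a short standard-library sketch. The function below implements the usual quantile (pinball) loss; the satisfaction ratings and the two sets of predictions are hypothetical.

```python
def pinball_loss(y_true, y_pred, alpha=0.85):
    """Mean quantile (pinball) loss for quantile level alpha."""
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        error = actual - predicted
        # Under-prediction costs alpha per unit; over-prediction costs (1 - alpha).
        total += alpha * error if error >= 0 else (alpha - 1) * error
    return total / len(y_true)


ratings = [3.0, 4.0, 5.0, 4.0]          # hypothetical satisfaction scores
too_low = [r - 1.0 for r in ratings]     # under-predicts every rating by 1
too_high = [r + 1.0 for r in ratings]    # over-predicts every rating by 1

print(pinball_loss(ratings, too_low))    # about 0.85
print(pinball_loss(ratings, too_high))   # about 0.15
```

With alpha=0.85, under-predicting is penalized far more heavily than over-predicting by the same amount, which is why a model trained on this loss tends toward the upper part of the target's distribution.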
The subsequent step in the pipeline is to provide the GB's output to the linear SVR, a linear support vector regression [44]. In general, an SVR maps the data into a k-dimensional feature space through a nonlinear mapping. The idea is to allow a linear regression model to be fitted to the data points in the k-dimensional feature space. The obtained linear model is then used to make predictions in the new feature space. The kernel function defines the mapping from the input space into the new feature space [45]; in the linear SVR selected here, the kernel is linear, so the model is fitted directly in the input space. One feature of SVR is related to the model errors: instead of minimizing the observed training error, SVR minimizes a combination of the training error and a regularization term, which is meant to improve the model's generalization ability [46]. The pipeline considers an SVR with the following parameters: the regularization term (C) is equal to 20, the loss is the L2 loss (indicated as 'squared_epsilon_insensitive') [47] with epsilon=0.001, and the tolerance (tol) for the stopping criteria is 0.01.
As a final step in the pipeline, an ExtraTree regressor is used to produce the final prediction. The ExtraTree takes as input the outputs of the SVR model and, as discussed in Section 4.1, builds an ensemble of randomized trees. Unlike in Section 4.1, where the ensemble outputs the predicted class label, here the ensemble predicts a real value that corresponds to the predicted customer satisfaction. The ExtraTree regressor contains 100 estimators and requires at least eight samples per leaf and at least 14 samples per split. The model yielded an R² of 0.50, meaning that it explains only 50% of the variance in customer satisfaction. Table 2 lists the values of the metrics referring to the best model. In this case, the model performed poorly due to the lack of predictive attributes in the considered dataset. In other words, the collected data are not sufficient to allow the model to produce an accurate prediction of customer satisfaction. This is evident when we consider the R² and the explained variance of the model.
As a final experiment, to show that the AutoML tool can help predict revenue from loyal customers, we created a new variable in the dataset (estimated revenue). We took the median hotel price according to season, room category, and length of stay. As we did to predict customer satisfaction, we ran the TPOT for 10 generations (with a population size of 10) to find the best-fitted ML pipeline. We subsequently evaluated the optimal pipeline using 3-fold cross-validation. Using the AutoML tool on the considered dataset, the best pipeline that was returned is the following: ExtraTreesRegressor(PCA(input_matrix, iterated_power=2, svd_solver=randomized), bootstrap=True, max_features=0.85, min_samples_leaf=6, min_samples_split=15, n_estimators=100). The pipeline performs an initial principal component analysis [48] to reduce the dataset's dimensionality and remove the less predictive attributes.
In particular, PCA relies on the singular value decomposition of the data to project it to a lower-dimensional space. After the execution of PCA, the resulting dataset was provided as input to an ExtraTree regressor to produce the final prediction. The ExtraTree model consists of 100 estimators. In each estimator (i.e., regression tree), 85% of the attributes are considered when looking for the best split. The minimum number of samples required to split an internal node is 15, and the minimum number of samples required to be at a leaf node is six. The model yielded an R² of 0.93, meaning that it explains 93% of the variance in estimated revenue. This result is a clear indication of the obtained model's suitability in providing an accurate prediction of the revenue. This result is further strengthened by the explained variance, which is 0.938. Therefore, we have clear evidence of the model's robustness and ability to address the problem at hand.
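The revenue pipeline can be written in Scikit-learn along the following lines. This is a sketch on synthetic data from `make_regression`, since the revenue data are confidential, so the printed metrics will differ from the 0.93 and 0.938 reported above; the seeds and dataset shape are assumptions. Note that, as written in the exported pipeline, PCA is given no `n_components` argument, in which case Scikit-learn keeps all components.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import explained_variance_score, r2_score
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the confidential revenue data.
X, y = make_regression(n_samples=1000, n_features=10, n_informative=6,
                       noise=10.0, random_state=0)

# Hyperparameters as reported for the best revenue pipeline.
pipeline = make_pipeline(
    PCA(iterated_power=2, svd_solver="randomized", random_state=0),
    ExtraTreesRegressor(bootstrap=True, max_features=0.85, min_samples_leaf=6,
                        min_samples_split=15, n_estimators=100, random_state=0),
)

# Out-of-fold predictions from 3-fold cross-validation.
predicted = cross_val_predict(pipeline, X, y, cv=3)
print("R^2:", r2_score(y, predicted))
print("explained variance:", explained_variance_score(y, predicted))
```

R² and explained variance coincide when the prediction errors average to zero; a gap between the two would indicate a systematic bias in the predictions.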

5-Discussion
The use of AutoML in theory testing has increased significantly across various scientific fields, including financial economics [49], educational research [50], clinical research [51], and neuroscience [39]. However, the application of AutoML in business research is still in its infancy [4,52]. Furthermore, only a few of the newly developed methods have been implemented in consumer research [52,53]. Notably, recent technological advancements in automation and ML are essential for revolutionizing traditional consumer research and development in industry and academia, allowing researchers to generate ML models and solve complex scenarios [53]. We examine how loyalty and satisfaction can be predicted using AI tools. Overall, our findings extend prior studies on consumer loyalty in hospitality [54,55] by showing how loyalty can be predicted. Therefore, the current study contributes to emerging studies on AI's impact on data prediction [56,57]. However, unlike prior studies, our article provides a tool that can benefit academic research in solving complex problems. Furthermore, we contribute to recent hospitality research on the use of AI [18]. Prior hospitality research has mainly focused on the factors that predict hotel booking cancelations [58] or other factors, such as occupancy rates [59]. By introducing the AutoML tool, we extend this line of work to the prediction of loyalty, satisfaction, and revenue. AI tools have proven abilities to tackle complex data empirically [60], provide accurate estimations, and solve complex phenomena [4]. One of the difficulties in using AI is that even a simple task is extremely difficult to program [8,60], as it requires training and expertise in ML. Therefore, AutoML offers the hospitality field (in academia and industry) a simple method that enables researchers to take advantage of ML without being ML experts.
The proposed tool implements AutoML in one easy-to-use package. It is user friendly, widely accessible, practical, and tailored to provide more accurate estimations. We expect hospitality and behavioral researchers to start relying on novel applications and advanced tools in their future research. Ultimately, we believe that AutoML tools may increase the scope of hospitality research, not just by providing powerful data analysis but also by addressing complex research questions. AutoML can abstract all the steps after data ingestion. Essentially, users bring their data, identify the labels, and "push a button" to generate a thoroughly trained and optimized model ready to make an accurate prediction. When dealing with AutoML, business analysts can remain focused on the business problem instead of dwelling on the ML process and workflow. AutoML handles the "behind the scenes" steps involved in preparing the data, choosing the right algorithms, and optimizing and tuning the hyperparameters. For each ML phase, AutoML identifies a large set of (algorithm, hyperparameter configuration) tuples. It then searches the space of all possible combinations of these tuples for the best one. Technologically advanced methods, such as GP, allow AutoML to navigate this huge space intelligently instead of considering every possible tuple. The outcome is the pipeline configuration that returns the best results. Put simply, AutoML automates the process that data scientists perform manually when they have to generate a predictive model.
In sum, the study presents practical ways to analyze data using AutoML, based on simple steps and supported by a practical research tool. We believe that behavioral and social science researchers can conduct robust and powerful analyses without the need for programming, and we encourage them to use AutoML for their benefit. Relying on AutoML would not only supplement the data analysis but also help synthesize the results and provide more accurate outcomes. This tutorial can thus serve as a starting point for behavioral researchers who aim to benefit from the use of ML. Table 3 highlights the AutoML tool's key advantages and disadvantages. The TPOT uses genetic programming to explore the search space, which relieves the system from an exhaustive analysis of all the possible combinations.
Only for numerical data and restricted to quantitative research
No need to program to create a specific algorithm or coding
It is only designed for data analysis purposes
Relies on specialized libraries
Analyzing larger generations will take a longer time
The TPOT can be employed to execute fast models and potentially increase the productivity of analyses without hampering prediction performance
It only performs regressions and builds classification models

6-Conclusion
This study contributes to the field by presenting an AutoML tool for hospitality. Given AI's prevalence in today's business environment, AutoML's main practical value is its ability to predict consumers' responses (e.g., satisfaction, loyalty) and forecast revenue from loyal customers. Indeed, the application of AutoML may enhance firms' future performance. One key issue that managers face in most industries is their insufficient understanding of AI and how to implement it in their daily practice [61,62]. Managers also struggle with the difficulty of analyzing large and complex datasets [61]. The tool is therefore beneficial for practitioners. For example, companies and service providers, including but not limited to hotels, restaurants, and retailers, can use this tool to test and predict their data (e.g., surveys, observations).
However, this study has some limitations that might inspire further research. The use of AI in businesses has been a challenge for many practitioners [63]. Many firms may not have the capacity or resources to build or implement AI models, and in most cases, they rely on data scientists or consulting firms to perform complex data analysis [63]. AutoML facilitates the application of AI in the area of hospitality by offering a free, open, and easy-to-use tool. Despite its importance, many hospitality providers might not be able to use this tool. Future research can be conducted on the implementation of AutoML tools in the hospitality domain.
The key result of this study is the 93% predictive accuracy for estimated revenue data, which suggests that by using the AutoML tool, practitioners can determine in advance when their businesses are likely to perform well and when they are not. This knowledge can help hospitality and business managers create effective seasonal offers to boost occupancy in the periods in which they expect lower demand. More importantly, we contribute new knowledge to the domain by bridging the gap between advanced ML models and hospitality/business practice. By doing so, we contribute to recent research in the field (cite), providing practitioners with more technological development. Importantly, none of these prior studies in hospitality employed an AutoML tool to predict revenue. Moreover, the high accuracy obtained with this application represents an important contribution to AutoML, confirming the TPOT's power in a domain that is completely different from the ones in which it has previously been used.

7-2-Data Availability Statement
The data are not publicly available because of a non-disclosure agreement (NDA) between the company and the authors. However, they are available for editorial consultation if needed.

7-4-Institutional Review Board Statement
The project "AutoML tool for hospitality" was approved in 2022 by the Ethics Committee and Institutional Review Board, code DSCI2022-6-213250.
Important tip: Before running the AutoML tool, you must ensure that all your data is in a numeric format (e.g., 1, 2, 3 …). ML libraries are built to work with numeric matrices. Therefore, if the dataset contains categorical variables, code them in a numeric format (e.g., female = 1, male = 2). The file must be uploaded as a CSV file to ensure the code works correctly. To view your data, double-click on File sample data, and the dependent variables' values will appear on the right side (see Figure A-3).
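As an illustration of the numeric coding step, the sketch below recodes a categorical column in a small CSV extract using only the Python standard library. The column names, the sample rows, and the helper `encode_rows` are hypothetical and are not part of the tool; only the coding scheme (female = 1, male = 2) follows the example given in the tip.

```python
import csv
import io

# Hypothetical survey extract; column names and values are illustrative.
RAW_CSV = """gender,satisfaction,DV
female,5,1
male,3,0
female,4,1
"""

# Coding scheme taken from the example in the tip.
GENDER_CODES = {"female": 1, "male": 2}


def encode_rows(text, column, codes):
    """Replace category labels in `column` with numeric codes and
    convert every value in each row to a float."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        row[column] = codes[row[column].strip().lower()]
        rows.append({name: float(value) for name, value in row.items()})
    return rows


rows = encode_rows(RAW_CSV, "gender", GENDER_CODES)
for row in rows:
    print(row)
```

A label that is missing from the coding scheme raises a `KeyError`, which makes uncoded categories visible before the file is uploaded.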

Figure A-3. Data display
Once you upload your dataset, scroll down to the next segment of the code. At this point, you will see DATASET_NAME and TARGET VARIABLE (see Figure A-4). Choose the name of your data and your target variable. The target variable is the variable that you want to predict. It can be any variable (e.g., satisfaction, purchasing intentions). In this demonstration, the target variable is "overall evaluation." The name of the target variable (e.g., "DV") should match the name of the chosen variable in your dataset (e.g., the column labeled "DV"). The DATASET_NAME variable should simply reflect your custom dataset's name.
As a rule of thumb, the most important parameters are generations, population, and cross-validation split. The population size is the number of individuals to retain in the GP population in every generation. In this tutorial, we set N to 20. In general, the TPOT works better when the sample size is bigger [31,54]. The number of generations defines the number of iterations of the pipeline optimization process.
The TPOT takes longer with more data. By default, the TPOT settings include 100 generations and a population size of 100, producing 10,000 model configurations to evaluate. With 10-fold cross-validation, this means that roughly 100,000 models are fitted and evaluated on the training data, the equivalent of one very large grid search [55]. Cross-validation affects how long the TPOT takes to evaluate models, as each model is trained k times, where k equals the number of cross-validation splits [56]. To initialize the TPOT library, click on the "play button" (see Figure A-6).
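The arithmetic above can be reproduced with a short calculation, which also helps to estimate the cost of smaller settings before starting a run. The function name `count_model_fits` is hypothetical, and the second call uses assumed values (5 generations and 5-fold cross-validation) together with the population size of 20 used in this tutorial.

```python
def count_model_fits(generations, population_size, cv_folds):
    """Return (configurations evaluated, total model fits).

    Each configuration is trained once per cross-validation split,
    so the number of fits is configurations * cv_folds.
    """
    configurations = generations * population_size
    return configurations, configurations * cv_folds


# Settings described in the text: 100 generations, population of 100, 10 folds.
default_configs, default_fits = count_model_fits(100, 100, 10)
print(default_configs, default_fits)  # 10000 100000

# Assumed smaller run: 5 generations, population of 20, 5 folds.
small_configs, small_fits = count_model_fits(5, 20, 5)
print(small_configs, small_fits)  # 100 500
```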

Figure A-6. Select the number of generations
Tune the tool: Once you have defined your parameters, including the dataset, the target variable, and the number of generations, you can execute the whole notebook by either selecting "Run all" from the Runtime menu or pressing Ctrl+F9 (Figure A-7). The notebook will run through all its steps, as illustrated below, and print the outputs at the end.
Import packages:
The purpose of this test set is to later assess the model on data that has not been seen beforehand.
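The role of the held-out test set can be illustrated with a minimal sketch of a random split. The helper `train_test_split_rows`, the 25% test fraction, and the seed are assumptions made for illustration; the notebook may use a different split and a library routine instead.

```python
import random


def train_test_split_rows(rows, test_fraction=0.25, seed=42):
    """Shuffle the rows reproducibly and hold out a fraction for testing.

    The held-out rows are never shown to the model during the search,
    so they can later be used to assess it on unseen data.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


observations = list(range(20))  # stand-in for 20 survey responses
train_rows, test_rows = train_test_split_rows(observations)
print(len(train_rows), len(test_rows))  # 15 5
```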
To do so, press play, as Figure A-9 shows. Please note that this step will take a few minutes while the tool searches for models. The TPOT will run several algorithms, searching for the best pipeline model in terms of accuracy.
To view the results, scroll down to (Results of the best model). Figure A-11 reports the model accuracy (0.79) and the F1 score (0.7700). The accuracy is calculated as the number of correctly classified observations divided by the total number of observations (i.e., correctly and incorrectly classified). The F1 score is the harmonic mean of the precision and recall. Therefore, to understand the F1 score fully, it is necessary to define the concepts of precision and recall. Precision is defined as the ratio tp / (tp + fp), where tp is the number of true positives and fp the number of false positives. Recall is defined as the ratio tp / (tp + fn), where tp is the number of true positives and fn the number of false negatives. With these definitions, the F1 score is calculated as F1 = 2 * (precision * recall) / (precision + recall). The F1 score reaches its best value at 1 and its worst value at 0.
When you perform regression, the matrix output will be regression oriented (see Figure A-14).
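The classification metrics defined above can be sketched in a few lines of Python. The confusion-matrix counts below are hypothetical and chosen only for illustration; they are not the results reported in Figure A-11, and the function name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from binary counts.

    tp/fp/fn/tn are the numbers of true positives, false positives,
    false negatives, and true negatives. Ratios with a zero
    denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * (precision * recall) / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for 100 test observations.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

When precision and recall are equal, as in this example, the F1 score equals both of them; when they differ, the harmonic mean lies closer to the smaller of the two.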