Probabilistic Analysis Depending on the Distance from a COVID-19 Outbreak

COVID-19 has been affecting human beings since the end of 2019. Studying the characteristics of a COVID-19 outbreak is significant because it will add to the knowledge that is necessary for protecting the general public and controlling future viral outbreaks. The aims of the present research are to analyze COVID-19 outbreaks in Thailand depending on the distance from the outbreak center by using a differential equation, to construct a probability density function from the solution of the differential equation, and to prove the theorem for the probability density function depending on the distance from the outbreak. The least-squares-error method is adopted to estimate the parameters of the function describing the COVID-19 outbreak. Moreover, a cumulative distribution function, a quantile function, a sojourn function, a hazard function, the median, the expected value, variance, skewness

establish models for forecasting the total numbers of cases and deaths during the COVID-19 pandemic [7,8]. The ordinary least-squares estimator method with multiple regression analysis has been adopted to measure the impact of travel history on confirmed COVID-19 cases [9]. Logistic regression has been used to approximate the recovery probability of COVID-19 patients with respect to their demographic characteristics [10]. Forecasting models based on time series, such as the (seasonal) autoregressive moving average model (with an exogenous regressor), Prophet models developed by using the time series concept, and machine learning models, have been employed in many regions and countries to study the trends of COVID-19 outbreaks, such as the numbers of confirmed cases, recoveries, and deaths, toward establishing epidemiological control of the disease.
Decomposition in time series analysis of the confirmed numbers of cases and deaths in a COVID-19 outbreak has been used to identify seasonal trends and establish the mortality ratio and reproduction number of the disease [11][12][13]. Moreover, big data techniques, artificial intelligence procedures, data science, and machine learning algorithms have been applied to analyze COVID-19 outbreak data. For example, big data from the World Health Organization database, national databases, and social media communication databases have been utilized to forecast COVID-19 outbreaks toward establishing decision support systems for monitoring and controlling them [14,15]. The predicted numbers of daily cases and deaths in various countries have been fitted with COVID-19 incidence data by using artificial neural networks (ANNs) [16]. The Gompertz model, the logistic model, the von Bertalanffy growth function, and the inverse ANN (IANN) have been used to forecast the numbers of cases, deaths, and recoveries during COVID-19 outbreaks, and their predictive accuracy has been compared against the actual numbers [17]. The results show that the logistic model, Gompertz model, and IANN are the most suitable for forecasting the numbers of cases, deaths, and recoveries, respectively. Mathematical models for describing the spreading characteristics of a COVID-19 outbreak have been applied to several aspects to help decision-makers implement relevant policies to control it. The stability of a model and the estimation of its parameters, such as the population size, birth rate, infection rate, and recovery rate, are important for forecasting the number of cases. These values provide fundamental information for decision-makers to instigate policies for monitoring, controlling, and protecting the majority of the population during a COVID-19 outbreak. The parameters of a mathematical model, such as quarantine rate, quarantine effectiveness, vaccination rate, vaccine efficacy, and rate of immunity loss, have been investigated to monitor the impact of quarantine and vaccination on the control of COVID-19 [18]. A transmission model comprising susceptible (S), exposed (E), infectious (I), quarantined (Q), confirmed (C), and recovered (R) compartments and the concentration of the virus in the environmental reservoir (W) (SEIQCRW), along with transmission routes from the environment to humans and from human to human, has been applied to the COVID-19 outbreak to identify suitable prevention measures [19]. A SIR transmission model has been utilized for decision-making on lockdowns during the COVID-19 pandemic [20]. Asymptotic stability analysis of disease-free and endemic equilibria using the SEIR transmission model has been applied to establish the influences of asymptomatic cases and reinfection [21]. SIR transmission explained through a logistic model has been adopted to describe and interpret a COVID-19 outbreak [22]. The reproduction number from the SEIR transmission model has been utilized to analyze short-term COVID-19 incidence data [23]. A SEIQRD transmission model (D is death) has been employed to investigate the transmission dynamics of the COVID-19 outbreak. The reproduction number was calculated using a next-generation matrix approach [24]. SEIRD transmission models have been developed and used as a predictive model for the spread and control of COVID-19 [25] and to analyze healthcare demand and capacity for predicting and forecasting the impact of local COVID-19 outbreaks [26]. A SIR transmission model has been used to compute the effects of social distancing interventions during the COVID-19 pandemic [27]. Moreover, a SIRD compartmental model and a variational autoencoder neural network have been used to forecast the future course of the COVID-19 pandemic from the available historical big data on the subject [28].
As the above literature review shows, the models for describing the COVID-19 outbreak use time as the independent variable. For example, the forecasting models with time as the independent variable include the logistic and Gompertz models [1][2][3], the Bertalanffy model [4], the Boltzmann growth curve [5,6], and regression analysis [7,8]. The mathematical models formulated as systems of differential equations with time as the independent variable include the SEIQCRW model [19], the SIR model [20], the SEIR model [21], and the SEIQRD model [24]. Most of these previous models do not involve distance or probabilistic analysis. Therefore, this research was undertaken to develop a model for describing the spread of a COVID-19 outbreak in which the computation is based on probabilistic analysis by considering the distance from the center point of the outbreak. The application of a statistical control chart for monitoring and detecting the COVID-19 outbreak was also investigated. In addition, this research is applied to a case study of the COVID-19 outbreak at the Central Shrimp Market in Samut Sakhon Province, Thailand, which was the center of the second wave of COVID-19 in the country.

2-Materials and Methods
Here, background on the spread of COVID-19 in Thailand and data collection for the probabilistic analysis of the effect of distance on the spread of COVID-19 are presented. In addition, the section contains detailed explanations of how the expected value, variance, and probabilistic analysis are derived.

2-1-Background on the COVID-19 Outbreak Depending on Distance in Thailand
The overall spread of COVID-19 has often been increased by outbreaks starting at a populous initial point of contact, such as a nightclub or market. The intensity of the viral spread has varied depending on the distance from the initial point of contact. For example, the COVID-19 outbreak at the Central Shrimp Market in Samut Sakhon province on December 23, 2020, spread as illustrated in Figure 1. At the Central Shrimp Market in Samut Sakhon province, a 67-year-old woman was identified as the first case and hence the point of origin of the COVID-19 second wave, which began on December 17, 2020. The virus then spread quickly, albeit non-uniformly, to the surrounding area. The number of positive cases as a percentage of the population tested depended on the distance from the initial point of contact. Soon after the outbreak had been identified, testing was carried out in the surrounding area: 2,051 samples were taken within 2 km of the initial point of contact, of which 914 (44%) were positive, and 2,272 samples were taken within 4 km, of which 271 (11.93%) were positive. Nine hundred and ninety samples were taken in areas that were more than 4 km from the initial point of contact, with 20 positive results (2.02%). It is clear that the distance from the initial point of contact is related to the total number of COVID-19 cases identified in this outbreak. The present research is based on that relationship; it is designed to study the probability of a COVID-19 outbreak depending on the distance from the outbreak center. Differential equations and probabilistic analysis are adopted to analyze the data for the COVID-19 second-wave outbreak in Thailand. The spread of the COVID-19 outbreak in Samut Sakhon province can be estimated by using the concept of the derivatives of the total number of COVID-19 cases [29]. It can be seen in Figure 2 that the peak of the outbreak was around January 24, 2021.

2-2-Data Collection
Data and samples for this research consist of both real and simulated data. The real data comprise the total number of COVID-19 cases linked to the Central Shrimp Market in Samut Sakhon, Thailand, while the simulated data are the results of mathematical and statistical calculations. Data regarding the total number of COVID-19 cases for this research were collected at three outbreak stages:
• Before the peak outbreak stage, at around one month after the first outbreak (around January 3, 2021)
• The peak outbreak stage (around January 24, 2021)
• After the peak outbreak stage (around February 12, 2021), which is the day with the maximum number of daily new cases and the decrease in this number thereafter.
The total number of COVID-19 cases and the distances (unit: km) between two positions in Thailand are used as the observations for this research. The second wave of the COVID-19 virus began with an outbreak that spread around the Central Shrimp Market in Samut Sakhon province, Thailand, where a super-spreading event began on December 17, 2020. The total number of COVID-19 cases was gathered from Worldometer [30], where COVID-19 data such as the total number of COVID-19 cases, new cases, and total deaths are updated daily. The distances from Samut Sakhon province to the other provinces were gathered from the DistanceFromTo website [31], as shown in Figure 3. For example, the distance from Samut Sakhon to Bangkok is approximately 33.69 km, the distance from Samut Sakhon to Nakhon Pathom is 38.62 km, and so on. The total number of COVID-19 cases was gathered from December 18, 2020, to April 1, 2021. Moreover, the indexes for the distances were ranked from the shortest to the farthest with the starting point at Samut Sakhon (the center of the outbreak). For example, index 1 refers to Samut Sakhon, index 2 refers to Samut Songkhram, index 3 refers to Bangkok, and so on.

2-3-The Forecasting Model for the COVID-19 Outbreak Depending on Distance
The assumption in this research is that the distance from the center of the COVID-19 outbreak to other points is related to the total number of COVID-19 cases. Namely, the total number of COVID-19 cases varies with the distance from the outbreak center. The key question to answer is: How will the COVID-19 infection rate change according to the distance from the outbreak center? Let $p(s)$ be the total number of COVID-19 cases at distance $s$. The differential equation is based on the assumption that the rate of change in the total number of COVID-19 cases with respect to the distance from the initial contact point is proportional to the total number of cases; i.e.,

$$\frac{dp}{ds} = kp \qquad (1)$$

where $k$ is the growth rate of $p$ with respect to distance $s$.
When the method of separation of variables is applied to Equation 1, the solution is

$$p(s) = C\exp(ks) \qquad (2)$$

where $C$ is a constant.
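The separation-of-variables step can be written out as follows. This is a short worked derivation under the assumption that $p(s) > 0$ on the interval of interest; the integration constant $c$ and the identification $C = e^{c} = p(0)$ are introduced here for illustration and agree with the form $p = C\exp(ks)$ used later in the delta method.

```latex
\begin{align*}
\frac{dp}{ds} = kp
  \;&\Longrightarrow\; \frac{dp}{p} = k\,ds \\
  \;&\Longrightarrow\; \int \frac{dp}{p} = \int k\,ds \\
  \;&\Longrightarrow\; \ln p = ks + c \\
  \;&\Longrightarrow\; p(s) = e^{c}\exp(ks) = C\exp(ks), \qquad C = p(0).
\end{align*}
```

With $k < 0$ the number of cases decays with distance from the outbreak center, and with $k > 0$ it grows.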

2-4-Probability Analysis of the COVID-19 Outbreak Depending on the Distance from the Initial Contact Point
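The probability density function (PDF) for the distance from the initial contact point $s$, which is a random variable, corresponds to the property of probability

$$\int_{0}^{m} f(s)\,ds = 1.$$

Therefore, the PDF for the assumption that the total number of COVID-19 cases will be the same at distance $s$ is

$$f(s) = \frac{k\exp(ks)}{\exp(km) - 1}, \qquad 0 \le s \le m,\; k \ne 0.$$

The area to the left of $s_0$, where $0 \le s_0 \le m$, can be derived as

$$\Pr(0 \le S \le s_0) = \int_{0}^{s_0} f(s)\,ds = \frac{\exp(ks_0) - 1}{\exp(km) - 1}.$$

Likewise, the area to the right of $s_0$, where $0 \le s_0 \le m$, can be derived as

$$\Pr(s_0 \le S \le m) = \int_{s_0}^{m} f(s)\,ds = \frac{\exp(km) - \exp(ks_0)}{\exp(km) - 1}.$$

The area between distances $a$ and $b$, where $(a, b) \subseteq (0, m)$ and $m = \max(s)$, can be derived as

$$\Pr(a \le S \le b) = \int_{a}^{b} f(s)\,ds = \frac{\exp(kb) - \exp(ka)}{\exp(km) - 1}.$$

The conditional probability that the total number of COVID-19 cases will be the same at a distance further away than $b$ km when already knowing that the distance is more than $a$ km ($b > a$) can be derived as

$$\Pr(S > b \mid S > a) = \frac{\Pr(S > b)}{\Pr(S > a)} = \frac{\exp(km) - \exp(kb)}{\exp(km) - \exp(ka)}.$$

Next, the cumulative distribution function (CDF), $F(s)$, for the total number of COVID-19 cases at a certain distance is derived as follows.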
Let $S$ be the random variable for the distance from the COVID-19 outbreak. Its relationship is as follows:

$$F(s) = \Pr(S \le s) = \int_{0}^{s} f(u)\,du.$$

Therefore, the CDF for the total number of COVID-19 cases according to distance is

$$F(s) = \frac{\exp(ks) - 1}{\exp(km) - 1}, \qquad 0 \le s \le m.$$

Next, the quantile function (QF or $Q_p$) and the median distance for the total number of COVID-19 cases according to distance are derived as follows.
The QF: $Q_p = F^{-1}(p) = \min\{s \in \mathbb{R} : F(s) \ge p\}$, $p \in (0, 1)$, which gives

$$Q_p = \frac{1}{k}\ln\left[p\left(\exp(km) - 1\right) + 1\right].$$

For example, the median distance is the value of $s$ for which $F(s) = 0.5$, and the median is

$$Q_{0.5} = \frac{1}{k}\ln\left[0.5\left(\exp(km) - 1\right) + 1\right].$$
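The distance probabilities, CDF, and quantile function above can be sketched numerically as follows. This is a minimal illustration, assuming the truncated-exponential form $f(s) = k\exp(ks)/(\exp(km) - 1)$ on $[0, m]$; the function names and the maximum distance `m = 1000.0` km are hypothetical choices, while `k = -0.0825` and the distances 33.69 km and 35.35 km are the values quoted in this paper for the before peak stage, Bangkok, and Samut Prakan.

```python
import math

def pdf(s, k, m):
    """Assumed density f(s) = k*exp(k*s) / (exp(k*m) - 1) on [0, m], k != 0."""
    if k == 0:
        raise ValueError("the PDF is undefined for k = 0")
    if not 0.0 <= s <= m:
        return 0.0
    return k * math.exp(k * s) / math.expm1(k * m)

def cdf(s, k, m):
    """F(s) = (exp(k*s) - 1) / (exp(k*m) - 1), clipped to [0, 1] outside [0, m]."""
    if s <= 0.0:
        return 0.0
    if s >= m:
        return 1.0
    return math.expm1(k * s) / math.expm1(k * m)

def prob_between(a, b, k, m):
    """P(a <= S <= b): area under the PDF between distances a and b."""
    return cdf(b, k, m) - cdf(a, k, m)

def prob_beyond_given(a, b, k, m):
    """P(S > b | S > a) for b > a."""
    return (1.0 - cdf(b, k, m)) / (1.0 - cdf(a, k, m))

def quantile(p, k, m):
    """Q_p = (1/k) * ln(p*(exp(k*m) - 1) + 1) for p in (0, 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    return math.log1p(p * math.expm1(k * m)) / k

k, m = -0.0825, 1000.0            # m is a hypothetical maximum distance (km)
p_to_bangkok = prob_between(0.0, 33.69, k, m)    # about 0.938
p_cond = prob_beyond_given(33.69, 35.35, k, m)   # about 0.872
median_distance = quantile(0.5, k, m)
```

Under these assumptions the two probabilities agree with the before peak values reported in the Results (0.938 and 0.872), which suggests the sketch is consistent with the paper's model, although it is not the authors' code.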
The first moment (the expected value or the mean of the population):

$$E(S) = \int_{0}^{m} s\,f(s)\,ds.$$

The second moment:

$$E(S^2) = \int_{0}^{m} s^2 f(s)\,ds,$$

so that the variance is $Var(S) = E(S^2) - [E(S)]^2$. Skewness and kurtosis are evaluated and interpreted to describe the shape of the PDF. Skewness measures asymmetry: if the skewness is approximately zero, then the PDF is symmetrical, whereas if it is negative or positive, then it is not. Kurtosis is used to measure the peakedness of the PDF: if kurt(s) is approximately zero, negative, or positive, then the PDF is a mesokurtic, platykurtic, or leptokurtic (heavy-tailed) function, respectively. Next, the sojourn function (SF), denoted as $j(s)$, is the probability that COVID-19 cases still exist after distance $s$, while the hazard function (HF), denoted as $h(s)$, describes the probability that cases which still exist after distance $s$ no longer exist after distance $s + \Delta s$, where $\Delta s$ is small. Thus,

$$j(s) = \Pr(S > s).$$
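The moments and the sojourn function can be evaluated numerically as sketched below. The sketch assumes the same truncated-exponential PDF $f(s) = k\exp(ks)/(\exp(km) - 1)$ on $[0, m]$ and uses a composite Simpson rule rather than closed-form expressions; the helper names, the number of sub-intervals, and `m = 1000.0` km are hypothetical, and `k = -0.0825` is the before peak estimate quoted in this paper.

```python
import math

def pdf(s, k, m):
    """Assumed density f(s) = k*exp(k*s) / (exp(k*m) - 1) on [0, m]."""
    return k * math.exp(k * s) / math.expm1(k * m)

def simpson(g, lo, hi, n=20000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3.0

def raw_moment(r, k, m):
    """r-th raw moment E(S^r) = integral of s^r * f(s) over [0, m]."""
    return simpson(lambda s: s ** r * pdf(s, k, m), 0.0, m)

def sojourn(s, k, m):
    """j(s) = Pr(S > s) = (exp(k*m) - exp(k*s)) / (exp(k*m) - 1)."""
    return (math.exp(k * m) - math.exp(k * s)) / math.expm1(k * m)

k, m = -0.0825, 1000.0                 # m is a hypothetical maximum distance (km)
mean = raw_moment(1, k, m)             # expected distance E(S)
variance = raw_moment(2, k, m) - mean ** 2
j_bangkok = sojourn(33.69, k, m)       # probability of cases beyond 33.69 km
```

When $m$ is large relative to $1/|k|$ and $k < 0$, the mean approaches $1/|k|$, which is close to the expected distances reported in Table 2.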

2-5-Analysis of the PDF
The PDF of the total number of COVID-19 cases, which relies on the growth rate (k) and the distance (m), can be categorized into three cases.

Theorem 1:
The PDF, denoted by $f_{k,m}(s)$, for the distance of the total number of COVID-19 cases relies on $k$ and $m$.
Case 3: if $k = 0$, then for all $s_1 \le s_2$ in $(0, m)$, the PDF for the distance of the total number of COVID-19 cases does not exist.
Therefore, if $s_1 \le s_2$ in $(0, m)$, then the PDF of the total number of COVID-19 cases that remain at distance $s_1$ is greater than that at distance $s_2$.

Proof.
Let r be a positive real number.For all  1 ,  2 ∈ () such that  1 ≤  2 , then , then case 3 holds.

2-6-Estimation of the Forecasting Model Parameters and Model Comparison
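Parameter estimation is performed by using least-squares-error estimation. The error between the actual value and forecasted value is used to define the sum-of-squares error (SSE) function as follows:

$$SSE(C, k) = \sum_{s=1}^{m} \left(p_a(s) - p_f(s)\right)^2$$

where $p_a$ and $p_f$ are the actual and forecasted values of the total number of COVID-19 cases, respectively.
To minimize the SSE, the critical point of the SSE function is determined by setting the first partial derivatives with respect to parameters $C$ and $k$ to zero. Therefore, we can respectively define

$$\frac{\partial SSE}{\partial C}(C^*, k^*) = 0 \quad \text{and} \quad \frac{\partial SSE}{\partial k}(C^*, k^*) = 0.$$

Let $D(C^*, k^*)$ be the determinant of the second partial derivatives of the SSE function such that

$$D(C^*, k^*) = SSE_{CC}(C^*, k^*)\,SSE_{kk}(C^*, k^*) - \left[SSE_{Ck}(C^*, k^*)\right]^2.$$

Optimal values $C^*, k^*$ will occur at $D(C^*, k^*) > 0$ and $SSE_{CC}(C^*, k^*) > 0$, which correspond to the minimum of the SSE function at the critical point. In practice, the parameter estimates are computed by using the Levenberg-Marquardt algorithm [32,33].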
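The least-squares fit of $p(s) = C\exp(ks)$ can be illustrated with the sketch below. It is not the Levenberg-Marquardt algorithm cited above: for brevity it uses a plain Gauss-Newton iteration (Levenberg-Marquardt without the damping term), started from a log-linear regression, and it assumes that all case counts are positive. The function names and the synthetic data (generated from hypothetical values $C = 1860$ and $k = -0.0825$ at distances quoted in this paper) are for illustration only.

```python
import math

def sse(C, k, distances, cases):
    """Sum-of-squares error between actual cases and C*exp(k*s)."""
    return sum((pa - C * math.exp(k * s)) ** 2 for s, pa in zip(distances, cases))

def fit_exponential(distances, cases, iterations=20):
    """Estimate (C, k) for p(s) = C*exp(k*s) by Gauss-Newton least squares."""
    n = len(distances)
    # Starting values from ordinary least squares on ln p = ln C + k*s.
    logs = [math.log(pa) for pa in cases]
    s_bar = sum(distances) / n
    l_bar = sum(logs) / n
    sxx = sum((s - s_bar) ** 2 for s in distances)
    sxy = sum((s - s_bar) * (l - l_bar) for s, l in zip(distances, logs))
    k = sxy / sxx
    C = math.exp(l_bar - k * s_bar)
    for _ in range(iterations):
        # Accumulate the normal equations (J^T J) delta = J^T r.
        a11 = a12 = a22 = g1 = g2 = 0.0
        for s, pa in zip(distances, cases):
            e = math.exp(k * s)
            r = pa - C * e              # residual
            jc, jk = e, C * s * e       # d p_f / dC and d p_f / dk
            a11 += jc * jc
            a12 += jc * jk
            a22 += jk * jk
            g1 += jc * r
            g2 += jk * r
        det = a11 * a22 - a12 * a12
        if det == 0.0:
            break
        C += (g1 * a22 - a12 * g2) / det
        k += (a11 * g2 - a12 * g1) / det
    return C, k

# Synthetic, noise-free example data (hypothetical parameter values).
distances = [0.0, 33.69, 35.35, 38.62, 87.46, 117.52]
cases = [1860.0 * math.exp(-0.0825 * s) for s in distances]
C_hat, k_hat = fit_exponential(distances, cases)
```

On noisy real data an undamped Gauss-Newton step can overshoot, which is one reason a damped method such as Levenberg-Marquardt is usually preferred.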
For measuring the accuracy of the parameter estimations, we use the root-mean-squared error (RMSPE), which is a measurement of the error, and the coefficient of determination ($R^2$), which is a measurement of how much independent variable $s$ can explain dependent variable $p$. The RMSPE is computed from the errors between $p_a(s)$ and $p_f(s)$, where $p_a(s)$ and $p_f(s)$ are the actual and forecasted values of the total number of COVID-19 cases at any distance $s$, respectively, and

$$R^2 = \frac{\text{Explained variance}}{\text{Total variance}} \qquad (14)$$

where the total variance is the sum of the squares of the differences between the actual values and the mean of the actual values (i.e., total variance is $\sum_{s=1}^{m} \left(p_a(s) - \bar{p}_a\right)^2$) and the explained variance is the sum of the squares of the differences between the forecasted values and the mean of the actual values (i.e., explained variance is $\sum_{s=1}^{m} \left(p_f(s) - \bar{p}_a\right)^2$). For the model comparison, the best performing one has the lowest RMSPE value and the highest $R^2$ value.
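The coefficient of determination in Equation 14 can be computed as in the following sketch. It follows the explained-variance over total-variance definition given above; the function name and the small example series are hypothetical.

```python
def r_squared(actual, forecast):
    """R^2 = explained variance / total variance, as defined in Equation 14."""
    mean_actual = sum(actual) / len(actual)
    total = sum((pa - mean_actual) ** 2 for pa in actual)
    explained = sum((pf - mean_actual) ** 2 for pf in forecast)
    return explained / total

# Hypothetical example: a forecast that reproduces the data exactly,
# and one that only predicts the mean of the actual values.
perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

Note that this ratio is not bounded above by one for an arbitrary forecast; it coincides with the usual $1 - SSE/\text{total variance}$ form only when the residuals are uncorrelated with the fitted values.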

2-7-Application of a Control Chart for Monitoring COVID-19 Outbreaks
In this research, the exponentially weighted moving average (EWMA) control chart is applied to monitor COVID-19 outbreaks [34]. To construct the EWMA control chart, the expected value and variance of the total number of COVID-19 cases can be estimated by using two methods as follows.
• The delta method [35]: This is based on a Taylor series expansion about independent distance variable $s = E(S)$ for the total number of COVID-19 cases represented by function $p = C\exp(ks)$; i.e., the expected value of $p$ can be expressed as:

$$E(p) \approx p(E(S)) + \frac{1}{2}\left.\frac{d^2p}{ds^2}\right|_{s=E(S)} Var(S).$$

The estimated expected value of $p$ by using the second derivative $\frac{d^2p}{ds^2} = Ck^2\exp(ks)$ is given by:

$$\hat{E}(p) = C\exp(kE(S)) + \frac{1}{2}Ck^2\exp(kE(S))\,Var(S).$$

• The sample method: This is based on the sample mean and variance of the total number of COVID-19 cases for sample size $n$.
The estimated expected value of $p$ with index $(i)$ for the distance is given by:

$$\hat{E}(p(s(i))) = \frac{1}{n}\sum_{j=1}^{n} p_j(s(i))$$

and the estimated variance of $p$ with index $(i)$ for the distance is given by:

$$\hat{\sigma}^2(p(s(i))) = \frac{1}{n-1}\sum_{j=1}^{n}\left(p_j(s(i)) - \hat{E}(p(s(i)))\right)^2.$$

Next, the EWMA control chart is applied for monitoring the total number of COVID-19 cases $p(s(i))$, which are independent and identically distributed random variables.
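The delta-method estimate of the expected number of cases can be sketched as follows. The code applies the second-order Taylor expansion of $p = C\exp(ks)$ about $s = E(S)$ described above; the function name is hypothetical, and the input values are taken from the before peak estimates reported in the Results (treating the reported expected distance and variance as $E(S)$ and $Var(S)$ is an assumption of this sketch).

```python
import math

def delta_expected_cases(C, k, mean_s, var_s):
    """Second-order delta-method estimate of E[p(S)] for p(s) = C*exp(k*s)."""
    base = C * math.exp(k * mean_s)                      # p evaluated at E(S)
    second_derivative = C * k ** 2 * math.exp(k * mean_s)
    return base + 0.5 * second_derivative * var_s

# Before peak stage values quoted in the paper: C, k, E(S), Var(S).
C, k, mean_s, var_s = 1.86e3, -0.0825, 12.11, 175.31
first_order = C * math.exp(k * mean_s)
second_order = delta_expected_cases(C, k, mean_s, var_s)
```

Because the second derivative of an exponential is positive, the second-order estimate is always at least as large as the first-order value $p(E(S))$.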
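The EWMA control chart derived from EWMA statistic $y$ with smoothing parameter $\lambda \in (0, 1]$ can be defined as:

$$y(i) = \lambda\,p(s(i)) + (1 - \lambda)\,y(i-1)$$

where $i$ is the index sequence for the distance and $y(0) = \mu$, with $\mu = E(p(s(i)))$ and $\sigma^2 = Var(p(s(i)))$. For $i = 1, 2, 3, \ldots, n$, the expected value $E(y(i))$ of EWMA statistic $y$ becomes:

$$E(y(i)) = \mu.$$

For $i = 1, 2, 3, \ldots, n$, the variance $\sigma_{y(i)}^2$ of EWMA statistic $y$ becomes:

$$\sigma_{y(i)}^2 = \sigma^2\left(\frac{\lambda}{2 - \lambda}\right)\left[1 - (1 - \lambda)^{2i}\right].$$

Therefore, the three control limits for the EWMA control chart for $i = 1, 2, 3, \ldots, n$ can be defined as: Center Line (CL) $= E(y(i))$, and:

$$UCL = E(y(i)) + d\,\sigma_{y(i)}, \qquad LCL = E(y(i)) - d\,\sigma_{y(i)}$$

where $d$ is the width of the control limit.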
For monitoring the COVID-19 outbreak by using the EWMA control chart, the first passage time (FPT) [36] is used; this is the index of the first signal outside of the upper or lower control limit of the EWMA control chart. The FPT should be as large as possible to indicate a lower severity of a COVID-19 outbreak.
The FPT based on distance for monitoring the total number of COVID-19 cases, which is a decreasing function, can be defined in terms of the infimum (inf) [37] as follows:

$$FPT_s = \inf\{i > 0 : y(i) > UCL \text{ or } y(i) < LCL\}.$$

The FPT based on time ($t$) for monitoring the total number of COVID-19 cases, which is an increasing function, can be defined in terms of the infimum as follows:

$$FPT_t = \inf\{t > 0 : y(t) > UCL \text{ or } y(t) < LCL\}.$$

The summarized process of the methodology is as follows (Figure 4).
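The EWMA statistic, its control limits, and the first passage time can be sketched as follows. This is a minimal illustration using the standard textbook EWMA chart with time-varying limits and the starting value $y(0) = \mu$, which is an assumption here; the function names, the default width `d = 3.0`, and the example series are hypothetical. In the paper, `mu` and `sigma` would come from the delta or sample method.

```python
import math

def ewma_chart(values, mu, sigma, lam=0.3, d=3.0):
    """Return EWMA statistics and control limits for a sequence of observations.

    y(i) = lam*x(i) + (1 - lam)*y(i-1), with y(0) = mu (assumed starting value).
    """
    y_prev = mu
    ys, lcls, ucls = [], [], []
    for i, x in enumerate(values, start=1):
        y_prev = lam * x + (1.0 - lam) * y_prev
        sd = sigma * math.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * i)))
        ys.append(y_prev)
        lcls.append(mu - d * sd)
        ucls.append(mu + d * sd)
    return ys, lcls, ucls

def first_passage_time(ys, lcls, ucls):
    """Smallest index i (1-based) with y(i) outside [LCL, UCL]; None if no signal."""
    for i, (y, lo, hi) in enumerate(zip(ys, lcls, ucls), start=1):
        if y < lo or y > hi:
            return i
    return None

# Hypothetical series: in control for three points, then a shift upward.
series = [10.0, 10.0, 10.0, 20.0, 20.0]
fpt = first_passage_time(*ewma_chart(series, mu=10.0, sigma=1.0, lam=0.3))
no_signal = first_passage_time(*ewma_chart([10.0] * 5, mu=10.0, sigma=1.0))
```

In this example the chart signals at the fourth observation, the first one after the shift, and returns no signal for the constant series.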

3-Results
Results from a simulation study and from the real data described earlier are provided in this section. Table 1 reports the expected values, medians, variances, skewness, and kurtosis for the PDF, CDF, QF, SF, and HF for the total number of COVID-19 cases according to distance with parameters C, k, and m. Furthermore, the sample mean is 30 for positive k=0.5 and the sample mean is 100 for negative k=-0.75 (Figures 5 to 7). The 95% confidence interval (CI) for sample mean 30 in the case of positive k=0.5 is (29.50, 30.50). The 95% CI for sample mean 100 in the case of negative k=-0.75 is (99.82, 100.18). With increasing distance from the COVID-19 outbreak, the PDF increased for positive k=0.5 (Figure 5-c) and decreased for negative k=-0.75 (Figure 5-d). Table 2 reports the statistics estimated by using the model in Equation 2. For example, the expected values of the distance from the outbreak are 12.11, 12.09, and 13.22 for the before peak, peak, and after peak outbreak stages, respectively. The proportions of the variance for the total number of COVID-19 cases explained by the distance are measured by using R 2 , which are 0.94, 0.97, and 0.99 for the before peak, peak, and after peak outbreak stages, respectively. The RMSPE values are 1.92, 1.85, and 1.49 for the before peak, peak, and after peak outbreak stages, respectively. Meanwhile, the 95% CIs for sample mean 322.88 for the before peak outbreak, 339.50 for the peak outbreak, and 339.49 for the after peak outbreak are (319.54, 326.24), (336.51, 342.49), and (336.24, 342.76), respectively. In addition, estimated parameter k was negative in all three stages (-0.0825, -0.0826, and -0.0757 for the before peak, peak, and after peak outbreak stages, respectively). The estimated C values are 1.86e+03, 5.33e+03, and 5.18e+03 for the before peak, peak, and after peak outbreak stages, respectively. The variances are 175.31, 146.34, and 174.73 for the before peak, peak, and after peak outbreak stages, respectively. The skewness is constant at 2.00 for all three stages. The kurtosis is constant at 84.00 for all three stages.
Note: CI, confidence interval for the sample mean; Q, quantile function; C, constant; m, the maximum distance; k, the growth rate.
The estimated total numbers of COVID-19 cases for the Samut Sakhon COVID-19 outbreak with negative k for all three outbreak stages are shown in Figures 8-a to 8-c. Figure 9 shows the CDF and QF for the distance from the COVID-19 outbreak for all outbreak stages. It can be seen that the CDF converges to one when the distance increases, while the QF increases with the probability over the interval between 0 and 1. Figure 10 shows the SF and HF for the distance from the Samut Sakhon COVID-19 outbreak, in which it can be seen that both decrease with increasing distance. Next, the probability and conditional probability between two distances a and b from the COVID-19 outbreak center at Samut Sakhon in Thailand for 63 provinces are depicted in Figure 11. The probabilities that the COVID-19 outbreak will spread from Samut Sakhon (index 1: 0 km) to Bangkok (index 3: 33.69 km) are approximately 0.938, 0.938, and 0.922 for the before peak, peak, and after peak outbreak stages, respectively. Moreover, the conditional probabilities that the COVID-19 outbreak will continue after the distance to Samut Prakan (index 4: 35.35 km), given that it has passed Bangkok (index 3), are approximately 0.872, 0.872, and 0.882 for the before peak, peak, and after peak outbreak stages, respectively. Thus, after the peak outbreak stage, the probability decreases (from 0.938 to 0.922 between distance indexes 1 and 3), whereas the conditional probability increases (from 0.872 to 0.882 between distance indexes 3 and 4). When monitoring the Samut Sakhon COVID-19 outbreak by using the EWMA control chart, in the before peak outbreak stage when λ = 0.3 or 0.7, an FPT of 3 was only detected by using the delta method based on distance (Figures 12-a and 12-b), whereas FPTs of 16 and 17 were detected by using the delta and sample methods, respectively, based on time (Figure 13). In the peak outbreak stage, an FPT of 3 was only detected by using the delta method based on distance when λ = 0.3 or 0.7 (Figures 14-a and 14-b, respectively). On the other hand, based on time when λ = 0.3 or 0.7, FPTs of 33 and 37 were detected by using the delta method (Figures 15-a and 15-b, respectively) and 29 and 33 by using the sample method (Figures 15-c and 15-d, respectively). In the after peak outbreak stage, an FPT of 3 when λ = 0.3 or 0.7 was only detected by using the delta method based on distance (Figures 16-a and 16-b, respectively). Meanwhile, based on time when λ = 0.3 or 0.7, FPTs of 50 and 52 were detected by using the delta method (Figures 17-a and 17-b, respectively) and 46 and 48 by using the sample method (Figures 17-c and 17-d, respectively).
The results of this research are compared with those in the literature as follows. The results in the literature review are obtained from systems of differential equations [18][19][20][21][22][23][24][25][26] with time as the independent variable for analyzing and representing the COVID-19 outbreak; however, the results from this research are based on a single differential equation with distance as the independent variable, from which a probabilistic analysis comprising the PDF, CDF, and conditional probability is derived. Probabilistic analysis is useful in situations involving uncertainty, such as a COVID-19 outbreak with uncertain factors.

4-Discussion
In the literature, much of the research has been on estimating the number of COVID-19 cases using time as the independent variable. In the present research, the spread of COVID-19 was studied by using distance as the independent variable along with probabilistic analysis of the distance from a COVID-19 outbreak. Simulated and real data of the total number of COVID-19 cases according to the distance from the outbreak in Samut Sakhon province to other provinces along with three outbreak stages (before peak, peak, and after peak) were used in the analysis.
The PDF, CDF, the probability and conditional probability between two distances, QF, SF, HF, the median, the expected value, variance, moment, skewness, and kurtosis of the distance from the COVID-19 outbreak were derived using probability theory. The findings reveal that the estimated growth rates of the total number of COVID-19 cases according to the distance from the Samut Sakhon COVID-19 outbreak for all three outbreak stages when using the forecasting model with distance as the independent variable were negative. This means that at the beginning of the outbreak, people located at a greater distance from the center of the outbreak were less likely to become infected than those located closer to it. The dispersion according to the distance from the Samut Sakhon COVID-19 outbreak was analyzed for skewness and kurtosis, which were 2.00 and 84.00, respectively, for all three outbreak stages. This means that the PDFs of the distance from the COVID-19 outbreak for all three stages were positively skewed and leptokurtic (heavy-tailed) functions. The properties of the statistics in the PDFs for all three stages of the outbreak based on the distance from the Samut Sakhon COVID-19 outbreak indicated a negative growth rate (Table 2), which is in accordance with the statistics based on the simulated data (Table 1). Moreover, other functions such as the CDF, QF, SF, and HF according to the distance from the Samut Sakhon COVID-19 outbreak for all three outbreak stages also indicated a negative growth rate (Table 1).
The probability and conditional probability between two distances were evaluated to consider the probability of the COVID-19 outbreak spreading based on distance for all three outbreak stages. For all three stages of the outbreak, when the difference between two distances was large, the probability was high (Figures 11-a to 11-c), and when the difference between two distances was small, the conditional probability was high (Figures 11-d to 11-f). The probability between two distances was calculated by using the area under the PDF curve without the previous outbreak condition, whereas the conditional probability was evaluated with the previous outbreak condition. This implies that the conditional probability should be used to evaluate the probability of the outbreak spreading because it represents spreading from the outbreak center. The probability and conditional probability for the before peak and peak outbreak stages were higher and lower than those of the after peak outbreak stage, respectively. The EWMA control chart was applied in the distance or time domain for detecting FPTs, with the delta or sample method used for estimating the expected value and variance. For the distance domain, the delta method performed better than the sample method. Furthermore, the delta method detected the FPT at 3 for both values of the smoothing parameter and for all three outbreak stages studied (Figures 12-a, 12-b, 14-a, 14-b, 16-a, and 16-b). In general, the sample method performed better than the delta method for the time domain. Although the FPTs were 16 and 17 with the delta and sample methods, respectively, in the before peak outbreak stage (Figure 13), they were 33 and 37 with the delta method (Figures 15-a and 15-b) and 29 and 33 with the sample method (Figures 15-c and 15-d) for the peak outbreak stage and 50 and 52 with the delta method (Figures 17-a and 17-b) and 46 and 48 with the sample method (Figures 17-c and 17-d) for the after peak outbreak stage when λ = 0.3 or 0.7, respectively. Moreover, the FPT increased when the smoothing parameter value was increased. The findings indicate that the delta method for the EWMA control chart with distance as the independent variable is the most suitable for monitoring the Samut Sakhon COVID-19 outbreak due to detecting the lowest FPT (3). The sample method with time as the independent variable for the EWMA control chart performed reasonably well but still attained quite a large FPT (16 was the smallest). As an illustration, an FPT of 3 for the distance domain is equivalent to the distance from Samut Sakhon to Bangkok (33.69 km), whereas an FPT of 16 using the time domain is equivalent to January 1, 2021, at which time the first COVID-19 cases were found in Nakhon Pathom (38.62 km), Chachoengsao (87.46 km), Surin (377.3 km), Ang Thong (117.52 km), and Ubon Ratchathani (527.74 km). In other words, the analysis of the COVID-19 outbreak based on distance as the independent variable can detect the out-of-control signal on the EWMA chart much more quickly than with time as the independent variable. It is important to note that the proposed forecasting model can effectively forecast the number of COVID-19 cases when there is only one outbreak, for which probability analysis is an effective approach. However, when several outbreaks emerge nationally at around the same time, probability analysis cannot be used effectively because it is not possible to identify the outbreak area in which people were infected or the number of people infected in a given outbreak area. To mitigate this, many outbreaks can be treated as a single outbreak that covers all of Thailand for practical purposes. That is to say, Thailand would become the new center of an outbreak at the regional level.

5-Conclusion
A model was developed to analyze a COVID-19 outbreak in which the rate of spread is dependent on the distance from the center of the outbreak. The important statistics and functions for probabilistic analysis, such as the moment, expected value, variance, PDF, conditional probability, CDF, QF, skewness, kurtosis, SF, and HF, were derived and analyzed. These functions and statistics are useful for advanced statistical analysis, such as inferential statistics and Bayesian statistics. Moreover, real data from a COVID-19 outbreak that occurred at the Central Shrimp Market in Samut Sakhon province, Thailand, were utilized for this research. Three outbreak stages (before peak, peak, and after peak) were used in a case study.
The findings reveal that the important statistics and functions for probabilistic analysis with real COVID-19 data corresponded with those with simulated COVID-19 data. Negative growth rates were obtained with the real data, based on the distance from the Samut Sakhon COVID-19 outbreak. The conditional probability is a suitable approach for analyzing the probability of COVID-19 spreading due to the outbreak based on distance because the current outbreak center is extended by the previous outbreak center. Moreover, the probability and conditional probability of the distance in the before peak and peak outbreak stages were higher and lower than those in the after peak outbreak stage, respectively. The estimated growth rate for the total number of COVID-19 cases according to the distance from the Central Shrimp Market in Samut Sakhon province reaches its most negative value at the peak outbreak stage. It was found that, for the outbreak center identified at the beginning of the outbreak, the distance from the outbreak is inversely correlated with the probability of COVID-19 infection in both the short and long periods of the outbreak. The EWMA control chart with the delta method for estimating the expected value and variance was more suitable for monitoring a COVID-19 outbreak based on distance for all three outbreak stages than the EWMA control chart with the sample method based on time. When distance rather than time was used as the independent variable, the EWMA control chart with the delta method detected the out-of-control signal (the FPT) of the COVID-19 outbreak more quickly than the sample method based on time. For further research, another application of this work is survival analysis for the COVID-19 outbreak by using the sojourn function derived here.

6-2-Data Availability Statement
The data presented in this study are available on request from the corresponding author.

6-3-Funding
This work (Grant No. RGNS 64-249) was supported by the Office of the Permanent Secretary, Ministry of Higher Education, Science, Research and Innovation (OPS MHESI), Thailand Science Research and Innovation (TSRI), and Pathumwan Institute of Technology.

6-4-Acknowledgements
The authors thank Pathumwan Institute of Technology and King Mongkut's University of Technology North Bangkok, Thailand, for their support and encouragement during this research.

Figure 1. COVID-19 spread based on distance in Thailand. Source: the COVID-19 information center of Thailand and the Center for COVID-19 Situation Administration (CCSA) of Thailand.

Figure 2. The estimated spreading from the COVID-19 outbreak in Samut Sakhon.

Figure 3. Distances from Samut Sakhon province to the other provinces in Thailand.

Figure 4. The diagram of the research process.

Figure 5. Simulated total number of COVID-19 cases based on distance (a) for positive k and (b) for negative k, and the PDF for the simulated distance from the outbreak (c) for positive k and (d) for negative k.

... to 8-c. The PDF values for the distance from the Samut Sakhon COVID-19 outbreak for all three outbreak stages are presented in Figures 8-d to 8-f. For negative k, the PDF decreases as the distance increases in all three stages.

Figure 8. Estimated and real total numbers of COVID-19 cases and the PDF of the distance from the COVID-19 outbreak in Samut Sakhon for the (a,d) before peak outbreak stage, (b,e) peak outbreak stage, and (c,f) after peak outbreak stage, respectively.

Figure 9. The CDF and QF for the distance from the Samut Sakhon COVID-19 outbreak for the (a,d) before peak outbreak stage, (b,e) peak outbreak stage, and (c,f) after peak outbreak stage, respectively.

Figure 11. Probability and conditional probability between two distances from the Samut Sakhon COVID-19 outbreak for the (a,d) before peak outbreak stage, (b,e) peak outbreak stage, and (c,f) after peak outbreak stage, respectively.

The EWMA control charts using the delta method for estimating the expected values and the variances are illustrated in Figures 12-a to 17-a and Figures 12-b to 17-b, respectively, while those using the sample method are illustrated in Figures 12-c to 17-c and Figures 12-d to 17-d, respectively. A smoothing parameter (λ) value of 0.3 for the EWMA statistic is used in Figures 12-a to 17-a and Figures 12-c to 17-c, while 0.7 is used in Figures 12-b to 17-b and Figures 12-d to 17-d.

Figure 12. The EWMA control chart performances for the before peak outbreak stage of the Samut Sakhon COVID-19 outbreak according to distance using (a,b) the delta method and (c,d) the sample method.

... and 17-b, respectively) and 46 and 48 by using the sample method (Figures 17-c and 17-d, respectively).

Table 1. Statistics for the simulated total number of COVID-19 cases according to distance.
Note: CI, confidence interval for the sample mean; Q, quantile function; C, constant; m, the maximum distance; k, the growth rate.