Bayesian Confidence Intervals for Coefficients of Variation of PM10 Dispersion

Herein, we propose a Bayesian approach for constructing confidence intervals for both the coefficient of variation of a log-normal distribution and the difference between the coefficients of variation of two log-normal distributions. For the first case, the Bayesian approach was compared with the large-sample, Chi-squared, and approximate fiducial approaches via Monte Carlo simulation. For the second case, it was compared with the method of variance estimates recovery (MOVER), modified MOVER, and approximate fiducial approaches, again via Monte Carlo simulation. The results show that the Bayesian approach performed best in both cases. To illustrate the performance of the interval construction approaches with real data, they were applied to PM10 datasets from the Nan and Chiang Mai provinces in Thailand, and the results agree with those of the simulation.

The coefficient of variation is defined as the standard deviation divided by the mean. It has been used as a measure of precision within and between data series. For example, Tsim et al. [18] used the coefficient of variation to analyze blood samples taken from different laboratories. Faupel-Badger et al. [19] used the coefficient of variation to compare the concentrations of an estrogen metabolite measured using two methods. In environmental studies, the coefficient of variation has often been used for measuring the daily air quality [20]. Pollution levels in different areas can be compared using the coefficient of variation: larger coefficient of variation values indicate greater dispersion whereas smaller ones indicate lower risk. The difference between the air pollution levels in two areas can be analyzed by comparing their coefficients of variation.
Niwitpong [21] proposed confidence intervals for the coefficient of variation of log-normal distribution with restricted parameter space. Nam and Kwon [11] and Hasan and Krishnamoorthy [12] constructed the confidence intervals for the ratio of coefficients of variation of two log-normal distributions. Thangjai et al. [22] proposed Bayesian confidence intervals for the coefficient of variation and the difference between the coefficients of variation of two normal distributions. For k coefficients of variation, Ng [23] estimated confidence intervals for the common coefficient of variation of log-normal populations. Thangjai et al. [10] presented simultaneous confidence intervals for the differences between the coefficients of variation of log-normal distributions. Nam and Kwon [11] proposed the method of variance estimates recovery (MOVER) approach for constructing a confidence interval for the ratio of coefficients of variation of two log-normal distributions, which Hasan and Krishnamoorthy [12] later modified. Although the MOVER approach is easy to compute using an exact formula, it is based on the initial confidence interval for a single parameter of interest. In addition, Hasan and Krishnamoorthy [12] presented the approximate fiducial approach for constructing the confidence interval for the ratio of coefficients of variation. Although this approach was very simple, it was based on simulated data.
In statistics, classical and Bayesian inference are fundamentally different. In classical inference, the parameter is unknown but fixed, and knowledge about its value is obtained from the observed values in a sample. The Bayesian approach uses a prior distribution, based on the experimenter's belief, that is updated with the sample information; the prior updated by Bayes' rule is the posterior distribution. In this study, the Bayesian approach was applied for constructing confidence intervals for the coefficient of variation and the difference between the coefficients of variation of log-normal distributions. The Bayesian approach is based on combining the likelihood function and the prior distribution. Depending on the choice of prior distribution, we show that the Bayesian approach has equal or better coverage accuracy and shorter average lengths than the classical approaches.
The rest of this article is organized as follows. In Section 2, confidence intervals for the coefficient of variation of a log-normal distribution are presented, while those for the difference between the coefficients of variation of log-normal distributions are given in Section 3. In Section 4, the results of simulation studies are presented. PM10 datasets from Nan and Chiang Mai provinces are used to illustrate the performances of the confidence intervals in Section 5. A discussion is given in Section 6, and concluding remarks are presented in Section 7.

2-Confidence Intervals for the Coefficient of Variation
If $X = \ln(Y)$ follows a normal distribution with mean $\mu$ and variance $\sigma^2$, then $Y$ follows a log-normal distribution with parameters $\mu$ and $\sigma^2$. The mean and variance of $Y$ are $\mu_Y = \exp(\mu + \sigma^2/2)$ and $\sigma_Y^2 = (\exp(\sigma^2) - 1)\exp(2\mu + \sigma^2)$, respectively. The coefficient of variation of $Y$, defined as the ratio of the standard deviation to the mean of $Y$, is $\theta = \sqrt{\exp(\sigma^2) - 1}$. The coefficient of variation of the log-normal distribution therefore depends on the parameter $\sigma^2$ only, whereas the coefficient of variation of the normal distribution, defined as $\sigma/\mu$, depends on both the mean $\mu$ and the variance $\sigma^2$.
Let $\bar{X}$ and $S^2$ be unbiased estimators of $\mu$ and $\sigma^2$, respectively. Also, let $\bar{x}$ and $s^2$ be the observed values of $\bar{X}$ and $S^2$, respectively.
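The dependence of the coefficient of variation on the log-scale variance alone can be checked numerically. The following is a minimal sketch using the mean and variance formulas stated above; the function names are illustrative and the value 0.49 for the log-scale variance is an arbitrary choice.

```python
import math


def lognormal_cv(sigma2):
    """Coefficient of variation of a log-normal variable with log-scale variance sigma2."""
    return math.sqrt(math.exp(sigma2) - 1.0)


def lognormal_mean_sd(mu, sigma2):
    """Mean and standard deviation of a log-normal variable with log-scale parameters mu, sigma2."""
    mean = math.exp(mu + sigma2 / 2.0)
    var = (math.exp(sigma2) - 1.0) * math.exp(2.0 * mu + sigma2)
    return mean, math.sqrt(var)


# The ratio sd / mean stays the same for every mu; only sigma2 matters.
for mu in (0.0, 1.0, 5.0):
    mean, sd = lognormal_mean_sd(mu, 0.49)
    print(mu, sd / mean, lognormal_cv(0.49))
```

Each printed row shows the same ratio for different values of the log-scale mean, which is why the simulation study can fix the mean without loss of generality.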

2-1-Classical Confidence Intervals for the Coefficient of Variation
Three approaches are considered in this section for interval estimation of the coefficient of variation: the large-sample, Chi-squared, and approximate fiducial approaches.

2-2-Bayesian Confidence Interval for the Coefficient of Variation
In the classical approach, the parameter is unknown but fixed. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population indexed by the parameter, and let $x_1, x_2, \ldots, x_n$ be the observed values of $X_1, X_2, \ldots, X_n$. Knowledge about the value of the parameter is then obtained from these observed values. In the Bayesian approach, the parameter is considered to be a random quantity. The prior distribution, which is based on the experimenter's belief, is updated with the sample information. The posterior distribution is the prior updated with the use of Bayes' rule; see Casella and Berger [24].
Let $\bar{X}$ be the sample mean and let $S^2$ be the sample variance. Also, $\bar{x}$ and $s^2$ are the observed values of $\bar{X}$ and $S^2$, respectively.
In this paper, a random sample $X_1, X_2, \ldots, X_n$ is drawn from a normal population with parameters $\mu$ and $\sigma^2$. The likelihood function is
$$L(\mu, \sigma^2 \mid x) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}\sum_{j=1}^{n}(x_j - \mu)^2\right).$$
The logarithm of the likelihood can be written as
$$\ln L(\mu, \sigma^2 \mid x) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{j=1}^{n}(x_j - \mu)^2.$$
Therefore, the Fisher information matrix is
$$I(\mu, \sigma^2) = \begin{pmatrix} n/\sigma^2 & 0 \\ 0 & n/(2\sigma^4) \end{pmatrix}.$$
The Jeffreys independence prior follows from the Fisher information matrix. Since this paper is interested in the coefficient of variation, a flat prior is specified for it, and the Jeffreys independence prior is defined by
$$p(\mu, \sigma^2) \propto \frac{1}{\sigma^2}.$$
Let $\mu \mid \sigma^2, x$ denote the conditional posterior distribution for $\mu$ given $\sigma^2$ and $x$. It is a normal distribution with mean $\hat{\mu} = \bar{x}$ and variance $\sigma^2/n$, which can be written as
$$\mu \mid \sigma^2, x \sim N\left(\hat{\mu}, \frac{\sigma^2}{n}\right).$$
Furthermore, let $\sigma^2 \mid x$ denote the posterior distribution for $\sigma^2$ given $x$. It is an inverse gamma distribution with shape parameter $(n-1)/2$ and scale parameter $(n-1)s^2/2$. That is,
$$\sigma^2 \mid x \sim IG\left(\frac{n-1}{2}, \frac{(n-1)s^2}{2}\right). \tag{16}$$
The posterior distribution is used to make statements about the parameter, which is considered a random quantity. The posterior distribution of the coefficient of variation, which can also be used to obtain a point estimate of the coefficient of variation, is denoted by
$$\theta_{BS} = \sqrt{\exp(\sigma^2) - 1}, \tag{17}$$
where $\sigma^2$ is simulated through Monte Carlo simulation from the posterior distribution defined in Equation 16.
The shortest confidence interval with a specified coverage probability can be obtained using the Bayesian criteria, namely from a highest posterior density region. This region consists of the values of the parameter for which the posterior density is highest. Therefore, the $100(1-\alpha)\%$ Bayesian confidence interval for the coefficient of variation is defined as
$$CI_{BS} = [L_{BS}, U_{BS}], \tag{18}$$
where $L_{BS}$ and $U_{BS}$ are the lower and upper limits of the highest posterior density region of $\theta_{BS}$, respectively.
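To make the idea of a highest posterior density region concrete, the following minimal sketch finds the shortest interval that contains a given fraction of a set of posterior draws. It is one common sample-based approximation, assumed here for illustration and valid for unimodal posteriors; the function name `hpd_interval` and its arguments are hypothetical and not taken from the paper.

```python
import math


def hpd_interval(samples, cred=0.95):
    """Shortest interval covering a fraction `cred` of the draws (unimodal case)."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(cred * n))  # number of draws the interval must contain
    # Slide a window over k consecutive order statistics and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


# Example: the two middle draws form the shortest window holding half of the sample.
print(hpd_interval([0.0, 5.0, 5.5, 10.0], cred=0.5))
```

For a right-skewed posterior such as that of the coefficient of variation, this shortest interval generally sits to the left of the equal-tailed percentile interval.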
The Bayesian confidence interval in Equation 18 can be computed using the procedure given in the following algorithm.

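Algorithm 1.
Step 1: Generate $\sigma^2 \mid x$ from the inverse gamma posterior distribution in Equation 16;
Step 2: Calculate the value of $\theta_{BS}$ as given in Equation 17;
Step 3: Repeat Steps 1 and 2 a large number of times and obtain an array of $\theta_{BS}$ values;
Step 4: Calculate $L_{BS}$ and $U_{BS}$ from the $100(\alpha/2)$-th and $100(1-\alpha/2)$-th percentiles of the $\theta_{BS}$ values.
The coverage probabilities and average lengths of the confidence intervals for the coefficient of variation can be approximated via Monte Carlo simulations using the following algorithm.
Algorithm 2.
For given values of the simulation settings:
Step 1: Generate a random sample $x_1, x_2, \ldots, x_n$ from the normal distribution with mean $\mu$ and variance $\sigma^2$;
Step 2: Calculate $\bar{x}$ and $s^2$;
Steps 3 to 7: …
Step 8: If the confidence interval contains the true coefficient of variation, record the interval as covering it;
Step 9: Calculate the length of the confidence interval;
Step 10: Repeat Steps 1 to 9 a large number of times and calculate the coverage probability and average length.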
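The two algorithms can be sketched in a few lines of Python using only the standard library. This is an illustrative reading of the steps rather than the authors' code: the function names, the default of 2,500 posterior draws per interval, and the small number of replications in `coverage` are assumptions chosen to keep the example fast. The inverse gamma draw is obtained as the scale divided by a unit-scale gamma variate.

```python
import math
import random
import statistics


def bayes_cv_interval(x, alpha=0.05, draws=2500, rng=random):
    """Percentile-based Bayesian interval for the log-normal CV, given log-scale data x."""
    n = len(x)
    s2 = statistics.variance(x)  # sample variance with n - 1 in the denominator
    shape = (n - 1) / 2.0
    scale = (n - 1) * s2 / 2.0
    thetas = []
    for _ in range(draws):
        sigma2 = scale / rng.gammavariate(shape, 1.0)  # inverse gamma draw
        try:
            thetas.append(math.sqrt(math.expm1(sigma2)))  # sqrt(exp(sigma2) - 1)
        except OverflowError:
            thetas.append(math.inf)
    thetas.sort()
    lower = thetas[math.floor((alpha / 2) * (draws - 1))]
    upper = thetas[math.ceil((1 - alpha / 2) * (draws - 1))]
    return lower, upper


def coverage(n, mu, sigma, reps=200, draws=500, seed=1):
    """Monte Carlo estimate of coverage probability and average interval length."""
    rng = random.Random(seed)
    true_cv = math.sqrt(math.exp(sigma ** 2) - 1.0)
    hits, total_length = 0, 0.0
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]  # log-scale sample
        lower, upper = bayes_cv_interval(x, draws=draws, rng=rng)
        hits += lower <= true_cv <= upper
        total_length += upper - lower
    return hits / reps, total_length / reps


if __name__ == "__main__":
    print(coverage(n=30, mu=1.0, sigma=0.7))
```

With so few replications the coverage estimate is noisy; the simulation study in Section 4 uses 10,000 replications and 2,500 posterior draws.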

3-Confidence Intervals for Difference between the Coefficients of Variation
Assume that $X_1 = \ln(Y_1)$ is a random sample of size $n_1$ from a normal distribution with mean $\mu_1$ and variance $\sigma_1^2$, where $\bar{X}_1$ is the sample mean and $S_1^2$ is the sample variance. Also, $X_2 = \ln(Y_2)$ is a random sample of size $n_2$ from a normal distribution with mean $\mu_2$ and variance $\sigma_2^2$, and $\bar{X}_2$ and $S_2^2$ are the sample mean and sample variance, respectively. The coefficients of variation of $Y_1$ and $Y_2$ are defined as $\theta_1 = \sqrt{\exp(\sigma_1^2) - 1}$ and $\theta_2 = \sqrt{\exp(\sigma_2^2) - 1}$. Therefore, the difference between the coefficients of variation is $\delta = \theta_1 - \theta_2$.

3-1-Classical Confidence Intervals for Difference between the Coefficients of Variation
Three approaches are used to construct the confidence intervals for the difference between the coefficients of variation of log-normal distributions.

3-1-1-MOVER Confidence Interval for Difference between the Coefficients of Variation
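The variance estimate of $\ln(\hat{\theta})$ given in Nam and Kwon [11] is used to obtain a confidence interval $(l_i, u_i)$ for each coefficient of variation $\theta_i$, $i = 1, 2$, where $z_{1-\alpha/2}$ denotes the $100(1-\alpha/2)$-th percentile of the standard normal distribution. The lower and upper limits of the confidence interval for the difference between the coefficients of variation based on the MOVER approach are
$$L_{MOVER} = \hat{\theta}_1 - \hat{\theta}_2 - \sqrt{(\hat{\theta}_1 - l_1)^2 + (u_2 - \hat{\theta}_2)^2}$$
and
$$U_{MOVER} = \hat{\theta}_1 - \hat{\theta}_2 + \sqrt{(u_1 - \hat{\theta}_1)^2 + (\hat{\theta}_2 - l_2)^2}.$$
Therefore, the $100(1-\alpha)\%$ MOVER confidence interval for the difference between the coefficients of variation is defined as $CI_{MOVER} = [L_{MOVER}, U_{MOVER}]$.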
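The general MOVER construction combines a separate confidence interval for each parameter into an interval for their difference. The sketch below implements that standard combination rule; the argument names (`est1`, `l1`, `u1`, and so on) are illustrative, and the single-parameter limits are assumed to have been computed beforehand, for instance from the variance estimate of Nam and Kwon [11].

```python
import math


def mover_difference(est1, l1, u1, est2, l2, u2):
    """MOVER limits for est1 - est2, given single-parameter intervals (l1, u1) and (l2, u2)."""
    diff = est1 - est2
    lower = diff - math.sqrt((est1 - l1) ** 2 + (u2 - est2) ** 2)
    upper = diff + math.sqrt((u1 - est1) ** 2 + (est2 - l2) ** 2)
    return lower, upper


# Hypothetical inputs: two point estimates with their individual 95% limits.
print(mover_difference(1.0, 0.7, 1.4, 0.5, 0.1, 0.8))
```

Because the rule only needs the individual limits, the same function would serve the modified MOVER approach when those limits come from a different single-parameter interval.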

3-1-2-Modified MOVER Confidence Interval for Difference between the Coefficients of Variation
Suppose that the estimators of the coefficients of variation of $Y_1$ and $Y_2$ are defined as $\hat{\theta}_1 = \sqrt{\exp(\hat{\sigma}_1^2) - 1}$ and $\hat{\theta}_2 = \sqrt{\exp(\hat{\sigma}_2^2) - 1}$, where $\hat{\sigma}_1^2 = s_1^2$ and $\hat{\sigma}_2^2 = s_2^2$. This paper is interested in the difference between the coefficients of variation. The estimator of the difference between the coefficients of variation of $Y_1$ and $Y_2$ is $\hat{\delta} = \hat{\theta}_1 - \hat{\theta}_2$.
The concept of the MOVER approach is modified to construct the confidence interval for the difference between the coefficients of variation; this is called the modified MOVER approach. Using Equation 4, confidence intervals for the coefficients of variation of $Y_1$ and $Y_2$ are obtained. Applying the MOVER approach to these intervals gives the lower and upper limits of the confidence interval for the difference between the coefficients of variation, denoted $L_{MMOVER}$ and $U_{MMOVER}$.

3-1-3-Approximate Fiducial Confidence Interval for Difference between the Coefficients of Variation
Suppose that the $T$ quantities used in this approach are the modified normal-based approximations given in Krishnamoorthy [26]. For $i = 1, 2$, these define the approximations for the coefficient of variation of $Y_i$. Applying Hasan and Krishnamoorthy [12] and the MOVER approach, the lower and upper limits for the difference between the coefficients of variation are obtained, and together they define the $100(1-\alpha)\%$ approximate fiducial confidence interval for the difference between the coefficients of variation.

3-2-Bayesian Confidence Interval for Difference between the Coefficients of Variation
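The logarithm of the likelihood is
$$\ln L = -\sum_{i=1}^{2}\left[\frac{n_i}{2}\ln(2\pi) + \frac{n_i}{2}\ln(\sigma_i^2) + \frac{1}{2\sigma_i^2}\sum_{j=1}^{n_i}(x_{ij} - \mu_i)^2\right].$$
From the logarithm of the likelihood, the Fisher information matrix for $(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2)$ is
$$I = \mathrm{diag}\left(\frac{n_1}{\sigma_1^2}, \frac{n_2}{\sigma_2^2}, \frac{n_1}{2\sigma_1^4}, \frac{n_2}{2\sigma_2^4}\right).$$
Therefore, the Jeffreys independence prior is
$$p(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2) \propto \frac{1}{\sigma_1^2 \sigma_2^2}.$$
The joint posterior distribution for $\mu_1$, $\mu_2$, $\sigma_1^2$, and $\sigma_2^2$ is given by
$$p(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2 \mid x_1, x_2) \propto \prod_{i=1}^{2} (\sigma_i^2)^{-(n_i/2 + 1)} \exp\left(-\frac{(n_i - 1)s_i^2 + n_i(\bar{x}_i - \mu_i)^2}{2\sigma_i^2}\right).$$
Therefore, the conditional posterior distribution for $\mu_i$ given $\sigma_i^2$ and $x_i$ is given by
$$\mu_i \mid \sigma_i^2, x_i \sim N\left(\hat{\mu}_i, \frac{\sigma_i^2}{n_i}\right),$$
where $\hat{\mu}_i = \bar{x}_i$ and $i = 1, 2$.
The posterior distribution for $\sigma_i^2$ given $x_i$ is obtained by
$$\sigma_i^2 \mid x_i \sim IG\left(\frac{n_i - 1}{2}, \frac{(n_i - 1)s_i^2}{2}\right),$$
where $i = 1, 2$.
The posterior distribution for the difference between the coefficients of variation is denoted by
$$\delta_{BS} = \sqrt{\exp(\sigma_1^2) - 1} - \sqrt{\exp(\sigma_2^2) - 1}.$$
Therefore, the $100(1-\alpha)\%$ Bayesian confidence interval for the difference between the coefficients of variation is defined as
$$CI_{BS} = [L_{BS}, U_{BS}], \tag{46}$$
where $L_{BS}$ and $U_{BS}$ are the lower and upper limits of the highest posterior density region of $\delta_{BS}$, respectively.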
The computational procedure for constructing the Bayesian confidence interval in Equation 46 is presented in Algorithm 3.
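Since the steps of Algorithm 3 are not reproduced here, the following is only a sketch of what such a procedure could look like, built by analogy with Algorithm 1: draw each variance from its inverse gamma posterior, transform to the coefficient of variation, take the difference, and read the limits from percentiles of the simulated differences. All function and argument names are assumptions made for illustration.

```python
import math
import random
import statistics


def posterior_cv_draws(x, draws, rng):
    """Posterior draws of the log-normal CV, given log-scale data x."""
    n = len(x)
    shape = (n - 1) / 2.0
    scale = (n - 1) * statistics.variance(x) / 2.0
    out = []
    for _ in range(draws):
        sigma2 = scale / rng.gammavariate(shape, 1.0)  # inverse gamma draw
        try:
            out.append(math.sqrt(math.expm1(sigma2)))
        except OverflowError:
            out.append(math.inf)
    return out


def bayes_cv_diff_interval(x1, x2, alpha=0.05, draws=2500, seed=None):
    """Percentile-based Bayesian interval for the difference of two log-normal CVs."""
    rng = random.Random(seed)
    d1 = posterior_cv_draws(x1, draws, rng)
    d2 = posterior_cv_draws(x2, draws, rng)
    deltas = sorted(a - b for a, b in zip(d1, d2))
    lower = deltas[math.floor((alpha / 2) * (draws - 1))]
    upper = deltas[math.ceil((1 - alpha / 2) * (draws - 1))]
    return lower, upper


if __name__ == "__main__":
    gen = random.Random(0)
    x1 = [gen.gauss(1.0, 0.7) for _ in range(30)]  # synthetic log-scale samples
    x2 = [gen.gauss(1.0, 0.3) for _ in range(30)]
    print(bayes_cv_diff_interval(x1, x2, seed=1))
```

The two samples are treated as independent, so their posterior draws are generated separately and paired only when the difference is formed.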

4-Simulation Studies
A simulation study with 10,000 replications and 2,500 repetitions of the Bayesian calculation was conducted to evaluate the performance of the confidence intervals based on the Bayesian method and of the existing confidence intervals. A comparison of their performances in terms of coverage probability and average length is presented in Figure 1. The best-performing confidence interval is the one with a coverage probability greater than or equal to the nominal confidence level of 0.95 and the shortest average length.
For the coefficient of variation of a single log-normal distribution, a random sample of size $n$ was generated from a normal distribution with mean $\mu = 1$ and standard deviation $\sigma$ = 0.1, 0.7, 1.2, and 1.6; the results are given in Table 1. For a large sample size, the coverage probabilities of the confidence intervals of Nam and Kwon [11] and Thangjai et al. [10] were close to 1.00 and 0.95, respectively, for $\sigma \le 1.2$, and both were less than the nominal confidence level of 0.95 for $\sigma > 1.2$. Although the Bayesian, Chi-squared, and approximate fiducial confidence intervals provided coverage probabilities close to the nominal confidence level of 0.95 for all values of $\sigma$, the Bayesian confidence interval obtained the shortest average length in all cases. Therefore, the Bayesian confidence interval was better than the other methods for constructing the confidence intervals for the coefficient of variation of a log-normal distribution.
The coefficient of variation of a log-normal distribution depends on the value of $\sigma^2$ only and is independent of $\mu$. To simplify matters, the population means $(\mu_1, \mu_2)$ were given the same value ($\mu_1 = \mu_2 = 1$) when testing the confidence intervals for the difference between the coefficients of variation of log-normal distributions. Four confidence intervals constructed using the MOVER, modified MOVER, approximate fiducial, and Bayesian approaches were compared (Tables 2 and 3).
The results indicate that the MOVER confidence interval was conservative, since its coverage probabilities ranged from 0.98 to 1.00 for $(\sigma_1, \sigma_2)$ = (0.1,0.3), (0.1,0.7), (0.3,0.7), (0.3,0.9), and (0.4,1.2). However, the MOVER confidence interval performed better than the others when the sample sizes were small and the values of $(\sigma_1, \sigma_2)$ were large. Overall, the Bayesian confidence interval performed better than the others in terms of the coverage probability and average length.

5-Empirical Application
In this section, the performances of the existing and Bayesian confidence intervals were compared using real datasets. The Bayesian confidence interval was first computed using 2,500 repetitions via Monte Carlo simulation. The datasets comprising PM10 data from 24 March 2019 to 17 April 2019 reported by the Pollution Control Department for the Nan and Chiang Mai provinces in Thailand are given in Table 4 and statistics based on them are summarized in Table 5. Figures 2 and 3 show histograms and normal QQ-plots of the data, respectively. The Akaike Information Criterion (AIC) results in Table 6 indicate that the datasets can be fitted to log-normal distributions.
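An AIC comparison of this kind can be illustrated for two candidate models, the normal and the log-normal, both of which have closed-form maximum likelihood estimates. This is a minimal sketch on synthetic data, not a reproduction of Table 6; the set of candidate distributions, the function names, and the simulated sample are all assumptions.

```python
import math
import random


def normal_aic(data):
    """AIC of a normal model fitted by maximum likelihood (two parameters)."""
    n = len(data)
    mean = sum(data) / n
    var = sum((v - mean) ** 2 for v in data) / n  # ML variance estimate
    loglik = -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)
    return 2 * 2 - 2.0 * loglik


def lognormal_aic(data):
    """AIC of a log-normal model fitted by maximum likelihood (two parameters)."""
    logs = [math.log(v) for v in data]
    n = len(logs)
    mean = sum(logs) / n
    var = sum((v - mean) ** 2 for v in logs) / n
    # The log-normal density adds a -ln(y) term for every observation.
    loglik = -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0) - sum(logs)
    return 2 * 2 - 2.0 * loglik


if __name__ == "__main__":
    gen = random.Random(0)
    sample = [gen.lognormvariate(4.9, 0.8) for _ in range(500)]  # synthetic data
    print(normal_aic(sample), lognormal_aic(sample))
```

The model with the smaller AIC is preferred; for data that are truly log-normal with a moderate log-scale variance, the log-normal model is expected to win.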
For the coefficient of variation of a log-normal distribution, the PM10 data for the Nan province had a coefficient of variation for the log-normal distribution of $\hat{\theta}$ = 0. […] The results show that the confidence intervals in both scenarios covered the population coefficient of variation and the difference between the population coefficients of variation, respectively, with the Bayesian confidence interval having the shortest length in both cases. In concert with the simulation results, the Bayesian confidence interval can be suggested for constructing the confidence intervals for the coefficient of variation and the difference between the coefficients of variation of log-normal distributions.
PM10 data (Table 4):
224  134  138  148  190  227  170  164  105  128
145  232  136  144  127  156  262  167  112  103
114  199  100  155  116  138  146  166  123  94
107  176  90  178  126  125  191  142  139  96
80  130  126  254  113  184  117  138  98
Source: Pollution Control Department (http://air4thai.pcd.go.th/webV2/download.php).

6-Discussion
Classical and Bayesian inference are fundamentally different in statistics, and we evaluated the performances of the confidence intervals for the coefficient of variation and the difference between the coefficients of variation of log-normal distributions using both approaches. For the coefficient of variation of a log-normal distribution, four classical confidence intervals were constructed: two via large-sample approaches based on the variance definitions of Thangjai et al. [10] and Nam and Kwon [11], one via the Chi-squared approach of Niwitpong [21], and one via the approximate fiducial approach of Hasan and Krishnamoorthy [12]. The classical confidence intervals were derived using formulas, whereas the Bayesian approach was based on a simulation technique.
For the difference between the coefficients of variation of log-normal distributions, three classical confidence intervals were compared with the Bayesian confidence interval: MOVER using the variance of Nam and Kwon [11], modified MOVER using the Chi-squared approach of Niwitpong [21], and the approximate fiducial approach with modified normal-based approximations. The results of this investigation were similar to those of Harvey and van der Merwe [7], Rao and D'Cunha [9], Thangjai and Niwitpong [14], and Thangjai et al. [22].

7-Conclusion
In this study, the Bayesian approach was used to construct confidence intervals for the coefficient of variation of a log-normal distribution and the difference between the coefficients of variation of two log-normal distributions, both of which were then compared with several classical approaches. For the first scenario, although the coverage probabilities of all of the confidence intervals were close to the nominal confidence level, the Bayesian confidence interval provided the shortest average length in all cases. For the difference between the coefficients of variation of two log-normal distributions, the Bayesian confidence interval was once again the best in terms of the coverage probability and average length.
The performances of all of the approaches were appraised by application to real PM10 data from the Nan and Chiang Mai provinces in Thailand. As with the results of the simulation study, the Bayesian approach was better than the others in terms of average length. Therefore, the Bayesian approach is recommended for constructing the confidence intervals for the coefficient of variation and the difference between the coefficients of variation of log-normal distributions.

8-2-Data Availability Statement
The data presented in this study are available on request from the corresponding author.

8-3-Funding
This research was funded by Faculty of Applied Science, King Mongkut's University of Technology North Bangkok. Grant No. 641079.

8-4-Conflicts of Interest
The author declares that there is no conflict of interests regarding the publication of this manuscript. In addition, the ethical issues, including plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, and redundancies have been completely observed by the authors.