Confidence Intervals for the Coefficient of Quartile Variation of a Zero-inflated Lognormal Distribution

There are many types of skewed distribution, one of which is the lognormal distribution, which is positively skewed; in practice, lognormal data may also contain true zero values. The coefficient of quartile variation (CQV) is a statistical tool used to measure the dispersion of skewed data with high kurtosis. The purpose of this study is to establish confidence and credible intervals for the CQV of a zero-inflated lognormal distribution. The proposed approaches are based on the fiducial generalized confidence interval (FGCI) and the Bayesian method. Coverage probabilities and expected lengths were used to evaluate the performance of the proposed approaches via Monte Carlo simulation. The simulation results show that the FGCI and the Bayesian approaches based on uniform and normal inverse Chi-squared priors performed well in terms of coverage probability and expected length, while the Bayesian approach based on Jeffreys' rule prior can be used as an alternative. In addition, red cod density data from a trawl survey in New Zealand are used to illustrate the performance of the proposed approaches.

Zero-inflated lognormal distributions have been used to model many types of data, such as diagnostic test charges [18], the concentration of air contaminants (airborne chlorine) at an industrial site in the United States [19], fish spotter data in fisheries research [20], red cod densities from trawl surveys in New Zealand [4,5], and rainfall measurements [21,22]. Furthermore, many researchers have constructed confidence intervals for the parameters of zero-inflated lognormal distributions. Zhou and Tu [23] suggested percentile-t bootstrap and likelihood approaches to establish confidence intervals for the mean of diagnostic test charge data that included zero values. Tian [24] recommended constructing confidence intervals for the mean using the concept of generalized variables. Tian and Wu [25] established confidence intervals for the mean based on adjusted signed log-likelihood ratio statistics and showed that the proposed approach was suitable in all cases examined. Fletcher [4] proposed three approaches for the mean: Aitchison's estimator, a modification of Cox's approach, and a profile likelihood interval; the profile likelihood interval performed best in terms of coverage probability. Wu and Hsieh [5] compared the generalized confidence interval (GCI) approach for the mean with Aitchison's, modified Land's, profile likelihood, maximum likelihood, and bootstrap approaches and found that the GCI was appropriate in all cases. Li et al. [26] presented generalized pivotal quantity and fiducial approaches for interval estimation of the mean; their results demonstrated that the fiducial approach was suitable under all circumstances examined. Recently, Hasan and Krishnamoorthy [27] established confidence intervals for the mean based on the fiducial and MOVER approaches and reported that the former performed best for small sample sizes. Moreover, Maneerat et al. [22,28,29] recommended Bayesian approaches for constructing confidence intervals for the mean and its functions. Likewise, Yosboonruang et al. [21,30] showed that Bayesian approaches are reasonable for establishing confidence intervals for the coefficient of variation (CV) and for the difference between the CVs of zero-inflated lognormal data.
So far, no published study has addressed interval estimation for the coefficient of quartile variation (CQV) of a highly skewed zero-inflated lognormal distribution. Herein, we propose fiducial generalized confidence interval (FGCI) and Bayesian approaches to construct confidence and credible intervals for the CQV of a zero-inflated lognormal distribution. We evaluate their performance via simulation and an empirical study, followed by a discussion and conclusions.

2-Methods
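Let $\mathbf{X}=(X_1,X_2,\ldots,X_i,\ldots,X_n)$ be a non-negative random sample from a zero-inflated lognormal distribution. Suppose that the $n_1$ non-zero observations follow a lognormal distribution, that is, $Y_i=\ln(X_i)\sim N(\mu,\sigma^2)$ for $i=1,2,\ldots,n_1$, and that the number of non-zero observations follows a binomial distribution, $n_1\sim B(n,\delta)$, where $\delta$ is the probability of a non-zero observation. The proportion of zeros, $1-\delta$, is estimated by $n_0/n=(n-n_1)/n$, and $\delta$ is estimated by $\hat{\delta}=n_1/n$. The probability density function of a zero-inflated lognormal distribution can be expressed as

$$f(x;\delta,\mu,\sigma^2)=\begin{cases}1-\delta, & x=0,\\[4pt] \dfrac{\delta}{x\sigma\sqrt{2\pi}}\exp\left\{-\dfrac{(\ln x-\mu)^2}{2\sigma^2}\right\}, & x>0.\end{cases}$$

As proposed by Aitchison [31], the mean and variance of $X$ can be written as

$$E(X)=\delta\exp\left(\mu+\frac{\sigma^2}{2}\right)\quad\text{and}\quad \operatorname{Var}(X)=\delta\exp(2\mu+\sigma^2)\left[\exp(\sigma^2)-\delta\right],$$

respectively. In this study, we focus on the CQV based on the lower and upper quartiles ($Q_1$ and $Q_3$), which is defined as

$$\text{CQV}=\frac{Q_3-Q_1}{Q_3+Q_1},\tag{1}$$

where $Q_1$ and $Q_3$ are the 25th and 75th percentiles, respectively.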
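As an illustration of the model above, the following minimal Python sketch (standard library only) draws a zero-inflated lognormal sample, in which each value is zero with probability $1-\delta$ and otherwise equals $\exp(Y)$ with $Y\sim N(\mu,\sigma^2)$. It then compares the sample mean and variance with Aitchison's formulas $\delta e^{\mu+\sigma^2/2}$ and $\delta e^{2\mu+\sigma^2}(e^{\sigma^2}-\delta)$. The function names and the demonstration parameter values are hypothetical choices for this example; the study itself used R.

```python
import math
import random
from statistics import fmean, pvariance


def sample_ziln(rng, n, delta, mu, sigma2):
    """Draw n values: 0.0 with probability 1 - delta, otherwise exp(N(mu, sigma2))."""
    sigma = math.sqrt(sigma2)
    return [math.exp(rng.gauss(mu, sigma)) if rng.random() < delta else 0.0
            for _ in range(n)]


def ziln_mean(delta, mu, sigma2):
    """Aitchison's mean of a zero-inflated lognormal variable."""
    return delta * math.exp(mu + sigma2 / 2)


def ziln_variance(delta, mu, sigma2):
    """Aitchison's variance of a zero-inflated lognormal variable."""
    return delta * math.exp(2 * mu + sigma2) * (math.exp(sigma2) - delta)


if __name__ == "__main__":
    rng = random.Random(2022)  # fixed seed for a reproducible illustration
    x = sample_ziln(rng, 200_000, delta=0.85, mu=0.0, sigma2=0.5)
    print("mean:", fmean(x), "vs", ziln_mean(0.85, 0.0, 0.5))
    print("variance:", pvariance(x), "vs", ziln_variance(0.85, 0.0, 0.5))
```

With a sample this large, each pair of printed numbers should agree to within Monte Carlo error.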
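According to Hasan and Krishnamoorthy [27], the quartiles can be defined as

$$Q_r=\exp\left\{\mu+\sigma\,\Phi^{-1}(p_r)\right\},\qquad p_r=1-\frac{4-r}{4\delta},\quad r=1,3,$$

where $\Phi$ is the standard normal distribution function and $Q_r=0$ whenever $p_r\le 0$. Therefore, we can represent $Q_1$ and $Q_3$ in Equation (1) as

$$Q_1=\exp\left\{\mu+\sigma\,\Phi^{-1}\!\left(1-\frac{3}{4\delta}\right)\right\}\tag{2}$$

and

$$Q_3=\exp\left\{\mu+\sigma\,\Phi^{-1}\!\left(1-\frac{1}{4\delta}\right)\right\},\tag{3}$$

respectively. Note that $Q_1>0$ only when $\delta>0.75$; otherwise $Q_1=0$ and the CQV equals 1. In the following sub-sections, we present the methods used to establish the confidence interval for the CQV of a zero-inflated lognormal distribution.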
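The quartile formulas can be checked numerically. The sketch below is a hypothetical Python illustration rather than the authors' code: it evaluates $Q_r$ with `statistics.NormalDist().inv_cdf` as $\Phi^{-1}$, returns $Q_r=0$ when $1-(4-r)/(4\delta)\le 0$, and compares the result with the empirical quartiles of a large simulated sample. Because $Q_1$ and $Q_3$ share the factor $e^{\mu}$, the CQV does not depend on $\mu$, so changing `mu` leaves the printed CQV unchanged.

```python
import math
import random
from statistics import NormalDist, quantiles

PHI_INV = NormalDist().inv_cdf  # standard normal quantile function


def ziln_quartile(r, delta, mu, sigma2):
    """r-th quartile (r = 1 or 3) of a zero-inflated lognormal distribution."""
    p = 1 - (4 - r) / (4 * delta)
    if p <= 0:
        return 0.0  # at least r/4 of the probability mass sits at zero
    return math.exp(mu + math.sqrt(sigma2) * PHI_INV(p))


def ziln_cqv(delta, mu, sigma2):
    """CQV = (Q3 - Q1) / (Q3 + Q1); assumes delta > 0.25 so that Q3 > 0."""
    q1 = ziln_quartile(1, delta, mu, sigma2)
    q3 = ziln_quartile(3, delta, mu, sigma2)
    return (q3 - q1) / (q3 + q1)


if __name__ == "__main__":
    rng = random.Random(11)
    delta, mu, sigma2 = 0.9, 1.0, 1.0  # hypothetical parameter values
    x = [math.exp(rng.gauss(mu, math.sqrt(sigma2))) if rng.random() < delta else 0.0
         for _ in range(200_000)]
    q1_emp, _, q3_emp = quantiles(x, n=4)  # empirical quartile cut points
    print("Q1:", q1_emp, "vs", ziln_quartile(1, delta, mu, sigma2))
    print("Q3:", q3_emp, "vs", ziln_quartile(3, delta, mu, sigma2))
    print("CQV:", ziln_cqv(delta, mu, sigma2), "and with mu = 5:", ziln_cqv(delta, 5.0, sigma2))
```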

2-1-The FGCI Method
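The idea of fiducial inference was first suggested by Fisher [32], after which Hannig et al. [12] illustrated methods to construct the fiducial generalized pivotal quantity (FGPQ). Subsequently, Hannig [33] used Fisher's fiducial concept to derive a generalized fiducial recipe that generalizes the FGPQ. Recall that $X$ follows a zero-inflated lognormal distribution. From Equations 1 to 3, the parameters of interest are $\delta$, $\mu$, and $\sigma^2$, and thus their FGPQs are required. According to Li et al. [26] and Hasan and Krishnamoorthy [27], the respective FGPQs for $\delta$, $\mu$, and $\sigma^2$ can be expressed as

$$R_\delta\sim\text{Beta}\left(n_1+\tfrac{1}{2},\,n_0+\tfrac{1}{2}\right),\qquad R_\mu=\bar{y}-Z\sqrt{\frac{R_{\sigma^2}}{n_1}},$$

and

$$R_{\sigma^2}=\frac{(n_1-1)s^2}{U},$$

where $\bar{y}$ and $s^2$ are the sample mean and variance of $\ln x_i$ over the $n_1$ non-zero observations, and $Z\sim N(0,1)$ and $U\sim\chi^2_{n_1-1}$ are independent. Subsequently, substituting these FGPQs into Equations 2 and 3, the FGPQ for the CQV can be expressed as

$$R_{\text{CQV}}=\frac{R_{Q_3}-R_{Q_1}}{R_{Q_3}+R_{Q_1}},\qquad R_{Q_r}=\exp\left\{R_\mu+\sqrt{R_{\sigma^2}}\,\Phi^{-1}\!\left(1-\frac{4-r}{4R_\delta}\right)\right\},\quad r=1,3,$$

with $R_{Q_r}=0$ whenever the argument of $\Phi^{-1}$ is non-positive. The $100(1-\alpha)\%$ FGCI for the CQV is given by the $\alpha/2$ and $1-\alpha/2$ percentiles of $R_{\text{CQV}}$.

8. Repeat Steps 1 to 7 10,000 times.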
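A minimal Python sketch of the FGCI computation follows. It draws $m$ values of each FGPQ, converts them into FGPQ values of the CQV, and returns the equal-tailed percentile interval. It assumes $R_\delta\sim\text{Beta}(n_1+1/2,\,n_0+1/2)$, $R_{\sigma^2}=(n_1-1)s^2/U$ with $U\sim\chi^2_{n_1-1}$, and $R_\mu=\bar{y}-Z\sqrt{R_{\sigma^2}/n_1}$ with $Z\sim N(0,1)$. The beta form for $R_\delta$, the function names, and the default $m=2000$ are assumptions of this illustration rather than details confirmed by the text.

```python
import math
import random
from statistics import NormalDist, fmean, variance

PHI_INV = NormalDist().inv_cdf


def cqv_from_params(delta, mu, sigma2):
    """CQV of a zero-inflated lognormal; Q1 is 0 when delta <= 0.75 (assumes delta > 0.25)."""
    sigma = math.sqrt(sigma2)

    def quartile(r):
        p = 1 - (4 - r) / (4 * delta)
        return 0.0 if p <= 0 else math.exp(mu + sigma * PHI_INV(p))

    q1, q3 = quartile(1), quartile(3)
    return (q3 - q1) / (q3 + q1)


def fgci_cqv(x, m=2000, alpha=0.05, seed=None):
    """Equal-tailed percentile interval from m FGPQ draws of the CQV (illustrative sketch)."""
    rng = random.Random(seed)
    y = [math.log(v) for v in x if v > 0]  # log of the non-zero observations
    n1, n0 = len(y), len(x) - len(y)
    ybar, s2 = fmean(y), variance(y)       # variance() uses the n1 - 1 divisor
    draws = []
    for _ in range(m):
        r_delta = rng.betavariate(n1 + 0.5, n0 + 0.5)      # assumed FGPQ for delta
        u = 2.0 * rng.gammavariate((n1 - 1) / 2, 1.0)      # U ~ chi-square(n1 - 1)
        r_sigma2 = (n1 - 1) * s2 / u
        r_mu = ybar - rng.gauss(0.0, 1.0) * math.sqrt(r_sigma2 / n1)
        draws.append(cqv_from_params(r_delta, r_mu, r_sigma2))
    draws.sort()
    return draws[round(alpha / 2 * (m - 1))], draws[round((1 - alpha / 2) * (m - 1))]


if __name__ == "__main__":
    rng = random.Random(42)
    x = [math.exp(rng.gauss(1.0, 1.0)) if rng.random() < 0.9 else 0.0 for _ in range(50)]
    print("95% FGCI for the CQV:", fgci_cqv(x, seed=1))
```

Applying `fgci_cqv` to each simulated dataset and recording whether the interval contains the true CQV yields one Monte Carlo estimate of coverage.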

2-2-The Bayesian Approach
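Since $X$ follows a mixed distribution comprising binomial and lognormal components, the joint likelihood function can be written as

$$L(\delta,\mu,\sigma^2\mid\mathbf{x})=(1-\delta)^{n_0}\,\delta^{n_1}\prod_{i=1}^{n_1}\frac{1}{x_i\sigma\sqrt{2\pi}}\exp\left\{-\frac{(\ln x_i-\mu)^2}{2\sigma^2}\right\},\tag{8}$$

where the product runs over the $n_1$ non-zero observations. The Fisher information matrix, obtained from the negative expected second derivatives of the log-likelihood function, is

$$I(\delta,\mu,\sigma^2)=\begin{bmatrix}\dfrac{n}{\delta(1-\delta)}&0&0\\[4pt]0&\dfrac{n\delta}{\sigma^2}&0\\[4pt]0&0&\dfrac{n\delta}{2\sigma^4}\end{bmatrix}.$$

Three different priors used to construct credible intervals are presented as follows.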
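The joint likelihood of the zero-inflated lognormal model, $(1-\delta)^{n_0}\delta^{n_1}$ times the lognormal density of each non-zero observation, translates directly into code. The hypothetical Python sketch below evaluates its logarithm together with the closed-form maximizers $\hat{\delta}=n_1/n$, $\hat{\mu}=\bar{y}$, and $\hat{\sigma}^2=\sum(\ln x_i-\bar{y})^2/n_1$; note that the maximum likelihood variance uses the divisor $n_1$, not $n_1-1$.

```python
import math
from statistics import fmean


def ziln_loglik(x, delta, mu, sigma2):
    """Log-likelihood: binomial part plus the lognormal log-density of each non-zero x."""
    logs = [math.log(v) for v in x if v > 0]
    n1, n0 = len(logs), len(x) - len(logs)
    ll = n1 * math.log(delta) + (n0 * math.log(1 - delta) if n0 else 0.0)
    for y in logs:
        # log[1 / (x sigma sqrt(2 pi)) * exp(-(ln x - mu)^2 / (2 sigma2))] with y = ln x
        ll += -y - 0.5 * math.log(2 * math.pi * sigma2) - (y - mu) ** 2 / (2 * sigma2)
    return ll


def ziln_mle(x):
    """Closed-form maximum likelihood estimates (delta, mu, sigma2)."""
    logs = [math.log(v) for v in x if v > 0]
    ybar = fmean(logs)
    return len(logs) / len(x), ybar, fmean([(y - ybar) ** 2 for y in logs])


if __name__ == "__main__":
    x = [0.0, 0.0, 1.0, 2.0, 5.0, 3.0, 0.5, 8.0, 0.0, 1.5]  # toy data
    est = ziln_mle(x)
    print("MLE:", est, "log-likelihood:", ziln_loglik(x, *est))
```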

Jeffreys' Rule Prior
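Based on the Fisher information matrix, Jeffreys' rule priors for the unknown parameters of the binomial and lognormal distributions are derived from the square root of the determinant of the corresponding Fisher information. Therefore, following Harvey and van der Merwe's method [34], the Jeffreys' rule prior for $\delta$ from a binomial distribution is $p(\delta)\propto\delta^{-1/2}(1-\delta)^{-1/2}$. For the parameters $\mu$ and $\sigma^2$ of a lognormal distribution, the Jeffreys' rule prior becomes $p(\mu,\sigma^2)\propto 1/\sigma^3$, and so the Jeffreys' rule prior density function for a zero-inflated lognormal distribution is

$$p(\delta,\mu,\sigma^2)\propto\delta^{-1/2}(1-\delta)^{-1/2}\sigma^{-3}.\tag{9}$$

By combining the likelihood function in Equation 8 and the prior density function in Equation 9, we can express the joint posterior density function as

$$p(\delta,\mu,\sigma^2\mid\mathbf{x})\propto\delta^{n_1-1/2}(1-\delta)^{n_0-1/2}\,(\sigma^2)^{-(n_1+3)/2}\exp\left\{-\frac{(n_1-1)s^2+n_1(\bar{y}-\mu)^2}{2\sigma^2}\right\},\tag{10}$$

where $\bar{y}$ and $s^2$ are the sample mean and variance of $\ln x_i$ over the $n_1$ non-zero observations. Moreover, by integrating the function in Equation 10 with respect to the other parameters, we can obtain the respective posterior distributions of $\delta$, $\mu$, and $\sigma^2$ as follows:

$$\delta\mid\mathbf{x}\sim\text{Beta}\left(n_1+\tfrac{1}{2},\,n_0+\tfrac{1}{2}\right),\tag{11}$$

$$\frac{n_1(\mu-\bar{y})}{\sqrt{(n_1-1)s^2}}\;\Big|\;\mathbf{x}\sim t_{n_1},\tag{12}$$

and

$$\sigma^2\mid\mathbf{x}\sim\text{IG}\left(\frac{n_1}{2},\,\frac{(n_1-1)s^2}{2}\right).\tag{13}$$

Thus, we can obtain posterior draws of the CQV by substituting draws of $\delta$, $\mu$, and $\sigma^2$ into Equations 2 and 3 and then into Equation 1.

The Uniform Prior
Since the uniform prior density function for the binomial and lognormal distributions is constant [35,36], the same holds for a zero-inflated lognormal distribution (i.e., $p(\delta,\mu,\sigma^2)\propto 1$).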
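For the Jeffreys' rule and uniform priors, a credible interval for the CQV can be sketched by Monte Carlo sampling from the posterior. Under the Jeffreys' rule prior, the sketch assumes $p(\delta,\mu,\sigma^2)\propto\delta^{-1/2}(1-\delta)^{-1/2}\sigma^{-3}$, which gives $\delta\mid\mathbf{x}\sim\text{Beta}(n_1+1/2,\,n_0+1/2)$ and $\sigma^2\mid\mathbf{x}\sim\text{IG}(n_1/2,\,(n_1-1)s^2/2)$; under the uniform prior $p\propto 1$, the same integration gives $\text{Beta}(n_1+1,\,n_0+1)$ and $\text{IG}((n_1-3)/2,\,(n_1-1)s^2/2)$. In both cases $\mu\mid\sigma^2,\mathbf{x}\sim N(\bar{y},\sigma^2/n_1)$, so drawing $\sigma^2$ first and then $\mu$ yields joint posterior draws. These posterior forms are our derivation under the stated priors; the equal-tailed interval, function names, and defaults are assumptions of this illustration.

```python
import math
import random
from statistics import NormalDist, fmean, variance

PHI_INV = NormalDist().inv_cdf


def cqv(delta, mu, sigma2):
    """(Q3 - Q1) / (Q3 + Q1) for a zero-inflated lognormal; Q1 = 0 when delta <= 0.75."""
    sigma = math.sqrt(sigma2)

    def quartile(r):
        p = 1 - (4 - r) / (4 * delta)
        return 0.0 if p <= 0 else math.exp(mu + sigma * PHI_INV(p))

    q1, q3 = quartile(1), quartile(3)
    return (q3 - q1) / (q3 + q1)


# prior name -> (offset added to both beta shapes, inverse-gamma shape as a function of n1)
POSTERIOR_FORMS = {
    "jeffreys": (0.5, lambda n1: n1 / 2),       # prior ~ delta^-1/2 (1-delta)^-1/2 sigma^-3
    "uniform": (1.0, lambda n1: (n1 - 3) / 2),  # prior ~ 1; needs n1 > 3
}


def credible_interval_cqv(x, prior="jeffreys", m=2000, alpha=0.05, seed=None):
    """Equal-tailed 100(1 - alpha)% credible interval for the CQV from m posterior draws."""
    rng = random.Random(seed)
    offset, shape_of = POSTERIOR_FORMS[prior]
    y = [math.log(v) for v in x if v > 0]
    n1, n0 = len(y), len(x) - len(y)
    shape = shape_of(n1)
    if shape <= 0:
        raise ValueError("too few non-zero observations for this prior")
    ybar = fmean(y)
    half_ss = (n1 - 1) * variance(y) / 2  # inverse-gamma scale (n1 - 1) s^2 / 2
    draws = []
    for _ in range(m):
        d = rng.betavariate(n1 + offset, n0 + offset)
        s2 = half_ss / rng.gammavariate(shape, 1.0)  # inverse-gamma draw
        mu = rng.gauss(ybar, math.sqrt(s2 / n1))     # mu | sigma2 ~ N(ybar, sigma2 / n1)
        draws.append(cqv(d, mu, s2))
    draws.sort()
    return draws[round(alpha / 2 * (m - 1))], draws[round((1 - alpha / 2) * (m - 1))]


if __name__ == "__main__":
    rng0 = random.Random(3)
    x = [math.exp(rng0.gauss(1.0, 1.0)) if rng0.random() < 0.9 else 0.0 for _ in range(50)]
    for prior in POSTERIOR_FORMS:
        print(prior, credible_interval_cqv(x, prior=prior, seed=5))
```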

The Normal Inverse Chi-squared Prior
The normal inverse Chi-squared prior was first applied by Maneerat et al. [22] to establish the highest posterior density interval for the delta-lognormal mean. Following this concept, the joint posterior density function is obtained by combining the likelihood function in Equation 8 with the normal inverse Chi-squared prior. Accordingly, by integrating in the same way as for Equations 11 to 13, the respective posterior distributions of $\delta$, $\mu$, and $\sigma^2$ are beta, Student's t, and inverse Chi-squared distributions.

7. Repeat Steps 1 to 6 10,000 times.
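For the normal inverse Chi-squared prior, the standard conjugate update for the log-transformed non-zero data can be sketched as follows. With hyperparameters $(\mu_0,\kappa_0,\nu_0,\sigma_0^2)$, the posterior has $\kappa_n=\kappa_0+n_1$, $\mu_n=(\kappa_0\mu_0+n_1\bar{y})/\kappa_n$, $\nu_n=\nu_0+n_1$, and $\nu_n\sigma_n^2=\nu_0\sigma_0^2+(n_1-1)s^2+\kappa_0 n_1(\bar{y}-\mu_0)^2/\kappa_n$, so $\sigma^2$ is scaled inverse Chi-squared and $\mu$ is marginally Student's t. The hyperparameter values, the hypothetical $\text{Beta}(a_0,b_0)$ prior for $\delta$, and the function names are assumptions; the text does not state the values used in the study.

```python
import math
import random
from statistics import fmean


def nix_update(y, mu0, kappa0, nu0, s0sq):
    """Conjugate normal inverse Chi-squared update for log non-zero data y."""
    n1 = len(y)
    ybar = fmean(y)
    ss = sum((v - ybar) ** 2 for v in y)  # equals (n1 - 1) * s^2
    kappa_n = kappa0 + n1
    mu_n = (kappa0 * mu0 + n1 * ybar) / kappa_n
    nu_n = nu0 + n1
    s_nsq = (nu0 * s0sq + ss + kappa0 * n1 * (ybar - mu0) ** 2 / kappa_n) / nu_n
    return mu_n, kappa_n, nu_n, s_nsq


def nix_draw(rng, mu_n, kappa_n, nu_n, s_nsq):
    """Joint draw: sigma2 ~ Scaled-Inv-chi2(nu_n, s_nsq), then mu | sigma2 ~ N(mu_n, sigma2 / kappa_n)."""
    chi2 = 2.0 * rng.gammavariate(nu_n / 2, 1.0)  # chi-square(nu_n) draw
    sigma2 = nu_n * s_nsq / chi2
    mu = rng.gauss(mu_n, math.sqrt(sigma2 / kappa_n))
    return mu, sigma2


def delta_draw(rng, n1, n0, a0=0.5, b0=0.5):
    """Beta posterior draw for delta under a hypothetical Beta(a0, b0) prior."""
    return rng.betavariate(n1 + a0, n0 + b0)
```

Each joint draw of $(\delta,\mu,\sigma^2)$ can then be substituted into the quartile formulas to obtain a posterior draw of the CQV.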
The research methodology of this study is shown in Figure 1.

3-1-Simulation Studies
The coverage probabilities and expected lengths of the confidence intervals under different scenarios were investigated via Monte Carlo simulation using the R statistical program. For a given scenario, the best-performing method is the one whose coverage probability is greater than or equal to the nominal confidence level of 0.95 and whose expected length is the shortest. The proposed FGCI and Bayesian approaches based on Jeffreys' rule, uniform, and normal inverse Chi-squared priors were tested. For this simulation, we defined the sample sizes ($n$) as 15, 30, 50, …. The FGCI and Bayesian approaches were replicated 2,000 times for 10,000 simulation runs. The results in Table 1 show that the coverage probabilities of FGCI were greater than or close to the nominal confidence level of 0.95 in all cases with $\delta\ge 0.85$. Likewise, the Bayesian methods based on the uniform and normal inverse Chi-squared priors performed similarly to FGCI in terms of coverage probability for $\delta$ = 0.85 and 0.90. For $\delta$ = 0.95, the coverage probabilities of the Bayesian-uniform prior method were greater than those of the other methods and close to the target in all cases with $n\ge 30$; the Bayesian-Jeffreys' rule prior method showed the same pattern when $\delta$ = 0.90. Moreover, the coverage probabilities exceeded 0.95 for the Bayesian-normal inverse Chi-squared prior method when $\delta$ = 0.95 and $n$ = 50 or 100, and for the Bayesian-Jeffreys' rule prior method when $\delta$ = 0.85 or 0.95 and $n$ = 50 or 100. For $\delta$ = 0.80, the coverage probabilities exceeded 0.95 for all sample sizes when $\sigma^2$ = 10 for the Bayesian-Jeffreys' rule prior method and when $\sigma^2$ = 5 or 10 for the Bayesian-uniform and Bayesian-normal inverse Chi-squared prior methods, whereas FGCI reached this level only for $n$ = 100.
In terms of the expected length, the Bayesian-normal inverse Chi-squared prior method had the shortest expected lengths in almost all cases with $\delta$ = 0.80 or 0.85, the exception being $\delta$ = 0.85, $n$ = 100, and $\sigma^2\le 3$, for which the Bayesian-Jeffreys' rule prior method had the shortest expected lengths. For $\delta$ = 0.90, the Bayesian-normal inverse Chi-squared prior method produced the shortest expected lengths when $\sigma^2\ge 2$ and $n$ = 15 or 30, the Bayesian-Jeffreys' rule prior method had the shortest expected lengths for $n$ = 50 or 100 and $\sigma^2\le 5$, and the Bayesian-normal inverse Chi-squared prior method was shortest in the remaining cases. For $\delta$ = 0.95, FGCI had the shortest expected lengths in almost all cases with $n$ = 15 or 30, while the Bayesian-normal inverse Chi-squared prior method was shortest for $n$ = 50 and $\sigma^2\ge 3$. Moreover, for $n$ = 100, the Bayesian-Jeffreys' rule prior method was shortest when $\sigma^2$ = 2, 3, or 5, and the Bayesian-normal inverse Chi-squared prior method was shortest when $\sigma^2$ = 1 or 10. Nevertheless, the expected lengths of the FGCI and Bayesian methods differed only slightly. Summaries of the coverage probabilities and expected lengths of the proposed methods are shown in Figures 2 and 3, respectively.
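The two performance measures can be estimated with a short Monte Carlo loop. In the hypothetical Python sketch below, `interval_method` is any function that maps a sample to a (lower, upper) pair, such as an FGCI or credible-interval routine. Coverage probability is the fraction of runs whose interval contains the true CQV, and expected length is the average interval width. The true CQV uses the identity $(Q_3-Q_1)/(Q_3+Q_1)=\tanh\{\sigma[\Phi^{-1}(1-\tfrac{1}{4\delta})-\Phi^{-1}(1-\tfrac{3}{4\delta})]/2\}$, which holds for $0.75<\delta<1$ because the factor $e^{\mu}$ cancels. The run count and seed defaults are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

PHI_INV = NormalDist().inv_cdf


def true_cqv(delta, sigma2):
    """CQV of a zero-inflated lognormal; depends only on delta and sigma2 (0.75 < delta < 1)."""
    z1 = PHI_INV(1 - 3 / (4 * delta))
    z3 = PHI_INV(1 - 1 / (4 * delta))
    return math.tanh(math.sqrt(sigma2) * (z3 - z1) / 2)


def sample_ziln(rng, n, delta, mu, sigma2):
    """Zero with probability 1 - delta, otherwise exp(N(mu, sigma2))."""
    sigma = math.sqrt(sigma2)
    return [math.exp(rng.gauss(mu, sigma)) if rng.random() < delta else 0.0 for _ in range(n)]


def coverage_and_length(interval_method, n, delta, mu, sigma2, runs=1000, seed=1):
    """Monte Carlo estimates of coverage probability and expected length."""
    rng = random.Random(seed)
    truth = true_cqv(delta, sigma2)
    hits, total_length = 0, 0.0
    for _ in range(runs):
        lo, hi = interval_method(sample_ziln(rng, n, delta, mu, sigma2))
        hits += lo <= truth <= hi   # True counts as 1
        total_length += hi - lo
    return hits / runs, total_length / runs
```

For example, `coverage_and_length(my_method, n=15, delta=0.80, mu=0.0, sigma2=1.0)` would estimate both measures for one scenario, where `my_method` is a placeholder for an interval routine.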

3-2-An Example Using Real Data
To illustrate the performance of the proposed approaches, we used data on red cod density (kg/km²) from a trawl survey conducted by the National Institute of Water & Atmospheric Research in New Zealand. The dataset contains records from 67 trawls, 13 of which contained no red cod; the remaining 54 trawls recorded positive densities. The positive data are positively skewed, and their log-transformed values are approximately normally distributed, as shown in Figures 4 and 5, respectively. The minimum Akaike information criterion (AIC) was used to select the distribution of the positive data; the results in Table 2 indicate that the lognormal distribution fits best. Accordingly, the red cod densities from the 67 trawls conform to a zero-inflated lognormal distribution. The summary statistics for these data are $n$ = 67, $n_1$ = 54, $\hat{\delta}$ = 0.81, $\hat{\mu}$ = 4.8636, $\hat{\sigma}^2$ = 1.4854, and $\widehat{\text{CQV}}$ = 0.8348. The 95% confidence intervals and credible intervals for the CQV of the red cod densities are reported in Table 3. Similar to the findings of the simulation study, the Bayesian-uniform prior method performed the best in terms of interval length.
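As a quick arithmetic check, the hypothetical sketch below plugs the reported summary statistics ($n$ = 67, $n_1$ = 54, $\hat{\mu}$ = 4.8636, $\hat{\sigma}^2$ = 1.4854) into Equations 1 to 3. It uses the unrounded $\hat{\delta}=54/67\approx 0.806$; with this value the point estimate is about 0.8348, matching the reported CQV estimate, whereas the rounded value 0.81 would give about 0.83. This check is an illustration and not part of the original analysis.

```python
import math
from statistics import NormalDist

PHI_INV = NormalDist().inv_cdf  # standard normal quantile function

n, n1 = 67, 54          # total trawls and trawls with red cod
mu_hat = 4.8636         # reported mean of log positive densities
sigma2_hat = 1.4854     # reported variance of log positive densities
delta_hat = n1 / n      # unrounded estimate of the non-zero probability

sigma_hat = math.sqrt(sigma2_hat)
q1_hat = math.exp(mu_hat + sigma_hat * PHI_INV(1 - 3 / (4 * delta_hat)))  # Equation 2
q3_hat = math.exp(mu_hat + sigma_hat * PHI_INV(1 - 1 / (4 * delta_hat)))  # Equation 3
cqv_hat = (q3_hat - q1_hat) / (q3_hat + q1_hat)                           # Equation 1

print(f"delta_hat={delta_hat:.2f}  Q1={q1_hat:.2f}  Q3={q3_hat:.2f}  CQV={cqv_hat:.4f}")
```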

4-Discussion
As mentioned earlier, the CQV can be used to measure the dispersion of highly skewed, heavy-tailed distributions such as a zero-inflated lognormal distribution. Herein, interval estimation methods for the CQV of a zero-inflated lognormal distribution based on FGCI and the Bayesian approach were proposed. The simulation results showed that when the probability of non-zero values was 0.80 and the variance was large, the coverage probabilities of the Bayesian methods were close to or greater than the target. This can be attributed to the posterior draws of the variance tending to exceed the variance estimate, together with $\delta$ = 0.80 being close to 0.75, the value at which the argument of $\Phi^{-1}$ in Equation 2 reaches zero; both effects influence the quartiles used to compute the CQV. In the same setting, FGCI performed well only for large samples because the FGPQs of the parameters depend on the number of non-zero values. When the probability of non-zero values was greater than 0.80, the coverage probabilities of FGCI were stable and close to the target in all cases. The Bayesian methods based on the uniform and normal inverse Chi-squared priors behaved similarly when the probability of non-zero values was 0.85 or 0.90. In addition, the Bayesian-Jeffreys' rule prior method performed well in almost all cases with a large sample size, or with a large variance and a small sample size. The results of the empirical study using red cod density data from a trawl survey in New Zealand agreed with those of the simulation study: the Bayesian-uniform prior method is suitable because its interval covers the estimated CQV of the dataset and has the shortest width.

5-Conclusion
Herein, we proposed approaches based on FGCI and the Bayesian method to establish confidence intervals for the CQV of a zero-inflated lognormal distribution. The Bayesian approach used three priors: Jeffreys' rule, uniform, and normal inverse Chi-squared. Since the CQV is appropriate for highly skewed data, red cod density data from a trawl survey were used to illustrate the proposed approaches. The simulation studies show that the FGCI and Bayesian-uniform prior methods performed well for cases with a high proportion of non-zero values, with coverage probabilities greater than or close to the target. The Bayesian-normal inverse Chi-squared prior method performed similarly to the FGCI and Bayesian-uniform prior methods, except for cases with a high proportion of non-zero values together with a small sample size. Therefore, the FGCI and the Bayesian methods based on the uniform and normal inverse Chi-squared priors are suitable for constructing confidence intervals for the CQV of a zero-inflated lognormal distribution. Finally, the Bayesian-Jeffreys' rule prior method is suitable for large sample sizes and for some cases with a large variance.

6-2-Data Availability Statement
The dataset of red cod densities, taken from a fisheries trawl survey in New Zealand, was compiled in a research article by Fletcher (2008), available at https://doi.org/10.1007/s10651-007-0046-8.

6-3-Funding
The second author is grateful to King Mongkut's University of Technology North Bangkok for funding under Grant No. KMUTNB-FF-65-22.

6-4-Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this manuscript. In addition, ethical issues, including plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, and redundancies, have been completely observed by the authors.