The Quality of Urban Air in Barcelona: A New Approach Applying Compositional Data Analysis Methods

The main goal of this paper is to take some further steps towards improving the understanding and manageability of air quality. The quality of atmospheric air in large cities is a matter of great importance because of its impact on the environment and on the health of the population. Measures restricting private vehicle access to the centres of large cities, and other measures to prevent atmospheric air pollution, are currently topical, and knowledge of air quality is especially relevant for evaluating the impact of these major social and economic measures. There are many indices for expressing air quality; in fact, almost every country has its own, depending on its main pollutants. In general, these indices ignore the compositional nature of air pollutant concentrations, do not apply Compositional Data Analysis methods, and have other weak points such as the lack of a standardized scale. The methodology used here is therefore founded on Compositional Data Analysis. The proposed air quality index has an adequate correlation between input (concentrations) and output (air quality index), distinguishes between air pollution and air quality, and has a 0-100 reference scale that makes air quality easier to interpret and manage. To illustrate the proposed method, it is applied to a series of air pollution data (Barcelona, 2001-2015). The results show the effectiveness of the 2008 European directive on ambient air quality.


1-Introduction
Air pollution in cities, mainly in large cities or densely populated areas, is a burning issue that concerns citizens because of its impact on daily life and its consequences for people's health [1][2][3][4][5][6]. More than half of the planet's inhabitants live in urban areas, which is why, in large cities, the quality of the environment in general and of the air in particular is a problem that deserves special attention. The implications of atmospheric air pollution go far beyond people's health and affect the environment in general, the economy and the future of life on our planet [7]. The increasing demand for air quality thus makes it a priority in urban management: cities must take measures to keep air quality at levels adequate to avoid harm to the health of the population. It is therefore very important to have an adequate methodology for quantifying atmospheric air quality, so that decision makers can control it correctly.
Atmospheric air pollution is usually expressed by a numerical value called the "Air Quality Index" (AQI). This index is obtained from the concentrations of some air pollutants, usually O3, CO, NO2, SO2 and suspended particulate matter below a certain diameter: smaller than 2.5 microns (PM2.5) and smaller than 10 microns (PM10). Pollutants and particulate matter are expressed as concentrations, in units of mass per unit volume of air, usually µg/m3. There are several methodologies for expressing atmospheric air quality from air pollutant concentrations. The best known is the one proposed by the US Environmental Protection Agency (EPA), which is based on a piecewise linear function that transforms the concentrations into AQI values on a certain scale. However, all the methodologies applied so far handle air pollutant concentrations while ignoring their compositional nature, and therefore commit some errors, such as calculating arithmetic averages [8][9][10].
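The piecewise linear mapping used by EPA-style indices can be sketched as follows: within the breakpoint segment [C_lo, C_hi] that contains the concentration C, the index is I = (I_hi - I_lo)/(C_hi - C_lo)·(C - C_lo) + I_lo. In this minimal Python sketch the breakpoint table is hypothetical (it is not the official EPA table) and only a single pollutant is handled.

```python
# Piecewise linear concentration-to-index mapping in the style of the US EPA AQI.
# The breakpoint table is HYPOTHETICAL and only illustrates the mechanism.

# Segments (C_lo, C_hi, I_lo, I_hi): concentrations in ug/m3, index values unitless.
BREAKPOINTS = [
    (0.0, 12.0, 0.0, 50.0),
    (12.0, 35.0, 50.0, 100.0),
    (35.0, 55.0, 100.0, 150.0),
    (55.0, 150.0, 150.0, 200.0),
]


def piecewise_linear_index(c, table=BREAKPOINTS):
    """Interpolate linearly within the first segment whose upper bound is >= c."""
    if not table[0][0] <= c <= table[-1][1]:
        raise ValueError(f"concentration {c} is outside the breakpoint table")
    for c_lo, c_hi, i_lo, i_hi in table:
        if c <= c_hi:
            return (i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo
    raise AssertionError("unreachable: c was checked against the table range")


if __name__ == "__main__":
    for c in (6.0, 12.0, 20.0, 45.0):
        print(f"C = {c:5.1f} ug/m3 -> index {piecewise_linear_index(c):6.2f}")
```

With this table the slope drops from about 4.2 to about 2.2 index points per µg/m3 at the 12 µg/m3 breakpoint, which illustrates the abrupt slope changes inherent in a piecewise definition.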
Although other authors use index-based methodologies [11,12], we believe that our proposal, based on compositional data, is more consistent with the nature of the data under study. As far as we know, the most recent proposal for an Air Quality Index that takes into account the compositional nature of the data is the one developed in [10]. It is based on the concept of log-contrast and defines an air quality index (AQI*) as a function of the geometric mean of the concentrations of six air pollutants (O3, NO2, CO, SO2, PM10, PM2.5). The index is scaled from zero to 100 using a proportionality factor, according to the concentration values.
In the present work, an improvement of that model is proposed that takes into account the compositional nature of air pollutant concentrations, that is, that applies Compositional Data Analysis methods. One purpose is to provide an index that makes it clear that air pollution and air quality mean opposite things, which does not seem so clear in most existing AQIs. At the same time, the index has a different slope (derivative) in the low-pollution zone and in the high-pollution zone, which allows better discrimination in areas with low pollution. In addition to a global air quality index, the proposal provides an individual index for each pollutant, which makes it possible to detect dangerous individual pollutant levels that could otherwise go unnoticed in a global index. Finally, so that decision makers can use it in a reliable, simple and adequate way, the new index has a natural scale for expressing air quality values. Figure 1 shows the improvements offered by our proposed air quality index.

2-Definition of Air Quality Star Index
Air pollution data are usually given as a real N × (D+1) matrix M, where D is the number of pollutants and the additional column holds the time period. Its k-th row can therefore be written as

$$\left(t_k,\ c_1(t_k),\ c_2(t_k),\ \ldots,\ c_D(t_k)\right), \qquad (1)$$

where t_k is the k-th time period (day, week, month, …), k = 1, 2, …, N, N is the number of time periods, and c_i(t_k) is the concentration of the i-th air pollutant at time t_k (usually in µg/m3).

The air pollutants with the greatest impact on human health in urban environments are O3, NO2, PM10 and PM2.5 [13][14][15]. In this work we therefore consider four air pollutants (D = 4), with all other components of air grouped into the so-called "residual component" r(t_k). The k-th row we take is then

$$\left(t_k,\ c_1(t_k),\ c_2(t_k),\ c_3(t_k),\ c_4(t_k),\ r(t_k)\right), \qquad (2)$$

where c_1, …, c_4 are the concentrations of O3, NO2, PM10 and PM2.5.

A log-contrast (LC) is defined as a linear combination of the logarithms of the parts whose coefficients add up to zero [16]. If a log-contrast is computed from the air pollutants and the residual component, giving each pollutant a coefficient α_i and the residual the coefficient -α, where α = α_1 + α_2 + α_3 + α_4, then for any time t_k, k = 1, 2, …, N, the properties of the logarithm allow it to be written as

$$\mathrm{LC}(t_k) = \sum_{i=1}^{4} \alpha_i \ln c_i(t_k) - \alpha \ln r(t_k) = \ln\left(\prod_{i=1}^{4} c_i(t_k)^{\alpha_i}\right) - \alpha \ln r(t_k). \qquad (3)$$

The value of the residual (filling-up) component of air is almost never reported and is very difficult to compute, and the second term on the right-hand side of Equation 3 is almost constant [10]. Air pollution can therefore be expressed through the considered air pollutants alone, using Equation 3 or its approximation

$$\mathrm{LC}(t_k) \approx \ln f(t_k) + \text{const}, \qquad f(t_k) = \prod_{i=1}^{4} c_i(t_k)^{\alpha_i}. \qquad (4)$$

The exponents α_i can be used to emphasise the impact of a particular air pollutant, according to its effect on population health or other criteria. In this paper, the values adopted are those of [13], listed in the last row of Table 1. The global air pollution index (API) is then

$$\mathrm{API}(t_k) = K_{\mathrm{global}}\, f(t_k)^{\beta}. \qquad (5)$$

The exponent β makes the model nonlinear and thus gives the pollution curve a different shape in the low-pollution zone than in the high-pollution zone. The value used in this work is β = 1.25, which proved to be the most discriminating (see Figure 2). The multiplicative factor K_global is introduced to bring the air pollution index onto a scale from zero to 100.
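To illustrate why the residual term of the log-contrast can be treated as almost constant, the following sketch evaluates LC = Σ α_i ln c_i - (Σ α_i) ln r for two quite different months. It is a minimal Python example under stated assumptions: the weights α_i are hypothetical (the paper adopts the values of [13]), and the residual is taken as r = T - Σ c_i with T ≈ 1.2 × 10^9 µg/m3, an assumed total mass concentration of air (air density of roughly 1.2 kg/m3).

```python
import math

# HYPOTHETICAL pollutant weights alpha_i; the paper adopts the values of [13].
ALPHA = {"O3": 0.3, "NO2": 0.3, "PM10": 0.2, "PM2.5": 0.2}
# ASSUMED total mass concentration of air (about 1.2 kg/m3), in ug/m3.
AIR_TOTAL_UG_M3 = 1.2e9


def log_contrast(conc, alpha=ALPHA, total=AIR_TOTAL_UG_M3):
    """Return (LC, residual_term) for one time period.

    The pollutants get coefficients alpha_i and the residual r = total - sum(c_i)
    gets -sum(alpha_i), so all coefficients add up to zero.
    """
    r = total - sum(conc.values())
    a = sum(alpha[p] for p in conc)
    pollutant_term = sum(alpha[p] * math.log(c) for p, c in conc.items())
    residual_term = a * math.log(r)
    return pollutant_term - residual_term, residual_term


if __name__ == "__main__":
    month_a = {"O3": 40.0, "NO2": 60.0, "PM10": 30.0, "PM2.5": 15.0}
    month_b = {"O3": 80.0, "NO2": 20.0, "PM10": 50.0, "PM2.5": 25.0}
    for name, month in (("A", month_a), ("B", month_b)):
        lc, res = log_contrast(month)
        print(f"month {name}: LC = {lc:.4f}, residual term = {res:.8f}")
```

Because the pollutants are a tiny fraction of T, changing them barely changes ln r, so differences in LC between months are driven almost entirely by the pollutant term Σ α_i ln c_i.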
The value of the multiplicative factor K_global depends on the value of the exponent β and on the maximum pollutant concentrations adopted as maximum admissible pollution [14,17]. An example is shown in Table 1, for which K_global = 100/223.8.
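The role of K_global can be illustrated with a short sketch. It assumes that the global pollution index has the form API = K_global · f^β with f = ∏ c_i^α_i, and chooses K_global = 100 / f_max^β so that API equals exactly 100 when every pollutant is at its maximum admissible concentration. The weights and maximum concentrations below are hypothetical placeholders, not the values of Table 1.

```python
import math

BETA = 1.25  # exponent used in the paper

# HYPOTHETICAL weights and maximum admissible concentrations (ug/m3), for illustration.
ALPHA = {"O3": 0.3, "NO2": 0.3, "PM10": 0.2, "PM2.5": 0.2}
C_MAX = {"O3": 240.0, "NO2": 400.0, "PM10": 100.0, "PM2.5": 50.0}


def weighted_product(conc, alpha=ALPHA):
    """f = prod_i c_i ** alpha_i (assumed form of the pollutant term)."""
    return math.prod(conc[p] ** alpha[p] for p in alpha)


def k_global(c_max=C_MAX, alpha=ALPHA, beta=BETA):
    """Scaling factor that maps the maximum admissible pollution to API = 100."""
    return 100.0 / weighted_product(c_max, alpha) ** beta


if __name__ == "__main__":
    fmax_beta = weighted_product(C_MAX) ** BETA
    print(f"f_max**beta = {fmax_beta:.1f}, so K_global = 100/{fmax_beta:.1f} = {k_global():.4f}")
```

Under this assumed form, the value K_global = 100/223.8 quoted above would correspond to f_max^β = 223.8 for the concentrations and exponents of Table 1.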
Observe that API increases when the air pollutant concentrations increase, that is, the trend of this index is consistent with its name.
Once API has been defined, the definition of the Air Quality Index becomes simple:

$$\mathrm{AQI}^{*}(t_k) = 100 - \mathrm{API}(t_k).$$

Observe that AQI* decreases when API increases. This corresponds to the intuitive meaning of the word "quality": the lower the value of AQI*, the lower the quality of the atmospheric air, and vice versa.
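The relation between API and AQI* can be sketched as below. This minimal Python example assumes API = K_global · (∏ c_i^α_i)^β and AQI* = 100 - API; the weights, the maximum concentrations and the resulting K_global are hypothetical.

```python
import math

BETA = 1.25
# HYPOTHETICAL weights and maximum admissible concentrations (ug/m3).
ALPHA = {"O3": 0.3, "NO2": 0.3, "PM10": 0.2, "PM2.5": 0.2}
C_MAX = {"O3": 240.0, "NO2": 400.0, "PM10": 100.0, "PM2.5": 50.0}
K_GLOBAL = 100.0 / math.prod(C_MAX[p] ** ALPHA[p] for p in ALPHA) ** BETA


def api(conc, alpha=ALPHA, k=K_GLOBAL, beta=BETA):
    """Global air pollution index: grows with every pollutant concentration."""
    f = math.prod(conc[p] ** alpha[p] for p in alpha)
    return k * f ** beta


def aqi_star(conc, **kwargs):
    """Global air quality index on the 0-100 scale (assumed AQI* = 100 - API)."""
    return 100.0 - api(conc, **kwargs)


if __name__ == "__main__":
    month = {"O3": 40.0, "NO2": 30.0, "PM10": 20.0, "PM2.5": 10.0}
    worse = dict(month, NO2=90.0)  # same month with three times more NO2
    print(f"API = {api(month):.2f}, AQI* = {aqi_star(month):.2f}")
    print(f"with more NO2: API = {api(worse):.2f}, AQI* = {aqi_star(worse):.2f}")
```

Because AQI* is 100 minus an increasing function of the concentrations, any rise in a pollutant lowers AQI*, and the maximum admissible concentrations map to AQI* = 0.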
The air pollution index calculated from these pollutants is an overall index: an average value that might not reflect episodes of high levels of a given pollutant. It is therefore useful to be able to calculate individual indices, so a pollution index, and from it an individual AQI*, is also defined for each air pollutant from its own concentration. Using all of the above, AQI* values are reported with a colour code. Table 1 shows the colour codes and concentration breakpoints for the individual AQI* values as well as for the global one.
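The following sketch shows how an individual index can flag a pollutant that the global index might average out. The exact form of the individual index is not stated in this section, so the form used here, AQI*_i = 100 - 100·(c_i / c_i,max)^β, is a hypothetical assumption, as are the maximum concentrations, the colour boundaries and the reporting threshold; Table 1 gives the actual breakpoints.

```python
BETA = 1.25  # exponent used in the paper

# HYPOTHETICAL colour classes: (lower AQI* bound, colour). Table 1 defines the real ones.
COLOURS = [(80.0, "green"), (60.0, "yellow"), (40.0, "orange"), (float("-inf"), "red")]
# HYPOTHETICAL maximum admissible concentrations (ug/m3).
C_MAX = {"O3": 240.0, "NO2": 400.0, "PM10": 100.0, "PM2.5": 50.0}


def individual_aqi(c, c_max, beta=BETA):
    """Hypothetical individual index: 100 at zero concentration, 0 at c_max."""
    return 100.0 - 100.0 * (c / c_max) ** beta


def colour(aqi):
    """Colour of the first class whose lower bound the AQI* value reaches."""
    for lower, name in COLOURS:
        if aqi >= lower:
            return name
    raise ValueError(f"invalid AQI* value: {aqi}")  # e.g. NaN


def flagged_pollutants(conc, c_max=C_MAX, threshold=60.0):
    """Individual AQI* of the pollutants whose quality is below the threshold."""
    scores = {p: individual_aqi(c, c_max[p]) for p, c in conc.items()}
    return {p: round(s, 1) for p, s in scores.items() if s < threshold}


if __name__ == "__main__":
    month = {"O3": 60.0, "NO2": 40.0, "PM10": 70.0, "PM2.5": 10.0}
    for p, c in month.items():
        s = individual_aqi(c, C_MAX[p])
        print(f"{p:6s} AQI* = {s:5.1f} ({colour(s)})")
    print("to report individually:", flagged_pollutants(month))
```

In this invented month only PM10 falls below the threshold, so it is the only pollutant that would be reported individually alongside the global index.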

Table 1. Concentration breakpoints for each considered air pollutant, with the maximum assigned values in bold. Column f gives the result of Equation 4, and the exponents are in the last row. The last column gives the global AQI* values and the corresponding colour codes.

3-Application to Barcelona Urban Area 2001-2015
Barcelona is a city and metropolis on the Mediterranean coast of the Iberian Peninsula. It is the capital of Catalonia and the second city of the Iberian Peninsula in population and economic weight, after Madrid.

3-1-Data Set
The data set is a monthly data matrix from 2001 to 2015, that is, 180 months. The data were imported directly from the European Environment Agency (EEA's Central Data Repository) [18], where the country reports are stored. The data in the web-based application (AIDE D) reflect the current (live) status of the latest data uploaded by countries that have been successfully tested. Monthly average concentrations of O3, NO2, PM10 and PM2.5 are available. The city's monthly value was computed as the geometric mean of the monthly mean values of the 18 air quality measuring stations located in Barcelona, shown in Figure 3. Table 2 gives the minimum, maximum and geometric mean concentrations (µg/m3) of O3, NO2, PM10 and PM2.5 in Barcelona, and Figure 4 shows boxplots of the pollutant concentrations.
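The aggregation step described above can be sketched as follows: the city's monthly value for a pollutant is the geometric mean of the monthly means reported by the stations, and Table 2 summarises each series by its minimum, maximum and geometric mean. This is a minimal Python sketch; skipping stations with no data for a month is our assumption (the text does not say how gaps were handled), and the numbers in the example are invented.

```python
from statistics import geometric_mean, mean


def monthly_city_value(station_means):
    """Geometric mean of the stations' monthly means; missing values (None) are skipped.

    All remaining values must be strictly positive concentrations (ug/m3).
    """
    values = [v for v in station_means if v is not None]
    if not values:
        raise ValueError("no station data for this month")
    return geometric_mean(values)


def summary(series):
    """Minimum, maximum and geometric mean of a monthly series, as in Table 2."""
    return min(series), max(series), geometric_mean(series)


if __name__ == "__main__":
    # Invented NO2 monthly means (ug/m3) from four stations, one of them missing.
    stations = [10.0, 40.0, None, 25.0]
    print("geometric mean:", round(monthly_city_value(stations), 2))
    print("arithmetic mean:", round(mean(v for v in stations if v is not None), 2))
```

The geometric mean is never larger than the arithmetic mean (here about 21.5 versus 25), which is one concrete way in which the compositional choice changes the reported averages.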

3-2-How to Apply AQI*
As stated above, from the point of view of compositional data analysis, the use of log-contrasts is much more suitable and coherent than the use of concentrations or even log-concentrations. Figure 5 shows the evolution of the log-contrast and of API. In Figure 7, the global AQI* is plotted. For this period, the average global AQI* is 90.95, a reasonably good value for air quality. An increase in air quality can be observed starting around January 2009 (red mark).

Table 3 shows the percentage of months with a given air quality, for each pollutant as well as globally. Global air quality in Barcelona clearly improves after 2009, mainly because of the increase in AQI* for small particles. This may be an effect of the European directive on ambient air quality and cleaner air for Europe (Directive 2008/50/EC), approved in May 2008. However, the directive seems to have had no effect on ozone concentration.
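Table 3 is essentially a frequency table of monthly AQI* classes, compared before and after January 2009. A minimal Python sketch of that computation follows; the class boundaries and the monthly values are hypothetical, and only the split date comes from the text.

```python
from collections import Counter

# HYPOTHETICAL AQI* classes: (lower bound, label); Table 1 defines the real ones.
CLASSES = [(80.0, "good"), (60.0, "moderate"), (float("-inf"), "poor")]


def classify(aqi):
    """Label of the first class whose lower bound the AQI* value reaches."""
    for lower, label in CLASSES:
        if aqi >= lower:
            return label
    raise ValueError(f"invalid AQI* value: {aqi}")


def class_percentages(records, start=None, end=None):
    """Percentage of months per class for start <= (year, month) < end.

    records: iterable of ((year, month), aqi_star) pairs.
    """
    selected = [aqi for ym, aqi in records
                if (start is None or ym >= start) and (end is None or ym < end)]
    if not selected:
        raise ValueError("no months in the selected period")
    counts = Counter(classify(a) for a in selected)
    return {label: 100.0 * counts[label] / len(selected) for _, label in CLASSES}


if __name__ == "__main__":
    # Invented monthly global AQI* values around the January 2009 split.
    records = [((2008, 11), 70.0), ((2008, 12), 85.0), ((2009, 1), 90.0), ((2009, 2), 95.0)]
    print("before 2009:", class_percentages(records, end=(2009, 1)))
    print("from 2009:  ", class_percentages(records, start=(2009, 1)))
```

Comparing (year, month) tuples keeps the period filter simple, and the same function can be applied to each individual AQI* series as well as to the global one.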

4-Conclusions
Four main conclusions can be drawn from this work:
- The traditional methodology on which the expression of atmospheric air quality is based has certain inaccuracies and deficiencies.
- Compositional data analysis provides a more appropriate conceptual and methodological framework for establishing a quantitative air quality index based on air pollutant concentrations. The concept of log-contrast is key to developing the new methodology.
- The new air quality index proposed in this paper seems more appropriate for expressing air quality because it improves the correlation between input and output, establishes a reference scale, and is based on a methodology that takes into account the nature of the numerical data (compositional data).
- Air quality must be reported using an Air Quality Report (AQR). The AQR must include the overall air quality index AQI* and the individual AQI* of those air pollutants whose quality is lower than the corresponding breakpoint.

5-Funding and Acknowledgments
The authors are grateful to:

6-Conflict of Interest
The authors declare that there is no conflict of interest regarding the publication of this manuscript. In addition, the authors have fully observed ethical standards regarding plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, and redundancy.