# Category: Analysis

Articles with insights and analysis of the collected data

*What can the data tell us about patterns, associations and the human factor?*

**As we close one year of operation of the air quality monitors in Syros, a data analysis becomes timely and appropriate. This post summarizes the results of an exploratory forensic analysis of the data collected from one air quality monitor. Specifically, we looked at what kind of information can be extracted from the emission and other data collected over an extended time period, including emission patterns and their dependence on possible governing factors such as human activity and other extrinsic environmental conditions. The potential for forecasting pollution levels was also investigated. The results show some correlation between emission levels and the time of day and day of the week (pointing to human activity) and a weaker one with other measured environmental parameters.**

## Description of the data

A recent Facebook post [1] provides the probability distributions for a number of air pollutant species, constructed from the data collected from the three air-quality monitor sites operating on Syros. In that study, data are grouped according to the four seasons and four daily intervals. Subsequently, the probability of expected emission levels is computed against season and daily interval for each species. In the present study, we wanted to go beyond probabilities and look into patterns and associations. For this purpose, we investigated the hourly data for particulate matter (PM 2.5 and 10) and volatile organic compound (VOC) emissions from device 8200015E, spanning the period from October 7, 2018 to July 20, 2019, a total of 6888 hours or 287 full days. In addition to the pollutants mentioned, concurrent temperature, relative humidity and ambient noise data were also collected from the same device. This particular air monitor was chosen for the completeness of its data. A brief description of the main results and their interpretation follows.

Fig. 1 shows the probability density of all measured quantities. The average value of the quantity is also given on each graph. These probabilities are unconditional (prior) and concern the entire observation period; they are not divided into seasons or day periods as is done in [1].

In the top three graphs, which show the PM and VOC emission levels, we added vertical dashed lines to mark normal emission limits: 30 μg/m^{3} for PM and 200 for the VOC-AQI (these are typically adopted emission limits in ordinary reports). The percentage shown to the right of each limit line is the probability that the emission level exceeds the limit at any time. For example, for PM 10 and 2.5, the probability of exceeding 30 μg/m^{3} is 1.2% and 0.75%, respectively, and for VOC-AQI, the probability of exceeding 200 points is 33.16%. Finally, we should remark on the similarity of the PM 10 and 2.5 distributions (both seem to be of the Weibull type), as they are likely drawn from a common source.
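The exceedance percentages quoted above are simple empirical frequencies. A minimal sketch of that computation follows; the function name and the sample readings are hypothetical and are not the recorded data.

```python
def exceedance_probability(values, limit):
    """Empirical probability that an observation is strictly above `limit`."""
    values = list(values)
    if not values:
        raise ValueError("no observations")
    return sum(1 for v in values if v > limit) / len(values)


# Hypothetical hourly PM10 readings in ug/m^3 (illustrative only).
pm10_hourly = [8.0, 12.5, 31.0, 9.4, 15.2, 44.1, 7.7, 10.3]
share = exceedance_probability(pm10_hourly, limit=30.0)
print(f"P(PM10 > 30 ug/m^3) = {share:.1%}")
```

Applied to the full hourly record of one pollutant, the same function would give the percentage printed next to the corresponding limit line.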

## Categorization of combined emissions

We created four emission categories based on two variables: the value of VOC-AQI and the sum of the concentrations of the two PM species (since PM 10 and 2.5 have similar statistical properties). Thus, we have the following categories:

**Table 1. Definition of the Emission Categories.**

| Category name and description | Category limits (PM = PM 2.5 + PM 10) |
|---|---|
| C1: low emissions, clean air. | VOC-AQI < 200 & PM < 30 μg/m^{3} |
| C2: high VOC and low PM emissions, unhealthy air. | VOC-AQI ≥ 200 & PM < 30 μg/m^{3} |
| C3: low VOC and high PM emissions, unhealthy air. | VOC-AQI < 200 & PM ≥ 30 μg/m^{3} |
| C4: high VOC and high PM emissions, very unhealthy air. | VOC-AQI ≥ 200 & PM ≥ 30 μg/m^{3} |
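The rules of Table 1 can be written as a small function. This is only a sketch: the function and constant names are made up for illustration, and the inputs are assumed to be the hourly VOC-AQI value and the two PM concentrations in μg/m^{3}.

```python
VOC_LIMIT = 200.0  # VOC-AQI threshold from Table 1
PM_LIMIT = 30.0    # threshold on PM2.5 + PM10 (ug/m^3) from Table 1


def emission_category(voc_aqi, pm25, pm10):
    """Return 1..4 for categories C1..C4 as defined in Table 1."""
    high_voc = voc_aqi >= VOC_LIMIT
    high_pm = (pm25 + pm10) >= PM_LIMIT
    if high_voc and high_pm:
        return 4  # C4: high VOC and high PM
    if high_pm:
        return 3  # C3: low VOC, high PM
    if high_voc:
        return 2  # C2: high VOC, low PM
    return 1      # C1: clean air


# Hypothetical hourly records: (VOC-AQI, PM2.5, PM10)
records = [(150, 5.0, 10.0), (250, 5.0, 10.0), (150, 12.0, 20.0), (200, 15.0, 15.0)]
print([emission_category(*r) for r in records])
```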

The frequency of appearance of each emission category in the observations is depicted through the tile plot in Fig. 2.

Each tile in the figure represents a VOC-AQI range on the x-axis and a PM(2.5+10) concentration range on the y-axis; the color intensity of the tile is proportional to the number of times the x and y ranges of the tile occurred in the observations. The dashed lines divide the graph into the four emission categories in Table 1. The overall frequency of the emission categories is 64% for C1, 31% for C2, 3% for C3 and 2% for C4.

## Probing for associations and dependencies

In this study, we tried to look for dependence between emissions, on one hand, and some possible factors governing or affecting them such as human activity and weather conditions, on the other. Unfortunately, data about wind were not available from the monitors. Neither were data about specific human activity (such as activity of suspected pollution sources). Instead, the daily hours, week days and noise levels were used as proxies for human activity. In the following, the correlation of the emission categories defined above with the daily hours, week days, noise, and temperature, was investigated. The results are explained below.

### Hour dependency

The top left graph in Fig. 3 is the plot of the emission category every hour computed from the recorded data and the criteria in Table 1 (the y-axis values 1 to 4 correspond to the category names C1 to C4). To obtain a comparison base for ruling out random factors, we constructed a random series of values from 1 to 4 shown in the top right graph. The relative proportions of the four values equal correspondingly the category proportions in Fig. 2. Subsequently, we computed the auto-covariance of both series for up to 72 hours of lag; these results are shown in the bottom two graphs in Fig. 3. The auto-covariance plot of the random series, as expected, is an impulse, consistent with uncorrelated white noise; the auto-covariance plot of the actual series, however, shows significant auto-correlation (that is, time dependence) and a periodicity of 24 hours consistent with daily cycles. This result confidently establishes a time pattern of the emissions ruling out any effects of randomness.
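The comparison described above can be sketched in a few lines. The series below is synthetic (C2 between hours 7 and 15, C1 otherwise) and only stands in for the recorded categories; shuffling it keeps the category proportions while destroying any time structure, which is one simple way to build the random reference series.

```python
import random


def autocovariance(series, max_lag):
    """Biased sample auto-covariance for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    return [
        sum(dev[t] * dev[t + lag] for t in range(n - lag)) / n
        for lag in range(max_lag + 1)
    ]


# Synthetic stand-in for the hourly category series: 30 days with a daily cycle.
actual = [2 if 7 <= h % 24 <= 15 else 1 for h in range(24 * 30)]

# Random reference with the same category proportions.
rng = random.Random(0)
shuffled = actual[:]
rng.shuffle(shuffled)

acov_actual = autocovariance(actual, 72)
acov_shuffled = autocovariance(shuffled, 72)

for lag in (12, 24, 48, 72):
    print(lag, round(acov_actual[lag], 3), round(acov_shuffled[lag], 3))
```

For the cyclic series the auto-covariance peaks again at lags 24, 48 and 72, while the shuffled series stays near zero at every non-zero lag.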

The hourly dependence of the emission categories was further investigated. The 6888 hours of observations were grouped into the 24 day-hours, labeled from 0 for the first (00:00) hour of the day to 23 for the last (23:00) hour of the day. Subsequently, the frequency of each emission category in each day hour was computed. The results are given in the top graph in Fig. 4, where the x-axis is the day hour, the y-axis is the emission category 1, 2, 3, and 4, and the z-axis is the emission category frequency. Casual inspection of the graph reveals a dependence of Categories 1 and 2 on the day hour: C1 frequency decreases and C2 increases between hours 7 and 15 (3 pm). The dependence of C3 or C4 on the day hour is less obvious to unaided inspection, due to the small size of these categories (their combined size is only 5% of the total observations).

We investigated the above observation more rigorously employing statistical methods. At this point it should be noted that computing the co-variance between day hour and emission category in order to prove or disprove dependence will not work generally since we do not anticipate a linear relationship between them. Consequently, we computed the conditional probabilities of each day hour in each emission category; these probabilities are plotted by the blue circles in the next four graphs in Fig. 4 for each category. The lines of the red x’s in the same plots represent the expected distribution of the day hours (that is, 1/24).

If the emission categories were uncorrelated with the day hours (no dependence), then the red and blue lines in Fig. 4 should match closely. Inspection of the four plots, however, reveals not only a mismatch between the two probabilities, but also a pattern in the conditional probabilities, even for the small categories C3 and C4. Of course, such a mismatch could have been generated by random chance. Therefore, we experimented with random distributions of categories (preserving the original category proportions) and compared the resulting probability quantiles. The random distributions consistently produced no patterns and a closer fit to the red lines (as expected) than the actual category distributions in the figure; this helps to exclude randomness as a cause of the patterns in Fig. 4 and, hence, strengthens our confidence that there is a dependence between day hours and emission levels.
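A rough sketch of this check is given below. It assumes an hourly category series that starts at hour 0, uses a synthetic series in place of the recorded one, and measures the mismatch by the largest deviation from 1/24; that measure is a simplification chosen for illustration, not the quantile comparison used in the study.

```python
import random
from collections import Counter


def hour_given_category(categories, category):
    """P(day hour | category) for an hourly series that starts at hour 0."""
    hours = [t % 24 for t, c in enumerate(categories) if c == category]
    if not hours:
        raise ValueError("category never occurs")
    counts = Counter(hours)
    return [counts[h] / len(hours) for h in range(24)]


def max_deviation_from_uniform(probabilities):
    """Largest gap between the conditional probabilities and 1/24."""
    return max(abs(p - 1 / 24) for p in probabilities)


# Synthetic stand-in: C2 during hours 7..15, C1 otherwise, for 30 days.
actual = [2 if 7 <= t % 24 <= 15 else 1 for t in range(24 * 30)]

# Random assignment that preserves the category proportions.
rng = random.Random(0)
shuffled = actual[:]
rng.shuffle(shuffled)

dev_actual = max_deviation_from_uniform(hour_given_category(actual, 2))
dev_shuffled = max_deviation_from_uniform(hour_given_category(shuffled, 2))
print(round(dev_actual, 4), round(dev_shuffled, 4))
```

Repeating the shuffle many times would give a distribution of deviations under pure chance, against which the deviation of the actual series can be judged.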

Correlation between emission categories and day hours with a particular pattern during the working hours does point to human activity. Next, we will try to strengthen this belief by comparing the emission categories to the week day.

### Week day dependence

The 287 days of observations were grouped into the 7 days of the week and labeled starting with 1 for Sunday. The top left graph in Fig. 5 shows the category frequencies in each week day. The remaining seven graphs show, for each week day, the category frequency across the day hours. For example, we see that on Sunday between hours 0 and 5, C3 and C4 never appear, and that C3 appears only between 5 and 10 and C4 after 9. Another significant observation is that the hourly category patterns seen in Fig. 4, particularly those of C1 and C2, are also seen for the working days Monday to Friday, but the same patterns are less prevalent on Saturdays and Sundays. This last observation strengthens the hypothesis that emissions are governed by human activity.

We did not go beyond the above observations on the week days. There are many points worth investigating further in this; however, certain information that can give important clues was not available to us. The latter includes the particular location of the air-monitor device, whether it is placed in a home picking up the particular activities of the host, and so on.

### Noise correlation

Noise can be a good indicator of human activity, in some cases indicating the type of activity, such as the noise near traffic, the ambient noise of a town district, or the noise of a home. In this study, we did not know what flavor of noise to expect. We followed the same approach as for the day hours in Fig. 4. The noise levels are hourly averages expressed in dBA. The noise range was divided into 10 equal-sized sub-ranges of 6 dBA each. The frequency of coincidence of the emission categories in each noise range is plotted on the top graph in Fig. 6, where the x-axis represents the noise level (dBA).

The top graph initially suggests a significant correlation. However, unlike the day hours in Fig. 4, the noise levels are not uniformly distributed in the observations and the frequency graph may mislead us to believe a strong correlation. Instead, we examined the probabilities to detect any correlation. These are shown in the next four graphs in Fig. 6. Each graph shows (a) the probability of each noise range given the coincidence of a certain emission category by the blue circles; and (b) the prior probability of each noise range by the red x’s (compare with the probability density in Fig. 1).

We do see a significant mismatch of the two probabilities, especially for Categories 3 and 4. The average noise level for the entire data is 49.5 dBA (Fig. 1). The average noise level in C1 is 49, in C2, 51.3, in C3, 45.9, and in C4, 46.8 dBA.

To exclude random effects in the above results, we ran a quantile-to-quantile comparison in each category. The results are shown in Fig. 7. The left column of graphs shows results when the actual categories were compared, while the right column shows results when random categories were assigned to the hourly data. In each graph, the x-axis represents the noise levels of all observations (unconditional probability) and the y-axis represents the noise levels observed under a certain category. If the two data distributions are independent, then x=y; this condition is shown by the red dashed line in each graph.
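The quantile-to-quantile comparison can be sketched as follows. The noise values are synthetic (drawn around the 49.5 dBA average mentioned above, with an assumed spread), and the "quiet" category is a made-up stand-in for a category that prefers less noisy hours.

```python
import random
import statistics


def qq_points(all_values, subset_values, n_quantiles=10):
    """Pairs (quantile of all data, quantile of the subset) at equal probabilities."""
    qx = statistics.quantiles(all_values, n=n_quantiles)
    qy = statistics.quantiles(subset_values, n=n_quantiles)
    return list(zip(qx, qy))


def mean_abs_gap(points):
    """Average distance of the Q-Q points from the x = y line."""
    return sum(abs(x - y) for x, y in points) / len(points)


rng = random.Random(1)
noise = [rng.gauss(49.5, 6.0) for _ in range(2000)]  # synthetic hourly dBA

# Hypothetical category that occurs only in quieter hours.
quiet_category = [x for x in noise if x < 47.0]
# Random category of the same size, independent of the noise level.
random_category = rng.sample(noise, len(quiet_category))

gap_actual = mean_abs_gap(qq_points(noise, quiet_category))
gap_random = mean_abs_gap(qq_points(noise, random_category))
print(round(gap_actual, 2), round(gap_random, 2))
```

Points from the random category fall close to the x = y line, whereas a category that depends on the noise level departs from it clearly.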

Observing the left column, we see that C1 and C2 behave similarly to the random results in the right column, although the matching error is slightly greater for the actual categories. Categories C3 and C4 produced a greater mismatch that could not be achieved with a random category assignment, as seen in the right column. The particular mismatch of C3 and C4 for noise levels over 70 dBA could possibly be explained by a high-noise process producing C2, which masks C3 and C4.

From the above analysis, we can conclude that emissions do show correlation with noise levels that is not consistent with a random coincidence. More specifically, C3 and C4 tend to appear in less noisy hours.

### Temperature dependency

A similar analysis was done to check for temperature dependency. The results are shown in Fig. 8. It is worth noting that Categories 3 and 4 avoid temperatures below 10 °C and above 27 °C, while Category 2 has the exact opposite tendency. Category 1 appears indifferent to temperature. The average temperature over the entire observation set is 17.24 °C (see Fig. 1). As we see from the graphs, C1 yields an average of 17.7 °C, C2 16.2 °C, C3 17.8 °C, and C4 18.1 °C.

A quantile analysis and comparison with random categories was done for temperatures (similar to the one done for noise). The results (not shown here) indicate that the probability mismatches in Fig. 8 are not likely to have occurred randomly.

## Clusters of daily emission patterns and their associations

In this section we investigate the daily emission patterns for their similarities. Fig. 3 shows the overall time series of the emission categories and Fig. 9 shows a detailed example for a randomly selected day from the data: the graph has 24 hourly points forming a pattern that starts at C4 (00:00 hour), then briefly drops to C3, then remains mostly around C2 during the day hours and rises to C4 in the evening.

The 6888 hourly emission data were divided into 287 days; each day is a 24-hour pattern of emissions like in Fig. 9. The average of the temperature and the noise level were also computed for each day. Subsequently, the 287 daily patterns were compared for similarities and five clusters were determined as best dividing the days (for some quick reference on clustering see [2]). These are discussed briefly next, where for each cluster the frequency of week days, average daily temperatures and average daily noise in the cluster are plotted.
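The post does not state which clustering algorithm or distance measure was used. Purely as an illustration, the sketch below groups 24-hour category patterns with a plain k-means using Euclidean distance; the algorithm choice, the function names, and the tiny synthetic data set are all assumptions.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_pattern(patterns):
    """Hour-by-hour average of a list of 24-value daily patterns."""
    n = len(patterns)
    return [sum(col) / n for col in zip(*patterns)]


def kmeans(patterns, k, iterations=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [list(patterns[0])]
    while len(centers) < k:
        farthest = max(
            patterns,
            key=lambda p: min(squared_distance(p, c) for c in centers),
        )
        centers.append(list(farthest))
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in patterns
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(patterns, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = mean_pattern(members)
    return labels, centers


# Two synthetic day types: higher emissions in daytime (a) or at night (b).
day_a = [2 if 7 <= h <= 15 else 1 for h in range(24)]
day_b = [2 if h >= 18 or h <= 5 else 1 for h in range(24)]
noisy_a = day_a[:]
noisy_a[16] = 2  # a slightly different daytime pattern

days = [day_a, day_a, noisy_a, day_a, day_b, day_b, day_a, day_b]
labels, centers = kmeans(days, k=2)
print(labels)
```

The average pattern of each cluster (the black line in the cluster figures) corresponds to the returned centers.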

### Cluster 1 (67.2% of all days are in this cluster)

This is the biggest of the five clusters, including 67.2% of the days, and it has good coherency between its members. The cluster includes a majority of C1 and C2 categories, with C1 being the most frequent. The top left graph in Fig. 10 shows all daily patterns in the cluster; the black line in the graph shows the average pattern of the cluster. In this cluster the days start with low emissions (average close to C1) and, as the day progresses, especially during day hours, emissions increase but not dramatically. The remaining three graphs help to identify the association of this cluster with week days, average day temperatures, and average day noise. We see that this cluster has no particular preference for any week day, and prefers moderate temperatures and rather quieter days.

### Cluster 2 (22.3% of all days)

This is the second-largest cluster, also with good coherency. It also contains mostly C1 and C2 categories. The difference from Cluster 1 in Fig. 10 is that emissions are much higher, closer to C2, during evening and night and they abate during morning and evening. This cluster prefers Saturdays least of the week days, and favors moderate temperatures and low noise.

### Clusters 3 (3.1% of days) and 4 (2.4% of days)

These are two small clusters shown in Fig. 12. Cluster 3 has low coherency while Cluster 4 has a greater one. Even though the data are too few to draw any general conclusions, it is worth noticing that Cluster 4 avoids Saturdays and appears at only one daily temperature, around 20 °C.

Fig. 12. Clusters 3 and 4.

### Cluster 5 (4.9% of days)

This cluster is also small and has the least coherency between its members. Notably, this cluster avoids Sundays and the lower daily temperatures.

### Possible predictors of the daily patterns

The previous figures reveal no particularly strong dependence between a cluster and the week day, daily average temperature and daily average noise level, which we had hoped would be possible predictors of the clusters. The weak dependence can also be seen in the following figure, where no discernible separation of the clusters is seen in the space of the week day, temperature and noise level.

The above results discourage using any linear classifier to detect or predict the daily clusters from the week day, temperature and noise level. Instead, a classifier like a neural network or a Kalman filter might be more appropriate. Nonetheless, as a curiosity, we attempted to construct a decision tree using the above three parameters as predictors. In our case, the classes are the 5 daily pattern clusters defined above; the predictors are the week day, average day temperature, and average day noise. Therefore, if these three parameters can be forecast for a given day, the decision tree can tell us, within a certain accuracy, which daily emission pattern to expect.

The decision tree is shown in Fig. 15. The tree is 72% accurate (it classified 72% of the test days correctly). For example, starting from the top of the tree, if the daily temperature is 19.78 °C or greater, the week day is later than Monday, and the noise level is less than 59.5 dBA, then the most likely cluster is 1. Note that Cluster 3 cannot be detected by this tree due to its small significance. *Decision trees, however, should not be relied upon unless we are certain of strong associations (dependence) between the classes and the predictor parameters.*

## Conclusions and Recommendations

The study presented here is only a small effort at a much greater task to analyze and quantify the air quality near and around the port of Syros, with the ultimate goals of (a) pointing to specific pollution sources; and (b) being able to forecast pollution levels. The analysis conducted on the data of 8200015E showed a fair degree of correlation of observed emission levels and time and day, showing a tendency towards increasing emission levels during working hours and working days of the week. These two discoveries do point to human activities as a cause; however, more data from multiple sites are needed to increase our confidence that human activities *are* the main or a significant source of the observed levels of pollutants. We reserve this for a future study when more data from multiple sites will be available.

A weak correlation between the emissions and the hourly temperatures and noise levels was also detected. We note that a strong correlation of noise and emission levels will strengthen the hypothesis of human-caused pollution. At this point, no definite conclusions can be drawn about the connection of noise and emissions until sufficient data is analyzed, especially from multiple sites, including also a clarification of each monitor’s particular situation.

It was also discovered that emissions tend to follow certain daily patterns. These daily patterns consisting of 24 hourly values (from a category scale) were clustered into two major and three minor pattern clusters. Unfortunately, the association of these clusters with other measured data, such as average daily temperature, noise level, and week day, was weak. Therefore, no reliable forecasting of daily emission patterns can be made from these parameters alone.

Forecasting emission levels could be a significant achievement, and this is certainly realistic and doable for the Syros air monitors. The forecasting can be done for each of the pollutants separately, or they can be combined into categories similar to what we have done here. Artificial neural networks are a good candidate framework for this purpose. There is enough data to begin training an artificial neural network that could return hour-ahead forecasts or, combined with clustering, even day-ahead forecasts.

Finally, using the collected data from all the sites on Syros and trying to identify a particular source of pollution will be a far greater task, which will require more types of data. These include: (a) weather data such as wind strength and direction and precipitation data; (b) activity data of suspected sources.

**References**

[1] Tasos Matrapazis in Syros Airmon (Public Group), Facebook, 7/15/2019: https://www.facebook.com/groups/739802836418597/permalink/866202563778623?sfns=mo

[2] Google Developers, Machine learning and Clustering, https://developers.google.com/machine-learning/clustering/

by C.J. Hatziadoniu

In a previous post, we discussed particulate matter, especially PM10 and 2.5, and how one can interpret the data. In this post, we will discuss the air quality index (AQI) and how it can be computed from measured concentrations of pollutants.

Air quality assessment takes into account the concentrations of several different pollutants such as PM, CO, CO_{2}, O_{3}, and NO_{2}. The effect of each pollutant on human health varies with the concentration level. AQI was devised to provide a single measure that can be compared across all relevant air pollutants. Furthermore, AQI provides descriptive quality classes (categories) indicating the danger to human health.

Wikipedia offers a good source for first-hand information about AQI [1]. The higher the value of AQI, the worse the air quality. When the effect of more than one pollutant is measured, a separate AQI is computed for each pollutant and the highest value among all indices is taken as the overall measurement (i.e. a compounded index).

As an example, below we look at two different standards using AQI, the EPA and the EU standard, and we compare them on the same measured data for PM2.5 and 10 concentrations.

#### The EPA Index

The EPA (Environmental Protection Agency) is a government agency in the US providing standards and regulations affecting the environment. The AQI developed by the EPA is graded from 0 to 500 with increasing concentrations of pollutants. Six quality categories are specified with a descriptive name and a color code, from “Good” (in green), corresponding to the lowest values of AQI, to “Hazardous” (in maroon) for AQI exceeding 300. The standard also specifies the averaging period of the measurement. Table 1 shows the AQI definition for PM2.5, PM10 and CO [1].

Table 1. AQI developed by EPA for PM2.5, PM10 and CO pollutants [1]

Each line in the table gives the lower and upper concentration limits of the pollutant in the category and the corresponding lower and upper values of the AQI. A linear interpolation is used to obtain the AQI for concentrations in between these limits. It should be noted that pollutant concentrations are reported as averages over specified time intervals. For example, PM2.5 and 10 concentrations are reported as 24-hour averages, while CO concentrations are reported as 8-hour averages. EPA has developed a different AQI range for different averaging periods: the longer the averaging period, the greater the AQI for the same nominal concentration of the pollutant. Therefore, if one applies the AQI specified for 24-hour averages to measurements taken as 1-hour averages, the value of the computed index will be higher than the actual value, which is a pessimistic result.
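The linear interpolation can be sketched as below. The breakpoint values are an assumption: they are the 24-hour PM2.5 breakpoints of the EPA table as it stood when this post was written, and should be checked against Table 1. The function name is illustrative, and the official procedure also truncates the concentration to one decimal, which this sketch skips.

```python
# Assumed breakpoints: (C_low, C_high, AQI_low, AQI_high) for 24-hour PM2.5 in ug/m^3.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),        # Good
    (12.1, 35.4, 51, 100),     # Moderate
    (35.5, 55.4, 101, 150),    # Unhealthy for sensitive groups
    (55.5, 150.4, 151, 200),   # Unhealthy
    (150.5, 250.4, 201, 300),  # Very unhealthy
    (250.5, 350.4, 301, 400),  # Hazardous
    (350.5, 500.4, 401, 500),  # Hazardous
]


def aqi_from_concentration(conc, breakpoints=PM25_BREAKPOINTS):
    """Linearly interpolate the AQI inside the row that contains `conc`."""
    if conc < 0:
        raise ValueError("concentration must be non-negative")
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if conc <= c_hi:
            c = max(conc, c_lo)  # values in the small gap between rows snap to c_lo
            return i_lo + (i_hi - i_lo) * (c - c_lo) / (c_hi - c_lo)
    return float(breakpoints[-1][3])  # beyond the table: cap at the top index


def compounded_aqi(*indices):
    """The overall index is the largest of the per-pollutant indices."""
    return max(indices)


print(round(aqi_from_concentration(20.0)))
```

With a second breakpoint table for PM10, `compounded_aqi` would combine the two per-pollutant values into the daily index.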

*Example.* Consider the following daily average emissions of PM2.5 and 10 pollutants in Fig. 1 collected from the air-monitor tagged 8200015E from October 6 to December 31, 2018. The AQI is computed for each day according to Table 1 (24-hour averages). The results are shown in Fig. 2.

The figure includes in the same color code the quality categories in Table 1. Three bars are given for each day: the AQIs computed from the PM2.5 and 10 averages and the compounded AQI (i.e. the greater of the two for each day). In all cases, the PM2.5 concentrations determine the compounded index.

The figure shows that for most of the period covered by the data, pollution levels were not worse than moderate, and that the great majority of the days fall under the category “good”. In the same figure, we also see that there are a couple of days (Oct. 22-23) when the AQI came very close to 100, which is the lower bound of the category “unhealthy for sensitive groups”.

#### EU Index

The EU defines the common air quality index (CAQI), which resembles the one from EPA. The index is graded from 0 to 100. There are five categories with descriptive names from “Very Low” to “Very High”. As in the EPA standard, different indices are used for different averaging periods. CAQI also specifies different values for measurements near traffic and in the city. Table 2 [2] gives the CAQI and corresponding categories for 24-hour average PM2.5 and 10 concentrations in a city. Note that the standard specifies PM2.5 as optional when a compounded index is computed.

Table 2. CAQI for 24-hour average PM2.5 and 10 concentrations in a city [2]

*Example.* The data in Fig. 1 are used here to derive the CAQI values for each day. The results are plotted in Fig. 3 along with a colored region giving the quality of CAQI. As in Fig. 2, three bars are given for each day with the CAQI from PM2.5 and 10 and the compounded CAQI. In all days, the CAQI from the PM2.5 concentrations determines the compounded CAQI.

With reference to Fig. 3, on most days the pollution levels are low or very low. There are a few days with medium levels and two days where the levels were high; the latter are Oct. 22 and 23, the same days when the EPA index came very close to the “unhealthy for sensitive groups” category. Figs. 2 and 3 give a consistent qualitative assessment of the air quality, with differences occurring only at the boundaries between categories.

As concluding remarks, we can say the CAQI provides a standard and a better way of reporting measurements of air pollutants; one that directly assesses effects on human health. Furthermore, CAQI (or AQI) can be computed simply in a spreadsheet like Excel, by importing data and performing simple numerical calculations.

*References*

[1] “Air Quality Index”, Wikipedia, https://en.wikipedia.org/wiki/Air_quality_index

[2] CiteairII - Common Information to European Air (2012-07-09). “CAQI Air quality index - Comparing Urban Air Quality across Borders – 2012” (PDF). Archived (PDF) from the original on 2018-02-18. Retrieved 2018-02-18.

by C.J. Hatziadoniu

Let us begin with a caveat: *The author of this article has (some) expertise in the mathematical gymnastics of big data, but no particular knowledge of environmental or health issues.* The following was written merely to stir up discussion, mostly among non-experts and amateurs, on how to better interpret the air quality data available from the network of air monitors. In this article, we chose to talk about particulate matter as it is more easily quantifiable and we can better wrap our heads around it.

Particulate matter or PM refers to volatile pollutants in the form of microscopic solid or liquid particles suspended in the air we breathe [1]; they can come from various sources including industrial activity. When inhaled, these particles can cause health damage and even reduce life expectancy. With respect to average particle size, PM is classified as PM_{10}, PM_{2.5}, etc.^{*} Health effects can be expected from both short-term and long-term exposure. A somewhat detailed and illuminating document on this issue from 2003 can be found in [2]. In particular, PM_{2.5} is linked to cardiovascular and pulmonary disease ([2] page 14). Long-term exposure to PM_{2.5} even at the level of `10 μg/m^{3}` can cause serious health issues ([2] page 16).

**Exposure Limits in the Regulations**

The concentration of PM in the air is expressed in `μg/m^{3}`. The air monitor data provide the PM concentration averaged over a specified time interval (from a few seconds to every hour or every day). As an example, Fig. 1 shows the average PM_{2.5} concentration for every hour over several weeks, downloaded from a certain monitor. Exposure to a certain PM concentration over time determines the “dose” of PM an individual breathes in (the `μg` of PM one has inhaled during the exposure to the polluted air). Consequently, the health effects are directly related to the dose. We should, however, distinguish between short-term and long-term exposure [2]. Generally, the level of tolerance for a short-term exposure is greater than for a long-term one [1,2].

How are we, then, to know by looking at our data, such as in Fig. 1, whether we have been overexposed to PM and how much harm has been done? The limits set in regulations are the only (official) resource we have for determining the degree of overexposure. In the EU, Directive 2008/50/EC [3,4], along with other factors, provides the limits of PM concentrations for a certain time of exposure (e.g. 24 hours or a year). These limits, of course, were based on various studies, some of which are mentioned in [2].

More specifically, the directive sets limits on PM_{10} and PM_{2.5} separately. For PM_{10} two limits are set: the short-term exposure limit is set at `50 μg/m^{3}` averaged over a 24-hour period; the long-term exposure limit is set at `40 μg/m^{3}` averaged over a period of a year. For PM_{2.5}, however, the directive sets only the long-term (yearly) limit; this is `25 μg/m^{3}`. The red line in Fig. 1 indicates the yearly limit. Notice in the graph the contiguous range of hours where this limit is exceeded. We should be careful here: the directive requires that the average concentration measured over a period of a year remains below the limit, not the instantaneous or daily average concentration. So having only the long-term limit creates an obvious difficulty in assessing short-term exposure, particularly concerning the PM_{2.5} concentrations, which pose a greater health risk. However, as we will see a little later, we can get around this lack of short-term limits with some creative data filtering. First, let us go through a simple numerical example to make sense of the above limit in the directive.

So what does it mean that the yearly limit for PM_{2.5} is `25 μg/m^{3}`? The average person during tidal breathing inhales about `0.5 L` of air (`1000 L = 1 m^{3}`) in every breath; therefore, at a breath rate of 20 times a minute, the average person inhales about `14 m^{3}` of air every day. Now, if a person is exposed to a constant concentration of PM_{2.5} equal to the limit, `25 μg/m^{3}`, then the person inhales the equivalent dose of `350 μg` of PM_{2.5} every day. According to the directive, this is the maximum safe quantity if this is repeated every day for a year; that is, the `350 μg` represent the maximum value of our long-term daily allowance of PM_{2.5} pollutant (if we can use “allowance” as a metaphor for how much we are permitted to consume every day).

For illustration, Fig. 2 shows the daily dose of PM_{2.5} as a percent of the maximum long-term allowance for each day represented in Fig. 1. We see that the daily allowance is exceeded on four days (Oct 22 to 25), and that the maximum overdose reaches 140% of the daily allowance. We should clarify here that we are not allowed to conclude that these four days exposed the people who happened to breathe the air to any particular danger; this is because we do not have guidelines for short-term limits. However, the red bars in the graph do raise a concern for these days, especially since they occur in a row.
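The arithmetic of the daily dose can be checked with a few lines. The constants come from the text; the function names and the 35 μg/m^{3} example value are hypothetical. Note that the exact product is 14.4 m^{3} of air per day, which the text rounds to about 14 (and hence the dose to about 350 μg).

```python
BREATH_VOLUME_L = 0.5     # litres of air per breath (tidal breathing)
BREATHS_PER_MINUTE = 20
PM25_YEARLY_LIMIT = 25.0  # ug/m^3, yearly limit for PM2.5


def daily_air_volume_m3():
    """Air inhaled in one day, in cubic metres (1000 L = 1 m^3)."""
    return BREATH_VOLUME_L * BREATHS_PER_MINUTE * 60 * 24 / 1000.0


def daily_dose_ug(mean_concentration):
    """Dose in ug for a day with the given mean concentration (ug/m^3)."""
    return mean_concentration * daily_air_volume_m3()


def percent_of_allowance(mean_concentration):
    """Daily dose as a percentage of the dose at the yearly limit."""
    return 100.0 * daily_dose_ug(mean_concentration) / daily_dose_ug(PM25_YEARLY_LIMIT)


print(daily_air_volume_m3())                 # cubic metres of air per day
print(daily_dose_ug(PM25_YEARLY_LIMIT))      # ug per day at the limit
print(round(percent_of_allowance(35.0), 1))  # hypothetical day at 35 ug/m^3
```

Because the inhaled volume cancels out, the percentage of the allowance is simply the daily mean concentration divided by the limit.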

### Tracking the Yearly Allowance

In order to track the long-term effects of PM_{2.5} pollutants, it is suggested to use a moving average filter and then express the average as a fraction of the yearly limit set by the directive. The time window of the moving average should be set to a length of 365 days. Each day, the data of the previous 365 days are averaged and then normalized to the maximum limit (specified in the directive). The result is the cumulative exposure over the past year, which is the time window specified in the directive for the limit of PM_{2.5}. Fig. 3 gives an example of this. In this figure, we use the data of Fig. 1 as the source. Note that since the data record begins on Oct 7, 2018, accumulation of the exposure begins to count at that time, and so it grows every day as more exposure to PM_{2.5} is added. If we had a year-long data record, the graph would continue to grow until one year is reached (i.e. until Oct 7, 2019), and after that the graph would stabilize.
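A minimal sketch of this tracking is shown below. It assumes one PM_{2.5} mean per day and treats the days before the record begins as zero exposure, which is one way to reproduce the growing curve described for Fig. 3; the function name and the constant test series are hypothetical.

```python
def yearly_allowance_used(daily_means, limit=25.0, window=365):
    """Trailing-window exposure as a percentage of the yearly limit.

    For each day, the daily means of the last `window` days are summed and
    divided by window * limit. Days before the start of the record count as
    zero, so the curve grows until a full window of data is available.
    """
    result = []
    running = 0.0
    for i, value in enumerate(daily_means):
        running += value
        if i >= window:
            running -= daily_means[i - window]  # drop the day that left the window
        result.append(100.0 * running / (window * limit))
    return result


# Hypothetical record: 400 days at exactly the yearly limit.
curve = yearly_allowance_used([25.0] * 400)
print(round(curve[0], 2), round(curve[182], 1), round(curve[364], 1), round(curve[399], 1))
```

With a constant concentration equal to the limit, the curve rises linearly and then levels off at 100% once 365 days have accumulated.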

In summary, we propose two methods to track PM_{2.5} concentrations and report the results:

*Short-Term: Daily Tracking*. This method tracks the daily average of the concentration and compares it to the limit value. The graph will look like that in Fig. 2. Exceeding the 100% mark on any day will raise a flag that we are consuming pollutants too fast.

*Long-Term: Past 365-Day Moving Average Tracking*. This method will track our past 365-day consumption and compare the average to the official limit. Exceeding the 100% mark indicates that, for the last 365 days, we have been breathing polluted air with unsafe concentrations of PM_{2.5}, with all possible health risks to the general population. The graph will look like that in Fig. 3.

Notes:

* PM_{10} denotes the concentration of particulate matter whose aerodynamic behavior is equivalent to that of a sphere with a diameter equal to or less than `10 μm`. Similarly for PM_{2.5}.

**References**

[1] “Air Pollution Particulate Matter”, *GreenFacts*, GreenFacts Scientific Board, Dec 13, 2018, https://www.greenfacts.org/en/particulate-matter-pm/level-2/01-presentation.htm

[2] “Health Aspects of Air Pollution with Particulate Matter, Ozone and Nitrogen Dioxide”, *Report on a WHO Working Group*, World Health Organization, Bonn Germany, 13-15 January, 2003, http://www.euro.who.int/__data/assets/pdf_file/0005/112199/E79097.pdf

[3] “Air Quality Standards”, *Environment*, EC, Sept 6, 2018, http://ec.europa.eu/environment/air/quality/standards.htm

[4] “Directive 2008/50/EC of the European Parliament and of the Council of 21 May 2008 on ambient air quality and cleaner air for Europe”, *EUR-LEX*, EC, 2008, https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32008L0050