Forecasting algae blooms in aquaculture using mussels' openings data
Pondichery Vellamuthu Kripashanker, Deepan Shankar
Time series data consist of measurements collected over a period of time. This type of data is relevant in many domains, including healthcare, manufacturing, finance, and the environment. In these domains it is frequently of key importance to be able to predict the future values of these time series. Activity monitoring is a task related to forecasting in which time series are used as input signals for predicting events whose occurrence is assumed to depend on the values of these series. These events are typically of critical importance to the end users, and the goal is to anticipate them with sufficient lead time. Because the future is uncertain, forecasting and anticipating such events can help prevent or mitigate their harmful consequences. In this thesis we address one such application: the anticipation of algae blooms in the aquaculture industry. As algal blooms hinder the growth of the farmed species, it is highly important to monitor the farms in order to avoid serious damage to them. We propose a method for anticipating algae blooms based on measurements of mussels’ valve openings, which domain experts believe can serve as bio-sentinels of the blooms. We use machine learning to address this predictive task and obtain models that predict future algal bloom events based on the micro-closures of the mussels. We focus on predicting the presence of the alga Alexandrium tamarense in the water. Because algae blooms are rare, sampling procedures were used to balance the distribution of the target variable and thereby facilitate the task of the learning algorithms. Overall, our experimental comparisons show very good results, particularly in anticipating a high percentage of the blooms (80%), although with some false alarms (48%).
Our results also show the advantage of adding sampling procedures to overcome the imbalanced distribution of the target variable. In summary, in this thesis we have developed a series of forecasting approaches based on feature engineering, machine learning models, and sampling methods that show great potential for anticipating algae blooms in aquaculture farms.
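
The abstract does not specify which sampling procedure was used or exactly how the two reported percentages were computed. The following is therefore only a minimal sketch, under stated assumptions, of one common way to balance a rare binary target (random oversampling of the minority class) and to score an alarm model. The function names `random_oversample` and `detection_and_false_alarm` are hypothetical, and "false alarms" is assumed here to mean the share of raised alarms that were not followed by a bloom; it could instead mean the share of non-bloom periods that triggered an alarm.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes
    have the same size. Assumes binary labels: 1 = bloom, 0 = no bloom."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        raise ValueError("both classes must be present")
    # Draw minority indices with replacement to fill the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


def detection_and_false_alarm(y_true, y_pred):
    """Return (share of blooms anticipated, share of alarms that were false).
    The second value is 1 - precision, which is an assumption about how the
    abstract's 'false alarms' figure is defined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    detection = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm = fp / (tp + fp) if (tp + fp) else 0.0
    return detection, false_alarm
```

In a sketch like this, resampling would be applied to the training data only, so that the evaluation data keep the original, imbalanced distribution of bloom events.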