Contribution à la fouille de données spatio-temporelles : application à l'érosion

Title: Contribution à la fouille de données spatio-temporelles : application à l'érosion
Publication Type: Thesis
Year of Publication: 2014
Authors: Sanhes, J
Academic Department: PPME laboratory
Degree: PhD
Number of Pages: 131
Date Published: 09/2014
University: University of New Caledonia
City: NOUMEA
Thesis Type: Computer Science
Abstract

For example, studies of migration flows differ markedly from studies of disease spread: the interest of the former lies in tracking trajectories, whereas the latter aims to identify the factors of spread. Moreover, each class of spatio-temporal problem can be tackled differently depending on the parameters considered: the spatial neighbourhood studied, the number of characteristics associated with the objects, or whether events are assumed to be correlated or independent. As a result, data mining techniques are often specific to a sub-class of spatio-temporal problems, that is, to a limited set of hypotheses.
To bring out new knowledge from data, it seems necessary to enlarge this set of hypotheses, that is, to widen the range of correlations that may exist between events. To this end, we propose a new model that takes more considerations into account than existing studies. For example, this representation can model the complex spatio-temporal dynamics of the erosion phenomenon: an object can split into several objects, or merge with other objects into one. More precisely, we use a single directed graph, which is acyclic thanks to the temporal component of the problem, and which is attributed with several characteristics. Mining a single graph is a nontrivial operation, made even more complex by the plurality of attributes. We focus here on searching for paths of attributes under frequency and non-redundancy constraints.
These constraints have been studied extensively for transactional databases, but much less, if at all, in the case of a single graph.
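To make the idea of a path of attributes concrete, here is a minimal Python sketch. It is not the algorithm of the thesis: the toy graph, the attribute names, the function `occurrences` and the choice of counting occurrences as the frequency measure are all assumptions made for illustration. A pattern is taken to be a sequence of attribute sets, and an occurrence is a path in the graph whose i-th vertex carries the i-th attribute set.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical toy graph: each vertex is an object observed at some date.
# Edges go forward in time, so the graph is acyclic by construction.
ATTRS: Dict[str, Set[str]] = {
    "a1": {"bare_soil", "steep"},
    "b2": {"bare_soil", "steep", "gully"},
    "c2": {"vegetation"},
    "d3": {"gully", "steep"},
    "e1": {"bare_soil"},
}
EDGES: Dict[str, List[str]] = {
    "a1": ["b2", "c2"],  # a1 splits into b2 and c2
    "e1": ["b2"],        # e1 merges with part of a1 into b2
    "b2": ["d3"],
    "c2": ["d3"],
    "d3": [],
}


def occurrences(
    pattern: Sequence[Set[str]],
    attrs: Dict[str, Set[str]],
    edges: Dict[str, List[str]],
) -> List[Tuple[str, ...]]:
    """Return every path whose i-th vertex carries all attributes of pattern[i]."""
    found: List[Tuple[str, ...]] = []

    def extend(path: List[str]) -> None:
        if len(path) == len(pattern):
            found.append(tuple(path))
            return
        wanted = pattern[len(path)]
        for successor in edges.get(path[-1], []):
            if wanted <= attrs[successor]:  # subset test
                extend(path + [successor])

    if not pattern:
        return found
    for vertex, vertex_attrs in attrs.items():
        if pattern[0] <= vertex_attrs:
            extend([vertex])
    return found


def is_frequent(pattern: Sequence[Set[str]], min_support: int) -> bool:
    """Assumed frequency constraint: at least min_support occurrences."""
    return len(occurrences(pattern, ATTRS, EDGES)) >= min_support
```

With this toy graph, `occurrences([{"bare_soil"}, {"gully"}], ATTRS, EDGES)` finds the paths `a1 -> b2`, `b2 -> d3` and `e1 -> b2`. Note that a plain occurrence count is not anti-monotone in a single graph (a vertex with several matching successors yields more occurrences of the longer pattern than of the shorter one), which is one reason why frequency in a single graph is harder to handle than in a transactional database.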
Alongside these primitive constraints, it is often necessary to filter the set of patterns found, which can be too numerous and/or not relevant to experts. Doing so requires soliciting experts in the domain of the studied data. However, it is difficult to translate broad knowledge of a given domain into constraints, and such a translation could introduce human errors. Based on this observation, we propose to use existing expert knowledge that has been expressed in the form of mathematical models and published in the literature of the domain. These models have the advantage of being both highly informative and synthetic; their use avoids, or greatly reduces, human intervention. We focus on the case where these models are mathematical functions of several variables returning a real number, which we can use as an expert measure to define a minimum-threshold constraint. We highlight some of its theoretical properties that enable search space pruning for frequent itemset mining.
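The following Python sketch illustrates one way a minimum-threshold constraint on an expert measure can prune a search space. It rests on assumptions that do not come from the abstract: the model `expert_model` is a made-up function that is nondecreasing in each nonnegative variable, each item restricts one variable to an interval, and the variable names and domains are hypothetical. Under these assumptions, the largest value the model can reach over an itemset is obtained at the upper ends of the intervals, and adding items can only lower that bound.

```python
from typing import Dict, Tuple

Interval = Tuple[float, float]

# Hypothetical variables and their full domains (assumptions for illustration).
VARIABLES = ("slope", "rainfall", "bare_fraction")
DOMAIN: Dict[str, Interval] = {
    "slope": (0.0, 1.0),
    "rainfall": (0.0, 10.0),
    "bare_fraction": (0.0, 1.0),
}


def expert_model(slope: float, rainfall: float, bare_fraction: float) -> float:
    """Made-up risk score, nondecreasing in each nonnegative argument."""
    return slope * rainfall * bare_fraction


def upper_bound(itemset: Dict[str, Interval]) -> float:
    """Largest model value reachable when each item restricts one variable.

    Variables not mentioned in the itemset keep their full domain. Because
    the model is nondecreasing, the maximum is reached at the upper ends.
    """
    box = {**DOMAIN, **itemset}
    return expert_model(*(box[name][1] for name in VARIABLES))


def can_prune(itemset: Dict[str, Interval], threshold: float) -> bool:
    """True when neither the itemset nor any superset can reach the threshold."""
    return upper_bound(itemset) < threshold
```

For instance, with a threshold of 3.0, the itemset `{"slope": (0.0, 0.2)}` has an upper bound of about 2.0 and can be discarded together with all its supersets, whereas the empty itemset (upper bound 10.0) must still be explored.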
Finally, we apply the two mining methods to the study of erosion in New Caledonia. The data studied are heterogeneous, with numerical and categorical values coming from multiple sources (e.g. satellite images, a digital elevation model, land cover truth, or geology).
We develop two scenarios. In the first, we mine a set of pixels, which can be seen as a transactional database, seeking properties of pixels that express a high erosion risk according to an expert model. In the second, we mine a single attributed acyclic graph: we exploit the previous results to seek temporal series of characteristics leading to a high or low erosion risk. A visualisation prototype makes it possible to remap and highlight occurrences of these paths. The results bear out the interest of the proposed approaches. In particular, they highlight areas that are known for their erosive dynamics.