NASA Data Mining Algorithms
From SKYbrary Wiki
Data Mining Algorithms
NASA has developed data mining algorithms for application to data from commercial aviation and other large data repositories. Several of these algorithms are being deployed to the Federal Aviation Administration (FAA) and its contractor, MITRE, under the Aviation Safety Information Analysis and Sharing (ASIAS) partnership.
Most of these algorithms are available as open-source code from DASHlink, a social platform for scientists working in aeronautics and supporting research areas. These algorithms include the Multiple Kernel Anomaly Detection Algorithm (MKAD), which looks for anomalies in time sequences containing both discrete and continuous data. MKAD goes beyond looking for single-variable exceedances and instead identifies violations of the typical mathematical relationships across multiple variables. MKAD learns what is typical automatically from a large dataset.
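The idea of scoring a flight on discrete and continuous data together can be sketched as follows. This is a simplified illustration, not the DASHlink implementation: the text above does not describe MKAD's internals, so the two kernels (a normalized longest-common-subsequence similarity for discrete event sequences and a Gaussian similarity for continuous series), the equal weighting, the "1 minus mean similarity" score, and the field names `switches` and `airspeed` are all assumptions chosen for the example.

```python
import math


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def discrete_kernel(a, b):
    """Normalized LCS similarity in [0, 1] for discrete event sequences."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / math.sqrt(len(a) * len(b))


def continuous_kernel(x, y, gamma=1.0):
    """Gaussian similarity for equal-length, pre-scaled continuous series."""
    mean_sq = sum((p - q) ** 2 for p, q in zip(x, y)) / len(x)
    return math.exp(-gamma * mean_sq)


def combined_kernel(f, g, weight=0.5):
    """Weighted sum of the discrete and continuous similarities."""
    return (weight * discrete_kernel(f["switches"], g["switches"])
            + (1.0 - weight) * continuous_kernel(f["airspeed"], g["airspeed"]))


def anomaly_scores(flights, weight=0.5):
    """Score each flight as 1 minus its mean similarity to the others.

    Assumption: this simple score stands in for a trained anomaly detector.
    """
    scores = []
    for i, f in enumerate(flights):
        sims = [combined_kernel(f, g, weight)
                for j, g in enumerate(flights) if j != i]
        scores.append(1.0 - sum(sims) / len(sims))
    return scores


# Hypothetical toy data: the last flight has normal airspeed values
# but an unusual ordering of switch events.
usual = ["gear_up", "flaps_1", "flaps_2", "gear_down"]
flights = [
    {"switches": usual, "airspeed": [0.2, 0.5, 0.8, 0.5]},
    {"switches": usual, "airspeed": [0.2, 0.5, 0.8, 0.6]},
    {"switches": usual, "airspeed": [0.3, 0.5, 0.8, 0.5]},
    {"switches": ["flaps_2", "gear_down", "gear_up", "flaps_1"],
     "airspeed": [0.2, 0.5, 0.8, 0.5]},
]
scores = anomaly_scores(flights)
most_anomalous = scores.index(max(scores))
```

In this toy data no single variable exceeds any limit, yet the last flight receives the highest score because its switch sequence departs from the pattern shared by the others.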
Another key algorithm is Multivariate Time Series (MTS) Search, which allows a user to specify a query in the form of a time series over several variables of the user’s choosing, with arbitrary time delays between the waveforms over each variable. The MTS Search algorithm quickly finds closely matching examples of the user’s query in very large data repositories.
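A query of this kind can be illustrated with a brute-force search over a single record. The sketch below is not the MTS Search algorithm itself, which is designed to be fast on very large repositories; it only shows what a multi-variable query with a delay allowance means. The function names, the Euclidean distance, the `max_delay` parameter, and the convention that the first query variable anchors the match are all assumptions made for the example.

```python
def window_distance(series, start, pattern):
    """Euclidean distance between a pattern and a window of the series."""
    return sum((series[start + k] - p) ** 2
               for k, p in enumerate(pattern)) ** 0.5


def best_offset(series, pattern, lo, hi):
    """Best (distance, start) with lo <= start <= hi, or None if no window fits."""
    hi = min(hi, len(series) - len(pattern))
    candidates = [(window_distance(series, s, pattern), s)
                  for s in range(max(lo, 0), hi + 1)]
    return min(candidates) if candidates else None


def mts_search(record, query, max_delay):
    """Find the best match of a multi-variable query in one record.

    record: dict mapping variable name -> list of samples
    query:  dict mapping variable name -> waveform to look for
    Each later variable's waveform may start up to max_delay samples
    after the anchor (first) variable's waveform.
    """
    names = list(query)
    anchor = names[0]
    best = None
    last_start = len(record[anchor]) - len(query[anchor])
    for t in range(last_start + 1):
        total = window_distance(record[anchor], t, query[anchor])
        starts = {anchor: t}
        for name in names[1:]:
            found = best_offset(record[name], query[name], t, t + max_delay)
            if found is None:
                break
            total += found[0]
            starts[name] = found[1]
        else:
            # Only reached when every variable had a feasible window.
            if best is None or total < best[0]:
                best = (total, starts)
    return best


# Hypothetical record: a descent followed, some samples later, by flap extension.
record = {
    "altitude": [5, 5, 5, 4, 3, 2, 2, 2, 2, 2],
    "flaps":    [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
}
query = {"altitude": [5, 4, 3, 2], "flaps": [0, 1, 1]}
match = mts_search(record, query, max_delay=5)
```

Here `match` holds the total distance and the start index found for each variable. Searching a repository would repeat this over every record and rank the results by distance.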
The algorithms page within DASHlink also includes code for working with text repositories. Latent Dirichlet Allocation automatically extracts topics, in the form of vectors of words, from a collection of documents. Mariana has mostly been used to classify Aviation Safety Reporting System (ASRS) documents into a specified set of anomaly categories, but it can be used for any classification problem.
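To make "topics in the form of vectors of words" concrete, the following is a minimal sketch of Latent Dirichlet Allocation fitted by collapsed Gibbs sampling, one common way to estimate the model. It is not the DASHlink code; the function name `lda_gibbs`, the hyperparameter values, and the toy documents are assumptions for illustration only.

```python
import random


def lda_gibbs(docs, num_topics, iterations=50, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one word-weight dict per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]      # topic counts per document
    topic_word = [[0] * V for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][n]
                # Remove the current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Resample the topic from its conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta) / (topic_total[k] + V * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Each topic is a vector of word probabilities over the vocabulary.
    return [
        {w: (topic_word[k][word_id[w]] + beta) / (topic_total[k] + V * beta)
         for w in vocab}
        for k in range(num_topics)
    ]


# Hypothetical, already tokenized report snippets.
docs = [
    ["engine", "fire", "warning", "engine", "shutdown"],
    ["runway", "incursion", "tower", "clearance", "runway"],
    ["engine", "shutdown", "warning", "fire"],
    ["tower", "clearance", "runway", "taxi"],
]
topics = lda_gibbs(docs, num_topics=2)
top_words = [sorted(t, key=t.get, reverse=True)[:3] for t in topics]
```

Each entry of `topics` is a probability vector over the vocabulary, and `top_words` lists the three highest-weighted words per topic. Because the sampler is randomized, which words group together depends on the seed and the number of iterations.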
Many open-sourced and other publicly available algorithm implementations from institutions other than NASA are also available on the algorithms page.
- FAA Aviation Safety Information Analysis and Sharing system
- NASA web-based collaboration tool for those interested in data mining and systems health