Title: | Outlier Detection Using Statistical and Machine Learning Methods |
---|---|
Description: | The Local Correlation Integral (LOCI) method for outlier identification is implemented here. The method builds on the density-based local outlier approach of Breunig, et al. (2000); see <doi:10.1145/342009.335388>. |
Authors: | Siddharth Jain and Prabhanjan Tattar |
Maintainer: | Siddharth Jain <[email protected]> |
License: | GPL-2 |
Version: | 0.1 |
Built: | 2024-11-16 04:03:41 UTC |
Source: | https://github.com/cran/SMLoutliers |
We intend to provide a host of methods for identifying outliers, covering both statistical and machine learning approaches.
M.M. Breunig, H.P. Kriegel, R.T. Ng, and J. Sander. LOF: Identifying density-based local outliers. In Proc. SIGMOD Conf., pages 93-104, 2000.
data(stiff)
summary(stiff)
We provide an R implementation of the Local Correlation Integral (LOCI) method for detecting outliers. LOCI builds on the density-based local outlier work of Breunig, et al. (2000), and we follow the description given in Papadimitriou, et al. (2003).
LOCI(data, alpha)
data | Any R data.frame consisting of numeric values only |
alpha | A number in the unit interval for the fractional circle search |
A simple implementation is provided here. The core computation is the distance function. For each observation, a search is made for the neighbors within distance r of it, and then, for each of these neighbors, we count the observations that fall in its fractional circle. Calculations based on the multi-granularity deviation factor (MDEF) then help determine whether the observation is an outlier.
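The counting steps described above can be sketched for a single radius as follows. This is only an illustration under stated assumptions, not the package's internal code: `mdef_at_radius` is a hypothetical helper, the fractional circle is taken to have radius `alpha * r`, and the cutoff `k_sigma = 3` follows the usual choice in Papadimitriou, et al. (2003). The full LOCI procedure examines a range of radii, which this sketch does not attempt.

```r
# Hypothetical helper (not exported by SMLoutliers): MDEF at one fixed radius r.
mdef_at_radius <- function(data, r, alpha = 0.5, k_sigma = 3) {
  d <- as.matrix(dist(data))                  # pairwise Euclidean distances
  # Number of points within alpha * r of each point (the point itself counts).
  n_alpha <- unname(rowSums(d <= alpha * r))
  n <- nrow(d)
  mdef <- numeric(n)
  sigma_mdef <- numeric(n)
  for (i in seq_len(n)) {
    counts <- n_alpha[d[i, ] <= r]            # counts over the r-neighbors of i
    n_hat <- mean(counts)                     # average fractional-circle count
    mdef[i] <- 1 - n_alpha[i] / n_hat
    # Population standard deviation of the counts, normalised by n_hat.
    sigma_mdef[i] <- sqrt(mean((counts - n_hat)^2)) / n_hat
  }
  data.frame(mdef = mdef, sigma_mdef = sigma_mdef,
             outlier = mdef > k_sigma * sigma_mdef)
}

# Toy data (an assumption for illustration): a tight cluster plus one distant point.
toy <- data.frame(x = c(seq(0, 1.5, by = 0.1), 5))
res <- mdef_at_radius(toy, r = 6, alpha = 0.5)
res[res$outlier, ]
```

With this toy data, only the last row (the point at 5) should be flagged, since its fractional circle contains no other point while every one of its neighbors has a full one. The choice of `r` matters a great deal in practice, which is why LOCI proper scans over radii rather than fixing one.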
Siddharth Jain and Prabhanjan Tattar
M.M. Breunig, H.P. Kriegel, R.T. Ng, and J. Sander. LOF: Identifying density-based local outliers. In Proc. SIGMOD Conf., pages 93-104, 2000.
S. Papadimitriou, H. Kitagawa, P.B. Gibbons, and C. Faloutsos. LOCI: Fast outlier detection using the local correlation integral. In Proc. 19th International Conference on Data Engineering, pages 315-326. IEEE, 2003.
data(stiff)
OM <- LOCI(stiff, 0.5)
OM
Four measures of stiffness are available for each of 30 boards. The first measure is obtained by sending a shock wave down the board, the second by vibrating the board, and the remaining two from static tests.
data(stiff)
A data frame with 30 observations on the following 4 variables.
x1 | First measure of stiffness, obtained by sending a shock wave down the board |
x2 | Second measure of stiffness, obtained by vibrating the board |
x3 | Third measure of stiffness, obtained by a static test |
x4 | Fourth measure of stiffness, obtained by a static test |
Johnson, R.A., and Wichern, D.W. (1982-2007). Applied Multivariate Statistical Analysis, 6e. Pearson Education.
Tattar, et al. (2016). A Course in Statistics with R. J. Wiley.
data(stiff)
summary(stiff)