Measuring Different Tasks for Unstructured Data and High Speed Data in Data Stream Mining

Authors (3): P. Venkata Maheswara, K. Rajasekhar, Ch. Siva Sankar

Data streams are continuous flows of data; examples include network traffic, sensor readings, and call center records. One important problem is mining data streams at the scale of extremely large databases (e.g., 100 TB), a scale that satellite and computer network data can easily reach. Today’s data mining technology is still too slow to handle data of this size. In addition, data mining should be a continuous, online process rather than an occasional one-shot process, and organizations that can mine continuously will have a decisive advantage over those that cannot. One particular instance is high-speed network traffic, where one hopes to mine information for various purposes, including identifying anomalous events that may indicate attacks of one kind or another. A technical problem is how to compute models over streaming data that accommodate changes in the environment from which the data are drawn; this is the problem of “concept drift” or “environment drift,” and it is particularly hard for large streaming data. How can accurate and useful models be computed efficiently? One cannot presume to have ample computing power, the resources to store much of the data, or the opportunity to pass over the data multiple times. Hence, incremental mining and effective model updating, which keep the model accurate for the current stream, are both very hard problems.
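
The constraints described above (a single pass over the data, constant memory, and a model that follows a drifting stream) can be illustrated with a small sketch. The example below is not the authors' method. It assumes a numeric stream and uses an exponentially weighted mean and variance as the "model"; the class name EwmaAnomalyDetector, its parameters (alpha, threshold, warmup), and the sample values are all hypothetical choices made for illustration.

```python
import math


class EwmaAnomalyDetector:
    """Single-pass, constant-memory anomaly detector (illustrative sketch).

    The model is an exponentially weighted mean and variance. Older
    observations fade at a rate set by `alpha`, so the model gradually
    follows a drifting stream instead of being refit from stored data.
    """

    def __init__(self, alpha=0.1, threshold=3.0, warmup=5):
        self.alpha = alpha          # forgetting factor, 0 < alpha <= 1 (assumed)
        self.threshold = threshold  # flag values this many std devs from the mean
        self.warmup = warmup        # observations to see before flagging anything
        self.mean = 0.0
        self.var = 0.0
        self.count = 0

    def update(self, x):
        """Score x against the current model, then fold x into the model."""
        self.count += 1
        if self.count == 1:
            # The first observation only initializes the model.
            self.mean = x
            return False

        std = math.sqrt(self.var)
        is_anomaly = (
            self.count > self.warmup
            and std > 0
            and abs(x - self.mean) > self.threshold * std
        )

        # Incremental update: no past values are stored or revisited.
        delta = x - self.mean
        self.mean += self.alpha * delta
        self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)
        return is_anomaly


# Hypothetical stream: stable readings near 10 with one spike.
stream = [10.0, 10.2, 9.9, 10.1, 10.0, 9.9, 10.1, 50.0, 10.0]
detector = EwmaAnomalyDetector()
flags = [detector.update(x) for x in stream]
print(flags)  # only the spike (50.0) is flagged
```

A larger alpha makes the model forget faster, which tracks drift more quickly but also makes it noisier; a real deployment would need to tune this trade-off and would likely use a richer model than a single mean and variance.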

Authors and Affiliations

P. Venkata Maheswara
Assistant Professor, Department of Computer Science and Engineering, AITS-Tirupati, Andhra Pradesh, India
K. Rajasekhar
Assistant Professor, Department of Computer Science and Engineering, AITS-Tirupati, Andhra Pradesh, India
Ch. Siva Sankar
Assistant Professor, Department of Computer Science and Engineering, AITS-Tirupati, Andhra Pradesh, India

Keywords: Data Stream, Data Stream Mining, Concept Drift/Environment Drift

Publication Details

Published in : Volume 2 | Issue 3 | May-June 2019
Date of Publication : 2019-06-30
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 08-17
Manuscript Number : SHISRRJ192310
Publisher : Shauryam Research Institute

ISSN : 2581-6306

Cite This Article :

P. Venkata Maheswara, K. Rajasekhar, Ch. Siva Sankar, "Measuring Different Tasks for Unstructured Data and High Speed Data in Data Stream Mining", Shodhshauryam, International Scientific Refereed Research Journal (SHISRRJ), ISSN: 2581-6306, Volume 2, Issue 3, pp. 08-17, May-June 2019
URL : https://shisrrj.com/SHISRRJ192310
