Time series classification has been a major research area over the past few years, mainly because of its large number of practical applications. It is used in many domains, including business, healthcare, hospitality and transportation. Practical examples include detecting stock market anomalies, identifying heartbeat patterns of hospital patients and detecting temperature levels in climate science. Accurate time series classification can increase business revenue significantly and facilitate optimal resource allocation, so many industries take a great interest in this area. A few terms related to time series classification need to be defined beforehand: time series data sets, time series analysis and, finally, time series classification.
A time series data set represents measurements of a quantity over a period of time. The behavior of the series depends heavily on the order of the points, and changing that order changes the meaning of the whole data set. Time series analysis is the development of statistical models that provide reasonable explanations of sample data. These models can be developed using various machine learning techniques.
Time series classification deals with classifying time series data based on its behavior over time. Some series behave abnormally compared with the other series in a collection. Identifying unusual and anomalous time series is becoming increasingly common for organizations, which need to detect abnormal behavior in order to make sound business decisions and market predictions. For example, large companies such as Yahoo monitor their mail servers over time in order to detect anomalous and malicious time series. In this setting, feature extraction can be used as a methodology for time series classification.
Feature extraction refers to extracting information from a time series in order to represent that series as a feature vector. These features can be derived using scientific time series analysis. Correlation structure, distribution, entropy, stationarity and scaling properties are some examples of time series features, and they help in fitting time series to a range of time series models. Feature extraction is closely related to statistics, as most of the features that describe time series information are statistical.
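To make the idea of a feature vector concrete, here is a minimal sketch in Python with NumPy. The function name `extract_features`, the choice of four features and the histogram-based entropy estimate are all assumptions made for illustration; they are not taken from any of the works cited in this article.

```python
import numpy as np

def extract_features(series, bins=10):
    """Map a 1-D time series to a small, fixed-length feature vector.

    Illustrative only: the feature set and the name are assumptions.
    """
    x = np.asarray(series, dtype=float)
    centered = x - x.mean()
    var = centered.var()
    # Lag-1 autocorrelation: a simple summary of the correlation structure.
    if var > 0:
        acf1 = float(np.dot(centered[:-1], centered[1:]) / (len(x) * var))
    else:
        acf1 = 0.0  # a constant series has no meaningful autocorrelation
    # Shannon entropy (in nats) of the binned value distribution.
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    entropy = float(-(p * np.log(p)).sum())
    return np.array([x.mean(), var, acf1, entropy])
```

Series of different lengths all map to vectors of the same length, which is what allows a standard classifier to be applied afterwards.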
Huge amounts of time series data are collected every day from many heterogeneous data sources across different application domains. A vast amount of data is generated in a fraction of a second, especially on social media platforms such as Facebook and Twitter. The highly dynamic and fluctuating nature of these domains, along with the need to collect and store such enormous amounts of data, poses new challenges for time series classification. Because of the size, velocity and complexity inherent in big data, traditional classification methods such as instance-based classification may fail to identify anomalous time series accurately. Noise and seasonality in the data make such failures more likely. Feature-based approaches are more interpretable and more resilient to missing and noisy data. Therefore, preprocessing these data efficiently and identifying hidden patterns with minimal resources is a contemporary research interest.
A number of researchers have studied time series classification using different approaches. Rob Hyndman et al. propose an approach to time series classification that applies Principal Component Analysis (PCA) to features [1]. This research focuses mainly on detecting unusual or anomalous time series. The authors compute a feature vector for each time series, apply bivariate outlier detection methods to the first two principal components of those features and, through that, identify the most unusual time series in a given set. The methodology was compared with K-Means clustering as a baseline and outperformed it as a result of using a well-researched feature space for classification.
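The general shape of this approach can be sketched as follows. This is not the implementation from [1]: the cited work uses dedicated bivariate outlier detection methods, whereas this sketch substitutes a simple robust distance from the median in the space of the first two principal components. The function names and the scoring rule are assumptions.

```python
import numpy as np

def first_two_components(features):
    """Project an (n_series, n_features) matrix onto its first two principal components."""
    X = np.asarray(features, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns stay at zero after centering
    Z = (X - X.mean(axis=0)) / std
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[:2].T

def most_unusual(features, k=1):
    """Return indices of the k series farthest from the median in PC space.

    Stand-in score (an assumption): distance scaled by the median absolute deviation.
    """
    pcs = first_two_components(features)
    center = np.median(pcs, axis=0)
    mad = np.median(np.abs(pcs - center), axis=0)
    mad[mad == 0] = 1.0
    score = np.sqrt((((pcs - center) / mad) ** 2).sum(axis=1))
    return np.argsort(score)[::-1][:k]
```

Each row of `features` is the feature vector of one time series, so the returned indices point back to the series that look most unusual in the reduced two-dimensional space.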
Ben Fulcher et al. introduce a time series classification technique based on a set of selected features of a time series [2]. They have developed a mechanism for automating the process of extracting features from a time series. After a large number of features has been generated, the most suitable features for representing a particular time series are selected through a greedy approach. Each time series is represented as a feature vector, and a set of feature vectors is used with a classification model such as a decision tree. This methodology has performed better than traditional classification methodologies such as instance-based classification. They have also introduced a set of self-descriptive features for a time series, such as lumpiness, spikiness, level shift and crossing points, and used them for time series classification.
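Two of the features named above can be sketched in a few lines of Python. The definitions used here are assumptions based on common usage: crossing points as the number of times a series crosses its median, and lumpiness as the variance of the variances computed over non-overlapping windows. The window width of 10 is arbitrary.

```python
import numpy as np

def crossing_points(series):
    """Count how many times the series crosses its median (assumed definition)."""
    x = np.asarray(series, dtype=float)
    above = x > np.median(x)
    # A crossing happens wherever consecutive points fall on different sides.
    return int(np.count_nonzero(above[1:] != above[:-1]))

def lumpiness(series, width=10):
    """Variance of the variances of non-overlapping windows (assumed definition)."""
    x = np.asarray(series, dtype=float)
    n_windows = len(x) // width
    if n_windows < 2:
        return 0.0  # not enough windows to compare
    windows = x[: n_windows * width].reshape(n_windows, width)
    return float(windows.var(axis=1).var())
```

A series whose volatility is stable over time has a lumpiness close to zero, while a series that alternates between quiet and volatile stretches has a large value.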
Feature-based time series classification has also been used for time series analysis and visualization. Nick Jones et al. propose a mechanism for representing time series by their properties, as measured by diverse scientific methods [3]. It supports organizing time series data sets automatically based on these properties. The representation is a two-dimensional matrix in which rows represent time series and columns represent the operations applied to them. This makes time series analysis easier, as a large amount of information is summarized through time series features.
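A matrix of this kind can be sketched as follows. The four operations and the names `OPERATIONS` and `feature_matrix` are hypothetical and chosen only for illustration; the cited work uses a far larger library of operations.

```python
import numpy as np

# Hypothetical operations: each maps a 1-D array to a single number.
OPERATIONS = {
    "mean": lambda x: np.mean(x),
    "std": lambda x: np.std(x),
    "range": lambda x: np.max(x) - np.min(x),
    "mean_abs_diff": lambda x: np.mean(np.abs(np.diff(x))),
}

def feature_matrix(series_list, operations=OPERATIONS):
    """Return an (n_series, n_operations) matrix; series may differ in length."""
    return np.array([
        [float(op(np.asarray(s, dtype=float))) for op in operations.values()]
        for s in series_list
    ])
```

Row `i` of the result describes the `i`-th series, and column `j` holds the output of the `j`-th operation across all series, so both series and operations can be compared or reordered by similarity.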
Time series classification can also support time series forecasting. Kasun Bandara et al. propose a mechanism for time series forecasting using Long Short-Term Memory (LSTM) networks [4]. They have developed separate LSTM networks for different clusters of time series, and forecasting is performed separately for each cluster. Here, feature-based classification serves as a supporting mechanism for time series clustering once each time series has been represented as a feature vector.
References
[1] R.J. Hyndman, E. Wang and N. Laptev. Large-scale unusual time series detection. In Proceedings - 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015, pages 1616-1619, 2016
[2] B.D. Fulcher and N.S. Jones. Highly comparative feature-based time-series classification. IEEE Transactions on Knowledge and Data Engineering, 26(12):3026-3037, 2014
[3] B.D. Fulcher, M.A. Little and N.S. Jones. Highly comparative time-series analysis: the empirical structure of time series and their methods. Journal of the Royal Society Interface, 10(83), 2013
[4] K. Bandara, C. Bergmeir and S. Smyl. Forecasting across time series databases using Long Short-Term Memory networks on groups of similar series, 2017 [online] Available at: https://arxiv.org/abs/1710.03222 [Accessed 4 Nov. 2018]