This project is based on my bachelor's thesis in Statistics & Data Analytics.
You can take a look at it in the motif_clustering GitHub repo.
One of the biggest challenges for retailers is identifying and acting on changes in product sales. Retail channels produce time series data that are high-dimensional and heterogeneous, which makes traditional clustering techniques prone to failure. This work proposes a new method for clustering retail sales time series based on motif discovery. Although subsequence clustering has been around for some time, the contribution of this work is a framework that applies the Matrix Profile method to discover motifs in retail time series, capturing similarities between sales patterns. The Matrix Profile method is fast, exact, robust, and scalable, with low memory and computational costs. In this work, we analyze the time series of product sub-categories. We call our approach MoClust, short for Motif-based Time Series Clustering via Matrix Profile for Retail. The first step of the algorithm uses the discovered motifs to reduce the dimensionality of the full time series. A clustering method is then applied to these motifs to detect the distinct temporal sales patterns of the product sub-categories. Our experiments on a retail dataset show that MoClust, when combined with the Matrix Profile distance (MPDist), outperforms traditional clustering of raw time series with distance measures such as Euclidean distance and Dynamic Time Warping (DTW). The primary business goal of this thesis is retail customer segmentation.
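To make the motif discovery step more concrete, here is a minimal brute-force sketch of a matrix profile and of extracting the top motif pair from it. It is not the MoClust implementation from the thesis and does not use the fast algorithms behind the Matrix Profile method; the function names (`znorm`, `matrix_profile`, `top_motif`), the toy `series`, the window length `m`, and the exclusion zone of `m // 2` are all assumptions chosen for illustration.

```python
import math


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in window]


def matrix_profile(series, m):
    """Naive O(k^2 * m) matrix profile (illustrative only).

    For every length-m subsequence, store the z-normalized Euclidean
    distance to its nearest neighbor and that neighbor's start index.
    """
    k = len(series) - m + 1
    windows = [znorm(series[i:i + m]) for i in range(k)]
    excl = max(1, m // 2)  # assumed exclusion zone to skip trivial matches
    profile = [math.inf] * k
    index = [-1] * k
    for i in range(k):
        for j in range(k):
            if abs(i - j) <= excl:
                continue  # overlapping windows match trivially
            d = math.dist(windows[i], windows[j])
            if d < profile[i]:
                profile[i] = d
                index[i] = j
    return profile, index


def top_motif(series, m):
    """Return (start, neighbor_start, distance) of the closest pair."""
    profile, index = matrix_profile(series, m)
    i = min(range(len(profile)), key=profile.__getitem__)
    return i, index[i], profile[i]


# Hypothetical weekly sales with the pattern 1, 3, 7, 3, 1 planted twice,
# starting at positions 4 and 14.
series = [2, 9, 4, 6, 1, 3, 7, 3, 1, 8, 2, 5, 9, 4, 1, 3, 7, 3, 1, 6, 2, 8]
start, neighbor, dist = top_motif(series, m=5)
print(start, neighbor, dist)
```

On this toy series the two planted occurrences are recovered as the motif pair. In a pipeline like the one described above, such motifs (rather than the full series) would then be passed to a clustering step; in practice a dedicated Matrix Profile library would replace the quadratic loop.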