Multivariate Time-Series Clustering Announcing the arrival of Valued Associate #679: Cesar Manara Planned maintenance scheduled April 23, 2019 at 23:30 UTC (7:30pm US/Eastern) 2019 Moderator Election Q&A - Questionnaire 2019 Community Moderator Election ResultsHow to deal with time series which change in seasonality or other patterns?similarity measure for multivariate time series with heterogeous length and contentHow can I predict traffic based on previous time series data?Discovering dis-associations between periods of time-seriesClustering users based on buying behaviourhow to compare different sets of time series dataDifference between Time series clustering and Time series SegmentationClassification/Prediction based on Multivariate Time SeriesMultivariate Time Series Anomalous Entry DetectionMultivariate time series classification using KNN and DTWInteractive labeling/annotating of time series dataLSTM Time series prediction for multiple multivariate series

Did Mueller's report provide an evidentiary basis for the claim of Russian govt election interference via social media?

Co-worker has annoying ringtone

What's the point of the test set?

1-probability to calculate two events in a row

How would a mousetrap for use in space work?

Semigroups with no morphisms between them

Getting prompted for verification code but where do I put it in?

Most bit efficient text communication method?

What does Turing mean by this statement?

How many morphisms from 1 to 1+1 can there be?

Karn the great creator - 'card from outside the game' in sealed

A term for a woman complaining about things/begging in a cute/childish way

C's equality operator on converted pointers

Why does 14 CFR have skipped subparts in my ASA 2019 FAR/AIM book?

Converted a Scalar function to a TVF function for parallel execution-Still running in Serial mode

Putting class ranking in CV, but against dept guidelines

Can the Flaming Sphere spell be rammed into multiple Tiny creatures that are in the same 5-foot square?

Why do early math courses focus on the cross sections of a cone and not on other 3D objects?

Significance of Cersei's obsession with elephants?

Strange behavior of Object.defineProperty() in JavaScript

If Windows 7 doesn't support WSL, then what is "Subsystem for UNIX-based Applications"?

Intuitive explanation of the rank-nullity theorem

How did Fremen produce and carry enough thumpers to use Sandworms as de facto Ubers?

Amount of permutations on an NxNxN Rubik's Cube



Multivariate Time-Series Clustering



Announcing the arrival of Valued Associate #679: Cesar Manara
Planned maintenance scheduled April 23, 2019 at 23:30 UTC (7:30pm US/Eastern)
2019 Moderator Election Q&A - Questionnaire
2019 Community Moderator Election ResultsHow to deal with time series which change in seasonality or other patterns?similarity measure for multivariate time series with heterogeous length and contentHow can I predict traffic based on previous time series data?Discovering dis-associations between periods of time-seriesClustering users based on buying behaviourhow to compare different sets of time series dataDifference between Time series clustering and Time series SegmentationClassification/Prediction based on Multivariate Time SeriesMultivariate Time Series Anomalous Entry DetectionMultivariate time series classification using KNN and DTWInteractive labeling/annotating of time series dataLSTM Time series prediction for multiple multivariate series










4












$begingroup$


I have a streaming data along with timestamp dataset that looks like this:
Sample Data Look alike1.png



Timestamp can be inclusive of "seconds" too, but the data may or may not change every second. it depends on the previous values(rows i.e data which came earlier w.r.t time).



Column1, Column2 .... ColumnN correspond to the variables (they change over time) and "Label" shows the different samples. You can assume that the values tend to decrease over time for a particular label.



Labels A1,B1,C1.........A2,... M labels.



Note : Values of timeNew of a Label depends on values of timeOld of that Label and Labels belong to its cluster.



I need to group Labels with similar behavior over time together (e.g. Label A1 and Label C1 should be put in the same cluster and B1,D2 may fall into same cluster over time as they tend to behave similar over time).



I thought of using DTW and get the similarity of each Label with respect to other Labels. but not sure, how to proceed when i have N Columns.



To be precise, i need to group Labels based on their similarities (Column1 .. ColumnN) over time and group them.



Once i group them when new data comes in i should be able to predict the values(Column1.. ColumnN) for a Label based on the previously seen data(can be just minutes closer to the current prediction) and the values associated with the Labels in its cluster and predict it accordingly.










share|improve this question











$endgroup$




bumped to the homepage by Community 24 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.














  • $begingroup$
    @Vincenzo please look into this question now
    $endgroup$
    – Abhinaya Krishna
    Mar 21 '18 at 23:24










  • $begingroup$
    @Toros91 refer the question now
    $endgroup$
    – Abhinaya Krishna
    Mar 21 '18 at 23:25










  • $begingroup$
    I've retracted my vote, lets see what others think about it as well.
    $endgroup$
    – Toros91
    Mar 22 '18 at 1:22










  • $begingroup$
    Looks like you are looking for a Multivariate Time series forecasting.
    $endgroup$
    – Rajshekar Reddy
    Nov 20 '18 at 18:38















4












$begingroup$


I have a streaming data along with timestamp dataset that looks like this:
Sample Data Look alike1.png



Timestamp can be inclusive of "seconds" too, but the data may or may not change every second. it depends on the previous values(rows i.e data which came earlier w.r.t time).



Column1, Column2 .... ColumnN correspond to the variables (they change over time) and "Label" shows the different samples. You can assume that the values tend to decrease over time for a particular label.



Labels A1,B1,C1.........A2,... M labels.



Note : Values of timeNew of a Label depends on values of timeOld of that Label and Labels belong to its cluster.



I need to group Labels with similar behavior over time together (e.g. Label A1 and Label C1 should be put in the same cluster and B1,D2 may fall into same cluster over time as they tend to behave similar over time).



I thought of using DTW and get the similarity of each Label with respect to other Labels. but not sure, how to proceed when i have N Columns.



To be precise, i need to group Labels based on their similarities (Column1 .. ColumnN) over time and group them.



Once i group them when new data comes in i should be able to predict the values(Column1.. ColumnN) for a Label based on the previously seen data(can be just minutes closer to the current prediction) and the values associated with the Labels in its cluster and predict it accordingly.










share|improve this question











$endgroup$




bumped to the homepage by Community 24 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.














  • $begingroup$
    @Vincenzo please look into this question now
    $endgroup$
    – Abhinaya Krishna
    Mar 21 '18 at 23:24










  • $begingroup$
    @Toros91 refer the question now
    $endgroup$
    – Abhinaya Krishna
    Mar 21 '18 at 23:25










  • $begingroup$
    I've retracted my vote, lets see what others think about it as well.
    $endgroup$
    – Toros91
    Mar 22 '18 at 1:22










  • $begingroup$
    Looks like you are looking for a Multivariate Time series forecasting.
    $endgroup$
    – Rajshekar Reddy
    Nov 20 '18 at 18:38













4












4








4


7



$begingroup$


I have a streaming data along with timestamp dataset that looks like this:
Sample Data Look alike1.png



Timestamp can be inclusive of "seconds" too, but the data may or may not change every second. it depends on the previous values(rows i.e data which came earlier w.r.t time).



Column1, Column2 .... ColumnN correspond to the variables (they change over time) and "Label" shows the different samples. You can assume that the values tend to decrease over time for a particular label.



Labels A1,B1,C1.........A2,... M labels.



Note : Values of timeNew of a Label depends on values of timeOld of that Label and Labels belong to its cluster.



I need to group Labels with similar behavior over time together (e.g. Label A1 and Label C1 should be put in the same cluster and B1,D2 may fall into same cluster over time as they tend to behave similar over time).



I thought of using DTW and get the similarity of each Label with respect to other Labels. but not sure, how to proceed when i have N Columns.



To be precise, i need to group Labels based on their similarities (Column1 .. ColumnN) over time and group them.



Once i group them when new data comes in i should be able to predict the values(Column1.. ColumnN) for a Label based on the previously seen data(can be just minutes closer to the current prediction) and the values associated with the Labels in its cluster and predict it accordingly.










share|improve this question











$endgroup$




I have a streaming data along with timestamp dataset that looks like this:
Sample Data Look alike1.png



Timestamp can be inclusive of "seconds" too, but the data may or may not change every second. it depends on the previous values(rows i.e data which came earlier w.r.t time).



Column1, Column2 .... ColumnN correspond to the variables (they change over time) and "Label" shows the different samples. You can assume that the values tend to decrease over time for a particular label.



Labels A1,B1,C1.........A2,... M labels.



Note : Values of timeNew of a Label depends on values of timeOld of that Label and Labels belong to its cluster.



I need to group Labels with similar behavior over time together (e.g. Label A1 and Label C1 should be put in the same cluster and B1,D2 may fall into same cluster over time as they tend to behave similar over time).



I thought of using DTW and get the similarity of each Label with respect to other Labels. but not sure, how to proceed when i have N Columns.



To be precise, i need to group Labels based on their similarities (Column1 .. ColumnN) over time and group them.



Once i group them when new data comes in i should be able to predict the values(Column1.. ColumnN) for a Label based on the previously seen data(can be just minutes closer to the current prediction) and the values associated with the Labels in its cluster and predict it accordingly.







machine-learning python clustering time-series






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Mar 21 '18 at 23:14







Abhinaya Krishna

















asked Mar 20 '18 at 0:37









Abhinaya KrishnaAbhinaya Krishna

2413




2413





bumped to the homepage by Community 24 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.







bumped to the homepage by Community 24 mins ago


This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.













  • $begingroup$
    @Vincenzo please look into this question now
    $endgroup$
    – Abhinaya Krishna
    Mar 21 '18 at 23:24










  • $begingroup$
    @Toros91 refer the question now
    $endgroup$
    – Abhinaya Krishna
    Mar 21 '18 at 23:25










  • $begingroup$
    I've retracted my vote, lets see what others think about it as well.
    $endgroup$
    – Toros91
    Mar 22 '18 at 1:22










  • $begingroup$
    Looks like you are looking for a Multivariate Time series forecasting.
    $endgroup$
    – Rajshekar Reddy
    Nov 20 '18 at 18:38
















  • $begingroup$
    @Vincenzo please look into this question now
    $endgroup$
    – Abhinaya Krishna
    Mar 21 '18 at 23:24










  • $begingroup$
    @Toros91 refer the question now
    $endgroup$
    – Abhinaya Krishna
    Mar 21 '18 at 23:25










  • $begingroup$
    I've retracted my vote, lets see what others think about it as well.
    $endgroup$
    – Toros91
    Mar 22 '18 at 1:22










  • $begingroup$
    Looks like you are looking for a Multivariate Time series forecasting.
    $endgroup$
    – Rajshekar Reddy
    Nov 20 '18 at 18:38















$begingroup$
@Vincenzo please look into this question now
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:24




$begingroup$
@Vincenzo please look into this question now
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:24












$begingroup$
@Toros91 refer the question now
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:25




$begingroup$
@Toros91 refer the question now
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:25












$begingroup$
I've retracted my vote, lets see what others think about it as well.
$endgroup$
– Toros91
Mar 22 '18 at 1:22




$begingroup$
I've retracted my vote, lets see what others think about it as well.
$endgroup$
– Toros91
Mar 22 '18 at 1:22












$begingroup$
Looks like you are looking for a Multivariate Time series forecasting.
$endgroup$
– Rajshekar Reddy
Nov 20 '18 at 18:38




$begingroup$
Looks like you are looking for a Multivariate Time series forecasting.
$endgroup$
– Rajshekar Reddy
Nov 20 '18 at 18:38










2 Answers
2






active

oldest

votes


















0












$begingroup$

It depends a bit if the timestamps have any connection to each other (is t2 impacted by t1 as example).



In general this looks like a classification problem and you can use for example sklearn. If you want to distinguish A and C and all other cases you would end up with a multi-class classification problem and not all algorithms support these. If not You can just transform the Label into target (A+B) /non_target (the rest).



Three additional advice:



  1. You can use pandas Dataframes for the pre-processing.

  2. You should at least have a basic understanding of test and training samples before you start anything.

  3. The correct success metric is both important and sometimes challenging to find. Quick example: If you choose accuracy and 99% of your values belong to a class any algorithm that per default predicts this class will have a very high accuracy.

If timestamps are connected you might want to do some pre-processing and add data from previous timestamps to the current one (example: average value of column 1 of last x previous tstamps).






share|improve this answer









$endgroup$












  • $begingroup$
    Apologies, i was not clear earlier. Please do read the question now. @el-burro
    $endgroup$
    – Abhinaya Krishna
    Mar 21 '18 at 23:15



















0












$begingroup$

Aleksandr Blekh's answer in this older question provides a lot of interesting reading material for time-series clustering methods and examples. Also, I include below some interesting reading material for calculating similarity among multivariate time-series (the latest 2 are quite old but I think they are very interesting):



  • An approach on the use of DTW with multivariate time-series (the paper actual refers to classification but you might want to use the idea and adjust it for clustering)


  • A paper on clustering of time-series


  • A PCA-based similarity measure for multivariate time-series


  • A review on feature extraction and pattern recognition methods in time-series data


Before proceeding with any method, I believe it is important to spend some time to think of the following:



  • Try to select the right step for your input data (e.g. if the time-steps are per second, the time-series might be too long and unnecessarily detailed for this job, while hourly data might catch the patterns better).


  • Seasonality might be interesting to take under consideration: e.g. if the time-series are hourly and last for several days/months, there might be some daily/monthly seasonality. In this case, you might want to calculate and compare the average day/month from each time-series (if so, you will also need to decide whether weekdays and weekends should be averaged all together or treated in a different way).


  • Depending on what you are trying to find out, you need to decide
    whether Dynamic time warping (DTW) is useful for you. For example, if
    2 time-series have the exact same pattern but one of the 2 has a
    time-delay, should they still belong in the same cluster? (and how
    small/big time-delay is acceptable to put them in the same cluster?)






share|improve this answer











$endgroup$












  • $begingroup$
    Yes, data generally changes over hours/days. I do agree with your statement that DTW might be useful. I'm confused about DTW to apply on the columns. as i can add columns like day of the week and stuff along with the given N columns. I wanna know how to approach with DTW when you have multiple columns. And there can many class labels like A,B,C,D...... I need to group them based on their similarity over time. @missrg
    $endgroup$
    – Abhinaya Krishna
    Mar 21 '18 at 7:51











  • $begingroup$
    So if I understand correctly what you have is multivariate time-series, right? (several columns correspond to each label) One (simpler) approach would be to calculate similarity metric for each time-series separately (i.e. per column) and then sum them up before the clustering (e.g. using a weighted average). In any case, I will edit my reply and add some reading material for finding similarity among multivariate time-series.
    $endgroup$
    – missrg
    Mar 21 '18 at 9:32










  • $begingroup$
    You can also have a look in this post (look at the comments under the question): datascience.stackexchange.com/questions/13445/…
    $endgroup$
    – missrg
    Mar 21 '18 at 10:15












Your Answer








StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "557"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f29287%2fmultivariate-time-series-clustering%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes









0












$begingroup$

It depends a bit if the timestamps have any connection to each other (is t2 impacted by t1 as example).



In general this looks like a classification problem and you can use for example sklearn. If you want to distinguish A and C and all other cases you would end up with a multi-class classification problem and not all algorithms support these. If not You can just transform the Label into target (A+B) /non_target (the rest).



Three additional advice:



  1. You can use pandas Dataframes for the pre-processing.

  2. You should at least have a basic understanding of test and training samples before you start anything.

  3. The correct success metric is both important and sometimes challenging to find. Quick example: If you choose accuracy and 99% of your values belong to a class any algorithm that per default predicts this class will have a very high accuracy.

If timestamps are connected you might want to do some pre-processing and add data from previous timestamps to the current one (example: average value of column 1 of last x previous tstamps).






share|improve this answer









$endgroup$












  • $begingroup$
    Apologies, i was not clear earlier. Please do read the question now. @el-burro
    $endgroup$
    – Abhinaya Krishna
    Mar 21 '18 at 23:15
















0












$begingroup$

It depends a bit if the timestamps have any connection to each other (is t2 impacted by t1 as example).



In general this looks like a classification problem and you can use for example sklearn. If you want to distinguish A and C and all other cases you would end up with a multi-class classification problem and not all algorithms support these. If not You can just transform the Label into target (A+B) /non_target (the rest).



Three additional advice:



  1. You can use pandas Dataframes for the pre-processing.

  2. You should at least have a basic understanding of test and training samples before you start anything.

  3. The correct success metric is both important and sometimes challenging to find. Quick example: If you choose accuracy and 99% of your values belong to a class any algorithm that per default predicts this class will have a very high accuracy.

If timestamps are connected you might want to do some pre-processing and add data from previous timestamps to the current one (example: average value of column 1 of last x previous tstamps).






share|improve this answer









$endgroup$












  • $begingroup$
    Apologies, i was not clear earlier. Please do read the question now. @el-burro
    $endgroup$
    – Abhinaya Krishna
    Mar 21 '18 at 23:15














0












0








0





$begingroup$

It depends a bit if the timestamps have any connection to each other (is t2 impacted by t1 as example).



In general this looks like a classification problem and you can use for example sklearn. If you want to distinguish A and C and all other cases you would end up with a multi-class classification problem and not all algorithms support these. If not You can just transform the Label into target (A+B) /non_target (the rest).



Three additional advice:



  1. You can use pandas Dataframes for the pre-processing.

  2. You should at least have a basic understanding of test and training samples before you start anything.

  3. The correct success metric is both important and sometimes challenging to find. Quick example: If you choose accuracy and 99% of your values belong to a class any algorithm that per default predicts this class will have a very high accuracy.

If timestamps are connected you might want to do some pre-processing and add data from previous timestamps to the current one (example: average value of column 1 of last x previous tstamps).






share|improve this answer









$endgroup$



It depends a bit if the timestamps have any connection to each other (is t2 impacted by t1 as example).



In general this looks like a classification problem and you can use for example sklearn. If you want to distinguish A and C and all other cases you would end up with a multi-class classification problem and not all algorithms support these. If not You can just transform the Label into target (A+B) /non_target (the rest).



Three additional advice:



  1. You can use pandas Dataframes for the pre-processing.

  2. You should at least have a basic understanding of test and training samples before you start anything.

  3. The correct success metric is both important and sometimes challenging to find. Quick example: If you choose accuracy and 99% of your values belong to a class any algorithm that per default predicts this class will have a very high accuracy.

If timestamps are connected you might want to do some pre-processing and add data from previous timestamps to the current one (example: average value of column 1 of last x previous tstamps).







share|improve this answer












share|improve this answer



share|improve this answer










answered Mar 20 '18 at 11:01









El BurroEl Burro

455311




455311











  • $begingroup$
    Apologies, i was not clear earlier. Please do read the question now. @el-burro
    $endgroup$
    – Abhinaya Krishna
    Mar 21 '18 at 23:15

















  • $begingroup$
    Apologies, i was not clear earlier. Please do read the question now. @el-burro
    $endgroup$
    – Abhinaya Krishna
    Mar 21 '18 at 23:15
















$begingroup$
Apologies, i was not clear earlier. Please do read the question now. @el-burro
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:15





$begingroup$
Apologies, i was not clear earlier. Please do read the question now. @el-burro
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 23:15












0












$begingroup$

Aleksandr Blekh's answer in this older question provides a lot of interesting reading material for time-series clustering methods and examples. Also, I include below some interesting reading material for calculating similarity among multivariate time-series (the latest 2 are quite old but I think they are very interesting):



  • An approach on the use of DTW with multivariate time-series (the paper actual refers to classification but you might want to use the idea and adjust it for clustering)


  • A paper on clustering of time-series


  • A PCA-based similarity measure for multivariate time-series


  • A review on feature extraction and pattern recognition methods in time-series data


Before proceeding with any method, I believe it is important to spend some time to think of the following:



  • Try to select the right step for your input data (e.g. if the time-steps are per second, the time-series might be too long and unnecessarily detailed for this job, while hourly data might catch the patterns better).


  • Seasonality might be interesting to take under consideration: e.g. if the time-series are hourly and last for several days/months, there might be some daily/monthly seasonality. In this case, you might want to calculate and compare the average day/month from each time-series (if so, you will also need to decide whether weekdays and weekends should be averaged all together or treated in a different way).


  • Depending on what you are trying to find out, you need to decide
    whether Dynamic time warping (DTW) is useful for you. For example, if
    2 time-series have the exact same pattern but one of the 2 has a
    time-delay, should they still belong in the same cluster? (and how
    small/big time-delay is acceptable to put them in the same cluster?)






share|improve this answer











$endgroup$












  • $begingroup$
    Yes, data generally changes over hours/days. I do agree with your statement that DTW might be useful. I'm confused about DTW to apply on the columns. as i can add columns like day of the week and stuff along with the given N columns. I wanna know how to approach with DTW when you have multiple columns. And there can many class labels like A,B,C,D...... I need to group them based on their similarity over time. @missrg
    $endgroup$
    – Abhinaya Krishna
    Mar 21 '18 at 7:51











  • $begingroup$
    So if I understand correctly what you have is multivariate time-series, right? (several columns correspond to each label) One (simpler) approach would be to calculate similarity metric for each time-series separately (i.e. per column) and then sum them up before the clustering (e.g. using a weighted average). In any case, I will edit my reply and add some reading material for finding similarity among multivariate time-series.
    $endgroup$
    – missrg
    Mar 21 '18 at 9:32










  • $begingroup$
    You can also have a look in this post (look at the comments under the question): datascience.stackexchange.com/questions/13445/…
    $endgroup$
    – missrg
    Mar 21 '18 at 10:15
















0












$begingroup$

Aleksandr Blekh's answer in this older question provides a lot of interesting reading material for time-series clustering methods and examples. Also, I include below some interesting reading material for calculating similarity among multivariate time-series (the latest 2 are quite old but I think they are very interesting):



  • An approach on the use of DTW with multivariate time-series (the paper actual refers to classification but you might want to use the idea and adjust it for clustering)


  • A paper on clustering of time-series


  • A PCA-based similarity measure for multivariate time-series


  • A review on feature extraction and pattern recognition methods in time-series data


Before proceeding with any method, I believe it is important to spend some time to think of the following:



  • Try to select the right step for your input data (e.g. if the time-steps are per second, the time-series might be too long and unnecessarily detailed for this job, while hourly data might catch the patterns better).


  • Seasonality might be interesting to take under consideration: e.g. if the time-series are hourly and last for several days/months, there might be some daily/monthly seasonality. In this case, you might want to calculate and compare the average day/month from each time-series (if so, you will also need to decide whether weekdays and weekends should be averaged all together or treated in a different way).


  • Depending on what you are trying to find out, you need to decide
    whether Dynamic time warping (DTW) is useful for you. For example, if
    2 time-series have the exact same pattern but one of the 2 has a
    time-delay, should they still belong in the same cluster? (and how
    small/big time-delay is acceptable to put them in the same cluster?)






share|improve this answer











$endgroup$












  • $begingroup$
    Yes, data generally changes over hours/days. I do agree with your statement that DTW might be useful. I'm confused about DTW to apply on the columns. as i can add columns like day of the week and stuff along with the given N columns. I wanna know how to approach with DTW when you have multiple columns. And there can many class labels like A,B,C,D...... I need to group them based on their similarity over time. @missrg
    $endgroup$
    – Abhinaya Krishna
    Mar 21 '18 at 7:51











  • $begingroup$
    So if I understand correctly what you have is multivariate time-series, right? (several columns correspond to each label) One (simpler) approach would be to calculate similarity metric for each time-series separately (i.e. per column) and then sum them up before the clustering (e.g. using a weighted average). In any case, I will edit my reply and add some reading material for finding similarity among multivariate time-series.
    $endgroup$
    – missrg
    Mar 21 '18 at 9:32










  • $begingroup$
    You can also have a look in this post (look at the comments under the question): datascience.stackexchange.com/questions/13445/…
    $endgroup$
    – missrg
    Mar 21 '18 at 10:15














0












0








0





$begingroup$

Aleksandr Blekh's answer in this older question provides a lot of interesting reading material for time-series clustering methods and examples. Also, I include below some interesting reading material for calculating similarity among multivariate time-series (the latest 2 are quite old but I think they are very interesting):



  • An approach on the use of DTW with multivariate time-series (the paper actual refers to classification but you might want to use the idea and adjust it for clustering)


  • A paper on clustering of time-series


  • A PCA-based similarity measure for multivariate time-series


  • A review on feature extraction and pattern recognition methods in time-series data


Before proceeding with any method, I believe it is important to spend some time to think of the following:



  • Try to select the right step for your input data (e.g. if the time-steps are per second, the time-series might be too long and unnecessarily detailed for this job, while hourly data might catch the patterns better).


  • Seasonality might be interesting to take under consideration: e.g. if the time-series are hourly and last for several days/months, there might be some daily/monthly seasonality. In this case, you might want to calculate and compare the average day/month from each time-series (if so, you will also need to decide whether weekdays and weekends should be averaged all together or treated in a different way).


  • Depending on what you are trying to find out, you need to decide
    whether Dynamic time warping (DTW) is useful for you. For example, if
    2 time-series have the exact same pattern but one of the 2 has a
    time-delay, should they still belong in the same cluster? (and how
    small/big time-delay is acceptable to put them in the same cluster?)






share|improve this answer











$endgroup$



Aleksandr Blekh's answer in this older question provides a lot of interesting reading material for time-series clustering methods and examples. Also, I include below some interesting reading material for calculating similarity among multivariate time-series (the latest 2 are quite old but I think they are very interesting):



  • An approach on the use of DTW with multivariate time-series (the paper actual refers to classification but you might want to use the idea and adjust it for clustering)


  • A paper on clustering of time-series


  • A PCA-based similarity measure for multivariate time-series


  • A review on feature extraction and pattern recognition methods in time-series data


Before proceeding with any method, I believe it is important to spend some time to think of the following:



  • Try to select the right step for your input data (e.g. if the time-steps are per second, the time-series might be too long and unnecessarily detailed for this job, while hourly data might catch the patterns better).


  • Seasonality might be interesting to take under consideration: e.g. if the time-series are hourly and last for several days/months, there might be some daily/monthly seasonality. In this case, you might want to calculate and compare the average day/month from each time-series (if so, you will also need to decide whether weekdays and weekends should be averaged all together or treated in a different way).


  • Depending on what you are trying to find out, you need to decide
    whether Dynamic time warping (DTW) is useful for you. For example, if
    2 time-series have the exact same pattern but one of the 2 has a
    time-delay, should they still belong in the same cluster? (and how
    small/big time-delay is acceptable to put them in the same cluster?)







share|improve this answer














share|improve this answer



share|improve this answer








edited Mar 21 '18 at 9:49

























answered Mar 20 '18 at 11:06









missrgmissrg

36518




36518











  • $begingroup$
    Yes, data generally changes over hours/days. I do agree with your statement that DTW might be useful. I'm confused about DTW to apply on the columns. as i can add columns like day of the week and stuff along with the given N columns. I wanna know how to approach with DTW when you have multiple columns. And there can many class labels like A,B,C,D...... I need to group them based on their similarity over time. @missrg
    $endgroup$
    – Abhinaya Krishna
    Mar 21 '18 at 7:51











  • $begingroup$
    So if I understand correctly what you have is multivariate time-series, right? (several columns correspond to each label) One (simpler) approach would be to calculate similarity metric for each time-series separately (i.e. per column) and then sum them up before the clustering (e.g. using a weighted average). In any case, I will edit my reply and add some reading material for finding similarity among multivariate time-series.
    $endgroup$
    – missrg
    Mar 21 '18 at 9:32










  • $begingroup$
    You can also have a look in this post (look at the comments under the question): datascience.stackexchange.com/questions/13445/…
    $endgroup$
    – missrg
    Mar 21 '18 at 10:15

















  • $begingroup$
    Yes, data generally changes over hours/days. I do agree with your statement that DTW might be useful. I'm confused about DTW to apply on the columns. as i can add columns like day of the week and stuff along with the given N columns. I wanna know how to approach with DTW when you have multiple columns. And there can many class labels like A,B,C,D...... I need to group them based on their similarity over time. @missrg
    $endgroup$
    – Abhinaya Krishna
    Mar 21 '18 at 7:51











  • $begingroup$
    So if I understand correctly what you have is multivariate time-series, right? (several columns correspond to each label) One (simpler) approach would be to calculate similarity metric for each time-series separately (i.e. per column) and then sum them up before the clustering (e.g. using a weighted average). In any case, I will edit my reply and add some reading material for finding similarity among multivariate time-series.
    $endgroup$
    – missrg
    Mar 21 '18 at 9:32










  • $begingroup$
    You can also have a look in this post (look at the comments under the question): datascience.stackexchange.com/questions/13445/…
    $endgroup$
    – missrg
    Mar 21 '18 at 10:15
















$begingroup$
Yes, data generally changes over hours/days. I do agree with your statement that DTW might be useful. I'm confused about DTW to apply on the columns. as i can add columns like day of the week and stuff along with the given N columns. I wanna know how to approach with DTW when you have multiple columns. And there can many class labels like A,B,C,D...... I need to group them based on their similarity over time. @missrg
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 7:51





$begingroup$
Yes, data generally changes over hours/days. I do agree with your statement that DTW might be useful. I'm confused about DTW to apply on the columns. as i can add columns like day of the week and stuff along with the given N columns. I wanna know how to approach with DTW when you have multiple columns. And there can many class labels like A,B,C,D...... I need to group them based on their similarity over time. @missrg
$endgroup$
– Abhinaya Krishna
Mar 21 '18 at 7:51













$begingroup$
So if I understand correctly what you have is multivariate time-series, right? (several columns correspond to each label) One (simpler) approach would be to calculate similarity metric for each time-series separately (i.e. per column) and then sum them up before the clustering (e.g. using a weighted average). In any case, I will edit my reply and add some reading material for finding similarity among multivariate time-series.
$endgroup$
– missrg
Mar 21 '18 at 9:32




$begingroup$
So if I understand correctly what you have is multivariate time-series, right? (several columns correspond to each label) One (simpler) approach would be to calculate similarity metric for each time-series separately (i.e. per column) and then sum them up before the clustering (e.g. using a weighted average). In any case, I will edit my reply and add some reading material for finding similarity among multivariate time-series.
$endgroup$
– missrg
Mar 21 '18 at 9:32












$begingroup$
You can also have a look in this post (look at the comments under the question): datascience.stackexchange.com/questions/13445/…
$endgroup$
– missrg
Mar 21 '18 at 10:15





$begingroup$
You can also have a look in this post (look at the comments under the question): datascience.stackexchange.com/questions/13445/…
$endgroup$
– missrg
Mar 21 '18 at 10:15


















draft saved

draft discarded
















































Thanks for contributing an answer to Data Science Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f29287%2fmultivariate-time-series-clustering%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Францішак Багушэвіч Змест Сям'я | Біяграфія | Творчасць | Мова Багушэвіча | Ацэнкі дзейнасці | Цікавыя факты | Спадчына | Выбраная бібліяграфія | Ушанаванне памяці | У філатэліі | Зноскі | Літаратура | Спасылкі | НавігацыяЛяхоўскі У. Рупіўся дзеля Бога і людзей: Жыццёвы шлях Лявона Вітан-Дубейкаўскага // Вольскі і Памідораў з песняй пра немца Адвакат, паэт, народны заступнік Ашмянскі веснікВ Минске появится площадь Богушевича и улица Сырокомли, Белорусская деловая газета, 19 июля 2001 г.Айцец беларускай нацыянальнай ідэі паўстаў у бронзе Сяргей Аляксандравіч Адашкевіч (1918, Мінск). 80-я гады. Бюст «Францішак Багушэвіч».Яўген Мікалаевіч Ціхановіч. «Партрэт Францішка Багушэвіча»Мікола Мікалаевіч Купава. «Партрэт зачынальніка новай беларускай літаратуры Францішка Багушэвіча»Уладзімір Іванавіч Мелехаў. На помніку «Змагарам за родную мову» Барэльеф «Францішак Багушэвіч»Памяць пра Багушэвіча на Віленшчыне Страчаная сталіца. Беларускія шыльды на вуліцах Вільні«Krynica». Ideologia i przywódcy białoruskiego katolicyzmuФранцішак БагушэвічТворы на knihi.comТворы Францішка Багушэвіча на bellib.byСодаль Уладзімір. Францішак Багушэвіч на Лідчыне;Луцкевіч Антон. Жыцьцё і творчасьць Фр. Багушэвіча ў успамінах ягоных сучасьнікаў // Запісы Беларускага Навуковага таварыства. Вільня, 1938. Сшытак 1. С. 16-34.Большая российская1188761710000 0000 5537 633Xn9209310021619551927869394п

Беларусь Змест Назва Гісторыя Геаграфія Сімволіка Дзяржаўны лад Палітычныя партыі Міжнароднае становішча і знешняя палітыка Адміністрацыйны падзел Насельніцтва Эканоміка Культура і грамадства Сацыяльная сфера Узброеныя сілы Заўвагі Літаратура Спасылкі НавігацыяHGЯOiТоп-2011 г. (па версіі ej.by)Топ-2013 г. (па версіі ej.by)Топ-2016 г. (па версіі ej.by)Топ-2017 г. (па версіі ej.by)Нацыянальны статыстычны камітэт Рэспублікі БеларусьШчыльнасць насельніцтва па краінахhttp://naviny.by/rubrics/society/2011/09/16/ic_articles_116_175144/А. Калечыц, У. Ксяндзоў. Спробы засялення краю неандэртальскім чалавекам.І ў Менску былі мамантыА. Калечыц, У. Ксяндзоў. Старажытны каменны век (палеаліт). Першапачатковае засяленне тэрыторыіГ. Штыхаў. Балты і славяне ў VI—VIII стст.М. Клімаў. Полацкае княства ў IX—XI стст.Г. Штыхаў, В. Ляўко. Палітычная гісторыя Полацкай зямліГ. Штыхаў. Дзяржаўны лад у землях-княствахГ. Штыхаў. Дзяржаўны лад у землях-княствахБеларускія землі ў складзе Вялікага Княства ЛітоўскагаЛюблінская унія 1569 г."The Early Stages of Independence"Zapomniane prawdy25 гадоў таму было аб'яўлена, што Язэп Пілсудскі — беларус (фота)Наша вадаДакументы ЧАЭС: Забруджванне тэрыторыі Беларусі « ЧАЭС Зона адчужэнняСведения о политических партиях, зарегистрированных в Республике Беларусь // Министерство юстиции Республики БеларусьСтатыстычны бюлетэнь „Полаўзроставая структура насельніцтва Рэспублікі Беларусь на 1 студзеня 2012 года і сярэднегадовая колькасць насельніцтва за 2011 год“Индекс человеческого развития Беларуси — не было бы нижеБеларусь занимает первое место в СНГ по индексу развития с учетом гендерного факцёраНацыянальны статыстычны камітэт Рэспублікі БеларусьКанстытуцыя РБ. Артыкул 17Трансфармацыйныя задачы БеларусіВыйсце з крызісу — далейшае рэфармаванне Беларускі рубель — сусветны лідар па дэвальвацыяхПра змену коштаў у кастрычніку 2011 г.Бядней за беларусаў у СНД толькі таджыкіСярэдні заробак у верасні дасягнуў 2,26 мільёна рублёўЭканомікаГаласуем за ТОП-100 беларускай прозыСучасныя беларускія мастакіАрхитектура Беларуси BELARUS.BYА. Каханоўскі. Культура Беларусі ўсярэдзіне XVII—XVIII ст.Анталогія беларускай народнай песні, гуказапісы спеваўБеларускія Музычныя IнструментыБеларускі рок, які мы страцілі. Топ-10 гуртоў«Мясцовы час» — нязгаслая легенда беларускай рок-музыкіСЯРГЕЙ БУДКІН. МЫ НЯ ЗНАЕМ СВАЁЙ МУЗЫКІМ. А. Каладзінскі. НАРОДНЫ ТЭАТРМагнацкія культурныя цэнтрыПублічная дыскусія «Беларуская новая пьеса: без беларускай мовы ці беларуская?»Беларускія драматургі па-ранейшаму лепш ставяцца за мяжой, чым на радзіме«Працэс незалежнага кіно пайшоў, і дзяржаву турбуе яго непадкантрольнасць»Беларускія філосафы ў пошуках прасторыВсе идём в библиотекуАрхіваванаАб Нацыянальнай праграме даследавання і выкарыстання касмічнай прасторы ў мірных мэтах на 2008—2012 гадыУ космас — разам.У суседнім з Барысаўскім раёне пабудуюць Камандна-вымяральны пунктСвяты і абрады беларусаў«Мірныя бульбашы з малой краіны» — 5 непраўдзівых стэрэатыпаў пра БеларусьМ. Раманюк. Беларускае народнае адзеннеУ Беларусі скарачаецца колькасць злачынстваўЛукашэнка незадаволены мінскімі ўладамі Крадзяжы складаюць у Мінску каля 70% злачынстваў Узровень злачыннасці ў Мінскай вобласці — адзін з самых высокіх у краіне Генпракуратура аналізуе стан са злачыннасцю ў Беларусі па каэфіцыенце злачыннасці У Беларусі стабілізавалася крымінагеннае становішча, лічыць генпракурорЗамежнікі сталі здзяйсняць у Беларусі больш злачынстваўМУС Беларусі турбуе рост рэцыдыўнай злачыннасціЯ з ЖЭСа. Дазволіце вас абкрасці! Рэйтынг усіх службаў і падраздзяленняў ГУУС Мінгарвыканкама вырасАб КДБ РБГісторыя Аператыўна-аналітычнага цэнтра РБГісторыя ДКФРТаможняagentura.ruБеларусьBelarus.by — Афіцыйны сайт Рэспублікі БеларусьСайт урада БеларусіRadzima.org — Збор архітэктурных помнікаў, гісторыя Беларусі«Глобус Беларуси»Гербы и флаги БеларусиАсаблівасці каменнага веку на БеларусіА. Калечыц, У. Ксяндзоў. Старажытны каменны век (палеаліт). Першапачатковае засяленне тэрыторыіУ. Ксяндзоў. Сярэдні каменны век (мезаліт). Засяленне краю плямёнамі паляўнічых, рыбакоў і збіральнікаўА. Калечыц, М. Чарняўскі. Плямёны на тэрыторыі Беларусі ў новым каменным веку (неаліце)А. Калечыц, У. Ксяндзоў, М. Чарняўскі. Гаспадарчыя заняткі ў каменным векуЭ. Зайкоўскі. Духоўная культура ў каменным векуАсаблівасці бронзавага веку на БеларусіФарміраванне супольнасцей ранняга перыяду бронзавага векуФотографии БеларусиРоля беларускіх зямель ва ўтварэнні і ўмацаванні ВКЛВ. Фадзеева. З гісторыі развіцця беларускай народнай вышыўкіDMOZGran catalanaБольшая российскаяBritannica (анлайн)Швейцарскі гістарычны15325917611952699xDA123282154079143-90000 0001 2171 2080n9112870100577502ge128882171858027501086026362074122714179пппппп

ValueError: Expected n_neighbors <= n_samples, but n_samples = 1, n_neighbors = 6 (SMOTE) The 2019 Stack Overflow Developer Survey Results Are InCan SMOTE be applied over sequence of words (sentences)?ValueError when doing validation with random forestsSMOTE and multi class oversamplingLogic behind SMOTE-NC?ValueError: Error when checking target: expected dense_1 to have shape (7,) but got array with shape (1,)SmoteBoost: Should SMOTE be ran individually for each iteration/tree in the boosting?solving multi-class imbalance classification using smote and OSSUsing SMOTE for Synthetic Data generation to improve performance on unbalanced dataproblem of entry format for a simple model in KerasSVM SMOTE fit_resample() function runs forever with no result