


Is it feasible to use decision tree algorithms for sensor fault detection?














The gist is that I want to separate system faults from sensor faults, given a dataset from a wireless sensor network, using a machine learning algorithm.

For instance, if I have some temperature sensors in a given area and their corresponding readings at each time interval, I would like to know whether an abnormal value is due to an actual system fault or due to a faulty sensor. Of course, it is a given that the training set would have such entries labelled as either sensor fault or system fault.

I have thought of just using something like linear regression, but I would like it to work even if the system cannot be modeled that way. A decision tree seemed to me like a more appropriate algorithm for this.

Lastly, the time it takes for training and classification also matters, as I wish to see whether this can be used for systems that must respond very quickly to such anomalies.

Sorry if it's a bit wordy, but I wasn't sure how much information I should include since this is my first time posting (I'm not even sure this is the right Stack Exchange site for this). Anyway, thanks in advance for the answers!


















  • Welcome! To give more detail: how large is the training data? Is there any requirement on prediction accuracy?
    – Sixiang.Hu
    Apr 29 '18 at 8:07










  • @Sixiang.Hu I would say on the order of a few thousand, at most. However, since this is supposed to be a general-case thing, in the absolute worst case I may simulate a dataset by generating my own data from some appropriate mathematical model and then introducing some randomness to simulate noise. As for accuracy, since I am trying to find out whether this method is feasible, there isn't a hard requirement, though it would be great if it reached 90% accuracy or above.
    – Aldazar
    Apr 29 '18 at 8:42










  • If your data is generated (and so is the response), the tree will come up with the logic you used to define the response. So why not just use the logic you already have? The tree will just model your logic anyway.
    – Sixiang.Hu
    Apr 29 '18 at 8:51










  • But I want to see how it performs when the model is unknown. So if I do generate the training set myself using a model, I'll feed only part of it to the tree as training data, since I'd like to see how close the tree's model comes to my actual one. But again, this is only for the worst case, where I can't acquire an appropriate dataset.
    – Aldazar
    Apr 29 '18 at 9:05










  • In line with the problem-definition answer, it would be helpful to know whether this is a steady-state or a transient system. Steady state could mean measurements are independent by sensor over time. That case would be easy: look at statistical process control methods for failure detection. Since the first guess is often correct (most times things are as they appear), and linear regression leaped to mind (absolute differences), I would test the distribution of the absolute differences by sensor over time (scaled) to see if you get a normal distribution. If so, you may have an easier answer without
    – davmor
    Aug 24 '18 at 3:20
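As a rough illustration of the statistical process control idea mentioned in the last comment, here is a minimal sketch of a Shewhart-style control-limit check for a single sensor. The function name `out_of_control`, the 3-sigma default, and the sample readings are all assumptions made for the example; none of them come from the thread.

```python
from statistics import mean, stdev


def out_of_control(baseline, new_values, k=3.0):
    """Return indices of new_values outside mean +/- k * std of the baseline.

    `baseline` is assumed to hold readings taken while the sensor and the
    system were known to behave normally.
    """
    centre = mean(baseline)
    sigma = stdev(baseline)
    lower, upper = centre - k * sigma, centre + k * sigma
    return [i for i, v in enumerate(new_values) if not (lower <= v <= upper)]


# Hypothetical temperature readings (degrees Celsius) from one sensor.
baseline = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 20.1, 19.7, 20.0, 20.4]
flagged = out_of_control(baseline, [20.2, 19.9, 27.5, 20.1])
print(flagged)  # index of the reading that falls outside the control limits
```

A check like this only says that a reading is unusual for that sensor. It does not by itself say whether the sensor or the system is at fault.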






















machine-learning decision-trees
















asked Apr 29 '18 at 7:42









Aldazar















1 Answer












$begingroup$

You need to determine how to formulate your problem. I see it as having two aspects:
1. Detect an abnormal value (in temperature).
2. Determine whether the abnormal value is due to a sensor problem or a system problem.



The first is an anomaly detection problem, and there is a lot of literature on the topic, including tree-based methods on sensor data. One prominent approach is Isolation Forest, which is implemented in scikit-learn: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html
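A minimal sketch of that first step, assuming readings arrive as one row per time step and one column per sensor. The data here is synthetic, and the `contamination` value and variable names are placeholders chosen for the example, not recommendations.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical data: 500 time steps from 4 temperature sensors around 20 degrees.
readings = rng.normal(loc=20.0, scale=0.5, size=(500, 4))
# Inject a few clearly abnormal rows so there is something to detect.
readings[:5] += 15.0

# contamination is the assumed fraction of anomalies; it sets the decision threshold.
detector = IsolationForest(n_estimators=100, contamination=0.02, random_state=0)
labels = detector.fit_predict(readings)  # +1 = normal, -1 = anomaly

anomalous_rows = np.flatnonzero(labels == -1)
print(anomalous_rows)
```

The rows flagged here would then be the input to the second step. On real data the contamination rate is usually not known in advance and has to be tuned.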



The second is fault localization. This area is also well described in the literature, and tree-based models were used for it even before machine learning became a thing.
A key point here is to train only on anomalous data; this should then become a relatively simple classification problem, assuming you have relevant features.
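A sketch of what that second step could look like with a scikit-learn decision tree, trained only on readings already flagged as anomalous. The two features (`deviation`, `neighbour_dev`) and the toy labelling rule are assumptions invented for illustration; with real data the labels would come from the tagged training set.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 400  # number of anomalous readings in this toy dataset

# Hypothetical features for each anomalous reading:
#   deviation     - how far the sensor is from its own recent average
#   neighbour_dev - how far neighbouring sensors are from their recent averages
deviation = rng.uniform(5.0, 20.0, size=n)
neighbour_dev = rng.uniform(0.0, 20.0, size=n)
X = np.column_stack([deviation, neighbour_dev])

# Toy labelling rule (an assumption): if the neighbours deviate too, call it a
# system fault (1); otherwise call it a sensor fault (0).
y = (neighbour_dev > 5.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because the labels here come from a simple rule, the tree mostly recovers that rule, which is the point raised in the comments about generated data. Prediction with a shallow tree is only a handful of comparisons per sample, so it should be cheap enough for fast-responding systems, though that needs measuring on the target hardware.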















$endgroup$























answered Jun 24 '18 at 13:42

jonnor


























