Is it feasible to use decision tree algorithms for sensor fault detection?

In short, I want to use a machine learning algorithm to distinguish system faults from sensor faults in a dataset collected from a wireless sensor network.

For instance, if I have several temperature sensors in a given area and their readings at each time interval, I would like to know whether an abnormal value is due to an actual (system) fault or to a faulty sensor. The training set would, of course, have such entries labelled as either a sensor fault or a system fault.

I have considered using something like linear regression, but I would like the approach to work even when the system cannot be modelled that way. A decision tree seemed to me a more appropriate algorithm for this.
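
To illustrate why a tree can help when the relationship is not linear, here is a minimal sketch on synthetic data. The feature meanings and the band-shaped fault rule are assumptions made for illustration: a fault is labelled whenever a sensor's deviation from its neighbours is large in either direction, which a single linear boundary cannot capture but a shallow scikit-learn `DecisionTreeClassifier` can.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical features: column 0 is a sensor's deviation from its neighbours'
# mean, column 1 is an irrelevant reading. A fault occurs when the deviation is
# large in EITHER direction, a band-shaped rule that no single straight line
# can separate.
X = rng.uniform(-1.0, 1.0, size=(2000, 2))
y = (np.abs(X[:, 0]) > 0.5).astype(int)

X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]

linear = LogisticRegression().fit(X_train, y_train)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

linear_acc = linear.score(X_test, y_test)
tree_acc = tree.score(X_test, y_test)
print(f"logistic regression: {linear_acc:.2f}  decision tree: {tree_acc:.2f}")
```

On a rule like this the tree should score close to 100% while the linear model cannot exceed roughly 75%; on real data the gap depends on how nonlinear the fault signature actually is.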

Lastly, training and classification time are also a consideration, since I want to see whether this approach can be used in systems that must respond very quickly to such anomalies.
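
A rough way to check the timing concern is to measure fit and prediction time directly. The sketch below assumes a synthetic dataset of 5,000 rows with 10 made-up features and an arbitrary labelling rule; absolute numbers will depend entirely on the hardware and the data.

```python
import time

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
# Hypothetical dataset: 5,000 labelled rows, 10 made-up features.
X = rng.normal(size=(5000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

clf = DecisionTreeClassifier(max_depth=8, random_state=0)

start = time.perf_counter()
clf.fit(X, y)
fit_seconds = time.perf_counter() - start

sample = X[:1]  # classify a single new observation, as in online use
start = time.perf_counter()
label = clf.predict(sample)[0]
predict_seconds = time.perf_counter() - start

print(f"fit: {fit_seconds * 1e3:.1f} ms, single prediction: {predict_seconds * 1e6:.0f} us")
print(f"tree depth: {clf.get_depth()}, leaves: {clf.get_n_leaves()}")
```

Predicting with a single tree only walks one root-to-leaf path, so its cost grows with tree depth rather than with the size of the training set; for one sample at a time, Python call overhead in scikit-learn is likely to dominate.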

Sorry if this is a bit wordy; this is my first post, so I wasn't sure how much information to include (I'm not even sure this is the right Stack Exchange site for it). Anyway, thanks in advance for any answers!

machine-learning decision-trees

asked Apr 29 '18 at 7:42 by Aldazar

Welcome! To give more detail: how large is the training data? Is there any requirement on prediction accuracy?
– Sixiang.Hu
Apr 29 '18 at 8:07

@Sixiang.Hu On the order of a few thousand samples at most, I would say. However, since this is meant to be a general approach, in the absolute worst case I could simulate a dataset by generating my own data from an appropriate mathematical model and then adding randomness to simulate noise. As for accuracy, since I am trying to find out whether this method is feasible, there isn't a hard requirement, though it would be great if it reached 90% accuracy or above.
– Aldazar
Apr 29 '18 at 8:42
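
One way the simulated fallback could look is sketched below. Everything in it is an assumption for illustration: the `simulate` helper is hypothetical, the true temperature follows a daily sine cycle, noise is Gaussian, a system fault raises the real temperature seen by every sensor, and a sensor fault raises the reading of a single sensor.

```python
import numpy as np


def simulate(n_steps=2000, n_sensors=5, fault_rate=0.02, seed=0):
    """Hypothetical generator: a shared daily temperature cycle plus noise,
    with injected sensor faults (one sensor deviates) and system faults
    (the true temperature itself jumps, so every sensor sees it)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_steps)
    # 288 steps ~ one day at 5-minute sampling (assumption)
    true_temp = 20.0 + 5.0 * np.sin(2 * np.pi * t / 288)
    labels = np.zeros(n_steps, dtype=int)  # 0 = normal, 1 = sensor fault, 2 = system fault

    fault_steps = rng.choice(n_steps, size=int(fault_rate * n_steps), replace=False)
    labels[fault_steps] = rng.integers(1, 3, size=fault_steps.size)

    # A system fault changes the real temperature seen by all sensors.
    true_temp = true_temp + np.where(labels == 2, 8.0, 0.0)
    readings = true_temp[:, None] + rng.normal(0.0, 0.3, size=(n_steps, n_sensors))

    # A sensor fault corrupts one randomly chosen sensor only.
    for step in np.flatnonzero(labels == 1):
        readings[step, rng.integers(n_sensors)] += 8.0
    return readings, labels


readings, labels = simulate()
print(readings.shape, np.bincount(labels, minlength=3))
```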

If your data is generated (including the response), the tree will just come up with the logic you used to define the response. So why not use the logic you already have? The tree will only model your logic anyway.
– Sixiang.Hu
Apr 29 '18 at 8:51

But I want to see how it performs when the model is unknown. If I do generate the training set myself from a model, I'll feed only part of it to the tree for training, since I'd like to see how close the tree's learned model comes to my actual one. But again, this is only for the worst case, where I can't acquire an appropriate dataset.
– Aldazar
Apr 29 '18 at 9:05
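
A minimal sketch of that check, assuming a simple hypothetical generating rule (a reading counts as a fault when its deviation from the neighbours' median exceeds 2.0 in absolute value) plus one irrelevant feature: train on 70% of the generated data, score on the rest, and print the learned splits with scikit-learn's `export_text` to compare them with the rule used to generate the labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
# Hypothetical known rule: a reading is a "fault" when its deviation from
# the neighbours' median is larger than 2.0 in either direction.
deviation = rng.normal(0.0, 1.5, size=3000)
spread = rng.uniform(0.0, 1.0, size=3000)  # irrelevant distractor feature
X = np.column_stack([deviation, spread])
y = (np.abs(deviation) > 2.0).astype(int)

# Train on part of the generated data, keep the rest to check generalisation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)

acc = tree.score(X_test, y_test)
print(f"held-out accuracy: {acc:.3f}")
# If the tree recovered the rule, the thresholds should sit close to -2.0 and +2.0.
print(export_text(tree, feature_names=["deviation", "spread"]))
```

If the rule is recovered, the printed thresholds on `deviation` land near -2.0 and 2.0 and `spread` is never used, which also illustrates the point above that on generated data the tree can only rediscover the logic put into it.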

To go along with the answer about problem definition, it would help to know whether this is a steady-state or a transient system. Steady state could mean that each sensor's measurements are independent over time. That case would be easy: look at statistical process control methods for failure detection. Since the first guess is often correct (most of the time things are as they appear), and linear regression came to mind (absolute differences), I would test the distribution of the absolute differences by sensor over time (scaled) to see whether it is normal. If so, you may have an easier answer without
– davmor
Aug 24 '18 at 3:20
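
A minimal sketch of the steady-state suggestion, under stated assumptions: the readings are synthetic, "differences by sensor over time" is taken here to mean differences between consecutive readings of the same sensor (an interpretation, not necessarily the commenter's), normality is checked with SciPy's Shapiro-Wilk test, and the control-chart step is a plain Shewhart-style ±3 standard deviation rule.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical steady-state readings: 500 time steps x 4 sensors around 21 degrees.
readings = 21.0 + rng.normal(0.0, 0.2, size=(500, 4))
readings[300, 2] += 3.0  # inject a spike on sensor 2 for illustration

# Differences between consecutive readings, per sensor, then scaled to unit variance.
diffs = np.diff(readings, axis=0)
z = (diffs - diffs.mean(axis=0)) / diffs.std(axis=0, ddof=1)

for sensor in range(z.shape[1]):
    # Shapiro-Wilk normality test; a small p-value suggests non-normal differences.
    p_value = stats.shapiro(z[:, sensor]).pvalue
    print(f"sensor {sensor}: Shapiro-Wilk p = {p_value:.3f}")

# Shewhart-style rule: flag scaled differences beyond +/-3 standard deviations.
flagged = np.argwhere(np.abs(z) > 3.0)
print("flagged (step, sensor):", flagged.tolist())
```

A single spike produces two flagged differences (the jump up and the jump back down). With about 2,000 scaled differences, a ±3σ rule will also flag a few ordinary points by chance (about 0.27% under normality), which is one reason to check the normality assumption first.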

1 Answer

You need to decide how to formulate your problem. I see it as having two parts:
1. Detect an abnormal value (in temperature).
2. Determine whether the abnormal value is due to a sensor problem or a system problem.

The first is an anomaly detection problem, and there is a lot of literature on it, including tree-based methods applied to sensor data. One prominent approach is Isolation Forest, which is implemented in scikit-learn: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html
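
A minimal sketch of this first step with scikit-learn's `IsolationForest`, assuming each row is one time step and each column one temperature sensor; the data, the injected +5 degree faults and the `contamination=0.01` setting are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical data: each row is one time step, each column one temperature sensor.
normal = 21.0 + rng.normal(0.0, 0.3, size=(1000, 4))
anomalies = normal[:5].copy()
anomalies[:, 1] += 5.0  # one sensor reads far too high
X = np.vstack([normal, anomalies])

detector = IsolationForest(n_estimators=200, contamination=0.01, random_state=0)
pred = detector.fit_predict(X)  # +1 = inlier, -1 = outlier
scores = detector.decision_function(X)  # lower = more anomalous

print("rows flagged as anomalous:", np.flatnonzero(pred == -1))
```

`contamination` sets the expected share of anomalies and therefore the decision threshold; with real data it would have to be estimated or tuned.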

The second is fault localization. This area is also well covered in the literature, and tree-based models were used for it even before machine learning became popular.
The key here is to train only on anomalous data; this should then become a relatively simple classification problem, assuming you have relevant features.
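
The second step could look like the sketch below. It assumes the anomaly detector has already filtered out normal data, so the classifier only ever sees anomalous events; the simulated faults and the two hand-made features (the largest gap between any sensor and the cross-sensor median, and the shift of that median from an assumed 21 degree baseline) are hypothetical.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_sensors, n_events = 5, 400

# Hypothetical anomalous events only (normal data already filtered out).
# Label 0 = sensor fault, 1 = system fault.
labels = rng.integers(0, 2, size=n_events)
baseline = 21.0 + rng.normal(0.0, 0.3, size=(n_events, n_sensors))
shift = rng.uniform(3.0, 8.0, size=n_events)

readings = baseline.copy()
system = labels == 1
readings[system] += shift[system, None]  # system fault: every sensor moves
sensor = ~system
faulty = rng.integers(0, n_sensors, size=n_events)
readings[np.flatnonzero(sensor), faulty[sensor]] += shift[sensor]  # sensor fault: one sensor moves

# Hand-made features (assumptions): gap between the worst sensor and the
# cross-sensor median, and shift of that median from the 21-degree baseline.
median = np.median(readings, axis=1)
worst_gap = np.max(np.abs(readings - median[:, None]), axis=1)
median_shift = median - 21.0
X = np.column_stack([worst_gap, median_shift])

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
scores = cross_val_score(clf, X, labels, cv=5)
print(f"5-fold accuracy: {scores.mean():.3f}")
```

In practice most of the effort goes into designing features like these; once they separate the fault types, the classification itself is simple.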

answered Jun 24 '18 at 13:42 by jonnor
