Bayesian network: Difference between revisions
No edit summary |
No edit summary |
||
(6 intermediate revisions by the same user not shown) | |||
Line 2: | Line 2: | ||
The '''Bayesian network''' is a network probabilistic, graphical model that describes dependencies. | The '''Bayesian network''' is a network probabilistic, graphical model that describes dependencies. | ||
= Properties = | = Properties = | ||
Line 16: | Line 10: | ||
Each edge identifies a causal relation, usually temporal: something that happened in the future cannot cause something to happen in the past. As such, the graph is ''acyclic''. | Each edge identifies a causal relation, usually temporal: something that happened in the future cannot cause something to happen in the past. As such, the graph is ''acyclic''. | ||
Each node is a conditional probability: Given that the parents happen, what is the probability of the node event happening. | |||
= Joint probability = | = Joint probability = | ||
Line 24: | Line 20: | ||
P(x_1, x_2,\ldots,x_n)=\prod_{i=1}^nP(x_i|Parents(x_i)) | P(x_1, x_2,\ldots,x_n)=\prod_{i=1}^nP(x_i|Parents(x_i)) | ||
</math> | </math> | ||
By relaxing some assumptions, [[Naive Bayes]] reduces the computational complexity. | |||
= Training = | |||
After the structure of a Bayesian network is defined, its parameters (the conditional probabilities) are estimated by training data. The model structure is then refined and the parameters updated again. | |||
= Data-driven learning = | |||
Instead of using domain knowledge, one can use learning algorithms to automatically learn the dependencies between the variables using data. | |||
An example algorithm is BIC. | |||
= Analysis = | |||
Bayesian networks are interpretable. | |||
However, they are computationally expensive on higher dimensional data. We need to compute a large number of combinations of feature input probabilities. This is also the motivation behind [[Naive Bayes]] | |||
= Application = | |||
Bayesian networks have applications in machine learning tasks that deal with dependent features. Bayes theorem can be used in classification problems. | |||
An example is spam detection. Each node represents the conditional probability of a word being part of a spam email. |
Latest revision as of 18:55, 24 May 2024
The Bayesian network is a network probabilistic, graphical model that describes dependencies.
Properties
DAG
Bayesian networks are directed, acyclic graphs.
Each edge identifies a causal relation, usually temporal: something that happened in the future cannot cause something to happen in the past. As such, the graph is acyclic.
Each node is a conditional probability: Given that the parents happen, what is the probability of the node event happening.
Joint probability
Bayesian networks are primarily used to calculate the joint probability of an event given its dependencies. This can be done with the following formula
By relaxing some assumptions, Naive Bayes reduces the computational complexity.
Training
After the structure of a Bayesian network is defined, its parameters (the conditional probabilities) are estimated by training data. The model structure is then refined and the parameters updated again.
Data-driven learning
Instead of using domain knowledge, one can use learning algorithms to automatically learn the dependencies between the variables using data.
An example algorithm is BIC.
Analysis
Bayesian networks are interpretable.
However, they are computationally expensive on higher dimensional data. We need to compute a large number of combinations of feature input probabilities. This is also the motivation behind Naive Bayes
Application
Bayesian networks have applications in machine learning tasks that deal with dependent features. Bayes theorem can be used in classification problems.
An example is spam detection. Each node represents the conditional probability of a word being part of a spam email.