<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>http://ricefriedegg.com:80/mediawiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Admin</id>
	<title>Rice Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="http://ricefriedegg.com:80/mediawiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Admin"/>
	<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php/Special:Contributions/Admin"/>
	<updated>2026-04-09T15:20:35Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Linear_first_order_ODE&amp;diff=498</id>
		<title>Linear first order ODE</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Linear_first_order_ODE&amp;diff=498"/>
		<updated>2024-04-08T21:15:25Z</updated>

		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Differential Equations]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Linear first order [[Ordinary Differential Equation|ODE]]s&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Homogeneous if g(t) = 0&lt;br /&gt;
&lt;br /&gt;
Constant coefficient if p(t) = a is a constant. Otherwise, variable&lt;br /&gt;
coefficient.&lt;br /&gt;
&lt;br /&gt;
Examples:&lt;br /&gt;
y&#039; + ty = 2 nonhomogeneous, variable&lt;br /&gt;
y&#039; + 2y = 2 nonhomogeneous, constant&lt;br /&gt;
y&#039; + 2y = 0 homogeneous, constant&lt;br /&gt;
&lt;br /&gt;
Integrating factor method (2.1)&lt;br /&gt;
&lt;br /&gt;
y&#039; + p(t) y = g(t)&lt;br /&gt;
&lt;br /&gt;
Compute integrating factor&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\mu(t) = e^{\int p(t)\,dt}&amp;lt;/math&amp;gt;&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Ordinary_differential_equation&amp;diff=497</id>
		<title>Ordinary differential equation</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Ordinary_differential_equation&amp;diff=497"/>
		<updated>2024-04-08T21:14:24Z</updated>

		<summary type="html">&lt;p&gt;Admin: Admin moved page Ordinary Differential Equations to Ordinary Differential Equation without leaving a redirect&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Differential Equations]]&lt;br /&gt;
&lt;br /&gt;
An &#039;&#039;&#039;ordinary differential equation (ODE)&#039;&#039;&#039; relates a function and its&lt;br /&gt;
derivatives. We usually use &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; to denote the function and&lt;br /&gt;
&amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; to denote the variable.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Ordinary&#039;&#039; means that the equation has one independent variable, as&lt;br /&gt;
opposed to a &#039;&#039;partial&#039;&#039; differential equation, which has several.&lt;br /&gt;
&lt;br /&gt;
There is &#039;&#039;no general solution method&#039;&#039; for ODEs. We separate them into&lt;br /&gt;
classes and solve each class individually.&lt;br /&gt;
&lt;br /&gt;
== Example ==&lt;br /&gt;
&lt;br /&gt;
An example of an ODE is the following&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
y&#039; = y&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The general solution of the above is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
y(t) = c e^t&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Notably, the equation is &#039;&#039;homogeneous&#039;&#039;, meaning that &amp;lt;math&amp;gt;y = 0&amp;lt;/math&amp;gt; is&lt;br /&gt;
a solution. This will probably be covered later.&lt;br /&gt;
&lt;br /&gt;
To get a unique solution, we need to apply additional conditions, such&lt;br /&gt;
as specifying a particular value&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{cases}&lt;br /&gt;
y&#039; = y \\&lt;br /&gt;
y(0) = y_0&lt;br /&gt;
\end{cases}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is called an &#039;&#039;initial value problem&#039;&#039;, in which the ODE is paired&lt;br /&gt;
with a condition fixing the function&#039;s value at an initial point.&lt;br /&gt;
&lt;br /&gt;
== Usage ==&lt;br /&gt;
&lt;br /&gt;
Since the derivative can be described as the rate of change, and the&lt;br /&gt;
function itself as the state, ODEs arise as mathematical models of&lt;br /&gt;
systems whose &#039;&#039;rate of change depends on the state of the system&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
The following are brief descriptions of some applications of ODEs.&lt;br /&gt;
&lt;br /&gt;
# &#039;&#039;Radioactive decay&#039;&#039;, where the function is the (large) number of atoms.&lt;br /&gt;
#* Atoms decay at an average constant rate &amp;lt;math&amp;gt;r&amp;lt;/math&amp;gt;&lt;br /&gt;
#* &amp;lt;math&amp;gt;\frac{dN}{dt} = -rN&amp;lt;/math&amp;gt;&lt;br /&gt;
# &#039;&#039;Object falling under gravity&#039;&#039;, where the function is the velocity of the object&lt;br /&gt;
#* &amp;lt;math&amp;gt;\frac{dv}{dt} = g - \frac{\gamma v}{m}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Dimensions/Units ==&lt;br /&gt;
&lt;br /&gt;
The two sides of the equation must match in dimensions (i.e. units).&lt;br /&gt;
&lt;br /&gt;
Consider radioactive decay.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{cases}&lt;br /&gt;
\frac{dN}{dt} = -rN \\&lt;br /&gt;
N(0) = N_0&lt;br /&gt;
\end{cases}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The solution comes to&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
N(t) = N_0 e^{-rt}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We use the &#039;&#039;&#039;time constant&#039;&#039;&#039; &amp;lt;math&amp;gt;\tau&amp;lt;/math&amp;gt; to get a sense of how fast&lt;br /&gt;
the quantity is decaying. Its unit is time.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\tau = \frac{1}{r}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
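The decay model above can be sanity-checked numerically. The sketch below integrates dN/dt = -rN with forward Euler and compares against the exact solution; the constants (N0 = 1000, r = 0.5, t = 2) are illustrative assumptions, not values from the notes.

```python
import math

def euler_decay(n0, r, t_end, steps):
    """Integrate dN/dt = -r*N with forward Euler from N(0) = n0."""
    dt = t_end / steps
    n = n0
    for _ in range(steps):
        n = n + dt * (-r * n)
    return n

# Compare against the exact solution N(t) = N0 * exp(-r*t)
approx = euler_decay(1000.0, 0.5, 2.0, 10000)
exact = 1000.0 * math.exp(-0.5 * 2.0)
print(abs(approx - exact))  # small discretization error
```

Shrinking the step size (raising `steps`) drives the error toward zero, which is a quick way to convince yourself the exponential solution is right.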
== Equilibrium Solution ==&lt;br /&gt;
&lt;br /&gt;
Consider an object falling under gravity&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{cases}&lt;br /&gt;
\frac{dv}{dt} = g - \lambda v \\&lt;br /&gt;
v(0) = v_0&lt;br /&gt;
\end{cases}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We sometimes want the &#039;&#039;&#039;equilibrium solution&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
v(t) = v_*&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\frac{dv}{dt} = 0 = g - \lambda v_*&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Doing some algebra, we eventually get&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
v(t) = v_* + (v_0 - v_*) e^{-\lambda t}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
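The formula above can be verified directly: its derivative should equal g - lambda v at every time. A short numerical check (the constants g = 9.8, lambda = 0.5, v0 = 0 are illustrative assumptions):

```python
import math

g, lam, v0 = 9.8, 0.5, 0.0   # illustrative constants
v_star = g / lam             # equilibrium: g - lam * v_star = 0

def v(t):
    """The claimed solution v(t) = v_star plus (v0 - v_star) exp(-lam t)."""
    return v_star + (v0 - v_star) * math.exp(-lam * t)

# Check that dv/dt equals g - lam * v(t) at a sample time
t, h = 1.0, 1e-6
dvdt = (v(t + h) - v(t - h)) / (2 * h)   # centered difference
print(abs(dvdt - (g - lam * v(t))))      # approximately 0
```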
= Classification =&lt;br /&gt;
&lt;br /&gt;
An ODE is &#039;&#039;&#039;linear&#039;&#039;&#039; if all terms are proportional to &amp;lt;math&amp;gt;y, y&#039;,&lt;br /&gt;
y&#039;&#039;, \ldots&amp;lt;/math&amp;gt; or are given functions of &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt;. This&lt;br /&gt;
distinction is especially useful since linear combinations can be used to&lt;br /&gt;
construct solutions.&lt;br /&gt;
&lt;br /&gt;
The &#039;&#039;&#039;order&#039;&#039;&#039; of an ODE is the order of its highest derivative.&lt;br /&gt;
&lt;br /&gt;
In a &#039;&#039;&#039;scalar&#039;&#039;&#039; ODE, there is only one unknown function &amp;lt;math&amp;gt;y(t)&amp;lt;/math&amp;gt;.&lt;br /&gt;
In a &#039;&#039;&#039;system&#039;&#039;&#039;, there are several, and you have to solve them&lt;br /&gt;
simultaneously.&lt;br /&gt;
&lt;br /&gt;
Here is a list of ODEs we study, from simple to complex:&lt;br /&gt;
* [[Linear First Order ODE]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Information_Representation&amp;diff=466</id>
		<title>Information Representation</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Information_Representation&amp;diff=466"/>
		<updated>2024-04-02T22:09:39Z</updated>

		<summary type="html">&lt;p&gt;Admin: /* Arrays */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Inside a computer, everything is bits. The topic of &#039;&#039;&#039;information representation&#039;&#039;&#039; discusses what the different kinds of information inside a computer program actually look like behind the scenes.&lt;br /&gt;
&lt;br /&gt;
= Word =&lt;br /&gt;
Instead of one bit at a time, computers do operations on several at the same time. The &#039;&#039;&#039;word size&#039;&#039;&#039; of the computer is the number of bits representing each piece of data that it processes.&lt;br /&gt;
&lt;br /&gt;
Notably, since instructions are also &amp;quot;data&amp;quot; received by the CPU, each instruction is also constrained by the word size.&lt;br /&gt;
&lt;br /&gt;
Word size is determined by the CPU. For example, a 32-bit CPU has a word size of 32 bits (or 4 bytes).&lt;br /&gt;
&lt;br /&gt;
== Byte and Endianness ==&lt;br /&gt;
Most modern computers are &#039;&#039;&#039;byte-addressable&#039;&#039;&#039;: the byte is the smallest unit of data that has an address and can be accessed at once.&lt;br /&gt;
&lt;br /&gt;
Since we are storing the multiple bytes of a word across memory, the order in which we store them, the word&#039;s &#039;&#039;&#039;endianness&#039;&#039;&#039;, needs to be specified.&lt;br /&gt;
&lt;br /&gt;
There are two choices: Big Endian and Little Endian. Let&#039;s consider how to store 0x10203040 on a 32-bit machine.&lt;br /&gt;
&lt;br /&gt;
In &#039;&#039;&#039;Big Endian&#039;&#039;&#039;, the most significant byte is stored at the lowest address (i.e. big end first). From lowest address to highest, the bytes would be 0x10, 0x20, 0x30, 0x40. BE is used on the Internet.&lt;br /&gt;
&lt;br /&gt;
In &#039;&#039;&#039;Little Endian&#039;&#039;&#039;, the least significant byte is stored at the lowest address (i.e. little end first). From lowest address to highest, the bytes would be 0x40, 0x30, 0x20, 0x10. LE is used on Intel machines.&lt;br /&gt;
&lt;br /&gt;
Besides different CPUs using different endianness to run instructions, most file formats specify endianness to support different machines. For example, a Unicode text file may begin with a BOM (byte order mark) to denote whether the file is BE or LE.&lt;br /&gt;
&lt;br /&gt;
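The 0x10203040 example can be reproduced in a few lines using Python's built-in int.to_bytes (a sketch; byte values are listed in increasing address order):

```python
value = 0x10203040

be = value.to_bytes(4, "big")     # most significant byte first
le = value.to_bytes(4, "little")  # least significant byte first

print([hex(b) for b in be])  # ['0x10', '0x20', '0x30', '0x40']
print([hex(b) for b in le])  # ['0x40', '0x30', '0x20', '0x10']
```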
= Signed Integers =&lt;br /&gt;
While a series of bits naturally represents an unsigned binary number, negative numbers are a bit more complicated.&lt;br /&gt;
&lt;br /&gt;
The first method to represent signed integers with bits is &#039;&#039;&#039;signed magnitude&#039;&#039;&#039;. Under this system, the first bit (the &#039;&#039;sign bit&#039;&#039;) is used as a negative sign: if it is 0, the number is positive; otherwise it is negative.&lt;br /&gt;
&lt;br /&gt;
The second method is called &#039;&#039;&#039;2&#039;s complement&#039;&#039;&#039;, and it is more widely used due to several advantages covered later.&lt;br /&gt;
&lt;br /&gt;
In this system, think of the first bit in the bit pattern as carrying negative weight. For example, 101 would be 4 + 1 = 5 as an unsigned number, but -4 + 1 = -3 as a signed number in 2&#039;s complement.&lt;br /&gt;
&lt;br /&gt;
The advantage of using 2&#039;s complement over signed magnitude is twofold:&lt;br /&gt;
&lt;br /&gt;
* Normal binary arithmetic can be applied.&lt;br /&gt;
* There is no negative zero.&lt;br /&gt;
&lt;br /&gt;
To negate signed numbers, flip all bits and add 1. This comes from math: &amp;lt;math&amp;gt;-2 = -4 + ((4 - 1) - 2) + 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
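The 101 example and the flip-and-add-one rule can be sketched as follows (bit widths are parameters; the arithmetic is written with integer division and xor to keep each step explicit):

```python
def to_signed(pattern, bits):
    """Interpret an unsigned bit pattern as a 2's complement value."""
    top = pattern // 2 ** (bits - 1)   # 1 if the sign bit is set, else 0
    return pattern - top * 2 ** bits

def negate(pattern, bits):
    """Negate: flip all bits, then add 1 (mod 2**bits)."""
    flipped = pattern ^ (2 ** bits - 1)
    return (flipped + 1) % 2 ** bits

print(to_signed(0b101, 3))         # -3: the -4 + 1 example from the text
print(to_signed(negate(2, 8), 8))  # -2
```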
= Floating Point =&lt;br /&gt;
To store decimal values, the &#039;&#039;&#039;floating point&#039;&#039;&#039; representation is used. Similar to scientific notation, it uses a series of bits called the &#039;&#039;&#039;mantissa&#039;&#039;&#039; to store significant binary digits, and another series of bits called the &#039;&#039;&#039;exponent&#039;&#039;&#039; to store the order of magnitude of the number. The final calculation is something like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;1.m\times2^{e}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To handle negative numbers, the first bit is used as a sign bit.&lt;br /&gt;
&lt;br /&gt;
To handle negative exponents, a constant bias is subtracted from the stored exponent to center its range on 0. For example, when 8 bits are used to represent the exponent (as is usually the case), the stored values range from 0 to 255, so 127 is subtracted, giving an actual range of -127 to 128.&lt;br /&gt;
&lt;br /&gt;
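The sign/exponent/mantissa split can be inspected directly for a 32-bit float using the standard struct module (a sketch; 1.5 is an arbitrary example value):

```python
import struct

# Unpack the raw bit pattern of the 32-bit float 1.5 (network byte order)
(bits,) = struct.unpack("!I", struct.pack("!f", 1.5))

sign = bits // 2 ** 31              # top bit
exponent = (bits // 2 ** 23) % 256  # 8 stored exponent bits
mantissa = bits % 2 ** 23           # 23 mantissa bits

print(sign, exponent - 127, mantissa)  # 0 0 4194304: 1.5 is 1.1 binary times 2^0
```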
= Arrays =&lt;br /&gt;
The simplest structured group of values is an &#039;&#039;&#039;array&#039;&#039;&#039;, in which multiple values of the same type are stored consecutively in a block of memory. We then keep track of the address of the first element and the size of each element so that we can find the n-th element easily.&lt;br /&gt;
&lt;br /&gt;
For example, consider an array of 10 32-bit integers &amp;lt;c&amp;gt;arr&amp;lt;/c&amp;gt;. To access the 6th element, I can simply move 5 elements past the first one to get the address I want:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;sh&amp;gt;&lt;br /&gt;
*(arr + 5)  /* arr[5]: pointer arithmetic already scales by sizeof(int) */&lt;br /&gt;
&amp;lt;/sh&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Notably, this is also a major reason why array indices start at 0 in most programming languages.&lt;br /&gt;
&lt;br /&gt;
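The base-plus-offset rule behind this can be illustrated with plain arithmetic (the base address 0x1000 is a made-up example, not a real pointer):

```python
# Base-plus-offset addressing for an array of 10 32-bit ints.
base = 0x1000               # made-up address of the first element
size = 4                    # sizeof a 32-bit int, in bytes
addresses = [base + i * size for i in range(10)]

print(hex(addresses[0]))    # 0x1000: index 0 sits exactly at the base
print(hex(addresses[5]))    # 0x1014: the 6th element, 5 elements past the base
```

With 0-based indexing the element at index i sits at base + i * size, with no off-by-one correction needed.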
= Struct =&lt;br /&gt;
A &#039;&#039;&#039;struct&#039;&#039;&#039; boils down to an array whose elements can have different types, each taking up a different size.&lt;br /&gt;
&lt;br /&gt;
[[Category:Computer Architecture]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Information_Representation&amp;diff=465</id>
		<title>Information Representation</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Information_Representation&amp;diff=465"/>
		<updated>2024-04-02T22:06:59Z</updated>

		<summary type="html">&lt;p&gt;Admin: /* Arrays */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Inside a computer, everything is bits. The topic of &#039;&#039;&#039;information representation&#039;&#039;&#039; discusses what the different kinds of information inside a computer program actually look like behind the scenes.&lt;br /&gt;
&lt;br /&gt;
= Word =&lt;br /&gt;
Instead of one bit at a time, computers do operations on several at the same time. The &#039;&#039;&#039;word size&#039;&#039;&#039; of the computer is the number of bits representing each piece of data that it processes.&lt;br /&gt;
&lt;br /&gt;
Notably, since instructions are also &amp;quot;data&amp;quot; received by the CPU, each instruction is also constrained by the word size.&lt;br /&gt;
&lt;br /&gt;
Word size is determined by the CPU. For example, a 32-bit CPU has a word size of 32 bits (or 4 bytes).&lt;br /&gt;
&lt;br /&gt;
== Byte and Endianness ==&lt;br /&gt;
Most modern computers are &#039;&#039;&#039;byte-addressable&#039;&#039;&#039;: the byte is the smallest unit of data that has an address and can be accessed at once.&lt;br /&gt;
&lt;br /&gt;
Since we are storing the multiple bytes of a word across memory, the order in which we store them, the word&#039;s &#039;&#039;&#039;endianness&#039;&#039;&#039;, needs to be specified.&lt;br /&gt;
&lt;br /&gt;
There are two choices: Big Endian and Little Endian. Let&#039;s consider how to store 0x10203040 on a 32-bit machine.&lt;br /&gt;
&lt;br /&gt;
In &#039;&#039;&#039;Big Endian&#039;&#039;&#039;, the most significant byte is stored at the lowest address (i.e. big end first). From lowest address to highest, the bytes would be 0x10, 0x20, 0x30, 0x40. BE is used on the Internet.&lt;br /&gt;
&lt;br /&gt;
In &#039;&#039;&#039;Little Endian&#039;&#039;&#039;, the least significant byte is stored at the lowest address (i.e. little end first). From lowest address to highest, the bytes would be 0x40, 0x30, 0x20, 0x10. LE is used on Intel machines.&lt;br /&gt;
&lt;br /&gt;
Besides different CPUs using different endianness to run instructions, most file formats specify endianness to support different machines. For example, a Unicode text file may begin with a BOM (byte order mark) to denote whether the file is BE or LE.&lt;br /&gt;
&lt;br /&gt;
= Signed Integers =&lt;br /&gt;
While a series of bits naturally represents an unsigned binary number, negative numbers are a bit more complicated.&lt;br /&gt;
&lt;br /&gt;
The first method to represent signed integers with bits is &#039;&#039;&#039;signed magnitude&#039;&#039;&#039;. Under this system, the first bit (the &#039;&#039;sign bit&#039;&#039;) is used as a negative sign: if it is 0, the number is positive; otherwise it is negative.&lt;br /&gt;
&lt;br /&gt;
The second method is called &#039;&#039;&#039;2&#039;s complement&#039;&#039;&#039;, and it is more widely used due to several advantages covered later.&lt;br /&gt;
&lt;br /&gt;
In this system, think of the first bit in the bit pattern as carrying negative weight. For example, 101 would be 4 + 1 = 5 as an unsigned number, but -4 + 1 = -3 as a signed number in 2&#039;s complement.&lt;br /&gt;
&lt;br /&gt;
The advantage of using 2&#039;s complement over signed magnitude is twofold:&lt;br /&gt;
&lt;br /&gt;
* Normal binary arithmetic can be applied.&lt;br /&gt;
* There is no negative zero&lt;br /&gt;
&lt;br /&gt;
To negate signed numbers, flip all bits and add 1. This comes from math: &amp;lt;math&amp;gt;-2 = -4 + ((4 - 1) - 2) + 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Floating Point =&lt;br /&gt;
To store decimal values, the &#039;&#039;&#039;floating point&#039;&#039;&#039; representation is used. Similar to scientific notation, it uses a series of bits called the &#039;&#039;&#039;mantissa&#039;&#039;&#039; to store significant binary digits, and another series of bits called the &#039;&#039;&#039;exponent&#039;&#039;&#039; to store the order of magnitude of the number. The final calculation is something like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;1.m\times2^{e}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To handle negative numbers, the first bit is used as a sign bit.&lt;br /&gt;
&lt;br /&gt;
To handle negative exponents, a constant bias is subtracted from the stored exponent to center its range on 0. For example, when 8 bits are used to represent the exponent (as is usually the case), the stored values range from 0 to 255, so 127 is subtracted, giving an actual range of -127 to 128.&lt;br /&gt;
&lt;br /&gt;
= Arrays =&lt;br /&gt;
The simplest structured group of values is an &#039;&#039;&#039;array&#039;&#039;&#039;, in which multiple values of the same type are stored consecutively in a block of memory. We then keep track of the address of the first element and the size of each element so that we can find the n-th element easily.&lt;br /&gt;
&lt;br /&gt;
For example, consider an array of 10 32-bit integers &amp;lt;c&amp;gt;arr&amp;lt;/c&amp;gt;. To access the 6th element, I can simply move 5 elements past the first one to get the address I want:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;sh&amp;gt;&lt;br /&gt;
*(arr + 5)  /* arr[5]: pointer arithmetic already scales by sizeof(int) */&lt;br /&gt;
&amp;lt;/sh&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Notably, this is also a major reason why array indices start at 0.&lt;br /&gt;
&lt;br /&gt;
[[Category:Computer Architecture]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Information_Representation&amp;diff=464</id>
		<title>Information Representation</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Information_Representation&amp;diff=464"/>
		<updated>2024-04-02T21:59:27Z</updated>

		<summary type="html">&lt;p&gt;Admin: /* Floating Point */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Inside a computer, everything is bits. The topic of &#039;&#039;&#039;information representation&#039;&#039;&#039; discusses what the different kinds of information inside a computer program actually look like behind the scenes.&lt;br /&gt;
&lt;br /&gt;
= Word =&lt;br /&gt;
Instead of one bit at a time, computers do operations on several at the same time. The &#039;&#039;&#039;word size&#039;&#039;&#039; of the computer is the number of bits representing each piece of data that it processes.&lt;br /&gt;
&lt;br /&gt;
Notably, since instructions are also &amp;quot;data&amp;quot; received by the CPU, each instruction is also constrained by the word size.&lt;br /&gt;
&lt;br /&gt;
Word size is determined by the CPU. For example, a 32-bit CPU has a word size of 32 bits (or 4 bytes).&lt;br /&gt;
&lt;br /&gt;
== Byte and Endianness ==&lt;br /&gt;
Most modern computers are &#039;&#039;&#039;byte-addressable&#039;&#039;&#039;: the byte is the smallest unit of data that has an address and can be accessed at once.&lt;br /&gt;
&lt;br /&gt;
Since we are storing the multiple bytes of a word across memory, the order in which we store them, the word&#039;s &#039;&#039;&#039;endianness&#039;&#039;&#039;, needs to be specified.&lt;br /&gt;
&lt;br /&gt;
There are two choices: Big Endian and Little Endian. Let&#039;s consider how to store 0x10203040 on a 32-bit machine.&lt;br /&gt;
&lt;br /&gt;
In &#039;&#039;&#039;Big Endian&#039;&#039;&#039;, the most significant byte is stored at the lowest address (i.e. big end first). From lowest address to highest, the bytes would be 0x10, 0x20, 0x30, 0x40. BE is used on the Internet.&lt;br /&gt;
&lt;br /&gt;
In &#039;&#039;&#039;Little Endian&#039;&#039;&#039;, the least significant byte is stored at the lowest address (i.e. little end first). From lowest address to highest, the bytes would be 0x40, 0x30, 0x20, 0x10. LE is used on Intel machines.&lt;br /&gt;
&lt;br /&gt;
Besides different CPUs using different endianness to run instructions, most file formats specify endianness to support different machines. For example, a Unicode text file may begin with a BOM (byte order mark) to denote whether the file is BE or LE.&lt;br /&gt;
&lt;br /&gt;
= Signed Integers =&lt;br /&gt;
While a series of bits naturally represents an unsigned binary number, negative numbers are a bit more complicated.&lt;br /&gt;
&lt;br /&gt;
The first method to represent signed integers with bits is &#039;&#039;&#039;signed magnitude&#039;&#039;&#039;. Under this system, the first bit (the &#039;&#039;sign bit&#039;&#039;) is used as a negative sign: if it is 0, the number is positive; otherwise it is negative.&lt;br /&gt;
&lt;br /&gt;
The second method is called &#039;&#039;&#039;2&#039;s complement&#039;&#039;&#039;, and it is more widely used due to several advantages covered later.&lt;br /&gt;
&lt;br /&gt;
In this system, think of the first bit in the bit pattern as carrying negative weight. For example, 101 would be 4 + 1 = 5 as an unsigned number, but -4 + 1 = -3 as a signed number in 2&#039;s complement.&lt;br /&gt;
&lt;br /&gt;
The advantage of using 2&#039;s complement over signed magnitude is twofold:&lt;br /&gt;
&lt;br /&gt;
* Normal binary arithmetic can be applied.&lt;br /&gt;
* There is no negative zero&lt;br /&gt;
&lt;br /&gt;
To negate signed numbers, flip all bits and add 1. This comes from math: &amp;lt;math&amp;gt;-2 = -4 + ((4 - 1) - 2) + 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Floating Point =&lt;br /&gt;
To store decimal values, the &#039;&#039;&#039;floating point&#039;&#039;&#039; representation is used. Similar to scientific notation, it uses a series of bits called the &#039;&#039;&#039;mantissa&#039;&#039;&#039; to store significant binary digits, and another series of bits called the &#039;&#039;&#039;exponent&#039;&#039;&#039; to store the order of magnitude of the number. The final calculation is something like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;1.m\times2^{e}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To handle negative numbers, the first bit is used as a sign bit.&lt;br /&gt;
&lt;br /&gt;
To handle negative exponents, a constant bias is subtracted from the stored exponent to center its range on 0. For example, when 8 bits are used to represent the exponent (as is usually the case), the stored values range from 0 to 255, so 127 is subtracted, giving an actual range of -127 to 128.&lt;br /&gt;
&lt;br /&gt;
= Arrays =&lt;br /&gt;
The simplest structured group of values is an &#039;&#039;&#039;array&#039;&#039;&#039;, in which multiple values of the same type are stored consecutively in a block of memory. We then keep track of the address of the first element and the size of each element so that we can find the n-th element easily.&lt;br /&gt;
&lt;br /&gt;
For example, consider an array of 10 32-bit integers &#039;&#039;arr&#039;&#039;. To access the 6th element, I take the address &lt;br /&gt;
&lt;br /&gt;
[[Category:Computer Architecture]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Information_Representation&amp;diff=463</id>
		<title>Information Representation</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Information_Representation&amp;diff=463"/>
		<updated>2024-04-02T21:45:34Z</updated>

		<summary type="html">&lt;p&gt;Admin: /* Arrays */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Inside a computer, everything is bits. The topic of &#039;&#039;&#039;information representation&#039;&#039;&#039; discusses what the different kinds of information inside a computer program actually look like behind the scenes.&lt;br /&gt;
&lt;br /&gt;
= Word =&lt;br /&gt;
Instead of one bit at a time, computers do operations on several at the same time. The &#039;&#039;&#039;word size&#039;&#039;&#039; of the computer is the number of bits representing each piece of data that it processes.&lt;br /&gt;
&lt;br /&gt;
Notably, since instructions are also &amp;quot;data&amp;quot; received by the CPU, each instruction is also constrained by the word size.&lt;br /&gt;
&lt;br /&gt;
Word size is determined by the CPU. For example, a 32-bit CPU has a word size of 32 bits (or 4 bytes).&lt;br /&gt;
&lt;br /&gt;
== Byte and Endianness ==&lt;br /&gt;
Most modern computers are &#039;&#039;&#039;byte-addressable&#039;&#039;&#039;: the byte is the smallest unit of data that has an address and can be accessed at once.&lt;br /&gt;
&lt;br /&gt;
Since we are storing the multiple bytes of a word across memory, the order in which we store them, the word&#039;s &#039;&#039;&#039;endianness&#039;&#039;&#039;, needs to be specified.&lt;br /&gt;
&lt;br /&gt;
There are two choices: Big Endian and Little Endian. Let&#039;s consider how to store 0x10203040 on a 32-bit machine.&lt;br /&gt;
&lt;br /&gt;
In &#039;&#039;&#039;Big Endian&#039;&#039;&#039;, the most significant byte is stored at the lowest address (i.e. big end first). From lowest address to highest, the bytes would be 0x10, 0x20, 0x30, 0x40. BE is used on the Internet.&lt;br /&gt;
&lt;br /&gt;
In &#039;&#039;&#039;Little Endian&#039;&#039;&#039;, the least significant byte is stored at the lowest address (i.e. little end first). From lowest address to highest, the bytes would be 0x40, 0x30, 0x20, 0x10. LE is used on Intel machines.&lt;br /&gt;
&lt;br /&gt;
Besides different CPUs using different endianness to run instructions, most file formats specify endianness to support different machines. For example, a Unicode text file may begin with a BOM (byte order mark) to denote whether the file is BE or LE.&lt;br /&gt;
&lt;br /&gt;
= Signed Integers =&lt;br /&gt;
While a series of bits naturally represents an unsigned binary number, negative numbers are a bit more complicated.&lt;br /&gt;
&lt;br /&gt;
The first method to represent signed integers with bits is &#039;&#039;&#039;signed magnitude&#039;&#039;&#039;. Under this system, the first bit (the &#039;&#039;sign bit&#039;&#039;) is used as a negative sign: if it is 0, the number is positive; otherwise it is negative.&lt;br /&gt;
&lt;br /&gt;
The second method is called &#039;&#039;&#039;2&#039;s complement&#039;&#039;&#039;, and it is more widely used due to several advantages covered later.&lt;br /&gt;
&lt;br /&gt;
In this system, think of the first bit in the bit pattern as carrying negative weight. For example, 101 would be 4 + 1 = 5 as an unsigned number, but -4 + 1 = -3 as a signed number in 2&#039;s complement.&lt;br /&gt;
&lt;br /&gt;
The advantage of using 2&#039;s complement over signed magnitude is twofold:&lt;br /&gt;
&lt;br /&gt;
* Normal binary arithmetic can be applied.&lt;br /&gt;
* There is no negative zero&lt;br /&gt;
&lt;br /&gt;
To negate signed numbers, flip all bits and add 1. This comes from math: &amp;lt;math&amp;gt;-2 = -4 + ((4 - 1) - 2) + 1&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Floating Point =&lt;br /&gt;
To store decimal values, the &#039;&#039;&#039;floating point&#039;&#039;&#039; representation is used. Similar to scientific notation, it uses a series of bits called the &#039;&#039;&#039;mantissa&#039;&#039;&#039; to store significant binary digits, and another series of bits called the &#039;&#039;&#039;exponent&#039;&#039;&#039; to store the order of magnitude of the number. The final calculation is something like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;1.m\times2^{e}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To handle negative numbers, the first bit is used as a sign bit.&lt;br /&gt;
&lt;br /&gt;
To handle negative exponents, a constant bias is subtracted from the stored exponent to center its range on 0. For example, when 8 bits are used to represent the exponent (as is usually the case), the stored values range from 0 to 255, so 127 is subtracted, giving an actual range of -127 to 128.&lt;br /&gt;
&lt;br /&gt;
= Arrays =&lt;br /&gt;
The simplest structured group of values is an &#039;&#039;&#039;array&#039;&#039;&#039;, in which multiple values of the same type are stored consecutively in a block of memory. We then keep track of the address of the first element and the size of each element so that we can find the n-th element easily.&lt;br /&gt;
&lt;br /&gt;
For example, consider an array of 10 32-bit integers &#039;&#039;arr&#039;&#039;. To access the 6th element, I take the address &lt;br /&gt;
[[Category:Computer Architecture]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Information_Representation&amp;diff=462</id>
		<title>Information Representation</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Information_Representation&amp;diff=462"/>
		<updated>2024-04-02T21:38:34Z</updated>

		<summary type="html">&lt;p&gt;Admin: /* Byte and Endianess */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Inside a computer, everything is bits. The topic of &#039;&#039;&#039;information representation&#039;&#039;&#039; discusses what the different kinds of information inside a computer program actually look like behind the scenes.&lt;br /&gt;
&lt;br /&gt;
= Word =&lt;br /&gt;
Instead of one bit at a time, computers do operations on several at the same time. The &#039;&#039;&#039;word size&#039;&#039;&#039; of the computer is the number of bits representing each piece of data that it processes.&lt;br /&gt;
&lt;br /&gt;
Notably, since instructions are also &amp;quot;data&amp;quot; received by the CPU, each instruction is also constrained by the word size.&lt;br /&gt;
&lt;br /&gt;
Word size is determined by the CPU. For example, a 32-bit CPU has a word size of 32 bits (or 4 bytes).&lt;br /&gt;
&lt;br /&gt;
== Byte and Endianness ==&lt;br /&gt;
Most modern-day computers are &#039;&#039;&#039;byte-addressable&#039;&#039;&#039;: the byte is the smallest unit of data that has its own address and can be accessed in a single operation.&lt;br /&gt;
&lt;br /&gt;
Since the bytes of a word are stored across multiple memory locations, the order in which they are stored, the word&#039;s &#039;&#039;&#039;endianness&#039;&#039;&#039;, needs to be specified.&lt;br /&gt;
&lt;br /&gt;
There are two choices: Big Endian and Little Endian. Let&#039;s consider how to store 0x10203040 on a 32-bit machine.&lt;br /&gt;
&lt;br /&gt;
In &#039;&#039;&#039;Big Endian&#039;&#039;&#039;, the most significant byte is stored at the lowest address (i.e. big end first). Going up from the lowest address, memory would hold the bytes 0x10, 0x20, 0x30, 0x40. Big Endian is the standard byte order for data sent over the internet.&lt;br /&gt;
&lt;br /&gt;
In &#039;&#039;&#039;Little Endian&#039;&#039;&#039;, the least significant byte is stored at the lowest address (i.e. little end first). Going up from the lowest address, memory would hold the bytes 0x40, 0x30, 0x20, 0x10. Little Endian is used on Intel machines.&lt;br /&gt;
&lt;br /&gt;
Besides different CPUs using different endianness to run instructions, most file formats specify an endianness so that they can be read on different machines. For example, a Unicode text file can begin with a BOM (byte order mark) to denote whether the file is BE or LE.&lt;br /&gt;
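A quick way to see both byte orders is the struct module from the Python standard library, packing the example value 0x10203040 with each byte-order prefix:

```python
import struct

# Pack the same 32-bit value with each byte order and inspect the stored bytes.
value = 0x10203040
big = struct.pack(">I", value)     # big endian: most significant byte first
little = struct.pack("<I", value)  # little endian: least significant byte first

print(big.hex())     # 10203040
print(little.hex())  # 40302010
```

The two byte sequences are exact reverses of each other, which is why mixing up endianness scrambles multi-byte values rather than merely negating them.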
&lt;br /&gt;
= Signed Integers =&lt;br /&gt;
While a series of bits naturally represents a nonnegative binary number, representing negative numbers is a bit more complicated.&lt;br /&gt;
&lt;br /&gt;
The first method to represent signed integers with bits is &#039;&#039;&#039;signed magnitude&#039;&#039;&#039;. Under this system, the first bit (the &#039;&#039;sign bit&#039;&#039;) is used as a negative sign; if it is 0, the number is positive; otherwise, it is negative.&lt;br /&gt;
&lt;br /&gt;
The second method is called &#039;&#039;&#039;2&#039;s complement&#039;&#039;&#039;, and it is more widely used due to several advantages covered later.&lt;br /&gt;
&lt;br /&gt;
In this system, think of the most significant bit as carrying a &#039;&#039;negative&#039;&#039; weight. For example, 101 would be 4 + 1 = 5 as an unsigned number, but -4 + 1 = -3 as a signed number in 2&#039;s complement.&lt;br /&gt;
&lt;br /&gt;
The advantage of using 2&#039;s complement over signed magnitude is twofold:&lt;br /&gt;
&lt;br /&gt;
* Normal binary arithmetic can be applied.&lt;br /&gt;
* There is no negative zero.&lt;br /&gt;
&lt;br /&gt;
To negate a signed number, flip all bits and add 1. This works because flipping every bit of an n-bit number &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; gives &amp;lt;math&amp;gt;(2^n - 1) - x&amp;lt;/math&amp;gt;, so adding 1 yields &amp;lt;math&amp;gt;2^n - x&amp;lt;/math&amp;gt;, which is exactly the 2&#039;s complement pattern for &amp;lt;math&amp;gt;-x&amp;lt;/math&amp;gt;. For example, negating 010 (2) with 3 bits: flipping gives 101 (-4 + 1 = -3), and adding 1 gives 110 (-2).&lt;br /&gt;
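The flip-and-add-one rule can be checked with a small sketch; the word size and test patterns are just the 3-bit examples from the text:

```python
def negate(x, bits):
    """Two's-complement negation: flip all bits, add 1, keep `bits` bits."""
    mask = (1 << bits) - 1
    return ((x ^ mask) + 1) & mask

# 3-bit example from the text: 101 reads as -4 + 1 = -3; negating gives 011 = 3
print(format(negate(0b101, 3), "03b"))  # 011
# Negating 010 (2) gives 110, the pattern for -4 + 2 = -2
print(format(negate(0b010, 3), "03b"))  # 110
```

Negating twice returns the original pattern, which is what makes ordinary binary addition work unchanged for signed values.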
&lt;br /&gt;
= Floating Point =&lt;br /&gt;
To store non-integer values, the &#039;&#039;&#039;floating point&#039;&#039;&#039; representation is used. Similar to scientific notation, it uses a series of bits called the &#039;&#039;&#039;mantissa&#039;&#039;&#039; to store the significant binary digits, and another series of bits called the &#039;&#039;&#039;exponent&#039;&#039;&#039; to store the order of magnitude of the number. The final value is calculated as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;1.m\times2^{e}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To handle negative numbers, the first bit is used as a sign bit.&lt;br /&gt;
&lt;br /&gt;
To handle negative exponents, a constant bias is subtracted from the stored exponent to center its range on 0. For example, when 8 bits are used for the exponent (as is usually the case), the stored value ranges from 0 to 255, so 127 is subtracted from it, giving an actual range of -127 to 128.&lt;br /&gt;
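As a sketch of these fields, one can unpack the bit pattern of a 32-bit float with the standard-library struct module. 1.5 is 1.1 in binary, i.e. 1.1 x 2^0, so its stored exponent should be exactly the bias 127:

```python
import struct

# Reinterpret the bytes of a 32-bit float as an unsigned integer.
bits = struct.unpack(">I", struct.pack(">f", 1.5))[0]

sign = bits >> 31               # 1 sign bit
exponent = (bits >> 23) & 0xFF  # 8 exponent bits, biased by 127
mantissa = bits & 0x7FFFFF      # 23 mantissa bits (the leading 1 is implicit)

print(sign, exponent, exponent - 127)  # 0 127 0
```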
&lt;br /&gt;
= Arrays =&lt;br /&gt;
[[Category:Computer Architecture]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Information_Representation&amp;diff=461</id>
		<title>Information Representation</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Information_Representation&amp;diff=461"/>
		<updated>2024-04-02T21:32:02Z</updated>

		<summary type="html">&lt;p&gt;Admin: /* Signed Integers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Inside a computer, everything is bits. The topic of &#039;&#039;&#039;information representation&#039;&#039;&#039; discusses what the different kinds of information inside a computer program actually look like behind the scenes.&lt;br /&gt;
&lt;br /&gt;
= Word =&lt;br /&gt;
Rather than operating on one bit at a time, computers operate on several bits at once. The &#039;&#039;&#039;word size&#039;&#039;&#039; of a computer is the number of bits in each piece of data that it processes.&lt;br /&gt;
&lt;br /&gt;
Notably, since instructions are also &amp;quot;data&amp;quot; received by the CPU, each instruction is also constrained by the word size.&lt;br /&gt;
&lt;br /&gt;
Word size is determined by the CPU. For example, a 32-bit CPU has a word size of 32 bits (or 4 bytes).&lt;br /&gt;
&lt;br /&gt;
== Byte and Endianness ==&lt;br /&gt;
Most modern-day computers are &#039;&#039;&#039;byte-addressable&#039;&#039;&#039;: the byte is the smallest unit of data that has its own address and can be accessed in a single operation.&lt;br /&gt;
&lt;br /&gt;
Since the bytes of a word are stored across multiple memory locations, the order in which they are stored, the word&#039;s &#039;&#039;&#039;endianness&#039;&#039;&#039;, needs to be specified.&lt;br /&gt;
&lt;br /&gt;
There are two choices: Big Endian and Little Endian. Let&#039;s consider how to store 0x10203040 on a 32-bit machine.&lt;br /&gt;
&lt;br /&gt;
In &#039;&#039;&#039;Big Endian&#039;&#039;&#039;, the most significant byte is stored at the lowest address (i.e. big end first). Going up from the lowest address, memory would hold the bytes 0x10, 0x20, 0x30, 0x40. Big Endian is the standard byte order for data sent over the internet.&lt;br /&gt;
&lt;br /&gt;
In &#039;&#039;&#039;Little Endian&#039;&#039;&#039;, the least significant byte is stored at the lowest address (i.e. little end first). Going up from the lowest address, memory would hold the bytes 0x40, 0x30, 0x20, 0x10. Little Endian is used on Intel machines.&lt;br /&gt;
&lt;br /&gt;
Besides different CPUs using different endianness to run instructions, most file formats specify an endianness so that they can be read on different machines. For example, a Unicode text file can begin with a BOM (byte order mark) to denote whether the file is BE or LE.&lt;br /&gt;
&lt;br /&gt;
= Signed Integers =&lt;br /&gt;
While a series of bits naturally represents a nonnegative binary number, representing negative numbers is a bit more complicated.&lt;br /&gt;
&lt;br /&gt;
The first method to represent signed integers with bits is &#039;&#039;&#039;signed magnitude&#039;&#039;&#039;. Under this system, the first bit (the &#039;&#039;sign bit&#039;&#039;) is used as a negative sign; if it is 0, the number is positive; otherwise, it is negative.&lt;br /&gt;
&lt;br /&gt;
The second method is called &#039;&#039;&#039;2&#039;s complement&#039;&#039;&#039;, and it is more widely used due to several advantages covered later.&lt;br /&gt;
&lt;br /&gt;
In this system, think of the most significant bit as carrying a &#039;&#039;negative&#039;&#039; weight. For example, 101 would be 4 + 1 = 5 as an unsigned number, but -4 + 1 = -3 as a signed number in 2&#039;s complement.&lt;br /&gt;
&lt;br /&gt;
The advantage of using 2&#039;s complement over signed magnitude is twofold:&lt;br /&gt;
&lt;br /&gt;
* Normal binary arithmetic can be applied.&lt;br /&gt;
* There is no negative zero.&lt;br /&gt;
&lt;br /&gt;
To negate a signed number, flip all bits and add 1. This works because flipping every bit of an n-bit number &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; gives &amp;lt;math&amp;gt;(2^n - 1) - x&amp;lt;/math&amp;gt;, so adding 1 yields &amp;lt;math&amp;gt;2^n - x&amp;lt;/math&amp;gt;, which is exactly the 2&#039;s complement pattern for &amp;lt;math&amp;gt;-x&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= Arrays =&lt;br /&gt;
[[Category:Computer Architecture]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Information_Representation&amp;diff=460</id>
		<title>Information Representation</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Information_Representation&amp;diff=460"/>
		<updated>2024-04-02T20:33:06Z</updated>

		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Inside a computer, everything is bits. The topic of &#039;&#039;&#039;information representation&#039;&#039;&#039; discusses what the different kinds of information inside a computer program actually look like behind the scenes.&lt;br /&gt;
&lt;br /&gt;
= Signed Integers =&lt;br /&gt;
While a series of bits naturally represents a nonnegative binary number, representing negative numbers is a bit more complicated.&lt;br /&gt;
&lt;br /&gt;
The first method to represent signed integers with bits is &#039;&#039;&#039;signed magnitude&#039;&#039;&#039;. Under this system, the first bit (the &#039;&#039;sign bit&#039;&#039;) is used as a negative sign; if it is 0, the number is positive; otherwise, it is negative.&lt;br /&gt;
&lt;br /&gt;
The second method is called &#039;&#039;&#039;2&#039;s complement&#039;&#039;&#039;, and it is more widely used due to several advantages covered later.&lt;br /&gt;
&lt;br /&gt;
In this system, think of the most significant bit as carrying a &#039;&#039;negative&#039;&#039; weight. For example, 101 would be 4 + 1 = 5 as an unsigned number, but -4 + 1 = -3 as a signed number in 2&#039;s complement.&lt;br /&gt;
&lt;br /&gt;
The advantage of using 2&#039;s complement over signed magnitude is twofold:&lt;br /&gt;
&lt;br /&gt;
* Normal binary arithmetic can be applied.&lt;br /&gt;
* There is no negative zero.&lt;br /&gt;
&lt;br /&gt;
To negate a signed number, flip all bits and add 1. This works because flipping every bit of an n-bit number &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; gives &amp;lt;math&amp;gt;(2^n - 1) - x&amp;lt;/math&amp;gt;, so adding 1 yields &amp;lt;math&amp;gt;2^n - x&amp;lt;/math&amp;gt;, which is exactly the 2&#039;s complement pattern for &amp;lt;math&amp;gt;-x&amp;lt;/math&amp;gt;.&lt;br /&gt;
[[Category:Computer Architecture]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Information_Representation&amp;diff=459</id>
		<title>Information Representation</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Information_Representation&amp;diff=459"/>
		<updated>2024-04-02T20:23:24Z</updated>

		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Inside a computer, everything is bits. The topic of &#039;&#039;&#039;information representation&#039;&#039;&#039; discusses what the different kinds of information inside a computer program actually look like behind the scenes.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2&#039;s complement&#039;&#039;&#039; is a method of representing signed integers in computing.&lt;br /&gt;
&lt;br /&gt;
To calculate the value of a number represented in 2&#039;s complement, think of the most significant bit as carrying a &#039;&#039;negative&#039;&#039; weight. For example, 101 would be 4 + 1 = 5 as an unsigned number, but -4 + 1 = -3 as a signed number in 2&#039;s complement.&lt;br /&gt;
&lt;br /&gt;
The advantage of using 2&#039;s complement over signed magnitude is twofold:&lt;br /&gt;
&lt;br /&gt;
* Easier arithmetic operations&lt;br /&gt;
* Greater range of values (since there is no -0)&lt;br /&gt;
&lt;br /&gt;
To &#039;&#039;&#039;negate&#039;&#039;&#039; signed numbers, flip all bits and add 1. This works because flipping every bit of an n-bit number &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; gives &amp;lt;math&amp;gt;(2^n - 1) - x&amp;lt;/math&amp;gt;, so adding 1 yields &amp;lt;math&amp;gt;2^n - x&amp;lt;/math&amp;gt;, which represents &amp;lt;math&amp;gt;-x&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
[[Category:Computer Architecture]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Category:Computer_Architecture&amp;diff=458</id>
		<title>Category:Computer Architecture</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Category:Computer_Architecture&amp;diff=458"/>
		<updated>2024-04-02T20:22:38Z</updated>

		<summary type="html">&lt;p&gt;Admin: Created page with &amp;quot;Category:Computer Science&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Computer Science]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Information_Representation&amp;diff=457</id>
		<title>Information Representation</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Information_Representation&amp;diff=457"/>
		<updated>2024-04-02T20:22:24Z</updated>

		<summary type="html">&lt;p&gt;Admin: Created page with &amp;quot;Inside a computer, everything is bits. The topic of &amp;#039;&amp;#039;&amp;#039;information representation&amp;#039;&amp;#039;&amp;#039; discusses how different information inside a computer program actually looks like in the background. Category:Computer Architecture&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Inside a computer, everything is bits. The topic of &#039;&#039;&#039;information representation&#039;&#039;&#039; discusses how different information inside a computer program actually looks like in the background.&lt;br /&gt;
[[Category:Computer Architecture]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Bitwise_Operation&amp;diff=456</id>
		<title>Bitwise Operation</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Bitwise_Operation&amp;diff=456"/>
		<updated>2024-04-02T20:12:01Z</updated>

		<summary type="html">&lt;p&gt;Admin: Created page with &amp;quot;A &amp;#039;&amp;#039;&amp;#039;bitwise operation&amp;#039;&amp;#039;&amp;#039; is an operation that is done on a series of bits. There are many bitwise operators, most of them being pretty self explanatory. I&amp;#039;ll note down anything that seems interesting to me.  &amp;#039;&amp;#039;Exclusive or (XOR)&amp;#039;&amp;#039; operator can be used to flip singular bits, since against 1, XOR flips a bit, whereas against 0, XOR does nothing. Category:Computer Science&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;A &#039;&#039;&#039;bitwise operation&#039;&#039;&#039; is an operation performed on a series of bits. There are many bitwise operators, most of them pretty self-explanatory. I&#039;ll note down anything that seems interesting to me.&lt;br /&gt;
&lt;br /&gt;
The &#039;&#039;exclusive or (XOR)&#039;&#039; operator can be used to flip individual bits: XOR against 1 flips a bit, whereas XOR against 0 leaves it unchanged.&lt;br /&gt;
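A minimal Python sketch of this flip trick (the values are made up for illustration):

```python
x = 0b1010
mask = 0b0010   # 1-bits mark the positions to flip

# XOR with 1 flips a bit; XOR with 0 leaves it unchanged
flipped = x ^ mask
print(format(flipped, "04b"))  # 1000

# Flipping the same bits twice restores the original value
assert flipped ^ mask == x
```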
[[Category:Computer Science]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Topic:_Godot&amp;diff=437</id>
		<title>Topic: Godot</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Topic:_Godot&amp;diff=437"/>
		<updated>2024-03-26T20:20:37Z</updated>

		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Godot&#039;&#039;&#039; is a FOSS game engine. This is the main page; subpages will detail my notes. At the time of writing, Godot is on version 4.2. Most information comes from the [https://docs.godotengine.org/en/stable/getting_started/introduction/key_concepts_overview.html official documentation].&lt;br /&gt;
&lt;br /&gt;
{{Special:PrefixIndex/Godot/}}&lt;br /&gt;
[[Category:Contents]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Imperative_Programming&amp;diff=436</id>
		<title>Imperative Programming</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Imperative_Programming&amp;diff=436"/>
		<updated>2024-03-23T21:37:19Z</updated>

		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Programming Paradigm]]&lt;br /&gt;
&#039;&#039;&#039;Imperative programming&#039;&#039;&#039; is a [[Programming Paradigm|programming paradigm]] that uses statements to change a program&#039;s state. Like a recipe, it focuses on describing how a program operates step by step. This includes [[Procedural Programming|procedural programming]] and [[Object-Oriented Programming|object-oriented programming]]. It is the &#039;&#039;original&#039;&#039; form of programming in that at the most basic level, computers are machines that take in one instruction at a time and output results.&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Imperative_Programming&amp;diff=435</id>
		<title>Imperative Programming</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Imperative_Programming&amp;diff=435"/>
		<updated>2024-03-23T21:34:58Z</updated>

		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Programming Paradigm]]&lt;br /&gt;
&#039;&#039;&#039;Imperative programming&#039;&#039;&#039; is a [[Programming Paradigm|programming paradigm]] that uses statements to change a program&#039;s state. Like a recipe, it focuses on describing how a program operates step by step. This includes [[Procedural Programming|procedural programming]] and [[Object-Oriented Programming|object-oriented programming]].&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Category:Programming_Paradigm&amp;diff=434</id>
		<title>Category:Programming Paradigm</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Category:Programming_Paradigm&amp;diff=434"/>
		<updated>2024-03-23T21:24:11Z</updated>

		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Computer Science]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Imperative_Programming&amp;diff=433</id>
		<title>Imperative Programming</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Imperative_Programming&amp;diff=433"/>
		<updated>2024-03-23T21:23:54Z</updated>

		<summary type="html">&lt;p&gt;Admin: Created page with &amp;quot;Category:Programming Paradigm&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Programming Paradigm]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Programming_Paradigm&amp;diff=432</id>
		<title>Programming Paradigm</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Programming_Paradigm&amp;diff=432"/>
		<updated>2024-03-23T21:22:07Z</updated>

		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;This page comes primarily from self-research&lt;br /&gt;
&lt;br /&gt;
A &#039;&#039;&#039;programming paradigm&#039;&#039;&#039; is a relatively high-level abstract model for organizing and structuring the implementation of a computer program.&lt;br /&gt;
&lt;br /&gt;
At the most basic level of a computer program, we have simple instructions running. As a program gets more complex, practices and abstractions such as functions or objects provide a way to organize it so that it is easier to maintain and collaborate on. We group these practices and abstractions into paradigms to study and analyze them.&lt;br /&gt;
&lt;br /&gt;
* Each paradigm has strengths and weaknesses&lt;br /&gt;
* Tools (notably programming languages) can often be classified as supporting one or more paradigms&lt;br /&gt;
[[Category:Programming Paradigm]]&lt;br /&gt;
[[Category:Computer Science]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Category:Programming_Paradigm&amp;diff=431</id>
		<title>Category:Programming Paradigm</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Category:Programming_Paradigm&amp;diff=431"/>
		<updated>2024-03-23T21:21:29Z</updated>

		<summary type="html">&lt;p&gt;Admin: Created blank page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Programming_Paradigm&amp;diff=430</id>
		<title>Programming Paradigm</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Programming_Paradigm&amp;diff=430"/>
		<updated>2024-03-23T21:21:17Z</updated>

		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;This page comes primarily from self-research&lt;br /&gt;
&lt;br /&gt;
A &#039;&#039;&#039;programming paradigm&#039;&#039;&#039; is a relatively high-level abstract model for organizing and structuring the implementation of a computer program.&lt;br /&gt;
&lt;br /&gt;
At the most basic level of a computer program, we have simple instructions running. As a program gets more complex, practices and abstractions such as functions or objects provide a way to organize it so that it is easier to maintain and collaborate on. We group these practices and abstractions into paradigms to study and analyze them.&lt;br /&gt;
&lt;br /&gt;
* Each paradigm has strengths and weaknesses&lt;br /&gt;
* Tools (notably programming languages) can often be classified as supporting one or more paradigms&lt;br /&gt;
[[Category:Programming Paradigm]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Programming_Paradigm&amp;diff=429</id>
		<title>Programming Paradigm</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Programming_Paradigm&amp;diff=429"/>
		<updated>2024-03-23T21:20:32Z</updated>

		<summary type="html">&lt;p&gt;Admin: Created page with &amp;quot;&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;This page comes primarily from self-research  A &amp;#039;&amp;#039;&amp;#039;programming paradigm&amp;#039;&amp;#039;&amp;#039; is a relatively-highly abstracted model to organize/structure the implementation of a computer program.  At the most basic level of a computer program, we have simple instructions running. As a program get more complex, practices and abstractions such as functions or objects provide a way to organize a program such that it is easier to maintain and collaborate on. We group these...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;nowiki&amp;gt;*&amp;lt;/nowiki&amp;gt;This page comes primarily from self-research&lt;br /&gt;
&lt;br /&gt;
A &#039;&#039;&#039;programming paradigm&#039;&#039;&#039; is a relatively high-level abstract model for organizing and structuring the implementation of a computer program.&lt;br /&gt;
&lt;br /&gt;
At the most basic level of a computer program, we have simple instructions running. As a program gets more complex, practices and abstractions such as functions or objects provide a way to organize it so that it is easier to maintain and collaborate on. We group these practices and abstractions into paradigms to study and analyze them.&lt;br /&gt;
&lt;br /&gt;
* Each paradigm has strengths and weaknesses&lt;br /&gt;
* Tools (notably programming languages) can often be classified as supporting one or more paradigms&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Topic:_Godot/Key_Concepts&amp;diff=428</id>
		<title>Topic: Godot/Key Concepts</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Topic:_Godot/Key_Concepts&amp;diff=428"/>
		<updated>2024-03-23T21:05:27Z</updated>

		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;In Godot, a game is a &#039;&#039;&#039;tree&#039;&#039;&#039; of &#039;&#039;&#039;nodes&#039;&#039;&#039;, grouped into &#039;&#039;&#039;scenes&#039;&#039;&#039;, that communicate with each other using &#039;&#039;&#039;signals&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
A &#039;&#039;&#039;node&#039;&#039;&#039; is the most basic element.&lt;br /&gt;
&lt;br /&gt;
A &#039;&#039;&#039;scene&#039;&#039;&#039; is a reusable element of the game; anything from an item to a map. They are made from groups of nodes and other scenes.&lt;br /&gt;
&lt;br /&gt;
All of a game&#039;s scenes create a &#039;&#039;&#039;scene tree&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Nodes emit a &#039;&#039;&#039;signal&#039;&#039;&#039; when some event happens. This is the primary way for nodes and scenes to communicate.&lt;br /&gt;
&lt;br /&gt;
= Node =&lt;br /&gt;
A &#039;&#039;&#039;node&#039;&#039;&#039; is the most basic element of a Godot game. All nodes share the following characteristics:&lt;br /&gt;
&lt;br /&gt;
* Has a name&lt;br /&gt;
* Has editable properties&lt;br /&gt;
* Receives callbacks to update every frame&lt;br /&gt;
* Extendable with new properties and functions&lt;br /&gt;
* Can contain other nodes&lt;br /&gt;
&lt;br /&gt;
= Sources =&lt;br /&gt;
&lt;br /&gt;
* https://docs.godotengine.org/en/stable/getting_started/step_by_step/nodes_and_scenes.html&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Topic:_Godot&amp;diff=427</id>
		<title>Topic: Godot</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Topic:_Godot&amp;diff=427"/>
		<updated>2024-03-23T20:50:40Z</updated>

		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Godot&#039;&#039;&#039; is a FOSS game engine. This is the main page; subpages will detail my notes. At the time of writing, Godot is on version 4.2. Most information comes from the [https://docs.godotengine.org/en/stable/getting_started/introduction/key_concepts_overview.html official documentation].&lt;br /&gt;
&lt;br /&gt;
{{Special:PrefixIndex/Godot/}}&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Topic:_Godot&amp;diff=426</id>
		<title>Topic: Godot</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Topic:_Godot&amp;diff=426"/>
		<updated>2024-03-23T20:48:54Z</updated>

		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Godot&#039;&#039;&#039; is a FOSS game engine. This is the main page; subpages will detail my notes.&lt;br /&gt;
&lt;br /&gt;
{{Special:PrefixIndex/Godot/}}&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Topic:_Godot/Key_Concepts&amp;diff=425</id>
		<title>Topic: Godot/Key Concepts</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Topic:_Godot/Key_Concepts&amp;diff=425"/>
		<updated>2024-03-23T20:48:16Z</updated>

		<summary type="html">&lt;p&gt;Admin: Created page with &amp;quot;Test&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Test&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Topic:_Godot&amp;diff=424</id>
		<title>Topic: Godot</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Topic:_Godot&amp;diff=424"/>
		<updated>2024-03-23T20:47:43Z</updated>

		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Godot&#039;&#039;&#039; is a FOSS game engine. This is the main page; subpages will detail my notes.&lt;br /&gt;
&lt;br /&gt;
{{Special:PrefixIndex/Help:Godot/}}&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Topic:_Godot&amp;diff=423</id>
		<title>Topic: Godot</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Topic:_Godot&amp;diff=423"/>
		<updated>2024-03-23T20:43:23Z</updated>

		<summary type="html">&lt;p&gt;Admin: Created page with &amp;quot;&amp;#039;&amp;#039;&amp;#039;Godot&amp;#039;&amp;#039;&amp;#039; is a FOSS game engine. This is the main page; subpages will detail my notes.&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Godot&#039;&#039;&#039; is a FOSS game engine. This is the main page; subpages will detail my notes.&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Bivariate&amp;diff=402</id>
		<title>Bivariate</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Bivariate&amp;diff=402"/>
		<updated>2024-03-19T19:02:51Z</updated>

		<summary type="html">&lt;p&gt;Admin: /* Regression Effect */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Distribution (Statistics)]][[Category:Statistics]]&lt;br /&gt;
&#039;&#039;&#039;Bivariate&#039;&#039;&#039; data involve two variables instead of the usual one; each value of one variable is paired with a value of the other. We will use &amp;lt;math&amp;gt;X, Y&amp;lt;/math&amp;gt; to denote the two random variables throughout this page.&lt;br /&gt;
&lt;br /&gt;
= Summary Statistics =&lt;br /&gt;
To summarize bivariate data, we use covariance and correlation in addition to the statistics detailed in [[Summary Statistics]].&lt;br /&gt;
&lt;br /&gt;
== Covariance ==&lt;br /&gt;
The &#039;&#039;&#039;covariance&#039;&#039;&#039; measures how two RVs vary together around their respective centers: it is positive when the two tend to move in the same direction and negative when they tend to move in opposite directions.&lt;br /&gt;
&lt;br /&gt;
We have &#039;&#039;sample covariance&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;s^2_{X, Y} = \hat{cov}(X, Y) = \frac{1}{n - 1} \sum(x_i - \bar{x}) (y_i - \bar{y}) = \frac{1}{n - 1} \left( \sum x_i y_i - n \bar{x} \bar{y} \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A good way of thinking about covariance is by cases:&lt;br /&gt;
&lt;br /&gt;
If x &#039;&#039;increases&#039;&#039; as y &#039;&#039;increases&#039;&#039;, the two factors in each term of the covariance sum tend to have the same sign, so the terms are positive. Therefore, the covariance is &#039;&#039;positive&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
If x &#039;&#039;decreases&#039;&#039; as y &#039;&#039;increases&#039;&#039;, the two factors tend to have opposite signs. Therefore, the covariance is &#039;&#039;negative&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
If x does not clearly vary with y, the signs are sometimes the same and sometimes different. Overall, the terms should roughly cancel out to &#039;&#039;zero&#039;&#039;.&lt;br /&gt;
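A minimal sketch of the sample covariance formula above (the data lists are made up for illustration):

```python
def sample_cov(xs, ys):
    """Sample covariance: sum((x_i - mean_x) * (y_i - mean_y)) / (n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# x increases as y increases: every term is positive, and so is the covariance
print(sample_cov([1, 2, 3, 4], [2, 4, 6, 8]))
# x decreases as y increases: the covariance is negative
print(sample_cov([4, 3, 2, 1], [2, 4, 6, 8]))
```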
&lt;br /&gt;
== Correlation ==&lt;br /&gt;
The &#039;&#039;&#039;correlation&#039;&#039;&#039; of two random variables measures the &#039;&#039;&#039;linear dependence&#039;&#039;&#039; between &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Cor(X, Y) = \rho = \frac{Cov(X,Y)}{sd(X) sd(Y)}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Correlation is always between -1 and 1. When r = 1, the relationship between X and Y is &#039;&#039;&#039;perfect positive linear&#039;&#039;&#039;. When r = -1, it is &#039;&#039;&#039;perfect negative linear&#039;&#039;&#039;. If r = 0, there is no &#039;&#039;linear&#039;&#039; relationship; this doesn&#039;t mean that there is no relationship at all. Notably, any scatter plot that is symmetric about a vertical line (such as a parabola) has a correlation of 0.&lt;br /&gt;
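A sketch of this formula, using the statistics module from the Python standard library for the standard deviations (the data are made up for illustration):

```python
import statistics

def correlation(xs, ys):
    """Pearson correlation: Cov(X, Y) / (sd(X) * sd(Y)), always in [-1, 1]."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (statistics.stdev(xs) * statistics.stdev(ys))

# A perfect positive linear relationship gives r = 1
print(correlation([1, 2, 3], [10, 20, 30]))
# A symmetric (parabola-like) pattern gives r = 0 despite a clear relationship
print(correlation([-1, 0, 1], [1, 0, 1]))
```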
&lt;br /&gt;
= Bivariate Normal =&lt;br /&gt;
&lt;br /&gt;
The &#039;&#039;&#039;bivariate normal&#039;&#039;&#039; (a.k.a. bivariate Gaussian) is a special type of joint distribution of two continuous random variables.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;(X, Y)&amp;lt;/math&amp;gt; is &#039;&#039;bivariate normal&#039;&#039; if&lt;br /&gt;
&lt;br /&gt;
# The marginal PDFs of both X and Y are normal&lt;br /&gt;
# For any &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;, the conditional PDF of &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;X = x&amp;lt;/math&amp;gt; is normal&lt;br /&gt;
#* This works the other way around as well: if &amp;lt;math&amp;gt;(X, Y)&amp;lt;/math&amp;gt; is bivariate Gaussian, the conditional PDF of &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;Y = y&amp;lt;/math&amp;gt; is also normal&lt;br /&gt;
&lt;br /&gt;
== Predicting Y given X ==&lt;br /&gt;
&lt;br /&gt;
Given bivariate normal, we can predict one variable given another.&lt;br /&gt;
Let us try estimating the expected Y given X is x&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(Y| X = x)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are three main methods:&lt;br /&gt;
* Scatter plot approximation&lt;br /&gt;
* Joint PDF&lt;br /&gt;
* 5 parameters&lt;br /&gt;
&lt;br /&gt;
=== 5 Parameters ===&lt;br /&gt;
&lt;br /&gt;
We need to know 5 parameters about &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E(X), sd(X), E(Y), sd(Y), \rho&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If &amp;lt;math&amp;gt;X, Y&amp;lt;/math&amp;gt; follow a bivariate normal distribution, then we&lt;br /&gt;
have&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\left( \frac{E(Y|X = x) - E(Y)}{sd(Y)} \right) = \rho \left( \frac{x -&lt;br /&gt;
E(X)}{sd(X)} \right)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The left side is the &#039;&#039;predicted Z-score for Y&#039;&#039;, and the right side is&lt;br /&gt;
&#039;&#039;the product of correlation and Z-score of X = x&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
The variance is given by&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Var(Y | X = x) = (1 - \rho^2) Var(Y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Due to the range of &amp;lt;math&amp;gt;\rho&amp;lt;/math&amp;gt;, the variance of Y given X is&lt;br /&gt;
never larger than the unconditional variance. The standard deviation is&lt;br /&gt;
just the square root of that.&lt;br /&gt;
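The prediction rule above can be sketched in a few lines of Python; the five parameters below are hypothetical values chosen for illustration.&lt;br /&gt;

```python
import math

# The five parameters (hypothetical values for illustration)
mean_x, sd_x = 100.0, 15.0
mean_y, sd_y = 50.0, 10.0
rho = 0.6

def predict_y(x):
    # Predicted Z-score of Y is rho times the Z-score of x
    z_x = (x - mean_x) / sd_x
    return mean_y + rho * z_x * sd_y

# Conditional spread shrinks by a factor sqrt(1 - rho^2)
sd_given_x = math.sqrt(1 - rho ** 2) * sd_y

print(predict_y(115.0))        # x is 1 SD above its mean, prints 56.0
print(round(sd_given_x, 2))    # prints 8.0
```

Note that the predicted Y is only 0.6 SD above its mean even though x was a full SD above its own mean; this is the regression effect described below in the source notes.&lt;br /&gt;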
&lt;br /&gt;
== Regression Effect ==&lt;br /&gt;
The &#039;&#039;&#039;regression effect&#039;&#039;&#039; is the phenomenon that the best prediction&lt;br /&gt;
of &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;X = x&amp;lt;/math&amp;gt; is less extreme (in Z-score terms) than&lt;br /&gt;
&amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; itself; future predictions regress toward&lt;br /&gt;
mediocrity.&lt;br /&gt;
&lt;br /&gt;
If the sample is random (or at least somewhat random), it is unlikely that a subject with an unlikely score in &#039;&#039;x&#039;&#039; also gets an unlikely score in &#039;&#039;y&#039;&#039;. Therefore, the expectation of &#039;&#039;Y&#039;&#039; given &#039;&#039;X&#039;&#039; is close to the mean. It&#039;s barely elaborated on in class; see [[wikipedia:Regression_toward_the_mean|Wikipedia]].&lt;br /&gt;
&lt;br /&gt;
When you plot all the predicted &amp;lt;math&amp;gt;E(Y|X = x)&amp;lt;/math&amp;gt;, you get the&lt;br /&gt;
&#039;&#039;&#039;linear regression line&#039;&#039;&#039;. The regression effect can be demonstrated&lt;br /&gt;
by also plotting the SD line (where the correlation is not applied).&lt;br /&gt;
&lt;br /&gt;
= Linear Regression =&lt;br /&gt;
&lt;br /&gt;
== Assumption ==&lt;br /&gt;
&lt;br /&gt;
# X and Y have a linear relationship&lt;br /&gt;
# A random sample of pairs was taken&lt;br /&gt;
# All pairs of data are independent&lt;br /&gt;
# The variance of the error is constant. &amp;lt;math&amp;gt;Var(\epsilon) = \sigma_\epsilon^2&amp;lt;/math&amp;gt;&lt;br /&gt;
# The average of the errors is zero. &amp;lt;math&amp;gt;E(\epsilon) = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
# The errors are normally distributed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\epsilon_i \overset{iid}{\sim} N(0, \sigma_\epsilon^2), \quad Y_i \sim N(\beta_0&lt;br /&gt;
+ \beta_1 x_i, \sigma_\epsilon^2)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Procedure ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
y_i = \beta_0 + \beta_1 x_i + \epsilon_i&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the &amp;lt;math&amp;gt;\beta_0, \beta_1&amp;lt;/math&amp;gt; are &#039;&#039;&#039;regression&lt;br /&gt;
coefficients&#039;&#039;&#039; (intercept and slope) based on the population, and&lt;br /&gt;
&amp;lt;math&amp;gt;\epsilon_i&amp;lt;/math&amp;gt; is error for the i-th subject.&lt;br /&gt;
&lt;br /&gt;
We want to estimate the regression coefficients.&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;\hat{y_i}&amp;lt;/math&amp;gt; be an estimation of &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt;; a&lt;br /&gt;
prediction at &amp;lt;math&amp;gt;X = x&amp;lt;/math&amp;gt;, with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\hat{y_i} = \hat{\beta_0} + \hat{\beta_1} x_i&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can measure the vertical error &amp;lt;math&amp;gt;e_i = y_i - \hat{y_i}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The overall error is the sum of squared errors &amp;lt;math&amp;gt;SSE = \sum_{i=1}^n&lt;br /&gt;
e_i^2&amp;lt;/math&amp;gt;. The best-fit line is the line minimizing SSE.&lt;br /&gt;
&lt;br /&gt;
Using calculus, we can find that the line has the following slope and&lt;br /&gt;
intercept:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\hat{\beta_1} = r \frac{s_y}{s_x}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;r&amp;lt;/math&amp;gt; is the sample correlation (the strength of the linear relationship), and&lt;br /&gt;
&amp;lt;math&amp;gt;s_x, s_y&amp;lt;/math&amp;gt; are the sample standard deviations. They are&lt;br /&gt;
basically the sample versions of &amp;lt;math&amp;gt;\rho, \sigma&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\hat{\beta_0} = \bar{Y} - \hat{\beta_1} \bar{X}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
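The two estimator formulas can be checked numerically; the sample below is hypothetical, with y lying close to a line of slope 2.&lt;br /&gt;

```python
import statistics

# Hypothetical sample, roughly following y = 2x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(x)
xbar, ybar = statistics.mean(x), statistics.mean(y)

# Sample covariance and correlation
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)
r = sxy / (statistics.stdev(x) * statistics.stdev(y))

# Slope and intercept from the closed-form least-squares solution
b1 = r * statistics.stdev(y) / statistics.stdev(x)
b0 = ybar - b1 * xbar

print(round(b1, 3), round(b0, 3))  # slope near 2, small intercept
```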
&lt;br /&gt;
== Interpretation ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\hat{\beta_1}&amp;lt;/math&amp;gt; (the slope) is the estimated change in&lt;br /&gt;
&amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; when &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; changes by one unit.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\hat{\beta_0}&amp;lt;/math&amp;gt; (the intercept) is the estimated average of&lt;br /&gt;
&amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; when &amp;lt;math&amp;gt;X = 0&amp;lt;/math&amp;gt;. If &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; cannot be 0,&lt;br /&gt;
this may not have a practical meaning.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;r^2&amp;lt;/math&amp;gt; (&#039;&#039;&#039;coefficient of determination&#039;&#039;&#039;) measures how well&lt;br /&gt;
the line fits the data.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
r^2 = \frac{\sum (\hat{y_i} - \bar{Y})^2 }{\sum (y_i - \bar{Y})^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The denominator is the total variation in &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt;; the numerator is the variation&lt;br /&gt;
explained by the fit. The value is the proportion of variance in &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; that is explained by the linear&lt;br /&gt;
relationship between &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Sampling_Distribution&amp;diff=401</id>
		<title>Sampling Distribution</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Sampling_Distribution&amp;diff=401"/>
		<updated>2024-03-19T17:07:29Z</updated>

		<summary type="html">&lt;p&gt;Admin: /* T-Distribution */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Let there be &amp;lt;math&amp;gt;Y_1, Y_2, \ldots, Y_n &amp;lt;/math&amp;gt;, where each&lt;br /&gt;
&amp;lt;math&amp;gt;Y_i&amp;lt;/math&amp;gt; is a random variable from the population.&lt;br /&gt;
&lt;br /&gt;
Every &amp;lt;math&amp;gt;Y_i&amp;lt;/math&amp;gt; has the same (unknown) mean and distribution.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(Y_i) = \mu, Var(Y_i) = \sigma^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We then have the sample mean&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\bar{Y} = \frac{1}{n} \sum_{i = 1}^n Y_i&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The sample mean is expected to be &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt; through a pretty easy&lt;br /&gt;
direct proof&lt;br /&gt;
&lt;br /&gt;
The variance of the sample mean is &amp;lt;math&amp;gt;\frac{\sigma^2}{n}&amp;lt;/math&amp;gt;, also&lt;br /&gt;
through a pretty easy direct proof.&lt;br /&gt;
&lt;br /&gt;
= Central limit theorem =&lt;br /&gt;
The &#039;&#039;&#039;central limit theorem&#039;&#039;&#039; states that the distribution of the sample mean follows normal distribution.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\bar{Y} \sim N(\mu, \frac{\sigma^2}{n})&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As long as at least one of the following conditions is satisfied, the CLT applies, regardless of the population&#039;s distribution.&lt;br /&gt;
&lt;br /&gt;
# The population distribution of &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; is normal, &#039;&#039;or&#039;&#039;&lt;br /&gt;
# The sample size is large (&amp;lt;math&amp;gt;n&amp;gt;30&amp;lt;/math&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
By extension, we also have the distribution of the sum.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;S \sim N(\mu_S = n\mu, \sigma_S = \sqrt{n}\,\sigma)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;S = \sum Y_i &amp;lt;/math&amp;gt;&lt;br /&gt;
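A small simulation (hypothetical uniform population) illustrates the CLT claims about the mean and variance of the sample mean.&lt;br /&gt;

```python
import random
import statistics

random.seed(1)

# Population: uniform on [0, 1), so mu = 0.5 and sigma^2 = 1/12
n = 40          # size of one sample
trials = 2000   # number of repeated samples

means = [statistics.mean(random.random() for _ in range(n))
         for _ in range(trials)]

# CLT: the sample means cluster around mu with variance sigma^2 / n
print(round(statistics.mean(means), 2))          # close to 0.5
print(round(statistics.variance(means) * n, 3))  # close to 1/12 = 0.083
```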
&lt;br /&gt;
== Proportion Approximation ==&lt;br /&gt;
The sampling distribution of the proportion of success of &#039;&#039;n&#039;&#039; bernoulli random variables (&amp;lt;math&amp;gt;\hat{p}&amp;lt;/math&amp;gt;) can also be approximated to a normal distribution under the CLT.&lt;br /&gt;
&lt;br /&gt;
Consider [[Discrete Random Variable#Bernoulli|bernoulli]] random variables&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Y_1, \ldots, Y_n&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E(Y_i) = p,\ Var(Y_i) = p(1 - p)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The proportion of success &amp;lt;math&amp;gt;\hat{p}&amp;lt;/math&amp;gt; is the sum over the count, so the expected probability is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E(\hat{p}) = E \left(\frac{1}{n} \sum Y_i \right) = p&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and the variance of &amp;lt;math&amp;gt;\hat{p}&amp;lt;/math&amp;gt; is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Var \left(\frac{1}{n} \sum Y_i \right) = \frac{1}{n^2} Var \left(\sum Y_i \right)  = \frac{p (1 - p)}{n}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With a large sample size &#039;&#039;n&#039;&#039;, we can approximate this to a normal distribution. Notably, the criteria for a large &#039;&#039;n&#039;&#039; is different from that of the continuous random variable.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;np &amp;gt; 5, n(1 - p) &amp;gt; 5&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
then we have&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\hat{p} \sim N \left( p, \frac{p(1-p)}{n} \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The reasoning behind the weird criteria relates to the [[Discrete Random Variable|binomial distribution]]. It&#039;s not very elaborated on in the lecture, but &amp;lt;math&amp;gt;np&amp;lt;/math&amp;gt; is the mean of the binomial (i.e. the expected number of successes). The criteria essentially makes sure that no negative values are plausible in the approximation; with a small mean and a large variance, the left side of a normal approximation goes into the negative, but bernoulli/binomial must always be positive.&lt;br /&gt;
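A short sketch of the rule-of-thumb check and the resulting approximation, using hypothetical values n = 200 and p = 0.1.&lt;br /&gt;

```python
import math

# Hypothetical: n = 200 trials with success probability p = 0.1
n, p = 200, 0.1

# Rule of thumb before using the normal approximation:
# both expected counts, np and n(1 - p), should exceed 5
ok = min(n * p, n * (1 - p))
print(ok)  # 20.0, so the approximation is reasonable

# Normal approximation for the sample proportion p-hat
mean = p
sd = math.sqrt(p * (1 - p) / n)
print(round(sd, 4))  # 0.0212
```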
&lt;br /&gt;
== Binomial Approximation ==&lt;br /&gt;
I&#039;m short on time. This is based on the above section and a bit of math.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Y \sim N(np, np(1 - p))&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Confidence Interval =&lt;br /&gt;
&#039;&#039;&#039;Estimation&#039;&#039;&#039; is the guess for the unknown parameter. A &#039;&#039;&#039;point estimate&#039;&#039;&#039; is a &amp;quot;best guess&amp;quot; of the population parameter, whereas the &#039;&#039;&#039;confidence interval&#039;&#039;&#039; is the range of reasonable values that is intended to contain the &#039;&#039;&#039;parameter of interest&#039;&#039;&#039; with a certain &#039;&#039;&#039;degree of confidence&#039;&#039;&#039;, calculated with&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;(point estimate - margin of error, point estimate + margin of error)&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Standard Error ==&lt;br /&gt;
The &#039;&#039;&#039;standard error&#039;&#039;&#039; measures how much error we expect to make when estimating &amp;lt;math&amp;gt;\mu_Y&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;\bar{y}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Constructing CIs ==&lt;br /&gt;
By CLT, &amp;lt;math&amp;gt;\bar{Y} \sim N(\mu, \frac{\sigma^2}{n} )&amp;lt;/math&amp;gt;. The&lt;br /&gt;
confidence interval is the range of plausible &amp;lt;math&amp;gt;\bar{Y}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
If we define the middle 90% to be plausible, then to find the&lt;br /&gt;
confidence interval we simply find the 5th and 95th percentiles.&lt;br /&gt;
&lt;br /&gt;
Generalized, if we want a confidence interval of the middle &amp;lt;math&amp;gt;(1 - \alpha) \times 100\%&amp;lt;/math&amp;gt;, we have a confidence interval of&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
 \bar{y} \pm Z_{\alpha / 2} \frac{\sigma}{ \sqrt{n} }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\bar{y}&amp;lt;/math&amp;gt; is the sample mean and &amp;lt;math&amp;gt;Z_{x}&amp;lt;/math&amp;gt; is the z score of the x-th percentile.&lt;br /&gt;
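A minimal sketch of the z-based interval, using Python&#039;s statistics.NormalDist for the percentile; the sample summary values are hypothetical.&lt;br /&gt;

```python
from statistics import NormalDist

# Hypothetical: sample mean 172, known sigma 8, n = 64, 95% confidence
ybar, sigma, n = 172.0, 8.0, 64
alpha = 0.05

# Z_{alpha/2}: the z-score cutting off the top alpha/2 tail
z = NormalDist().inv_cdf(1 - alpha / 2)

margin = z * sigma / n ** 0.5
print(round(ybar - margin, 2), round(ybar + margin, 2))  # 170.04 173.96
```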
&lt;br /&gt;
= T-Distribution =&lt;br /&gt;
[[File:T distribution table.png|thumb|T distribution table]]&lt;br /&gt;
&lt;br /&gt;
CLT is based on the population variance. Since we don&#039;t know the population variance &amp;lt;math&amp;gt;\sigma^2&amp;lt;/math&amp;gt;, we&lt;br /&gt;
have to use the sample standard deviation &amp;lt;math&amp;gt;s&amp;lt;/math&amp;gt; to estimate it. This&lt;br /&gt;
introduces more uncertainty, accounted for by the &#039;&#039;&#039;t-distribution.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
T-distribution is the distribution of sample mean based on population&lt;br /&gt;
mean, sample variance and &#039;&#039;degrees of freedom&#039;&#039; (covered later). It&lt;br /&gt;
looks very similar to normal distribution.&lt;br /&gt;
&lt;br /&gt;
When the sample size &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; is small, there is greater&lt;br /&gt;
uncertainty in the estimates, so the t critical value is larger than the corresponding z critical value:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
t_{\alpha/2} &amp;gt; Z_{\alpha/2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The spread of t-distribution depends on the &#039;&#039;&#039;degrees of freedom&#039;&#039;&#039;,&lt;br /&gt;
which is based on sample size. When looking up the table, round down df.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\upsilon = n - 1&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As the sample size increases, degrees of freedom increase, the spread of&lt;br /&gt;
t-distribution decreases, and t-distribution approaches normal&lt;br /&gt;
distribution.&lt;br /&gt;
&lt;br /&gt;
Based on CLT and normal distribution, we had the confidence interval&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
 \bar{Y} \pm Z_{\alpha / 2} \frac{\sigma}{ \sqrt{n} }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, based on T-distribution, we have the CI&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
 \bar{Y} \pm t_{\alpha / 2} \frac{s}{ \sqrt{n} }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Find Sample Size ====&lt;br /&gt;
We can calculate the sample size needed for a desired margin of&lt;br /&gt;
error and sample variance by assuming that &amp;lt;math&amp;gt;\upsilon = \infty&amp;lt;/math&amp;gt; (so &amp;lt;math&amp;gt;t_{\alpha/2} \approx Z_{\alpha/2}&amp;lt;/math&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
n = \frac{Z^2_{\alpha/2} s^2}{E^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We want to always &#039;&#039;round up&#039;&#039; to stay within the error margin.&lt;br /&gt;
&lt;br /&gt;
Rounding up can only increase &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;, which shrinks the margin of error, so the target margin is still met.&lt;br /&gt;
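A quick sketch of the sample-size formula with hypothetical inputs (pilot SD 12, desired margin 2, 95% confidence); the normal quantile stands in for t at &amp;lt;math&amp;gt;\upsilon = \infty&amp;lt;/math&amp;gt;.&lt;br /&gt;

```python
import math
from statistics import NormalDist

# Hypothetical pilot study: sample SD 12, desired margin of error 2,
# 95% confidence (alpha = 0.05)
s, E, alpha = 12.0, 2.0, 0.05

z = NormalDist().inv_cdf(1 - alpha / 2)
n_exact = (z * s / E) ** 2

# Round up: a larger n can only shrink the margin of error
n = math.ceil(n_exact)
print(n)  # 139
```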
&lt;br /&gt;
= Sampling Distribution of Difference =&lt;br /&gt;
&lt;br /&gt;
By linear combination of RVs, sampling distribution of &amp;lt;math&amp;gt;\bar{Y_1} - \bar{Y_2}&amp;lt;/math&amp;gt; is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
 ( \bar{Y_1} - \bar{Y_2} ) \sim N(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2})&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
However, we do not know the population variances &amp;lt;math&amp;gt;\sigma_1^2, \sigma_2^2&amp;lt;/math&amp;gt;. If the CLT assumptions hold, then the degrees of freedom are approximated by&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\upsilon = \frac{ \left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2 }{ \frac{(s_1^2 / n_1)^2 }{n_1 - 1} + \frac{(s_2^2 / n_2)^2 }{n_2 - 1} }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is the Welch-Satterthwaite approximation (given without proof in class). Remember to round down to use the t-table. With this degree of freedom, we can use the sample variances to estimate the distribution.&lt;br /&gt;
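The degrees-of-freedom formula above can be evaluated directly; the two-sample summary statistics below are hypothetical.&lt;br /&gt;

```python
# Welch-Satterthwaite degrees of freedom for two samples
# (hypothetical summary statistics)
s1_sq, n1 = 4.0, 15
s2_sq, n2 = 9.0, 20

a = s1_sq / n1
b = s2_sq / n2

df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Round down before looking the value up in a t-table
print(int(df))  # 32
```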
&lt;br /&gt;
&lt;br /&gt;
[[Category:Sample Statistics]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Sampling_Distribution&amp;diff=400</id>
		<title>Sampling Distribution</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Sampling_Distribution&amp;diff=400"/>
		<updated>2024-03-19T17:04:32Z</updated>

		<summary type="html">&lt;p&gt;Admin: /* Confidence Interval */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Let there be &amp;lt;math&amp;gt;Y_1, Y_2, \ldots, Y_n &amp;lt;/math&amp;gt;, where each&lt;br /&gt;
&amp;lt;math&amp;gt;Y_i&amp;lt;/math&amp;gt; is a random variable from the population.&lt;br /&gt;
&lt;br /&gt;
Every &amp;lt;math&amp;gt;Y_i&amp;lt;/math&amp;gt; has the same (unknown) mean and distribution.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(Y_i) = \mu, Var(Y_i) = \sigma^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We then have the sample mean&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\bar{Y} = \frac{1}{n} \sum_{i = 1}^n Y_i&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The sample mean is expected to be &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt; through a pretty easy&lt;br /&gt;
direct proof&lt;br /&gt;
&lt;br /&gt;
The variance of the sample mean is &amp;lt;math&amp;gt;\frac{\sigma^2}{n}&amp;lt;/math&amp;gt;, also&lt;br /&gt;
through a pretty easy direct proof.&lt;br /&gt;
&lt;br /&gt;
= Central limit theorem =&lt;br /&gt;
The &#039;&#039;&#039;central limit theorem&#039;&#039;&#039; states that the distribution of the sample mean follows normal distribution.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\bar{Y} \sim N(\mu, \frac{\sigma^2}{n})&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As long as at least one of the following conditions is satisfied, the CLT applies, regardless of the population&#039;s distribution.&lt;br /&gt;
&lt;br /&gt;
# The population distribution of &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; is normal, &#039;&#039;or&#039;&#039;&lt;br /&gt;
# The sample size is large (&amp;lt;math&amp;gt;n&amp;gt;30&amp;lt;/math&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
By extension, we also have the distribution of the sum.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;S \sim N(\mu_S = n\mu, \sigma_S = \sqrt{n}\,\sigma)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;S = \sum Y_i &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Proportion Approximation ==&lt;br /&gt;
The sampling distribution of the proportion of success of &#039;&#039;n&#039;&#039; bernoulli random variables (&amp;lt;math&amp;gt;\hat{p}&amp;lt;/math&amp;gt;) can also be approximated to a normal distribution under the CLT.&lt;br /&gt;
&lt;br /&gt;
Consider [[Discrete Random Variable#Bernoulli|bernoulli]] random variables&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Y_1, \ldots, Y_n&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E(Y_i) = p,\ Var(Y_i) = p(1 - p)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The proportion of success &amp;lt;math&amp;gt;\hat{p}&amp;lt;/math&amp;gt; is the sum over the count, so the expected probability is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E(\hat{p}) = E \left(\frac{1}{n} \sum Y_i \right) = p&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and the variance of &amp;lt;math&amp;gt;\hat{p}&amp;lt;/math&amp;gt; is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Var \left(\frac{1}{n} \sum Y_i \right) = \frac{1}{n^2} Var \left(\sum Y_i \right)  = \frac{p (1 - p)}{n}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With a large sample size &#039;&#039;n&#039;&#039;, we can approximate this to a normal distribution. Notably, the criteria for a large &#039;&#039;n&#039;&#039; is different from that of the continuous random variable.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;np &amp;gt; 5, n(1 - p) &amp;gt; 5&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
then we have&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\hat{p} \sim N \left( p, \frac{p(1-p)}{n} \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The reasoning behind the weird criteria relates to the [[Discrete Random Variable|binomial distribution]]. It&#039;s not very elaborated on in the lecture, but &amp;lt;math&amp;gt;np&amp;lt;/math&amp;gt; is the mean of the binomial (i.e. the expected number of successes). The criteria essentially makes sure that no negative values are plausible in the approximation; with a small mean and a large variance, the left side of a normal approximation goes into the negative, but bernoulli/binomial must always be positive.&lt;br /&gt;
&lt;br /&gt;
== Binomial Approximation ==&lt;br /&gt;
I&#039;m short on time. This is based on the above section and a bit of math.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Y \sim N(np, np(1 - p))&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Confidence Interval =&lt;br /&gt;
&#039;&#039;&#039;Estimation&#039;&#039;&#039; is the guess for the unknown parameter. A &#039;&#039;&#039;point estimate&#039;&#039;&#039; is a &amp;quot;best guess&amp;quot; of the population parameter, whereas the &#039;&#039;&#039;confidence interval&#039;&#039;&#039; is the range of reasonable values that is intended to contain the &#039;&#039;&#039;parameter of interest&#039;&#039;&#039; with a certain &#039;&#039;&#039;degree of confidence&#039;&#039;&#039;, calculated with&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;(point estimate - margin of error, point estimate + margin of error)&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Standard Error ==&lt;br /&gt;
The &#039;&#039;&#039;standard error&#039;&#039;&#039; measures how much error we expect to make when estimating &amp;lt;math&amp;gt;\mu_Y&amp;lt;/math&amp;gt; by &amp;lt;math&amp;gt;\bar{y}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Constructing CIs ==&lt;br /&gt;
By CLT, &amp;lt;math&amp;gt;\bar{Y} \sim N(\mu, \frac{\sigma^2}{n} )&amp;lt;/math&amp;gt;. The&lt;br /&gt;
confidence interval is the range of plausible &amp;lt;math&amp;gt;\bar{Y}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
If we define the middle 90% to be plausible, then to find the&lt;br /&gt;
confidence interval we simply find the 5th and 95th percentiles.&lt;br /&gt;
&lt;br /&gt;
Generalized, if we want a confidence interval of the middle &amp;lt;math&amp;gt;(1 - \alpha) \times 100\%&amp;lt;/math&amp;gt;, we have a confidence interval of&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
 \bar{y} \pm Z_{\alpha / 2} \frac{\sigma}{ \sqrt{n} }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\bar{y}&amp;lt;/math&amp;gt; is the sample mean and &amp;lt;math&amp;gt;Z_{x}&amp;lt;/math&amp;gt; is the z score of the x-th percentile.&lt;br /&gt;
&lt;br /&gt;
= T-Distribution =&lt;br /&gt;
[[File:T distribution table.png|thumb|T distribution table]]&lt;br /&gt;
&lt;br /&gt;
CLT has several restrictions, the biggest one being a large sample size.&lt;br /&gt;
&lt;br /&gt;
Since we don&#039;t know the population variance &amp;lt;math&amp;gt;\sigma^2&amp;lt;/math&amp;gt;, we&lt;br /&gt;
have to use the sample standard deviation &amp;lt;math&amp;gt;s&amp;lt;/math&amp;gt; to estimate it. This&lt;br /&gt;
introduces more uncertainty, accounted for by the &#039;&#039;&#039;t-distribution.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
T-distribution is the distribution of sample mean based on population&lt;br /&gt;
mean, sample variance and &#039;&#039;degrees of freedom&#039;&#039; (covered later). It&lt;br /&gt;
looks very similar to normal distribution.&lt;br /&gt;
&lt;br /&gt;
When the sample size &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; is small, there is greater&lt;br /&gt;
uncertainty in the estimates, so the t critical value is larger than the corresponding z critical value:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
t_{\alpha/2} &amp;gt; Z_{\alpha/2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The spread of t-distribution depends on the &#039;&#039;&#039;degrees of freedom&#039;&#039;&#039;,&lt;br /&gt;
which is based on sample size. When looking up the table, round down df.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\upsilon = n - 1&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As the sample size increases, degrees of freedom increase, the spread of&lt;br /&gt;
t-distribution decreases, and t-distribution approaches normal&lt;br /&gt;
distribution.&lt;br /&gt;
&lt;br /&gt;
Based on CLT and normal distribution, we had the confidence interval&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
 \bar{Y} \pm Z_{\alpha / 2} \frac{\sigma}{ \sqrt{n} }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, based on T-distribution, we have the CI&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
 \bar{Y} \pm t_{\alpha / 2} \frac{s}{ \sqrt{n} }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Find Sample Size ====&lt;br /&gt;
We can calculate the sample size needed for a desired margin of&lt;br /&gt;
error and sample variance by assuming that &amp;lt;math&amp;gt;\upsilon = \infty&amp;lt;/math&amp;gt; (so &amp;lt;math&amp;gt;t_{\alpha/2} \approx Z_{\alpha/2}&amp;lt;/math&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
n = \frac{Z^2_{\alpha/2} s^2}{E^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We want to always &#039;&#039;round up&#039;&#039; to stay within the error margin.&lt;br /&gt;
&lt;br /&gt;
Rounding up can only increase &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt;, which shrinks the margin of error, so the target margin is still met.&lt;br /&gt;
&lt;br /&gt;
= Sampling Distribution of Difference =&lt;br /&gt;
&lt;br /&gt;
By linear combination of RVs, sampling distribution of &amp;lt;math&amp;gt;\bar{Y_1} - \bar{Y_2}&amp;lt;/math&amp;gt; is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
 ( \bar{Y_1} - \bar{Y_2} ) \sim N(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2})&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
However, we do not know the population variances &amp;lt;math&amp;gt;\sigma_1^2, \sigma_2^2&amp;lt;/math&amp;gt;. If the CLT assumptions hold, then the degrees of freedom are approximated by&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\upsilon = \frac{ \left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2 }{ \frac{(s_1^2 / n_1)^2 }{n_1 - 1} + \frac{(s_2^2 / n_2)^2 }{n_2 - 1} }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is the Welch-Satterthwaite approximation (given without proof in class). Remember to round down to use the t-table. With this degree of freedom, we can use the sample variances to estimate the distribution.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:Sample Statistics]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Sampling_Distribution&amp;diff=399</id>
		<title>Sampling Distribution</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Sampling_Distribution&amp;diff=399"/>
		<updated>2024-03-19T16:59:02Z</updated>

		<summary type="html">&lt;p&gt;Admin: /* Binomial Normal Approximation */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Let there be &amp;lt;math&amp;gt;Y_1, Y_2, \ldots, Y_n &amp;lt;/math&amp;gt;, where each&lt;br /&gt;
&amp;lt;math&amp;gt;Y_i&amp;lt;/math&amp;gt; is a random variable from the population.&lt;br /&gt;
&lt;br /&gt;
Every &amp;lt;math&amp;gt;Y_i&amp;lt;/math&amp;gt; has the same (unknown) mean and distribution.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(Y_i) = \mu, Var(Y_i) = \sigma^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We then have the sample mean&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\bar{Y} = \frac{1}{n} \sum_{i = 1}^n Y_i&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The sample mean is expected to be &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt; through a pretty easy&lt;br /&gt;
direct proof&lt;br /&gt;
&lt;br /&gt;
The variance of the sample mean is &amp;lt;math&amp;gt;\frac{\sigma^2}{n}&amp;lt;/math&amp;gt;, also&lt;br /&gt;
through a pretty easy direct proof.&lt;br /&gt;
&lt;br /&gt;
= Central limit theorem =&lt;br /&gt;
The &#039;&#039;&#039;central limit theorem&#039;&#039;&#039; states that the distribution of the sample mean follows normal distribution.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\bar{Y} \sim N(\mu, \frac{\sigma^2}{n})&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As long as at least one of the following conditions is satisfied, the CLT applies, regardless of the population&#039;s distribution.&lt;br /&gt;
&lt;br /&gt;
# The population distribution of &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; is normal, &#039;&#039;or&#039;&#039;&lt;br /&gt;
# The sample size is large (&amp;lt;math&amp;gt;n&amp;gt;30&amp;lt;/math&amp;gt;)&lt;br /&gt;
&lt;br /&gt;
By extension, we also have the distribution of the sum.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;S \sim N(\mu_S = n\mu, \sigma_S = \sqrt{n}\,\sigma)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;S = \sum Y_i &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Proportion Approximation ==&lt;br /&gt;
The sampling distribution of the proportion of success of &#039;&#039;n&#039;&#039; bernoulli random variables (&amp;lt;math&amp;gt;\hat{p}&amp;lt;/math&amp;gt;) can also be approximated to a normal distribution under the CLT.&lt;br /&gt;
&lt;br /&gt;
Consider [[Discrete Random Variable#Bernoulli|bernoulli]] random variables&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Y_1, \ldots, Y_n&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;E(Y_i) = p,\ Var(Y_i) = p(1 - p)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The proportion of success &amp;lt;math&amp;gt;\hat{p}&amp;lt;/math&amp;gt; is the sum over the count, so the expected probability is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E(\hat{p}) = E \left(\frac{1}{n} \sum Y_i \right) = p&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and the variance of &amp;lt;math&amp;gt;\hat{p}&amp;lt;/math&amp;gt; is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Var \left(\frac{1}{n} \sum Y_i \right) = \frac{1}{n^2} Var \left(\sum Y_i \right)  = \frac{p (1 - p)}{n}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With a large sample size &#039;&#039;n&#039;&#039;, we can approximate this to a normal distribution. Notably, the criteria for a large &#039;&#039;n&#039;&#039; is different from that of the continuous random variable.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;np &amp;gt; 5, n(1 - p) &amp;gt; 5&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
then we have&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\hat{p} \sim N \left( p, \frac{p(1-p)}{n} \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The reasoning behind the weird criteria relates to the [[Discrete Random Variable|binomial distribution]]. It&#039;s not very elaborated on in the lecture, but &amp;lt;math&amp;gt;np&amp;lt;/math&amp;gt; is the mean of the binomial (i.e. the expected number of successes). The criteria essentially makes sure that no negative values are plausible in the approximation; with a small mean and a large variance, the left side of a normal approximation goes into the negative, but bernoulli/binomial must always be positive.&lt;br /&gt;
&lt;br /&gt;
== Binomial Approximation ==&lt;br /&gt;
This follows from the above section and a bit of math: since &amp;lt;math&amp;gt;Y = n\hat{p}&amp;lt;/math&amp;gt;, scaling the approximation of &amp;lt;math&amp;gt;\hat{p}&amp;lt;/math&amp;gt; by &#039;&#039;n&#039;&#039; gives&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Y \sim N(np, np(1 - p))&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Confidence Interval =&lt;br /&gt;
&#039;&#039;&#039;Estimation&#039;&#039;&#039; is the guess for the unknown parameter. A &#039;&#039;&#039;point estimate&#039;&#039;&#039; is a &amp;quot;best guess&amp;quot; of the population parameter, whereas the &#039;&#039;&#039;confidence interval&#039;&#039;&#039; is the range of reasonable values intended to contain the &#039;&#039;&#039;parameter of interest&#039;&#039;&#039; with a certain &#039;&#039;&#039;degree of confidence&#039;&#039;&#039;, calculated with&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;(point estimate - margin of error, point estimate + margin of error)&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
==== Constructing CIs ====&lt;br /&gt;
By CLT, &amp;lt;math&amp;gt;\bar{Y} \sim N(\mu, \frac{\sigma^2}{n} )&amp;lt;/math&amp;gt;. The&lt;br /&gt;
confidence interval is the range of plausible &amp;lt;math&amp;gt;\bar{Y}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
If we define the middle 90% to be plausible, to find the&lt;br /&gt;
confidence interval, simply find the 5th and 95th percentile.&lt;br /&gt;
&lt;br /&gt;
Generalized, if we want a confidence interval covering the middle &amp;lt;math&amp;gt;(1 - \alpha) \cdot 100\%&amp;lt;/math&amp;gt;, the confidence interval is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
 \bar{y} \pm Z_{\alpha / 2} \frac{\sigma}{ \sqrt{n} }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\bar{y}&amp;lt;/math&amp;gt; is the sample mean and &amp;lt;math&amp;gt;Z_{x}&amp;lt;/math&amp;gt; is the z score of the x-th percentile.&lt;br /&gt;
&lt;br /&gt;
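For instance, a minimal Python sketch of this interval (the sample mean, &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt;, and &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; below are assumed values, not from the notes):&lt;br /&gt;

```python
# 90% z-based confidence interval with hypothetical inputs:
# sample mean 50, known sigma 10, n = 25.
from math import sqrt
from statistics import NormalDist

y_bar, sigma, n, alpha = 50.0, 10.0, 25, 0.10
z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}, about 1.645
margin = z * sigma / sqrt(n)
ci = (y_bar - margin, y_bar + margin)
print(ci)                                 # roughly (46.71, 53.29)
```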
= T-Distribution =&lt;br /&gt;
[[File:T distribution table.png|thumb|T distribution table]]&lt;br /&gt;
&lt;br /&gt;
CLT has several restrictions, the biggest one being a large sample size; the &#039;&#039;&#039;t-distribution&#039;&#039;&#039; addresses the small-sample case.&lt;br /&gt;
&lt;br /&gt;
Since we don&#039;t know the population variance &amp;lt;math&amp;gt;\sigma^2&amp;lt;/math&amp;gt;, we&lt;br /&gt;
have to use the sample variance &amp;lt;math&amp;gt;s&amp;lt;/math&amp;gt; to estimate it. This&lt;br /&gt;
introduces more uncertainty, accounted for by the &#039;&#039;&#039;t-distribution.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
T-distribution is the distribution of sample mean based on population&lt;br /&gt;
mean, sample variance and &#039;&#039;degrees of freedom&#039;&#039; (covered later). It&lt;br /&gt;
looks very similar to normal distribution.&lt;br /&gt;
&lt;br /&gt;
When the sample size &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; is small, there is greater&lt;br /&gt;
uncertainty in the estimates, so the t-distribution has heavier tails than the normal and its critical values are larger:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
t_{\alpha/2} &amp;gt; Z_{\alpha/2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The spread of t-distribution depends on the &#039;&#039;&#039;degrees of freedom&#039;&#039;&#039;,&lt;br /&gt;
which is based on sample size. When looking up the table, round down df.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\upsilon = n - 1&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As the sample size increases, degrees of freedom increase, the spread of&lt;br /&gt;
t-distribution decreases, and t-distribution approaches normal&lt;br /&gt;
distribution.&lt;br /&gt;
&lt;br /&gt;
Based on CLT and normal distribution, we had the confidence interval&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
 \bar{Y} \pm Z_{\alpha / 2} \frac{\sigma}{ \sqrt{n} }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, based on T-distribution, we have the CI&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
 \bar{Y} \pm t_{\alpha / 2} \frac{s}{ \sqrt{n} }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
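A sketch of the t-based interval; Python&#039;s standard library has no t quantile function, so the critical value is taken from a t-table (the sample statistics here are hypothetical):&lt;br /&gt;

```python
# t-based CI with hypothetical sample statistics (n = 10, so df = 9).
from math import sqrt

y_bar, s, n = 50.0, 10.0, 10
t_crit = 2.262                  # t_{alpha/2} for df = 9, alpha = 0.05, from a t-table
margin = t_crit * s / sqrt(n)
ci = (y_bar - margin, y_bar + margin)
print(ci)                       # wider than the corresponding z-based interval
```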
==== Find Sample Size ====&lt;br /&gt;
To calculate the sample size needed for a desired margin of error &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and sample variance &amp;lt;math&amp;gt;s^2&amp;lt;/math&amp;gt;, assume &amp;lt;math&amp;gt;\upsilon = \infty&amp;lt;/math&amp;gt; (so &amp;lt;math&amp;gt;t_{\alpha/2}&amp;lt;/math&amp;gt; reduces to &amp;lt;math&amp;gt;Z_{\alpha/2}&amp;lt;/math&amp;gt;):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
n = \frac{Z^2_{\alpha/2} s^2}{E^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We want to always &#039;&#039;round up&#039;&#039; to stay within the error margin: increasing &#039;&#039;n&#039;&#039; only shrinks the margin, while rounding down could push it past &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;.&lt;br /&gt;
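A quick sketch with made-up numbers (desired margin &amp;lt;math&amp;gt;E = 2&amp;lt;/math&amp;gt;, pilot variance &amp;lt;math&amp;gt;s^2 = 100&amp;lt;/math&amp;gt;, 95% confidence):&lt;br /&gt;

```python
# Sample size for a desired margin of error, rounding up.
from math import ceil
from statistics import NormalDist

E, s_sq, alpha = 2.0, 100.0, 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
n = ceil(z**2 * s_sq / E**2)              # round up to stay within E
print(n)                                  # 97
```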
&lt;br /&gt;
= Sampling Distribution of Difference =&lt;br /&gt;
&lt;br /&gt;
By linear combination of RVs, sampling distribution of &amp;lt;math&amp;gt;\bar{Y_1} - \bar{Y_2}&amp;lt;/math&amp;gt; is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
 ( \bar{Y_1} - \bar{Y_2} ) \sim N(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2})&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
However, we do not know the population variances &amp;lt;math&amp;gt;\sigma_1^2, \sigma_2^2&amp;lt;/math&amp;gt;. If the CLT assumptions hold, we estimate them with the sample variances &amp;lt;math&amp;gt;s_1^2, s_2^2&amp;lt;/math&amp;gt;, using the degrees of freedom&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\upsilon = \frac{ \left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2 }{ \frac{(s_1^2 / n_1)^2 }{n_1 - 1} + \frac{(s_2^2 / n_2)^2 }{n_2 - 1} }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Trust me bro (this is the Welch-Satterthwaite approximation). Remember to round down to use the t-table. With this degree of freedom, we can use the sample variances to estimate the distribution.&lt;br /&gt;
&lt;br /&gt;
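A sketch of the formula with hypothetical sample statistics:&lt;br /&gt;

```python
# Welch-Satterthwaite degrees of freedom for two samples
# (the variances and sizes here are made up).
from math import floor

s1_sq, n1 = 4.0, 15
s2_sq, n2 = 9.0, 20

a, b = s1_sq / n1, s2_sq / n2
df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
print(floor(df))   # round down before looking up the t-table
```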
&lt;br /&gt;
[[Category:Sample Statistics]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Sampling_Distribution&amp;diff=398</id>
		<title>Sampling Distribution</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Sampling_Distribution&amp;diff=398"/>
		<updated>2024-03-19T16:41:52Z</updated>

		<summary type="html">&lt;p&gt;Admin: /* Binomial Normal Approximation */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Let there be &amp;lt;math&amp;gt;Y_1, Y_2, \ldots, Y_n &amp;lt;/math&amp;gt;, where each&lt;br /&gt;
&amp;lt;math&amp;gt;Y_i&amp;lt;/math&amp;gt; is a random variable from the population.&lt;br /&gt;
&lt;br /&gt;
Every &amp;lt;math&amp;gt;Y_i&amp;lt;/math&amp;gt; has the same mean and distribution, which we don&#039;t know.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(Y_i) = \mu, Var(Y_i) = \sigma^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We then have the sample mean&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\bar{Y} = \frac{1}{n} \sum_{i = 1}^n Y_i&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The sample mean is expected to be &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt; through a pretty easy&lt;br /&gt;
direct proof.&lt;br /&gt;
&lt;br /&gt;
The variance of the sample mean is &amp;lt;math&amp;gt;\frac{\sigma^2}{n}&amp;lt;/math&amp;gt;, also&lt;br /&gt;
through a pretty easy direct proof.&lt;br /&gt;
&lt;br /&gt;
= Central limit theorem =&lt;br /&gt;
The &#039;&#039;&#039;central limit theorem&#039;&#039;&#039; states that the distribution of the sample mean follows normal distribution.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\bar{Y} \sim N(\mu, \frac{\sigma^2}{n})&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As long as at least one of the following two conditions is satisfied, CLT applies, regardless of the population&#039;s distribution.&lt;br /&gt;
&lt;br /&gt;
# The population distribution of &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; is normal, &#039;&#039;or&#039;&#039;&lt;br /&gt;
# The sample size is large, &amp;lt;math&amp;gt;n &amp;gt; 30&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By extension, we also have the distribution of the sum.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;S \sim N(\mu_S = n\mu, \sigma_S = \sqrt{n}\,\sigma)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;S = \sum Y_i &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
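A quick simulation sketch of the statement above (the population is chosen as Uniform(0, 1) purely for illustration):&lt;br /&gt;

```python
# Simulate many sample means from a Uniform(0, 1) population and check
# that their mean is near mu and their variance near sigma^2 / n.
import random
from statistics import mean, variance

random.seed(0)
n, reps = 40, 2000
means = [mean(random.uniform(0, 1) for _ in range(n)) for _ in range(reps)]

mu, sigma_sq = 0.5, 1 / 12            # Uniform(0, 1): mu = 1/2, sigma^2 = 1/12
print(round(mean(means), 3))          # close to mu
print(round(variance(means), 5))      # close to sigma_sq / n
```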
= Confidence Interval =&lt;br /&gt;
&#039;&#039;&#039;Estimation&#039;&#039;&#039; is the guess for the unknown parameter. A &#039;&#039;&#039;point estimate&#039;&#039;&#039; is a &amp;quot;best guess&amp;quot; of the population parameter, whereas the &#039;&#039;&#039;confidence interval&#039;&#039;&#039; is the range of reasonable values intended to contain the &#039;&#039;&#039;parameter of interest&#039;&#039;&#039; with a certain &#039;&#039;&#039;degree of confidence&#039;&#039;&#039;, calculated with&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;(point estimate - margin of error, point estimate + margin of error)&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
==== Constructing CIs ====&lt;br /&gt;
By CLT, &amp;lt;math&amp;gt;\bar{Y} \sim N(\mu, \frac{\sigma^2}{n} )&amp;lt;/math&amp;gt;. The&lt;br /&gt;
confidence interval is the range of plausible &amp;lt;math&amp;gt;\bar{Y}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
If we define the middle 90% to be plausible, to find the&lt;br /&gt;
confidence interval, simply find the 5th and 95th percentile.&lt;br /&gt;
&lt;br /&gt;
Generalized, if we want a confidence interval covering the middle &amp;lt;math&amp;gt;(1 - \alpha) \cdot 100\%&amp;lt;/math&amp;gt;, the confidence interval is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
 \bar{y} \pm Z_{\alpha / 2} \frac{\sigma}{ \sqrt{n} }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\bar{y}&amp;lt;/math&amp;gt; is the sample mean and &amp;lt;math&amp;gt;Z_{x}&amp;lt;/math&amp;gt; is the z score of the x-th percentile.&lt;br /&gt;
&lt;br /&gt;
= Binomial Normal Approximation =&lt;br /&gt;
The sampling distribution of the proportion of success of &#039;&#039;n&#039;&#039; bernoulli random variables (&amp;lt;math&amp;gt;\hat{p}&amp;lt;/math&amp;gt;) can also be approximated by a normal distribution under the CLT.&lt;br /&gt;
&lt;br /&gt;
Consider [[Discrete Random Variable#Bernoulli|bernoulli]] random variables&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Y_1, \ldots, Y_n \sim \mathrm{Bernoulli}(p), \quad E(Y_i) = p, \quad Var(Y_i) = p(1 - p)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The proportion of success &amp;lt;math&amp;gt;\hat{p}&amp;lt;/math&amp;gt; is the sum over the count, so its expected value is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E(\hat{p}) = E \left(\frac{1}{n} \sum Y_i \right) = p&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and the variance of &amp;lt;math&amp;gt;\hat{p}&amp;lt;/math&amp;gt; is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Var \left(\frac{1}{n} \sum Y_i \right) = \frac{1}{n^2} Var \left(\sum Y_i \right)  = \frac{p (1 - p)}{n}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With a large sample size &#039;&#039;n&#039;&#039;, we can approximate this with a normal distribution. Notably, the criterion for a large &#039;&#039;n&#039;&#039; differs from that of a continuous random variable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;np &amp;gt; 5, n(1 - p) &amp;gt; 5&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
then we have&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\hat{p} \sim N \left( p, \frac{p(1-p)}{n} \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The reasoning behind this criterion relates to the [[Discrete Random Variable|binomial distribution]]. It&#039;s not elaborated on much in the lecture, but &amp;lt;math&amp;gt;np&amp;lt;/math&amp;gt; is the mean of the binomial (i.e. the expected number of successes). The criterion essentially ensures that no negative values are plausible in the approximation; with a small mean and a large variance, the left tail of a normal approximation extends into the negatives, but a bernoulli/binomial variable is always nonnegative.&lt;br /&gt;
&lt;br /&gt;
= T-Distribution =&lt;br /&gt;
[[File:T distribution table.png|thumb|T distribution table]]&lt;br /&gt;
&lt;br /&gt;
CLT has several restrictions, the biggest one being a large sample size; the &#039;&#039;&#039;t-distribution&#039;&#039;&#039; addresses the small-sample case.&lt;br /&gt;
&lt;br /&gt;
Since we don&#039;t know the population variance &amp;lt;math&amp;gt;\sigma^2&amp;lt;/math&amp;gt;, we&lt;br /&gt;
have to use the sample variance &amp;lt;math&amp;gt;s&amp;lt;/math&amp;gt; to estimate it. This&lt;br /&gt;
introduces more uncertainty, accounted for by the &#039;&#039;&#039;t-distribution.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
T-distribution is the distribution of sample mean based on population&lt;br /&gt;
mean, sample variance and &#039;&#039;degrees of freedom&#039;&#039; (covered later). It&lt;br /&gt;
looks very similar to normal distribution.&lt;br /&gt;
&lt;br /&gt;
When the sample size &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; is small, there is greater&lt;br /&gt;
uncertainty in the estimates, so the t-distribution has heavier tails than the normal and its critical values are larger:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
t_{\alpha/2} &amp;gt; Z_{\alpha/2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The spread of t-distribution depends on the &#039;&#039;&#039;degrees of freedom&#039;&#039;&#039;,&lt;br /&gt;
which is based on sample size. When looking up the table, round down df.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\upsilon = n - 1&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As the sample size increases, degrees of freedom increase, the spread of&lt;br /&gt;
t-distribution decreases, and t-distribution approaches normal&lt;br /&gt;
distribution.&lt;br /&gt;
&lt;br /&gt;
Based on CLT and normal distribution, we had the confidence interval&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
 \bar{Y} \pm Z_{\alpha / 2} \frac{\sigma}{ \sqrt{n} }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, based on T-distribution, we have the CI&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
 \bar{Y} \pm t_{\alpha / 2} \frac{s}{ \sqrt{n} }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Find Sample Size ====&lt;br /&gt;
To calculate the sample size needed for a desired margin of error &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and sample variance &amp;lt;math&amp;gt;s^2&amp;lt;/math&amp;gt;, assume &amp;lt;math&amp;gt;\upsilon = \infty&amp;lt;/math&amp;gt; (so &amp;lt;math&amp;gt;t_{\alpha/2}&amp;lt;/math&amp;gt; reduces to &amp;lt;math&amp;gt;Z_{\alpha/2}&amp;lt;/math&amp;gt;):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
n = \frac{Z^2_{\alpha/2} s^2}{E^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We want to always &#039;&#039;round up&#039;&#039; to stay within the error margin: increasing &#039;&#039;n&#039;&#039; only shrinks the margin, while rounding down could push it past &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= Sampling Distribution of Difference =&lt;br /&gt;
&lt;br /&gt;
By linear combination of RVs, sampling distribution of &amp;lt;math&amp;gt;\bar{Y_1} - \bar{Y_2}&amp;lt;/math&amp;gt; is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
 ( \bar{Y_1} - \bar{Y_2} ) \sim N(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2})&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
However, we do not know the population variances &amp;lt;math&amp;gt;\sigma_1^2, \sigma_2^2&amp;lt;/math&amp;gt;. If the CLT assumptions hold, we estimate them with the sample variances &amp;lt;math&amp;gt;s_1^2, s_2^2&amp;lt;/math&amp;gt;, using the degrees of freedom&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\upsilon = \frac{ \left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2 }{ \frac{(s_1^2 / n_1)^2 }{n_1 - 1} + \frac{(s_2^2 / n_2)^2 }{n_2 - 1} }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Trust me bro (this is the Welch-Satterthwaite approximation). Remember to round down to use the t-table. With this degree of freedom, we can use the sample variances to estimate the distribution.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:Sample Statistics]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Discrete_Random_Variable&amp;diff=397</id>
		<title>Discrete Random Variable</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Discrete_Random_Variable&amp;diff=397"/>
		<updated>2024-03-19T16:34:58Z</updated>

		<summary type="html">&lt;p&gt;Admin: /* Bernoulli */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Statistics]]&lt;br /&gt;
[[Category:Distribution (Statistics)]]&lt;br /&gt;
A random variable is &#039;&#039;&#039;discrete&#039;&#039;&#039; if the set of values it can take on within an interval is &#039;&#039;finite&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
= PMF and CDF =&lt;br /&gt;
The &#039;&#039;&#039;probability mass function (PMF)&#039;&#039;&#039; describes the probability distribution over a discrete random variable.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;p(x) = P(X = x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &#039;&#039;&#039;cumulative distribution function (CDF)&#039;&#039;&#039; specifies the probability of an observation being equal to or less than a given value.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;F(x) = P(X \leq x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We usually have tables for these in the case of discrete random variables.&lt;br /&gt;
&lt;br /&gt;
= Statistics =&lt;br /&gt;
Expected value (mean):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mu = E(X) = \sum x_i P(X = x_i)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Distributions =&lt;br /&gt;
&lt;br /&gt;
== Bernoulli ==&lt;br /&gt;
The &#039;&#039;&#039;bernoulli distribution&#039;&#039;&#039; describes the random variable of an experiment that has two outcomes and is performed once. The outcomes are either &#039;&#039;success&#039;&#039; or &#039;&#039;failure&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
X \sim Bernoulli(p)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== PMF ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
p(1) = p, p(0) = 1 - p&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Statistics ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mu = p&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\sigma^2_X = p (1 - p)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Binomial ==&lt;br /&gt;
&lt;br /&gt;
Repeating a bernoulli experiment &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; times gives a &#039;&#039;&#039;binomial random variable&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Consider an experiment with exactly two possible outcomes, conducted n times independently. The variable of interest &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; is the number of successful trials. The distribution relies on the number of trials and the probability of success.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
X \sim Binomial(n, p)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== PMF ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\binom{n}{x} p^x (1 - p)^{n - x}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
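The PMF maps directly to code; a toy check (the values are chosen arbitrarily):&lt;br /&gt;

```python
# Binomial PMF via math.comb; a toy example, not from the notes.
from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

# P(X = 2) for X ~ Binomial(4, 0.5): 6 * 0.0625 = 0.375
print(binom_pmf(2, 4, 0.5))
print(sum(binom_pmf(x, 4, 0.5) for x in range(5)))   # the PMF sums to 1.0
```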
=== Statistics ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mu = np&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\sigma^2 = np (1 - p)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Poisson ==&lt;br /&gt;
The &#039;&#039;&#039;poisson distribution&#039;&#039;&#039; is used when we know the &#039;&#039;average rate of occurrence for a particular event over a particular time period.&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* There must be a &#039;&#039;&#039;fixed interval&#039;&#039;&#039; of time or space&lt;br /&gt;
* Events happen with a &#039;&#039;&#039;known average rate&#039;&#039;&#039; independent of time or the last event.&lt;br /&gt;
* The average rate of occurrence per unit of time/space is the &#039;&#039;&#039;rate parameter&#039;&#039;&#039; &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Poisson distribution approximates binomial distribution when &#039;&#039;n&#039;&#039; is large and &#039;&#039;p&#039;&#039; is small, used to model rare events. Normally it is used to measure the number of events in a unit time, whereas [[Continuous Random Variable#Exponential Distribution|exponential distribution]] models the amount of waiting time until an event.&lt;br /&gt;
&lt;br /&gt;
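A sketch of that approximation for a hypothetical large-&#039;&#039;n&#039;&#039;, small-&#039;&#039;p&#039;&#039; case:&lt;br /&gt;

```python
# Poisson approximation to the binomial: n large, p small, lambda = n * p.
from math import comb, exp, factorial

n, p = 1000, 0.002
lam = n * p                    # rate parameter, here 2.0

for k in range(4):
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson = exp(-lam) * lam**k / factorial(k)
    print(k, round(binom, 4), round(poisson, 4))   # nearly identical
```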
I&#039;m sleepy I&#039;ll write the details later... zzz...&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Sampling_Distribution&amp;diff=396</id>
		<title>Sampling Distribution</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Sampling_Distribution&amp;diff=396"/>
		<updated>2024-03-19T16:26:15Z</updated>

		<summary type="html">&lt;p&gt;Admin: /* Confidence Interval */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Let there be &amp;lt;math&amp;gt;Y_1, Y_2, \ldots, Y_n &amp;lt;/math&amp;gt;, where each&lt;br /&gt;
&amp;lt;math&amp;gt;Y_i&amp;lt;/math&amp;gt; is a random variable from the population.&lt;br /&gt;
&lt;br /&gt;
Every &amp;lt;math&amp;gt;Y_i&amp;lt;/math&amp;gt; has the same mean and distribution, which we don&#039;t know.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(Y_i) = \mu, Var(Y_i) = \sigma^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We then have the sample mean&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\bar{Y} = \frac{1}{n} \sum_{i = 1}^n Y_i&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The sample mean is expected to be &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt; through a pretty easy&lt;br /&gt;
direct proof.&lt;br /&gt;
&lt;br /&gt;
The variance of the sample mean is &amp;lt;math&amp;gt;\frac{\sigma^2}{n}&amp;lt;/math&amp;gt;, also&lt;br /&gt;
through a pretty easy direct proof.&lt;br /&gt;
&lt;br /&gt;
= Central limit theorem =&lt;br /&gt;
The &#039;&#039;&#039;central limit theorem&#039;&#039;&#039; states that the distribution of the sample mean follows normal distribution.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\bar{Y} \sim N(\mu, \frac{\sigma^2}{n})&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As long as at least one of the following two conditions is satisfied, CLT applies, regardless of the population&#039;s distribution.&lt;br /&gt;
&lt;br /&gt;
# The population distribution of &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; is normal, &#039;&#039;or&#039;&#039;&lt;br /&gt;
# The sample size is large, &amp;lt;math&amp;gt;n &amp;gt; 30&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By extension, we also have the distribution of the sum.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;S \sim N(\mu_S = n\mu, \sigma_S = \sqrt{n}\,\sigma)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;S = \sum Y_i &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Confidence Interval =&lt;br /&gt;
&#039;&#039;&#039;Estimation&#039;&#039;&#039; is the guess for the unknown parameter. A &#039;&#039;&#039;point estimate&#039;&#039;&#039; is a &amp;quot;best guess&amp;quot; of the population parameter, whereas the &#039;&#039;&#039;confidence interval&#039;&#039;&#039; is the range of reasonable values intended to contain the &#039;&#039;&#039;parameter of interest&#039;&#039;&#039; with a certain &#039;&#039;&#039;degree of confidence&#039;&#039;&#039;, calculated with&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;(point estimate - margin of error, point estimate + margin of error)&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
==== Constructing CIs ====&lt;br /&gt;
By CLT, &amp;lt;math&amp;gt;\bar{Y} \sim N(\mu, \frac{\sigma^2}{n} )&amp;lt;/math&amp;gt;. The&lt;br /&gt;
confidence interval is the range of plausible &amp;lt;math&amp;gt;\bar{Y}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
If we define the middle 90% to be plausible, to find the&lt;br /&gt;
confidence interval, simply find the 5th and 95th percentile.&lt;br /&gt;
&lt;br /&gt;
Generalized, if we want a confidence interval covering the middle &amp;lt;math&amp;gt;(1 - \alpha) \cdot 100\%&amp;lt;/math&amp;gt;, the confidence interval is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
 \bar{y} \pm Z_{\alpha / 2} \frac{\sigma}{ \sqrt{n} }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\bar{y}&amp;lt;/math&amp;gt; is the sample mean and &amp;lt;math&amp;gt;Z_{x}&amp;lt;/math&amp;gt; is the z score of the x-th percentile.&lt;br /&gt;
&lt;br /&gt;
= Binomial Normal Approximation =&lt;br /&gt;
The sampling distribution of the mean of &#039;&#039;n&#039;&#039; bernoulli random variables (&amp;lt;math&amp;gt;\hat{p}&amp;lt;/math&amp;gt;) can also be approximated by a normal distribution under the CLT.&lt;br /&gt;
&lt;br /&gt;
Consider [[Discrete Random Variable#Bernoulli|bernoulli]] random variables&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Y_1, \ldots, Y_n \sim \mathrm{Bernoulli}(p), \quad E(Y_i) = p, \quad Var(Y_i) = p(1 - p)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The proportion of success &amp;lt;math&amp;gt;\hat{p}&amp;lt;/math&amp;gt; is the sum over the count, so its expected value is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E(\hat{p}) = E \left(\frac{1}{n} \sum Y_i \right) = p&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and the variance of &amp;lt;math&amp;gt;\hat{p}&amp;lt;/math&amp;gt; is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Var \left(\frac{1}{n} \sum Y_i \right) = \frac{1}{n^2} Var \left(\sum Y_i \right)  = \frac{p (1 - p)}{n}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With a large sample size &#039;&#039;n&#039;&#039;, we can approximate this with a normal distribution. Notably, the criterion for a large &#039;&#039;n&#039;&#039; differs from that of a continuous random variable.&lt;br /&gt;
&lt;br /&gt;
= T-Distribution =&lt;br /&gt;
[[File:T distribution table.png|thumb|T distribution table]]&lt;br /&gt;
&lt;br /&gt;
CLT has several restrictions, the biggest one being a large sample size; the &#039;&#039;&#039;t-distribution&#039;&#039;&#039; addresses the small-sample case.&lt;br /&gt;
&lt;br /&gt;
Since we don&#039;t know the population variance &amp;lt;math&amp;gt;\sigma^2&amp;lt;/math&amp;gt;, we&lt;br /&gt;
have to use the sample variance &amp;lt;math&amp;gt;s&amp;lt;/math&amp;gt; to estimate it. This&lt;br /&gt;
introduces more uncertainty, accounted for by the &#039;&#039;&#039;t-distribution.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
T-distribution is the distribution of sample mean based on population&lt;br /&gt;
mean, sample variance and &#039;&#039;degrees of freedom&#039;&#039; (covered later). It&lt;br /&gt;
looks very similar to normal distribution.&lt;br /&gt;
&lt;br /&gt;
When the sample size &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; is small, there is greater&lt;br /&gt;
uncertainty in the estimates, so the t-distribution has heavier tails than the normal and its critical values are larger:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
t_{\alpha/2} &amp;gt; Z_{\alpha/2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The spread of t-distribution depends on the &#039;&#039;&#039;degrees of freedom&#039;&#039;&#039;,&lt;br /&gt;
which is based on sample size. When looking up the table, round down df.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\upsilon = n - 1&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As the sample size increases, degrees of freedom increase, the spread of&lt;br /&gt;
t-distribution decreases, and t-distribution approaches normal&lt;br /&gt;
distribution.&lt;br /&gt;
&lt;br /&gt;
Based on CLT and normal distribution, we had the confidence interval&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
 \bar{Y} \pm Z_{\alpha / 2} \frac{\sigma}{ \sqrt{n} }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, based on T-distribution, we have the CI&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
 \bar{Y} \pm t_{\alpha / 2} \frac{s}{ \sqrt{n} }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Find Sample Size ====&lt;br /&gt;
To calculate the sample size needed for a desired margin of error &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt; and sample variance &amp;lt;math&amp;gt;s^2&amp;lt;/math&amp;gt;, assume &amp;lt;math&amp;gt;\upsilon = \infty&amp;lt;/math&amp;gt; (so &amp;lt;math&amp;gt;t_{\alpha/2}&amp;lt;/math&amp;gt; reduces to &amp;lt;math&amp;gt;Z_{\alpha/2}&amp;lt;/math&amp;gt;):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
n = \frac{Z^2_{\alpha/2} s^2}{E^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We want to always &#039;&#039;round up&#039;&#039; to stay within the error margin: increasing &#039;&#039;n&#039;&#039; only shrinks the margin, while rounding down could push it past &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= Sampling Distribution of Difference =&lt;br /&gt;
&lt;br /&gt;
By linear combination of RVs, sampling distribution of &amp;lt;math&amp;gt;\bar{Y_1} - \bar{Y_2}&amp;lt;/math&amp;gt; is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
 ( \bar{Y_1} - \bar{Y_2} ) \sim N(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2})&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
However, we do not know the population variances &amp;lt;math&amp;gt;\sigma_1^2, \sigma_2^2&amp;lt;/math&amp;gt;. If the CLT assumptions hold, we estimate them with the sample variances &amp;lt;math&amp;gt;s_1^2, s_2^2&amp;lt;/math&amp;gt;, using the degrees of freedom&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\upsilon = \frac{ \left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2 }{ \frac{(s_1^2 / n_1)^2 }{n_1 - 1} + \frac{(s_2^2 / n_2)^2 }{n_2 - 1} }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Trust me bro (this is the Welch-Satterthwaite approximation). Remember to round down to use the t-table. With this degree of freedom, we can use the sample variances to estimate the distribution.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Category:Sample Statistics]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Discrete_Random_Variable&amp;diff=395</id>
		<title>Discrete Random Variable</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Discrete_Random_Variable&amp;diff=395"/>
		<updated>2024-03-19T16:14:18Z</updated>

		<summary type="html">&lt;p&gt;Admin: /* Binomial */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Statistics]]&lt;br /&gt;
[[Category:Distribution (Statistics)]]&lt;br /&gt;
A random variable is &#039;&#039;&#039;discrete&#039;&#039;&#039; if the set of values it can take on within an interval is &#039;&#039;finite&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
= PMF and CDF =&lt;br /&gt;
The &#039;&#039;&#039;probability mass function (PMF)&#039;&#039;&#039; describes the probability distribution over a discrete random variable.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;p(x) = P(X = x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &#039;&#039;&#039;cumulative distribution function (CDF)&#039;&#039;&#039; specifies the probability of an observation being equal to or less than a given value.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;F(x) = P(X \leq x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We usually have tables for these in the case of discrete random variables.&lt;br /&gt;
&lt;br /&gt;
= Statistics =&lt;br /&gt;
Expected value (mean):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mu = E(X) = \sum x_i P(X = x_i)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Distributions =&lt;br /&gt;
&lt;br /&gt;
== Bernoulli ==&lt;br /&gt;
The &#039;&#039;&#039;bernoulli distribution&#039;&#039;&#039; describes the random variable of an experiment that has two outcomes and is performed once. The outcomes are either &#039;&#039;success&#039;&#039; or &#039;&#039;failure&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
X \sim Bernoulli(p)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== PMF ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
p(1) = p, p(0) = 1 - p&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Statistics ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mu = p&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\sigma^2_X = p (1 - p)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Binomial ==&lt;br /&gt;
&lt;br /&gt;
Repeating a bernoulli experiment &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; times gives a &#039;&#039;&#039;binomial random variable&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Consider an experiment with exactly two possible outcomes, conducted n times independently.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
X \sim Binomial(n, p)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I&#039;m sleepy; I&#039;ll write the details later. It should be on the equation sheet.&lt;br /&gt;
&lt;br /&gt;
=== PMF ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
p(x) = \binom{n}{x} p^x (1 - p)^{n - x}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Statistics ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mu = np&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\sigma^2 = np (1 - p)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
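A quick numeric check of the formulas above (a minimal Python sketch; the values n = 10, p = 0.3 are made up for illustration):&lt;br /&gt;

```python
import math

def binomial_pmf(x, n, p):
    # P(X = x) for X ~ Binomial(n, p): C(n, x) p^x (1 - p)^(n - x)
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.3
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                          # 1.0
mean = sum(x * pmf[x] for x in range(n + 1))              # n*p = 3.0
var = sum(x**2 * pmf[x] for x in range(n + 1)) - mean**2  # n*p*(1-p) = 2.1

print(total, mean, var)
```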
&lt;br /&gt;
== Poisson ==&lt;br /&gt;
The &#039;&#039;&#039;Poisson distribution&#039;&#039;&#039; is used when we know the &#039;&#039;average rate of occurrence for a particular event over a particular time period.&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* There must be a &#039;&#039;&#039;fixed interval&#039;&#039;&#039; of time or space.&lt;br /&gt;
* Events happen with a &#039;&#039;&#039;known average rate&#039;&#039;&#039;, independent of time and of the last event.&lt;br /&gt;
* The average rate of occurrence per unit of time/space is the &#039;&#039;&#039;rate parameter&#039;&#039;&#039; &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Poisson distribution approximates binomial distribution when &#039;&#039;n&#039;&#039; is large and &#039;&#039;p&#039;&#039; is small, used to model rare events. Normally it is used to measure the number of events in a unit time, whereas [[Continuous Random Variable#Exponential Distribution|exponential distribution]] models the amount of waiting time until an event.&lt;br /&gt;
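The approximation can be seen numerically (a sketch with assumed values n = 1000, p = 0.002, so lambda = np = 2):&lt;br /&gt;

```python
import math

def binomial_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    # P(X = x) = lam^x e^(-lam) / x!
    return lam**x * math.exp(-lam) / math.factorial(x)

n, p = 1000, 0.002   # large n, small p: a rare event
lam = n * p          # rate parameter, lambda = 2.0

for x in range(6):
    # the two columns agree to several decimal places
    print(x, round(binomial_pmf(x, n, p), 5), round(poisson_pmf(x, lam), 5))
```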
&lt;br /&gt;
=== PMF ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
p(x) = \frac{ \lambda^x e^{-\lambda} }{ x! }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Statistics ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mu = \lambda&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\sigma^2 = \lambda&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Sampling_Distribution&amp;diff=394</id>
		<title>Sampling Distribution</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Sampling_Distribution&amp;diff=394"/>
		<updated>2024-03-19T16:03:03Z</updated>

		<summary type="html">&lt;p&gt;Admin: /* Central limit theorem */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Let there be &amp;lt;math&amp;gt;Y_1, Y_2, \ldots, Y_n &amp;lt;/math&amp;gt;, where each&lt;br /&gt;
&amp;lt;math&amp;gt;Y_i&amp;lt;/math&amp;gt; is a random variable from the population.&lt;br /&gt;
&lt;br /&gt;
Every &amp;lt;math&amp;gt;Y_i&amp;lt;/math&amp;gt; has the same (unknown) mean and distribution.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(Y_i) = \mu, Var(Y_i) = \sigma^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We then have the sample mean&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\bar{Y} = \frac{1}{n} \sum_{i = 1}^n Y_i&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The sample mean has expected value &amp;lt;math&amp;gt;\mu&amp;lt;/math&amp;gt;, by&lt;br /&gt;
linearity of expectation.&lt;br /&gt;
&lt;br /&gt;
The variance of the sample mean is &amp;lt;math&amp;gt;\frac{\sigma^2}{n}&amp;lt;/math&amp;gt;, which&lt;br /&gt;
follows directly from the independence of the &amp;lt;math&amp;gt;Y_i&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= Central limit theorem =&lt;br /&gt;
The &#039;&#039;&#039;central limit theorem&#039;&#039;&#039; states that the distribution of the sample mean is approximately normal.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\bar{Y} \sim N(\mu, \frac{\sigma^2}{n})&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As long as at least one of the following conditions is satisfied, the CLT applies, regardless of the population&#039;s distribution.&lt;br /&gt;
&lt;br /&gt;
# The population distribution of &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; is normal, &#039;&#039;or&#039;&#039;&lt;br /&gt;
# The sample size is large: &amp;lt;math&amp;gt;n &amp;gt; 30&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By extension, we also have the distribution of the sum.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;S \sim N(\mu_S = n\mu, \sigma^2_S = n\sigma^2)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;S = \sum Y_i &amp;lt;/math&amp;gt;&lt;br /&gt;
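A small simulation illustrates the CLT for a decidedly non-normal population (a sketch assuming a Uniform(0, 1) population, so mu = 0.5 and sigma^2 = 1/12):&lt;br /&gt;

```python
import random
import statistics

random.seed(0)
n = 50           # sample size, comfortably larger than 30
trials = 20000   # number of repeated samples

# Population: Uniform(0, 1), so mu = 0.5 and sigma^2 = 1/12
means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(trials)]

print(statistics.fmean(means))     # close to mu = 0.5
print(statistics.variance(means))  # close to sigma^2 / n = (1/12)/50
```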
&lt;br /&gt;
= Confidence Interval =&lt;br /&gt;
&#039;&#039;&#039;Estimation&#039;&#039;&#039; is a guess for the unknown parameter. A &#039;&#039;&#039;point estimate&#039;&#039;&#039; is a &amp;quot;best guess&amp;quot; of the population parameter, whereas the &#039;&#039;&#039;confidence interval&#039;&#039;&#039; is a range of reasonable values intended to contain the &#039;&#039;&#039;parameter of interest&#039;&#039;&#039; with a certain &#039;&#039;&#039;degree of confidence&#039;&#039;&#039;. It is calculated as&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;(point estimate - margin of error, point estimate + margin of error)&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
==== Constructing CIs ====&lt;br /&gt;
By CLT, &amp;lt;math&amp;gt;\bar{Y} \sim N(\mu, \frac{\sigma^2}{n} )&amp;lt;/math&amp;gt;. The&lt;br /&gt;
confidence interval is the range of plausible &amp;lt;math&amp;gt;\bar{Y}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
If we define the middle 90% to be plausible, to find the&lt;br /&gt;
confidence interval, simply find the 5th and 95th percentile.&lt;br /&gt;
&lt;br /&gt;
In general, for a confidence interval covering the middle &amp;lt;math&amp;gt;(1 - \alpha) \times 100\%&amp;lt;/math&amp;gt;, the confidence interval is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
 \bar{y} \pm Z_{\alpha / 2} \frac{\sigma}{ \sqrt{n} }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\bar{y}&amp;lt;/math&amp;gt; is the sample mean and &amp;lt;math&amp;gt;Z_{x}&amp;lt;/math&amp;gt; is the z score of the x-th percentile.&lt;br /&gt;
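Putting the formula into code (a sketch; the sample summary numbers are hypothetical, and Z_{alpha/2} is computed from the standard normal inverse CDF):&lt;br /&gt;

```python
import math
from statistics import NormalDist

# Hypothetical sample summary (assumed numbers, for illustration only)
ybar = 12.6    # sample mean
sigma = 2.0    # known population standard deviation
n = 40         # sample size
alpha = 0.10   # for a 90% confidence interval

z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}, about 1.645
margin = z * sigma / math.sqrt(n)
print(ybar - margin, ybar + margin)       # the 90% confidence interval
```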
&lt;br /&gt;
= T-Distribution =&lt;br /&gt;
[[File:T distribution table.png|thumb|T distribution table]]&lt;br /&gt;
&lt;br /&gt;
CLT has several restrictions, the biggest one being the need for a&lt;br /&gt;
large sample size. The &#039;&#039;&#039;t-distribution&#039;&#039;&#039; addresses the small-sample case.&lt;br /&gt;
&lt;br /&gt;
Since we don&#039;t know the population variance &amp;lt;math&amp;gt;\sigma^2&amp;lt;/math&amp;gt;, we&lt;br /&gt;
have to use the sample variance &amp;lt;math&amp;gt;s&amp;lt;/math&amp;gt; to estimate it. This&lt;br /&gt;
introduces more uncertainty, accounted for by the &#039;&#039;&#039;t-distribution.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
The t-distribution is the distribution of the sample mean based on the population&lt;br /&gt;
mean, the sample variance, and the &#039;&#039;degrees of freedom&#039;&#039; (covered later). It&lt;br /&gt;
looks very similar to the normal distribution.&lt;br /&gt;
&lt;br /&gt;
When the sample size &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; is small, there is greater&lt;br /&gt;
uncertainty in the estimates, so the t-distribution has heavier tails:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
t_{\alpha/2} &amp;gt; Z_{\alpha/2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The spread of t-distribution depends on the &#039;&#039;&#039;degrees of freedom&#039;&#039;&#039;,&lt;br /&gt;
which is based on sample size. When looking up the table, round down df.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\upsilon = n - 1&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As the sample size increases, degrees of freedom increase, the spread of&lt;br /&gt;
t-distribution decreases, and t-distribution approaches normal&lt;br /&gt;
distribution.&lt;br /&gt;
&lt;br /&gt;
Based on CLT and normal distribution, we had the confidence interval&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
 \bar{Y} \pm Z_{\alpha / 2} \frac{\sigma}{ \sqrt{n} }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now, based on T-distribution, we have the CI&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
 \bar{Y} \pm t_{\alpha / 2} \frac{s}{ \sqrt{n} }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Find Sample Size ====&lt;br /&gt;
To calculate sample size needed depending on desired&lt;br /&gt;
error margin and sample variance by assuming that &amp;lt;math&amp;gt;\upsilon = \infty&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
n = \frac{Z^2_{\alpha/2} s^2}{E^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We always &#039;&#039;round up&#039;&#039; to stay within the error margin: rounding down&lt;br /&gt;
would give a margin of error larger than &amp;lt;math&amp;gt;E&amp;lt;/math&amp;gt;.&lt;br /&gt;
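A sketch of the sample-size calculation (the planning values s^2 = 25, E = 1.5 are hypothetical):&lt;br /&gt;

```python
import math
from statistics import NormalDist

# Hypothetical planning values (assumptions for illustration)
s2 = 25.0     # sample variance from a pilot study
E = 1.5       # desired margin of error
alpha = 0.05  # for 95% confidence

z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
n_exact = z**2 * s2 / E**2
n = math.ceil(n_exact)   # round UP so the margin of error stays within E
print(n_exact, n)
```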
&lt;br /&gt;
= Sampling Distribution of Difference =&lt;br /&gt;
&lt;br /&gt;
By linear combination of RVs, the sampling distribution of &amp;lt;math&amp;gt;\bar{Y_1} - \bar{Y_2}&amp;lt;/math&amp;gt; is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
 ( \bar{Y_1} - \bar{Y_2} ) \sim N(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2})&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
However, we do not know the population variances &amp;lt;math&amp;gt;\sigma_1^2&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\sigma_2^2&amp;lt;/math&amp;gt;. If the CLT assumptions hold, we can use the t-distribution with degrees of freedom&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\upsilon = \frac{ \left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2 }{ \frac{(s_1^2 / n_1)^2 }{n_1 - 1} + \frac{(s_2^2 / n_2)^2 }{n_2 - 1} }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is the Welch-Satterthwaite approximation; its derivation is beyond the scope of these notes. Remember to round down when using the t-table. With this degree of freedom, we can use the sample variances to estimate the distribution.&lt;br /&gt;
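The degrees-of-freedom formula above in code (the sample variances and sizes are hypothetical):&lt;br /&gt;

```python
# Degrees of freedom for the two-sample case, per the formula above
# (hypothetical sample variances and sizes, for illustration)
s1_sq, n1 = 4.0, 15
s2_sq, n2 = 9.0, 12

num = (s1_sq / n1 + s2_sq / n2) ** 2
den = (s1_sq / n1) ** 2 / (n1 - 1) + (s2_sq / n2) ** 2 / (n2 - 1)
df = num / den
print(df, int(df))   # round down before looking up the t-table
```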
&lt;br /&gt;
&lt;br /&gt;
[[Category:Sample Statistics]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Continuous_Random_Variable&amp;diff=393</id>
		<title>Continuous Random Variable</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Continuous_Random_Variable&amp;diff=393"/>
		<updated>2024-03-19T07:50:09Z</updated>

		<summary type="html">&lt;p&gt;Admin: /* Normal Random Variable */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Distribution (Statistics)]]&lt;br /&gt;
Continuous random variables can take infinitely many values in any&lt;br /&gt;
given interval. While the concepts are similar, the approach to analysis is very&lt;br /&gt;
different from that of discrete variables:&lt;br /&gt;
* Summation becomes integration&lt;br /&gt;
* Probability becomes area under a curve&lt;br /&gt;
&lt;br /&gt;
= Probability Density Function &amp;lt;math&amp;gt; f(x) &amp;lt;/math&amp;gt; =&lt;br /&gt;
&lt;br /&gt;
The probability density function (pdf) maps a continuous variable to a&lt;br /&gt;
probability density.&lt;br /&gt;
&lt;br /&gt;
As the name &amp;quot;density&amp;quot; suggests, the area under the pdf curve over a&lt;br /&gt;
range is the probability of the variable being in that range.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
P(c \leq x \leq d) = \int_c^d f(x) dx = F(d) - F(c)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The total area under the curve must be &amp;lt;math&amp;gt; 1 &amp;lt;/math&amp;gt;, as the chance of&lt;br /&gt;
some event happening is 100% if the range includes all possible events.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\int_{-\infty}^\infty f(x) dx = 1&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is no area under a single point&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
P(X = a) = 0&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Mean and Variance =&lt;br /&gt;
&lt;br /&gt;
The mean and variance calculations are pretty much the same as that of&lt;br /&gt;
[[Discrete Random Variable|discrete random variables]], except the summations are swapped out for&lt;br /&gt;
integrals.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(X) = \mu_X = \int_{-\infty}^\infty x f(x) dx&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Var(X) = \sigma^2_X = \int_{-\infty}^\infty (x - \mu_X)^2 f(x) dx&lt;br /&gt;
= \int_{-\infty}^\infty x^2 f(x) dx - \mu_X^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Median and Percentile =&lt;br /&gt;
&lt;br /&gt;
The a-th percentile is the point at which a percent of the area under the&lt;br /&gt;
curve lies to its left. You want &amp;lt;math&amp;gt; P(X \leq x) &amp;lt;/math&amp;gt; to be a%,&lt;br /&gt;
calculated as described above.&lt;br /&gt;
&lt;br /&gt;
By the same logic, the quartiles are at 25%, 50%, and 75% respectively.&lt;br /&gt;
&lt;br /&gt;
= Uniform Distribution &amp;lt;math&amp;gt; X \sim Uniform(a, b) &amp;lt;/math&amp;gt; =&lt;br /&gt;
A uniform random variable is described by two parameters: &amp;lt;math&amp;gt; a &amp;lt;/math&amp;gt;&lt;br /&gt;
is the minimum, and &amp;lt;math&amp;gt; b &amp;lt;/math&amp;gt; is the maximum. It has a rectangular&lt;br /&gt;
distribution, where every point has the same probability density.&lt;br /&gt;
&lt;br /&gt;
==== PDF ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
f(x) = \begin{cases}&lt;br /&gt;
    \frac{ 1 }{ b - a } &amp;amp; a \leq x \leq b \\&lt;br /&gt;
    0 &amp;amp; \text{otherwise}&lt;br /&gt;
\end{cases}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== CDF ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
F(x) = \begin{cases}&lt;br /&gt;
    0 &amp;amp; x &amp;lt; a \\&lt;br /&gt;
    \frac{ x - a }{ b - a } &amp;amp; a \leq x \leq b \\&lt;br /&gt;
    1 &amp;amp; x &amp;gt; b&lt;br /&gt;
\end{cases}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Mean ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mu_X = \frac{ a + b }{ 2 }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Variance ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\sigma^2 = \frac{ 1 }{ 12 } (b - a)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
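The mean and variance formulas can be checked against the integral definitions with a midpoint Riemann sum (a sketch with assumed endpoints a = 2, b = 10):&lt;br /&gt;

```python
# Numerical check of the Uniform(a, b) mean and variance via a midpoint
# Riemann sum over the pdf f(x) = 1/(b - a) on [a, b]
a, b = 2.0, 10.0
N = 100000
dx = (b - a) / N
f = 1.0 / (b - a)   # constant probability density on [a, b]

xs = [a + (i + 0.5) * dx for i in range(N)]     # midpoints
mean = sum(x * f * dx for x in xs)              # (a + b)/2 = 6.0
var = sum(x**2 * f * dx for x in xs) - mean**2  # (b - a)^2/12 = 5.333...
print(mean, var)
```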
&lt;br /&gt;
= Exponential Distribution =&lt;br /&gt;
&lt;br /&gt;
The exponential distribution models events that occur&lt;br /&gt;
* Continuously&lt;br /&gt;
* Independently&lt;br /&gt;
* At a constant average rate&lt;br /&gt;
&lt;br /&gt;
It takes in one parameter: &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt;, the &#039;&#039;&#039;rate parameter.&#039;&#039;&#039; Defined by the mean below, it is the &#039;&#039;average rate per unit time/space.&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
The exponential distribution has the &#039;&#039;&#039;memoryless property&#039;&#039;&#039;: the&lt;br /&gt;
probability of waiting for an event does not change no matter how much time has&lt;br /&gt;
already passed.&lt;br /&gt;
&lt;br /&gt;
In probability terms, the probability that we must wait an&lt;br /&gt;
additional &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; units, given that we have waited &amp;lt;math&amp;gt;s&amp;lt;/math&amp;gt;&lt;br /&gt;
units, is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
P(T &amp;gt; t + s | T &amp;gt; s) = P(T &amp;gt; t) = e^{-\lambda t}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Notably, it models time until some event has happened, in contrast to [[Discrete Random Variable#Poisson|poisson distribution]], which measures the number of events in a unit time.&lt;br /&gt;
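The memoryless property can be verified numerically from the survival function (a sketch; lambda, s, and t are arbitrary assumed values):&lt;br /&gt;

```python
import math

lam = 0.5   # rate parameter

def survival(t):
    # P(T greater than t) = e^(-lam * t) for the exponential distribution
    return math.exp(-lam * t)

s, t = 3.0, 2.0
conditional = survival(t + s) / survival(s)   # P(T greater than t+s, given T greater than s)
print(conditional, survival(t))               # the two values agree: memoryless
```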
&lt;br /&gt;
==== PDF ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
f(x) = \begin{cases}&lt;br /&gt;
    \lambda e ^{ - \lambda x } &amp;amp; x \geq 0 \\&lt;br /&gt;
    0 &amp;amp; \text{otherwise}&lt;br /&gt;
\end{cases}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== CDF ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
F(x) = 1 - e^{- \lambda x}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Mean ====&lt;br /&gt;
&lt;br /&gt;
Derived by integration by parts:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mu_X = \frac{1}{\lambda}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Variance ====&lt;br /&gt;
&lt;br /&gt;
Derived by integration by parts:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\sigma^2 = \frac{ 1 }{ \lambda^2 }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Exponential and Poisson ==&lt;br /&gt;
&lt;br /&gt;
Exponential distribution and poisson RVs are related:&lt;br /&gt;
* &amp;lt;math&amp;gt;X \sim Poisson(\lambda)&amp;lt;/math&amp;gt;: the number of events in a unit time&lt;br /&gt;
* &amp;lt;math&amp;gt;X \sim Exp(\lambda)&amp;lt;/math&amp;gt;: waiting time until an event&lt;br /&gt;
&lt;br /&gt;
= Normal Random Variable =&lt;br /&gt;
[[File:Z score table.png|thumb|Z score table]]&lt;br /&gt;
&#039;&#039;&#039;Normal random variables&#039;&#039;&#039; (aka Gaussian RVs) are the most widely used continuous RVs in&lt;br /&gt;
statistics, characterizing many natural phenomena. The normal distribution is the famous&lt;br /&gt;
bell curve.&lt;br /&gt;
&lt;br /&gt;
They are characterized by two parameters: mean and variance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Y \sim N(\mu_Y, \sigma^2_Y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Normal random variables are perfectly symmetric about the mean.&lt;br /&gt;
&lt;br /&gt;
==== Standardizing Normal Distribution ====&lt;br /&gt;
&lt;br /&gt;
Standardizing data means making its mean 0 and its standard&lt;br /&gt;
deviation 1. We do this by subtracting the mean and dividing by the&lt;br /&gt;
standard deviation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Z = \frac{Y - \mu}{\sigma}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Intuitively, this moves the dataset and changes the scale. We do this to&lt;br /&gt;
simplify probability calculations.&lt;br /&gt;
&lt;br /&gt;
==== Z score ====&lt;br /&gt;
&lt;br /&gt;
The z-score is the number of standard deviations above or below the&lt;br /&gt;
mean. A positive z score is above, and a negative is below.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
z = \frac{y - \mu}{\sigma}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
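For example (a sketch; the population parameters mu = 100, sigma = 15 are hypothetical), the z-score and its lower-tail probability:&lt;br /&gt;

```python
from statistics import NormalDist

# Hypothetical population (assumed parameters, for illustration)
mu, sigma = 100.0, 15.0
y = 130.0

z = (y - mu) / sigma      # 2.0 standard deviations above the mean
p = NormalDist().cdf(z)   # lower-tail probability, as in the z table
print(z, round(p, 4))
```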
&lt;br /&gt;
==== PDF ====&lt;br /&gt;
&lt;br /&gt;
The pdf for normal random variable is the following.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
f(y) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{1}{2} \frac{(y -&lt;br /&gt;
\mu)^2}{\sigma^2}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After standardizing the normal RV, we can use the following instead.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
f(y) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{1}{2} z^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; is the z-score covered in the last section. The standard normal pdf itself is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
f(z) = \frac{1}{\sqrt{2 \pi}} e^{-\frac{1}{2} z^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Quantiles ====&lt;br /&gt;
&lt;br /&gt;
Quantiles are points dividing the range of a probability&lt;br /&gt;
distribution. Quartiles and percentiles are types of quantiles.&lt;br /&gt;
&lt;br /&gt;
For normal distributions, there are special points (critical values)&lt;br /&gt;
that correspond to particular probabilities: &amp;lt;math&amp;gt;z_a&amp;lt;/math&amp;gt;, where&lt;br /&gt;
&amp;lt;math&amp;gt;a&amp;lt;/math&amp;gt; is the probability in the right tail.&lt;br /&gt;
&lt;br /&gt;
==== Standard Normal Table ====&lt;br /&gt;
&lt;br /&gt;
The standard normal table gives lower-tail probabilities based on the&lt;br /&gt;
standard normal distribution (i.e. area under the curve left of the&lt;br /&gt;
point).&lt;br /&gt;
&lt;br /&gt;
==== Linear Combinations of Independent Normal RV ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
W = aX + bY&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
W \sim N(a\mu_X + b\mu_Y, a^2 \sigma^2_X + b^2 \sigma^2_Y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
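The parameter formulas can be checked in code (a sketch; the means, variances, and coefficients are hypothetical), including a simulation sanity check:&lt;br /&gt;

```python
import math
import random
import statistics

# X ~ N(5, 4) and Y ~ N(2, 9), independent (hypothetical parameters); W = aX + bY
mu_x, var_x = 5.0, 4.0
mu_y, var_y = 2.0, 9.0
a, b = 3.0, -1.0

mu_w = a * mu_x + b * mu_y            # 3*5 - 1*2 = 13.0
var_w = a**2 * var_x + b**2 * var_y   # 9*4 + 1*9 = 45.0

# Simulation sanity check
random.seed(1)
w = [a * random.gauss(mu_x, math.sqrt(var_x)) + b * random.gauss(mu_y, math.sqrt(var_y))
     for _ in range(50000)]
print(mu_w, var_w)
print(statistics.fmean(w), statistics.variance(w))   # close to the values above
```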
&lt;br /&gt;
= Other distributions =&lt;br /&gt;
&lt;br /&gt;
[[Two Numerical RVs]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Statistics]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Continuous_Random_Variable&amp;diff=392</id>
		<title>Continuous Random Variable</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Continuous_Random_Variable&amp;diff=392"/>
		<updated>2024-03-19T07:49:17Z</updated>

		<summary type="html">&lt;p&gt;Admin: /* Exponential Distribution */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Distribution (Statistics)]]&lt;br /&gt;
Continuous random variables can take infinitely many values in any&lt;br /&gt;
given interval. While the concepts are similar, the approach to analysis is very&lt;br /&gt;
different from that of discrete variables:&lt;br /&gt;
* Summation becomes integration&lt;br /&gt;
* Probability becomes area under a curve&lt;br /&gt;
&lt;br /&gt;
= Probability Density Function &amp;lt;math&amp;gt; f(x) &amp;lt;/math&amp;gt; =&lt;br /&gt;
&lt;br /&gt;
The probability density function (pdf) maps a continuous variable to a&lt;br /&gt;
probability density.&lt;br /&gt;
&lt;br /&gt;
As the name &amp;quot;density&amp;quot; suggests, the area under the pdf curve over a&lt;br /&gt;
range is the probability of the variable being in that range.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
P(c \leq x \leq d) = \int_c^d f(x) dx = F(d) - F(c)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The total area under the curve must be &amp;lt;math&amp;gt; 1 &amp;lt;/math&amp;gt;, as the chance of&lt;br /&gt;
some event happening is 100% if the range includes all possible events.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\int_{-\infty}^\infty f(x) dx = 1&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is no area under a single point&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
P(X = a) = 0&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Mean and Variance =&lt;br /&gt;
&lt;br /&gt;
The mean and variance calculations are pretty much the same as that of&lt;br /&gt;
[[Discrete Random Variable|discrete random variables]], except the summations are swapped out for&lt;br /&gt;
integrals.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(X) = \mu_X = \int_{-\infty}^\infty x f(x) dx&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Var(X) = \sigma^2_X = \int_{-\infty}^\infty (x - \mu_X)^2 f(x) dx&lt;br /&gt;
= \int_{-\infty}^\infty x^2 f(x) dx - \mu_X^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Median and Percentile =&lt;br /&gt;
&lt;br /&gt;
The a-th percentile is the point at which a percent of the area under the&lt;br /&gt;
curve lies to its left. You want &amp;lt;math&amp;gt; P(X \leq x) &amp;lt;/math&amp;gt; to be a%,&lt;br /&gt;
calculated as described above.&lt;br /&gt;
&lt;br /&gt;
By the same logic, the quartiles are at 25%, 50%, and 75% respectively.&lt;br /&gt;
&lt;br /&gt;
= Uniform Distribution &amp;lt;math&amp;gt; X \sim Uniform(a, b) &amp;lt;/math&amp;gt; =&lt;br /&gt;
A uniform random variable is described by two parameters: &amp;lt;math&amp;gt; a &amp;lt;/math&amp;gt;&lt;br /&gt;
is the minimum, and &amp;lt;math&amp;gt; b &amp;lt;/math&amp;gt; is the maximum. It has a rectangular&lt;br /&gt;
distribution, where every point has the same probability density.&lt;br /&gt;
&lt;br /&gt;
==== PDF ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
f(x) = \begin{cases}&lt;br /&gt;
    \frac{ 1 }{ b - a } &amp;amp; a \leq x \leq b \\&lt;br /&gt;
    0 &amp;amp; \text{otherwise}&lt;br /&gt;
\end{cases}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== CDF ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
F(x) = \begin{cases}&lt;br /&gt;
    0 &amp;amp; x &amp;lt; a \\&lt;br /&gt;
    \frac{ x - a }{ b - a } &amp;amp; a \leq x \leq b \\&lt;br /&gt;
    1 &amp;amp; x &amp;gt; b&lt;br /&gt;
\end{cases}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Mean ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mu_X = \frac{ a + b }{ 2 }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Variance ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\sigma^2 = \frac{ 1 }{ 12 } (b - a)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Exponential Distribution =&lt;br /&gt;
&lt;br /&gt;
The exponential distribution models events that occur&lt;br /&gt;
* Continuously&lt;br /&gt;
* Independently&lt;br /&gt;
* At a constant average rate&lt;br /&gt;
&lt;br /&gt;
It takes in one parameter: &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt;, the &#039;&#039;&#039;rate parameter.&#039;&#039;&#039; Defined by the mean below, it is the &#039;&#039;average rate per unit time/space.&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
The exponential distribution has the &#039;&#039;&#039;memoryless property&#039;&#039;&#039;: the&lt;br /&gt;
probability of waiting for an event does not change no matter how much time has&lt;br /&gt;
already passed.&lt;br /&gt;
&lt;br /&gt;
In probability terms, the probability that we must wait an&lt;br /&gt;
additional &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; units, given that we have waited &amp;lt;math&amp;gt;s&amp;lt;/math&amp;gt;&lt;br /&gt;
units, is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
P(T &amp;gt; t + s | T &amp;gt; s) = P(T &amp;gt; t) = e^{-\lambda t}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Notably, it models time until some event has happened, in contrast to [[Discrete Random Variable#Poisson|poisson distribution]], which measures the number of events in a unit time.&lt;br /&gt;
&lt;br /&gt;
==== PDF ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
f(x) = \begin{cases}&lt;br /&gt;
    \lambda e ^{ - \lambda x } &amp;amp; x \geq 0 \\&lt;br /&gt;
    0 &amp;amp; \text{otherwise}&lt;br /&gt;
\end{cases}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== CDF ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
F(x) = 1 - e^{- \lambda x}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Mean ====&lt;br /&gt;
&lt;br /&gt;
Derived by integration by parts:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mu_X = \frac{1}{\lambda}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Variance ====&lt;br /&gt;
&lt;br /&gt;
Derived by integration by parts:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\sigma^2 = \frac{ 1 }{ \lambda^2 }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Exponential and Poisson ==&lt;br /&gt;
&lt;br /&gt;
Exponential distribution and poisson RVs are related:&lt;br /&gt;
* &amp;lt;math&amp;gt;X \sim Poisson(\lambda)&amp;lt;/math&amp;gt;: the number of events in a unit time&lt;br /&gt;
* &amp;lt;math&amp;gt;X \sim Exp(\lambda)&amp;lt;/math&amp;gt;: waiting time until an event&lt;br /&gt;
&lt;br /&gt;
= Normal Random Variable =&lt;br /&gt;
[[File:Z score table.png|thumb|Z score table]]&lt;br /&gt;
Normal random variables are the most widely used continuous RVs in&lt;br /&gt;
statistics, characterizing many natural phenomena. The normal distribution is the famous&lt;br /&gt;
bell curve.&lt;br /&gt;
&lt;br /&gt;
They are characterized by two parameters: mean and variance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Y \sim N(\mu_Y, \sigma^2_Y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Normal random variables are perfectly symmetric about the mean.&lt;br /&gt;
&lt;br /&gt;
==== Standardizing Normal Distribution ====&lt;br /&gt;
&lt;br /&gt;
Standardizing data means making its mean 0 and its standard&lt;br /&gt;
deviation 1. We do this by subtracting the mean and dividing by the&lt;br /&gt;
standard deviation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Z = \frac{Y - \mu}{\sigma}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Intuitively, this moves the dataset and changes the scale. We do this to&lt;br /&gt;
simplify probability calculations.&lt;br /&gt;
&lt;br /&gt;
==== Z score ====&lt;br /&gt;
&lt;br /&gt;
The z-score is the number of standard deviations above or below the&lt;br /&gt;
mean. A positive z score is above, and a negative is below.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
z = \frac{y - \mu}{\sigma}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== PDF ====&lt;br /&gt;
&lt;br /&gt;
The pdf for normal random variable is the following.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
f(y) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{1}{2} \frac{(y -&lt;br /&gt;
\mu)^2}{\sigma^2}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After standardizing the normal RV, we can use the following instead.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
f(y) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{1}{2} z^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; is the z-score covered in the last section. The standard normal pdf itself is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
f(z) = \frac{1}{\sqrt{2 \pi}} e^{-\frac{1}{2} z^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Quantiles ====&lt;br /&gt;
&lt;br /&gt;
Quantiles are points dividing the range of a probability&lt;br /&gt;
distribution. Quartiles and percentiles are types of quantiles.&lt;br /&gt;
&lt;br /&gt;
For normal distributions, there are special points (critical values)&lt;br /&gt;
that correspond to particular probabilities: &amp;lt;math&amp;gt;z_a&amp;lt;/math&amp;gt;, where&lt;br /&gt;
&amp;lt;math&amp;gt;a&amp;lt;/math&amp;gt; is the probability in the right tail.&lt;br /&gt;
&lt;br /&gt;
==== Standard Normal Table ====&lt;br /&gt;
&lt;br /&gt;
The standard normal table gives lower-tail probabilities based on the&lt;br /&gt;
standard normal distribution (i.e. area under the curve left of the&lt;br /&gt;
point).&lt;br /&gt;
&lt;br /&gt;
==== Linear Combinations of Independent Normal RV ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
W = aX + bY&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
W \sim N(a\mu_X + b\mu_Y, a^2 \sigma^2_X + b^2 \sigma^2_Y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Other distributions =&lt;br /&gt;
&lt;br /&gt;
[[Two Numerical RVs]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Statistics]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Discrete_Random_Variable&amp;diff=391</id>
		<title>Discrete Random Variable</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Discrete_Random_Variable&amp;diff=391"/>
		<updated>2024-03-19T07:46:34Z</updated>

		<summary type="html">&lt;p&gt;Admin: /* Poisson */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Statistics]]&lt;br /&gt;
[[Category:Distribution (Statistics)]]&lt;br /&gt;
A random variable is &#039;&#039;&#039;discrete&#039;&#039;&#039; if the values it can take on within an interval is &#039;&#039;finite&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
= PMF and CDF =&lt;br /&gt;
The &#039;&#039;&#039;probability mass function (PMF)&#039;&#039;&#039; describes the probability distribution over a discrete random variable.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;p(x) = P(X = x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &#039;&#039;&#039;cumulative distribution function (CDF)&#039;&#039;&#039; specifies the probability of an observation being equal to or less than a given value.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;F(x) = P(X \leq x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We usually have tables for these in the case of discrete random variables.&lt;br /&gt;
&lt;br /&gt;
= Statistics =&lt;br /&gt;
Expected value (mean):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mu = E(X) = \sum x_i P(X = x_i)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Distributions =&lt;br /&gt;
&lt;br /&gt;
== Bernoulli ==&lt;br /&gt;
The &#039;&#039;&#039;Bernoulli distribution&#039;&#039;&#039; describes a random variable for an experiment with exactly two outcomes, &#039;&#039;success&#039;&#039; and &#039;&#039;failure&#039;&#039;, performed once.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
X \sim Bernoulli(p)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== PMF ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
p(1) = p, p(0) = 1 - p&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Statistics ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mu = p&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\sigma^2_X = p (1 - p)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Binomial ==&lt;br /&gt;
&lt;br /&gt;
Repeating a Bernoulli experiment &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; times gives a &#039;&#039;&#039;binomial random variable&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Consider an experiment with exactly two possible outcomes, conducted n times independently.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
X \sim Binomial(n, p)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
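The details are deferred to the equation sheet; as a hedged sketch, the binomial PMF is P(X = k) = C(n, k) p^k (1 - p)^(n - k), computable with the standard library (n = 10, p = 0.5 are arbitrary example values):&lt;br /&gt;

```python
import math

# Sketch of the binomial PMF: P(X = k) = C(n, k) * p**k * (1 - p)**(n - k).
def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5
# The PMF sums to 1 over k = 0..n
total = sum(binom_pmf(k, n, p) for k in range(n + 1))
print(round(total, 6))        # prints 1.0
print(binom_pmf(5, 10, 0.5))  # prints 0.24609375
```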
I&#039;m sleepy; I&#039;ll write the details later. It should be on the equation sheet.&lt;br /&gt;
&lt;br /&gt;
= Poisson =&lt;br /&gt;
The &#039;&#039;&#039;Poisson distribution&#039;&#039;&#039; is used when we know the &#039;&#039;average rate of occurrence for a particular event over a particular time period&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
* There must be a &#039;&#039;&#039;fixed interval&#039;&#039;&#039; of time or space.&lt;br /&gt;
* Events happen with a &#039;&#039;&#039;known average rate&#039;&#039;&#039; independent of time or the last event.&lt;br /&gt;
* The average rate of occurrence per unit of time/space is the &#039;&#039;&#039;rate parameter&#039;&#039;&#039; &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The Poisson distribution approximates the binomial distribution when &#039;&#039;n&#039;&#039; is large and &#039;&#039;p&#039;&#039; is small, so it is often used to model rare events. Normally it measures the number of events in a unit time, whereas the [[Continuous Random Variable#Exponential Distribution|exponential distribution]] models the amount of waiting time until an event.&lt;br /&gt;
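A minimal sketch of the approximation (n = 1000 and p = 0.002 are arbitrary values chosen to make n large and p small):&lt;br /&gt;

```python
import math

# Sketch: Poisson(lam) approximates Binomial(n, p) with lam = n * p,
# when n is large and p is small.
def poisson_pmf(k, lam):
    return lam**k * math.exp(-lam) / math.factorial(k)

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 1000, 0.002  # large n, small p
lam = n * p         # rate parameter lambda = 2.0
for k in range(4):
    # the two PMFs agree to about three decimal places here
    print(k, round(binom_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
```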
&lt;br /&gt;
I&#039;m sleepy; I&#039;ll write the details later... zzz...&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Continuous_Random_Variable&amp;diff=390</id>
		<title>Continuous Random Variable</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Continuous_Random_Variable&amp;diff=390"/>
		<updated>2024-03-19T07:46:12Z</updated>

		<summary type="html">&lt;p&gt;Admin: /* Probability Distribution Function                         f         (         x         )                 {\displaystyle f(x)}     */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Distribution (Statistics)]]&lt;br /&gt;
Continuous random variables have an infinite number of values in any&lt;br /&gt;
given interval. While similar, the approach to analysis is very&lt;br /&gt;
different from that for discrete variables:&lt;br /&gt;
* Summation becomes integration&lt;br /&gt;
* Probability becomes area under a curve&lt;br /&gt;
&lt;br /&gt;
= Probability Density Function &amp;lt;math&amp;gt; f(x) &amp;lt;/math&amp;gt; =&lt;br /&gt;
&lt;br /&gt;
The probability density function (pdf) maps a continuous variable to a&lt;br /&gt;
probability density.&lt;br /&gt;
&lt;br /&gt;
As the name &amp;quot;density&amp;quot; suggests, the area under the pdf curve over a&lt;br /&gt;
range is the probability of the variable falling in that range.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
P(c \leq x \leq d) = \int_c^d f(x) dx = F(d) - F(c)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Total area under the curve must be &amp;lt;math&amp;gt; 1 &amp;lt;/math&amp;gt;, as the chance of&lt;br /&gt;
some event happening is 100% if the range includes all possible events.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\int_{-\infty}^\infty f(x) dx = 1&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There is no area under a single point:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
P(X = a) = 0&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
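These area facts can be checked numerically; the example pdf f(x) = 2x on [0, 1] below is arbitrary, not from these notes:&lt;br /&gt;

```python
# Sketch: the area under a pdf over its support must be 1, and the
# probability of an interval is the integral of f over that interval.
def f(x):
    return 2 * x  # example pdf on [0, 1]; zero elsewhere

def integrate(g, a, b, n=100000):
    # midpoint-rule numerical integration of g over [a, b]
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

print(round(integrate(f, 0, 1), 6))       # prints 1.0 (total area)
print(round(integrate(f, 0.25, 0.5), 6))  # prints 0.1875, i.e. F(0.5) - F(0.25)
```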
&lt;br /&gt;
= Mean and Variance =&lt;br /&gt;
&lt;br /&gt;
The mean and variance calculations are pretty much the same as that of&lt;br /&gt;
[[Discrete Random Variable|discrete random variables]], except the summations are swapped out for&lt;br /&gt;
integrals.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(X) = \mu_X = \int_{-\infty}^\infty x f(x) dx&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Var(X) = \sigma^2_X = \int_{-\infty}^\infty (x - \mu_X)^2 f(x) dx&lt;br /&gt;
= \int_{-\infty}^\infty x^2 f(x) dx - \mu_X^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Median and Percentile =&lt;br /&gt;
&lt;br /&gt;
The a-th percentile is the point at which a percent of the area under&lt;br /&gt;
the curve lies to its left. You want &amp;lt;math&amp;gt; P(X \leq x) &amp;lt;/math&amp;gt; to be a%;&lt;br /&gt;
the calculation is described above.&lt;br /&gt;
&lt;br /&gt;
By the same logic, the quartiles are at 25%, 50%, and 75% respectively.&lt;br /&gt;
&lt;br /&gt;
= Uniform Distribution &amp;lt;math&amp;gt; X \sim Uniform(a, b) &amp;lt;/math&amp;gt; =&lt;br /&gt;
A uniform random variable is described by two parameters: &amp;lt;math&amp;gt; a &amp;lt;/math&amp;gt;&lt;br /&gt;
is the minimum and &amp;lt;math&amp;gt; b &amp;lt;/math&amp;gt; is the maximum. It has a rectangular&lt;br /&gt;
distribution, where every point has the same probability density.&lt;br /&gt;
&lt;br /&gt;
==== PDF ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
f(x) = \begin{cases}&lt;br /&gt;
    \frac{ 1 }{ b - a } &amp;amp; a \leq x \leq b \\&lt;br /&gt;
    0 &amp;amp; \text{otherwise}&lt;br /&gt;
\end{cases}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== CDF ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
F(x) = \begin{cases}&lt;br /&gt;
    0 &amp;amp; x &amp;lt; a \\&lt;br /&gt;
    \frac{ x - a }{ b - a } &amp;amp; a \leq x \leq b \\&lt;br /&gt;
    1 &amp;amp; x &amp;gt; b&lt;br /&gt;
\end{cases}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Mean ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mu_X = \frac{ a + b }{ 2 }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Variance ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\sigma^2 = \frac{ 1 }{ 12 } (b - a)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
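A sketch checking the uniform mean and variance formulas against a simulation (a = 2 and b = 8 are arbitrary example values):&lt;br /&gt;

```python
import random

# Sketch: Uniform(a, b) formulas vs. simulation.
a, b = 2.0, 8.0
mean_formula = (a + b) / 2        # mu = (a + b) / 2 = 5.0
var_formula = (b - a) ** 2 / 12   # sigma^2 = (b - a)^2 / 12 = 3.0

random.seed(0)
samples = [random.uniform(a, b) for _ in range(100000)]
mean_sim = sum(samples) / len(samples)
var_sim = sum((x - mean_sim) ** 2 for x in samples) / len(samples)

print(mean_formula, var_formula)  # prints 5.0 3.0
print(round(mean_sim, 1), round(var_sim, 1))  # approximately 5.0 and 3.0
```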
&lt;br /&gt;
= Exponential Distribution =&lt;br /&gt;
&lt;br /&gt;
The exponential distribution models events that occur&lt;br /&gt;
* Continuously&lt;br /&gt;
* Independently&lt;br /&gt;
* At a constant average rate&lt;br /&gt;
&lt;br /&gt;
It takes one parameter: &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt;, the rate parameter; the&lt;br /&gt;
mean is its reciprocal (see below).&lt;br /&gt;
&lt;br /&gt;
The exponential distribution has the &#039;&#039;&#039;memoryless property&#039;&#039;&#039;: the&lt;br /&gt;
probability of an event does not change no matter how much time has&lt;br /&gt;
already passed.&lt;br /&gt;
&lt;br /&gt;
In probability terms, the probability that we must wait an&lt;br /&gt;
additional &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; units, given that we have already waited &amp;lt;math&amp;gt;s&amp;lt;/math&amp;gt;&lt;br /&gt;
units, equals the unconditional probability of waiting &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; units:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
P(T &amp;gt; t + s | T &amp;gt; s) = P(T &amp;gt; t) = e^{-\lambda t}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
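A tiny sketch of the memoryless property, using arbitrary example values for the rate and the waiting times:&lt;br /&gt;

```python
import math

# Sketch of the memoryless property: P(T > t + s | T > s) = P(T > t)
# for an exponential RV with rate lam.
lam, t, s = 0.5, 3.0, 4.0

def tail(x):
    return math.exp(-lam * x)  # P(T > x) = e^(-lambda x)

conditional = tail(t + s) / tail(s)  # P(T > t + s) / P(T > s)
print(round(conditional, 10) == round(tail(t), 10))  # prints True
```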
&lt;br /&gt;
Notably, it models the time until some event happens, in contrast to the [[Discrete Random Variable#Poisson|Poisson distribution]], which measures the number of events in a unit time.&lt;br /&gt;
&lt;br /&gt;
==== PDF ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
f(x) = \begin{cases}&lt;br /&gt;
    \lambda e ^{ - \lambda x } &amp;amp; x \geq 0 \\&lt;br /&gt;
    0 &amp;amp; \text{otherwise}&lt;br /&gt;
\end{cases}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== CDF ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
F(x) = 1 - e^{- \lambda x}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Mean ====&lt;br /&gt;
&lt;br /&gt;
By integration by parts:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mu_X = \frac{1}{\lambda}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Variance ====&lt;br /&gt;
&lt;br /&gt;
By integration by parts:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\sigma^2 = \frac{ 1 }{ \lambda^2 }&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Exponential and Poisson ==&lt;br /&gt;
&lt;br /&gt;
The exponential and Poisson RVs are related:&lt;br /&gt;
* &amp;lt;math&amp;gt;X \sim Poisson(\lambda)&amp;lt;/math&amp;gt;: the number of events in a unit time&lt;br /&gt;
* &amp;lt;math&amp;gt;X \sim Exp(\lambda)&amp;lt;/math&amp;gt;: waiting time until an event&lt;br /&gt;
&lt;br /&gt;
= Normal Random Variable =&lt;br /&gt;
[[File:Z score table.png|thumb|Z score table]]&lt;br /&gt;
Normal random variables are the most widely used continuous RVs in&lt;br /&gt;
statistics, characterizing many natural phenomena. This is the famous&lt;br /&gt;
bell curve.&lt;br /&gt;
&lt;br /&gt;
They are characterized by two parameters: mean and variance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Y \sim N(\mu_Y, \sigma^2_Y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Normal random variables are perfectly symmetric about the mean.&lt;br /&gt;
&lt;br /&gt;
==== Standardizing Normal Distribution ====&lt;br /&gt;
&lt;br /&gt;
Standardizing data means making its mean 0 and its standard&lt;br /&gt;
deviation 1. We do this by subtracting the mean and dividing by the&lt;br /&gt;
standard deviation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Z = \frac{Y - \mu}{\sigma}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Intuitively, this shifts the dataset and rescales it. We do this to&lt;br /&gt;
simplify probability calculations.&lt;br /&gt;
&lt;br /&gt;
==== Z score ====&lt;br /&gt;
&lt;br /&gt;
The z-score is the number of standard deviations above or below the&lt;br /&gt;
mean. A positive z score is above, and a negative is below.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
z = \frac{y - \mu}{\sigma}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== PDF ====&lt;br /&gt;
&lt;br /&gt;
The pdf for a normal random variable is the following.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
f(y) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{1}{2} \frac{(y -&lt;br /&gt;
\mu)^2}{\sigma^2}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Substituting the z-score, we can write this as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
f(y) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{1}{2} z^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt; is the z-score covered in the last section. The pdf of the standardized variable &amp;lt;math&amp;gt;Z&amp;lt;/math&amp;gt; itself is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
f(z) = \frac{1}{\sqrt{2 \pi}} e^{-\frac{1}{2} z^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Quantiles ====&lt;br /&gt;
&lt;br /&gt;
Quantiles are points dividing the range of a probability&lt;br /&gt;
distribution. Quartiles and percentiles are types of quantiles.&lt;br /&gt;
&lt;br /&gt;
For normal distributions, there are special points (critical values)&lt;br /&gt;
that correspond to particular probabilities: &amp;lt;math&amp;gt;z_a&amp;lt;/math&amp;gt;, where&lt;br /&gt;
&amp;lt;math&amp;gt;a&amp;lt;/math&amp;gt; is the probability in the right tail.&lt;br /&gt;
&lt;br /&gt;
==== Standard Normal Table ====&lt;br /&gt;
&lt;br /&gt;
The standard normal table gives lower-tail probabilities based on the&lt;br /&gt;
standard normal distribution (i.e. the area under the curve to the left&lt;br /&gt;
of the point).&lt;br /&gt;
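Instead of a table lookup, the same lower-tail value can be computed with the error function, via the standard identity Phi(z) = (1 + erf(z / sqrt(2))) / 2:&lt;br /&gt;

```python
import math

# Sketch: the lower-tail probability a z-table gives, computed directly.
def phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(phi(0.0), 4))   # prints 0.5
print(round(phi(1.96), 4))  # prints 0.975
print(round(phi(-1.0), 4))  # prints 0.1587
```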
&lt;br /&gt;
==== Linear Combinations of Independent Normal RV ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
W = aX + bY&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
W \sim N(a\mu_X + b\mu_Y, a^2 \sigma^2_X + b^2 \sigma^2_Y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
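A sketch of the resulting parameters for independent normal X and Y (all numbers below are arbitrary example values):&lt;br /&gt;

```python
# Sketch: parameters of W = aX + bY for independent normal X and Y.
a, b = 2.0, -1.0
mu_x, var_x = 3.0, 4.0
mu_y, var_y = 1.0, 9.0

mu_w = a * mu_x + b * mu_y           # 2*3 + (-1)*1 = 5.0
var_w = a**2 * var_x + b**2 * var_y  # 4*4 + 1*9 = 25.0

print(mu_w, var_w)  # prints 5.0 25.0
```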
&lt;br /&gt;
= Other distributions =&lt;br /&gt;
&lt;br /&gt;
[[Two Numerical RVs]]&lt;br /&gt;
&lt;br /&gt;
[[Category:Statistics]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Discrete_Random_Variable&amp;diff=389</id>
		<title>Discrete Random Variable</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Discrete_Random_Variable&amp;diff=389"/>
		<updated>2024-03-19T07:37:05Z</updated>

		<summary type="html">&lt;p&gt;Admin: /* Binomial */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Statistics]]&lt;br /&gt;
[[Category:Distribution (Statistics)]]&lt;br /&gt;
A random variable is &#039;&#039;&#039;discrete&#039;&#039;&#039; if the set of values it can take on within an interval is &#039;&#039;finite&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
= PMF and CDF =&lt;br /&gt;
The &#039;&#039;&#039;probability mass function (PMF)&#039;&#039;&#039; describes the probability distribution over a discrete random variable.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;p(x) = P(X = x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &#039;&#039;&#039;cumulative distribution function (CDF)&#039;&#039;&#039; specifies the probability of an observation being equal to or less than a given value.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;F(x) = P(X \leq x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We usually have tables for these in the case of discrete random variables.&lt;br /&gt;
&lt;br /&gt;
= Statistics =&lt;br /&gt;
Expected value (mean):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mu = E(X) = \sum x_i P(X = x_i)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Distributions =&lt;br /&gt;
&lt;br /&gt;
== Bernoulli ==&lt;br /&gt;
The &#039;&#039;&#039;Bernoulli distribution&#039;&#039;&#039; describes the random variable of an experiment that has exactly two outcomes, &#039;&#039;success&#039;&#039; or &#039;&#039;failure&#039;&#039;, and is performed once.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
X \sim Bernoulli(p)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== PMF ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
p(1) = p, p(0) = 1 - p&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Statistics ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mu = p&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\sigma^2_X = p (1 - p)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Binomial ==&lt;br /&gt;
&lt;br /&gt;
Repeating a Bernoulli experiment &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; times gives a &#039;&#039;&#039;binomial random variable&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Consider an experiment with exactly two possible outcomes, conducted n times independently.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
X \sim Binomial(n, p)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I&#039;m sleepy; I&#039;ll write the details later. It should be on the equation sheet.&lt;br /&gt;
&lt;br /&gt;
= Poisson =&lt;br /&gt;
The &#039;&#039;&#039;Poisson distribution&#039;&#039;&#039; is used when we know the &#039;&#039;average rate of occurrence for a particular event over a particular time period&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
* There must be a &#039;&#039;&#039;fixed interval&#039;&#039;&#039; of time or space.&lt;br /&gt;
* Events happen with a &#039;&#039;&#039;known average rate&#039;&#039;&#039; independent of time or the last event.&lt;br /&gt;
* The average rate of occurrence per unit of time/space is the &#039;&#039;&#039;rate parameter&#039;&#039;&#039; &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The Poisson distribution approximates the binomial distribution when &#039;&#039;n&#039;&#039; is large and &#039;&#039;p&#039;&#039; is small, so it is often used to model rare events.&lt;br /&gt;
&lt;br /&gt;
I&#039;m sleepy; I&#039;ll write the details later... zzz...&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Discrete_Random_Variable&amp;diff=388</id>
		<title>Discrete Random Variable</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Discrete_Random_Variable&amp;diff=388"/>
		<updated>2024-03-19T07:32:27Z</updated>

		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Statistics]]&lt;br /&gt;
[[Category:Distribution (Statistics)]]&lt;br /&gt;
A random variable is &#039;&#039;&#039;discrete&#039;&#039;&#039; if the set of values it can take on within an interval is &#039;&#039;finite&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
= PMF and CDF =&lt;br /&gt;
The &#039;&#039;&#039;probability mass function (PMF)&#039;&#039;&#039; describes the probability distribution over a discrete random variable.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;p(x) = P(X = x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &#039;&#039;&#039;cumulative distribution function (CDF)&#039;&#039;&#039; specifies the probability of an observation being equal to or less than a given value.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;F(x) = P(X \leq x)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We usually have tables for these in the case of discrete random variables.&lt;br /&gt;
&lt;br /&gt;
= Statistics =&lt;br /&gt;
Expected value (mean):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mu = E(X) = \sum x_i P(X = x_i)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Distributions =&lt;br /&gt;
&lt;br /&gt;
== Bernoulli ==&lt;br /&gt;
The &#039;&#039;&#039;Bernoulli distribution&#039;&#039;&#039; describes the random variable of an experiment that has exactly two outcomes, &#039;&#039;success&#039;&#039; or &#039;&#039;failure&#039;&#039;, and is performed once.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
X \sim Bernoulli(p)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== PMF ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
p(1) = p, p(0) = 1 - p&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Statistics ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\mu = p&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\sigma^2_X = p (1 - p)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Binomial ==&lt;br /&gt;
&lt;br /&gt;
Repeating a Bernoulli experiment &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; times gives a &#039;&#039;&#039;binomial random variable&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Consider an experiment with exactly two possible outcomes, conducted n times independently.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
X \sim Binomial(n, p)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I&#039;m sleepy; I&#039;ll write the details later. It should be on the equation sheet.&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Random_Variable&amp;diff=387</id>
		<title>Random Variable</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Random_Variable&amp;diff=387"/>
		<updated>2024-03-19T06:39:01Z</updated>

		<summary type="html">&lt;p&gt;Admin: /* Linear Combinations */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;A &#039;&#039;&#039;random&#039;&#039;&#039; variable is a numerical variable whose outcome is the result of a random process (i.e. we don&#039;t know what will happen for certain). Notably, the numerical interpretation of the outcome of an [[Probability#Experiment and Events|experiment]] is a random variable. They come in two types, each deserving its own page.&lt;br /&gt;
&lt;br /&gt;
= Statistics =&lt;br /&gt;
Random variables have several statistics that we care about. See pages [[Continuous Random Variable]] and [[Discrete Random Variable]] for how to calculate these values.&lt;br /&gt;
&lt;br /&gt;
When we are interested in the average outcome of a random variable, we look at the &#039;&#039;&#039;expected value&#039;&#039;&#039; (mean): a weighted average of the possible outcomes.&lt;br /&gt;
&lt;br /&gt;
When we are interested in the variability of a random variable, we look at the &#039;&#039;&#039;variance&#039;&#039;&#039; and the &#039;&#039;&#039;standard deviation&#039;&#039;&#039;: the standard deviation is roughly the expected difference from the mean, and the variance is its square (not really used for interpretation, more for math).&lt;br /&gt;
&lt;br /&gt;
= Properties of Statistics =&lt;br /&gt;
I&#039;m just gonna start writing equations.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(c) = c&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(aX) = aE(X)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(aX + c) = aE(X) + c&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Var(X) = E((X - \mu)^2) = E(X^2) - E(X)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This one has a cool name, the &#039;&#039;law of the unconscious statistician&#039;&#039;, for how obvious it seems:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(g(X)) = \sum g(x_i) P(X = x_i)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Var(c) = 0&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Var(aX) = a^2 Var(X)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Var(aX + c) = a^2 Var(X)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
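A simulation sketch checking two of the identities above, E(aX + c) = aE(X) + c and Var(aX + c) = a^2 Var(X); the distribution of X and the constants are arbitrary example choices:&lt;br /&gt;

```python
import random

# Sketch: checking the linearity of expectation and the scaling of variance.
random.seed(1)
a, c = 3.0, 5.0
xs = [random.gauss(0, 1) for _ in range(200000)]
ys = [a * x + c for x in xs]

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((u - m) ** 2 for u in v) / len(v)

# These identities hold exactly for sample moments, up to float rounding.
print(round(abs(mean(ys) - (a * mean(xs) + c)), 3))  # prints 0.0
print(round(abs(var(ys) - a * a * var(xs)), 3))      # prints 0.0
```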
&lt;br /&gt;
= Linear Combinations =&lt;br /&gt;
A &#039;&#039;&#039;linear combination&#039;&#039;&#039; of random variables &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; is given as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;aX + bY&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &#039;&#039;a&#039;&#039; and &#039;&#039;b&#039;&#039; are constants. They are &#039;&#039;not&#039;&#039; [[bivariate]]&lt;br /&gt;
&lt;br /&gt;
The expectation is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(aX + bY) = aE(X) + bE(Y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
whereas the variance is&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Var(aX + bY) = a^2 Var(X) + 2ab Cov(X,Y) + b^2 Var(Y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
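A simulation sketch of this identity, writing the cross term as 2ab Cov(X, Y); the distributions and constants are arbitrary example choices:&lt;br /&gt;

```python
import random

# Sketch: Var(aX + bY) = a^2 Var(X) + 2ab Cov(X,Y) + b^2 Var(Y),
# checked by simulation with correlated X and Y.
random.seed(2)
a, b = 2.0, 3.0
xs = [random.gauss(0, 1) for _ in range(200000)]
ys = [0.5 * x + random.gauss(0, 1) for x in xs]  # correlated with X

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((u - m) ** 2 for u in v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((x - mu) * (y - mv) for x, y in zip(u, v)) / len(u)

lhs = var([a * x + b * y for x, y in zip(xs, ys)])
rhs = a * a * var(xs) + 2 * a * b * cov(xs, ys) + b * b * var(ys)
print(round(abs(lhs - rhs), 6))  # prints 0.0 (holds for sample moments too)
```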
&lt;br /&gt;
I&#039;m sleepy so I&#039;m not gonna derive this one.&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Random_Variable&amp;diff=386</id>
		<title>Random Variable</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Random_Variable&amp;diff=386"/>
		<updated>2024-03-19T06:38:04Z</updated>

		<summary type="html">&lt;p&gt;Admin: /* Linear Combinations */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;A &#039;&#039;&#039;random&#039;&#039;&#039; variable is a numerical variable whose outcome is the result of a random process (i.e. we don&#039;t know what will happen for certain). Notably, the numerical interpretation of the outcome of an [[Probability#Experiment and Events|experiment]] is a random variable. They come in two types, each deserving its own page.&lt;br /&gt;
&lt;br /&gt;
= Statistics =&lt;br /&gt;
Random variables have several statistics that we care about. See pages [[Continuous Random Variable]] and [[Discrete Random Variable]] for how to calculate these values.&lt;br /&gt;
&lt;br /&gt;
When we are interested in the average outcome of a random variable, we look at the &#039;&#039;&#039;expected value&#039;&#039;&#039; (mean): a weighted average of the possible outcomes.&lt;br /&gt;
&lt;br /&gt;
When we are interested in the variability of a random variable, we look at the &#039;&#039;&#039;variance&#039;&#039;&#039; and the &#039;&#039;&#039;standard deviation&#039;&#039;&#039;: the standard deviation is roughly the expected difference from the mean, and the variance is its square (not really used for interpretation, more for math).&lt;br /&gt;
&lt;br /&gt;
= Properties of Statistics =&lt;br /&gt;
I&#039;m just gonna start writing equations.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(c) = c&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(aX) = aE(X)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(aX + c) = aE(X) + c&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Var(X) = E((X - \mu)^2) = E(X^2) - E(X)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This one has a cool name, the &#039;&#039;law of the unconscious statistician&#039;&#039;, for how obvious it seems:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(g(X)) = \sum g(x_i) P(X = x_i)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Var(c) = 0&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Var(aX) = a^2 Var(X)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Var(aX + c) = a^2 Var(X)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Linear Combinations =&lt;br /&gt;
A &#039;&#039;&#039;linear combination&#039;&#039;&#039; of random variables &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; is given as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;aX + bY&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &#039;&#039;a&#039;&#039; and &#039;&#039;b&#039;&#039; are constants. They are &#039;&#039;not&#039;&#039; [[bivariate]]&lt;br /&gt;
&lt;br /&gt;
The expectation is&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(aX + bY) = aE(X) + bE(Y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Var(aX + bY) = a^2 Var(X) + 2ab Cov(X,Y) + b^2 Var(Y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Random_Variable&amp;diff=385</id>
		<title>Random Variable</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Random_Variable&amp;diff=385"/>
		<updated>2024-03-19T06:36:49Z</updated>

		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;A &#039;&#039;&#039;random&#039;&#039;&#039; variable is a numerical variable whose outcome is the result of a random process (i.e. we don&#039;t know what will happen for certain). Notably, the numerical interpretation of the outcome of an [[Probability#Experiment and Events|experiment]] is a random variable. They come in two types, each deserving its own page.&lt;br /&gt;
&lt;br /&gt;
= Statistics =&lt;br /&gt;
Random variables have several statistics that we care about. See pages [[Continuous Random Variable]] and [[Discrete Random Variable]] for how to calculate these values.&lt;br /&gt;
&lt;br /&gt;
When we are interested in the average outcome of a random variable, we look at the &#039;&#039;&#039;expected value&#039;&#039;&#039; (mean): a weighted average of the possible outcomes.&lt;br /&gt;
&lt;br /&gt;
When we are interested in the variability of a random variable, we look at the &#039;&#039;&#039;variance&#039;&#039;&#039; and the &#039;&#039;&#039;standard deviation&#039;&#039;&#039;: the standard deviation is roughly the expected difference from the mean, and the variance is its square (not really used for interpretation, more for math).&lt;br /&gt;
&lt;br /&gt;
= Properties of Statistics =&lt;br /&gt;
I&#039;m just gonna start writing equations.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(c) = c&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(aX) = aE(X)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(aX + c) = aE(X) + c&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Var(X) = E((X - \mu)^2) = E(X^2) - E(X)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This one has a cool name, the &#039;&#039;law of the unconscious statistician&#039;&#039;, for how obvious it seems:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(g(X)) = \sum g(x_i) P(X = x_i)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Var(c) = 0&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Var(aX) = a^2 Var(X)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Var(aX + c) = a^2 Var(X)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Linear Combinations =&lt;br /&gt;
A &#039;&#039;&#039;linear combination&#039;&#039;&#039; of random variables &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; is given as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;aX + bY&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &#039;&#039;a&#039;&#039; and &#039;&#039;b&#039;&#039; are constants. They are &#039;&#039;not&#039;&#039; [[bivariate]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Random_Variable&amp;diff=384</id>
		<title>Random Variable</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Random_Variable&amp;diff=384"/>
		<updated>2024-03-19T06:35:04Z</updated>

		<summary type="html">&lt;p&gt;Admin: /* Statistics */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;A &#039;&#039;&#039;random&#039;&#039;&#039; variable is a numerical variable whose outcome is the result of a random process (i.e. we don&#039;t know what will happen for certain). Notably, the numerical interpretation of the outcome of an [[Probability#Experiment and Events|experiment]] is a random variable. They come in two types, each deserving its own page.&lt;br /&gt;
&lt;br /&gt;
= Statistics =&lt;br /&gt;
Random variables have several statistics that we care about. See pages [[Continuous Random Variable]] and [[Discrete Random Variable]] for how to calculate these values.&lt;br /&gt;
&lt;br /&gt;
When we are interested in the average outcome of a random variable, we look at the &#039;&#039;&#039;expected value&#039;&#039;&#039; (mean): a weighted average of the possible outcomes.&lt;br /&gt;
&lt;br /&gt;
When we are interested in the variability of a random variable, we look at the &#039;&#039;&#039;variance&#039;&#039;&#039; and the &#039;&#039;&#039;standard deviation&#039;&#039;&#039;: the standard deviation is roughly the expected difference from the mean, and the variance is its square (not really used for interpretation, more for math).&lt;br /&gt;
&lt;br /&gt;
= Properties of Statistics =&lt;br /&gt;
I&#039;m just gonna start writing equations.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(c) = c&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(aX) = aE(X)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(aX + c) = aE(X) + c&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Var(X) = E((X - \mu)^2) = E(X^2) - E(X)^2&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
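These identities can be checked numerically for a small discrete RV. A minimal sketch; the outcomes and probabilities below are made-up example values, not from this page:

```python
import math

# A small made-up discrete RV (hypothetical example data).
outcomes = [1, 2, 3, 4]
probs = [0.1, 0.2, 0.3, 0.4]

def E(g):
    # Probability-weighted sum; for a function g of X this is exactly LOTUS:
    # E(g(X)) = sum over i of g(x_i) P(X = x_i)
    return sum(g(x) * p for x, p in zip(outcomes, probs))

mean = E(lambda x: x)                        # E(X)
var = E(lambda x: (x - mean) ** 2)           # definition of Var(X)
shortcut = E(lambda x: x ** 2) - mean ** 2   # E(X^2) - E(X)^2
assert math.isclose(var, shortcut)

# Linearity of expectation and the variance scaling rules:
a, c = 2.0, 5.0
assert math.isclose(E(lambda x: a * x + c), a * mean + c)
var_shifted = E(lambda x: (a * x + c - (a * mean + c)) ** 2)
assert math.isclose(var_shifted, a * a * var)
print(mean, var)
```

Note that E(c) = c and Var(c) = 0 also follow: a constant function of X is just a weighted sum of the same value.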
&lt;br /&gt;
Cool name for this one: the &#039;&#039;law of the unconscious statistician&#039;&#039;, so called for how obvious it seems&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(g(X)) = \sum g(x_i) P(X = x_i)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Var(c) = 0&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Var(aX) = a^2 Var(X)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Var(aX + c) = a^2 Var(X)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Random_Variable&amp;diff=383</id>
		<title>Random Variable</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Random_Variable&amp;diff=383"/>
		<updated>2024-03-19T06:30:05Z</updated>

		<summary type="html">&lt;p&gt;Admin: Created page with &amp;quot;A &amp;#039;&amp;#039;&amp;#039;random&amp;#039;&amp;#039;&amp;#039; variable is a numerical variable whose outcome is the result of a random process (i.e. we don&amp;#039;t know what will happen for certain). Notably, the numerical interpretation of the outcome of an experiment is a random variable. They come in two types, deserving their own pages. See   = Statistics = Random variables has several statistics that we care about. See pages Continuous Random Variable and Discrete Random Var...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;A &#039;&#039;&#039;random&#039;&#039;&#039; variable is a numerical variable whose outcome is the result of a random process (i.e. we don&#039;t know what will happen for certain). Notably, the numerical interpretation of the outcome of an [[Probability#Experiment and Events|experiment]] is a random variable. They come in two types, each deserving its own page: see [[Continuous Random Variable]] and [[Discrete Random Variable]].&lt;br /&gt;
&lt;br /&gt;
= Statistics =&lt;br /&gt;
Random variables have several statistics that we care about. See the pages [[Continuous Random Variable]] and [[Discrete Random Variable]] for how to calculate these values.&lt;br /&gt;
&lt;br /&gt;
When we are interested in the average outcome of a random variable, we look at the &#039;&#039;&#039;expected value&#039;&#039;&#039; (mean): a weighted average of the possible outcomes.&lt;br /&gt;
&lt;br /&gt;
When we are interested in the variability of a random variable, we look at the &#039;&#039;&#039;variance&#039;&#039;&#039; and the &#039;&#039;&#039;standard deviation&#039;&#039;&#039;: the latter is the typical deviation from the mean, and the former is the square of the latter (used more for the math than for interpretation).&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Variable_(Statistics)&amp;diff=382</id>
		<title>Variable (Statistics)</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Variable_(Statistics)&amp;diff=382"/>
		<updated>2024-03-19T06:25:29Z</updated>

		<summary type="html">&lt;p&gt;Admin: /* Random Variable */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;In statistics, a &#039;&#039;&#039;variable&#039;&#039;&#039; is a characteristic of a subject that varies in a &#039;&#039;non-random&#039;&#039; way.&lt;br /&gt;
&lt;br /&gt;
= Overview and Related Definitions =&lt;br /&gt;
&lt;br /&gt;
At the top level of statistics, we investigate a &#039;&#039;&#039;population&#039;&#039;&#039;: a set of units that we are interested in studying.&lt;br /&gt;
&lt;br /&gt;
Populations are almost always impossible to study due to their massive size and other constraints. Therefore, we take a &#039;&#039;&#039;sample&#039;&#039;&#039;: a subset of the population. &lt;br /&gt;
&lt;br /&gt;
A &#039;&#039;&#039;subject&#039;&#039;&#039; is a unit that we study in a population or a sample, and a &#039;&#039;&#039;variable&#039;&#039;&#039; is a particular characteristic of the subject that we are interested in studying.&lt;br /&gt;
&lt;br /&gt;
= Types of Variables =&lt;br /&gt;
&lt;br /&gt;
There are two types of variables, each with two sub-categories that have useful properties:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Quantitative/Numerical&#039;&#039;&#039; variables are measured with numbers. Their sum has meaning.&lt;br /&gt;
** &#039;&#039;&#039;Continuous&#039;&#039;&#039; numerical variables can theoretically take on any number within an interval, whereas&lt;br /&gt;
** &#039;&#039;&#039;Discrete&#039;&#039;&#039; numerical variables have natural gaps&lt;br /&gt;
* &#039;&#039;&#039;Qualitative/Categorical&#039;&#039;&#039; variables are measured as labels. Their sum does not have meaning.&lt;br /&gt;
** &#039;&#039;&#039;Ordinal&#039;&#039;&#039; categorical variables have a natural ordering, whereas&lt;br /&gt;
** &#039;&#039;&#039;Nominal&#039;&#039;&#039; categorical variables do not&lt;br /&gt;
&lt;br /&gt;
It is possible for a categorical variable to be denoted with numbers; a common example is an ID number. The biggest difference between categorical variables denoted as numbers and numerical variables is that the sum/mean of categorical variables has no meaning, whereas that of numerical variables does.&lt;br /&gt;
&lt;br /&gt;
= Notation =&lt;br /&gt;
&lt;br /&gt;
A capitalized character (usually &amp;lt;math&amp;gt;X, Y, Z&amp;lt;/math&amp;gt;) is used to denote &#039;&#039;all possible values&#039;&#039; of a variable. This is called a &#039;&#039;&#039;major&#039;&#039;&#039;. When we say &amp;quot;variable&amp;quot;, we usually mean this.&lt;br /&gt;
&lt;br /&gt;
A lower case character corresponding to the major (such as &amp;lt;math&amp;gt;x, y, z&amp;lt;/math&amp;gt;) is used to denote a specific value of that major. This is called a &#039;&#039;&#039;statistic&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
= Random Variable =&lt;br /&gt;
A &#039;&#039;&#039;random&#039;&#039;&#039; variable is a numerical variable whose outcome is the result of a random process (i.e. we don&#039;t know what will happen for certain). See [[Random Variable]].&lt;br /&gt;
[[Category:Statistics]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
	<entry>
		<id>http://ricefriedegg.com:80/mediawiki/index.php?title=Bivariate&amp;diff=381</id>
		<title>Bivariate</title>
		<link rel="alternate" type="text/html" href="http://ricefriedegg.com:80/mediawiki/index.php?title=Bivariate&amp;diff=381"/>
		<updated>2024-03-19T06:24:03Z</updated>

		<summary type="html">&lt;p&gt;Admin: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[Category:Distribution (Statistics)]][[Category:Statistics]]&lt;br /&gt;
&#039;&#039;&#039;Bivariate&#039;&#039;&#039; data consider two variables instead of the usual one; each value of one of the variables is paired with a value of the other variable. We will be using &amp;lt;math&amp;gt;X, Y&amp;lt;/math&amp;gt; to denote the two random variables throughout this page.&lt;br /&gt;
&lt;br /&gt;
= Summary Statistics =&lt;br /&gt;
To summarize bivariate data, we use covariance and correlation in addition to the statistics detailed in [[Summary Statistics]].&lt;br /&gt;
&lt;br /&gt;
== Covariance ==&lt;br /&gt;
The &#039;&#039;&#039;covariance&#039;&#039;&#039; measures how two RVs vary together about their respective centers: it indicates how one variable tends to change when the other changes.&lt;br /&gt;
&lt;br /&gt;
We have &#039;&#039;sample covariance&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;s_{X, Y} = \hat{cov}(X, Y) = \frac{1}{n - 1} \sum(x_i - \bar{x}) (y_i - \bar{y}) = \frac{1}{n - 1} \left( \sum x_i y_i - n \bar{x} \bar{y} \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
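A quick numerical check that the two forms of the sample covariance agree. The paired data below are hypothetical, chosen only for illustration:

```python
import math

# A small made-up paired sample (hypothetical illustration data).
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 7]
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

# Definition: (1/(n-1)) * sum of (x_i - xbar)(y_i - ybar)
cov_def = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)

# Computational shortcut: (1/(n-1)) * (sum of x_i y_i - n xbar ybar)
cov_short = (sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar) / (n - 1)

assert math.isclose(cov_def, cov_short)
print(cov_def)
```

Here y increases with x, so the covariance comes out positive, matching the case analysis below.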
&lt;br /&gt;
A good way of thinking about covariance is by cases:&lt;br /&gt;
&lt;br /&gt;
If x &#039;&#039;increases&#039;&#039; as y &#039;&#039;increases&#039;&#039;, the signs of the two factors in each term of the covariance sum tend to be the same. Therefore, covariance is &#039;&#039;positive&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
If x &#039;&#039;decreases&#039;&#039; as y &#039;&#039;increases&#039;&#039;, the signs are different. Therefore, covariance is &#039;&#039;negative&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
If x does not clearly vary with y, the signs are sometimes different, sometimes the same. Overall, it should cancel out to &#039;&#039;zero.&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Correlation ==&lt;br /&gt;
The &#039;&#039;&#039;correlation&#039;&#039;&#039; of two random variables measures the &#039;&#039;&#039;linear&lt;br /&gt;
dependence&#039;&#039;&#039; between &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Cor(X, Y) = \rho = \frac{Cov(X,Y)}{sd(X) sd(Y)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Correlation is always between -1 and 1. When &amp;lt;math&amp;gt;\rho = 1&amp;lt;/math&amp;gt;, the relationship between X and Y is &#039;&#039;&#039;perfect positive linear&#039;&#039;&#039;. When &amp;lt;math&amp;gt;\rho = -1&amp;lt;/math&amp;gt;, it is &#039;&#039;&#039;perfect negative linear&#039;&#039;&#039;. If it is 0, there is no &#039;&#039;linear&#039;&#039; relationship. This doesn&#039;t mean that there is no relationship at all: notably, any scatter plot that is symmetric about a vertical line has a correlation of 0.&lt;br /&gt;
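The formula can be sketched numerically with a small made-up paired sample (hypothetical data), checking that the result lands in [-1, 1]:

```python
import math

# A small made-up paired sample (hypothetical illustration data).
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 7]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)
sx = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
sy = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))

r = cov / (sx * sy)   # Cor(X, Y) = Cov(X, Y) / (sd(X) sd(Y))
assert abs(r) == min(abs(r), 1.0)   # correlation always lies in [-1, 1]
print(round(r, 4))   # 0.9923
```

The value is close to 1 because these points lie nearly on an increasing line.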
&lt;br /&gt;
= Bivariate Normal =&lt;br /&gt;
&lt;br /&gt;
The &#039;&#039;&#039;bivariate normal&#039;&#039;&#039; (aka. bivariate Gaussian) is one special type&lt;br /&gt;
of continuous joint distribution.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;(X, Y)&amp;lt;/math&amp;gt; is &#039;&#039;bivariate normal&#039;&#039; if&lt;br /&gt;
&lt;br /&gt;
# The marginal PDFs of both X and Y are normal&lt;br /&gt;
# For any &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;, the conditional PDF of &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;X = x&amp;lt;/math&amp;gt; is normal&lt;br /&gt;
#* This works the other way around as well: bivariate normal implies the condition is satisfied&lt;br /&gt;
&lt;br /&gt;
== Predicting Y given X ==&lt;br /&gt;
&lt;br /&gt;
Given bivariate normal, we can predict one variable given another.&lt;br /&gt;
Let us try estimating the expected value of Y given that X is x:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
E(Y| X = x)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are three main methods&lt;br /&gt;
* Scatter plot approximation&lt;br /&gt;
* Joint PDF&lt;br /&gt;
* 5 parameters&lt;br /&gt;
&lt;br /&gt;
=== 5 Parameters ===&lt;br /&gt;
&lt;br /&gt;
We need to know 5 parameters about &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;E(X), sd(X), E(Y), sd(Y), \rho&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If &amp;lt;math&amp;gt;X, Y&amp;lt;/math&amp;gt; follow the bivariate normal distribution, then we&lt;br /&gt;
have&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\left( \frac{E(Y|X = x) - E(Y)}{sd(Y)} \right) = \rho \left( \frac{x -&lt;br /&gt;
E(X)}{sd(X)} \right)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The left side is the &#039;&#039;predicted Z-score for Y&#039;&#039;, and the right side is&lt;br /&gt;
&#039;&#039;the product of correlation and Z-score of X = x&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
The variance is given by&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Var(Y | X = x) = (1 - \rho^2) Var(Y)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Due to the range of &amp;lt;math&amp;gt;\rho&amp;lt;/math&amp;gt;, the variance of Y given X is&lt;br /&gt;
never larger than the unconditional variance. The standard deviation is just&lt;br /&gt;
the square root of that.&lt;br /&gt;
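The two formulas above can be sketched as a small prediction function. The 5 parameter values below are made-up, purely for illustration:

```python
import math

# Hypothetical 5 parameters for a bivariate normal pair (made-up values).
EX, sdX = 70.0, 3.0
EY, sdY = 170.0, 8.0
rho = 0.6

def predict_EY(x):
    # Z-score form: (E(Y|X=x) - E(Y)) / sd(Y) = rho * (x - E(X)) / sd(X)
    zx = (x - EX) / sdX
    return EY + rho * zx * sdY

def cond_var():
    # Var(Y | X = x) = (1 - rho^2) Var(Y), the same for every x
    return (1.0 - rho ** 2) * sdY ** 2

# Being one SD above the mean in X predicts only rho SDs above the mean in Y.
print(predict_EY(73.0), cond_var())
```

This already shows the regression effect: the predicted Z-score for Y is shrunk by the factor rho.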
&lt;br /&gt;
== Regression Effect ==&lt;br /&gt;
[[File:Regression Effect Scatter Plot.png|thumb|Regression effect demonstrated by SD line and Regression line]]&lt;br /&gt;
The &#039;&#039;&#039;regression effect&#039;&#039;&#039; is the phenomenon that the best prediction&lt;br /&gt;
of &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; given &amp;lt;math&amp;gt;X = x&amp;lt;/math&amp;gt; is less extreme (in Z-score terms) than&lt;br /&gt;
&amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt; is for &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt;; future predictions regress toward&lt;br /&gt;
mediocrity.&lt;br /&gt;
&lt;br /&gt;
When you plot all the predicted &amp;lt;math&amp;gt;E(Y|X = x)&amp;lt;/math&amp;gt;, you get the&lt;br /&gt;
&#039;&#039;&#039;linear regression line&#039;&#039;&#039;. The regression effect can be demonstrated&lt;br /&gt;
by also plotting the SD line (where the correlation is not applied).&lt;br /&gt;
&lt;br /&gt;
= Linear Regression =&lt;br /&gt;
&lt;br /&gt;
== Assumption ==&lt;br /&gt;
&lt;br /&gt;
# X and Y have a linear relationship&lt;br /&gt;
# A random sample of pairs was taken&lt;br /&gt;
# All pairs of data are independent&lt;br /&gt;
# The variance of the error is constant. &amp;lt;math&amp;gt;Var(\epsilon) = \sigma_\epsilon^2&amp;lt;/math&amp;gt;&lt;br /&gt;
# The average of the errors is zero. &amp;lt;math&amp;gt;E(\epsilon) = 0&amp;lt;/math&amp;gt;&lt;br /&gt;
# The errors are normally distributed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\epsilon_i \sim^{iid} N(0, \sigma_\epsilon^2), \quad Y_i \sim^{iid} N(\beta_0&lt;br /&gt;
+ \beta_1 x_i, \sigma_\epsilon^2)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
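The model under these assumptions can be sketched with a tiny simulation. The values of beta_0, beta_1, and sigma below are made-up, chosen only for illustration:

```python
import random

# Simulate Y_i = beta_0 + beta_1 x_i + eps_i with eps_i iid N(0, sigma^2).
# These parameter values are hypothetical.
random.seed(0)
beta0, beta1, sigma = 0.5, 1.6, 0.3
xs = [i / 10 for i in range(100)]
ys = [beta0 + beta1 * x + random.gauss(0.0, sigma) for x in xs]

# With many points, the sample mean of the errors should be near E(eps) = 0.
errors = [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]
mean_err = sum(errors) / len(errors)
print(mean_err)
```

Each Y_i is then normal with mean beta_0 + beta_1 x_i and constant variance sigma^2, exactly as the display above states.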
&lt;br /&gt;
== Procedure ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
y_i = \beta_0 + \beta_1 x_i + \epsilon_i&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the &amp;lt;math&amp;gt;\beta_0, \beta_1&amp;lt;/math&amp;gt; are the &#039;&#039;&#039;regression&lt;br /&gt;
coefficients&#039;&#039;&#039; (intercept and slope, respectively) based on the population, and&lt;br /&gt;
&amp;lt;math&amp;gt;\epsilon_i&amp;lt;/math&amp;gt; is the error for the i-th subject.&lt;br /&gt;
&lt;br /&gt;
We want to estimate the regression coefficients.&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;\hat{y_i}&amp;lt;/math&amp;gt; be an estimate of &amp;lt;math&amp;gt;y_i&amp;lt;/math&amp;gt;: a&lt;br /&gt;
prediction at &amp;lt;math&amp;gt;X = x_i&amp;lt;/math&amp;gt;, with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\hat{y_i} = \hat{\beta_0} + \hat{\beta_1} x_i&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can measure the vertical error &amp;lt;math&amp;gt;e_i = y_i - \hat{y_i}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The overall error is the sum of squared errors &amp;lt;math&amp;gt;SSE = \sum_{i=1}^n&lt;br /&gt;
e_i^2&amp;lt;/math&amp;gt;. The best fit line is the line minimizing SSE.&lt;br /&gt;
&lt;br /&gt;
Using calculus, we can find that the line has the following slope and&lt;br /&gt;
intercept:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\hat{\beta_1} = r \frac{s_y}{s_x}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;r&amp;lt;/math&amp;gt; is the sample correlation (strength of the linear relationship), and&lt;br /&gt;
&amp;lt;math&amp;gt;s_x, s_y&amp;lt;/math&amp;gt; are the sample standard deviations. They are&lt;br /&gt;
basically the sample versions of &amp;lt;math&amp;gt;\rho, \sigma&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\hat{\beta_0} = \bar{Y} - \hat{\beta_1} \bar{X}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
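A minimal numerical sketch of these coefficient formulas, using a small made-up paired sample (hypothetical data); it also checks that nudging the fitted line does not reduce the SSE:

```python
import math

# A small made-up paired sample (hypothetical illustration data).
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 7]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)
sx = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
sy = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
r = cov / (sx * sy)

b1 = r * sy / sx        # hat(beta_1) = r * s_y / s_x
b0 = ybar - b1 * xbar   # hat(beta_0) = ybar - hat(beta_1) * xbar

def sse(beta0, beta1):
    # Sum of squared vertical errors for a candidate line.
    return sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))

# The least-squares line should beat (or tie) small perturbations of itself.
best = sse(b0, b1)
others = [sse(b0 + 0.1, b1), sse(b0, b1 + 0.1), sse(b0 - 0.1, b1 - 0.1)]
assert best == min([best] + others)
print(round(b1, 10), round(b0, 10))   # 1.6 0.5
```

Note that r * s_y / s_x equals cov / s_x^2, another common way of writing the slope.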
&lt;br /&gt;
== Interpretation ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\hat{\beta_1}&amp;lt;/math&amp;gt; (the slope) is the estimated average change in&lt;br /&gt;
&amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; when &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; increases by one unit.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\hat{\beta_0}&amp;lt;/math&amp;gt; (the intercept) is the estimated average of&lt;br /&gt;
&amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; when &amp;lt;math&amp;gt;X = 0&amp;lt;/math&amp;gt;. If &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; cannot be 0,&lt;br /&gt;
this may not have a practical meaning.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;r^2&amp;lt;/math&amp;gt; (the &#039;&#039;&#039;coefficient of determination&#039;&#039;&#039;) measures how well&lt;br /&gt;
the line fits the data.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
r^2 = \frac{\sum (\hat{y_i} - \bar{Y})^2 }{\sum (y_i - \bar{Y})^2}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The denominator is the total variation in &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt;; the numerator is the&lt;br /&gt;
variation explained by the regression line. The value is the&lt;br /&gt;
proportion of variance in &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt; that is explained by the linear&lt;br /&gt;
relationship between &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
</feed>