Principal Component Analysis (PCA)

Principal Component Analysis (PCA) summarizes the information in a data set described by multiple variables by reducing its dimensionality. It transforms the initial variables into a new, smaller set of variables while retaining the most important information in the original data set. These new variables, called principal components, are linear combinations of the original variables.

Key Goals of PCA:

PCA is particularly useful when the variables in the data set are highly correlated. Correlation indicates redundancy in the data. Because of this, PCA can reduce the original variables into a smaller number of new variables (principal components) that explain most of the variance in the original variables.
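
The link between correlation and redundancy can be illustrated with two variables. The sketch below is only an illustration, not the tool's implementation: it takes the Murder and Assault values for the first ten states of the USAarrests example data used in this document, computes their Pearson correlation r, and then uses the fact that the correlation matrix of two standardized variables has eigenvalues 1 + |r| and 1 - |r|, so the first principal component carries a share (1 + |r|) / 2 of the total variance. The function names `pearson` and `first_component_share` are hypothetical, and the ten-state subset is used only to keep the example short.

```python
from math import sqrt

# Murder and Assault for the first ten states (Alabama to Georgia)
# of the example dataset; a subset chosen only for brevity.
murder = [13.2, 10.0, 8.1, 8.8, 9.0, 7.9, 3.3, 5.9, 15.4, 17.4]
assault = [236, 263, 294, 190, 276, 204, 110, 238, 335, 211]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def first_component_share(r):
    """Share of total variance on PC1 for two standardized variables.

    Their correlation matrix [[1, r], [r, 1]] has eigenvalues
    1 + |r| and 1 - |r|, which sum to 2.
    """
    return (1 + abs(r)) / 2


r = pearson(murder, assault)
share = first_component_share(r)
print(f"r = {r:.3f}, share of variance on PC1 = {share:.3f}")
```

With r = 0 the two variables share nothing and the first component holds exactly half of the variance; as |r| approaches 1 a single component captures almost all of it, which is the redundancy PCA exploits.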

Example Dataset: USAarrests

The USAarrests dataset (distributed with R under the name USArrests) contains crime statistics for the 50 US states in 1973. It has four variables: Murder, Assault, and Rape, each recorded as arrests per 100,000 residents, and UrbanPop, the percentage of the population living in urban areas. The dataset is commonly used in PCA demonstrations due to its small size and highly correlated variables. Source: Kaggle. The full dataset is listed below.

State           Murder  Assault  UrbanPop  Rape
Alabama         13.2    236      58        21.2
Alaska          10      263      48        44.5
Arizona         8.1     294      80        31
Arkansas        8.8     190      50        19.5
California      9       276      91        40.6
Colorado        7.9     204      78        38.7
Connecticut     3.3     110      77        11.1
Delaware        5.9     238      72        15.8
Florida         15.4    335      80        31.9
Georgia         17.4    211      60        25.8
Hawaii          5.3     46       83        20.2
Idaho           2.6     120      54        14.2
Illinois        10.4    249      83        24
Indiana         7.2     113      65        21
Iowa            2.2     56       57        11.3
Kansas          6       115      66        18
Kentucky        9.7     109      52        16.3
Louisiana       15.4    249      66        22.2
Maine           2.1     83       51        7.8
Maryland        11.3    300      67        27.8
Massachusetts   4.4     149      85        16.3
Michigan        12.1    255      74        35.1
Minnesota       2.7     72       66        14.9
Mississippi     16.1    259      44        17.1
Missouri        9       178      70        28.2
Montana         6       109      53        16.4
Nebraska        4.3     102      62        16.5
Nevada          12.2    252      81        46
New Hampshire   2.1     57       56        9.5
New Jersey      7.4     159      89        18.8
New Mexico      11.4    285      70        32.1
New York        11.1    254      86        26.1
North Carolina  13      337      45        16.1
North Dakota    0.8     45       44        7.3
Ohio            7.3     120      75        21.4
Oklahoma        6.6     151      68        20
Oregon          4.9     159      67        29.3
Pennsylvania    6.3     106      72        14.9
Rhode Island    3.4     174      87        8.3
South Carolina  14.4    279      48        22.5
South Dakota    3.8     86       45        12.8
Tennessee       13.2    188      59        26.9
Texas           12.7    201      80        25.5
Utah            3.2     120      80        22.9
Vermont         2.2     48       32        11.2
Virginia        8.5     156      63        20.7
Washington      4       145      73        26.2
West Virginia   5.7     81       39        9.3
Wisconsin       2.6     53       66        10.8
Wyoming         6.8     161      60        15.6

Applications of PCA:

PCA is widely used in various fields including agriculture, biology, economics, and image processing, where it helps reduce data complexity while preserving important trends and patterns. In agricultural research, PCA can be used to analyze traits in plants, categorize environmental effects, and explore relationships among different characters in crops. Similarly, it is employed in genetics to study gene expression data and in economics for stock market analysis.

Steps to Perform PCA:

Step 1: Enter Data

Enter or paste the data for PCA in the text area under the heading "Please Enter or Paste Data". Arrange the data so that each line holds one observation: the first line contains the first observation of every character (variable), separated by spaces, the second line contains the second observations, and so on. Do not include character names in this area; enter them in the separate text area under the heading "Enter Character Names". Keep character names short and list them in the same order as the columns of the data.

Example Input for USAarrests Dataset

To use the USAarrests dataset in our tool, you would enter the data like this:

13.2	236	58	21.2
10	263	48	44.5
8.1	294	80	31
8.8	190	50	19.5
9	276	91	40.6
7.9	204	78	38.7
3.3	110	77	11.1
5.9	238	72	15.8
15.4	335	80	31.9
17.4	211	60	25.8
5.3	46	83	20.2
2.6	120	54	14.2
10.4	249	83	24
7.2	113	65	21
2.2	56	57	11.3
6	115	66	18
9.7	109	52	16.3
15.4	249	66	22.2
2.1	83	51	7.8
11.3	300	67	27.8
4.4	149	85	16.3
12.1	255	74	35.1
2.7	72	66	14.9
16.1	259	44	17.1
9	178	70	28.2
6	109	53	16.4
4.3	102	62	16.5
12.2	252	81	46
2.1	57	56	9.5
7.4	159	89	18.8
11.4	285	70	32.1
11.1	254	86	26.1
13	337	45	16.1
0.8	45	44	7.3
7.3	120	75	21.4
6.6	151	68	20
4.9	159	67	29.3
6.3	106	72	14.9
3.4	174	87	8.3
14.4	279	48	22.5
3.8	86	45	12.8
13.2	188	59	26.9
12.7	201	80	25.5
3.2	120	80	22.9
2.2	48	32	11.2
8.5	156	63	20.7
4	145	73	26.2
5.7	81	39	9.3
2.6	53	66	10.8
6.8	161	60	15.6

And enter the character names as:

Murder 
Assault 
UrbanPop 
Rape

This input format allows you to easily apply PCA to real-world datasets like USAarrests and gain valuable insights into your data.
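
Before pasting, it can help to check that the data matches this layout. The following is a minimal sketch of such a check, assuming values are separated by whitespace and every line must contain one value per character name. The function `parse_observations` is a hypothetical helper written for this illustration and is not part of the tool.

```python
def parse_observations(data_text, names_text):
    """Parse whitespace-separated observations and character names.

    Returns (names, rows), where rows is a list of lists of floats.
    Raises ValueError if a line does not hold one value per character.
    """
    names = names_text.split()
    rows = []
    for line_number, line in enumerate(data_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        values = [float(token) for token in line.split()]
        if len(values) != len(names):
            raise ValueError(
                f"line {line_number}: expected {len(names)} values, "
                f"got {len(values)}"
            )
        rows.append(values)
    return names, rows


# First three observations from the example input above.
data_text = """13.2 236 58 21.2
10 263 48 44.5
8.1 294 80 31"""
names_text = "Murder Assault UrbanPop Rape"

names, rows = parse_observations(data_text, names_text)
print(len(names), "characters,", len(rows), "observations")
# prints: 4 characters, 3 observations
```

The two counts printed at the end are the same quantities the tool asks for after the data is submitted.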

Step 2: Provide Information

After you submit the data, the module displays another webpage asking for the number of characters and the number of observations per character. Enter these values in the text boxes provided and click "Analyse". For the USAarrests example, there are 4 characters and 50 observations per character.

Step 3: View Results

Once the data is analyzed, the results of the PCA will be displayed on a new page, showing the principal components and their contribution to the variance.
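
To give a sense of how such results can be derived, here is a rough sketch in plain Python. It is not the tool's implementation, and several choices in it are assumptions: it uses only the first ten observations of the example data to stay short, it works on the correlation matrix (that is, on standardized variables), and it finds eigenvalues and eigenvectors by power iteration with deflation, a simple technique swapped in here where a library eigen-solver would normally be used. The names `correlation_matrix` and `eigen_decompose` are hypothetical.

```python
from math import sqrt

names = ["Murder", "Assault", "UrbanPop", "Rape"]
# First ten observations of the example data (a subset, for brevity).
rows = [
    [13.2, 236, 58, 21.2], [10.0, 263, 48, 44.5], [8.1, 294, 80, 31.0],
    [8.8, 190, 50, 19.5], [9.0, 276, 91, 40.6], [7.9, 204, 78, 38.7],
    [3.3, 110, 77, 11.1], [5.9, 238, 72, 15.8], [15.4, 335, 80, 31.9],
    [17.4, 211, 60, 25.8],
]


def correlation_matrix(rows):
    """Correlation matrix of the columns of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    # Sample standard deviation (divisor n - 1) of each column.
    sds = [sqrt(sum((row[j] - means[j]) ** 2 for row in rows) / (n - 1))
           for j in range(p)]
    z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)]
            for a in range(p)]


def eigen_decompose(matrix, iterations=2000):
    """Eigenvalues and unit eigenvectors of a symmetric positive
    semi-definite matrix, largest eigenvalue first.

    Power iteration with deflation; assumes distinct eigenvalues.
    """
    p = len(matrix)
    work = [row[:] for row in matrix]  # copy, so the input is not modified
    values, vectors = [], []
    for k in range(p):
        v = [1.0 / (i + k + 1) for i in range(p)]  # arbitrary non-zero start
        start_norm = sqrt(sum(x * x for x in v))
        v = [x / start_norm for x in v]
        for _ in range(iterations):
            w = [sum(work[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # nothing left to extract
            v = [x / norm for x in w]
        wv = [sum(work[i][j] * v[j] for j in range(p)) for i in range(p)]
        lam = sum(v[i] * wv[i] for i in range(p))  # Rayleigh quotient
        values.append(lam)
        vectors.append(v)
        # Deflation: remove the component just found.
        work = [[work[i][j] - lam * v[i] * v[j] for j in range(p)]
                for i in range(p)]
    return values, vectors


values, vectors = eigen_decompose(correlation_matrix(rows))
total = sum(values)
cumulative = 0.0
for k, (value, vector) in enumerate(zip(values, vectors), start=1):
    share = value / total
    cumulative += share
    loadings = ", ".join(f"{n}={w:+.2f}" for n, w in zip(names, vector))
    print(f"PC{k}: eigenvalue={value:.3f} share={share:.1%} "
          f"cumulative={cumulative:.1%} [{loadings}]")
```

Each eigenvalue divided by the sum of all eigenvalues gives that component's contribution to the total variance, and the eigenvector entries are the weights of the original characters in the component. The sign of an eigenvector is arbitrary, and numbers from this ten-row subset will differ from results on the full 50-state data.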
