Principal Component Analysis (PCA) summarizes the information in a data set described by multiple variables. It reduces the dimensionality of data containing a large number of variables by transforming the initial variables into a new, smaller set of variables while retaining most of the important information in the original data set. These new variables, called principal components, are linear combinations of the original variables.
PCA is particularly useful when the variables in the data set are highly correlated. Correlation indicates redundancy in the data. Because of this, PCA can reduce the original variables into a smaller number of new variables (principal components) that explain most of the variance in the original variables.
The USArrests dataset contains crime statistics for the 50 US states in 1973, with four variables: Murder, Assault, UrbanPop, and Rape. It is commonly used in Principal Component Analysis (PCA) demonstrations because of its small size and highly correlated variables. Source: Kaggle. The full dataset is listed below.
State Murder Assault UrbanPop Rape
Alabama 13.2 236 58 21.2
Alaska 10 263 48 44.5
Arizona 8.1 294 80 31
Arkansas 8.8 190 50 19.5
California 9 276 91 40.6
Colorado 7.9 204 78 38.7
Connecticut 3.3 110 77 11.1
Delaware 5.9 238 72 15.8
Florida 15.4 335 80 31.9
Georgia 17.4 211 60 25.8
Hawaii 5.3 46 83 20.2
Idaho 2.6 120 54 14.2
Illinois 10.4 249 83 24
Indiana 7.2 113 65 21
Iowa 2.2 56 57 11.3
Kansas 6 115 66 18
Kentucky 9.7 109 52 16.3
Louisiana 15.4 249 66 22.2
Maine 2.1 83 51 7.8
Maryland 11.3 300 67 27.8
Massachusetts 4.4 149 85 16.3
Michigan 12.1 255 74 35.1
Minnesota 2.7 72 66 14.9
Mississippi 16.1 259 44 17.1
Missouri 9 178 70 28.2
Montana 6 109 53 16.4
Nebraska 4.3 102 62 16.5
Nevada 12.2 252 81 46
NewHampshire 2.1 57 56 9.5
NewJersey 7.4 159 89 18.8
NewMexico 11.4 285 70 32.1
NewYork 11.1 254 86 26.1
NorthCarolina 13 337 45 16.1
NorthDakota 0.8 45 44 7.3
Ohio 7.3 120 75 21.4
Oklahoma 6.6 151 68 20
Oregon 4.9 159 67 29.3
Pennsylvania 6.3 106 72 14.9
RhodeIsland 3.4 174 87 8.3
SouthCarolina 14.4 279 48 22.5
SouthDakota 3.8 86 45 12.8
Tennessee 13.2 188 59 26.9
Texas 12.7 201 80 25.5
Utah 3.2 120 80 22.9
Vermont 2.2 48 32 11.2
Virginia 8.5 156 63 20.7
Washington 4 145 73 26.2
WestVirginia 5.7 81 39 9.3
Wisconsin 2.6 53 66 10.8
Wyoming 6.8 161 60 15.6
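The statement that these variables are correlated can be checked directly. The following is a minimal sketch, assuming NumPy is available; it uses only the first ten rows of the table above (Alabama to Georgia) to stay short, so the values it prints describe that subset rather than the full dataset.

```python
import numpy as np

# First ten rows of the table above (Alabama to Georgia); a subset for brevity.
# Columns: Murder, Assault, UrbanPop, Rape
data = np.array([
    [13.2, 236, 58, 21.2],
    [10.0, 263, 48, 44.5],
    [8.1, 294, 80, 31.0],
    [8.8, 190, 50, 19.5],
    [9.0, 276, 91, 40.6],
    [7.9, 204, 78, 38.7],
    [3.3, 110, 77, 11.1],
    [5.9, 238, 72, 15.8],
    [15.4, 335, 80, 31.9],
    [17.4, 211, 60, 25.8],
])
names = ["Murder", "Assault", "UrbanPop", "Rape"]

# rowvar=False tells NumPy that each column (not each row) is a variable.
corr = np.corrcoef(data, rowvar=False)

# Print the 4 x 4 Pearson correlation matrix, one labelled row per variable.
for name, row in zip(names, corr):
    print(f"{name:>8}", " ".join(f"{value:6.2f}" for value in row))
```

Off-diagonal entries close to +1 or -1 indicate pairs of variables that carry overlapping information, which is the redundancy PCA exploits.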
PCA is widely used in various fields including agriculture, biology, economics, and image processing, where it helps reduce data complexity while preserving important trends and patterns. In agricultural research, PCA can be used to analyze traits in plants, categorize environmental effects, and explore relationships among different characters in crops. Similarly, it is employed in genetics to study gene expression data and in economics for stock market analysis.
Enter or paste the data for PCA in the text area under the heading "Please Enter or Paste Data". Arrange the data so that each line holds one observation: the first line contains the first observation of every character (variable), separated by spaces, the second line contains the second observation, and so on. Do not include character names in this area; enter them in the separate text area under the heading "Enter Character Names". Character names should be short and listed in the same order as the corresponding columns of the data.
To use the USArrests dataset in our tool, you would enter the data like this:
13.2 236 58 21.2
10 263 48 44.5
8.1 294 80 31
8.8 190 50 19.5
9 276 91 40.6
7.9 204 78 38.7
3.3 110 77 11.1
5.9 238 72 15.8
15.4 335 80 31.9
17.4 211 60 25.8
5.3 46 83 20.2
2.6 120 54 14.2
10.4 249 83 24
7.2 113 65 21
2.2 56 57 11.3
6 115 66 18
9.7 109 52 16.3
15.4 249 66 22.2
2.1 83 51 7.8
11.3 300 67 27.8
4.4 149 85 16.3
12.1 255 74 35.1
2.7 72 66 14.9
16.1 259 44 17.1
9 178 70 28.2
6 109 53 16.4
4.3 102 62 16.5
12.2 252 81 46
2.1 57 56 9.5
7.4 159 89 18.8
11.4 285 70 32.1
11.1 254 86 26.1
13 337 45 16.1
0.8 45 44 7.3
7.3 120 75 21.4
6.6 151 68 20
4.9 159 67 29.3
6.3 106 72 14.9
3.4 174 87 8.3
14.4 279 48 22.5
3.8 86 45 12.8
13.2 188 59 26.9
12.7 201 80 25.5
3.2 120 80 22.9
2.2 48 32 11.2
8.5 156 63 20.7
4 145 73 26.2
5.7 81 39 9.3
2.6 53 66 10.8
6.8 161 60 15.6
And enter the character names as:
Murder Assault UrbanPop Rape
The same input format works for any numeric dataset: one observation per line in the data area, and the character names entered separately in matching order.
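If your data starts out as a labelled table (a header row plus a label in the first column), the two text blocks described above can be prepared with a short script. This is a minimal sketch, not part of the tool: the function name to_tool_input is hypothetical, and it assumes whitespace-separated values and row labels without spaces (for example NewHampshire).

```python
def to_tool_input(table_text):
    """Split a labelled table into (names_line, data_block) ready for pasting.

    Assumes: the first row is a header, the first column holds row labels,
    and all remaining cells are numeric and whitespace-separated.
    """
    rows = [line.split() for line in table_text.strip().splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    names = header[1:]  # drop the heading of the label column

    data_lines = []
    for row in body:
        values = row[1:]  # drop the row label (e.g. the state name)
        if len(values) != len(names):
            raise ValueError(f"row {row[0]!r} has {len(values)} values, expected {len(names)}")
        for value in values:
            float(value)  # raises ValueError if a cell is not numeric
        data_lines.append(" ".join(values))

    return " ".join(names), "\n".join(data_lines)


# Three rows from the dataset above, used as a small sample.
sample = """
State Murder Assault UrbanPop Rape
Alabama 13.2 236 58 21.2
Alaska 10 263 48 44.5
Arizona 8.1 294 80 31
"""

names_line, data_block = to_tool_input(sample)
print(names_line)   # paste under "Enter Character Names"
print(data_block)   # paste under "Please Enter or Paste Data"

# Counts that are handy to have at hand when submitting the data.
n_characters = len(names_line.split())
n_observations = len(data_block.splitlines())
print(n_characters, n_observations)
```

The script also reports the number of characters and the number of observations, so both figures can be checked against the pasted data before analysis.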
After submitting the data, the module will display another webpage asking for the number of characters and the number of observations per character. Enter the required information in the provided text boxes and click "Analyse".
Once the data is analyzed, the results of the PCA will be displayed on a new page, showing the principal components and their contribution to the variance.
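To cross-check results of this kind offline, the sketch below computes principal components and their share of the variance with NumPy. It assumes PCA on standardized variables (that is, on the correlation matrix); the source does not state whether the tool standardizes the data, so its figures may differ if it works from the covariance matrix instead. The function name pca_correlation is hypothetical, and only the first ten rows of the dataset are used to keep the example short.

```python
import numpy as np

def pca_correlation(data):
    """PCA via eigendecomposition of the correlation matrix (an assumed method)."""
    x = np.asarray(data, dtype=float)
    # Standardize each column: zero mean, unit sample standard deviation.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # For standardized data this product is the correlation matrix.
    corr = (z.T @ z) / (len(z) - 1)
    # eigh is intended for symmetric matrices; it returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()      # share of total variance per component
    scores = z @ eigvecs                     # observations in component coordinates
    return eigvals, eigvecs, explained, scores


# First ten rows of the dataset (Alabama to Georgia); a subset for brevity.
# Columns: Murder, Assault, UrbanPop, Rape
data = [
    [13.2, 236, 58, 21.2],
    [10.0, 263, 48, 44.5],
    [8.1, 294, 80, 31.0],
    [8.8, 190, 50, 19.5],
    [9.0, 276, 91, 40.6],
    [7.9, 204, 78, 38.7],
    [3.3, 110, 77, 11.1],
    [5.9, 238, 72, 15.8],
    [15.4, 335, 80, 31.9],
    [17.4, 211, 60, 25.8],
]

eigvals, eigvecs, explained, scores = pca_correlation(data)
for i, (value, share) in enumerate(zip(eigvals, explained), start=1):
    print(f"PC{i}: eigenvalue = {value:.3f}, variance explained = {share:.1%}")
```

The sign of each eigenvector is arbitrary, so loadings from another program may appear with flipped signs while describing the same components.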