Health vs. Attitudinal Analysis

Unsupervised Learning: Canonical Correlation Analysis
Canonical Correlation Analysis (CCA) is a multivariate statistical technique for analyzing the correlations between two datasets. CCA can be used to model the correlations between two datasets in two ways:


For this analysis, we use CCA to measure the linear correlations between two sets of variables: the health dataset and the attitudinal dataset.

Using Wilks’ Lambda for CCA 1, we get a test statistic of 0.7999, an approximate F of 5.2003, a df1 of 20, a df2 of 1493.43, and a p-value of 7.8537e-13. Because the p-value is below 0.05, we reject the null hypothesis that CCA 1 equals zero.

Using Wilks’ Lambda for CCA 2, we get a test statistic of 0.9323, an approximate F of 2.67219, a df1 of 12, a df2 of 1193.53, and a p-value of 1.4986e-03. Because the p-value is below 0.05, we reject the null hypothesis that CCA 2 equals zero.
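The approximate F values can be cross-checked from Wilks’ Lambda and the degrees of freedom. The sketch below assumes the reported figures come from Rao's F approximation, which the original output does not state. The function name rao_f_from_wilks is hypothetical. The set sizes (5 health and 4 attitudinal variables for CCA 1, one fewer in each set for CCA 2) are taken from the canonical equations in this report.

```python
import math


def rao_f_from_wilks(wilks_lambda, p, q, df1, df2):
    """Convert Wilks' Lambda to Rao's approximate F (assumed method).

    p and q are the numbers of variables in each set still under test.
    """
    # Rao's scale factor; use 1 when the ratio is undefined (tiny p, q).
    denom = p**2 + q**2 - 5
    t = math.sqrt((p**2 * q**2 - 4) / denom) if denom > 0 else 1.0
    root = wilks_lambda ** (1.0 / t)
    return (1.0 - root) / root * (df2 / df1)


# Values reported in this section.
f_cca1 = rao_f_from_wilks(0.7999, p=5, q=4, df1=20, df2=1493.43)
f_cca2 = rao_f_from_wilks(0.9323, p=4, q=3, df1=12, df2=1193.53)

print(f"CCA 1 approximate F: {f_cca1:.3f}")
print(f"CCA 2 approximate F: {f_cca2:.3f}")
```

Under that assumption, both results should land close to the reported 5.2003 and 2.67219. Small differences are expected because the Lambda values are rounded to four decimals.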

As seen above, the first and strongest canonical correlation, CCA 1, between Health (X) and Attitudinal (Y) is a mere 0.3768, while the second, CCA 2, is a low 0.2471.
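To put these correlations in perspective, squaring a canonical correlation gives the share of variance that the pair of canonical variates have in common. The short sketch below computes that share for both reported correlations. It also checks that the reported Wilks’ Lambda for CCA 1 is consistent with (1 - r1^2) times the Lambda for CCA 2, which follows from Lambda being the product of (1 - r^2) over the remaining canonical correlations. The variable names are illustrative only.

```python
# Canonical correlations reported in this section.
canonical_correlations = {"CCA 1": 0.3768, "CCA 2": 0.2471}

# Squared canonical correlation = variance shared by the pair of variates.
shared_variance = {name: r**2 for name, r in canonical_correlations.items()}
for name, r2 in shared_variance.items():
    print(f"{name}: r^2 = {r2:.4f} ({r2:.1%} shared variance)")

# Consistency check against the reported Wilks' Lambda values:
# Lambda(CCA 1) = (1 - r1^2) * Lambda(CCA 2).
wilks_cca2 = 0.9323
wilks_cca1_rebuilt = (1 - canonical_correlations["CCA 1"] ** 2) * wilks_cca2
print(f"Rebuilt Wilks' Lambda for CCA 1: {wilks_cca1_rebuilt:.4f}")
```

The first pair of canonical variates shares roughly 14% of its variance and the second roughly 6%, which supports reading both relationships as weak even though they are statistically significant.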

As can be seen in the plots in the appendix, the scatterplots of the Y and X scores do not show a strong relationship for either CCA 1 or CCA 2.

For Health, the variable that contributes the most is mental health, with a score of 0.98 and a coefficient of 0.0123. It has a moderate correlation, 0.506, with physical health.

For Attitudinal, the variable that contributes the most is control, with a score of 0.78 and a coefficient of 0.022. It has a weak correlation, 0.347, with self-esteem.

In conclusion, although both canonical correlations are statistically significant, the canonical correlation between Health and Attitudinal is weak.

For CCA 1:

Health,

X = 0.0123*menheal-0.0016*phyheal-0.0004*timedrs+0.0019*attdrug-0.0006*druguse

Attitudinal,

Y = 0.022*control+0.0028*attmar+0.0028*esteem-0.0009*attrole

For CCA 2:

Health,

X = -0.001*menheal+0.0048*phyheal+0.0009*timedrs-0.035*attdrug+0.0039*druguse

Attitudinal,

Y = -0.0003*control+0.0024*attmar-0.0079*esteem+0.004*attrole
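The equations above can be applied to one observation to obtain its canonical variate scores, as in the minimal sketch below. The coefficients are copied from the CCA 1 equations. The respondent values are made up purely for illustration, and the helper canonical_score is a hypothetical name. The sketch also assumes the coefficients apply to the variables on the same scale used when fitting the model. If the data were centered or standardized first, the inputs would need the same treatment.

```python
# CCA 1 coefficients, copied from the equations above.
CCA1_HEALTH = {
    "menheal": 0.0123,
    "phyheal": -0.0016,
    "timedrs": -0.0004,
    "attdrug": 0.0019,
    "druguse": -0.0006,
}
CCA1_ATTITUDINAL = {
    "control": 0.022,
    "attmar": 0.0028,
    "esteem": 0.0028,
    "attrole": -0.0009,
}


def canonical_score(coefficients, observation):
    """Weighted sum of one observation's variables (a canonical variate score)."""
    return sum(coef * observation[name] for name, coef in coefficients.items())


# Hypothetical respondent; these values are NOT from the dataset.
respondent = {
    "menheal": 10, "phyheal": 5, "timedrs": 4, "attdrug": 8, "druguse": 6,
    "control": 7, "attmar": 20, "esteem": 15, "attrole": 35,
}

x_score = canonical_score(CCA1_HEALTH, respondent)
y_score = canonical_score(CCA1_ATTITUDINAL, respondent)
print(f"CCA 1 Health score X = {x_score:.4f}")
print(f"CCA 1 Attitudinal score Y = {y_score:.4f}")
```

Swapping in the CCA 2 coefficients gives the second pair of scores. Plotting X against Y across all respondents would give scatterplots of the kind the appendix refers to.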

Appendix