1. CA Group Regressions

## [1] "There are 121993 individuals without missing data in this analysis."
##                                         Std Beta   SE         p
## (Intercept)                                 0.02 0.00  1.35e-04
## CA_GroupHigh_CA                             0.01 0.01  3.36e-01
## CA_GroupLow_CA                             -0.12 0.03  7.45e-06
## scale(max_age_MHQ)                          0.07 0.00 1.72e-111
## Sex                                         0.02 0.01  2.73e-03
## I(scale(max_age_MHQ)^2)                    -0.02 0.00  1.65e-10
## CA_GroupHigh_CA:scale(max_age_MHQ)          0.00 0.01  8.32e-01
## CA_GroupLow_CA:scale(max_age_MHQ)           0.07 0.02  2.40e-04
## CA_GroupHigh_CA:Sex                        -0.05 0.02  1.86e-02
## CA_GroupLow_CA:Sex                          0.01 0.04  7.81e-01
## scale(max_age_MHQ):Sex                     -0.06 0.01  1.20e-25
## CA_GroupHigh_CA:I(scale(max_age_MHQ)^2)     0.01 0.01  3.94e-01
## CA_GroupLow_CA:I(scale(max_age_MHQ)^2)     -0.01 0.02  5.52e-01
## CA_GroupHigh_CA:scale(max_age_MHQ):Sex     -0.01 0.02  6.65e-01
## CA_GroupLow_CA:scale(max_age_MHQ):Sex       0.02 0.04  5.92e-01
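The table above can be reproduced by a model of roughly the following form, reconstructed here from the coefficient names alone. The data frame, the outcome column (`neuroticism`), the reference CA group label (`"Mid_CA"`), and the simulated data are all illustrative assumptions, not taken from the source:

```r
# Hedged sketch: simulate stand-in data with the column names from the table.
set.seed(1)
n <- 1000
dat <- data.frame(
  neuroticism = rnorm(n),
  CA_Group    = factor(sample(c("Mid_CA", "High_CA", "Low_CA"), n, replace = TRUE),
                       levels = c("Mid_CA", "High_CA", "Low_CA")),
  max_age_MHQ = runif(n, 45, 80),
  Sex         = rbinom(n, 1, 0.5)
)

# CA group, linear age, and sex interact fully; the quadratic age term
# interacts with CA group only -- this reproduces exactly the 15 rows of the
# printed table, in the same order.
fit_ca <- lm(scale(neuroticism) ~ CA_Group * scale(max_age_MHQ) * Sex +
               CA_Group * I(scale(max_age_MHQ)^2),
             data = dat)
round(summary(fit_ca)$coefficients, 2)
```

With the outcome standardised via `scale()`, the estimates are on the same standardised-beta scale as the table.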

2. Assumptions of the Regression Models

Please note that violations of the assumptions are likely due to the ordinal nature of the neuroticism score.

a) Linearity

If there is no pattern in the residual plot, we can assume a linear relationship between the predictors and the outcome variable.

b) Homogeneity of variance

The scale-location plot should show a roughly horizontal line with equally spread points.

c) Normality of residuals

The normal probability plot of residuals should approximately follow a straight line.

d) Outliers and high leverage points

If no standardized residuals exceed 3 standard deviations in absolute value, this check passes.

e) Influential values

A rule of thumb is that an observation has high influence if its Cook’s distance exceeds 4/(n - p - 1) (P. Bruce and Bruce 2017), where n is the number of observations and p the number of predictor variables. The Residuals vs Leverage plot helps identify influential observations, if any: outlying values generally appear in the upper-right or lower-right corners of the plot, where data points can exert strong influence on the regression line.

3. Regression with g-factor

## [1] "There are 121993 individuals without missing data in this analysis."
##                                   Std Beta   SE         p
## (Intercept)                           0.00 0.00  2.83e-01
## G_std                                 0.02 0.00  1.25e-12
## Sex                                   0.02 0.01  4.73e-03
## scale(max_age_MHQ)                    0.08 0.00 1.23e-134
## I(scale(max_age_MHQ)^2)              -0.02 0.00  4.20e-09
## G_std:Sex                            -0.01 0.01  1.76e-01
## G_std:scale(max_age_MHQ)             -0.01 0.00  2.39e-05
## Sex:scale(max_age_MHQ)               -0.06 0.01  7.18e-25
## G_std:I(scale(max_age_MHQ)^2)         0.00 0.00  4.41e-01
## Sex:I(scale(max_age_MHQ)^2)           0.00 0.01  4.29e-01
## G_std:Sex:scale(max_age_MHQ)         -0.01 0.00  2.15e-01
## G_std:Sex:I(scale(max_age_MHQ)^2)     0.00 0.00  4.33e-01
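As above, the g-factor model can be reconstructed from the coefficient names: `G_std`, `Sex`, and both age terms interact fully, which yields exactly the 12 rows of the printed table. The simulated data and the outcome name are illustrative assumptions:

```r
# Hedged sketch of the g-factor regression with stand-in data.
set.seed(1)
n <- 1000
dat <- data.frame(
  neuroticism = rnorm(n),
  G_std       = rnorm(n),
  max_age_MHQ = runif(n, 45, 80),
  Sex         = rbinom(n, 1, 0.5)
)

# G_std and Sex interact with both the linear and quadratic age terms.
fit_g <- lm(scale(neuroticism) ~ G_std * Sex * (scale(max_age_MHQ) +
                                                I(scale(max_age_MHQ)^2)),
            data = dat)
round(summary(fit_g)$coefficients, 2)
```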

4. Probability of having a phenotype as a function of the g-factor

## `geom_smooth()` using formula 'y ~ x'
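A plot of this kind, which emits the `geom_smooth()` message above, can be sketched as follows. The data, column names, and logistic-smooth choice are illustrative assumptions; only the use of `geom_smooth()` with `formula = y ~ x` is taken from the output:

```r
# Hedged sketch: probability of a binary phenotype as a function of the
# standardised g-factor, smoothed with a logistic (binomial GLM) fit.
library(ggplot2)

set.seed(2)
df <- data.frame(G_std = rnorm(1000))
df$pheno <- rbinom(1000, 1, plogis(-1 + 0.4 * df$G_std))

ggplot(df, aes(x = G_std, y = pheno)) +
  geom_smooth(method = "glm", method.args = list(family = binomial),
              formula = y ~ x) +
  labs(x = "g-factor (standardised)", y = "P(phenotype)")
```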