Correlation

 


Definition

 

Correlation refers to the extent to which two variables are linearly related. It describes the statistical relationship between two variables without implying causation.

 

Correlation is measured by correlation coefficients, which range from -1.0 to +1.0. A coefficient greater than zero indicates a positive correlation, where the values of the two variables tend to move in the same direction. Conversely, a coefficient less than zero indicates a negative correlation, where the value of one variable tends to decrease as the value of the other increases. A coefficient of zero indicates no linear relationship.
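As a quick illustration of how the sign of the coefficient tracks the direction of the relationship, here is a minimal R sketch. The vectors are made-up values chosen only for illustration and are not part of the data set used later on this page.

```r
# Made-up illustrative data (not from the Setosa data set)
x      <- c(1, 2, 3, 4, 5)
y_up   <- c(2, 4, 5, 4, 6)   # tends to rise as x rises
y_down <- c(9, 7, 6, 4, 1)   # tends to fall as x rises

r_up   <- cor(x, y_up)        # positive coefficient
r_down <- cor(x, y_down)      # negative coefficient
r_line <- cor(x, 3 * x + 1)   # points on an increasing straight line give +1

print(c(r_up = r_up, r_down = r_down, r_line = r_line))
```

The first two coefficients fall strictly between -1 and +1 because the points do not lie exactly on a line; only an exact linear relationship reaches the limits.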

Common measures

 

The most common measure is Pearson's correlation coefficient, denoted $\rho$ when applied to a population, which measures the strength and direction of the linear relationship between two variables. It is calculated as the covariance of the two variables divided by the product of their standard deviations:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$$

where $\operatorname{cov}(X, Y)$ is the covariance between variables $X$ and $Y$, and $\sigma_X$ and $\sigma_Y$ are their standard deviations.

 

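Given sample data, $\rho$ can be estimated by the sample correlation coefficient $r$:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

where $x_i$ and $y_i$ are the observed values in the data, $\bar{x}$ and $\bar{y}$ are the sample means, and $n$ is the sample size.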
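The sample estimate can be checked by hand: sum the products of the deviations from the two sample means, then divide by the square root of the product of the two sums of squared deviations. The R sketch below does this on a few made-up values (an assumption for illustration, not the full data set) and compares the result with the covariance form and with R's built-in cor().

```r
# Made-up illustrative values (assumption: not the article's full data set)
x <- c(50, 46, 51, 55, 48, 52)
y <- c(14, 14, 17, 13, 16, 14)

x_bar <- mean(x)   # sample mean of x
y_bar <- mean(y)   # sample mean of y

# Sum of cross-products of deviations, over the root of the
# product of the two sums of squared deviations
numerator   <- sum((x - x_bar) * (y - y_bar))
denominator <- sqrt(sum((x - x_bar)^2) * sum((y - y_bar)^2))
r_manual    <- numerator / denominator

# Equivalent form: sample covariance over the product of sample standard deviations
r_cov <- cov(x, y) / (sd(x) * sd(y))

# Built-in estimate for comparison
r_builtin <- cor(x, y, method = "pearson")

print(c(manual = r_manual, cov_form = r_cov, builtin = r_builtin))
```

All three values agree up to floating-point rounding, because the n - 1 factors in the sample covariance and the sample standard deviations cancel.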

 

 

Example Code in SAS

data Setosa;
   input SepalLength SepalWidth PetalLength PetalWidth @@;
   label sepallength='Sepal Length in mm.'
         sepalwidth='Sepal Width in mm.'
         petallength='Petal Length in mm.'
         petalwidth='Petal Width in mm.';
   datalines;
50 33 14 02  46 34 14 03  46 36 .  02
51 33 17 05  55 35 13 02  48 31 16 02
52 34 14 02  49 36 14 01  44 32 13 02
50 35 16 06  44 30 13 02  47 32 16 02
48 30 14 03  51 38 16 02  48 34 19 02
50 30 16 02  50 32 12 02  43 30 11 .
58 40 12 02  51 38 19 04  49 30 14 02
51 35 14 02  50 34 16 04  46 32 14 02
57 44 15 04  50 36 14 02  54 34 15 04
52 41 15 .   55 42 14 02  49 31 15 02
54 39 17 04  50 34 15 02  44 29 14 02
47 32 13 02  46 31 15 02  51 34 15 02
50 35 13 03  49 31 15 01  54 37 15 02
54 39 13 04  51 35 14 03  48 34 16 02
48 30 14 01  45 23 13 03  57 38 17 03
51 38 15 03  54 34 17 02  51 37 15 04
52 35 15 02  53 37 15 02
;

 

PROC CORR data = Setosa;
   VAR sepallength petallength;
RUN;

 

Example Code in R

# input data (NA marks a missing measurement)
sepal_length <- c(50, 46, 46, 51, 55, 48, 52, 49, 44, 50, 44, 47, 48, 51, 48,
                  50, 43, 58, 51, 49, 51, 50, 46, 57, 50, 54, 52, 55, 49, 54,
                  50, 44, 57, 46, 51, 50, 49, 54, 54, 51, 48, 48, 45, 57, 51,
                  54, 51, 52, 53, 55)
sepal_width <- c(33, 34, 36, 33, 35, 31, 34, 36, 32, 35, 30, 32, 30, 38, 34,
                 30, 32, 30, 40, 38, 30, 35, 34, 32, 55, 36, 34, 41, 42, 31,
                 39, 34, 29, 32, 31, 34, 35, 31, 37, 39, 35, 34, 30, 23, 38,
                 38, 34, 37, 35, 37)
petal_length <- c(14, 14, NA, 17, 13, 16, 14, 14, 13, 16, 13, 16, 14, 16, 19,
                  16, 12, 11, 12, 19, 14, 14, 16, 14, 15, 14, 15, 15, 14, 15,
                  17, 15, 14, 13, 15, 15, 13, 15, 15, 13, 14, 16, 14, 13, 17,
                  15, 17, 15, 15, 15)
petal_width <- c(02, 03, 02, 05, 02, 02, 02, 01, 02, 06, 02, 02, 03, 02, 02,
                 02, 02, NA, 02, 04, 02, 02, 04, 02, 04, 02, 04, NA, 02, 02,
                 04, 02, 02, 02, 02, 02, 03, 01, 02, 04, 03, 02, 01, 03, 03,
                 03, 02, 04, 02, 02)

 

 

# create a data frame

df <- data.frame(sepal_length, sepal_width, petal_length, petal_width)

 

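# output the correlation matrix; the data contain NA values, so each
# coefficient is computed from the pairs where both variables are observed
# (without the use argument, any pair involving a missing value returns NA)
cor(df, method = "pearson", use = "pairwise.complete.obs")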
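Because the data contain missing values, the way cor() treats them matters, and cor() alone reports no significance test. The sketch below, on made-up values with one missing observation (an assumption for illustration only), contrasts the default behaviour with use = "complete.obs" and shows cor.test(), which drops incomplete pairs itself and also returns a p-value and confidence interval.

```r
# Made-up illustrative data with one missing value (not the Setosa data set)
a <- c(50, 46, 51, 55, 48, 52)
b <- c(14, NA, 17, 13, 16, 14)

# Default (use = "everything"): a missing value propagates to the result
r_default <- cor(a, b)

# Drop the incomplete pair before computing the coefficient
r_complete <- cor(a, b, use = "complete.obs")

# cor.test() removes incomplete pairs and adds a test of zero correlation
test <- cor.test(a, b, method = "pearson")

print(r_default)
print(r_complete)
print(test$p.value)
```

With only five complete pairs the test has very little power; the point here is the mechanics, not the result.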

 

 

References

1.     Croxton, Frederick Emory; Cowden, Dudley Johnstone; Klein, Sidney (1968) Applied General Statistics, Pitman. ISBN 9780273403159 (page 625)

2.     Dietrich, Cornelius Frank (1991) Uncertainty, Calibration and Probability: The Statistics of Scientific and Industrial Measurement, 2nd Edition, A. Hilger. ISBN 9780750300605 (page 331)

3.     Aitken, Alexander Craig (1957) Statistical Mathematics, 8th Edition. Oliver & Boyd. ISBN 9780050013007 (page 95)

4.     SAS Help Center. (2021). The CORR Procedure. https://documentation.sas.com/doc/en/pgmsascdc/v_010/procstat/procstat_corr_examples02.htm