Reference trait analysis is a method for relating incompatible phenotypes. For example, in the study of brain gene expression predisposing drug self-administration behaviors in mice, it is impossible to measure drug-naive brain gene expression and drug self-administration behaviors in the same individual mice.
In this example we are going succinctly demonstrate how the method works using gene expression and anxiety data from N=258 Diversity Outbred mice studied in Logan et al. 2013. In this example dataset, measurements of two separate sets of anxiety-related traits and hippocampal gene expression profiles have been collected for each of the 258 mice. Since the anxiety behaviors and hippocampal gene expression are not incompatible measures, we can test how well the reference trait analysis method performs.
Code for this analysis is available at this github repository.
# Open-field and light-dark box testing.
# See https://www.ncbi.nlm.nih.gov/pubmed/23433259 for phenotyping details
of <- read_tsv("open_field_Logan_2013.tsv", col_types=cols(mouse_id="c", .default='d'))
ld <- read_tsv("light_dark_Logan_2013.tsv", col_types=cols(mouse_id="c", .default='d'))
# Hippocampal gene expression data (RNA-Seq)
# Data for 17,539 genes in 258 mice.
# These data have been upper quartile normalized and rankZ transformed.
expr <- read.csv("hippo_gene_expr.csv.gz", row.names=1)
We transform some of the phenotypes with log
and sqrt
to more closely achieve approximate normality:
sqrt_cols <- c("Percent_Time_Center", "Percent_Time_Periphery", "PercentTime_Mobile")
of[, sqrt_cols] <- sqrt(of[, sqrt_cols])
of[, "Total_Distance_cm"] <- log(of[, "Total_Distance_cm"])
Let’s make a few plots to look at the data.
# Make plots of open-field arena data:
of_long <- gather(of, "phenotype", "value", -mouse_id)
ggplot(of_long) + geom_histogram(aes(x=value), bins=50, color="gray") +
facet_wrap(~ phenotype, scales='free') + theme_bw(base_size=16)
# Make plots of light-dark box data:
ld_long <- gather(ld, "phenotype", "value", -mouse_id)
ggplot(ld_long) + geom_histogram(aes(x=value), bins=50, color='gray') +
facet_wrap(~ phenotype, scales='free') + theme_bw(base_size=16)
All phenotypes look roughly bell shaped.
We also regress out the effect of sex in an attempt to mitigate this known covariate that is not of primary interest.
remove_sex <- function(df) {
residuals(lm(value ~ sex, data=df))
}
ld_corrected <- mutate(ld_long, sex=substring(mouse_id, 1, 1)) %>%
group_by(phenotype) %>% nest() %>%
mutate(resid=map(data, remove_sex)) %>%
unnest(data, resid) %>%
select(mouse_id, phenotype, resid) %>%
rename("value"="resid")
of_corrected <- mutate(of_long, sex=substring(mouse_id, 1, 1)) %>%
group_by(phenotype) %>% nest() %>%
mutate(resid=map(data, remove_sex)) %>%
unnest(data, resid) %>%
select(mouse_id, phenotype, resid) %>%
rename("value"="resid")
We call light-dark box traits the reference traits and open-field traits the target traits. In this example walkthrough, we pretend that we lack gene expression data for N/2=129 animals and we lack open-field data for the other 129 animals.
set.seed(1)
grp1 <- sort(sample(1:258, size=129, replace=FALSE)) # training group
grp2 <- setdiff(1:258, grp1) # test group
mouse_ids <- sort(of$mouse_id)
grp1_ids <- mouse_ids[grp1]
grp2_ids <- mouse_ids[grp2]
of_train <- filter(of, mouse_id %in% grp1_ids) %>% select(-mouse_id)
ld_train <- filter(ld, mouse_id %in% grp1_ids) %>% select(-mouse_id)
ccor <- cancor(of_train, ld_train, xcenter=TRUE, ycenter=TRUE)
Let’s confirm the correlation for the first canonical variables:
ccor$cor[1]
## [1] 0.6992908
of_canonical_variable1 <- as.matrix(of_train) %*% ccor$xcoef[, 1]
ld_canonical_variable1 <- as.matrix(ld_train) %*% ccor$ycoef[, 1]
# sanity check:
are_equal(ccor$cor[1], cor(of_canonical_variable1, ld_canonical_variable1)[1])
## [1] TRUE
Here we use the canonical correlation model built above to get projected reference traits. Specifically, the model above gives us weights that we can apply to the reference trait data measured on grp2
animals to construct projected traits.
ld_test <- filter(ld, mouse_id %in% grp2_ids) %>% select(-mouse_id)
projected_ref_traits <- as.matrix(ld_test) %*% ccor$ycoef[, 1]
# compare to projection of target traits (open-field)
# (impossible in typical data but possible here because all animals have all data)
of_test <- filter(of, mouse_id %in% grp2_ids) %>% select(-mouse_id)
projected_target_traits <- as.matrix(of_test) %*% ccor$xcoef[, 1]
cor(projected_ref_traits, projected_target_traits)[1]
## [1] 0.6110526
We may compare variation in projected reference traits to transcript abundance measurements:
expr_grp2 <- expr[, grp2_ids]
exprcors <- cor(t(expr_grp2), projected_ref_traits)
head(exprcors)
## [,1]
## ENSMUSG00000000001 -0.02751342
## ENSMUSG00000000028 0.09268146
## ENSMUSG00000000031 0.08498029
## ENSMUSG00000000037 -0.11694770
## ENSMUSG00000000049 0.12843627
## ENSMUSG00000000056 0.10435831
devtools::session_info()
## Session info -------------------------------------------------------------
## setting value
## version R version 3.5.0 (2018-04-23)
## system x86_64, darwin15.6.0
## ui X11
## language (EN)
## collate en_US.UTF-8
## tz America/New_York
## date 2018-06-14
## Packages -----------------------------------------------------------------
## package * version date source
## assertthat * 0.2.0 2017-04-11 CRAN (R 3.5.0)
## backports 1.1.2 2017-12-13 CRAN (R 3.5.0)
## base * 3.5.0 2018-04-24 local
## bindr 0.1.1 2018-03-13 CRAN (R 3.5.0)
## bindrcpp * 0.2.2 2018-03-29 CRAN (R 3.5.0)
## broom 0.4.4 2018-03-29 CRAN (R 3.5.0)
## cellranger 1.1.0 2016-07-27 CRAN (R 3.5.0)
## cli 1.0.0 2017-11-05 CRAN (R 3.5.0)
## colorspace 1.3-2 2016-12-14 CRAN (R 3.5.0)
## compiler 3.5.0 2018-04-24 local
## crayon 1.3.4 2017-09-16 CRAN (R 3.5.0)
## das * 0.0.0.9000 2018-05-09 Github (daskelly/das@0a8f8f5)
## datasets * 3.5.0 2018-04-24 local
## devtools 1.13.5 2018-02-18 CRAN (R 3.5.0)
## digest 0.6.15 2018-01-28 CRAN (R 3.5.0)
## dplyr * 0.7.5 2018-05-19 cran (@0.7.5)
## evaluate 0.10.1 2017-06-24 CRAN (R 3.5.0)
## forcats * 0.3.0 2018-02-19 CRAN (R 3.5.0)
## foreign 0.8-70 2017-11-28 CRAN (R 3.5.0)
## ggplot2 * 2.2.1 2016-12-30 CRAN (R 3.5.0)
## glue 1.2.0 2017-10-29 CRAN (R 3.5.0)
## graphics * 3.5.0 2018-04-24 local
## grDevices * 3.5.0 2018-04-24 local
## grid 3.5.0 2018-04-24 local
## gtable 0.2.0 2016-02-26 CRAN (R 3.5.0)
## haven 1.1.1 2018-01-18 CRAN (R 3.5.0)
## hms 0.4.2 2018-03-10 CRAN (R 3.5.0)
## htmltools 0.3.6 2017-04-28 CRAN (R 3.5.0)
## httr 1.3.1 2017-08-20 CRAN (R 3.5.0)
## import 1.1.0 2015-06-22 CRAN (R 3.5.0)
## jsonlite 1.5 2017-06-01 CRAN (R 3.5.0)
## knitr 1.20 2018-02-20 CRAN (R 3.5.0)
## labeling 0.3 2014-08-23 CRAN (R 3.5.0)
## lattice 0.20-35 2017-03-25 CRAN (R 3.5.0)
## lazyeval 0.2.1 2017-10-29 CRAN (R 3.5.0)
## lubridate 1.7.4 2018-04-11 CRAN (R 3.5.0)
## magrittr 1.5 2014-11-22 CRAN (R 3.5.0)
## memoise 1.1.0 2017-04-21 CRAN (R 3.5.0)
## methods * 3.5.0 2018-04-24 local
## mnormt 1.5-5 2016-10-15 CRAN (R 3.5.0)
## modelr 0.1.1 2017-07-24 CRAN (R 3.5.0)
## munsell 0.4.3 2016-02-13 CRAN (R 3.5.0)
## nlme 3.1-137 2018-04-07 CRAN (R 3.5.0)
## parallel 3.5.0 2018-04-24 local
## pillar 1.2.2 2018-04-26 CRAN (R 3.5.0)
## pkgconfig 2.0.1 2017-03-21 CRAN (R 3.5.0)
## plyr 1.8.4 2016-06-08 CRAN (R 3.5.0)
## psych 1.8.4 2018-05-06 CRAN (R 3.5.0)
## purrr * 0.2.4 2017-10-18 CRAN (R 3.5.0)
## R6 2.2.2 2017-06-17 CRAN (R 3.5.0)
## Rcpp 0.12.17 2018-05-18 cran (@0.12.17)
## readr * 1.1.1 2017-05-16 CRAN (R 3.5.0)
## readxl 1.1.0 2018-04-20 CRAN (R 3.5.0)
## reshape2 1.4.3 2017-12-11 CRAN (R 3.5.0)
## rlang 0.2.0 2018-02-20 CRAN (R 3.5.0)
## rmarkdown 1.9 2018-03-01 CRAN (R 3.5.0)
## rprojroot 1.3-2 2018-01-03 CRAN (R 3.5.0)
## rstudioapi 0.7 2017-09-07 CRAN (R 3.5.0)
## rvest 0.3.2 2016-06-17 CRAN (R 3.5.0)
## scales 0.5.0 2017-08-24 CRAN (R 3.5.0)
## stats * 3.5.0 2018-04-24 local
## stringi 1.2.2 2018-05-02 CRAN (R 3.5.0)
## stringr * 1.3.1 2018-05-10 CRAN (R 3.5.0)
## tibble * 1.4.2 2018-01-22 CRAN (R 3.5.0)
## tidyr * 0.8.0 2018-01-29 CRAN (R 3.5.0)
## tidyselect 0.2.4 2018-02-26 CRAN (R 3.5.0)
## tidyverse * 1.2.1 2017-11-14 CRAN (R 3.5.0)
## tools 3.5.0 2018-04-24 local
## utils * 3.5.0 2018-04-24 local
## withr 2.1.2 2018-03-15 CRAN (R 3.5.0)
## xml2 1.2.0 2018-01-24 CRAN (R 3.5.0)
## yaml 2.1.19 2018-05-01 CRAN (R 3.5.0)