index.qmd

---
title: "Is cannabis associated with bipolar disorder?"
author:
  - name: Bruno Braga Montezano
    id: bbm
    orcid: https://orcid.org/0000-0002-4627-1776
    email: bmontezano@hcpa.edu.br
    affiliation: 
      - name: Federal University of Rio Grande do Sul
        city: Porto Alegre
        state: RS
        url: https://www.ufrgs.br/ppgpsiquiatria/en/home-english/
date: today
format: html
theme: journal
number-depth: 1
table-of-contents: true
highlight-style: github
execute: 
  echo: false
---

```{r load-packages}
#| warning: false
#| message: false
library(tidyverse)
```

```{r read-data}
df <- read_rds("data/data_bd_cannabis_2023_10_09.rds")
```

# Methods

In order to summarize the exposure variables, we used a descriptive table with
absolute and relative frequencies of the categorical variables and presented
mean with standard deviations and median with minimum and maximum values for
numeric variables. The bivariate analyses were performed with Student’s $t$-tests,
$\chi$-squared tests, Fisher's exact test and Mann-Whitney $U$ tests depending on the
variables distributions.

We built binomial logistic models to assess the effects for bipolar disorder
incidence at 22 years old in the studied exposures, controlling for the other
variables. Considering multicollinearity in regression analysis can be a problem
since variables wouldn’t provide independent or unique information, we used
variance inflation factor (VIF) to measure the correlations between the
predictors in the model (Fox et al., 1992). We considered a VIF of 4 or greater
as threshold to classify a predictor estimate as non-reliable.

All analyses were conducted through scripts written in the R programming
language (version 4.3.1). Additional information on the present R session is
available at the end of the page.

# Results

```{r select-columns}
df_model <- df |> 
  select(
    outcome = bd_any,
    subtype = bd_subtype,
    sex = sexonovo,
    skin_color = jcorpel5,
    asset_index = kibem,
    asset_index_quintiles = kibem5,
    physical_abuse = hc13,
    cannabis_at_11 = hc06m,
    cannabis_at_15 = jc11d,
    cannabis_at_18 = kc05,
    cannabis_at_18_frequency = maconha_freq,
    cocaine_at_18 = kc09
  )
```

## How is the bipolar disorder (BD) outcome distributed?

Eighty-seven participants (2.3%) were diagnosed with bipolar disorder at
follow-up, of which, 76 (2.04%) had a diagnosis for bipolar disorder type I and
11 (0.3%) for bipolar disorder type II.

```{r plot-outcome-dist}
df_model |> 
  mutate(subtype = if_else(subtype == "NOS", "Not BD", subtype)) |> 
  count(outcome, subtype) |> 
  ggplot(aes(x = outcome, y = n, fill = subtype)) +
  geom_col() +
  scale_y_continuous(labels = \(x) format(x, big.mark = ",")) +
  theme_classic(12, "IBM Plex Sans") +
  ggsci::scale_fill_jama() +
  labs(x = "Bipolar disorder at 22 years old",
       y = "# of subjects",
       fill = "BD subtype") +
  annotate("text", y = 2600, x = 2,
           label = "Not BD: 3,625\nType 1: 76\nType 2: 11",
           family = "IBM Plex Sans",
           size = 4) +
  theme(legend.position = "top")
```

## Firstly, we'll create the main descriptive table

For the $p$-values, `chisq.test.no.correct` was used for categorical variables
with all expected cell counts $\geq 5$, and `fisher.test` for categorical
variables with any expected cell count $< 5$.

```{r descriptive-table}
df_model |> 
  mutate(
    cannabis_at_18_frequency = fct_relevel(
      cannabis_at_18_frequency, c("Never", "Experimented", "Used in the past",
                                  "Sometimes", "Weekends", "Daily")
    )
  ) |> 
  gtsummary::tbl_summary(
    by = outcome,
    label = list(
      subtype ~ "BD subtype",
      sex ~ "Sex",
      skin_color ~ "Skin color",
      asset_index_quintiles ~ "Socioeconomic status at 18 years old (1: poorest; 5: wealthiest)",
      physical_abuse ~ "Physical abuse by parents (at 11 years old)",
      cannabis_at_11 ~ "Cannabis use (at 11 years old)",
      cannabis_at_15 ~ "Cannabis use (at 15 years old)",
      cannabis_at_18 ~ "Lifetime cannabis use (at 18 years old)",
      cannabis_at_18_frequency ~ "Cannabis use frequency (at 18 years old)",
      cocaine_at_18 ~ "Lifetime cocaine use (at 18 years old)"
    ),
    include = c(-asset_index)
  ) |> 
  gtsummary::add_p() |> 
  gtsummary::add_overall()
```


## Missing data visualization

```{r plot-missing-data}
plot_missing_data <- DataExplorer::plot_missing(df_model,
                           ggtheme = theme_classic(12, "IBM Plex Sans"),
                           geom_label_args = list("family" = "IBM Plex Sans",
                                                  "size" = 2.5),
                           theme_config = list(legend.position = "none"))
```

## Are cannabis and cocaine use associated with bipolar disorder?

We fitted a logistic model (estimated using ML) to predict bipolar disorder with
lifetime cannabis use at 18 years old (formula: $BD$ \~ $cannabis$). The model's
explanatory power is very weak (Tjur's $R^2$ = 1.81e-03). The model's intercept,
corresponding to cannabis = No, is at -3.92 (95% CI [-4.21,
-3.65], $p$ < .001). The effect of cannabis use at 18 [Yes] is
statistically significant and positive ($\beta$ = 0.60, 95% CI [0.10, 1.08], $p$
= 0.016; $B$ = 0.60, 95% CI [0.10, 1.08]; OR = 1.82, 95% CI [1.10, 2.93]).

We also fitted a logistic model (estimated using ML) to predict bipolar disorder
with lifetime cocaine use at 18 years old (formula: $BD$ \~ $cocaine$). The
model's explanatory power is very weak (Tjur's $R^2$ = 1.47e-03). The model's
intercept, corresponding to cocaine = No, is at -3.83 (95% CI [-4.09, -3.59],
$p$ < .001). The effect of cocaine use at 18 [Yes] is statistically significant
and positive ($\beta$ = 0.67, 95% CI [0.02, 1.25], $p$ = 0.030; $B$ = 0.67, 95%
CI [0.02, 1.25]; OR = 1.96, 95% CI [1.02, 3.49]).

These results are summarized in the tables below.

```{r crude-substance-at-18}
cannabis_bd_fit <- glm(outcome ~ cannabis_at_18,
                       data = df_model,
                       family = binomial)

gtsummary::tbl_regression(
  cannabis_bd_fit,
  exponentiate = TRUE,
  label = list(cannabis_at_18 ~ "Lifetime cannabis use")
) |>
  gtsummary::as_gt() |>
  gt::tab_caption("Crude model of lifetime cannabis use at 18 years old and bipolar disorder onset at 22 years old.")

cocaine_bd_fit <- glm(outcome ~ cocaine_at_18,
                       data = df_model,
                       family = binomial)

gtsummary::tbl_regression(
  cocaine_bd_fit,
  exponentiate = TRUE,
  label = list(cocaine_at_18 ~ "Lifetime cocaine use")
) |> 
  gtsummary::as_gt() |> 
  gt::tab_caption("Crude model of lifetime cocaine use at 18 years old and bipolar disorder onset at 22 years old.")
```

## What about adjusted models for lifetime cannabis use at 18 years old?

Since the crude (non-adjusted) model was significant, we may follow it up with
an adjusted analysis. The results below are the adjusted model for sex, skin color,
and socioeconomic status (asset index quintiles).

```{r cannabis-adjusted-1}
cannabis_asset_sex_skin_color_fit <- glm(
  outcome ~ cannabis_at_18 + asset_index_quintiles + sex + skin_color,
  data = df_model,
  family = binomial
)

gtsummary::tbl_regression(
  cannabis_asset_sex_skin_color_fit,
  exponentiate = TRUE,
  label = list(cannabis_at_18 ~ "Lifetime cannabis use",
               asset_index_quintiles ~ "Socioeconomic status",
               sex ~ "Sex",
               skin_color ~ "Skin color")
) |> 
  gtsummary::as_gt() |> 
  gt::tab_caption("Adjusted model for lifetime cannabis use at 18 years old.
                  Adjusted for socioeconomic status, sex, and skin color.")
```

Then, we added physical abuse by parents and lifetime cocaine use as well in
the previous model from the table above.

```{r cannabis-adjusted-2}
cannabis_abuse_fit <- glm(
  outcome ~ cannabis_at_18 + asset_index_quintiles + sex + skin_color +
    physical_abuse,
  data = df_model,
  family = binomial
)

gtsummary::tbl_regression(
  cannabis_abuse_fit,
  exponentiate = TRUE,
  label = list(cannabis_at_18 ~ "Lifetime cannabis use",
               asset_index_quintiles ~ "Socioeconomic status",
               sex ~ "Sex",
               skin_color ~ "Skin color",
               physical_abuse ~ "Physical abuse by parents")
) |> 
  gtsummary::as_gt() |> 
  gt::tab_caption("Adjusted model for lifetime cannabis use at 18 years old.
                  Adjusted for socioeconomic status, sex, skin color,
                  and also physical abuse by parents.")
```

Finally, we added lifetime cocaine use as well for the last adjusted model.

```{r cannabis-adjusted-3}
cannabis_cocaine_fit <- glm(
  outcome ~ cannabis_at_18 + asset_index_quintiles + sex + skin_color +
    physical_abuse + cocaine_at_18,
  data = df_model,
  family = binomial
)

gtsummary::tbl_regression(
  cannabis_cocaine_fit,
  exponentiate = TRUE,
  label = list(cannabis_at_18 ~ "Lifetime cannabis use",
               asset_index_quintiles ~ "Socioeconomic status",
               sex ~ "Sex",
               skin_color ~ "Skin color",
               physical_abuse ~ "Physical abuse by parents",
               cocaine_at_18 ~ "Lifetime cocaine use")
) |> 
  gtsummary::as_gt() |> 
  gt::tab_caption("Adjusted model for lifetime cannabis use at 18 years old.
                  Adjusted for socioeconomic status, sex, skin color,
                  physical abuse by parents, and also lifetime cocaine
                  use.")
```

In conclusion, lifetime cannabis use just became not significant after the
inclusion of lifetime cocaine use in the adjusted model.

## Based on previous analyses, we considered to try an approach based on stratification by sex and parental spanking

First, the model stratified by sex. The models are presented in the tables below.

```{r strat-sex}
df_male <- df_model |> filter(sex == "Male")
df_female <- df_model |> filter(sex == "Female")

male_fit <- glm(
  outcome ~ cannabis_at_18,
  data = df_male,
  family = binomial
)

female_fit <- glm(
  outcome ~ cannabis_at_18,
  data = df_female,
  family = binomial
)

gtsummary::tbl_regression(
  male_fit,
  exponentiate = TRUE,
  label = list(cannabis_at_18 ~ "Lifetime cannabis use")
) |> 
  gtsummary::as_gt() |> 
  gt::tab_caption("Crude model for lifetime cannabis use at 18 years old in bipolar disorder onset using a male subset.")

gtsummary::tbl_regression(
  female_fit,
  exponentiate = TRUE,
  label = list(cannabis_at_18 ~ "Lifetime cannabis use")
) |> 
  gtsummary::as_gt() |> 
  gt::tab_caption("Crude model for lifetime cannabis use at 18 years old in bipolar disorder onset using a female subset.")
```

In the following tables, we present the results on stratification by physical
abuse by parents.

```{r strat-physical-abuse}
df_abuse <- df_model |> filter(physical_abuse == "Yes")
df_not_abuse <- df_model |> filter(physical_abuse == "No")

abuse_fit <- glm(
  outcome ~ cannabis_at_18,
  data = df_abuse,
  family = binomial
)

not_abuse_fit <- glm(
  outcome ~ cannabis_at_18,
  data = df_not_abuse,
  family = binomial
)

gtsummary::tbl_regression(
  abuse_fit,
  exponentiate = TRUE,
  label = list(cannabis_at_18 ~ "Lifetime cannabis use")
) |> 
  gtsummary::as_gt() |> 
  gt::tab_caption("Crude model for lifetime cannabis use at 18 years old in bipolar disorder onset using a subset of subjects that were exposed to physical abuse by parents.")

gtsummary::tbl_regression(
  not_abuse_fit,
  exponentiate = TRUE,
  label = list(cannabis_at_18 ~ "Lifetime cannabis use")
) |> 
  gtsummary::as_gt() |> 
  gt::tab_caption("Crude model for lifetime cannabis use at 18 years old in bipolar disorder onset using a subset of subjects that were not exposed to physical abuse by parents.")
```

We also tested for binary logistic regression models with interaction terms between
cannabis and sex/physical abuse by parents and there was no significant result,
besides the standalone lifetime cannabis use parameter in the physical abuse model.
You can check the results in a detailed manner in the tables below.

```{r interaction-sex-physical-abuse}
sex_inter_fit <- glm(
  outcome ~ cannabis_at_18 * sex,
  data = df_model,
  family = binomial
)

abuse_inter_fit <- glm(
  outcome ~ cannabis_at_18 * physical_abuse,
  data = df_model,
  family = binomial
)

gtsummary::tbl_regression(
  sex_inter_fit,
  exponentiate = TRUE,
  label = list(cannabis_at_18 ~ "Lifetime cannabis use",
               sex ~ "Sex")
) |> 
  gtsummary::as_gt() |> 
  gt::tab_caption("Logistic regression with interaction term for lifetime cannabis use at 18 years old and sex.")

gtsummary::tbl_regression(
  abuse_inter_fit,
  exponentiate = TRUE,
  label = list(cannabis_at_18 ~ "Lifetime cannabis use",
               physical_abuse ~ "Physical abuse by parents")
) |> 
  gtsummary::as_gt() |> 
  gt::tab_caption("Logistic regression with interaction term for lifetime cannabis use at 18 years old and physical abuse by parents.")
```

## Can the risk of lifetime cannabis use in bipolar disorder vary depending on the frequency of use?

The effect of cannabis use frequency at 18 [Yes (use on weekends or
daily)] is statistically non-significant and positive (OR = 2.68, 95% CI
[0.80, 6.72], $p$ = 0.062).

```{r cannabis-freq}
#| message: false
#| warning: false
df_freq <- df_model |> 
  mutate(cannabis_at_18_frequency = fct_relevel(
    cannabis_at_18_frequency, c("Never", "Experimented", "Used in the past",
                                "Sometimes", "Weekends", "Daily")
  )) |> 
  mutate(
    cannabis_at_18_frequency_dic = case_when(
      cannabis_at_18_frequency %in% c("Never",
                                      "Experimented",
                                      "Sometimes",
                                      "Used in the past") ~ "No (never used, experimented once, sometimes, used in the past)",
      cannabis_at_18_frequency %in% c("Daily", "Weekends") ~ "Yes (use on weekends or daily)"
    ) |> factor()
  ) |> 
  labelled::set_variable_labels(
    cannabis_at_18_frequency = "Cannabis use frequency (at 18 years old)",
    cannabis_at_18_frequency_dic = "Cannabis use frequency (at 18 years old)"
  )

freq_fit <- glm(
  outcome ~ cannabis_at_18_frequency_dic,
  data = df_freq,
  family = binomial
)

ggstats::ggcoef_model(freq_fit,
                      exponentiate = TRUE) +
  theme(text = element_text(size = 12, family = "IBM Plex Sans")) +
  scale_y_discrete(labels = scales::label_wrap(15))
```


## Does cannabis use at 18 years old mediate the effect of sex, skin color or socioeconomic status in young adults?

After evaluating it in three separate models for each, we did not found any mediation
effect of lifetime cannabis use at 18 years old related to the variables. Refer to the
author of this document for more details on this analysis.

```{r mediation-sex}
#| eval: false
# mediator (M): cannabis
# exposure (X): sex
# outcome (Y): bipolar

df_mediation <-
  df_model |>
  mutate(
    sex = case_when(sex == "Female" ~ 0,
                    sex == "Male" ~ 1),
    cannabis_at_18 = case_when(cannabis_at_18 == "No" ~ 0,
                               cannabis_at_18 == "Yes" ~ 1),
    skin_color = case_when(skin_color == "White" ~ 0,
                           skin_color == "Non-white" ~ 1)
  ) |>
  filter(!is.na(sex), !is.na(cannabis_at_18), !is.na(outcome))
  

fit_m_sex <- glm(
  cannabis_at_18 ~ sex,
  data = df_mediation,
  family = binomial
)

fit_y_sex <- glm(
  outcome ~ sex + cannabis_at_18,
  data = df_mediation,
  family = binomial
)

fit_med_sex <- mediation::mediate(
  model.m = fit_m_sex,
  model.y = fit_y_sex,
  treat = "sex",
  mediator = "cannabis_at_18"
)

summary(fit_med_sex)

plot(fit_med_sex)

fit_med_sex_boot <- mediation::mediate(
  model.m = fit_m_sex,
  model.y = fit_y_sex,
  boot = TRUE,
  sims = 1000,
  treat = "sex",
  mediator = "cannabis_at_18"
)

summary(fit_med_sex_boot)

plot(fit_med_sex_boot)
```

```{r mediation-skin-color}
#| eval: false
# mediator (M): cannabis
# exposure (X): skin color
# outcome (Y): bipolar

fit_m_skin <- glm(
  cannabis_at_18 ~ skin_color,
  data = df_mediation,
  family = binomial
)

fit_y_skin <- glm(
  outcome ~ skin_color + cannabis_at_18,
  data = df_mediation,
  family = binomial
)

fit_med_skin <- mediation::mediate(
  model.m = fit_m_skin,
  model.y = fit_y_skin,
  treat = "skin_color",
  mediator = "cannabis_at_18"
)

summary(fit_med_skin)

plot(fit_med_skin)

fit_med_skin_boot <- mediation::mediate(
  model.m = fit_m_skin,
  model.y = fit_y_skin,
  boot = TRUE,
  sims = 1000,
  treat = "skin_color",
  mediator = "cannabis_at_18"
)

summary(fit_med_skin_boot)

plot(fit_med_skin_boot)
```

```{r mediation-socio-status}
#| eval: false
# mediator (M): cannabis
# exposure (X): socioeconomic status
# outcome (Y): bipolar

fit_m_socio <- glm(
  cannabis_at_18 ~ asset_index_quintiles,
  data = df_mediation,
  family = binomial
)

fit_y_socio <- glm(
  outcome ~ asset_index_quintiles + cannabis_at_18,
  data = df_mediation,
  family = binomial
)

fit_med_socio <- mediation::mediate(
  model.m = fit_m_socio,
  model.y = fit_y_socio,
  treat = "asset_index_quintiles",
  mediator = "cannabis_at_18"
)

summary(fit_med_socio)

plot(fit_med_socio)

fit_med_socio_boot <- mediation::mediate(
  model.m = fit_m_socio,
  model.y = fit_y_socio,
  boot = TRUE,
  sims = 1000,
  treat = "asset_index_quintiles",
  mediator = "cannabis_at_18"
)

summary(fit_med_socio_boot)

plot(fit_med_socio)
```

## What about cannabis use at 11 (and 15) years old?

We fitted a logistic model (estimated using ML) to predict bipolar disorder with
cannabis use at 11 years old (formula: $BD$ \~ $cannabis\ at\ 11$). The model's
explanatory power is very weak (Tjur's $R^2$ = 1.38e-03). The model's intercept,
corresponding to cannabis at 11 = No, is at -3.80 (95% CI [-4.04, -3.58], $p$ <
.001). The effect of cannabis at 11 [Yes] is statistically non-significant and
positive ($\beta$ = 2.01, 95% CI [-0.94, 3.79], $p$ = 0.064; $B$ = 2.01, 95% CI
[-0.94, 3.79]; OR = 7.46, 95% CI [0.39, 44.38]). The model is summarized at the
table below.

```{r crude-cannabis-11}
cannabis_11_bd_fit <- glm(outcome ~ cannabis_at_11,
                       data = df_model,
                       family = binomial)

gtsummary::tbl_regression(
  cannabis_11_bd_fit,
  exponentiate = TRUE,
  label = list(cannabis_at_11 ~ "Lifetime cannabis use")
) |>
  gtsummary::as_gt() |>
  gt::tab_caption("Crude model of lifetime cannabis use at 11 years old and bipolar disorder onset at 22 years old.")
```

The lifetime cannabis use at 15 years old is not available to be modelled because
there are no subjects with positive instances on the outcome and exposure and the
model will not fit.

```{r count-cannabis-15-bd}
df_model |> 
  count(outcome, cannabis_at_15) |> 
  rename(Outcome = outcome,
         `Cannabis at 15` = cannabis_at_15) |> 
  gt::gt(caption = "Count unique values based on outcome (bipolar disorder at
         22 years old) and cannabis use at 15 years old.")
```

## What about the effect of having used cocaine or cannabis at age 18?

We built another generalized linear model to estimate whether cannabis or
cocaine lifetime use as a unique feature could predict bipolar disorder.
The result is reported in the table below.

```{r aggregate-effect-cannabis-cocaine}
df_agg <- df_model |> 
  mutate(agg_sub = factor(case_when(
    cannabis_at_18 == "Yes" & cocaine_at_18 == "Yes" ~ "Yes",
    cannabis_at_18 == "Yes" & cocaine_at_18 == "No" ~ "Yes",
    cannabis_at_18 == "No" & cocaine_at_18 == "Yes" ~ "Yes",
    cannabis_at_18 == "No" & cocaine_at_18 == "No" ~ "No"
  )))

sub_or_fit <- glm(outcome ~ agg_sub,
                       data = df_agg,
                       family = binomial)

gtsummary::tbl_regression(
  sub_or_fit,
  exponentiate = TRUE
) |>
  gtsummary::as_gt() |>
  gt::tab_caption("Crude model of aggregate effect of cocaine and cannabis at 18 years old and bipolar disorder onset at 22 years old.")
```


# Session information for reprodutibility purposes

```{r session-info}
sessioninfo::session_info()
```