Slope-Hunter: A robust method for collider bias correction in conditional genome-wide association studies
Our department's own Dr Osama Mahmoud led this seminar on bias correction in genetic studies.
Osama explained that studying genetic associations conditioned on another phenotype, for example a study of blood pressure conditional on weight, could be affected by selection bias. An example of this is the study of genetic associations with prognosis (e.g. survival, subsequent events).
Selection on disease status can induce associations between causes of incidence and prognosis, potentially leading to selection bias, also called "index event bias" or "collider bias". At the moment, one method for adjusting genetic associations for this bias assumes there is no genetic correlation between incidence and prognosis, which may not be a plausible assumption.
Osama proposed the ‘Slope-Hunter’ approach, which has two stages. In the first stage he showed how to use cluster-based techniques to identify two classes of variants: those affecting neither incidence nor prognosis (these should not suffer bias, and only a random sub-sample of them is retained in the analysis); and those affecting prognosis only (excluded from the analysis).
In the second stage, Osama demonstrated a cluster-based model to identify the class of variants affecting incidence only. This class was used to estimate the adjustment factor. Simulation studies showed that the approach eliminates the bias and outperforms alternatives in the presence of genetic correlation, and performs as well as alternatives under no genetic correlation when its assumptions are satisfied.
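To make the idea of an adjustment factor concrete, here is a minimal simulation sketch. It is not the Slope-Hunter implementation: the class of incidence-only variants is taken as known rather than found by clustering, the bias is assumed to be linear (observed prognosis effect = true effect + slope × incidence effect), and every name and number (`true_slope`, `beta_inc`, the effect-size distributions) is made up for illustration.

```python
import numpy as np

def slope_through_origin(x, y):
    """Least-squares slope of y on x with no intercept (a simplifying assumption)."""
    return float(np.sum(x * y) / np.sum(x * x))

rng = np.random.default_rng(0)
n = 2000
true_slope = -0.4  # hypothetical bias induced by selecting on disease status

# Simulated variant effects on incidence.
beta_inc = rng.normal(0.0, 0.1, n)

# First half: variants affecting incidence only (true prognosis effect is 0).
# Second half: variants whose prognosis effects are correlated with their
# incidence effects, i.e. genetic correlation is present.
incidence_only = np.arange(n) < n // 2
beta_prog_true = np.where(
    incidence_only, 0.0, 0.5 * beta_inc + rng.normal(0.0, 0.05, n)
)

# Observed (conditional) prognosis effects: truth plus collider bias plus noise.
beta_prog_obs = beta_prog_true + true_slope * beta_inc + rng.normal(0.0, 0.005, n)

# Using all variants mixes the bias with the genetic correlation.
naive_slope = slope_through_origin(beta_inc, beta_prog_obs)

# Using only the incidence-only class isolates the bias.
class_slope = slope_through_origin(
    beta_inc[incidence_only], beta_prog_obs[incidence_only]
)

# Adjust every variant with the slope estimated from that class.
adjusted = beta_prog_obs - class_slope * beta_inc

print(f"true slope:               {true_slope:.3f}")
print(f"slope from all variants:  {naive_slope:.3f}")
print(f"slope from incidence-only: {class_slope:.3f}")
```

In this toy setting the slope estimated from all variants is pulled away from the true value by the genetic correlation, while the slope from the incidence-only class stays close to it. The hard part in practice, which this sketch skips, is identifying that class from the data.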
Recent papers
- Elaine Fuertes, Iana Markevych, Richard Thomas, Andy Boyd, Raquel Granell, Osama Mahmoud, Joachim Heinrich, Judith Garcia-Aymerich, Célina Roda, John Henderson, Debbie Jarvis, "Residential greenspace and lung function up to 24 years of age: The ALSPAC birth cohort", Environment International, Volume 140, 2020, 105749, ISSN 0160-4120.
- Célina Roda, Osama Mahmoud, Gabriela P Peralta, Elaine Fuertes, Raquel Granell, Ignasi Serra, John Henderson, Deborah Jarvis, Judith Garcia-Aymerich, "Physical-activity trajectories during childhood and lung function at 15 years: findings from the ALSPAC cohort", International Journal of Epidemiology, Volume 49, Issue 1, 2020, 131-141, ISSN 0300-5771.
- Mohammad Talaei, David A Hughes, Osama Mahmoud, Pauline M Emmett, Raquel Granell, Stefano Guerra, Seif O Shaheen, "Dietary intake of vitamin A, lung function, and incident asthma in childhood", European Respiratory Journal, 2021, ISSN 0903-1936.
Paving the Road to Open Science via Automated Data Documentation
In this talk, University of Essex graduate Ahmed Abdelmaksoud looked at how research can be transformed by better data documentation.
Many research projects collect data, which gives us an unparalleled opportunity not only to produce integrated research but also to improve open access. For example, a socio-economic project could collect and generate data that would also be useful for a health research project. This pre-collected data could help shorten the research timeframe for the health project, and any insights from a health perspective could in turn add to the impact of the original socio-economic project, or provide a foundation for a third piece of research.
Additionally, better data collection, and better documentation of how the data was collected, can allow other researchers to replicate a study, helping to confirm that the conclusions drawn in the original are accurate (or alternatively identifying issues that need to be addressed).
At the moment there is no standardised way to document data. It often has to be done manually, which requires a great deal of time and effort for little reward beyond altruism.
In his talk Ahmed discussed a potential automated solution. Called ‘D2’, it is written in ‘R’ and mainly employs ‘R-Markdown’ and ‘Knitr’ to generate documentation for the most commonly used data formats. Minimal input is required from the user, so researchers can utilise it without it taking up too much time.
Ahmed's proposal was particularly interesting as the pandemic and the rush to create a Covid-19 vaccine have shown the importance of open science and sharing of data. A tool like this could be a cost- and time-effective way to improve this in the future.
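As a rough illustration of what automated data documentation can look like, the sketch below builds a small data dictionary from a CSV file with almost no user input. It is not D2 and does not reflect its interface: it is written in Python rather than R, the function names `infer_type` and `document_csv` are hypothetical, and the sample data is invented.

```python
import csv
import io

def infer_type(values):
    """Guess a simple type label from the non-missing string values."""
    if not values:
        return "empty"
    for label, cast in (("integer", int), ("number", float)):
        try:
            for value in values:
                cast(value)
            return label
        except ValueError:
            continue
    return "text"

def document_csv(text):
    """Return a Markdown table describing each column of CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    lines = [
        "| Variable | Type | Missing | Example |",
        "|---|---|---|---|",
    ]
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows]
        present = [v for v in values if v not in ("", None)]
        example = present[0] if present else ""
        missing = len(values) - len(present)
        lines.append(f"| {name} | {infer_type(present)} | {missing} | {example} |")
    return "\n".join(lines)

# Invented sample data, used only to show the output format.
sample = "id,age,town\n1,34,Colchester\n2,,Wivenhoe\n3,51,\n"
print(document_csv(sample))
```

The only input is the data itself; the variable names, rough types, missing-value counts and example values are all derived automatically, which is the kind of low-effort workflow described above.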