Psychiatric Disorder Genomic Study

Comparative Analysis of SNP Genomic Data in Patients with
Anxiety Disorder and Major Depressive Disorder

Project Report

This project was conducted as part of the [SDS 322E] Elements of Data Science course, taught by Professor Layla Guyot during the Spring 2023 semester at The University of Texas at Austin.


In this project, I will analyze two genomic meta-analysis datasets, each from a separate scientific paper. Both datasets were acquired from the Psychiatric Genomics Consortium (PGC) database. The data on patients with anxiety disorder (AD) come from the paper “Meta-analysis of genome-wide association studies of anxiety disorders” by Otowa, T. et al., published in Molecular Psychiatry in 2016. The data on patients with major depressive disorder (MDD) come from the paper “Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression” by Wray, Naomi et al., published in Nature Genetics in 2018.

Both datasets focus on single nucleotide polymorphisms (SNPs), a type of genomic variation in which a single nucleotide at a given position in the DNA differs among individuals in the population. In each study, the authors obtained the SNP information by comparing genomic data from a patient group with data from a control group. I chose this topic because I’m interested in both the study of mental disorders and genomic data analysis.

Each row in the datasets represents a single SNP. The columns include the SNP ID (categorical), chromosomal location (categorical), base pair position (numeric), reference allele (categorical), alternative allele (categorical), allele frequency (numeric), number of subjects who display the SNP (numeric), and more. I will be able to join the two datasets on the SNP ID, which is a universal identifier assigned to each SNP.
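
The join on SNP ID can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual analysis code: the column names (SNPID, CHR, BP, A1, A2, FREQ), the SNP IDs, and all values are made up for the example, and the real PGC files may use different headers. It uses only the Python standard library and keeps the SNPs that appear in both tables (an inner join).

```python
import csv
import io

# Hypothetical miniature extracts. The column names and every value here are
# invented for illustration; they are not taken from the PGC files.
AD_TSV = """SNPID\tCHR\tBP\tA1\tA2\tFREQ
rs0001\t1\t1000\tA\tG\t0.12
rs0002\t1\t2000\tC\tT\t0.40
rs0003\t2\t1500\tG\tA\t0.25
"""

MDD_TSV = """SNPID\tCHR\tBP\tA1\tA2\tFREQ
rs0002\t1\t2000\tC\tT\t0.38
rs0003\t2\t1500\tG\tA\t0.27
rs0004\t3\t900\tT\tC\t0.05
"""


def read_snps(text):
    """Parse a tab-separated table into a dict keyed by SNP ID."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["SNPID"]: row for row in reader}


def inner_join(ad, mdd):
    """Keep SNPs present in both tables, one output row per shared SNP ID."""
    joined = []
    for snp_id in sorted(ad.keys() & mdd.keys()):
        a, m = ad[snp_id], mdd[snp_id]
        joined.append({
            "SNPID": snp_id,
            "CHR": a["CHR"],                  # categorical, kept as text
            "BP": int(a["BP"]),               # numeric position
            "FREQ_AD": float(a["FREQ"]),      # allele frequency in the AD table
            "FREQ_MDD": float(m["FREQ"]),     # allele frequency in the MDD table
        })
    return joined


shared = inner_join(read_snps(AD_TSV), read_snps(MDD_TSV))
for row in shared:
    print(row)
```

Only the SNP IDs found in both tables survive the join, so SNPs reported in a single study drop out. Suffixing the per-study columns (FREQ_AD, FREQ_MDD) keeps the two sources distinguishable in the joined rows.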

The trend I’m interested in investigating is how SNP patterns in the genome differ between patients with anxiety disorder (AD) and patients with major depressive disorder (MDD). Along with the differences, I expect some similarities between the two, due to the overlapping biological nature of the disorders. My research question for this project is: Are there any significant differences in SNP profiles between patients with AD and patients with MDD, and can we identify any potential genetic markers associated with these disorders?