Preface
Human Genomics is a practical guide to learning and research. Published as a living text instead of a physical book, Human Genomics takes a deep dive into the statistical methodology, computational tools, and biological concepts surrounding the human genome.
About two decades after the IHGSC published an initial draft of the human genome (IHGSC, 2001), the T2T consortium has finally met the long-standing challenge of completing the 3.055 billion-base pair sequence of a human genome, creating gapless assemblies for all autosomes and the X chromosome (Nurk et al., 2022).
As the cost of genomic data capture techniques such as sequencing and genotyping falls, the amount of genomic data generated is growing so fast that astronomy is the only domain surpassing genomics in total data volume (as measured in petabytes), number of GitHub repositories, and number of GitHub commits (Navarro et al., 2019).
Despite this rapid growth, diversity remains a significant concern in this domain: individuals of African ancestry account for only about 2.4% of participants in genome-wide association studies while representing almost 50% of diverse ancestral groups (Atutornu et al., 2022). Genetic studies still disproportionately favor individuals of European ancestry, specifically those of Northwestern European ancestry.
One of the less nefarious reasons for this discrepancy is that ancestry presents a significant challenge in genomic studies: it acts as a confounder in associations with traits, particularly those with lower penetrance. While advances in statistical methodology have successfully reduced such effects, most investigators still rely on ancestry-specific inclusion criteria to reduce the rate of false positive associations (Sul, Martin and Eskin, 2018).
The issue of ancestry is further complicated in populations with a substantial amount of recent admixture, particularly in the Americas, which is why these populations are the least represented in the current literature. This is just one of many challenges in the domain of human genomics, and my hope is that this book motivates solutions by making the science, methodology, and resources within the domain more accessible and digestible.
There are four major pillars considered throughout this book. These represent the foundations of every chapter, example, and exercise.
- Conceptual. The conceptual pillar represents theoretical aspects of genetics and genomics, biostatistics, quantitative and population genetics, genetic epidemiology, and computational genomics. These are necessary foundations that every aspiring genomics scientist should understand before taking on more advanced challenges.
- Computational. Due to the high and ever-increasing number of features (such as genotypes) and samples in an average genomics research project, the computational pillar represents the essential practical tool at the forefront of genomics research. Thus, I introduce commonly used programming languages and platforms in the very first chapters, which we will use for hands-on practice throughout this book.
- Methodological. The methodological pillar represents the means by which we came to understand, and continuously expand, our knowledge of human genomics and related topics. The idea of data-generating methods, from sample processing and DNA genotyping to statistical and computational concepts, is a thread common to every chapter in this book.
- Practical. The practical pillar represents the application of the knowledge gathered in this textbook to real-world problems. This is reflected in the hands-on exercises and problem sets in every chapter of this book.
As I mentioned at the start of this preface, this book is a living document. Chapters will be added, removed, and merged as the book evolves. My hope is that this will be a go-to resource for all human genomics enthusiasts, which is why this book will be freely accessible at this domain, humangenomics.us, in perpetuity. However, if readers feel so inclined, they can support me by contributing to the maintenance and expansion of the textbook via PayPal, Venmo, or Ko-Fi.
I also welcome any potential collaborators who would like to join me in this endeavor. Any comments, complaints, corrections, or questions can be directed to my email: franjo@franjo.us.
Admonitions
These boxes contain an emphasized review of key concepts relevant to the MCAT.
These boxes contain some of the important topic-relevant questions that are still unanswered.
These boxes contain tips for analyses and approaches to solving problems.
Abbreviations
DNA: Deoxyribonucleic Acid
IHGSC: International Human Genome Sequencing Consortium
MCAT: Medical College Admission Test
T2T: Telomere-to-Telomere
References
Atutornu, J., Milne, R., Costa, A., Patch, C., & Middleton, A. (2022). Towards equitable and trustworthy genomics research. eBioMedicine, 76, 103879. https://doi.org/10.1016/j.ebiom.2022.103879
IHGSC. (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822), 860–921. https://doi.org/10.1038/35057062
Nurk, S., Koren, S., Rhie, A., Rautiainen, M., Bzikadze, A. V., Mikheenko, A., Vollger, M. R., Altemose, N., Uralsky, L., Gershman, A., Aganezov, S., Hoyt, S. J., Diekhans, M., Logsdon, G. A., Alonge, M., Antonarakis, S. E., Borchers, M., Bouffard, G. G., Brooks, S. Y., … Phillippy, A. M. (2022). The complete sequence of a human genome. Science, 376(6588), 44–53. https://doi.org/10.1126/science.abj6987
Sul, J. H., Martin, L. S., & Eskin, E. (2018). Population structure in genetic studies: Confounding factors and mixed models. PLOS Genetics, 14(12), e1007309. https://doi.org/10.1371/journal.pgen.1007309
Cite this Book
You can cite this book as:
Ivankovic, F. (2022). Human Genomics. Franjo Ivankovic, LLC.