A Precision Medicine Primer
So he bought a 23andMe kit, sent his spit away and got some reports about his carrier status and muscle type. He noticed he could get his raw data, so he downloaded it, put it into another website, and discovered that he might be clopidogrel resistant. But he was curious as to how the website could think his genotype for something called rs4244285 could possibly be related to clopidogrel resistance.
Is Genomic “Raw Data” Just a 3 Billion Long Line of As, Cs, Gs and Ts?
It depends on what you are doing. If you are getting genotyped by a company such as 23andMe, you cannot get the string of ATCGs, but you can access your different polymorphisms at certain locations in your genome. If you are getting whole genome or whole exome sequencing, you could feasibly get a string of AAACCCTTTGGG.
Genetic tests tend to come in two varieties, sequencing and genotyping.
Sequencing determines the exact base pairs in a certain span of DNA, such as every ATCG in the whole prothrombin gene.
Genotyping is the process of determining which genetic variants an individual possesses. Sometimes, this is just determining which of A, T, C or G is at a single location in an individual’s DNA. For example, one might genotype an individual for the most common variant of prothrombin, which is defined by an A instead of a G at position 20210 (also known as Prothrombin G20210A).
Genotyping requires prior knowledge of the variants you want to analyze, such as G20210A in prothrombin. This knowledge can change over time, depending on new discoveries (for example a new clinically relevant mutation in prothrombin). In such cases, individuals would need to be re-genotyped. However, if one sequenced the entire gene, one could look for new variants in the existing sequencing data since every base pair in that gene would be known.
In this case, 23andMe gives consumers raw data that looks somewhat like the table to the right. The important part of the table is the “SNP reference”, which typically has a value that begins with rs. Basically, when a researcher describes a new point mutation at one position in the genome (a Single Nucleotide Polymorphism or SNP), they can submit it to a database at the National Institutes of Health that keeps track of these mutations. If the mutation has never been described, it is assigned a new Reference SNP ID starting with the letters rs. If the SNP the researcher sent in has been previously described, then it is merged into the pre-existing rs number.
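As a rough illustration of how such a raw data file can be searched by rs number, here is a minimal sketch. It assumes the tab-separated layout commonly seen in 23andMe downloads (comment lines starting with `#`, then the columns rsid, chromosome, position and genotype). The sample rows and the function name `load_genotypes` are made up for this example and are not real results.

```python
# Hypothetical excerpt of a raw data file. The column layout is an
# assumption and the genotype values are invented for illustration.
SAMPLE = """\
# rsid\tchromosome\tposition\tgenotype
rs4477212\t1\t82154\tAA
rs4244285\t10\t96541616\tAG
rs1799963\t11\t46761055\tGG
"""


def load_genotypes(text):
    """Return a dict mapping rsid -> genotype, skipping '#' comment lines."""
    genotypes = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        rsid, chromosome, position, genotype = line.split("\t")
        genotypes[rsid] = genotype
    return genotypes


genotypes = load_genotypes(SAMPLE)
print(genotypes.get("rs4244285"))  # AG
```

The lookup key is the rs number, which is why it is the most useful column when moving between the raw data and other resources.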
What is a SNP?
A SNP is a variation in a single nucleotide at a specific location in the genome. An example is Prothrombin G20210A, where there is an A instead of a G at position 20210 in the prothrombin gene. This variant is associated with hypercoagulability.
A single nucleotide polymorphism (SNP, pronounced snip) is a variation in a single nucleotide that may occur at some specific position in the genome. It is the most common type of genetic variation among people, occurring normally throughout the DNA. For example, a SNP may replace the nucleotide cytosine (C) with the nucleotide adenine (A) in a certain stretch of DNA. SNPs may fall within coding sequences of genes, non-coding regions of genes, or the regions between genes. They occur once in every 300 nucleotides on average, which means there are roughly 10 million SNPs in the human genome.
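The 10 million figure follows from simple division. The sketch below assumes a genome length of about 3 billion nucleotides, the round number used earlier in this primer; the variable names are only for illustration.

```python
# Approximate genome length (assumption: the round "3 billion" figure).
GENOME_LENGTH = 3_000_000_000
# One SNP per 300 nucleotides on average, as stated above.
NUCLEOTIDES_PER_SNP = 300

estimated_snps = GENOME_LENGTH // NUCLEOTIDES_PER_SNP
print(f"{estimated_snps:,}")  # 10,000,000
```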
If There Is A SNP In The Gene For Something Like Prothrombin, Does That Make The Prothrombin Protein Different?
Not necessarily. SNPs can occur anywhere, not necessarily in the part of the gene that codes for protein. In the case of the prothrombin G20210A mutation, the SNP is in a non-coding region and is thought to make the pre-mRNA more stable, leading to more circulating prothrombin.
“Gene” traditionally referred to a unit of DNA that carried the instructions for making a specific protein or a set of proteins. There are an estimated 20,000 to 25,000 genes in the human genome. A gene is usually composed of regulatory regions, exons or coding regions (coding means they get translated into proteins) and introns (they are not translated into proteins). SNPs can occur anywhere in the gene, and do not necessarily need to be in the coding region of the gene.
How Do You Know If A SNP Is Important?
If that SNP is well described, then it might be obvious, like in sickle cell disease. Most of the time, though, it might be hard to know.
Most SNPs have no effect on health or development. Some SNPs, however, may help predict an individual’s response to certain drugs, susceptibility to environmental factors such as toxins, and risk of developing particular diseases. SNPs can also be used to track the inheritance of disease genes within families.
There are two types of SNPs in the coding region: synonymous and nonsynonymous SNPs. An amino acid may be encoded by more than one codon (so-called “degenerate coding”), and a mutation in such codons may not produce any change in translation, resulting in a silent mutation (synonymous). Nonsynonymous SNPs change the amino acid sequence of the protein, and fall into two types: missense and nonsense.
A nonsense mutation results in a premature stop codon, or a nonsense codon in the transcribed mRNA, resulting in a truncated, incomplete, and usually nonfunctional protein. Some genetic disorders, such as thalassemia and Duchenne muscular dystrophy (DMD), can result from nonsense mutations.
A missense mutation, on the other hand, is a single nucleotide change that substitutes a different amino acid. Missense mutations can also render the resulting protein nonfunctional, and such mutations are responsible for diseases such as sickle-cell anemia and SOD1-mediated ALS. For example, in the most common variant of sickle-cell disease, the 20th nucleotide of the gene for the beta chain of hemoglobin is altered, changing the codon GAG to GTG (A -> T). In this case, the 6th amino acid, glutamic acid, is substituted by valine, and the protein is sufficiently altered to cause sickle-cell disease. However, not all missense mutations lead to appreciable protein changes. An amino acid may be replaced by an amino acid of very similar chemical properties, in which case the protein may still function normally; this is termed a neutral, “quiet”, “silent” or conservative mutation. Alternatively, the amino acid substitution could occur in a region of the protein which does not significantly affect the protein’s secondary structure or function.
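The three categories of coding SNP can be sketched in a few lines of code. This is a simplified illustration using the standard genetic code written with DNA letters. The function name `classify_coding_snp` is hypothetical, and the sketch does not handle cases such as a change that removes an existing stop codon.

```python
# Standard genetic code, with codons ordered by the bases T, C, A, G.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snp(ref_codon, alt_codon):
    """Label a single-codon change as synonymous, nonsense or missense."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


# Sickle-cell example from the text: GAG (glutamic acid, E) -> GTG (valine, V).
print(CODON_TABLE["GAG"], CODON_TABLE["GTG"])  # E V
print(classify_coding_snp("GAG", "GTG"))       # missense
print(classify_coding_snp("GAG", "GAA"))       # synonymous
print(classify_coding_snp("GAG", "TAG"))       # nonsense
```

Note that the label says nothing about clinical impact: a missense change can be harmless, as described above.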
Note that SNPs that are not in protein-coding regions may still affect gene splicing, transcription factor binding and mRNA degradation. These non-coding region SNPs can manifest in a higher risk of cancer for example, or may affect mRNA structure and disease susceptibility.
If I Have the SNP Associated with a Disease, Does that Mean I Have the Disease?
No. Many conditions have variable penetrance, which means that although different people have a mutation, not all of them have the same outward expression of that mutation. Interpreting the results is frequently more difficult than sequencing the individual.
Some conditions are described as having reduced or incomplete penetrance. This means that clinical symptoms are not always present in individuals who have the disease-causing mutation. For example, if a mutation in the gene responsible for a particular autosomal dominant disorder has 70% penetrance, then 70% of those with the mutation will develop the disease, while 30% will not.
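The 70% example can be worked through with a small calculation. The sketch below uses a hypothetical group of 1,000 carriers; the function name and the group size are assumptions for illustration only.

```python
def expected_affected(carriers, penetrance):
    """Expected number of carriers who develop the disease."""
    return round(carriers * penetrance)


carriers = 1000  # hypothetical number of people with the mutation
affected = expected_affected(carriers, 0.70)
unaffected = carriers - affected
print(affected, unaffected)  # 700 300
```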
Another important term to consider in medical genetics is expressivity. Variable expressivity means that individuals with the same genotype will have that phenotype expressed to a different degree. An example of variable expressivity is neurofibromatosis, in which patients with the same genetic mutation show different signs and symptoms of the disease.
Basically, penetrance indicates whether a disease or trait will show up or not, while expressivity refers to how a disease or trait will manifest.
Where Can I Get More Info For a SNP?
There are some resources that can help.
Online Mendelian Inheritance in Man (OMIM): http://www.omim.org/
NIH’s ClinVar http://www.ncbi.nlm.nih.gov/clinvar/
You can put the rs ID in the search box, and typically it will return some information associated with the polymorphism. These tend to be very technical resources, but they will guide you in the right direction as they often link to other resources such as PubMed.
What was rs4244285?
A variant of CYP2C19, which is involved in metabolism of drugs such as clopidogrel.
Using the resources above, we can learn that rs4244285 is a mutation in the CYP2C19 gene that defines the CYP2C19*2 variant. This variant has a G to A mutation at nucleotide 681 in exon 5 that creates an aberrant splice site, leading to a premature stop codon, a deletion of amino acids 215-227, and a truncated, nonfunctional protein. CYP2C19 is involved in the metabolism of some antidepressants, the anti-platelet drug clopidogrel and proton pump inhibitors like omeprazole.
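A website that flags possible clopidogrel resistance is presumably doing something like counting how many copies of the A allele appear in the genotype at rs4244285. The sketch below is a guess at that logic, not the method of any particular site. It assumes the genotype is reported on the same strand as the G to A description above, and the function name `count_variant_alleles` is hypothetical.

```python
def count_variant_alleles(genotype, variant_allele="A"):
    """Count copies of the variant allele in a two-letter genotype.

    Returns None for no-calls or anything that is not made of A, C, G, T.
    """
    genotype = genotype.upper()
    if not genotype or set(genotype) - set("ACGT"):
        return None
    return sum(1 for allele in genotype if allele == variant_allele)


# G is the reference allele and A is the *2 allele described above.
for genotype in ("GG", "AG", "AA", "--"):
    print(genotype, count_variant_alleles(genotype))
```

Zero, one or two copies of the variant allele only describe this one position. Other CYP2C19 variants are not captured here, so the count alone is not a full prediction of drug response.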
Does That Mean He Should Not Take Clopidogrel If He Gets A Cardiac Catheterization?
It’s complicated. Although the concern makes sense physiologically, studies have not always shown that the genotype affects patient-oriented outcomes.
Clopidogrel is a pro-drug, and needs to be metabolized by CYP2C19 to its active form. Therefore, it makes physiologic sense that those with poorly functioning or absent 2C19 activity would have worse outcomes in situations requiring clopidogrel, such as after cardiac catheterization.
However, in the ARCTIC-GENE study published by Collet et al. in 2015, platelet function was not associated with CYP450 status. That is, the reactivity of platelets in patients taking clopidogrel was poorly concordant with that patient’s 2C19 genotype. This is most likely because multiple mechanisms beyond 2C19 activity influence platelet reactivity in a patient.
Collet, Jean-Philippe, et al. “Genetic and platelet function testing of antiplatelet therapy for percutaneous coronary intervention: the ARCTIC-GENE study.” European Journal of Clinical Pharmacology 71.11 (2015): 1315-1324.
But I Thought Genomics Was Supposed to Give Us Straightforward, Unambiguous Answers.
Interpretations do not always agree; after all, they are just interpretations.
Interpretation of genomic data is tough. In one JAMA study, two sequenced genes were sent to three different laboratories for interpretation. All three laboratories agreed that a given SNP was pathogenic or likely pathogenic only 10% of the time, and two out of three laboratories agreed 33% of the time.
Van Driest, Sara L., et al. “Association of arrhythmia-related genetic variants with phenotypes documented in electronic medical records.” JAMA 315.1 (2016): 47-57.