So you found a SNP that is an Fst outlier in populations of your non-model organism. That’s great. But what do you do now?
If I were your PhD supervisor, I’d tell you to take your Illumina read and run a BLAST search against GenBank. Pray for a good hit. The genes matching your query might give you some indication of the functional significance of SNP X. At that point you could probably write up your findings and publish… everybody else seems to.
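If you run BLAST+ on your own machine (for example `blastn` with `-outfmt 6`) instead of through the NCBI web page, the hits come back as a tab-separated table. Below is a minimal sketch for keeping the best hit per query read. It assumes the twelve default `-outfmt 6` columns in their default order. The function name `best_hits` is hypothetical, and ranking by bit score is one reasonable choice, not the only one.

```python
import csv
import io

# Field names used here for the twelve default BLAST+ -outfmt 6 columns, in order.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular: str) -> dict:
    """Return the highest-bit-score hit for each query in -outfmt 6 text."""
    best = {}
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])  # numeric columns arrive as strings
        query = hit["qseqid"]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best
```

Within a single search against one database, ranking by bit score and ranking by e-value usually give the same order, so either works as the criterion.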
But let me ask you: is that satisfying?
I think not.
If a SNP is a word, a gene is a book, and a genome is a library, why would you look up a book’s shelf number and not take it off the shelf and open it?
So far, all you’ve done is formulate a hypothesis: maybe SNP X is associated with functional variants of gene X.
Or maybe it’s not.
But don’t stop there.
If SNP X falls within a coding exon, you might try making a 3-D model of the protein. It’s not that hard, and if your results are interesting, you’ll end up with a really neat figure for your paper.
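A synonymous SNP leaves the protein sequence unchanged, so before modeling it’s worth checking whether SNP X alters an amino acid at all. Here is a minimal sketch. It assumes you already have the gene’s in-frame coding sequence (CDS) and know the SNP’s 0-based position within it (for example, from aligning your read to the mRNA). It also assumes the standard genetic code. `classify_snp` is a hypothetical helper, not part of any library.

```python
from itertools import product

# NCBI standard genetic code (translation table 1), codons in T, C, A, G order
# with the first position varying slowest; "*" marks a stop codon.
_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AAS)}


def classify_snp(cds: str, pos: int, alt: str) -> tuple:
    """Classify a single-base change at 0-based position `pos` of an in-frame CDS.

    Returns (residue_number, ref_aa, alt_aa, effect); residue_number is 1-based.
    """
    cds = cds.upper()
    offset = pos % 3                     # position of the SNP within its codon
    start = pos - offset                 # first base of the affected codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "nonsense"              # premature stop codon
    elif ref_aa == "*":
        effect = "stop-loss"
    else:
        effect = "missense"              # amino acid substitution
    return start // 3 + 1, ref_aa, alt_aa, effect
```

For example, `classify_snp("ATGACGTAA", 4, "T")` returns `(2, "T", "M", "missense")`: a C to T change at the second position of an ACG codon turns threonine into methionine.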
Step 1: Take that gene from your BLAST search and download the .fasta file of its mRNA.
Step 2: Load your mRNA sequence into a program like MEGA (free) and translate its coding region into an amino acid sequence.
Step3: Submit your amino acid sequence to an online protein modeling server. There are several, but I recommend Phyre2 (http://www.sbg.bio.ic.ac.uk/~phyre2/html/page.cgi?id=index)
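If you’d rather script Steps 2 and 3 than click through MEGA, here is a minimal sketch. It assumes the standard genetic code and takes the longest ATG-initiated open reading frame on the forward strand as the coding region. That is a common heuristic, not a guarantee: if your GenBank record annotates the CDS coordinates, use those instead. The function names are hypothetical. The FASTA helper is just for keeping a tidy copy of the sequence or for tools that expect FASTA input.

```python
import textwrap
from itertools import product

# NCBI standard genetic code (translation table 1). Codons are enumerated with
# bases in T, C, A, G order, first position varying slowest; "*" marks a stop.
_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AAS)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence up to (not including) the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept RNA or DNA letters
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # "X" for codons with N or other ambiguity codes
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def longest_orf_protein(mrna: str) -> str:
    """Protein from the longest ATG-initiated ORF on the forward strand (a heuristic)."""
    seq = mrna.upper().replace("U", "T")
    best = ""
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                protein = translate(seq[i:])
                if len(protein) > len(best):
                    best = protein
                i += 3 * (len(protein) + 1)  # skip past this ORF's stop codon
            else:
                i += 3
    return best


def to_fasta(header: str, protein: str, width: int = 60) -> str:
    """Format a protein as FASTA text, wrapping the sequence every `width` residues."""
    return "\n".join([">" + header] + textwrap.wrap(protein, width)) + "\n"
```

For example, `longest_orf_protein("GGCATGACCGAATAAGG")` returns `"MTE"`.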
It can take a few hours for the job to complete, but in the end you get nifty 3-D models of your protein. I recently did this for a SNP in my salmon data—a complement pathway gene—and got the following model
The model is pretty straightforward. Those curly things are alpha helices. The long flat arrows are beta strands, which line up side by side to form beta sheets. In this particular image you’ll see a cluster of red shapes at the bottom; this is a model of a pocket, and pockets are often the reactive parts of a protein. And if you look closely, you’ll see some red balls on the amino acid chain next to the pocket: that is the location of my SNP.
You can move the model around in 3-D space to see the protein from different angles. Above is that salmon complement pathway protein again. The blue balls with a red halo on the right-hand side are my SNP. The other red highlight is a known binding site in the human homolog of this gene.
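To put a number on "near" versus "far", you can measure distances in the model file itself. Here is a minimal sketch. It assumes you have saved the model in PDB format and that its residue numbers match positions in your query sequence; check this for your own output. Binding-site residue numbers from the human protein would first have to be mapped onto your sequence, for example through a pairwise alignment. The sketch uses alpha-carbon (CA) positions only, which is a coarse proxy because side chains can sit several angstroms closer or farther. The function names are hypothetical.

```python
import math


def ca_coords(pdb_text: str, chain=None) -> dict:
    """Map residue number -> (x, y, z) of its alpha carbon, read from PDB ATOM records."""
    coords = {}
    for line in pdb_text.splitlines():
        # PDB ATOM records are fixed-column: atom name in columns 13-16,
        # chain ID in 22, residue number in 23-26, x/y/z in 31-38, 39-46, 47-54.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        if chain is not None and line[21:22] != chain:
            continue  # keep only the requested chain, if one was given
        resnum = int(line[22:26])
        coords[resnum] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return coords


def min_ca_distance(coords: dict, residue: int, site: list) -> float:
    """Smallest CA-CA distance (in angstroms) between `residue` and any residue in `site`."""
    return min(math.dist(coords[residue], coords[r]) for r in site)
```

Usage would look like `min_ca_distance(ca_coords(open("model.pdb").read()), snp_residue, site_residues)`, where the file name and both residue variables are placeholders for your own values.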
One important caveat: the model of your protein is built from templates in a database of known 3-D protein structures. In other words, the amino acid substitution caused by SNP X won’t meaningfully change the overall shape of the model.
What protein modeling does do is allow you to see where on the protein SNP X is and what the functional significance might be… if any.
So, you started with an outlier SNP—nothing more than a hypothesis about an adaptive process manifested in the DNA.
Now you’ve modeled the protein that SNP X belongs to… and you still only have a hypothesis about adaptation in your target organism. But it is a better hypothesis than the one you had before.
The SNP I’ve shown you is not near the known binding site of this protein in humans, but it is located in a pocket of the protein that might be functionally important. In fact, the threonine-to-methionine substitution changes where the model predicts that pocket to be.
I think that is a nice detail for my study.
In contrast, if you modeled your protein and SNP X ended up looking unimportant (say, far from any pocket or known functional site), you could tentatively rule it out as a candidate quantitative trait nucleotide. That might be an important step towards finding the real functional variant in your genome.