I Dream of Genome

A recent paper, published in the journal Molecular Ecology Resources, played on the name of a TV show for its title: “Breaking RAD.”


The gist of the paper was this: if you use a reduced representation of a genome (such as a RAD-seq data set) to do genome-wide selection scans, you are taking a shot in the dark.

I found this paper rather validating, as I’ve long had misgivings about RAD-seq techniques. Like everyone else, I was BEWITCHED by them at first, but now it is clear that by omitting huge portions of the genome you are potentially missing a lot of the adaptive genomic diversity.

What should you do instead of RAD? One suggestion by the authors of the paper is to do exome sequencing. This idea seems attractive at first, because you’d only be looking at DNA polymorphisms that directly affect the primary structure of proteins. But this is just another way to miss valuable signal in the genome.

For example, say that the polymorphism being acted on by selection is in a regulatory region, even one that is close to the gene. If the protein being regulated is highly conserved, its exons will appear non-polymorphic, and you’ll end up removing the locus during your data filtering steps. The regulatory variant itself never gets sequenced.

And what if, in that conserved gene, there are adaptive SNPs in an intron? Yes, in an intron. When introns are spliced out of the pre-mRNA, where do you think they go? Some of them are processed by RNase enzymes such as Dicer (https://en.wikipedia.org/wiki/Dicer) into small RNAs that guide protein complexes to target RNA substrates. Ergo, these intronic sequences can be involved in RNA interference and thus participate in gene expression regulation.

And what if there are important SNPs in long non-coding RNA genes found in intergenic regions?

Exome sequencing, while useful for some things, is just another genome reduction, not so different from RAD.

Let’s face it. What you really want… what we all really dream of…

is Genome

Genome assembly.

The authors of the Breaking RAD paper touch on whole genome sequencing a little bit, but for the most part I felt like they were poking it with a stick, as if it were some stinging jellyfish washed up on the beach. They failed to see Genome as the lovely, wish-granting sprite that she really is.

The many charms of Genome

I get the feeling that some researchers don’t fully appreciate the many types of genomic variation that exist. I personally don’t understand what is so great about the SNPs that are popular right now. Of all genetic variants, SNPs are perhaps the least likely to be of functional significance. So if you’re looking to find quantitative trait loci, then most SNPs can only be cool by association.

I think indels (insertions/deletions) are a much better marker than SNPs. If you have a reference genome, calling indels from NGS data is just as easy as calling SNPs (for example, with the mpileup command from SAMtools/BCFtools). There are buh-zillions of small indels in any given genome that can be associated with phenotypic traits, just as well as any SNP, and larger indels are more likely to be functional.
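To show how little extra work indels are once variants have been called, here is a minimal Python sketch that tallies SNPs and indels from VCF-style records. It assumes the standard tab-separated VCF column layout (REF in the fourth column, ALT in the fifth). The function names and the three example records are made up for illustration and do not come from any real data set.

```python
def classify_variant(ref, alt):
    """Label one REF/ALT allele pair by comparing allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(ref) == len(alt):
        return "mnp"  # multi-nucleotide substitution, same length
    return "insertion" if len(alt) > len(ref) else "deletion"


def count_variants(vcf_lines):
    """Count variant types across the data lines of a VCF file."""
    counts = {}
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]
        for alt in alts.split(","):
            # Skip missing, spanning-deletion, and symbolic alleles.
            if alt in (".", "*") or alt.startswith("<"):
                continue
            kind = classify_variant(ref, alt)
            counts[kind] = counts.get(kind, 0) + 1
    return counts


# Hypothetical records, invented for this example.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1042\t.\tA\tG\t60\tPASS\t.",
    "chr1\t2310\t.\tCTT\tC\t55\tPASS\t.",
    "chr2\t877\t.\tG\tGACA,T\t48\tPASS\t.",
]

variant_counts = count_variants(example_vcf)
print(variant_counts)
```

The point is simply that SNPs and indels sit side by side in the same file, so keeping the indels costs nothing more than not filtering them out.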

Think of the long, intergenic, non-coding RNA gene that I mentioned in the last section. A SNP in that gene is perhaps unlikely to alter the function of that RNA. Such a mutated RNA gene will still be able to bind to its target substrate. It would take several SNP mutations to mess with the function of an lncRNA. An indel, on the other hand, something longer than a few base-pairs, would be a good indication that the polymorphism is not trivial.

Indels within exons would be even more significant because, while a SNP in a third codon position is often synonymous, an indel whose length is not a multiple of three shifts the reading frame and changes the entire downstream amino acid translation. And before you think that you’re not likely to find indels in exons, read this paper (http://genome.cshlp.org/content/23/5/749.short).
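To make the reading-frame point concrete, here is a minimal Python sketch. The 21-base coding sequence is a made-up toy, not a real gene, and `translate` is a hypothetical helper written for this example. Only the standard genetic code table is real.

```python
# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate codon by codon from position 0, stopping at the first stop codon."""
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE[dna[start:start + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


# Toy coding sequence: ATG GCT GAA CTG AAA GGT TAA
reference = "ATGGCTGAACTGAAAGGTTAA"

# SNP at the third position of codon 2 (GCT -> GCC).
snp = reference[:5] + "C" + reference[6:]

# One-base deletion at that same third codon position.
deletion = reference[:5] + reference[6:]

for label, seq in [("reference", reference), ("snp", snp), ("deletion", deletion)]:
    print(label, translate(seq))
```

In this toy case the SNP leaves the peptide unchanged (MAELKG), while deleting the very same base shifts every downstream codon and runs into an early stop (MAN).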

Of course, Genome has other charms besides indels: large chromosomal rearrangements, transposons, retrotransposons, and exon shuffling, just to name a few. Unfortunately, most species don’t get examined for these kinds of genomic phenomena, but Genome has the potential to help us see how many different forms of nucleotide variation are relevant to the evolution of organisms in the wild.


I recently participated in the review of a research proposal that aimed to sequence the genomes of 1,200 cod. It was the most ambitious genome sequencing project for a marine organism that I’ve ever heard of.

Do you think I gave it a favorable review?

You bet I did!

Even if the authors don’t succeed in their primary research goal, the body of data they will collect in the process will be used by cod researchers for ever more. It will be like a genome bible for cod biology.

Can I get a hallelujah?

Earlier this year, a paper by Berg et al. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4794648/) found a large chromosomal inversion associated with inshore and offshore cod ecotypes. Reading it made me sad… because I wished that I could have discovered it first. But, on the bright side, there are probably more interesting secrets lurking in the cod genome, and if a huge library of cod genomes is forthcoming, then there is a good chance that ordinary, every-day scientists like me will get to discover them.

It is a good time to be a cod biologist.

The Oxford Nanopore family.

With the advent of third-generation DNA sequencing, such as Oxford Nanopore, whole genomes are only going to become more commonplace.

You can get an Oxford Nanopore MinION for ~$1,000 USD, and flow cells for that unit start at $500 (https://nanoporetech.com/). It’s so cute. It fits in your pocket and simply plugs into the USB port on your laptop.

And their next machine, the SmidgION, is designed to be used… wait for it… on your smart phone!!!! (https://nanoporetech.com/products/smidgion).

DNA sequencer of the near future… My daughter will be getting one for her sixteenth birthday in 2025.

Oxford Nanopore machines still suffer from a high sequencing error rate, so you wouldn’t want to sequence a genome entirely on this machine… yet. But with time the accuracy of nanopore sequencers is only going to get better. And in the meantime something even better might come along (some people think that Illumina has an ace up its sleeve and is waiting for the right time to play it).

Don’t wait, jump on the whole-genome bandwagon now!

One way or another, we are all going to end up sequencing whole genomes for all samples. This is the future, so you best be ready for it.

If your favorite species doesn’t already have a reference genome, you need to get on that immediately. Even a rough assembly is better than no assembly. Let me give you an example of why:

Here is a pcadapt Manhattan plot of 93,703 genome-wide SNPs from Atlantic salmon… in no particular order.


Now put them in genomic order:


That makes quite a difference, doesn’t it?

Here’s that same plot, but with chromosomes labeled.


Without the genomic positions, we cannot see how outlier loci are distributed. Without the genome positions all you can have is a jumble of dots and no real way to know if the patterns of divergence are real or not.
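For anyone wondering what “putting them in genomic order” involves, here is a minimal Python sketch. It assumes each SNP comes with a chromosome name, a position, and a test statistic. The function name, the chromosome names, and the five loci are all hypothetical, and chromosome lengths are approximated by the largest SNP position seen on each chromosome.

```python
def genomic_x_positions(loci, chrom_order):
    """Sort loci by chromosome then position and give each a cumulative x coordinate.

    loci: list of (chromosome, position, score) tuples.
    chrom_order: chromosome names in the order they should appear on the plot.
    """
    # Approximate each chromosome's length by its largest observed position.
    chrom_end = {}
    for chrom, pos, _score in loci:
        chrom_end[chrom] = max(chrom_end.get(chrom, 0), pos)

    # Each chromosome starts where the previous one ended.
    offsets = {}
    running_total = 0
    for chrom in chrom_order:
        offsets[chrom] = running_total
        running_total += chrom_end.get(chrom, 0)

    rank = {chrom: i for i, chrom in enumerate(chrom_order)}
    ordered = sorted(loci, key=lambda locus: (rank[locus[0]], locus[1]))
    placed = [(offsets[c] + p, c, p, s) for c, p, s in ordered]
    return placed, offsets


# Hypothetical loci in arbitrary order: (chromosome, position, score).
loci = [
    ("chr2", 500, 1.2),
    ("chr1", 900, 0.3),
    ("chr1", 100, 4.1),
    ("chr3", 50, 0.8),
    ("chr2", 20, 2.5),
]

placed, offsets = genomic_x_positions(loci, ["chr1", "chr2", "chr3"])
for x, chrom, pos, score in placed:
    print(x, chrom, pos, score)
```

The x values can go straight onto the horizontal axis of a scatter plot, with the offsets marking where each chromosome label belongs. None of this is possible without a reference genome to supply the positions in the first place.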

The days of RAD-seq and calling SNPs without a reference genome in Stacks are just about over. Moving forward with our scientific lives, we all need conspecific reference genomes against which we can call SNP, indel, and even microsatellite variants from whole-genome sequencing.

Learn to love her

Genome: she comes with her own set of problems and can be trying at times, but she is extremely useful and eager to please. Don’t be the guy that doesn’t appreciate her, and feels that he is stuck with her. Don’t be the guy who doesn’t want her to do anything magical. That guy is boring. Instead, learn to appreciate Genome for who she is and how she can make your life more interesting.




