A little hack for the distruct.py script

If you have any experience with population genetics at all, you probably know what the program STRUCTURE is, and what its plots (distruct plots) look like. But in the past I’ve hated using STRUCTURE because it is slow.

But only this year I’ve come to love STRUCTURE like never before, because of the new version of this analysis, called fastSTRUCTURE:

Anil Raj, Matthew Stephens, and Jonathan K. Pritchard. fastSTRUCTURE: Variational Inference of Population Structure in Large SNP Data Sets. Genetics, June 2014, 197:573-589.

FastSTRUCTURE runs like snapping your fingers, even with large SNP datasets (> 90K loci). Even better, it’s a python script so you just run it in the command line on your unix terminal. Installing the python dependencies isn’t painless but once everything is up and running it is a breeze to execute.

But fastSTRUCTURE is not versatile when it comes to plotting options. It comes with another python script called distruct.py that only outputs the distruct plot in .png format. And that is a problem because you can’t manipulate a .png in Adobe Illustrator. So no changing the colors, or the labels, etc. Sorry.

As I was writing this it occurred to me that you could take the outputs of fastSTRUCTURE and try to make a distruct plot using older STRUCTURE accessories, but I have a feeling that could get messy. And I have a better way:

Hack the distruct.py script, which is available on github here:

https://github.com/rajanil/fastStructure/blob/master/distruct.py

The last three lines of the script look like this:

# plot the data

figure = plot_admixture(admixture, population_indices, population_labels, title)

figure.savefig(params[outputfile], dpi=300)

So all we have to do is copy the code into a new file and change the very last line to look like this:

figure.savefig(params[outputfile], dpi=300, format='pdf')
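If you would rather not edit the copy by hand, the same one-line change can be scripted. This is only a minimal sketch and not part of fastSTRUCTURE: the function name add_savefig_format is made up for illustration, and it assumes the script still contains a figure.savefig(..., dpi=300) call like the one quoted above.

```python
import re


def add_savefig_format(source: str, fmt: str = "pdf") -> str:
    """Append format='<fmt>' to every figure.savefig(..., dpi=300) call in `source`.

    Hypothetical helper: it only recognises calls that end in `dpi=300)`,
    so a line that has already been patched is left untouched.
    """
    pattern = re.compile(r"(figure\.savefig\(.*?dpi=300)\)")
    return pattern.sub(lambda match: f"{match.group(1)}, format='{fmt}')", source)


# Demonstrate on the line quoted from the script.
original_line = "figure.savefig(params[outputfile], dpi=300)"
patched_line = add_savefig_format(original_line, "pdf")
print(patched_line)
# prints: figure.savefig(params[outputfile], dpi=300, format='pdf')
```

To patch a real copy, you would read the text of distruct.py, pass it through the function, and write the result to a new file, keeping the original untouched.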

And now the distruct.py script will give us a pdf file. The formats you can choose from (the exact list depends on your matplotlib backend) include:

png, pdf, ps, eps and svg.

The savefig() function from the matplotlib python library has some other options too. And if you’re interested you can find them here: http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.savefig
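To try the format argument on its own, outside distruct.py, here is a small self-contained sketch. The toy figure, the made-up ancestry proportions and the render helper are purely illustrative and do not come from fastSTRUCTURE. It assumes matplotlib is installed, and it picks the non-interactive Agg backend so it also runs on a machine without a display. When format is left out, savefig() infers the format from the extension of the file name instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt


def render(fmt: str) -> bytes:
    """Draw a toy stacked-bar figure and return it encoded in the given format."""
    figure, axes = plt.subplots(figsize=(4, 2))
    # Made-up proportions for three individuals and two clusters (illustrative only).
    first = [0.7, 0.2, 0.5]
    second = [0.3, 0.8, 0.5]
    axes.bar([0, 1, 2], first, width=1.0, color="tab:blue")
    axes.bar([0, 1, 2], second, width=1.0, bottom=first, color="tab:red")
    buffer = io.BytesIO()
    # Same call as in the hack, but writing to memory instead of a file.
    figure.savefig(buffer, dpi=300, format=fmt)
    plt.close(figure)
    return buffer.getvalue()


pdf_bytes = render("pdf")
svg_bytes = render("svg")
png_bytes = render("png")
```

The pdf and svg results are vector graphics, which is what makes the bars, colors and labels editable in a program like Illustrator.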

For months now I’ve wanted to hack the distruct.py script, but never got around to it. I hoped it would be easy, and was not disappointed.

And now it will be even easier for you.


2 thoughts on “A little hack for the distruct.py script”

  1. Hi there,

    Sorry to follow up on this old post, but I am currently editing distruct.py too, and I am a bit doubtful about the whole script and wanted to ask somebody's opinion.

    First thanks for sharing your tip.

    The plot from distruct.py seems a bit odd to me and different from standard admixture plots. I will probably just end up re-plotting it myself in R with stacked bars, because in any case I do not much like the final product of distruct.py, aesthetically speaking.

    Anyway the odd thing for me, and I may be missing something so feel free to correct me, is the width of the vertical bars corresponding to one sample (or 1 individual).

    If you look at your plot: https://johnbhorne.files.wordpress.com/2016/11/plot1.png

    For an easy example, look at the second-to-last population. I am referring to the predominantly light blue population just before the final, primarily red population at the far right of the plot. If you look at the purple bits at the top, they have a certain width. Looking at your plot at first glance, I would say that the width of those purple bits is the width of an individual.

    But then if you look at the same section of the plot at the bottom, the red bars are clearly thinner.

    So my interpretation at this point is that an individual bar spans the width of those red bits at the bottom of that section, which are the thinnest I can see. The purple bits at the top of the same section must then be 2 or more adjacent individuals from that population with very similar (or equal) levels of purple, because if each were a single individual the bars would be thinner. But this interpretation does not seem correct either when I check it against my own data (where I see the same issue), as I cannot find consecutive samples with similar levels of, say, the purple ancestry to account for the greater width of some of the bars.

    If I divide the length of the plot by the number of samples in my dataset, the thinner bars seem to be the ones that correspond to the width of one individual, because they add up to my sample count while the wider ones do not. But I still cannot explain those apparently wider bars, unless distruct.py does something with the order of the samples.

    I am using a custom popfile with 3 populations to order the samples. My understanding is that the script puts these populations in order, but the individuals within each population remain in the order in which they appear in your file.
    Whereas if you do not use the popfile, they are simply ordered by decreasing level of each K, using the delimiter calculated by fastStructure.

    I know this is probably very confusing and as I said I will probably just re-implement it in R but I was curious as this seems odd.

    Gabriele

    1. Thanks for the comment. Something is indeed wrong about these plots. I’m not sure how to fix it. Something tells me that it works better for certain values of K, but that is just a guess, as I haven’t used this program in several years. At any rate, I wasn’t too fastidious about the figure for the blog post. But for a publication I would definitely be more fussy.
