Lecture #8 03/31/2014 and homework assignment for 4/7

The pptx slides for the lecture are at MCB5472_lecture_03_31_2014 (links only work if you are in presentation mode)

Slides of what we actually covered are at MCB5472_lecture_03_31_2014 short

pdf summary is at MCB5472_lecture_03_31_2014.pptx

Homework for next week is on slide 73 or at Homework for 4_6

Sample scripts for oligo scanning are at Homework_for_3_31:
tetra.pl: scans genome and lists tetramers (both to table and screen)
nmerfreq.pl: scans genome and lists n-mers (n-mer size specified in script)
nmerfreqMod.pl: scans genome and lists n-mers and calls an R-script that produces a histogram
histogramScript_pdf.R: R script that produces histogram from a list of numbers and also compares the observed frequency to the frequency expected under a normal distribution. Documentation in histogramScript.R.read_me.doc
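
For orientation, here is a minimal Python sketch of the kind of n-mer counting these scripts perform. It is not a translation of tetra.pl or nmerfreq.pl; the function names are made up for illustration, and it assumes overlapping windows on a single strand, with windows containing anything other than A, C, G, or T skipped.

```python
from collections import Counter


def count_nmers(sequence, n=4):
    """Count overlapping n-mers on one strand.

    Assumption: windows containing characters other than A, C, G, T
    (e.g. N) are skipped rather than counted.
    """
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - n + 1):
        word = seq[i:i + n]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def read_fasta_sequence(path):
    """Concatenate all sequence lines of a FASTA file, ignoring header lines."""
    with open(path) as handle:
        return "".join(line.strip() for line in handle if not line.startswith(">"))


if __name__ == "__main__":
    # Toy sequence; for a genome, use read_fasta_sequence("your_genome.fasta").
    demo = "ACGTACGTNACGT"
    for word, count in sorted(count_nmers(demo, 4).items()):
        print(f"{word}\t{count}")
```

Setting n=4 gives tetramer counts; other values of n give the general n-mer case. The resulting counts could be written to a file and passed to a histogram script.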



2 thoughts on “Lecture #8 03/31/2014 and homework assignment for 4/7”

  1. Hi guys,

    I am working on the assignment due Monday and am a bit confused. I aligned and converted all of the files to .phy and replaced the dashes with question marks but don’t know where to go from here. In order to do the bootstrap analysis, do we need to use Phyml or another program? And if we use Phyml, do we just use the command line format or is there a system command? Any help would be greatly appreciated!


    1. Hi Sarah:
      Slides 47ff from the lecture last Monday provide some background.
      You will need to run the following programs sequentially:
      1st: seqboot, which creates the bootstrapped samples.
      2nd: protpars, which will perform parsimony analyses on each of the bootstrap samples (you need to tell the program that you have multiple, i.e., 100, datasets).
      3rd: consense, which will take the 100+ trees from protpars and calculate the consensus tree and the bootstrap support values for each split. (There are more than 100 trees because some datasets result in multiple equally parsimonious trees; if there are two, consense counts each of them as .5 trees.)
      Each of the programs expects the input to be in a file called infile; if it doesn’t find this file, it will ask for a file to work on.
      Each program writes the results into files called outfile and outtree.
      E.g., if your MSA is in a file called infile, then seqboot will create an outfile which contains 100 bootstrap samples generated from infile.
      You could do a system command to move the outfile to infile.
      Protpars, if told to work on multiple datasets, will generate an outfile and a file called outtree which will contain the 100+ trees in Newick format. You could do a system command to move the outtree to infile (and delete or rename outfile, or you need to tell the next program to overwrite the outfile). Consense then calculates the consensus tree with support values.
      I recommend running this on the cluster, but you could install PHYLIP on your laptop.
      PS: Seaview provides a GUI for protpars, but if you develop a computational pipeline, this is not very useful.
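
      The three steps above could be chained roughly as follows. This is only a sketch under several assumptions: the PHYLIP programs are on your PATH under the names seqboot, protpars, and consense; the menu keystrokes and the seed 4333 are my guess at what a PHYLIP 3.6x menu expects and should be checked against the menus you actually see; and the function names are made up for illustration. Note that consense in PHYLIP 3.6x looks for a file named intree by default (and asks for a file name if it is missing), so the input name is a parameter here.

```python
import shutil
import subprocess
from pathlib import Path

# ASSUMPTION: keystrokes for the interactive menus; verify against your PHYLIP version.
SEQBOOT_KEYS = ["Y", "4333"]                          # accept defaults, then an odd random seed
PROTPARS_KEYS = ["M", "D", "100", "4333", "1", "Y"]   # multiple datasets: 100, seed, jumble once
CONSENSE_KEYS = ["Y"]                                 # accept defaults


def run_step(program, keys, workdir, run=subprocess.run):
    """Run one PHYLIP program in workdir, feeding the menu answers on stdin."""
    run([program], input="\n".join(keys) + "\n", text=True, cwd=workdir, check=True)


def bootstrap_pipeline(msa_path, workdir, run=subprocess.run, consense_input="intree"):
    """seqboot -> protpars -> consense; returns the path of the consensus tree."""
    work = Path(workdir)
    work.mkdir(parents=True, exist_ok=True)
    shutil.copy(msa_path, work / "infile")

    # 1st: bootstrap samples; seqboot's outfile becomes the next infile.
    run_step("seqboot", SEQBOOT_KEYS, work, run)
    (work / "outfile").replace(work / "infile")

    # 2nd: parsimony trees; rename outfile so the next program is not asked to overwrite it.
    run_step("protpars", PROTPARS_KEYS, work, run)
    (work / "outfile").replace(work / "protpars.outfile")
    (work / "outtree").replace(work / consense_input)

    # 3rd: consensus tree with support values, written to outfile and outtree.
    run_step("consense", CONSENSE_KEYS, work, run)
    return work / "outtree"
```

      Calling bootstrap_pipeline("my_alignment.phy", "bootstrap_run") would then leave the consensus tree in bootstrap_run/outtree (both names are placeholders). Running each alignment in its own working directory keeps the fixed file names infile, outfile, and outtree from colliding.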
