Exercise 2. Analysis of DNA sequence data

You are analyzing sequence data from the mitochondrial D-loop region. You have determined the DNA sequence of Your Favorite Sample (YFS), shown below along with comparative sequences from humans around the world.

1) (5 pts) Relative to the reference sequence, circle all variants present in the practice dataset.

YFS -        GATTCTAGTTTAAACTAGTCTCTGTTCTCTCATGAGGAAGCATATTTGGG

Reference -  GATTCTAATTTAAACTAGTCTCTGTTCTTTCATGGGGAAGCATATTTGGG
Mongolian -  GATTCTAATCTAAACTAGTCTCTGTTCTCTCATGGGGAAGCATATTTGGG
Yemeni -     GATTCTAATTTAAACTAGTCTCTGTTCTCTCATGGGGAAGCATATTTGGG
Italian -    GATTCTAATTTAAACTAGTCTCTGTTCTCTCATGAGGAAGCACATTTGGG
Egyptian -   GATTCTAATTTAAACTAGTCTCTGTTCTTTCATGGGGAAGCATATTTGGG
Mayan -      GATTCTAATCTAAACTAGTCTTTGTTCTCTCATGGGGAAGCATATTTGGG
Khoisan -    GATTCTAATTTAAACTATTCTCTGTTCTTTCATGGGGAAGCATATTTGGG
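
Identifying the variants is a column-by-column comparison against the reference. The Python sketch below assumes the sequences are already aligned, gap-free and of equal length (true for this dataset), and reports 1-based positions in the 50-bp alignment; `variant_sites` and `haplotype` are illustrative names, not part of the exercise.

```python
# Aligned 50-bp D-loop sequences from #1 (assumed gap-free, equal length).
SEQUENCES = {
    "Reference": "GATTCTAATTTAAACTAGTCTCTGTTCTTTCATGGGGAAGCATATTTGGG",
    "YFS":       "GATTCTAGTTTAAACTAGTCTCTGTTCTCTCATGAGGAAGCATATTTGGG",
    "Mongolian": "GATTCTAATCTAAACTAGTCTCTGTTCTCTCATGGGGAAGCATATTTGGG",
    "Yemeni":    "GATTCTAATTTAAACTAGTCTCTGTTCTCTCATGGGGAAGCATATTTGGG",
    "Italian":   "GATTCTAATTTAAACTAGTCTCTGTTCTCTCATGAGGAAGCACATTTGGG",
    "Egyptian":  "GATTCTAATTTAAACTAGTCTCTGTTCTTTCATGGGGAAGCATATTTGGG",
    "Mayan":     "GATTCTAATCTAAACTAGTCTTTGTTCTCTCATGGGGAAGCATATTTGGG",
    "Khoisan":   "GATTCTAATTTAAACTATTCTCTGTTCTTTCATGGGGAAGCATATTTGGG",
}


def variant_sites(seqs, ref_name="Reference"):
    """1-based positions where any sequence differs from the reference."""
    ref = seqs[ref_name]
    if any(len(s) != len(ref) for s in seqs.values()):
        raise ValueError("sequences must be aligned to equal length")
    return [i + 1 for i, base in enumerate(ref)
            if any(s[i] != base for s in seqs.values())]


def haplotype(seq, sites):
    """Bases of `seq` at the given 1-based sites, space-separated as in #2."""
    return " ".join(seq[i - 1] for i in sites)


sites = variant_sites(SEQUENCES)
print(sites)  # [8, 10, 18, 22, 29, 35, 43]
for name, seq in SEQUENCES.items():
    print(f"{name:<10} {haplotype(seq, sites)}")
```

The same `sites` list, applied to each sequence with `haplotype`, yields the rows of the table asked for in #2.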

2) (10 pts) Generate a table that lists the haplotype of each individual relative to the reference sequence (list only the polymorphic positions, i.e. the variants you circled in #1).

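Variant      1   2   3   4   5   6   7
Site         8   10  18  22  29  35  43
Reference    A   T   G   C   T   G   T
Khoisan      A   T   T   C   T   G   T
Egyptian     A   T   G   C   T   G   T
Yemeni       A   T   G   C   C   G   T
Mongolian    A   C   G   C   C   G   T
Mayan        A   C   G   T   C   G   T
Italian      A   T   G   C   C   A   C
YFS          G   T   G   C   C   A   T

(Site = position in the 50-bp alignment shown in #1.)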

3) (5 pts) How many variants are represented in your dataset and what type are they (each variant should be counted only once regardless of how many individuals carry the mutation)?

7 variants: 6 transitions (variants 1, 2, 4, 5, 6 and 7) and 1 transversion (variant 3, G-to-T).
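
The transition/transversion call depends only on whether both bases are purines (A, G) or both are pyrimidines (C, T). A small Python sketch that classifies the seven reference-to-derived changes read off the haplotype table in #2; `substitution_type` and `VARIANTS` are illustrative names.

```python
# A transition stays within a chemical class; a transversion crosses classes.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref, alt):
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    if ref == alt:
        raise ValueError("not a substitution")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Reference -> derived base for variants 1-7, in table order.
VARIANTS = [("A", "G"), ("T", "C"), ("G", "T"), ("C", "T"),
            ("T", "C"), ("G", "A"), ("T", "C")]

types = [substitution_type(r, a) for r, a in VARIANTS]
print(types.count("transition"), types.count("transversion"))  # 6 1
```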

4) (5 pts) Which variants are singletons (appear in only a single individual) and which are shared by more than one individual?  Specify the individuals who carry each variant.

Singletons: the 1st variant (A-to-G) is found only in YFS; the 3rd variant (G-to-T) only in the Khoisan; the 4th variant (C-to-T) only in the Mayan; and the 7th variant (T-to-C) only in the Italian.

Non-singletons (found in two or more individuals): the 2nd variant (T-to-C) is found in the Mongolian and the Mayan; the 5th variant (T-to-C) in the Yemeni, Mongolian, Mayan, Italian and YFS; and the 6th variant (G-to-A) in the Italian and YFS.
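
Sorting variants into singletons and shared variants amounts to counting, for each column of the table in #2, how many individuals carry the non-reference base. A Python sketch over those haplotypes (the reference itself is not counted as a carrier; `carriers` is an illustrative name):

```python
# Haplotypes at the seven polymorphic positions, copied from the table in #2.
REFERENCE = "ATGCTGT"
HAPLOTYPES = {
    "Khoisan":   "ATTCTGT",
    "Egyptian":  "ATGCTGT",
    "Yemeni":    "ATGCCGT",
    "Mongolian": "ACGCCGT",
    "Mayan":     "ACGTCGT",
    "Italian":   "ATGCCAC",
    "YFS":       "GTGCCAT",
}


def carriers(haplotypes, reference):
    """Map each variant number (1-7) to the individuals with the non-reference base."""
    return {
        k + 1: [name for name, hap in haplotypes.items() if hap[k] != reference[k]]
        for k in range(len(reference))
    }


by_variant = carriers(HAPLOTYPES, REFERENCE)
singletons = {v: who for v, who in by_variant.items() if len(who) == 1}
shared = {v: who for v, who in by_variant.items() if len(who) > 1}
print("singletons:", singletons)
print("shared:", shared)
```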

5) (10 pts) Draw a phylogeny of the DNA sequences in your dataset (do not include the reference sequence).  Write the haplotype (polymorphic sequence) for each individual in the phylogeny.  Write the haplotype of any intermediate(s) you must propose in order to fit the phylogeny.  Although all the sequences in your dataset are from extant samples, they will represent internal and terminal nodes in the phylogeny.

6) (5 pts) What is your interpretation of the data regarding the origin or evolution of YFS?

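European affiliation. YFS and the Italian are the only individuals carrying the G-to-A variant (6th variant), and the Italian is tied with the Yemeni for the fewest differences from YFS (two of the seven polymorphic positions each); the shared G-to-A variant breaks that tie in favor of the Italian.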
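
The comparison behind this interpretation can be made explicit by counting, for each individual, how many polymorphic positions differ from YFS and how many non-reference (derived) bases it shares with YFS. A minimal Python sketch using the haplotypes from the table in #2; it treats the reference base as the ancestral state, which is an assumption, and `differences` and `shared_derived` are illustrative names.

```python
# Haplotypes at the seven polymorphic positions, copied from the table in #2.
REFERENCE = "ATGCTGT"
HAPLOTYPES = {
    "Khoisan":   "ATTCTGT",
    "Egyptian":  "ATGCTGT",
    "Yemeni":    "ATGCCGT",
    "Mongolian": "ACGCCGT",
    "Mayan":     "ACGTCGT",
    "Italian":   "ATGCCAC",
    "YFS":       "GTGCCAT",
}


def differences(a, b):
    """Number of polymorphic positions at which two haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def shared_derived(a, b, ref):
    """Positions where both haplotypes carry the same non-reference base.

    Assumes the reference base is ancestral at every position.
    """
    return sum(x == y != r for x, y, r in zip(a, b, ref))


yfs = HAPLOTYPES["YFS"]
for name, hap in HAPLOTYPES.items():
    if name != "YFS":
        print(f"{name:<10} differences={differences(yfs, hap)} "
              f"shared_derived={shared_derived(yfs, hap, REFERENCE)}")
```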

7) (10 pts) If one transition occurs every 35,000 years in the region of DNA under study (which translates to a mutation rate of about 0.57 per site per million years across this 50-bp region), what date would you calculate for the emergence of modern humans from Africa? Explain your rationale and, specifically, explain where in the phylogeny you are pinpointing emergence from Africa. How do you interpret this date?
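
The quoted rate can be checked by spreading one transition per 35,000 years over the sites in the region. This sketch assumes the region is the 50-bp stretch shown in #1; the variable names are illustrative.

```python
YEARS_PER_TRANSITION = 35_000   # from the question
REGION_LENGTH_BP = 50           # assumed: length of the aligned region in #1

# Transitions per million years for the whole region, then per site.
per_region_per_myr = 1_000_000 / YEARS_PER_TRANSITION
per_site_per_myr = per_region_per_myr / REGION_LENGTH_BP
print(f"{per_region_per_myr:.2f} per region, {per_site_per_myr:.2f} per site (per Myr)")
# 28.57 per region, 0.57 per site (per Myr)
```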

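YFS differs from the ancestral African population, as represented by the Khoisan individual, by 1 transversion and 3 transitions:

Empirical ts:tv ratio in this dataset = 6 transitions : 1 transversion = 6, so one transversion is weighted as the time needed for 6 transitions.

(1 transversion)(6)(35,000 yrs) + (3 transitions)(35,000 yrs) = 210,000 + 105,000 = 315,000 years

Or, consider the ancestral population to be represented by the Egyptian individual, from which YFS differs by 3 transitions: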

(3 transitions)(35,000 yrs) = 105,000 years
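
Both estimates apply the same simple clock: count the differences between YFS and the chosen African proxy, weight each transversion by the empirical ts:tv ratio, and multiply by 35,000 years per transition. A Python sketch of that arithmetic, assuming a strict, constant clock and the 6:1 weighting used above; `divergence_years` is an illustrative name.

```python
YEARS_PER_TRANSITION = 35_000
TS_TV_RATIO = 6  # 6 transitions : 1 transversion observed in this dataset


def divergence_years(transitions, transversions,
                     years_per_ts=YEARS_PER_TRANSITION, ts_tv=TS_TV_RATIO):
    """Weight each transversion as `ts_tv` transitions' worth of time, then sum."""
    return (transitions + transversions * ts_tv) * years_per_ts


khoisan = divergence_years(transitions=3, transversions=1)
egyptian = divergence_years(transitions=3, transversions=0)
print(f"{egyptian:,} to {khoisan:,} years")  # 105,000 to 315,000 years
```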

On this phylogeny, the emergence of modern humans from Africa can therefore be bracketed between roughly 105,000 and 315,000 years ago. The divergence from the Egyptian individual may be too recent if that sample reflects recent gene flow into Africa. The divergence from the Khoisan is likely too old, since it more accurately represents the divergence, or emergence, of modern humans within Africa. The actual emergence of modern humans from Africa thus likely falls between these two dates.