Exercise 2. Analysis of DNA sequence data
You are analyzing sequence data from the mitochondrial D-loop region. You have determined the DNA sequence of Your Favorite Sample (YFS); it is shown below along with comparative sequences from humans around the world.
1) (5 pts) Relative to the reference sequence, circle all variants present in the practice dataset.
YFS - GATTCTAGTTTAAACTAGTCTCTGTTCTCTCATGAGGAAGCATATTTGGG
Reference - GATTCTAATTTAAACTAGTCTCTGTTCTTTCATGGGGAAGCATATTTGGG
Mongolian - GATTCTAATCTAAACTAGTCTCTGTTCTCTCATGGGGAAGCATATTTGGG
Yemeni - GATTCTAATTTAAACTAGTCTCTGTTCTCTCATGGGGAAGCATATTTGGG
Italian - GATTCTAATTTAAACTAGTCTCTGTTCTCTCATGAGGAAGCACATTTGGG
Egyptian - GATTCTAATTTAAACTAGTCTCTGTTCTTTCATGGGGAAGCATATTTGGG
Mayan - GATTCTAATCTAAACTAGTCTTTGTTCTCTCATGGGGAAGCATATTTGGG
Khoisan - GATTCTAATTTAAACTATTCTCTGTTCTTTCATGGGGAAGCATATTTGGG
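The comparison against the reference can also be checked programmatically. The following is a minimal sketch, not part of the original exercise: the sequences are copied from the alignment above, while the names `SAMPLES`, `variant_positions`, and `haplotype` are illustrative assumptions.

```python
# Sequences copied from the alignment; the dictionary layout is an assumption.
REFERENCE = "GATTCTAATTTAAACTAGTCTCTGTTCTTTCATGGGGAAGCATATTTGGG"
SAMPLES = {
    "YFS":       "GATTCTAGTTTAAACTAGTCTCTGTTCTCTCATGAGGAAGCATATTTGGG",
    "Mongolian": "GATTCTAATCTAAACTAGTCTCTGTTCTCTCATGGGGAAGCATATTTGGG",
    "Yemeni":    "GATTCTAATTTAAACTAGTCTCTGTTCTCTCATGGGGAAGCATATTTGGG",
    "Italian":   "GATTCTAATTTAAACTAGTCTCTGTTCTCTCATGAGGAAGCACATTTGGG",
    "Egyptian":  "GATTCTAATTTAAACTAGTCTCTGTTCTTTCATGGGGAAGCATATTTGGG",
    "Mayan":     "GATTCTAATCTAAACTAGTCTTTGTTCTCTCATGGGGAAGCATATTTGGG",
    "Khoisan":   "GATTCTAATTTAAACTATTCTCTGTTCTTTCATGGGGAAGCATATTTGGG",
}


def variant_positions(reference, samples):
    """Return sorted 1-based positions where any sample differs from the reference."""
    positions = set()
    for seq in samples.values():
        for i, (ref_base, base) in enumerate(zip(reference, seq), start=1):
            if base != ref_base:
                positions.add(i)
    return sorted(positions)


def haplotype(seq, positions):
    """Extract the bases of one sequence at the given 1-based positions."""
    return "".join(seq[p - 1] for p in positions)


POSITIONS = variant_positions(REFERENCE, SAMPLES)
print("Polymorphic positions:", POSITIONS)
for name, seq in SAMPLES.items():
    print(f"{name:10s} {haplotype(seq, POSITIONS)}")
```

The positions are counted from the first base of the 50-base excerpt shown here, not from any standard mitochondrial numbering.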
2) (10 pts) Generate a table that lists the haplotype of each individual relative to the reference sequence (list only the polymorphic positions, i.e., the variants you circled in #1).
| Individual | Haplotype |
|---|---|
| Reference | A T G C T G T |
| Khoisan | A T T C T G T |
| Egyptian | A T G C T G T |
| Yemeni | A T G C C G T |
| Mongolian | A C G C C G T |
| Mayan | A C G T C G T |
| Italian | A T G C C A C |
| YFS | G T G C C A T |
3) (5 pts) How many variants are represented in your dataset, and what type are they (each variant should be counted only once, regardless of how many individuals carry the mutation)?
Seven variants: 6 transitions and 1 transversion (the G-to-T change).
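The transition/transversion count can be reproduced with a short sketch. It assumes the seven variants are listed as (reference base, variant base) pairs in haplotype-table order; the names `classify` and `VARIANTS` are illustrative, not from the exercise.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(ref_base, alt_base):
    """Transition: purine<->purine or pyrimidine<->pyrimidine; otherwise transversion."""
    pair = {ref_base, alt_base}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


# The seven variants, each counted once, as (reference base, variant base).
VARIANTS = [
    ("A", "G"),  # 1st variant
    ("T", "C"),  # 2nd variant
    ("G", "T"),  # 3rd variant
    ("C", "T"),  # 4th variant
    ("T", "C"),  # 5th variant
    ("G", "A"),  # 6th variant
    ("T", "C"),  # 7th variant
]

counts = Counter(classify(ref, alt) for ref, alt in VARIANTS)
print(counts["transition"], "transitions,", counts["transversion"], "transversion(s)")
```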
4) (5 pts) Which variants are singletons (appear in only a single individual) and which are shared by more than one individual? Specify the individuals who carry each variant.
Singletons: the 1st variant, A-to-G, is found only in YFS; the 3rd variant, G-to-T, is found only in the Khoisan; the 4th variant, C-to-T, is found only in the Mayan; the 7th variant, T-to-C, is found only in the Italian.
Non-singletons (found in two or more individuals): the 2nd variant, T-to-C, is found in the Mongolian and Mayan; the 5th variant, T-to-C, is found in the Yemeni, Mongolian, Mayan, Italian, and YFS; the 6th variant, G-to-A, is found in the Italian and YFS.
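As a cross-check on the carrier lists, the sketch below tallies which individuals differ from the reference at each polymorphic position. The haplotypes are taken from the table in #2; the function name `carriers_by_variant` and the 1-based variant numbering are assumptions made for illustration.

```python
REFERENCE_HAPLOTYPE = "ATGCTGT"
HAPLOTYPES = {
    "Khoisan":   "ATTCTGT",
    "Egyptian":  "ATGCTGT",
    "Yemeni":    "ATGCCGT",
    "Mongolian": "ACGCCGT",
    "Mayan":     "ACGTCGT",
    "Italian":   "ATGCCAC",
    "YFS":       "GTGCCAT",
}


def carriers_by_variant(reference, haplotypes):
    """Map each variant number (1-based, table order) to the individuals carrying it."""
    carriers = {}
    for idx, ref_base in enumerate(reference, start=1):
        carriers[idx] = [
            name for name, hap in haplotypes.items() if hap[idx - 1] != ref_base
        ]
    return carriers


CARRIERS = carriers_by_variant(REFERENCE_HAPLOTYPE, HAPLOTYPES)
SINGLETONS = [i for i, names in CARRIERS.items() if len(names) == 1]
SHARED = [i for i, names in CARRIERS.items() if len(names) > 1]

for i, names in CARRIERS.items():
    print(f"variant {i}: {', '.join(names)}")
```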
5) (10 pts) Draw a phylogeny of the DNA sequences in your dataset (do not include the reference sequence). Write the haplotype (polymorphic sequence) for each individual in the phylogeny. Write the haplotype of any intermediate(s) you must propose in order to fit the phylogeny. Although all the sequences in your dataset are from extant samples, they will represent both internal and terminal nodes in the phylogeny.
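One possible starting point for the phylogeny is a table of pairwise differences between haplotypes: pairs that differ at a single position are candidates for adjacent nodes. This is only a sketch of that bookkeeping step, not a tree-building method, and the names `hamming`, `DISTANCES`, and `ONE_STEP` are illustrative assumptions.

```python
from itertools import combinations

# Haplotypes from the table in #2 (reference omitted, as the question asks).
HAPLOTYPES = {
    "Khoisan":   "ATTCTGT",
    "Egyptian":  "ATGCTGT",
    "Yemeni":    "ATGCCGT",
    "Mongolian": "ACGCCGT",
    "Mayan":     "ACGTCGT",
    "Italian":   "ATGCCAC",
    "YFS":       "GTGCCAT",
}


def hamming(a, b):
    """Number of positions at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


# Distance for every unordered pair, keyed in the dictionary's insertion order.
DISTANCES = {
    (a, b): hamming(HAPLOTYPES[a], HAPLOTYPES[b])
    for a, b in combinations(HAPLOTYPES, 2)
}

# Pairs separated by exactly one mutation.
ONE_STEP = [pair for pair, d in DISTANCES.items() if d == 1]

for (a, b), d in sorted(DISTANCES.items(), key=lambda item: item[1]):
    print(f"{a:10s} {b:10s} {d}")
```

In this dataset the Italian and YFS haplotypes are each at least two steps from every other sample, which is the kind of gap a proposed intermediate haplotype would fill.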

6) (5 pts) What is your interpretation of the data regarding the origin or evolution of YFS?
European affiliation, based on the G-to-A variant shared only by the Italian and YFS and the small number of differences (two) between their haplotypes.
7) (10 pts) If one transition occurs every 35,000 years in the region of DNA under study (which translates to a rate of about 28.57 transitions per million years), what date would you calculate for the emergence of modern humans from
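1 transversion and 3 transitions since the ancestral African population, as represented by the Khoisan individual:
Empirical determination of the ts:tv ratio = 6:1 (6 transitions and 1 transversion in the dataset), so one transversion is weighted as 6 transitions.
(1 transversion)(6)(35,000 yrs) + (3 transitions)(35,000 yrs) = 210,000 + 105,000 = 315,000 years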
Or, consider the ancestral population to be represented by the Egyptian individual:
(3 transitions)(35,000 yrs) = 105,000 years
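The two estimates can be reproduced with a small calculation. This sketch assumes, as the answer does, that a transversion is weighted by the empirical ts:tv ratio of 6; the function name `divergence_years` and its parameters are illustrative.

```python
YEARS_PER_TRANSITION = 35_000  # given in the question
TS_TV_RATIO = 6                # empirical: 6 transitions to 1 transversion


def divergence_years(transitions, transversions,
                     years_per_transition=YEARS_PER_TRANSITION,
                     ts_tv_ratio=TS_TV_RATIO):
    """Estimate elapsed time, counting each transversion as ts_tv_ratio transitions."""
    years_per_transversion = years_per_transition * ts_tv_ratio
    return (transitions * years_per_transition
            + transversions * years_per_transversion)


khoisan_estimate = divergence_years(transitions=3, transversions=1)
egyptian_estimate = divergence_years(transitions=3, transversions=0)

print(f"Khoisan as ancestor:  {khoisan_estimate:,} years")
print(f"Egyptian as ancestor: {egyptian_estimate:,} years")
```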
Based on this phylogeny, the emergence of anatomically modern humans can be dated to between 105,000 and 315,000 years ago. The divergence from the Egyptian individual may be too recent if that sample is the result of recent gene flow into