By: Charles F. Kerchner, Jr., P.E. (Retired)
Written: 7 Jan 2005
Last Edit/Update: 17 Apr 2008
Copyright ©2005-2008
All Rights Reserved
See my online Genetic
Genealogy Glossary for help with esoteric terms
Kerchner's Genetic
Genealogy DNA Testing Dictionary
DNA Mutation Rates
Mutation Rate: The rate at which a genetic marker mutates or changes over time.
It is usually stated as the number of mutations per marker per generation, expressed as a decimal
value or a percentage. For example: A typical mutation rate quoted in early
(circa 2001/2002) Y chromosome STR (Y-STR) TMRCA
calculations and analysis is one per 500 generations (transmission events).
That would be an average mutation rate (Y-STR Genetic Clock Mutation/Tick Rate)
of .002 or 0.2%. Some commercial DNA testing labs are using an average Y-STR
mutation rate of .003 or 0.3%. And a 2004 study by FamilyTreeDNA indicates that
the average mutation rate for all Y-STR markers for the male population as a
whole may be twice as fast as the historical standard rate, i.e., .004 or 0.4%
instead of .002 or 0.2%.
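As a minimal sketch, the conversions quoted above between "one mutation per N generations", a decimal rate, and a percentage can be checked with a few lines of Python. The function names are illustrative assumptions only and do not come from any testing lab's software.

```python
def rate_from_generations(generations_per_mutation):
    """Average mutation rate per marker per transmission event,
    given one mutation per this many generations."""
    return 1.0 / generations_per_mutation


def as_percent(rate):
    """Express a decimal mutation rate as a percentage."""
    return rate * 100.0


# One per 500 generations is the historical rate; one per 250 is the
# faster rate suggested by the 2004 study mentioned above.
for generations in (500, 250):
    rate = rate_from_generations(generations)
    print(f"one per {generations} generations = {rate:.3f} = {as_percent(rate):.1f}%")
```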
Does the Y-STR Mutation Rate Genetic Clock Tick at the Same Average
Rate for All Markers and for All Male Lines?
Studies indicate that the Y-STR mutation rate varies for different markers.
Also recent studies indicate that the average Y-STR rate for all markers used,
and for all male lines tested, could be twice as fast as previously surmised.
Also for some panels of markers the average rate could be even higher. Thus for
example the average Y-STR mutation rate could be once per 250 generations which
is .004 or 0.4% for the entire male population instead of .002 or 0.2%. In
addition to the overall average mutation rate for all males
lumped together being subject to debate, anecdotal evidence reported over the
past few years by Genetic Genealogists indicates that the calculated average
Y-STR mutation rate varies from one Y chromosome male line to another. The Y
chromosome average mutation rate of one surname project male line compared to
another surname project male line mutation rate is very different in some
cases. One could see a significantly higher average mutation rate in one
surname male line and a significantly lower average mutation rate in
another. Some family male lines appear to have a very stable
Y chromosome. Other male lines may have one that mutates far more than the
average estimated for the total population of all males, i.e., significantly
more than the frequently used Y-STR mutation rate of .002 per generation.
When averaging together these surname project group male line averages, the
overall average may still approximate the overall averages seen in the general
studies of Y-STR mutation rates. It would be logical to expect that. But there
may be dramatic differences in the average mutation rate in one family male
line surname project compared to another family male line surname project.
Explanations have been postulated as to why one male line Y chromosome would be
mutating faster than the overall average of other projects and the general male
population as a whole. Some have suggested the fundamental mutation rate may be
the same but that there is a Y-STR copy repair mechanism to fix mutations
during the Y-DNA replication/copying process which may work better in some male
Y chromosome family lines than it does in other family male lines resulting in
net differences being observed in average mutation rates from one male line
surname project to the other. Therefore it is speculated that the male lines
which have the higher average mutation rates have a less effective Y-STR copy
repair system. If so, that would help explain the marked differences observed
in the average Y-STR mutation rate sometimes found when comparing one male line
surname project average mutation rate to another male line surname project's
average mutation rate. It should be noted that the Y-STR average mutation rate
is also currently the subject of much debate. But in my opinion, when it comes
to Y-STR mutation rates, one size
definitely does not fit all male lines.
Mutation Rate Term Applies to Various Types of DNA Markers
(but as will be discussed later, the mutation rate varies greatly for these
different types of DNA markers)
A mutation rate can be defined and estimated for a Single Nucleotide
Polymorphism marker (SNP), a single Short Tandem Repeat (STR) at a DNA
Y-chromosome Segment (DYS) marker location, i.e., a Y-STR marker, and/or for a
Haplotype or set of several markers. A Y-DNA Haplotype is a set of numbers,
i.e., the numeric allele values of a set of STRs
(typical commercial tests look at a set or panel of 12, 25, 26, 37, 43, or 67
DYS STR markers) which are located in the vast non-coding "junk DNA"
region of the Y chromosome. The haplotype term is also used to describe the set
of SNP mutations in the case of Mitochondrial DNA (mtDNA) testing in the Hyper
Variable Regions (HVRs) which are located in the
small non-coding region of the mtDNA molecule. Note: The commercial
mtDNA test is basically a giant multi-nucleotide SNP test checking 1050
nucleotide locations if both HVR1 (540 nucleotides) and HVR2 (510 nucleotides)
are tested. Your actual mtDNA haplotype would be all 1050 alleles (DNA letters,
i.e., A, T, G, or C for each nucleotide location). But the standard convention
for reporting the results of an mtDNA test is to only report the
mutations/differences as compared to the mtDNA Cambridge Reference Sequence
(CRS), which was the first mtDNA molecule fully sequenced. Even the mutation
rate for the entire Y chromosome molecule located in the cell nucleus can be estimated
although typing the entire molecule is not economically practical at this time
because of its large size (about 58,000,000 nucleotides).
What are some of the observed or estimated mutation rates for various types of
DNA and DNA test haplotypes? See the following discussion of commonly used
mutation rates and their implications to genetic genealogists.
Estimated DNA Marker Mutation Rates for the Various Types of DNA
Reference Y-STR Y chromosome DYS marker mutation/slippage rate:
2 x 10^-3 or .002 per marker transmission event (birth of a new generation) is
a commonly used value. The source for this historically quoted Y-STR average
mutation rate is FamilyTreeDNA, circa 2001. This Y-STR mutation or repeat
count slippage rate is currently the subject of much debate.
Reference Mt-DNA molecule (maternal line non-nuclear DNA - located inside cells
but outside the nucleus) nucleotide (SNP) mutation rate:
3.0 x 10^-5 or .00003 per nucleotide transmission event (birth of a new
generation) is a commonly used value for D Loop HVR regions.
The 0.00003 average mutation rate (rounded to one significant digit) I'm using in this
overview is based on the work by Parsons et al., A High Observed Substitution Rate in
the Human Mitochondrial DNA Control Region, Nature Genetics, Vol. 15, April
1997, p.363-367. That report determined a rate of about 2.9 x 10^-5.
Reference Y-DNA nuclear molecule (paternal line nuclear DNA - located inside
the cell nucleus) nucleotide (SNP) mutation rate:
2 x 10^-8 or 0.00000002 per nucleotide transmission event (birth of a new
generation) is a commonly used value. The 0.00000002 average mutation rate
(rounded to one significant digit) I’m using in this overview was given in a
presentation given in February, 2003 to the Department of Biological Sciences
at
Number of Markers or Potential Markers in the Various Types of DNA
Number of Y-STR Y chromosome DYS markers in Y-DNA, i.e., the Y chromosome:
Surmised to be hundreds available although currently 67 or fewer are offered in
a single test by a single commercial Y-DNA testing company. In the not too
distant future commercial tests for typing over 100 Y-STRs
may be offered by a single company.
Number of nucleotides in the Mt-DNA molecule (non-nuclear DNA, i.e., located
outside the cell nucleus):
A frequently quoted number from the Cambridge Reference Sequence (CRS) for the
mtDNA molecule is 16,569 nucleotides. Of these, 1050 nucleotides in the
non-coding section of the mtDNA molecule, which contains
hyper variable regions 1 and 2, are tested in the typical
commercial mtDNA test. Commercial tests are now available to type the whole
mtDNA molecule.
Number of nucleotides in the Y-DNA molecule, i.e., the Y chromosome (part of
the nuclear DNA, i.e., located inside the cell nucleus):
About 58,000,000. Does anyone know the exact count from the first whole Y
chromosome sequenced? Does anyone know the exact count of the number of
nucleotides in the large non-recombining Y-DNA (NRY) portion of the Y
chromosome? In the future a commercial test may become available to
economically sequence and type the whole Y chromosome.
DNA Haplotype Mutation Rates
(a haplotype is a set of DNA markers used together as a panel in a DNA test)
Typical Average Y-STR (paternal line) Haplotype Mutation Rates (ystrHMR)
(Note: The first example calculations assume that the historical .002 rate is the
correct underlying Y-STR average mutation rate, which more recent studies
(Kerchner 2005-2007) now indicate it is not. Yet many newcomers believe that is
the rate to use. Others have argued at various times that .002 is still the
correct rate and that all the other studies indicating otherwise are statistical
aberrations. The calculations below are for example purposes only, to define my
ystrHMR calculations for haplotypes of various sizes. For
the different haplotype sizes, a weighted average Y-STR mutation rate for each
size of Y-STR panel or haplotype should be used. I have provided some examples of
those from my own Kerchner Surname Project.)
First let me explain
some math simplification assumptions I used to make things easier to calculate
and understand. I did this by
reducing the true exponential calculations to a simple arithmetic model for the
examples given later in this report. Here is my explanation of the
simplification.
Given that mu = the assumed Y-STR marker mutation rate and M = number of markers in the haplotype then:
Probability (new haplotype) = 1 – probability (no mutations at any marker in the haplotype)
Probability (no mutations at any marker in the haplotype) = (1-mu)^M
Probability of new haplotype = 1 – (1-mu)^M
For a specific example, if we use M = 12 markers for the haplotype size and mu = 0.002 (an often quoted historical average Y-STR marker mutation rate), we get a probability of a new haplotype with each transmission event (birth) equal to: 1 – (1-.002)^12 = .0237
This whole process can be simplified for very small mutation rates, i.e., mu much, much smaller than one, and for a relatively small number of markers, i.e., M less than 100, so that mu*M stays well below one, by using a simpler approximation which gives very nearly the same answer as the more complex equation. That simplified approximation is: Probability of new haplotype = mu*M. For a 12 marker haplotype and a mutation rate of .002 this simplification yields .002*12 = 0.024 per transmission event, which is very close to the true probability of .0237. Using this simplification allows the use of simple arithmetic in my examples below to demonstrate the value of knowing the average mutation rate for your male line. Thus for the simplified examples below I have chosen to use this simplification to calculate the life expectancy of a haplotype, i.e., an estimate of how many generations a haplotype can remain unchanged given various assumed or determined average marker mutation rates. However, if you are familiar with using exponents, and have a pocket calculator which can do the exponent calculations using the precise formulas described above, then I of course recommend that you use the precise equations to get a more precise answer for your male line’s average haplotype life expectancy, i.e., how many generations the Y-STR haplotype of a given size would likely on average survive unchanged in your male line before at least one marker allele value changes, thus creating a new and slightly mutated haplotype.
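The comparison between the precise formula and the mu*M simplification can be checked numerically. The following is a minimal Python sketch; the function names are my own illustrative choices, and the only inputs are the mu = .002 and M = 12 values used in the example above.

```python
def p_new_haplotype_exact(mu, markers):
    """Probability that at least one of the markers mutates in a single
    transmission event: 1 - (1 - mu)^M."""
    return 1.0 - (1.0 - mu) ** markers


def p_new_haplotype_approx(mu, markers):
    """Simplified arithmetic model: mu * M (reasonable only while mu * M
    is much smaller than one)."""
    return mu * markers


mu, markers = 0.002, 12
exact = p_new_haplotype_exact(mu, markers)
approx = p_new_haplotype_approx(mu, markers)
print(f"exact  = {exact:.4f}")
print(f"approx = {approx:.4f}")
print(f"approximation overstates the exact value by {approx - exact:.5f}")
```

For the panel sizes discussed in this report the two figures stay close; the gap grows as mu*M approaches one.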
Reference Y-STR (12) haplotype mutation
rate (ystrHMR12) calculations:
.002 x 12 DYS STR markers = .024 per transmission event (birth of new
generation). (1/.024)=41.7. A new mutation can happen at any time but a 12
marker haplotype using the .002 historical rate
indicates that it can typically survive unchanged since the generation of the
prior mutation event for several dozen generations (transmission events). Thus
random matches are common with people of different surnames because of a shared
common ancestor who probably predates the adoption of surnames and written
family records.
For my Kerchner Surname Project the average marker mutation rate across the 12 marker
haplotype for the ten people tested at YDNA12 is .0044.
.0044 x 12 DYS STR markers = .0528 per transmission event (birth of new
Kerchner generation). (1/.0528)=18.9. Thus the 12 marker
Kerchner haplotype can on average survive unchanged for about 18.9
generations (transmission events). And since this is a time frame which
predates the adoption of surnames in some areas of
Reference Y-STR (25) haplotype mutation
rate (ystrHMR25) calculations:
.002 x 25 DYS STR markers = .050 per transmission event (birth of new
generation). (1/.05)=20. A new mutation can happen at any time but a 25 marker
haplotype using the .002 historical rate indicates it
can typically survive unchanged since the generation of the prior mutation
event for about 20 generations (transmission events). Random matches with
people of different surnames will be markedly reduced with a 25 marker test.
For my Kerchner Surname Project the average marker mutation rate across the 25 marker
haplotype for the ten people tested at YDNA25 is .0042.
.0042 x 25 DYS STR markers = .105 per transmission event (birth of new Kerchner
generation). (1/.105)=9.5. Thus the
25 marker Kerchner haplotype can on average survive
unchanged for about 9.5 generations (transmission events).
This is well within the time frame of when surnames were adopted so one would
expect to see few if any random matches with different surnames. And in my
Kerchner project I don't have any random matches at 25 markers to other
surnames in the FTDNA database.
Reference Y-STR (26) haplotype mutation
rate (ystrHMR26) calculations:
.002 x 26 DYS STR markers = .052 per transmission event (birth of new
generation). (1/.052)=19.2. A new mutation can happen at any time but a 26
marker haplotype using the .002 historical rate
indicates it can typically survive unchanged since the generation of the prior
mutation event for about 19 generations (transmission events). Random matches
with people of different surnames will be markedly reduced with a 26 marker
test.
At present I have not calculated the 26 marker ystrHMR26 rate for my Kerchner project. Only three of my project participants were tested at multiple labs. I don't have the necessary data for the 26 marker test.
Reference Y-STR (37) haplotype mutation rate (ystrHMR37) calculations:
.002 x 37 DYS STR markers = .074 per transmission event (birth of new
generation). (1/.074)=13.5. A new mutation can happen at any time but a 37
marker haplotype using the .002 historical rate
indicates it can typically survive unchanged since the generation of the prior
mutation event for a bit more than a dozen generations (transmission events).
Random matches will be minimal, if any. The resolving power of a 37 marker test
places the most likely time to recent common ancestor definitely in a time
frame of genealogical interest and a time frame when many male lines had
already adopted their surnames and written birth records started to be
maintained. If you share the same or similar surname and match closely with a
37 marker test you probably share a genealogically relevant common male
ancestor even if not known via the traditional evidence.
For my Kerchner Surname Project the average marker mutation rate across the 37 marker
haplotype for the ten people tested at YDNA37 is .0057.
.0057 x 37 DYS STR markers = .2109 per transmission event (birth of new
Kerchner generation). (1/.2109)=4.7. Thus the 37 marker Kerchner
haplotype can on average survive unchanged for about 4.7 generations
(transmission events). This is well, well within the time frame of when
surnames were adopted and well, well within the time frame when the American
colonies were settled, so one would not expect to see any random matches with
different surnames. And in my Kerchner project I don't see any random matches
at 37 markers to people with other surnames in the FTDNA database.
Reference Y-STR (43) haplotype mutation
rate (ystrHMR43) calculations:
.002 x 43 DYS STR markers = .086 per transmission event (birth of new
generation). (1/.086)=11.6. A new mutation can happen at any time but a 43
marker haplotype using the .002 historical rate
indicates it can typically survive unchanged since the generation of the prior
mutation event for a bit less than a dozen generations (transmission events).
Random matches will be minimal, if any. The resolving power of a 43 marker test
places the most likely time to recent common ancestor definitely in a time
frame of genealogical interest and a time frame when many male lines had
already adopted their surnames and written birth records started to be
maintained. If you share the same or similar surname and match closely with a
43 marker test you probably share a genealogically relevant common male
ancestor even if not known via the traditional evidence.
At present I have not
calculated the 43 marker ystrHMR43 rate for my Kerchner project. Only three of
my project participants were tested at multiple labs. I don't have the
necessary data for the 43 marker test.
Reference Y-STR (67) haplotype mutation
rate (ystrHMR67) calculations:
.002 x 67 DYS STR markers = .134 per transmission event (birth of new
generation). (1/.134)=7.5. A new mutation can happen at any time but a 67
marker haplotype using the .002 historical rate
indicates it can typically survive unchanged since the generation of the prior
mutation event for a bit more than seven generations (transmission events).
Random matches will be minimal, if any. The resolving power of a 67 marker test
places the most likely time to recent common ancestor definitely in a time
frame of genealogical interest and a time frame when many male lines had
already adopted their surnames and written birth records started to be
maintained. It is also a time frame when the American colonies were being
settled. If you share the same or similar surname and match closely with a 67
marker test you probably share a genealogically relevant common male ancestor
even if not known via the traditional evidence.
For my Kerchner Surname Project the average marker mutation rate across the 67 marker
haplotype for the seven people tested at YDNA67 is .0043.
.0043 x 67 DYS STR markers = .288 per transmission event (birth of new Kerchner
generation). (1/.288)=3.5. Thus the 67 marker Kerchner haplotype can on
average survive unchanged for about 3.5 generations (transmission
events). This is well, well within the time frame of when surnames were adopted
and well, well within the time frame when the American colonies were settled,
so one would not expect to see any random matches with different surnames. And
in my Kerchner project I don't see any random matches at 67 markers to people
with other surnames in the FTDNA database. An anecdotal comment: In my Kerchner
family project I match my brother exactly at 67 markers. But I have a genetic
distance (GD) of 1 with my second cousin (thus 6 transmission events of
separation), i.e., we have a mutational difference at one DYS marker in the
upper 30 markers of the 67 marker panel. This observation correlates with what
is expected to be observed in our male line given the higher average mutation
rate consistently being observed in our male line as more people and markers
have been tested. See the Excel table for the details: http://www.kerchner.com/kerchner67mkrs.htm
Note: There is much debate at present as to whether the underlying .002 Y-STR average mutation rate used in my haplotype mutation rate examples above is correct. It is probably not. And that historical average rate should certainly not be used to calculate the average mutation rates of the various larger haplotypes. My examples above demonstrate the large differences in results obtained by using the overall average rate versus project specific rates. A recent study by FamilyTreeDNA.com indicates the average Y-STR marker mutation rate may be more like .004. Relative Genetics has at times been using an average Y-STR marker mutation rate of about .003. If the underlying mutation rate is on average significantly higher than .002, then the expected longevity of the various haplotypes listed above will on average be proportionately shorter. Thus the parent/original/ancestral haplotype of these mutated haplotypes would likely be fewer generations back in time. Higher Y-STR average mutation rates are what some surname project administrators have been reporting, i.e., average Y-STR mutation rates in the male line under study which are substantially higher than .002, and thus more mutations seen within a time frame of relevant genealogical interest from the ancestral haplotype of a previously known common male ancestor. For example, in my Kerchner Surname Project the average mutation rate is .0044 for the 12 marker haplotype, .0042 for the 25 marker haplotype, .0057 for the 37 marker haplotype, and .0043 for the 67 marker haplotype, for ten related participants in three independent descendant branches from the common male ancestor, who was known from prior traditional genealogical research and family history.
Also, notice how when adding
the 3rd panel (going up to 37 markers) for the FTDNA haplotypes the
average mutation rate jumps up. In my opinion, the historically quoted 0.002
Y-STR mutation rate originally used for the overall male
population, and still widely mentioned in discussions, was based on only a few
markers which were relatively slow moving compared to some of the more recently
offered panels of markers used in genetic genealogy tests. Thus it does not hold
true now that other new panels of markers have been added. Trying to stick by the
historical .002 average Y-STR marker mutation rate when evaluating all
haplotype mutation rates, with those haplotypes containing many new fast moving
markers, is the cause of much confusion and anxiety in some family surname
projects, in my opinion. And then again, in some other family surname projects
the mutations are zero at 12 markers, zero at 25 markers, and zero at 37
markers. So it is clear that the average mutation rate varies from male family
line to male family line. From the Kerchner Ancestral 37 marker haplotype there
were eight unique mutations observed with ten people tested including a
parallel one which occurred separately in two of the independent descendant
branches. A faster than average Y-STR mutation rate is good for genealogists in
those families who have it, not bad. It helps us sort out the various
descendant branches by providing branch tags. And we also need to remember that
for male lines where the underlying Y-STR mutation rate is substantially higher
or lower than the historically quoted average rate, it will dramatically affect the time to most recent common ancestor
estimates for those surname projects that do not know their most recent common
ancestor.
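The haplotype longevity figures quoted in the panel-by-panel calculations earlier can be reproduced with a short Python sketch. It uses only the simplified 1/(mu*M) model and the rates already given in this report; the function and variable names are illustrative assumptions of mine.

```python
HISTORICAL_RATE = 0.002

# Average marker mutation rates reported above for the Kerchner Surname
# Project, keyed by panel size (number of markers in the haplotype).
KERCHNER_RATES = {12: 0.0044, 25: 0.0042, 37: 0.0057, 67: 0.0043}


def longevity_simple(mu, markers):
    """Expected generations a haplotype survives unchanged, using the
    simplified model: 1 / (mu * M)."""
    return 1.0 / (mu * markers)


for markers, project_rate in KERCHNER_RATES.items():
    historical = longevity_simple(HISTORICAL_RATE, markers)
    project = longevity_simple(project_rate, markers)
    print(f"{markers:>2} markers: {historical:5.1f} generations at .002, "
          f"{project:5.1f} generations at {project_rate}")
```

Swapping in .003 or .004 for HISTORICAL_RATE shows how proportionately the expected longevity shrinks as the assumed average rate rises.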
Therefore with significant variations in Y-STR average mutation rates from male
line to male line, in my opinion, it is becoming
increasingly important for surname project administrators to try to estimate
the average Y-STR marker mutation rate and Y-STR haplotype mutation rates for
the male line they are studying.
Someone once asked me: how do you do this if you don't know the MRCA
for all your surname project members, but your gut instinct, the traditional
evidence, and the similar haplotypes and surnames tell you that they are all
probably related within the last several hundred years? Well, if you don't know the
ultimate common male ancestor and the ancestral haplotype, you cannot do it
directly. My suggestion for estimating the Y-STR average mutation rate
for males in a family surname project, where the ultimate most recent common
ancestral haplotype for a few sub-branches/groups surmised to be related has
not been deduced, is to calculate the average mutation rate for each
independent branch, cluster, or group of participants in hand so far for whom
you do know the respective more recent MRCA. Let
me clearly preface this by saying I am talking about surname
projects with clusters of participants who are known to be related to each
other and who share the same or similar surname with the other clusters in the
project. By prior traditional research the clusters were thought to be related, but the
common male ancestor for the whole project is not known, and the Genetic
Distance between the members of one cluster and those of another cluster is
close enough that the combination of the shared surname, the traditional
evidence, and now the new genetic distance evidence leads one to conclude these
clusters do share a common male ancestor in a time frame of genealogical
interest. For want of a better name I'll call it the "Sum of the Parts
Method". What I am suggesting is to deduce the ancestral haplotype for
each surmised sub-branch to calculate an estimated mutation rate for each
sub-branch. Use the known common male ancestor for each independent
branch/cluster as a reference to calculate the mutation rate for each cluster.
Sum up all the unique mutations and unique transmission events in each
cluster/branch and use the combined information to calculate an estimated Y-STR
average mutation rate for your family surname project. While this will probably
not yield the ultimate answer, it will provide an estimate nonetheless. And
then use that Y-STR average mutation rate to estimate the Y-STR haplotype
longevity as I did in my above examples instead of using the commonly used
average mutation rate of .002. I think this will get some surname project
administrators an answer closer to reality for how far back in time the common
ancestor for their family could be, as compared to using the one-size-fits-all
Y-STR average marker mutation rate. I think this estimating approach would work
especially well with large projects with many members and mutations in the
clusters/branches. But again essential to this is that traditional evidence
exists (and the new genetic evidence is reinforcing) that these clusters do
share a common male ancestor in a time frame of genealogical interest, i.e.,
the last 500 years. For those of you with perfect haplotype matches and zero
mutations, well the estimated solution is indeterminate via this method. You
will have to continue to use the standard methodology offered by the testing
companies which uses the overall Y-STR average mutation rate for the overall
male population, rather than a surname specific male line mutation rate.
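The "Sum of the Parts Method" described above can be sketched in Python as follows. This is only one reading of the method: it assumes the combined rate is the total of unique mutations divided by (total unique transmission events times the number of markers tested), which is not spelled out as a formula in the text. The cluster names and counts are hypothetical illustration data, not results from any real project.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    unique_mutations: int     # each mutation counted once against the cluster's deduced ancestral haplotype
    transmission_events: int  # unique father-to-son events in the cluster's known tree


def sum_of_parts_rate(clusters, markers):
    """Estimated average marker mutation rate for the whole project
    (assumed formula: mutations / (transmission events * markers))."""
    total_mutations = sum(c.unique_mutations for c in clusters)
    total_events = sum(c.transmission_events for c in clusters)
    if total_events == 0 or markers == 0:
        raise ValueError("need at least one transmission event and one marker")
    return total_mutations / (total_events * markers)


# Hypothetical clusters, each with its own known common male ancestor.
clusters = [
    Cluster("Branch A", unique_mutations=3, transmission_events=20),
    Cluster("Branch B", unique_mutations=1, transmission_events=12),
    Cluster("Branch C", unique_mutations=2, transmission_events=18),
]

rate = sum_of_parts_rate(clusters, markers=37)
print(f"estimated average marker mutation rate: {rate:.4f}")
print(f"estimated 37 marker haplotype longevity: {1.0 / (rate * 37):.1f} generations")
```

A project with zero observed mutations returns a rate of zero here, which matches the point above that the method gives no usable estimate in that case.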
Hopefully this page provides useful information to surname project
administrators new to genetic genealogy who are looking at mutation rates and
haplotype mutation rates for their surname project(s).
Another point worth noting in regard to Y-STR haplotypes is this.
Given that the Y-STR marker average mutation rate varies from Y-STR marker to
Y-STR marker, the average Y-STR haplotype mutation rate will be dependent on
which markers are included in the set of markers which make up the Y-STR
haplotype. For example: one could choose a panel of 12 Y-STR markers from the
48 or so Y-STR markers available commercially and put together a 12 marker
Y-STR test and haplotype which would mutate significantly faster on average
than the first panel of 12 markers offered by FamilyTreeDNA. Or likewise one
could put together a Y-STR marker haplotype set of 12 markers which mutated on
average slower than the 12 marker panel offered by FamilyTreeDNA. Also, the
differences in which markers were used in the panels offered by the various
early commercial testing companies could be part of the reason why there were
differences in the average overall marker Y-STR mutation rates being reported
by different companies, early on in this new industry, i.e., FamilyTreeDNA
using .002 and Relative Genetics using .003 for Y-STR average marker mutation
rate. Of the two early 25 and 26 marker panels, only about half of the markers
in each panel overlapped, i.e., were the same markers. Before the modern 37
marker and 43 marker tests were offered one had to get tested at multiple labs to get Y-STR
marker data for more than 26 Y-STR markers. One or more of the companies could
have deduced the underlying average marker mutation rate they often cited from
the set of markers they were using, which differed from company to company. The
point is that when comparing Y-STR haplotype mutation rates, not
only is the haplotype's Y-STR marker count relevant but also which
markers are used. While I used the commonly used .002 average Y-STR marker
mutation rate in my example Y-STR Haplotype Mutation Rate (HMR) calculations,
the average marker mutation rate itself is dependent
on the markers used in the haplotype too. This can all get very complicated
very quickly as one digs deeper and deeper into the whole concept of haplotype
mutation rates. My reference example Y-STR haplotype mutation rates above are
simply that, examples. The examples were designed to show the Genetic
Genealogist how increasing the number of Y-STR markers tested helps them deduce
whether the close but non-exact match is relevant in a time frame of
genealogical interest. For more on why, see this Zip+4 analogy
report I wrote: YSTR37 Extended Haplotype and Zip+4 Analogy
Typical Average Mt-DNA (maternal line) Haplotype Mutation Rate (mtsnpHMR)
Reference Mt-DNA (maternal line)
haplotype (HVR1 Region) mutation rate (mtsnpHMR540):
.000030 x 540 nucleotides (markers) in HVR1 region tested = .0162 per
transmission event (birth of a new generation). (1/.0162)=61.7. Thus an HVR1 mtDNA haplotype
can easily survive unchanged for about 62 generations (about 1550 years at 25 years per generation). This
is about 4.6 times slower than the rate at which the typically used commercial Y-STR paternal
line haplotypes mutate. Thus the common female ancestor of two people who
randomly match exactly for the HVR1 region mtDNA haplotype usually long
predates a time frame of relevant genealogical interest, i.e., long before the
last 500 years. Therefore using mtDNA in a random search for matching maternal
lines will lead to many wild goose chases. But mtDNA can be used to verify
pre-existing traditional genealogy evidence that two maternal lines are
related, i.e., the direct maternal line of descent of two families is thought
to be from two women who are surmised to be sisters from prior paper trail or
oral history evidence. But be prepared for numerous random matches with lines
not recently related with mtDNA (maternal line) testing.
Reference Mt-DNA (maternal line)
haplotype (HVR1 and HVR2 regions) mutation rate (mtsnpHMR1050):
.000030 x 1050 nucleotides (markers) in HVR1 and HVR2 regions tested = .0315
per transmission event (birth of a new generation). (1/.0315)=31.7. Thus a combined (HVR1+HVR2)
mtDNA haplotype can easily survive unchanged for about 32 generations (about
800 years at 25 years per generation). This is about 2.4 times slower than the rate at which the typically used commercial
Y-STR paternal line haplotypes mutate. Thus the common female ancestor of two
people who randomly match exactly for the combined HVR1+HVR2 mtDNA haplotype
usually long predates a time frame of relevant genealogical interest, i.e.,
long before the last 500 years. Therefore using mtDNA in a random search for
matching maternal lines will lead to many wild goose chases. But mtDNA can be
used to verify pre-existing traditional genealogy evidence that two maternal
lines are related, e.g., when the direct maternal lines of descent of two
families are thought to trace back to two women who are surmised to be sisters
based on prior paper trail or oral history evidence. But with mtDNA (maternal
line) testing, be prepared for numerous random matches with lines not recently related.
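The two reference mtDNA calculations above can be reproduced with a few lines of Python. This is only a minimal sketch: the function and variable names are my own, and the figure of 25 years per generation is an assumption inferred from the generation and year counts quoted in the text (62 generations, about 1550 years), not something stated in this section.

```python
# Per-nucleotide mtDNA mutation rate per transmission event, as quoted in the text.
MU_MTDNA = 0.000030

# Assumption: 25 years per generation, inferred from the text's figures
# (62 generations ~ 1550 years); it is not stated explicitly in this section.
YEARS_PER_GENERATION = 25


def haplotype_rate(mu, markers):
    """Simplified haplotype mutation rate: per-marker rate times marker count."""
    return mu * markers


def generations_unchanged(rate):
    """Average number of transmission events before the haplotype changes."""
    return 1.0 / rate


# Illustrative labels for the two tested regions and their nucleotide counts.
REGIONS = {"HVR1": 540, "HVR1+HVR2": 1050}

results = {}
for region, markers in REGIONS.items():
    rate = haplotype_rate(MU_MTDNA, markers)
    generations = round(generations_unchanged(rate))
    years = generations * YEARS_PER_GENERATION
    results[region] = (rate, generations, years)
    print(f"{region}: rate {rate:.4f}, about {generations} generations, "
          f"about {years} years")
```

The simplified product (rate x number of markers) is adequate here because both results are far below 1; it is only for very large marker counts that the exact probability formula is needed.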
Average Y-DNA Nuclear (paternal line) Whole Molecule Mutation Rate
Reference Y chromosome nuclear (paternal line) whole molecule mutation rate (ysnpHMR60meg):
For this calculation M is not small but is instead quite large, i.e., on the
order of 58,000,000. Thus if we use
the simplification method I described above for the smaller marker counts, we
get a large error in the result.
Using the simplification method above, we would
get a Y chromosome (haplotype) mutation rate of .00000002 x 58,000,000
nucleotides (markers) in the molecule, if completely sequenced and tested, = 1.16, or about 1.2 per Y chromosome
transmission event (birth of a new generation). That value exceeds 1, so it
cannot be a probability. The actual result, using the equation
described earlier in this paper, “probability of new haplotype = 1
- (1-mu)^M”, is a Y chromosome
(haplotype) mutation rate of 0.687. Thus if we could economically sequence the
whole non-recombining region of the Y chromosome molecule we would expect to
see a SNP change roughly once in every generation. Because of the sheer
number of nucleotides in the Y chromosome molecule's NRY region, it is not
likely to survive much more than one generation unchanged. Statistically, a Y
chromosome would survive about 1.46 generations (1/0.687) without change, based on the
reference Y-DNA nucleotide mutation rate of 2x10^-8. Rounding things off to an integer,
as a rule of thumb I generally state in casual conversations that we can
expect about one new SNP per generation.
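To see why the simplification breaks down for the whole Y chromosome, the two formulas can be compared side by side. The sketch below is illustrative only: the function names are my own, and the inputs are the nucleotide mutation rate (2x10^-8) and nucleotide count (58,000,000) quoted above.

```python
import math

MU_Y = 2e-8              # per-nucleotide mutation rate quoted in the text
M_Y = 58_000_000         # approximate nucleotide count quoted in the text


def simplified_rate(mu, m):
    """Simplified method: per-marker rate times number of markers."""
    return mu * m


def exact_rate(mu, m):
    """Probability of a new haplotype: 1 - (1 - mu)^m.

    Computed with log1p/expm1 so that very small mu does not lose precision.
    """
    return -math.expm1(m * math.log1p(-mu))


simplified = simplified_rate(MU_Y, M_Y)
exact = exact_rate(MU_Y, M_Y)
generations = 1.0 / exact    # average generations between changes

print(f"simplified: {simplified:.3f}")
print(f"exact:      {exact:.3f}")
print(f"generations between changes: {generations:.2f}")
```

The simplified product comes out at about 1.16, which is not a valid probability, while the exact formula gives about 0.687. For small marker counts the two methods agree closely, which is why the simplification is acceptable for the Y-STR and mtDNA haplotypes discussed earlier.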
But finding that new SNP would be a costly
challenge with today's technology (as of 2005). The cost of sequencing the
entire Y chromosome molecule at present puts it totally out of the question
for amateur genealogists. But think of it. In the not too distant future we may be
able to economically sequence and type the entire Y chromosome for about
$1000. Then we will be able to find
our own unique, private family Y-SNPs. But for now we can only economically
do SNP tests for known nucleotide mutation locations. The Y-SNP markers now
used are mutations which occurred long ago and which are found in large groups
of males. These Y-SNP markers are
used to sort human Y chromosomes into the various Haplogroups we hear
so much about in the anthropological use of DNA testing. The most recent common
male ancestor of two people who share the same Y-SNP test Haplogroup lived many
thousands, even tens of thousands, of years ago. Thus Y-SNP testing, while
interesting, is not of much use for traditional genealogical purposes, other
than being able to tell you in what large geographic area or continent of the
world your ancient male Y chromosome ancestor is thought to have originally
lived. Males who share the same 37, 43, or 67 Y-STR marker haplotype are very
likely fairly recently related via their direct male line. Males who share the
same SNP test haplogroup are in general not closely related and share that
haplogroup assignment with millions and millions of other males. Traditional
genealogists should only do a SNP test if they are interested in the ancient
anthropological origin of their Y chromosome, if they wish to learn the broad
geographical area of the modern world where that haplogroup is found in high
frequency today, or if they wish to fully confirm a haplogroup assignment predicted from
one’s Y-STR haplotype pattern when that prediction is somewhat ambiguous.
Copyright
©2005-2023
Charles F. Kerchner, Jr., P.E.
All Rights Reserved
Created - 07 Jan 2005
Updated - 26 Jun 2023