Kerchner's Zip+Four Analogy of Why to Upgrade to the 37 Y-STR Marker Extended Haplotype
A Tutorial Paper on Why One Should be Tested at or Upgrade to 37 Markers
by Charles F. Kerchner, Jr.
Genetic Genealogy DNA Testing Dictionary
An Overview of DNA Mutation Rates by Charles Kerchner
Copyright © 2005-2008 Charles F. Kerchner Jr. All Rights Reserved
Notice: Establishing or posting links to this report/webpage is encouraged and permitted. But, reuse or reprinting
it in its entirety or in part in other websites, email, or mail lists, newsletters, or in any other media or publication,
without my prior expressed permission, is not permitted. Printing a hard copy of this report for your own personal,
non-commercial use, or one or two personally printed copies for a genetic genealogist friend or two, is permitted.
But no mass reproduction in any format is permitted without prior expressed permission. Such permission can easily
be arranged by email once you explain what you want to do. But ask first. And as I said, links to the page with a simple
one or two summary lines in your webpage as to what this paper is about, in any genealogy or genetic genealogy webpage
are always acceptable, with or without prior permission.
FamilyTreeDNA (FTDNA) provides three different panels or suites of markers for Y-DNA testing which together
add up to a total of 37 markers, i.e., a 37 marker haplotype (set of numbers) which is a very good "Y-DNA signature"
for a male line Y chromosome. This 37 marker haplotype was developed by FTDNA and the University of Arizona as three
panels over the past five years in order to increase the resolving power of their Y-DNA paternal line test. It does
this in two ways. First by increasing the number of markers. And second by increasing the overall average marker mutation
rate as the second and third panels were added to FTDNA's product line working towards the present high resolution,
37 marker Y-DNA haplotype test. Note the word average and panel in that statement. Not every marker in panel two is
faster than every marker in panel one. But the average mutation rate of panel two as a set of markers is faster than
the average mutation rate of panel one as a set of markers. And similarly the average mutation rate of panel
three is faster than that of panel two.
First there is the original 12 marker panel which is now named the Y-DNA 12 Paternal Line Test. It is also now known in
retrospect as the low resolution panel/test. Then they added a second panel of 13 more markers (named PP3 internally by
FTDNA). Combined with the first panel this yielded a 25 marker haplotype (set of 25 numbers). The combined 25 marker test
is named by FTDNA as the Y-DNA 25 Paternal Line Test. It is now known as the medium resolution test. And finally,
there is a third panel of 12 more markers (named PP5 internally by FTDNA) which when combined with the first two
panels yields a 37 marker haplotype (set of 37 numbers). The combined set of 37 markers is known as the high
resolution test.
In this analogy of haplotype to zip codes, it is also important to know that the average marker mutation rate of the
first panel is slower than that of the second panel, and the second panel's average marker mutation rate is slower
than that of the third panel. In fact the third panel's average marker mutation rate is almost twice as fast as the
first panel's. In October 2004 at the 1st International
Conference on Genetic Genealogy in Houston TX, FTDNA revealed some results of a new study of the Y-STR mutation rates for
the three haplotype panels as calculated from a large sample of surname projects data whose participants were known to be
related by prior traditional genealogy research. They used traditional descendant chart information showing the relationship
of each person tested in the surname projects to the Most Recent Common Ancestor (MRCA) for all those tested in the surname
project who met the criteria of the MRCA being known. From this data they calculated the various marker mutation rates.
That study indicated the first panel's average marker mutation rate was .00399.
The second panel's average marker mutation rate was .00481. And the third panel's average marker mutation rate was .00748.
FTDNA had calculated the individual marker mutation rates but did not release that information. Their plan is to
release that information at some time in the future in a peer reviewed scientific paper.
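As a rough worked example of what these three panel figures imply, the Python sketch below computes the marker-weighted average rate as each panel is added, and the ratio of the third panel's rate to the first. It assumes the quoted rates are per marker per generation, which is the usual convention but is not stated above. The variable and function names are only illustrative.

```python
# Panel sizes and average per-marker mutation rates quoted in the text.
# Assumption: the rates are per marker per generation.
PANELS = [
    ("panel 1", 12, 0.00399),
    ("panel 2", 13, 0.00481),
    ("panel 3", 12, 0.00748),
]


def cumulative_average_rates(panels):
    """Return {total markers: weighted average rate} as panels are added."""
    results = {}
    markers = 0
    expected = 0.0  # expected mutations per generation over all markers so far
    for _name, size, rate in panels:
        markers += size
        expected += size * rate
        results[markers] = expected / markers
    return results


averages = cumulative_average_rates(PANELS)
ratio_3_to_1 = PANELS[2][2] / PANELS[0][2]

for n, avg in averages.items():
    print(f"{n} markers: average rate {avg:.5f}, "
          f"expected mutations per generation {n * avg:.4f}")
print(f"panel 3 / panel 1 rate ratio: {ratio_3_to_1:.2f}")
```

Under that assumption the combined average works out to about .0044 for the 25 marker haplotype and about .0054 for the 37 marker haplotype, and the third panel is roughly 1.9 times as fast as the first.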
My analogy, comparing a haplotype (set of numbers) to a zip code (set of numbers), is designed to demonstrate to a
beginner to Genetic Genealogy how adding more and more specially selected markers in haplotype panels provides a clearer indication
of the probability of a genealogically relevant recent relationship in Genetic Genealogy Surname Projects. Let us use this
example.
Charles Kerchner's 37 Marker Y-DNA Haplotype:
13 24 14 11 11 16 12 12 12 13 13 29 - 17 8 10 11 11 26 15 19 30 15 15 16 16 - 11 11 19 22 16 15 18 17 36 37 12 12
Charles Kerchner's Zip+Four Postal Address Zip Code:
1 8 0 4 9 - 1 5 4 4
For this analogy and example let us rewrite the zip+four as: 1 8 0 - 4 9 - 1 5 4 4
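The pairing of haplotype panels with zip code segments can be made concrete with a short Python sketch. It splits both strings at the dashes and shows how many numbers are known at each of the three levels. The helper names are hypothetical and exist only for this illustration.

```python
# The haplotype and the rewritten zip+four, exactly as given in the text.
HAPLOTYPE = ("13 24 14 11 11 16 12 12 12 13 13 29 - "
             "17 8 10 11 11 26 15 19 30 15 15 16 16 - "
             "11 11 19 22 16 15 18 17 36 37 12 12")
ZIP_PLUS_FOUR = "1 8 0 - 4 9 - 1 5 4 4"


def split_segments(text):
    """Split a dash-delimited string of numbers into lists of ints."""
    return [[int(tok) for tok in part.split()] for part in text.split(" - ")]


def cumulative(segments):
    """Return the growing list of values as each segment is added."""
    out, so_far = [], []
    for seg in segments:
        so_far = so_far + seg
        out.append(so_far)
    return out


panels = split_segments(HAPLOTYPE)
zip_parts = split_segments(ZIP_PLUS_FOUR)

# Pair each haplotype resolution with the matching zip code resolution.
for hap, z in zip(cumulative(panels), cumulative(zip_parts)):
    print(f"{len(hap):2d} markers  <->  {len(z)} zip digits: "
          f"{''.join(map(str, z))}")
```

The three lines printed pair 12 markers with 180, 25 markers with 18049, and 37 markers with 180491544.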
To understand why it is important to either take the 37 marker test initially or to upgrade from the
12 marker test to the 37 marker test let's dig into the details of the Zip+4 analogy which I
developed and perfected over the last few years in talking to various genealogy groups who knew nothing
about Genetic Genealogy.
We are all familiar with ZIP+4. The first number gives us a major area of the country. For example if the first
number of your zip code is a 1 you are in the east coast area of the USA. And if your zip code begins with a 9 you
are in the west coast area of the USA. Numbers 2 to 8 in the first position define other areas of the USA as we
go east to west. In my analogy using only one number of the zip would be like having a test with maybe only one marker
in your haplotype (set of Y-DNA markers). Depending on which marker was the first number in your haplotype, it may
be very crudely predictive of the continent where your Y chromosome originated, or maybe not predictive at all.
In general the first number of a zip code has much more resolving power than a single marker in a Y-DNA test, so this
analogy is not an exact one-for-one relationship. What is important is the concept that adding more numbers results in
finer and finer division of the potential population relevant to that number. As
we add more and more numbers one can see how my simple analogy helps beginners grasp the concept of why more
YSTR markers are better than fewer and that labs are not just selling more markers to make more money.
The first three numbers of a zip code (180 in my example) give us the regional post office. The next two numbers
(49) get you to my town and even closer to my family. And the last four numbers (1544) added to the end of the first
five get you to my exact family home or in more rural cases it may only define a carrier route. So you can see how
the Zip+4 analogy will give you a fine enough resolution either to find my specific family home or, in other cases,
to find someone who knows where it is.
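The way each added group of digits shrinks the pool can be sketched in Python. The address list is entirely hypothetical, made up for this illustration. Only 18049-1544 comes from the example above.

```python
# Hypothetical ZIP+4 codes, invented for illustration.
# Only 18049-1544 appears in the text.
ADDRESSES = [
    "18049-1544", "18049-1544", "18049-2210", "18049-3001",
    "18015-1234", "18017-5678", "18101-1234",
    "10001-0001", "19104-4321", "90210-1111",
]


def matching(addresses, prefix):
    """Addresses whose digits (ignoring the dash) start with the prefix."""
    return [a for a in addresses if a.replace("-", "").startswith(prefix)]


target = "180491544"
# Pool size when 1, 3, 5, and all 9 digits of the target are known.
pool_sizes = {k: len(matching(ADDRESSES, target[:k])) for k in (1, 3, 5, 9)}

for digits, size in pool_sizes.items():
    print(f"first {digits} digit(s) {target[:digits]}: {size} candidate(s)")
```

With this made-up list the pool shrinks from 9 candidates with one digit, to 6 with three, to 4 with five, and to 2 with the full zip+four.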
Analogously, for the 37 marker haplotype, the first panel of 12 numbers gets us categorized in a very large group of very
distantly related male line families which are a segment of an anciently related large group of males called a haplogroup
of the human species. From the first 12 numbers of my haplotype I can estimate that I belong to a very large haplogroup
named R1b which is Western European in origin. That haplogroup assignment narrows down the pool of male lines to whom I
could be more recently related to from billions to many millions. And my specific 12 marker haplotype (set of numbers that
are the low resolution signature of my Y chromosome) narrows the portion of the haplogroup population pool that I could be
recently related to down to an even smaller segment within the R1b haplogroup based on how close the sets of numbers of
other lines are to my haplotype, i.e., set of 12 numbers. Thus the specific 12 marker haplotype signature brings the pool
size down from millions to a few hundred thousand, thousands, hundreds, or maybe even just a few lines, depending on
how rare the 12 marker haplotype is, and how narrowly I want to set the screen of potentially related candidates, i.e., how
many differences in the exact 12 numbers I will accept as indicative that the other haplotype patterns of 12 markers very
nearly matching mine can be considered recently related to mine in the time frame of interest. Thus with 12 markers I have
analogously narrowed down the subset of the haplogroup or pool of males in my near-match haplotype world address to a "regional
post office for my Y-DNA haplotype", i.e., Western Europeans, and to a specific subset of Western Europeans, i.e., those
that share my 12 marker haplotype exactly, or are very near to it. Some 12 marker haplotypes are shared or nearly shared
by large groups of males such as the Western Atlantic Modal Haplotype (WAMH). If you have that 12 marker WAMH haplotype
you will get many male line exact matches who do not share your surname and are most likely related to you from a time
frame prior to the adoption of surnames, i.e., from 600-1000 years or more ago. In my analogy that would be like sharing
a zip code prefix for the New York City area. With that zip code prefix you will have a larger pool
of people than if you share a zip code prefix for the Allentown PA area. You could now also add in the additional filtering
mechanism of shared surname and you would find very few Kerchners in Allentown PA area zip codes prefixed by 180. However
some are related to me and some are not. Even with a fairly rare surname like Kerchner knowing only the 180 zip code area
prefix is not definitive enough. And likewise having/knowing an exact 12 marker match or nearly exact 12 marker match with
another line is not all by itself good enough to be very certain you have found the correct family.
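The idea of counting "how many differences" between two haplotypes at each resolution can be sketched as follows. The second haplotype is hypothetical, invented for this illustration. The count is a plain tally of differing positions and is not any lab's official genetic distance calculation.

```python
# Charles Kerchner's 37 marker haplotype from the text, panel by panel.
MINE = [13, 24, 14, 11, 11, 16, 12, 12, 12, 13, 13, 29,
        17, 8, 10, 11, 11, 26, 15, 19, 30, 15, 15, 16, 16,
        11, 11, 19, 22, 16, 15, 18, 17, 36, 37, 12, 12]

# Hypothetical second haplotype: identical on panel one, one difference
# on panel two (30 -> 31), two on panel three (19 -> 20 and 37 -> 38).
OTHER = [13, 24, 14, 11, 11, 16, 12, 12, 12, 13, 13, 29,
         17, 8, 10, 11, 11, 26, 15, 19, 31, 15, 15, 16, 16,
         11, 11, 20, 22, 16, 15, 18, 17, 36, 38, 12, 12]


def mismatches(a, b, n_markers):
    """Count positions that differ within the first n_markers markers."""
    return sum(1 for x, y in zip(a[:n_markers], b[:n_markers]) if x != y)


for n in (12, 25, 37):
    diff = mismatches(MINE, OTHER, n)
    print(f"{n} markers: {n - diff}/{n} match")
```

Here the two lines look identical at 12 markers (12/12), but the added panels reveal differences (24/25, then 34/37) that the low resolution test cannot see.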
Adding the second panel of 13 more markers to your haplotype and getting your 25 marker haplotype is similar to knowing
the whole first five digits of a zip code. Adding the (49) city/town code to the (180) prefix gives the zip code
for my town, Emmaus PA, i.e., 18049, and narrows things down considerably. You now know with more precision where my family
belongs in the geography of the USA. But also many other families share that same zip code with me, some maybe related
and some certainly are not related. Thus it is with a 25 marker haplotype. It narrows down where we belong in the human
population of the world.
While we have narrowed down the pool of other male lines who could possibly be related to me considerably with 25 markers,
people with some haplotypes, such as the Super Western Atlantic Modal Haplotype, will probably still have many, many
matches with people of different surnames, as do people
who share the zip code of a borough of New York City. Similarly, many people who live in Emmaus PA have my zip code of 18049
but do not share my surname and thus are most likely not related to me on the direct male line. Now if you add in a surname
match with the zip code match, well then with a rare surname like mine you would find the right family line in Emmaus.
But if I lived in Allentown PA and had the Miller surname it is still unlikely that you could narrow things down to the
correct male line. However in a macro sense, the exact or near matches to my 25 marker haplotype are more likely to be
related to me than the pool of matches or near matches segregated out of the whole population using only 12 markers.
Now if we add the next "+four" digits to my Emmaus PA town zip code, or to a Miller male's address living in the city of
Allentown PA you can definitely find the correct family. It is much more probable that people living at the 18049-1544
ZIP+4 zip code are related to each other than those at the 18049 five digit zip code. And thus it is much more likely that
people who share an exact 37/37 match are recently related than those who only share a 25 marker match.
And of course if you then add in the surname filter match combined with the 37 marker haplotype match (zip+4 match)
you are home.
No analogy is perfect and neither is this one. But I find that this zip+four example works very well to explain the value
of upgrading the number of markers tested to 37 when speaking to beginners and newbies about which YSTR test to order
and whether or not to upgrade from the 12 marker test they took years ago.
Another advantage of course to adding more markers with FTDNA's well chosen panels of markers, is that since the second panel
mutates faster on average than the first, and the third panel mutates faster on average than the second panel, as you add
more and more markers which are also on average faster mutating markers, not only do you reduce the population set which
you could be related to, but you also narrow the statistical estimate and confidence bands for the estimate of the Time
to Most Recent Common Ancestor (TMRCA) for male lines you find who you match with your haplotype, and maybe surname too,
but for whom you do not know the true genealogical relationship or the common male ancestor via traditional genealogical
research and historical records. For more on TMRCA see the paper by Dr. Bruce Walsh. Link below.
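To illustrate how more and faster markers tighten a TMRCA estimate, here is a deliberately simplified Python sketch. It is not the method of the Walsh paper or of any lab's calculator. It assumes an exact match on every marker, that any mutation on either line would have broken the match, a flat prior on the number of generations, and that the panel rates quoted earlier are per marker per generation. Under those assumptions the chance that the common ancestor lived within g generations is 1 - exp(-2 * R * g), where R is the total expected mutations per generation over the markers tested.

```python
import math

# Panel sizes and average per-marker rates quoted earlier in this paper.
# Assumption: the rates are per marker per generation.
PANEL_SIZES = (12, 13, 12)
PANEL_RATES = (0.00399, 0.00481, 0.00748)


def total_rate(n_panels):
    """Expected mutations per generation over the first n_panels panels."""
    return sum(s * r for s, r in
               zip(PANEL_SIZES[:n_panels], PANEL_RATES[:n_panels]))


def generations_for_probability(n_panels, prob):
    """Generations g with P(TMRCA <= g | exact match) = prob.

    Simplified model: P = 1 - exp(-2 * total_rate * g). The factor of 2
    counts both lines descending from the common ancestor.
    """
    lam = 2.0 * total_rate(n_panels)
    return -math.log(1.0 - prob) / lam


for n_panels, markers in ((1, 12), (2, 25), (3, 37)):
    g50 = generations_for_probability(n_panels, 0.50)
    g95 = generations_for_probability(n_panels, 0.95)
    print(f"exact {markers}/{markers} match: "
          f"50% within {g50:.1f} generations, 95% within {g95:.1f}")
```

Under these assumptions the 50% point falls at roughly 7 generations for an exact 12 marker match, about 3 for 25 markers, and under 2 for 37 markers, and the 95% point tightens from about 31 generations to about 7.5.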
I hope this paper helps experienced Project Admins and/or beginner Genetic Genealogists to explain to the people tested
already the value of upgrading 12 marker tests to 37 markers, or in selecting to take the 37 marker test right up front.
Comments welcome. More technical help as to why more markers are better can be found in the following links.
Time to Most Recent Common Ancestry Charts and Tables
and Comparison of 12, 25, 37, and 67 YSTR Marker Haplotype Estimates
provided by FamilyTreeDNA.
Time to Most Recent Common Ancestry Estimation Algorithm (TMRCA)
Using Genetic Marker Similarity Between Two Individuals
by Dr. Bruce Walsh of the University of Arizona
Kerchner's Genetic Genealogy DNA Testing Info & Resources Page
Created - 5 Nov 2005
Updated - 31 May 2006