Genealogy geek goals: organizing DNA test data

When genealogy is one of your passions, there comes a point in time when you go full nerd.

I’ve been known to get my geek on about life in bygone eras, the political and cultural upheaval brought on by the Reformation, and migration routes and diasporas.

Nowadays, I find myself fascinated by consumer DNA testing.

“I’m the future with a bunch of antique.” – will.i.am

Just who are these distant cousins?

Naturally, I quickly moved past the novelty of reported ethnicity percentages and went straight to trying to make sense of my family’s DNA matches.

The major DNA testing companies have different tools to help group matches into subgroups. And sure, genealogy websites that feature online family trees – some with auto-populating hints – have replaced the group sheets and relationship charts of yore, making it far easier to locate potential relatives and figure out shared ancestry.

But in order to really get organized and prove relationships – particularly those of distant ancestors – I eventually had to break down and create spreadsheets.

WHOOP! WHOOP! 🚨 ¡NERD ALERT! 🚨

 

Always first steps

Now, in my own family I have tested myself, my paternal aunt and three of my paternal grandmother’s first cousins. This has made for a great multi-generational group in which to compare matches pertaining to specific branches on my paternal line.

We have all tested with FamilyTreeDNA, and have also uploaded our DNA data to Gedmatch and MyHeritage. All three, but especially Gedmatch, have tools for finding shared matches between two people.

So, from the shared matches within my family’s core test group, I can find even more matches. If my aunt and/or any of my grandmother’s cousins share a match, I can then look at which matches that person and each of them have in common. And so on…
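For the code-inclined, the shared-match idea boils down to a set intersection. Here is a minimal sketch, assuming each tester's match list has been exported as a set of identifiers. The function name `shared_matches` and the kit IDs are made up for illustration and do not come from any of the testing sites.

```python
def shared_matches(matches_a, matches_b):
    """Return the identifiers that appear in both testers' match lists."""
    return set(matches_a) & set(matches_b)


# Hypothetical match lists (identifiers invented for this example).
aunt_matches = {"kit_A1", "kit_B2", "kit_C3"}
cousin_matches = {"kit_B2", "kit_C3", "kit_D4"}

in_common = shared_matches(aunt_matches, cousin_matches)
print(sorted(in_common))
```

Repeating the same step with each shared match's own list is the "and so on" part.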

I can also see on which chromosomes (and, most importantly, where on those chromosomes) their matching segments fall.

I then go to work figuring out our MRCA (most recent common ancestor). Some matches have their family trees well populated and made public. Others, I have had to email for information in hopes of finding the connection. Still others, I have had to do quite a bit of the genealogical “paper trail” (or digital trail) hunt myself in order to find our shared ancestor.

But soon this all becomes way too much information to keep organized in one’s own head.

Spreadsheet time!

As an example, I will show how I organize my data for my Baumgardner line. I use Google Sheets simply because I haven’t found anything easier (and yes, it can be tedious to enter data into a spreadsheet), and I can share the spreadsheet easily with others.

The goals for this spreadsheet are fourfold:

  • prove that two Baumgardner males were indeed brothers (sons of Christian Baumgärtner, Sr.) – this goal was accomplished over the past few months, as there were several tested descendants of both.
  • confirm other potential siblings. So far, I have yet to find descendants of those siblings who have DNA tested.
  • identify the maiden name (and family line) of Christian Baumgärtner, Sr.’s wife, Maria.
  • find others with Baumgardner ancestry who may descend from Christian Sr.’s siblings. Right now there are only a couple of leads.

First, I created a spreadsheet in Google Sheets and titled it Baumgardner.

Next, I created tabs within the spreadsheet for each chromosome (labeled Chr 1, Chr 2, etc.). You could start with just the chromosomes you know for sure have matches and add more as matches on other chromosomes become apparent – or you could create all 23 tabs from the get-go and populate them with data as the need arises.

On the first line of each tab I make the following columns:

[Screenshot: the column headers described below]

Start and End are, of course, the numeric start and end points of the matching segment on the chromosome. Centimorgans (cMs) and SNPs help show how strong the match is: the larger the numbers in both columns, the stronger the match.

I generally enter only matches I can compare in Gedmatch, so Match 1 and Match 2 have both a column for name (or initials or username) and the Gedmatch number.

Relationship is where I input the ancestor (if known) of each match.

Notes is where I write additional information (such as whether the match is only on MyHeritage or FamilyTreeDNA).
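If you would rather keep the same information in a script than in a spreadsheet, one row could be modeled roughly like this. It is only a sketch: the class name `SegmentRow`, the field names, and the field order are my assumptions based on the column descriptions above, and the sample values are invented.

```python
from dataclasses import dataclass


@dataclass
class SegmentRow:
    """One spreadsheet row: a shared segment between two Gedmatch kits."""
    chromosome: str        # the tab the row lives on, e.g. "Chr 1"
    start: int             # numeric start point of the segment
    end: int               # numeric end point of the segment
    cm: float              # segment size in centimorgans
    snps: int              # number of SNPs in the segment
    match1_name: str       # name, initials or username
    match1_kit: str        # Gedmatch number
    match2_name: str
    match2_kit: str
    relationship: str = ""  # known ancestor, if any
    notes: str = ""         # e.g. "match only via MyHeritage"


# Invented values, for illustration only.
row = SegmentRow("Chr 1", 1_000_000, 2_500_000, 12.3, 1500,
                 "Person A", "KIT0001", "Person B", "KIT0002")
print(row.end - row.start)
```

Relationship and Notes default to empty strings because they are often unknown when a match is first entered.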

Next comes the laborious task of data input…

Data, sweet data

Within Gedmatch I can do a one-to-one comparison between two users for both the standard 22 autosomal chromosomes and the X (or 23rd) chromosome. Note, I wrote user and not match. That is because unique to Gedmatch is the ability to do one-to-one comparisons with others who might not show as a match to one person but who are a match to another.

After adding names, I just copy and paste the numeric start and end points, cMs, SNPs, and Gedmatch numbers into the appropriate columns.

Sometimes you’ll find matches on more than one segment of a certain chromosome.

Here’s an example of the Chr 1 tab from my Baumgardner spreadsheet:

[Screenshot: the Chr 1 tab of the Baumgardner spreadsheet]

As you can see, Elaine and David match others on Chromosome 1 on two distinct segments. For this example, I have spaced those two segments apart from each other.
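Spacing distinct segments apart can also be done mechanically. The sketch below sorts rows by start point and begins a new group whenever a row does not overlap the group before it. The function name `group_by_overlap` and the sample numbers are hypothetical, and the rows are simplified to (start, end, label) tuples.

```python
def group_by_overlap(rows):
    """Split (start, end, label) rows into groups of chained overlapping ranges."""
    groups = []
    current = []
    current_end = None
    for row in sorted(rows, key=lambda r: r[0]):
        start, end = row[0], row[1]
        if current and start <= current_end:
            # Overlaps the running group: extend it.
            current.append(row)
            current_end = max(current_end, end)
        else:
            # Gap found: close the previous group and start a new one.
            if current:
                groups.append(current)
            current = [row]
            current_end = end
    if current:
        groups.append(current)
    return groups


# Invented positions, for illustration only.
rows = [(10, 50, "pair 1"), (200, 300, "pair 3"), (40, 80, "pair 2")]
for group in group_by_overlap(rows):
    print([label for _, _, label in group])
```

Note that rows in one group only overlap in a chain, so two rows in the same group do not necessarily overlap each other. Grouping like this is a way to lay out the tab, not proof of triangulation.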

I have color-coded the matches on the second segment. This is to visually indicate that these are triangulated matches that prove the genetic connection to a distant ancestor, Christian Baumgärtner, Sr.

An alternative (or honestly, typical) way of using a spreadsheet to group and compare DNA matches can be found on the Segmentology blog, but as a visual person I rather prefer my method.

I imagine there may be apps in development that are more visually appealing than a spreadsheet and can hopefully import data directly from the testing companies’ websites – or that the DNA testing companies themselves will soon come up with even better tools to group triangulated matches. Auto-clustering is hot right now, but auto-clustering is not the same as auto-triangulation.

Triangulation is key

There are several fantastic articles written by geneticists which delve into triangulation, and, simply put, triangulation is really the stuff when it comes to confirming ancestry using DNA.

(For more on the how-to and what-for of triangulation, I recommend reading Kitty Cooper’s excellent “Triangulation: Proving a Common Ancestor”.)
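As a very rough illustration of the core idea: three people triangulate on a segment when all three pairs (A with B, A with C, and B with C) match each other over a common stretch of the same chromosome. The sketch below checks only that overlap. The function names and numbers are hypothetical, and it ignores things a real analysis would weigh, such as minimum cM thresholds.

```python
def common_overlap(*segments):
    """Return the (start, end) range shared by all segments, or None."""
    start = max(seg[0] for seg in segments)
    end = min(seg[1] for seg in segments)
    return (start, end) if start < end else None


def triangulates(seg_ab, seg_ac, seg_bc):
    """True if the A-B, A-C and B-C segments all share a common range.

    Each argument is a (start, end) tuple on the same chromosome.
    """
    return common_overlap(seg_ab, seg_ac, seg_bc) is not None


# Invented positions, for illustration only.
print(common_overlap((100, 500), (200, 600), (150, 450)))
print(triangulates((100, 200), (300, 400), (100, 400)))
```

The important part is the third comparison: two people who each match me on the same stretch do not triangulate unless they also match each other there.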

Once I begin organizing my DNA matches in spreadsheets for the different ancestral branches I am working on, I can count on a full-blown nerdgasm when I put together a triangulated group. Sometimes there will be a mystery match or two (or more) in the group.

[Screenshot: a triangulated group on Chromosome 5]

In this triangulated group on Chromosome 5, there are three matches for whom I have yet to find a common ancestor. Dee’s family tree shows that she indeed has Baumgardner ancestry – but neither of our lines goes back far enough to identify our MRCA. I hope that in time we will figure out the connection.

This group excites me because it indicates that the shared ancestry is further back in time, opening the road for new genealogical discoveries.

And what thrills a genealogist more than that?

COPYRIGHT (C) 2019 BY JANA SHEA. ALL MATERIALS PROTECTED UNDER THE LAWS OF COPYRIGHT. DO NOT COPY OR REPRODUCE WITHOUT AUTHOR’S PERMISSION.