RootsWeb.com Mailing Lists
Total: 2/2
    1. [AUTOSOMAL-DNA] Application of phased files from two parent one child trio
    2. Gregg Bonner
    3. My apologies for a topic which I am sure has already been addressed; nevertheless: I have a cousin "C" who has had her autosomal DNA tested by FTDNA, as have her parents "P" and "M". All three raw files were uploaded to GEDmatch, and a pair of phased files was generated with respect to "C". The phasing was performed at GEDmatch by "M". What now? My principal question is: what is the order of operations to follow using the pair of phased files to extract whatever will "fall out" of the data itself? What I am principally interested in is using the files to reduce the number of false positive matching segments in the list of _"P"_. Is that possible, and if so, how do you go about doing it? Is it an Excel solution followed by re-upload of yet another file to GEDmatch? My other questions are about file input/output in GEDmatch. Can we download any of the raw data files? Is the user offered a chance to download the phased files upon creation, or do they always reside at GEDmatch only? It seems this should offer genotyping error cleaning. Is there a way to get "cleaned" output of a raw file based on the trio? And regarding input, is it the case in GEDmatch usage that the "raw" file can be the RS number followed by either 1 or 2 letters, or a dash, or is it the case that the phased file has every phased RS appearing homozygous so that the input still has two characters per row? Just for completeness, I want to quote "M" - "What do I do with these results [i.e., the phased files]? And what will I know that I did not know before?" In short, I'd like to know the application of phased files that extends beyond the simple notion that a match in "C" must have come from either "P" or "M". Cheers, Gregg

    10/17/2013 04:03:23
    1. Re: [AUTOSOMAL-DNA] Application of phased files from two parent one child trio
    2. Tim Janzen
    3. Dear Gregg,

You need to always keep in mind precisely what your goals are with autosomal DNA testing. You are always trying to do two primary things:

1. You are trying to confirm your (or someone else's) family tree.
2. You are trying to extend your (or someone else's) family tree.

To do the above you need to be carefully mapping your genome as well as those of your close relatives. In your case, you want to be mapping the genome of your cousin's parent who is biologically related to you. You do that in exactly the same way you would map the genome of your parents. What you really want to have at GEDmatch are the phased data files of the parent of your cousin who is biologically related to you (your aunt or your uncle). There should be two files, one for each set of autosomal chromosomes. You then want to be entering the data from the people who are matching one of those two files into your match list. I mentioned my mom's match list at https://dl.dropboxusercontent.com/u/21841126/23andMe%20and%20FF%20matches%20for%20Betty%20Janzen%20(public).xls earlier this evening as an example of what you should be keeping. You then want to be creating a chromosome map for anyone you have found a genealogical connection with whose data you feel comfortable mapping. This would be similar to the one for my mom at https://dl.dropboxusercontent.com/u/21841126/phased%20genome%20of%20Robert%20and%20Betty%20Janzen%20(public).zip. If you aren't yet comfortable with chromosome mapping, I suggest you review the document that Emily Aulicino and I wrote at https://dl.dropboxusercontent.com/u/21841126/Basics%20of%20Chromosome%20Mapping.docx.

So in summary, the sequence of the things you need to do in terms of GEDmatch is as follows:

1. Upload two phased data files of the parent of your cousin who is biologically related to you to GEDmatch. Use David Pike's utility or mine at https://dl.dropboxusercontent.com/u/21841126/phasing%20program%20(small%20version).xls to generate the phased data if you need to.
2. Record all of the matching segment data for each of these two phased data files in GEDmatch.
3. Get the pedigree charts for each of the matches (preferably as GEDCOMs).
4. Look for shared genealogical connections for everyone who is matching on the same segment.
5. When you are convinced of the ancestry of any particular segment, record that information on the chromosome map.

GEDmatch doesn't allow you to download your phased data files for your personal use. It only creates them for you on their system. Keep in mind that GEDmatch is only creating one of the two phased files for each of the parents of your cousin. You need to create and upload the second one for each parent on your own.

Yes, you can get cleaned phased data files if you phase the data yourself. You need to run a program in Excel or in some other way determine which SNPs are discordant. I have a phasing discrepancy file in Excel that I wrote specifically for this purpose. You will generally find about 200 to 300 SNPs that are discrepant in a two-parent/one-child trio. You want to remove the data for all of these SNPs from your phased data files before you upload them to GEDmatch. The phased data files you want to upload to GEDmatch will only have a single allele value for each SNP. You may need to correspond directly with John Olson at GEDmatch at GEDmatch@gmail.com if you have any problems uploading your phased data files.

What you will know when you are using phased data files at GEDmatch rather than simply unphased ones is which of the HIRs (half-identical regions) are IBD (identical by descent) and which are IBS (identical by state). Quite a few of the HIRs under 10 cM are IBS. Using phased files helps you eliminate those from consideration in your match list. Keep in mind that when you are using phased files from a two-parent/one-child trio you won't know where all of the crossovers are.
Thus, a few HIRs that are IBD will be picked up when you run the comparison with an unphased file but missed when you run it with the phased file. The only way to sort that out is to do extensive chromosome mapping using data from first and second cousins so that you can determine approximately where the crossovers are. Then eliminate 25 to 50 SNPs on each side of where you believe the crossovers are and insert a "-" in your phased file in that region. Then put the correct phased data in each file. For instance, if you have extensively mapped the data for one of your cousin's parents, then put the data for one of the cousin's grandparents in one file and put the data for the cousin's other grandparent in the second file. Relatively few people can do that because they don't know where very many of these crossovers are. I know where about 80% of the crossovers are for my mom because I have mapped just over 80% of her genome at this point.

Sincerely,
Tim Janzen
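The trio phasing and discrepancy check discussed in this reply can be illustrated with a minimal Python sketch. This is not the Excel phasing program or David Pike's utility mentioned in the message; it is a simplified stand-in. The function names (phase_snp, untransmitted, phase_trio) and the input format (a dict mapping each RS number to a two-letter genotype, with "--" for a no-call) are assumptions made for illustration. The "untransmitted" helper reflects one reading of the "second" phased file for a parent: the alleles that parent did not pass to the child.

```python
from typing import Dict, List, Optional, Tuple

NO_CALL = "-"


def phase_snp(child: str, father: str, mother: str) -> Optional[Tuple[str, str]]:
    """Phase one SNP of the child as (paternal allele, maternal allele).

    Returns ("-", "-") if the SNP cannot be phased and None if the trio
    is discordant (no Mendelian-consistent assignment exists).
    """
    if NO_CALL in child + father + mother:
        # Simplification: any no-call in the trio leaves the SNP unphased.
        return (NO_CALL, NO_CALL)
    a, b = child[0], child[1]
    # Keep every assignment where each parent actually carries the allele.
    options = {(pat, mat) for pat, mat in ((a, b), (b, a))
               if pat in father and mat in mother}
    if not options:
        return None  # discrepant SNP
    if len(options) > 1:
        return (NO_CALL, NO_CALL)  # all three heterozygous: ambiguous
    return next(iter(options))


def untransmitted(parent: str, transmitted: str) -> str:
    """Allele the parent did not pass on, or "-" if it is unknown."""
    if transmitted == NO_CALL or NO_CALL in parent:
        return NO_CALL
    return parent.replace(transmitted, "", 1)


def phase_trio(child: Dict[str, str], father: Dict[str, str],
               mother: Dict[str, str]
               ) -> Tuple[Dict[str, str], Dict[str, str], List[str]]:
    """Return (paternal file, maternal file, discrepant RS numbers).

    Discrepant SNPs are left out of both phased files entirely.
    """
    paternal: Dict[str, str] = {}
    maternal: Dict[str, str] = {}
    discrepant: List[str] = []
    for rsid, call in child.items():
        result = phase_snp(call, father.get(rsid, "--"), mother.get(rsid, "--"))
        if result is None:
            discrepant.append(rsid)
            continue
        paternal[rsid], maternal[rsid] = result
    return paternal, maternal, discrepant
```

Each phased dict holds a single allele per SNP, matching the single-allele format described for upload; writing those dicts out in whatever file layout GEDmatch expects is left out of the sketch because the message does not specify it.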

    10/17/2013 07:07:47
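The crossover step in the reply above (blank out 25 to 50 SNPs on each side of a suspected crossover with "-") can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name mask_crossovers, the parallel-list input format, and the use of base-pair positions to locate crossovers are all hypothetical choices, not part of any tool named in the thread.

```python
from bisect import bisect_left
from typing import List, Sequence


def mask_crossovers(positions: Sequence[int], alleles: Sequence[str],
                    crossovers: Sequence[int], flank: int = 25) -> List[str]:
    """Return a copy of `alleles` with SNPs near each crossover set to "-".

    `positions` are the SNP positions on one chromosome, sorted ascending,
    and `alleles` are the phased single-allele calls in the same order.
    `crossovers` are the approximate crossover positions (an assumption:
    they come from your own chromosome mapping). `flank` SNPs are masked
    on each side of every crossover.
    """
    masked = list(alleles)
    for xo in crossovers:
        # Index of the first SNP at or after the suspected crossover.
        i = bisect_left(positions, xo)
        lo = max(0, i - flank)
        hi = min(len(masked), i + flank)
        for j in range(lo, hi):
            masked[j] = "-"
    return masked
```

For example, with ten SNPs at positions 100 through 1000, a crossover estimated at 550, and flank=2, the SNPs at 400, 500, 600 and 700 would be replaced by "-" while the rest keep their phased alleles.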