RootsWeb.com Mailing Lists
Total: 8/8
    1. [DNA] Phasing from multiple sources
    2. McDonald, J Douglas
    3. I'm going to get a complete phase job on myself, for purposes of upload to all the places that can use it. I don't have files for either parent. What I do have is 1) a complete genome from the Full Genomes Long Read product. This is phased into blocks covering well over 99% of the genome, i.e. it is physical phasing. However, we have no idea which block goes with which parent. 2) I have data on one aunt, three first cousins, one first cousin once removed, and one second cousin once removed, all on my father's side. Together these generate triangulated blocks covering 90% of the genome. These are done by various companies. The question is: need I reinvent the wheel? Or has someone done this and provided instructions, or even software? Doug McDonald

    10/28/2018 07:04:00
    1. [DNA] Re: Phasing from multiple sources
    2. Tim Janzen
    3. Dear Doug, I have previously done chromosome mapping and phasing for clients in your situation, except that I was using data from 23andMe and Family Finder. The Lazarus tool at GEDmatch is the closest tool to what you are trying to do. However, Lazarus doesn't give you the phased data. It simply allows you to use it at GEDmatch. I did all of the work for projects like this previously in Excel. Sincerely, Tim Janzen

    10/28/2018 07:33:42
    1. [DNA] Re: Phasing from multiple sources
    2. Ann Turner
    3. How long are the blocks from Full Genomes? Ann Turner

    10/28/2018 08:53:20
    1. [DNA] Re: Phasing from multiple sources
    2. McDonald, J Douglas
    3. They claim that 50% of them are longer than 819 kilobases. Doug McDonald

    10/29/2018 07:42:01
    1. [DNA] Re: Phasing from multiple sources
    2. McDonald, J Douglas
    3. I now have collected all the data in a gigantic spreadsheet. I have my own data phased into the Full Genomes LR blocks, but these are still arbitrary as to father/mother. I have columns with the unphased allele pairs for 5 of my close paternal aunts and cousins. I'm still trying to get the raw data of one of them. I have columns indicating half-identical regions from FTDNA for three of those people, and equivalents done by myself for three more paternal cousins who tested at Ancestry.com. These blocks cover 77.5% of my genome. When I get the next paternal one it will be 80%. I have found folks from my maternal side covering over half of the rest, but am having no luck getting them to send me the raw data. Having these would get it over 94% of the genome covered commercially. Now the big problem: the logic and computer programming to USE all this data! Has anybody yet done it (using the Full Genomes long read data)? If so, what did you do? Remember, I have no parent data. Doug McDonald

    11/04/2018 09:13:05
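Editorial note: coverage figures like the 77.5% quoted above come from taking the union of all half-identical segments, so that overlapping matches are not counted twice. A minimal sketch of that arithmetic follows. The function name, the (chromosome, start, end) tuple layout, and the genome_length input are assumptions made for illustration, not anything taken from Doug's spreadsheet.

```python
def covered_fraction(segments, genome_length):
    """Fraction of the genome covered by the union of matching segments.

    segments: iterable of (chromosome, start, end) in base pairs, start < end
              (hypothetical layout, e.g. exported from a match list)
    genome_length: total length in base pairs of the chromosomes considered
    """
    covered = 0
    current = None  # (chromosome, start, end) of the run being merged
    for chrom, start, end in sorted(segments):
        if current and chrom == current[0] and start <= current[2]:
            # Overlaps or touches the current run: extend it
            current = (chrom, current[1], max(current[2], end))
        else:
            if current:
                covered += current[2] - current[1]
            current = (chrom, start, end)
    if current:
        covered += current[2] - current[1]
    return covered / genome_length


# Toy example: two overlapping segments on "1" and one on "2"
toy_segments = [("1", 0, 100), ("1", 50, 150), ("2", 0, 50)]
toy_fraction = covered_fraction(toy_segments, 400)
```

Sorting first means each chromosome's segments are visited in order of start position, so a single pass is enough to merge overlaps.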
    1. [DNA] Re: Phasing from multiple sources
    2. Tim Janzen
    3. Dear Doug, I haven't done this work with long read sequence data yet, but the logistics are very similar to doing it with 23andMe version 2 data, which is how I started back in 2010. At that time I used my parents' data and my data and wrote a phasing program in Excel to phase the data. I then had 4 long columns in Excel. One column was the allele values my father passed on to me, one column was the allele values my father possessed but didn't pass on to me, one column was the allele values my mother passed on to me, and one column was the allele values my mother possessed but did not pass on to me. This was (and still is) about 550,000 rows of data in Excel. I then assigned sections of each chromosome to my parents' parents, grandparents, and great grandparents as I got new matching segment data from 1st, 2nd, and 3rd cousins. I continue that process today. In your case, you have phased data, but you don't know which parent you got it from yet. I suggest you take a similar approach to what I did. Have two columns of data for each chromosome and then begin the process of assigning regions of the chromosome to your parents as you get matching segment data in from your relatives. You need to make sure that you try to find all of the crossovers in your genome. There will probably be about 40 to 60 of these in your entire genome. When you find a crossover, you need to switch which parent you assign the column to. I am inserting two rows from my parents' chromosome map to give you an idea how I structured it:
       rs3813199 1 1148140 G G G G PJ1863 NJ inf PP1847 MF and NC inf CLY1876 inf from LY to LaA comparison PEM1865 EH and DM inf
       rs3766186 1 1152298 C C C C PJ1863 NJ inf PP1847 MF and NC inf CLY1876 inf from LY to LaA comparison PEM1865 EH and DM inf
       You will want to do something similar to this except that you will have more rows if you use all of the sequence data. You will only have two columns of allele values. You will switch which parent you received the allele values from at each crossover location. Crossover locations can be difficult to pinpoint with precision if you don't have long read full sequence data for all of your relatives. Sincerely, Tim Janzen

    11/04/2018 10:05:23
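Editorial note: the two-column bookkeeping Tim describes, where the column-to-parent assignment flips at each switch location, might be sketched as below. This is only an illustration under stated assumptions: the function name, the row layout, and the switch_points list are hypothetical, and finding the switch locations (the hard part) is not attempted here.

```python
from bisect import bisect_right


def assign_parents(rows, switch_points, first_column_is_paternal=True):
    """Relabel two allele columns as (paternal, maternal) for one chromosome.

    rows: list of (position, allele_col1, allele_col2), sorted by position
    switch_points: sorted positions where the column-to-parent assignment
                   flips (hypothetical input, found by other means)
    first_column_is_paternal: assignment in force before the first switch
    """
    out = []
    for pos, a1, a2 in rows:
        # Number of switch points at or before this position
        flips = bisect_right(switch_points, pos)
        col1_paternal = first_column_is_paternal ^ (flips % 2 == 1)
        out.append((pos, a1, a2) if col1_paternal else (pos, a2, a1))
    return out


# Toy example with made-up positions and alleles
toy_rows = [(100, "A", "G"), (200, "C", "T"), (300, "G", "A")]
toy_result = assign_parents(toy_rows, switch_points=[150, 250])
```

Counting flips with a binary search keeps the original two columns untouched, so a switch location can be revised later without redoing earlier work.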
    1. [DNA] Re: Phasing from multiple sources
    2. Ann Turner
    3. I've never even seen a sample of long-read output. What does it look like? Are you able to do some sort of lookup and extract information corresponding to a SNP genotype? Say your paternal cousin has a matching segment with a genotype of AA for rs123 on chromosome 1 at position 55,555,555. You have two long reads that straddle that position, and you've figured out that one of them has an A and the other has a T. Then you would know that the long read with the A is your paternal side. Ann Turner

    11/04/2018 12:06:21
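Editorial note: Ann's rule can be written down directly. Inside a segment shared with a paternal relative, a site where the relative is homozygous and the two haplotypes differ tells you which haplotype is paternal. The sketch below applies that rule per site and then takes a majority vote across one phase block to tolerate the occasional genotyping error. The function names and input layout are assumptions for illustration, and the voting step is an addition that is not part of Ann's description.

```python
from collections import Counter


def paternal_haplotype(hap1_allele, hap2_allele, relative_genotype):
    """Return 1 or 2 for the haplotype that matches a paternal relative,
    or None if the site is uninformative.

    Assumes the site lies inside a segment shared with that relative.
    relative_genotype is a two-letter string such as "AA".
    """
    a, b = relative_genotype
    if a != b or hap1_allele == hap2_allele:
        return None  # relative heterozygous, or tester homozygous
    if hap1_allele == a:
        return 1
    if hap2_allele == a:
        return 2
    return None  # neither haplotype matches: error, or not truly shared


def orient_block(sites):
    """Majority vote over the (hap1, hap2, relative_genotype) sites in one block."""
    votes = Counter(paternal_haplotype(h1, h2, g) for h1, h2, g in sites)
    votes.pop(None, None)  # discard uninformative sites
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: leave the block unassigned
    return ranked[0][0]


# Ann's example: relative is AA, the two haplotypes carry A and T
example = paternal_haplotype("A", "T", "AA")
```

A block with no informative sites, or with a tied vote, is left unassigned rather than guessed.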
    1. [DNA] Re: Phasing from multiple sources
    2. McDonald, J Douglas
    3. The long read data comes in various forms. First is a BAM, unused in this sort of stuff (though used by me to find the big error in Build 38). It has the data with each read tagged by the primer used to identify it. The other is the short VCF file. It has an allele tagged with things like 0|1 or 1|0 or 1|1 (or 1|2 for positions with two different mutations) and these signify phased blocks. In between the phased blocks are 0/1 (or 1/2) to signify a break between blocks. Then there's a long VCF file that looks the same, but has a VAST number of common SNPs that may have a 0/0 or nocall in them. There are VERY few nocalls. Remember that Chromium data, which this is, doesn't have real long reads. It's bunches of regular Illumina reads which are tagged with DNA barcodes identifying which long piece they came from. Their software does the assignments based on knowing which long read a base comes from. They get long phase blocks by overlapping reads. Doug

    11/04/2018 12:58:11
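Editorial note: going only by Doug's description, phased calls use "|" in the genotype field and an unphased heterozygous call ("/" with two different alleles) marks a break between blocks. A minimal sketch of splitting one chromosome's calls into phase blocks on that basis follows. The function name and the (position, genotype) input layout are assumptions, and reading the VCF file itself is left out. The VCF specification also defines a PS (phase set) tag that names phase blocks explicitly; whether these particular files carry it is not stated in the thread, so it is not used here.

```python
def split_phase_blocks(records):
    """Group phased genotype calls into phase blocks for one chromosome.

    records: iterable of (position, gt) in position order, where gt is a
             VCF genotype string such as "0|1", "1|2", "0/1" or "./."
    Returns a list of blocks, each a list of
    (position, allele_index_hap1, allele_index_hap2).
    """
    blocks, current = [], []
    for pos, gt in records:
        if "." in gt:
            continue  # no-call: carries no phase information
        if "|" in gt:
            a, b = (int(x) for x in gt.split("|"))
            current.append((pos, a, b))
        else:
            a, b = (int(x) for x in gt.split("/"))
            # An unphased heterozygous call ends the current block;
            # unphased homozygous calls (0/0, 1/1) are skipped.
            if a != b and current:
                blocks.append(current)
                current = []
    if current:
        blocks.append(current)
    return blocks


# Toy example with made-up positions
toy_calls = [(100, "0|1"), (200, "1|0"), (300, "0/1"),
             (400, "1|1"), (500, "0/0"), (600, "1|2"), (700, "./.")]
toy_blocks = split_phase_blocks(toy_calls)
```

Within a block, the first number of each pair always refers to the same physical haplotype, which is what makes a per-block parent assignment possible.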