singhals wrote:
> Didn't someone in the past, oh, say, month, mention software that would
> compare databases and flag matches?
>
> Not necessarily a *specific* genealogy program database, just databases
> in general?
>
> I'm looking for an easy way to vacuum up "hit" lists from Ancestry, WC,
> Google, et al, and find the common ones.
>
> Cheryl

I don't recall anything like that and a quick google doesn't find anything. Wishful thinking?

It's an interesting problem. First of all, what's the format of the hit lists? Are the hits from all the sources in the same format?

Secondly, most comparison tools that I can think of work on a specific file format, usually a flat text file, although there are some that work on XML files. You would need to get the files into the appropriate format.

Thirdly, many comparison tools do the opposite of what you want - they look for differences. My favourite approach to looking for multiple occurrences of *identical* lines across multiple files would be the Unix command

    cat x y z|sort|uniq -c|sort -rn|more

where x, y & z would be 3 file names (you can cat as few or as many files as you like). This will merge the contents into alphabetical order so that duplicates follow each other, prefix each distinct line with the count of times it was found, re-sort the lines in descending order of count and page the output. You can then see which lines were in more than one file but not which files they were in.

This requires:

- that you have the hits in a common flat file format or can convert them to that;
- that hits which you would consider matching are identical within the files;
- that you either don't care which lists the matches were in, don't mind just comparing them in pairs, or are prepared to hunt for them in the files;
- and finally that you have access to Unix-style commands (if you're on Windows only, google for "cygwin").

-- 
Ian

Hotmail is for spammers. Real mail address is igoddard at nildram co uk
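The pipeline above reports how often a line occurs but not which files it came from. A minimal sketch of one way to get both, assuming plain-text hit lists with one hit per line and that matching hits are byte-for-byte identical. The function name common_hits and the file names in the usage comment are made up for illustration and are not from the thread.

```shell
# common_hits FILE...
# Print every line that occurs in more than one of the given files, as
#   count <TAB> line <TAB> files it was found in
# sorted with the most widely shared lines first.
# Hypothetical helper: a sketch, not a tested tool.
common_hits() {
  awk '
    # Count each (file, line) pair once, so a line repeated inside a
    # single file does not inflate the number of files it appears in.
    !seen[FILENAME, $0]++ {
      count[$0]++
      files[$0] = files[$0] " " FILENAME
    }
    END {
      for (line in count)
        if (count[line] > 1)
          # substr(..., 2) drops the leading space from the file list.
          printf "%d\t%s\t%s\n", count[line], line, substr(files[line], 2)
    }
  ' "$@" | sort -rn
}

# Usage (placeholder file names):
#   common_hits ancestry.txt wc.txt google.txt | more
```

As with the original command, lines that differ in spacing, case or punctuation are treated as different hits, so the lists may need tidying into a common form first.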
On Thu, 06 Mar 2008 20:42:06 +0000, Ian Goddard <goddai01@hotmail.co.uk> wrote:

>singhals wrote:
>> Didn't someone in the past, oh, say, month, mention software that would
>> compare databases and flag matches?
>>
>> Not necessarily a *specific* genealogy program database, just databases
>> in general?
>>
>> I'm looking for an easy way to vacuum up "hit" lists from Ancestry, WC,
>> Google, et al, and find the common ones.
>>
>> Cheryl
>
>I don't recall anything like that and a quick google doesn't find
>anything.

http://www.mudcreeksoftware.com/ has GenMatcher.

-- 
Dennis
Ian Goddard wrote:
> singhals wrote:
>
>> Didn't someone in the past, oh, say, month, mention software that
>> would compare databases and flag matches?
>>
>> Not necessarily a *specific* genealogy program database, just
>> databases in general?
>>
>> I'm looking for an easy way to vacuum up "hit" lists from Ancestry,
>> WC, Google, et al, and find the common ones.
>>
>> Cheryl
>
> I don't recall anything like that and a quick google doesn't find
> anything. Wishful thinking?
>
> It's an interesting problem. First of all, what's the format of the hit
> lists? Are the hits from all the sources in the same format?
>
> Secondly, most comparison tools that I can think of work on a specific
> file format, usually a flat text file, although there are some that work
> on XML files. You would need to get the files into the appropriate format.
>
> Thirdly, many comparison tools do the opposite of what you want - they
> look for differences. My favourite approach to looking for multiple
> occurrences of *identical* lines across multiple files would be the Unix
> command
>
>     cat x y z|sort|uniq -c|sort -rn|more
>
> where x, y & z would be 3 file names (you can cat as few or as many files
> as you like). This will merge the contents into alphabetical order so
> that duplicates follow each other, prefix each distinct line with the
> count of times it was found, re-sort the lines in descending order of
> count and page the output. You can then see which lines were in more
> than one file but not which files they were in.
>
> This requires that you have the hits in a common flat file format or can
> convert them to that; that hits which you would consider matching are
> identical within the files; that you either don't care which lists the
> matches were in, don't mind just comparing them in pairs, or are prepared
> to hunt for them in the files; and finally that you have access to
> Unix-style commands (if you're on Windows only, google for "cygwin").

Yes, quite possibly I was mis-remembering either the details or the list. I couldn't find it either. (g)

I've done it by hand, and it's not /that/ onerous, but the person who needs it would reach for the smellin' salts if I mentioned Unix or even CMD lines.

Thanks.

Cheryl