Comment to 'Dog Origins South of Yangtze River'
  • This is page 17 of the study "mtDNA Data Indicates a Single Origin for Dogs South of Yangtze River, less than 16,300 Years Ago, from Numerous Wolves".
Analysis of complete mtDNA genomes reveals 10 subclades in clades A, B and C, with geographical representation following the East-to-West gradient
There is a clear difference in coverage of clade A among geographical regions, especially between ASY and the rest of the world (fig. 1b). This indicates that clade A, rather than being a single dense clade, may consist of several different phylogenetic subgroups with different geographical spread, groups that cannot be resolved based on the CR data. To study this geographical pattern in detail, and to obtain sufficient resolution for dating the dog origins and estimating the number of founders, we analysed almost the entire mtDNA genomes for 169 dogs and 8 wolves (16,195 bp analysed; repetitive and difficult-to-align regions were excluded). The samples were chosen to cover most of the mtDNA diversity for dog clades A, B and C according to the CR-based MS networks (fig. 1c), for the West (Europe, SW Asia, India and Africa) as well as for East Asia (Supplementary Dataset S2, fig. S1 in Supplementary Material).
Phylogenetic analysis of the mtDNA genomes improved the resolution considerably compared to analysis of the CR (fig. 2a). The two major phylogenetic clades, A and B, which were weakly supported in the CR-based tree, obtained Bayesian support values of 100% in the genome-based tree (fig. S3 and S4 in Supplementary Material). More importantly, the analysis also revealed a distinct substructure within clades A, B and C: the seemingly dense clades are composed of subclades (fig. 2a and b). Clade A had six major subclades, and B and C two each, giving a total of 10 subclades (or haplogroups), with high bootstrap and Bayesian support values (fig. 2a, fig. S3 and S4 in Supplementary Material), and separated by relatively large genetic distances (fig. 2b). For the CR part of the genome sequences, the 10 subclades group almost perfectly in separate parts of the MS networks (fig. 1c). Importantly, 5 of the 6 subclades of clade A, corresponding to those parts of the CR-based MS network which are empty for the western populations, were found only in East Asia (fig. 2b).
Accordingly, when all 1,576 CR sequences are sorted into the 10 subclades based on diagnostic mutations (see Supplementary Material for details), the geographical distribution of the subclades follows a distinct gradient: the complete set of 10 subclades is found only in ASY, while 7 are represented in Central China and Japan, 5 in North China, India and SW Asia, and only 4 in Europe (table 2, fig. 3a and 3b, table S2 in Supplementary Material). Only 1 of the 6 subclades of clade A is represented in Europe and SW Asia, and the missing 5 subclades correspond to the empty parts of the CR-based MS networks (fig. 1b and 1c).
To conclude, the full extent of diversity for clades A, B and C, all the 10 major phylogenetic groups, is represented in the region comprising China south of Yangtze River and Southeast Asia, ASY. Outside this region only part of the total diversity is found, but it can be traced to a subset of the gene pool in ASY, basically the 14 universally occurring haplotypes, the UTs, which are distributed in 4 of the 10 subclades. Thus, the facts that nearly 100% of dogs in Europe and SW Asia have CR-based haplotypes closely related to the 14 UTs while Eastern populations have a large number of unique and distinct haplotypes, and that parts of the CR-based MS networks are empty for the western populations, can be attributed to the almost complete absence of 6 out of the 10 major phylogenetic groups in the western part of the Old World.
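The sorting step described above (assigning each CR sequence to a subclade from its diagnostic mutations, then listing which subclades occur in each region) could be sketched roughly as follows. This is only an illustration: the actual diagnostic mutations are given in the study's Supplementary Material and are not reproduced here, so the positions, bases, subclade labels (`sub_1` to `sub_3`), region names and samples below are all hypothetical toy values, and the matching rule (exactly one subclade whose full mutation set is present) is an assumption.

```python
from collections import defaultdict

# HYPOTHETICAL diagnostic mutations as (position, base) pairs.
# These are toy values, not the mutations used in the study.
DIAGNOSTIC = {
    "sub_1": {(10, "T")},
    "sub_2": {(25, "G")},
    "sub_3": {(40, "A"), (41, "C")},
}


def assign_subclade(variants, diagnostic):
    """Return the one subclade whose diagnostic mutations are all present.

    Returns None when no subclade, or more than one, matches.
    """
    matches = [name for name, muts in diagnostic.items() if muts <= variants]
    return matches[0] if len(matches) == 1 else None


def subclades_by_region(samples, diagnostic):
    """Map each region to the set of subclades observed among its samples."""
    found = defaultdict(set)
    for region, variants in samples:
        subclade = assign_subclade(variants, diagnostic)
        if subclade is not None:
            found[region].add(subclade)
    return dict(found)


# HYPOTHETICAL samples: (region, variants carried by one sequence).
samples = [
    ("region_P", {(10, "T")}),
    ("region_P", {(25, "G"), (99, "A")}),
    ("region_P", {(40, "A"), (41, "C")}),
    ("region_Q", {(10, "T"), (77, "C")}),
    ("region_Q", {(40, "A")}),  # incomplete diagnostic set: left unassigned
]

for region, subclades in sorted(subclades_by_region(samples, DIAGNOSTIC).items()):
    print(region, len(subclades), sorted(subclades))
```

Counting the distinct subclades per region in this way is what yields figures of the kind reported in the text (10 in ASY, 4 in Europe), given the real assignments.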
Within ASY, no single subregion had all 10 subclades, but relatively small samples from Yunnan (n=75), Southeast Asia (n=59) and Guizhou (n=57) contained 9, 9 and 8 subclades, respectively (fig. 3b, table S2 in Supplementary Material). The smallest region containing all 10 subclades comprises Yunnan and Southeast Asia, in the southwest of ASY. The simplest explanation for the observed geographical distribution of the 10 subclades of clades A, B and C is that they had a single origin within or close to ASY, and that only a subset of the original gene pool spread to the rest of the world.
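The search for the smallest group of subregions that together contain every subclade can be illustrated with a brute-force check over combinations. This is a sketch under stated assumptions, not the authors' procedure: the subregion names `R1` to `R3` and subclade labels `s1` to `s10` are placeholders, and which subclades are missing from each subregion is invented. Only the counts (9, 9 and 8 subclades) mirror those reported above for Yunnan, Southeast Asia and Guizhou. "Smallest" here means fewest subregions, not smallest geographical area.

```python
from itertools import combinations


def smallest_covering_sets(presence, all_subclades):
    """Return every smallest group of subregions whose union covers all subclades.

    presence maps a subregion name to the set of subclades found there.
    An empty list means no combination covers the full set.
    """
    names = sorted(presence)
    for size in range(1, len(names) + 1):
        hits = [
            combo
            for combo in combinations(names, size)
            if set().union(*(presence[r] for r in combo)) >= all_subclades
        ]
        if hits:
            return hits
    return []


# Placeholder labels for the 10 subclades.
ALL = {f"s{i}" for i in range(1, 11)}

# HYPOTHETICAL presence data: the missing subclades are invented.
presence = {
    "R1": ALL - {"s10"},        # 9 subclades
    "R2": ALL - {"s1"},         # 9 subclades
    "R3": ALL - {"s1", "s10"},  # 8 subclades
}

print(smallest_covering_sets(presence, ALL))
```

With these toy sets no single subregion is complete, and only one pair covers all ten labels, which parallels the pattern described for Yunnan and Southeast Asia.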