Aquilegia represents a unique clade of basal eudicots with several important features, including its phylogenetic position among the lower eudicots, unusual floral morphology (e.g., petaloid sepals, nectar spurs and staminodia), and its distribution across diverse ecological habitats. Collectively, these traits have contributed to the development of Aquilegia as a new model system for studying floral variation, adaptive radiation and evolution [23, 32, 36]. To further understand the genome structure, provide molecular insights bridging monocots and eudicots, and facilitate molecular dissection of the traits associated with inflorescence development and environmental adaptation, a BAC-based genomic resource comprising three BAC libraries and a physical map was developed in this study. Two of the libraries were derived from A. formosa, representing 15.2X and 13.3X genome equivalents, respectively, and were used for physical map construction. The third library was constructed from A. coerulea Goldsmith with 20.7X genome coverage for comparative genomics studies addressing the molecular basis of floral variation and adaptive radiation within the genus. The Aquilegia physical map was composed of 50,155 clones, providing deep 21X genome coverage. Furthermore, a set of BACs forming a minimal tiling path across the contig assembly was selected for BAC end sequencing to provide a glimpse of the genome organization of this model plant. Both the physical map and the BAC end sequences (BESs) can also serve as landmarks for genome sequence assembly and for anchoring ESTs to the genome. Hybridization of a total of 197 markers associated with drought stress, anthocyanin biosynthesis and floral development not only allowed integration of the genetic map into the contig framework, but also identified candidate genomic regions for further gene isolation and characterization. 
The genomic resource is expected to serve as a pivotal platform for comparative genomics studies to elucidate genome variation between monocots and basal eudicots and to provide insights into the molecular mechanisms underlying environmental adaptation and floral variation.
In recent years, HICF fingerprinting has commonly replaced traditional agarose [41] and polyacrylamide gel methods [42] in genome fingerprinting projects because of its high throughput, the larger number of fragments generated from each clone and better contig assembly than other approaches [43]. In this study, an average of 81 restriction fragments was generated per clone in the FPC project. These highly informative fingerprints gave each clone a high-resolution identity for accurate contig assembly, which could be further verified by marker hybridization: 189 (96%) of the 197 genetic markers hybridized to only 1 or 2 contigs rather than being scattered across the genome. Furthermore, the positively hybridizing clones overlapped in clusters in most contigs, indicating that the contig assembly, which is based on fingerprint similarity, is consistent with the sequence-based results. The accuracy of the contig assembly could also be verified by PCR amplicon analysis, as shown in Table 7. Thus, we are confident in the strategy used to build the physical map, which begins with contig assembly at high stringency (cutoff 1e-50, tolerance 3), giving a high average Sulston score of 0.879, followed by a series of End-End and Single-End merges of the small BAC scaffolds under gradually decreasing stringency down to 1e-35, and then by manual editing at 1e-20 based on marker hybridization data. Of the 197 markers successfully used for hybridization, 87 have been genetically mapped; these markers anchored a total of 54 contigs covering 76.4 Mb (25.5% of the genome) on all 7 linkage groups (Table 4). These mapped contigs not only provide a framework for studying the Aquilegia genome, but also pave the way for isolation and characterization of genes of interest by map-based cloning.
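The proportions quoted above follow from simple ratios, which the short sketch below recomputes. It is only an arithmetic check, not part of the mapping pipeline; the variable names are illustrative, and the 1N genome size of 300 Mb is the estimate used elsewhere in this discussion.

```python
# Arithmetic check of the marker and anchoring percentages quoted in the text.
# All variable names are illustrative; the numbers are taken from the text.

GENOME_SIZE_MB = 300.0  # assumed 1N genome size (estimate used in this discussion)


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


markers_total = 197              # markers successfully hybridized
markers_in_1_or_2_contigs = 189  # markers hitting only 1 or 2 contigs
anchored_span_mb = 76.4          # span of the 54 genetically anchored contigs

marker_pct = percent(markers_in_1_or_2_contigs, markers_total)
anchored_pct = percent(anchored_span_mb, GENOME_SIZE_MB)

print(f"Markers confined to 1 or 2 contigs: {marker_pct:.0f}%")
print(f"Genome covered by anchored contigs: {anchored_pct:.1f}%")
```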
The genes involved in anthocyanin pigment biosynthesis in wheat are arranged in a gene cluster on the short arm of chromosome 7 [44–46]. Similar clustering of genes involved in the biosynthesis of secondary metabolites has also been reported in grapevine [47]. Unlike in these species, the 16 anthocyanin biosynthesis-related genes in Aquilegia appear to be dispersed in the genome (Table 6), suggesting a distinct arrangement of these genes in this lower eudicot genus. However, a number of additional genes belonging to the anthocyanin and broader flavonoid pathways have been identified [36] but were not assayed here; therefore, the possibility that some gene clustering will be identified in the future cannot be ruled out. The contigs anchored in this study could serve as a resource for unravelling the molecular basis of floral color variation and evolution.
An expansion in the physical span of the contigs was observed in this study. The collective physical span of all contigs, as calculated by the CB map function of the FPC software [35], was estimated to be 689.8 Mb (~2.3X the genome size; 1N = 300 Mb). As only 197 marker hybridization results were analyzed, and these markers were biased toward specific biological functions, it cannot be ruled out, although it is unlikely, that the contig assembly is not fully optimized and that some contigs remain to be merged. As the single A. formosa individual used for BAC library construction has been shown to be highly heterozygous at more than 30 SSR and SNP loci (Hodges, unpublished data), the excess physical length might be due to the heterozygosity of this field-collected genome, which comprises highly divergent haplotypes as a result of the outcrossing nature of the species. Similarly inflated physical map lengths have been reported for other outcrossing species, including poplar [48] and grapevine [49]. As the genome sequencing project is nearing completion, further assembly and analysis of the genome sequence will uncover more details about the genome components and suggest events that have affected the genome structure of this basal eudicot taxon. To maintain accuracy in contig assembly, further reduction in stringency to merge more contigs was not pursued in this study. In the future, the fingerprint contig assembly can be refined through more hybridizations using additional mapped markers and probes designed from the end clones of contigs.
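The size of the expansion can be made explicit with a short calculation. The sketch below only restates the figures given above (a 689.8 Mb contig span against a 300 Mb 1N genome); the variable names are illustrative, and the calculation says nothing about which contigs are redundant.

```python
# Ratio and excess of the estimated contig span relative to the 1N genome size.
# Numbers are taken from the text; variable names are illustrative.

contig_span_mb = 689.8  # collective physical span of all contigs
genome_size_mb = 300.0  # 1N genome size

inflation_ratio = contig_span_mb / genome_size_mb
excess_span_mb = contig_span_mb - genome_size_mb

print(f"Contig span is {inflation_ratio:.1f}X the genome size")
print(f"Excess span over one haploid genome: {excess_span_mb:.1f} Mb")
```

A ratio above 2X is larger than would be expected if the two haplotypes were merely assembled into separate contigs across the whole genome, which is in line with the caution above that some contigs may remain to be merged.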
The BESs from the minimal tiling path clones also provided insights into the genome composition of this novel model plant, including low GC content, transposable elements and gene content. Interestingly, putative coding regions of Aquilegia showed higher homology to the grapevine, V. vinifera, than to two other model plants, rice and Arabidopsis (Figure 5). As Vitis belongs to the earliest diverging lineage of rosids in the core eudicots [50], and Aquilegia is placed among the basal eudicots in the phylogenetic tree [23, 32], the close conservation between these two species not only provides global molecular evidence supporting the phylogenetic lineage that connects basal eudicots to core eudicots but also provides a rich resource for investigating genome evolution, such as genome duplication events and subsequent variation [51–53], in the course from monocots to eudicots in angiosperms. In this report, preliminary comparative genomics analysis using SyMAP uncovered 54 syntenic blocks between Aquilegia and Vitis (Figure 6). These syntenic blocks provide a first glimpse of the structural organization of the Aquilegia genome and a rich resource for tracing DNA translocation events during the evolution of these two lineages. Further characterization of the shared transposable elements in the Aquilegia genome will also provide insights into plant evolution. A more extensive survey using whole-genome sequence information in the near future is expected to aid in-depth studies of the evolutionary genomics of basal eudicot taxa. On the other hand, the finding that alignment of the BESs from the physical framework contigs failed to identify significant synteny with other reported genomes also underscores the significance of the unique genome structure of Aquilegia for understanding the evolution of plant genomes.







