Generation of ORFeome resources
Systems biology requires cloning all of an organism's protein-encoding open reading frames (ORFs) into a single resource, an ORFeome collection, to allow functional studies of the corresponding proteome. For the past decade, CCSB and the greater ORFeome Collaboration have been building a collection of human ORFs (hORFs) cloned into Gateway donor vectors. Flexible recombinational cloning makes it easy to transfer these ORFs into myriad Gateway destination vectors. Several successive versions of the hORFeome exist (v1.1, v3.1, v5.1, v7.1, v8.1, v9.1). Currently, the collection contains ORFs for approximately 18,000 of the roughly 20,000 human protein-coding genes.
In addition to the core hORFeome collection, which typically contains one ORF per gene, CCSB has also assembled collections of ORFs containing disease variants (CEGS project) and of alternatively spliced ORF sequences.
In the ORF cloning pipeline:
- Predicted ORFs are precisely amplified between annotated initiation and termination codons by PCR, using either a cDNA library as template or RT-PCR; the gene-specific primers carry Gateway recombinational cloning sites in their 5' tails, so the sites are added to the ends of each amplified ORF;
- Resulting PCR products are recombined directionally into a Gateway donor vector to create entry clones;
- For alternatively spliced ORFs, ORF sequence tags (OSTs) are obtained from the entry clones, providing experimental evidence for the existence and intron-exon structure of the corresponding coding isoform.
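The amplification step above can be illustrated with a minimal primer-design sketch. It is not the pipeline's actual software: the function names and the fixed annealing length are hypothetical, and the attB1/attB2 tail sequences are the commonly cited Gateway tails, assumed here rather than taken from this page. The sketch checks that a sequence runs from an initiation codon to a termination codon with no in-frame internal stop, then prepends the tails to gene-specific primer regions.

```python
# Commonly cited Gateway attB tails. ASSUMPTION: the pipeline described
# above may use different tails or spacer bases.
ATTB1_TAIL = "GGGGACAAGTTTGTACAAAAAAGCAGGCT"  # forward primer tail
ATTB2_TAIL = "GGGGACCACTTTGTACAAGAAAGCTGGGT"  # reverse primer tail

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_complete_orf(orf):
    """True if orf starts with ATG, ends with a stop codon, is a whole
    number of codons, and has no in-frame internal stop codon."""
    orf = orf.upper()
    if len(orf) < 6 or len(orf) % 3 != 0:
        return False
    if not orf.startswith("ATG") or orf[-3:] not in STOP_CODONS:
        return False
    internal = (orf[i:i + 3] for i in range(3, len(orf) - 3, 3))
    return all(codon not in STOP_CODONS for codon in internal)


def design_gateway_primers(orf, anneal_len=20):
    """Hypothetical helper: return (forward, reverse) primers whose 5' tails
    carry attB sites and whose 3' ends anneal to the ORF termini."""
    orf = orf.upper()
    if not is_complete_orf(orf):
        raise ValueError("sequence is not a complete ORF")
    anneal_len = min(anneal_len, len(orf))
    forward = ATTB1_TAIL + orf[:anneal_len]
    reverse = ATTB2_TAIL + reverse_complement(orf[-anneal_len:])
    return forward, reverse


if __name__ == "__main__":
    fwd, rev = design_gateway_primers("ATGGCCAAGTGA", anneal_len=6)
    print("forward:", fwd)
    print("reverse:", rev)
```

A real design would also balance melting temperatures and keep the ORF in frame with any tags in the destination vector; both are omitted here for brevity.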
The Gateway recombinational cloning strategy we use provides a robust platform for large-scale automated cloning for any genome of interest (Rual et al, Curr Opin Chem Biol 2004; Brasch et al, Genome Res 2004).
This well-defined ORFeome pipeline has:
- Provided experimental evidence for the existence of at least 17,300 genes in C. elegans (Reboul et al, Nat Genet 2001),
- Supported the first genome-wide attempt at cloning all predicted ORFs for a multicellular organism, yielding ~12,000 verified, cloned C. elegans ORFs (Reboul et al, Nat Genet 2003),
- Been used to generate genome-wide collections of cloned C. elegans promoter sequences (Dupuy et al, Genome Res 2004),
- Underpinned a large-scale approach that applies 5’ and 3’ RACE at genome scale to experimentally refine full-length C. elegans ORFs (Salehi-Ashtiani et al, Genome Res 2009),
- Been used to generate genome-wide ORF collections for the pathogenic bacterium Brucella melitensis (Dricot et al, Genome Res 2004) and the yeast Saccharomyces cerevisiae (Yu et al, Science 2008); construction of a Xenopus ORFeome is ongoing.