GOLDIE - Genome Comparison Database

The sketch of a microbe cell (the rightmost picture) was provided by Chris Woolverton, one of my collaborators.

Comparative genomics is an important aspect of bioinformatics. By comparing the genome sequence of a newly sequenced organism against the genome sequences of organisms whose gene functions are known, the functions of the newly sequenced organism's genes can be identified and studied. This computational technique complements wet-lab techniques because it is efficient and cost-effective.

In this research, comparative genomics has been used to identify functionally equivalent and functionally similar genes, operons (groups of genes involved in a common function), metabolic pathways, genes with conserved functionality, and genes that are unique to a particular organism. The approach has also been applied to analyze the regulation of translation and the evolution of microorganisms.

Automated identification of metabolic pathways, of genes unique to an organism, and of conserved genes is important for developing specific antibacterial and bacteriostatic agents (drugs that arrest the growth of bacteria) and for identifying new pathways. More specific antibacterial drugs have many advantages, such as reduced toxicity, fewer side effects, and a lower probability of producing resistant strains of pathogens (disease-causing strains of bacteria).

Related Collaborators

  1. Peer Bork, EMBL, Heidelberg, Germany
  2. Peter Stuckey, Computer Science Department, University of Melbourne, Australia
  3. Mikhail Gelfand, Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow region, Russia
  4. Terry Meyer, Department of Biochemistry, University of Arizona, Tucson, Arizona
  5. Christopher J. Woolverton, Department of Biological Sciences, Kent State University, Kent, OH

Putative orthologs Database (sample version)

  1. E. coli str. K-12 vs. A. tumefaciens strain C58
  2. E. coli str. K-12 vs. B. subtilis
  3. E. coli str. K-12 vs. Buchnera sp.
  4. E. coli str. K-12 vs. E. coli O157:H7
  5. E. coli str. K-12 vs. E. coli str. CFT073
  6. E. coli str. K-12 vs. H. influenzae
  7. E. coli str. K-12 vs. H. pylori strain J99
  8. E. coli str. K-12 vs. H. pylori strain 26695
  9. E. coli str. K-12 vs. L. lactis
  10. E. coli str. K-12 vs. M. acetivorans C
  11. E. coli str. K-12 vs. M. genitalium
  12. E. coli str. K-12 vs. M. jannaschii
  13. E. coli str. K-12 vs. M. loti
  14. E. coli str. K-12 vs. M. pneumoniae
  15. E. coli str. K-12 vs. M. tuberculosis
  16. E. coli str. K-12 vs. N. meningitidis serogroup B strain MC58
  17. E. coli str. K-12 vs. P. abyssi
  18. E. coli str. K-12 vs. P. aeruginosa
  19. E. coli str. K-12 vs. P. multocida
  20. E. coli str. K-12 vs. P. syringae
  21. E. coli str. K-12 vs. P. horikoshii
  22. E. coli str. K-12 vs. P. furiosus VC1
  23. E. coli str. K-12 vs. R. prowazekii
  24. E. coli str. K-12 vs. S. flexneri 2a str. 301
  25. E. coli str. K-12 vs. S. typhi strain CT18
  26. E. coli str. K-12 vs. Synechocystis PCC6803
  27. E. coli str. K-12 vs. T. acidophilum
  28. E. coli str. K-12 vs. T. tengcongensis
  29. E. coli str. K-12 vs. T. maritima MSB8
  30. E. coli str. K-12 vs. T. pallidum
  31. E. coli str. K-12 vs. U. urealyticum
  32. E. coli str. K-12 vs. V. cholerae (both chromosomes)
  33. E. coli str. K-12 vs. V. parahaemolyticus RIMD (both chromosomes)
  34. E. coli str. K-12 vs. V. vulnificus CMCP6 (both chromosomes)
  35. E. coli str. K-12 vs. X. axonopodis pv. citri 306
  36. E. coli str. K-12 vs. X. fastidiosa
  37. E. coli str. K-12 vs. Y. pestis strain CO92
  38. B. subtilis vs. M. tuberculosis

Gene-group Databases (sample version)

Genes with conserved functionality

Genomes: A. fulgidus, B. burgdorferi, B. subtilis, C. trachomatis, E. coli, H. influenzae, H. pylori, M. genitalium, M. jannaschii, M. pneumoniae, M. thermoautotrophicum, M. tuberculosis, R. prowazekii, P. horikoshii, Synechocystis sp. PCC6803, and T. pallidum

Genes are listed by E. coli gene name. Where the B. subtilis gene name differs, it follows after a slash (E. coli name / B. subtilis name).

Number of genomes: 17
Number of genes: 55
Genes: {asnS, aspS}, alaS, cysS, dnaX, {efp, yeiP}, {ffh, ftsY}, fusA/fus, gltX, glyA, hisS, infB, ksgA, lon, metG, mopA/groEL, nusA, pheS, prlA/secY, recA, rplA, rplB, rplD, rplE, rplF, rplK, rplM, rplN, rplX, rplV, rpsB, rpsC, rpsD, rpsE, rpsG, rpsH, rpsI, rpsJ, rpsK, rpsL, rpsM, rpsQ, rpsS, rpoB, rpoC, serS, {tufA, tufB}, topA, {trxA, yfiG}, ygjD/ydiE, yhbZ/obg

Number of genomes: 16
Number of genes: 24
Genes: adk, argS, clpB/clpC, eno, ftsZ, glyA, hflB, leuS, mrsA/ybbT, pepP/yqhT, pgk, pheT, pyrH/smbA, rplD, rplO, rplW, rpsM, rpsO, ruvB, secF, tmk, truA, uvrB, yfjB/yjbN

Number of genomes: 15
Number of genes: 16
Genes: apt, ispB/gerCC, ndk, nth, orf.174/yluA, pnp/pnpA, prsA/prs, pyrG/ctrA, rplJ, rplR, rpsO, tpiA/tpi, mesJ/yacA, ycfF/hit, ycfH/yabD, ychF/yyaF
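
The grouping used in the listing above (genes bucketed by the number of genomes in which they are conserved) can be illustrated with a minimal sketch. This is not the method used to build the database. It assumes that orthologous genes have already been mapped to a shared name, and the genome and gene names in the toy input are hypothetical placeholders.

```python
from collections import Counter, defaultdict


def group_by_conservation(genes_by_genome):
    """Bucket gene names by the number of genomes that contain them.

    genes_by_genome maps a genome name to the set of (shared) gene names
    found in that genome. Returns {genome_count: sorted gene names},
    ordered from the most widely conserved genes downward.
    """
    counts = Counter()
    for genes in genes_by_genome.values():
        counts.update(set(genes))  # count each gene once per genome

    buckets = defaultdict(list)
    for gene, n_genomes in counts.items():
        buckets[n_genomes].append(gene)

    return {n: sorted(genes) for n, genes in sorted(buckets.items(), reverse=True)}


# Hypothetical toy input, for illustration only (not real genome data).
toy = {
    "genome1": {"geneA", "geneB", "geneC"},
    "genome2": {"geneA", "geneB"},
    "genome3": {"geneA"},
}
print(group_by_conservation(toy))
# → {3: ['geneA'], 2: ['geneB'], 1: ['geneC']}
```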

Specific Genes (incomplete - under construction) (with respect to E. coli compared to seventeen complete genomes mentioned above)

Pathogens: B. burgdorferi (Bb), C. trachomatis (Ct), H. influenzae (Hi), H. pylori (Hp), M. genitalium (Mg), M. pneumoniae (Mp), M. tuberculosis (Tb), R. prowazekii (Rp), T. pallidum (Tp). E. coli is abbreviated as Ec.

NOTE: A gene that is specific to a larger group (a group containing more genomes) is also specific to every subgroup of that group. For example, hflC is also specific to the subgroup Ec-Hi-Rp.
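
The subgroup rule in the note can be sketched in a few lines of Python. The rows are a small excerpt copied from the table on this page. The function names and the dictionary layout are assumptions made for illustration, not part of the database.

```python
# Excerpt of the specific-genes table (group label -> genes); not the full table.
SPECIFIC_GENES = {
    "Ec-Bb-Hi-Hp-Rp-Tp": ["hflC"],
    "Ec-Bb-Hi-Hp-Rp": ["pal"],
    "Ec-Bb-Hi-Tp": ["hflK", "hrpA"],
    "Ec-Hi-Rp": ["hscA"],
}


def parse_group(label):
    """Turn a label such as 'Ec-Hi-Rp' into a set of genome abbreviations."""
    return frozenset(label.split("-"))


def genes_for_subgroup(subgroup_label, table=SPECIFIC_GENES):
    """Collect genes listed under the subgroup itself or under any larger
    group that contains every genome of the subgroup."""
    subgroup = parse_group(subgroup_label)
    found = set()
    for group_label, genes in table.items():
        if subgroup <= parse_group(group_label):  # subgroup is contained in group
            found.update(genes)
    return sorted(found)


print(genes_for_subgroup("Ec-Hi-Rp"))
# → ['hflC', 'hscA', 'pal']
```

Here hflC, listed only under Ec-Bb-Hi-Hp-Rp-Tp, is returned for Ec-Hi-Rp as well, which matches the example given in the note.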

Pathogens: Genes (E. coli name)
Ec-Bb-Hi-Hp-Rp-Tp: hflC
Ec-Bb-Ct-Hi-Hp: yjjT
Ec-Bb-Ct-Hi-Rp: dacA
Ec-Bb-Hi-Hp-Rp: pal
Ec-Ct-Hi-Mp-Rp: kdtA
Ec-Ct-Hi-Tb-Rp: cydB, orf.2883, yceC
Ec-Ct-Hi-Tp-Rp: nrdB
Ec-Ct-Mp-Tp-Rp: phnL
Ec-Bb-Hi-Tp: hflK, hrpA
Ec-Bb-Ct-Hi: greB
Ec-Bb-Ct-Rp: orf.762
Ec-Ct-Mg-Mp: yjcU
Ec-Ct-Hi-Tp: udp
Ec-Ct-Tb-Rp: uhpC
Ec-Hi-Tb-Rp: hupA
Ec-Hi-Hp-Tb: orf.1839
Ec-Hi-Tb-Tp: secE
Pathogens: Genes (E. coli name)
Ec-Mp-Tb: yhfV
Ec-Hi-Tb: fic, recC
Ec-Hi-Rp: hscA
Ec-Hi-Hp: ydeA
Ec-Hi-Tp: asnA
Ec-Tb-Tp: add
Ec-Ct-Hi: aspC, orf.1597, orf.1600, orf.1602, trpR, tyrP, yigN, ytfL
Ec-Ct-Hp: cls, yejE, yjbC
Ec-Ct-Rp: orf.1142
Ec-Ct-Tb: appC, hisJ, orf.1575, orf.1724, xerC
Ec-Ct-Tp: fhiA, orf.3357, yrfE
Ec-Bb-Hi: pepD, orf.3153, yibQ
Ec-Hi-Rp: vacJ, secB, yfhE, bolA, cyaY, orf.2833
Ec-Hi-Hp: fucP, orf.634, sdaC, ykgB, phnA, orf.2936
Ec-Hi-Tb: plsB, glnD, tesB, orf.606, nadR, menC, tag, yijC, frdC, frdD


Automated Genome Comparison, Evolution, and Pathogenicity

  1. A. K. Bansal, P. Bork, P. Stuckey, "Automated Pair-wise Comparisons of Complete Microbial Genomes," Mathematical Modeling and Scientific Computing, Vol. 9, Issue 1, 1998, pp. 1 - 23.

  2. A. K. Bansal and P. Bork, "Applying Logic Programming to Derive Novel Functional Information in Microbial Genomes," Proceedings of the First International Workshop on Practical Aspects of Declarative Languages, Lecture Notes in Computer Science, Publisher: Springer Verlag, No. 1551, (1999), pp. 274 - 289.

  3. A. Bansal, "An Automated Comparative Analysis of Seventeen Complete Microbial Genomes," Bioinformatics, (1999), Vol. 15, no. 11, pp. 900-908.

  4. V. Anderson and A. K. Bansal, "A Distributed Scheme for Efficient Pairwise Comparison of Large Complete Genomes," in the Proceedings of the International Conference of Information, Intelligence and Systems, Washington D. C., (1999), pp. 48-55.

  5. A. K. Bansal and T. E. Meyer, "Evolutionary Analysis by Whole Genome Comparisons," Journal of Bacteriology, Volume 184, No. 8, April 2002, pp. 2260-2272.

  6. T. E. Meyer and A. K. Bansal, "Elevated CG Content in Hyperthermophile can Resolve Evolutionary Discrepancies between Analysis using Whole Genome Comparisons and 16S rRNA," Biochemistry, August 2005; 44(34), pp. 11458 - 11465.


  1. A. Vitreschak, A. K. Bansal, M. S. Gelfand, "Conserved RNA structures regulate initiation of translation of Escherichia coli and Haemophilus influenzae ribosomal protein operons," Abstract of First International Conference on Bioinformatics of Genome Regulation and Structure, (1998), pp. 229.

  2. A. Vitreschak, A. K. Bansal, I. I. Titov, M. S. Gelfand, "Computer Analysis of Regulatory Patterns in Completely Sequenced Bacterial Genomes. Translation Initiation of Ribosomal Protein," Biofizika, (1999), 44: 4, pp. 601 - 610.

Automated Reconstruction of Pathways

  1. A. K. Bansal, "A Framework of Automated Reconstruction of Microbial Metabolic Pathways," IEEE International Symposium on Bioinformatics and Biomedical Engineering, Washington, (2000), pp. 184-190.

  2. A. K. Bansal and C. Woolverton, "Applying Automatically Derived Gene-groups to Automatically Predict and Refine Microbial Pathways," IEEE Transactions on Knowledge and Data Engineering, Volume 15, No. 4, pp. 883-894.

  3. A. K. Bansal, "Integrating Co-regulated Gene-groups and Pair-wise Genome Comparisons to Automate Reconstruction of Microbial Pathways," IEEE International Symposium on Bioinformatics and Biomedical Engineering, Washington, November 2001, pp. 209-216.