Educational and Training Curriculum
Glycolysis
Glycolysis is the sequence of reactions that converts glucose into pyruvate with the concomitant production of a relatively small amount of ATP. Glycolysis can be carried out anaerobically (in the absence of oxygen) and is thus an especially important pathway for organisms that can ferment sugars. For example, glycolysis is the pathway used by yeast to produce the alcohol found in beer. Glycolysis also serves as a source of raw materials for the synthesis of other compounds. For example, 3-phosphoglycerate can be converted into serine, while pyruvate can be aerobically degraded through the Krebs (TCA) cycle to produce much larger amounts of ATP.
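To make the "relatively small amount of ATP" concrete, the standard textbook net reaction of glycolysis is written out below. The equation is not part of the original text and is given only as a point of reference; it shows a net gain of two ATP per molecule of glucose.

```latex
\[
\mathrm{Glucose} + 2\,\mathrm{NAD^{+}} + 2\,\mathrm{ADP} + 2\,\mathrm{P_i}
\;\longrightarrow\;
2\,\mathrm{Pyruvate} + 2\,\mathrm{NADH} + 2\,\mathrm{H^{+}} + 2\,\mathrm{ATP} + 2\,\mathrm{H_2O}
\]
```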
BioTech Botanica
An educational resource on the botanical compounds used in cancer treatment and research, and on the plants that produce them.
Bioinformatics
Advances in molecular biology and the equipment available for research in this field have allowed the increasingly rapid sequencing of large portions of the genomes of several species. Several bacterial genomes, as well as those of some simple eukaryotes (e.g., Saccharomyces cerevisiae, or baker's yeast) have been sequenced in full. The Human Genome Project, designed to sequence all 24 of the human chromosomes, is also progressing. Popular sequence databases, such as GenBank and EMBL, have been growing at exponential rates. This deluge of information has necessitated the careful storage, organization and indexing of sequence information. Information science has been applied to biology to produce the field called Bioinformatics.
Bioinformatics deals with the creation and maintenance of databases of biological information. Nucleic acid sequences (and the protein sequences derived from them) make up the majority of such databases. While the storage and organization of millions of nucleotides is far from trivial, designing a database and developing an interface through which researchers can both access existing information and submit new entries is only the beginning.
Computational Biology
Computational Biology is the name given to the analysis of the sequence information gathered through Bioinformatics. It involves the following:
The Creation of Sequence Databases
Most biological databases consist of long strings of nucleotides (guanine, adenine, thymine, cytosine and uracil) and/or amino acids (threonine, serine, glycine, etc.). Each sequence of nucleotides or amino acids represents a particular gene or protein (or section thereof), respectively. Sequences are represented in shorthand, using single-letter designations. This decreases the space necessary to store information and increases processing speed for analysis. While most biological databases contain nucleotide and protein sequence information, there are also databases that include taxonomic information such as the structural and biochemical characteristics of organisms. The power and ease of using sequence information has, however, made it the method of choice in modern analysis.
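The single-letter shorthand described above can be illustrated with a minimal Python sketch. The table holds the standard one-letter codes for the 20 common amino acids; the function name `to_one_letter` and the hyphen-separated input format are assumptions made for this example, not part of any particular database's interface.

```python
# Standard one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(peptide):
    """Convert a hyphen-separated three-letter peptide (hypothetical input
    format, e.g. "Thr-Ser-Gly") into single-letter shorthand."""
    return "".join(THREE_TO_ONE[residue] for residue in peptide.split("-"))


example = "Thr-Ser-Gly"
short = to_one_letter(example)
# The shorthand needs one character per residue instead of three or more.
print(short, len(example), "->", len(short))  # TSG 11 -> 3
```

The same idea applies to nucleotides, where each base is stored as one of the letters G, A, T, C or U.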
Search for Genes
The collecting, organizing and indexing of sequence information into a database, a challenging task in itself, provides the scientist with a wealth of information, albeit of limited use. The power of a database comes not from the collection of information, but from its analysis. A sequence of DNA does not necessarily constitute a gene. It may constitute only a fragment of a gene or, alternatively, it may contain several genes. Genetic elements share common sequences, and it is this fact that allows mathematical algorithms to be applied to the analysis of sequence data. A computer program for finding genes will contain at least the following elements.
Elements of a Gene-seeking Computer Program
BioTech Protein Modelling
Many steps lie between locating a gene locus and producing a three-dimensional model of the protein that it encodes.
- Step One: Location of Transcription Start/Stop. A proper analysis to locate a genetic locus will usually have already pinpointed at least the approximate sites of the transcriptional start and stop. For the purpose of determining protein structure, this approximate location is usually sufficient; it is the start and stop codons for translation that must be determined accurately.
- Step Two: Location of Translation Start/Stop. The first translated codon in a messenger RNA sequence is almost always AUG. This reduces the number of candidate codons, but the reading frame of the sequence must also be taken into consideration. Six reading frames are possible for a given DNA sequence, three on each strand, and all must be considered unless further information is available. Since genes are usually transcribed away from their promoters, the definitive location of this element can reduce the number of possible frames to three. There is no strong consensus sequence surrounding translation start codons across different species. Therefore, locating the appropriate start codon involves finding a frame in which there are no apparent abrupt stop codons. Incorrect reading frames usually predict relatively short peptide sequences, so it might seem deceptively simple to ascertain the correct frame. In bacteria, such is frequently the case. However, eukaryotes add a new obstacle to this process: introns.
- Step Three: Detection of Intron/Exon Splice Sites. In eukaryotes, the reading frame is discontinuous at the level of the DNA because of the presence of introns. Unless the analysis starts from a cDNA sequence, these introns must be spliced out and the exons joined to give the sequence that actually codes for the protein. Intron/exon splice sites can be predicted on the basis of their common features. Most introns begin with the nucleotides GT and end with the nucleotides AG. There is a branch sequence near the downstream end of each intron that is involved in the splicing event, and there is a moderate consensus around this branch site.
- Step Four: Prediction of 3-D Structure With the completed primary amino acid sequence in hand, the challenge of modelling the three-dimensional structure of the protein awaits. This process uses a wide range of data and CPU-intensive computer analysis. Most often, one is only able to obtain a rough model of the protein, and several conformations of the protein may exist that are equally probable. The best analyses will utilize data from all the following sources:
Pattern Comparison: Alignment to known homologues whose conformation is better established.
X-ray Diffraction Data: Ideal when some data are available for the protein of interest. However, diffraction data from homologous proteins are also very valuable.
Physical Forces/Energy States: Biophysical data and analyses of an amino acid sequence can be used to predict how it will fold in space.
All of this information is used to determine the most probable positions of the protein's atoms in space and the angles of the bonds between them. Graphical programs can then use these data to depict a three-dimensional model of the protein on the two-dimensional computer screen.
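Steps Two and Three above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real gene finder: sequences are plain DNA strings using only A, C, G and T (so the start codon AUG appears as ATG), the function names and the `min_len` parameter are hypothetical, and the intron search checks only the GT...AG boundaries without testing the branch site. Practical tools rely on statistical models rather than exact string matching.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def six_frames(dna):
    """Yield (strand, offset, sequence) for the six reading frames."""
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            yield strand, offset, seq[offset:]


def longest_orf(frame_seq):
    """Return the longest ATG...stop stretch (stop codon excluded) in one frame."""
    best = ""
    start = None
    for i in range(0, len(frame_seq) - 2, 3):
        codon = frame_seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i  # first start codon of a candidate open reading frame
        elif start is not None and codon in STOP_CODONS:
            if i - start > len(best):
                best = frame_seq[start:i]
            start = None  # resume looking for the next ATG
    return best


def best_frame(dna):
    """Pick the frame with the longest open reading frame.
    Incorrect frames tend to hit a stop codon early and so score short."""
    candidates = ((s, o, longest_orf(f)) for s, o, f in six_frames(dna))
    return max(candidates, key=lambda item: len(item[2]))


def candidate_introns(dna, min_len=4):
    """List (start, end) spans that begin with GT and end with AG.
    min_len is an arbitrary illustrative cutoff, not a biological constant."""
    donors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "GT"]
    acceptors = [j + 2 for j in range(len(dna) - 1) if dna[j:j + 2] == "AG"]
    return [(d, a) for d in donors for a in acceptors if a - d >= min_len]


print(best_frame("CCATGAAACCCGGGTAACC"))   # ('+', 2, 'ATGAAACCCGGG')
print(candidate_introns("AAGTCCCCAGTT"))   # [(2, 10)]
```

In the second example, removing the span from position 2 up to position 10 (`GTCCCCAG`) and joining the flanking pieces gives `AATT`, which mirrors the splicing of exons described in Step Three. On real sequences the GT...AG rule alone yields many false candidates, which is why the branch-site consensus and other evidence are also used.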