BIOINFORMATICS 2010 Abstracts


Full Papers
Paper Nr: 10
Title:

SYSTEMATIC POSITION AND PHYLOGENETIC RELATIONSHIPS OF THE CYCLOPHYLLIDEAN CESTODES - An In-silico Study using ITS2 rDNA and Sequence-structure Alignment

Authors:

Veena Tandon, Devendra Kumar Biswal, Pramod Kumar Prasad and Chenkual Malsawmtluangi

Abstract: The phylogenetic relationships and systematic position of cyclophyllidean cestodes have always been controversial, and opinions of different authors on the systematic rank and content of this order have varied greatly. Molecular phylogenetic analysis based on ITS2 rDNA of 16 representatives spanning 6 different families (Mesocestoididae, Davaineidae, Anoplocephalidae, Taeniidae, Dipylidiidae and Hymenolepididae) of the Order Cyclophyllidea and one outgroup from the family Diphyllobothriidae of the Order Pseudophyllidea confirmed the monophyletic nature of the Order Cyclophyllidea. Further, the results were validated by Bayesian analysis, primary sequence-structure alignment and subsequent molecular morphometric analysis. The trees from the various analyses were similar at the major nodes. Interestingly, Mesocestoides was accommodated within Cyclophyllidea, forming a sister clade close to the families Taeniidae, Anoplocephalidae, Hymenolepididae and Dipylidiidae.

Paper Nr: 14
Title:

A SIMPLE ANALYTIC APPROACH FOR TRACKING RETINAL VESSELS AND MEASURING THEIR DIAMETERS

Authors:

Zafer Yavuz, Cevat Ikibas and Cemal Kose

Abstract: Retinal image processing provides tools for the automatic diagnosis and monitoring of retinal diseases such as diabetic retinopathy (DR), age-related macular degeneration (ARMD) and glaucoma. The properties of vessel structures, in turn, are widely utilized in locating morphological structures such as the optic disc and macula and in the automatic diagnosis of retinal diseases. Given the importance of retinal vessels, we propose a simple approach for tracking vessels and measuring their diameters in retinal fundus images. Images with manually segmented retinal vasculatures were obtained from the STARE database and used in this study. Our method first finds the midlines of the vessel network on the segmented images by employing the Zhang-Suen thinning algorithm and then tracks the vessel branches along those midlines. Lastly, the diameters of the vessel segments in different parts of the vasculature are calculated during the tracking operation. Test results show that the proposed automatic method tracks the vessel network and measures the diameters quite successfully.
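The diameter-measurement idea can be illustrated with a minimal sketch: given a binary (manually segmented) vessel mask, the local diameter at a midline point is the run length of vessel pixels across the vessel. The tiny hand-made mask below is purely hypothetical (not STARE data), and for this straight synthetic vessel the midline is trivially the central column; the paper's method obtains midlines with Zhang-Suen thinning.

```python
# Hypothetical 5x7 binary vessel mask; '1' marks segmented vessel pixels.
mask = [
    "0011100",
    "0011100",
    "0111110",
    "0111110",
    "0011100",
]
grid = [[c == "1" for c in row] for row in mask]

def row_diameter(row):
    """Run length of vessel pixels in one cross-section (vessel is vertical,
    so each mask row is a cross-section perpendicular to the midline)."""
    return sum(row)

diameters = [row_diameter(r) for r in grid]
print(diameters)  # -> [3, 3, 5, 5, 3]
```

A real implementation would measure the run length perpendicular to the locally estimated vessel direction rather than along fixed image rows.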

Paper Nr: 33
Title:

ReHap: AN INTEGRATED SYSTEM FOR THE HAPLOTYPE ASSEMBLY PROBLEM FROM SHOTGUN SEQUENCING DATA

Authors:

Filippo Geraci and Marco Pellegrini

Abstract: Single nucleotide polymorphism (SNP) is the most common form of DNA variation. The set of SNPs present in a chromosome (called the haplotype) is of interest in a wide range of applications in molecular biology and biomedicine. Personalized haplotyping of (portions of, or all of) the chromosomes of individuals is one of the most promising basic ingredients of effective personalized medicine (including diagnosis, and eventually therapy). Personalized haplotyping is now becoming technically and economically feasible via steady progress in shotgun sequencing technologies (see e.g. the 1000 genomes project - A deep catalogue of human genetic variations). One key algorithmic problem in this process is the haplotype assembly problem (also known as the single individual haplotyping problem): reconstructing the two haplotype strings (paternal and maternal) from the large collection of short fragments produced by PCR-based shotgun technology. Although many algorithms for this problem have been proposed in the literature, there has been little progress on comparing them on a common basis and on providing support for selecting the best algorithm for the type of fragments generated by a specific experiment. In this paper we present ReHap, an easy-to-use AJAX-based web tool that provides a complete experimental environment for comparing five different assembly algorithms under a variety of parameter settings, taking as input user-generated data and/or providing several fragment-generation simulation tools. This is the first published report of a comparison among five different haplotype assembly algorithms on a common data and algorithmic framework. The system is freely available to researchers at the URL: http://bioalgo.iit.cnr.it/rehap/.

Paper Nr: 35
Title:

BUILDING VERY LARGE NEIGHBOUR-JOINING TREES

Authors:

Martin Simonsen, Thomas Mailund and Christian N. S. Pedersen

Abstract: The neighbour-joining method of Saitou and Nei is a widely used method for phylogenetic reconstruction, made popular by a combination of computational efficiency and reasonable accuracy. With the cubic running time of the implementation by Studier and Keppler, the method scales to hundreds of species, and while it is usually possible to infer phylogenies with thousands of species, tens or hundreds of thousands of species are infeasible. Recently we developed a simple branch-and-bound heuristic, RapidNJ, which significantly reduces the average running time. However, the O(n^2) space consumption of the RapidNJ method, and of the NJ method in general, becomes a problem when inferring phylogenies with 10000+ taxa. In this paper we present two extensions of RapidNJ which reduce memory requirements and enable RapidNJ to infer very large phylogenetic trees efficiently. We also present an improved search heuristic which improves RapidNJ's performance on many data sets of all sizes.
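The pair-selection step at the heart of neighbour-joining, which RapidNJ accelerates with a branch-and-bound search, can be sketched as follows. This is a minimal illustration of the canonical Q-criterion on a toy distance matrix, not the RapidNJ implementation.

```python
import numpy as np

def nj_join_pair(d):
    """One neighbour-joining selection step: find the pair (i, j) minimising
    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)."""
    n = d.shape[0]
    r = d.sum(axis=1)                      # row sums of the distance matrix
    q = (n - 2) * d - r[:, None] - r[None, :]
    np.fill_diagonal(q, np.inf)            # never join a taxon with itself
    i, j = divmod(int(np.argmin(q)), n)
    return (i, j), q[i, j]

# Toy 4-taxon distance matrix: taxa 0 and 1 are close neighbours.
d = np.array([[0, 2, 7, 7],
              [2, 0, 7, 7],
              [7, 7, 0, 2],
              [7, 7, 2, 0]], dtype=float)
pair, qval = nj_join_pair(d)
print(pair, qval)  # -> (0, 1) -28.0
```

The full method then merges the chosen pair into a new node, updates the distance matrix, and repeats until the tree is resolved; the naive version above costs O(n^2) per step, which is what makes large trees expensive.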

Paper Nr: 37
Title:

STRUCTURE PREDICTION OF SIMPLE NON-STANDARD PSEUDOKNOT

Authors:

Thomas K. F. Wong and S. M. Yiu

Abstract: The secondary structure of an RNA molecule is known to be critical to its biological function. However, the problem of predicting the secondary structure of an RNA molecule from its primary sequence is computationally difficult in the presence of pseudoknots; in general, the problem is NP-hard. Most existing algorithms aim at restricted classes of pseudoknots. In this paper, we consider a new class of pseudoknot structures, called simple non-standard pseudoknots, which covers more complicated secondary structures found in existing databases. None of the previous algorithms can handle this class of pseudoknots. Only two of them, which run in O(m^6) and O(m^5) time where m is the length of the given RNA sequence, can handle certain cases in this new class. In contrast, we provide a prediction algorithm that runs in O(m^4) time for simple non-standard pseudoknots of degree 4, which already covers all known secondary structures of RNAs in this class.

Paper Nr: 45
Title:

CO-EVOLUTION IN HIV ENZYMES

Authors:

P. Boba, P. Weil, F. Hoffgaard and K. Hamacher

Abstract: Proteins as molecular phenotypes need to maintain their stability, fold and functionality throughout their individual and collective evolution. These important properties are maintained by a selective pressure that reveals itself in sequence data sets. Small adaptive changes are usually possible, but in general the conservation of structure and function implies the co-evolution of amino acids within the molecule. We analyze the two most important enzymes in the progression of viral infection by the human immunodeficiency virus (HIV) – namely the reverse transcriptase and the protease – within an information-theoretical framework to derive insight into the selective pressure acting locally and globally on the enzymes. To this end we computed mutual information within the proteins and between the proteins for some 40,000 sequences. We discuss the results of intra- and inter-protein co-evolution of residues in these enzymes and finally annotate important structural-evolutionary correlations. In particular we focus on the reverse transcriptase and on a small signal indicating a potential co-evolution between the protease and the reverse transcriptase. We verified that our sampling is sufficiently large and that no normalization scheme needs to be applied. We conclude with a short outlook on potential implications for the development of drug resistance.
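The mutual-information computation underlying such co-evolution analyses can be sketched on toy alignment columns. The four-sequence columns below are illustrative only; the paper applies the same quantity to some 40,000 HIV sequences.

```python
import math
from collections import Counter

def mutual_information(col_i, col_j):
    """MI between two alignment columns:
    sum over residue pairs (a, b) of p(a,b) * log2( p(a,b) / (p(a) p(b)) )."""
    n = len(col_i)
    pi = Counter(col_i)                 # marginal counts of column i
    pj = Counter(col_j)                 # marginal counts of column j
    pij = Counter(zip(col_i, col_j))    # joint counts
    mi = 0.0
    for (a, b), c in pij.items():
        pab = c / n
        mi += pab * math.log2(pab * n * n / (pi[a] * pj[b]))
    return mi

# Toy columns from a four-sequence alignment:
col1 = list("AACC")
col2 = list("DDEE")   # changes in step with col1 -> MI = 1 bit
col3 = list("GHGH")   # independent of col1      -> MI = 0 bits
mi_cov = mutual_information(col1, col2)
mi_ind = mutual_information(col1, col3)
print(mi_cov, mi_ind)  # -> 1.0 0.0
```

High MI between two columns indicates that the residues at those positions vary together across the alignment, the signature of co-evolution the paper looks for.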

Paper Nr: 54
Title:

REACTION KERNELS - Structured Output Prediction Approaches for Novel Enzyme Function

Authors:

Katja Astikainen, Esa Pitkänen, Juho Rousu, Liisa Holm and Sándor Szedmák

Abstract: The enzyme function prediction problem is usually solved using annotation transfer methods. These methods are suitable when the function of the new protein has previously been characterized and included in a taxonomy such as the EC hierarchy. However, for a new function that has not previously been described, these approaches arguably do not offer adequate support for the human expert. In this paper, we explore a structured output learning approach, where enzyme function—an enzymatic reaction—is described in fine-grained fashion with so-called reaction kernels, which allow interpolation and extrapolation in the output (reaction) space. Two structured output models are learned via Kernel Density Estimation and Maximum Margin Regression to predict enzymatic reactions from sequence motifs. We bring forward two choices for constructing reaction kernels and experiment with them in the remote homology setting, where the functions in the test set have not been seen in the training phase. Our experiments demonstrate the viability of our approach.

Paper Nr: 70
Title:

PARALLEL CALCULATION OF SUBGRAPH CENSUS IN BIOLOGICAL NETWORKS

Authors:

Pedro Ribeiro, Fernando Silva and Luís Lopes

Abstract: Mining meaningful data from complex biological networks is a critical task in many areas of research. One important example is calculating the frequency of all subgraphs of a certain size, also known as the subgraph census problem. This can provide a very comprehensive structural characterization of a network and is also used as an intermediate step in the computation of network motifs, important basic building blocks of networks that try to bridge the gap between structure and function. The subgraph census problem is computationally hard, and here we present several parallel strategies to solve it. Our initial strategies were refined towards an efficient and scalable adaptive parallel algorithm. This algorithm achieves almost linear speedups up to 128 cores when applied to a representative set of biological networks from different domains, and makes the calculation of the census for larger subgraph sizes feasible.
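What a subgraph census counts can be shown with a brute-force sketch for undirected size-3 subgraphs (the only two connected shapes are the open triad, a path, and the closed triad, a triangle). This is purely illustrative; the paper's contribution is a sophisticated parallel enumeration, not this O(n^3) loop.

```python
from itertools import combinations

def size3_census(edges):
    """Brute-force census of connected 3-node subgraphs in an
    undirected graph: counts paths (2 edges) and triangles (3 edges)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    paths = triangles = 0
    for a, b, c in combinations(sorted(adj), 3):
        e = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        if e == 3:
            triangles += 1
        elif e == 2:
            paths += 1   # exactly two edges among three nodes is always connected
    return {"path": paths, "triangle": triangles}

# Toy 4-node graph: a triangle {1, 2, 3} plus a pendant edge 3-4.
census = size3_census([(1, 2), (2, 3), (1, 3), (3, 4)])
print(census)  # -> {'path': 2, 'triangle': 1}
```

Already at size 4 or 5 the number of candidate node sets explodes, which is why parallel and adaptive strategies are needed for real biological networks.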

Paper Nr: 71
Title:

LINEAR-TIME MATCHING OF POSITION WEIGHT MATRICES

Authors:

Nikola Stojanovic

Abstract: Position Weight Matrices are a popular way of representing variable motifs in genomic sequences, and they have been widely used for describing the binding sites of transcriptional proteins. However, the standard implementation of PWM matching, while not inefficient on shorter sequences, is too expensive for whole-genome searches. In this paper we present an algorithm we have developed for efficient matching of PWMs in long target sequences. After an initial pre-processing of the matrix, it runs in time linear in the length of the genomic segment.
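For contrast, the standard PWM matching that this paper improves upon is a simple sliding-window scan costing O(n*m) for a length-n sequence and length-m motif. The matrix values below are hypothetical log-odds scores, not from any real motif database.

```python
# Hypothetical log-odds PWM for a length-3 motif over A, C, G, T.
PWM = {
    'A': [1.0, -2.0, -2.0],
    'C': [-2.0, 1.0, -2.0],
    'G': [-2.0, -2.0, 1.0],
    'T': [-2.0, -2.0, -2.0],
}

def pwm_scan(seq, pwm, threshold):
    """Naive scan: score every length-m window column by column and
    report the start positions scoring at least `threshold`."""
    m = len(next(iter(pwm.values())))
    hits = []
    for i in range(len(seq) - m + 1):
        score = sum(pwm[seq[i + j]][j] for j in range(m))
        if score >= threshold:
            hits.append((i, score))
    return hits

hits = pwm_scan("TTACGTT", PWM, threshold=2.5)
print(hits)  # -> [(2, 3.0)]  (the window "ACG" at position 2)
```

The paper's contribution is to avoid rescoring every window from scratch, so that after pre-processing the matrix the scan time no longer carries the factor m.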

Short Papers
Paper Nr: 2
Title:

PMSGA: A FAST DNA FRAGMENT ASSEMBLER

Authors:

Juho Mäkinen, Jorma Tarhio and Sami Khuri

Abstract: DNA fragment assembly is an essential step in DNA sequencing projects. Since DNA sequencers output fragments, the original genome must be reconstructed from these small reads. In this paper, a new fragment assembly algorithm, the Pattern Matching based String Graph Assembler (PMSGA), is presented. The algorithm uses multipattern matching to detect overlaps and a minimum-cost flow algorithm to detect repeats. Special care was taken to reduce the algorithm's run time without compromising the quality of the assembly. PMSGA was compared with well-known fragment assemblers and is faster than the other assemblers. It produced high-quality assemblies on prokaryotic data sets, and its results on eukaryotic data are comparable with those of other assemblers.
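The overlap-detection step at the heart of string-graph assembly can be illustrated with a naive suffix-prefix test on two toy reads. PMSGA itself uses multipattern matching to find these overlaps efficiently; the brute-force version below only shows the quantity being computed.

```python
def suffix_prefix_overlap(a, b, min_len=3):
    """Length of the longest suffix of read `a` that equals a prefix of
    read `b`, ignoring overlaps shorter than `min_len` (too likely to
    arise by chance). This is the edge weight in an overlap/string graph."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0

# Two hypothetical reads overlapping by the 5 bases "GATTA":
a, b = "ACGTGATTA", "GATTACA"
print(suffix_prefix_overlap(a, b))  # -> 5
```

An assembler builds a graph with one node per read and one edge per sufficiently long overlap, then reconstructs the genome as a walk through that graph.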

Paper Nr: 6
Title:

SEGMENTATION OF SES FOR PROTEIN STRUCTURE ANALYSIS

Authors:

Virginio Cantoni, Riccardo Gatti and Luca Lombardi

Abstract: The morphological complementarities of molecular surfaces provide insights for the identification and evaluation of binding sites. A quantitative characterization of these sites is an initial step towards protein-based drug design. The final goal of the activity presented here is to provide a method that allows the identification of sites of possible protein-protein and protein-ligand interaction on the basis of the geometrical and topological structure of protein surfaces. The goal is to discover complementary regions (that is, concave and convex segments that match each other) among different molecules. In particular, we consider the first step of this process: the segmentation of the protein surface into protuberances and cavities through an approach based on an analysis of the molecule's Convex Hull and on the Distance Transform.

Paper Nr: 13
Title:

CELLMICROCOSMOS 4.1 - An Interactive Approach to Integrating Spatially Localized Metabolic Networks into a Virtual 3D Cell Environment

Authors:

Björn Sommer, Jörn Künsemöller, Norbert Sand, Arne Husemann, Madis Rumming and Benjamin Kormeier

Abstract: The high potential of bioinformatics research in quantitative and qualitative data acquisition (data warehouses, spatial structure prediction and 3D microscopy) conveys the vision of generating a computational virtual cell. This paper discusses an approach that allows the creation and exploration of an abstract compartmented cell environment, which can be used for (semi-)automatic, species- and organelle-specific mapping and the comparison of metabolic data.

Paper Nr: 16
Title:

MODELING AND ANALYSIS OF BIRD FLU OUTBREAK WITHIN A POULTRY FARM

Authors:

Tertia Delia Nova, Herman Mawengkang and Masaji Watanabe

Abstract: The outbreak of avian influenza within a poultry farm is studied mathematically. A system of two nonlinear ordinary differential equations is introduced as a model; the unknown variables of these differential equations are the populations of susceptible birds and infected birds. Analysis of the model shows that the most effective measure against an outbreak of avian influenza within a poultry farm is the constant removal of infected birds, and that removal of infected birds alone can prevent an outbreak. The analysis also shows that vaccination is effective in conjunction with the removal of infected birds, and that vaccination cannot prevent an outbreak without the removal of infected birds.
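A two-variable model of this kind can be sketched with forward-Euler integration. The equations and parameter values below are illustrative stand-ins (a standard SI-type system with a removal term), not the paper's exact model.

```python
# Illustrative two-ODE model: S = susceptible birds, I = infected birds.
#   dS/dt = -beta * S * I          (new infections)
#   dI/dt =  beta * S * I - r * I  (infections minus constant-rate removal)
# beta, r and the initial conditions are hypothetical.
def simulate(S0, I0, beta, r, dt=0.01, steps=20000):
    S, I = float(S0), float(I0)
    for _ in range(steps):
        new_inf = beta * S * I
        S += dt * (-new_inf)
        I += dt * (new_inf - r * I)
    return S, I

# Removal rate above the outbreak threshold (r > beta * S0): infection dies out.
S_rem, I_rem = simulate(S0=1000, I0=10, beta=0.001, r=2.0)
# No removal: the infection sweeps through the whole flock.
S_no, I_no = simulate(S0=1000, I0=10, beta=0.001, r=0.0)
```

Comparing the two runs reproduces the abstract's qualitative conclusion: with sufficiently fast removal the infected population collapses and most of the flock escapes infection, while without removal essentially every bird is eventually infected.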

Paper Nr: 20
Title:

A SUBSPACE METHOD FOR THE DETECTION OF TRANSCRIPTION FACTOR BINDING SITES

Authors:

Erola Pairo, Santiago Marco and Alexandre Perera

Abstract: Transcription factor binding sites are short and degenerate sequences, located mostly in the promoter of the gene, where certain proteins bind in order to regulate transcription. Locating these sequences is an important problem, and many experimental and computational methods have been developed. Algorithms to search for binding sites are usually based on Position Specific Scoring Matrices (PSSM), where each position is treated independently. By mapping symbolic DNA to numerical sequences, a detector has been built from a Principal Component Analysis of the numerical sequences, taking into account covariances between positions. When a treatment of missing values is incorporated, the Q-residuals detector, based on PCA, performs better than a PSSM algorithm. The performance of the detector depends on the estimation of missing values and on the percentage of missing values considered in the model.
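The Q-residual statistic of a PCA model is the squared reconstruction error left after projecting a sample onto the retained components: samples that fit the model (here, candidate binding sites) have small residuals, everything else large ones. The sketch below uses generic synthetic vectors, not the paper's particular DNA-to-numeric mapping.

```python
import numpy as np

def q_residuals(X, k):
    """Q-residual (squared reconstruction error) of each row of X
    after projection onto the first k principal components."""
    Xc = X - X.mean(axis=0)
    # PCA loadings via SVD of the centred data matrix
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                       # (features x k) loading matrix
    resid = Xc - Xc @ P @ P.T          # part of each row outside the model
    return (resid ** 2).sum(axis=1)

# Synthetic data: 19 rows near a 1-D subspace plus one off-model outlier.
rng = np.random.default_rng(0)
t = rng.normal(size=(20, 1))
X = t @ np.array([[1.0, 2.0, 3.0]]) + 0.01 * rng.normal(size=(20, 3))
X[-1] = [3.0, -3.0, 1.0]               # lies off the modelled subspace
q = q_residuals(X, k=1)
print(int(np.argmax(q)))  # -> 19 (the outlier has the largest Q-residual)
```

In the detection setting, a threshold on the Q-residual separates windows consistent with the binding-site model from background sequence.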

Paper Nr: 21
Title:

IMPROVED DISEASE OUTCOME PREDICTION BASED ON MICROARRAY AND CLINICAL DATA COMBINATION AND PRE-VALIDATION

Authors:

Jana Šilhavá and Pavel Smrž

Abstract: Combining relevant information from high-dimensional microarray data and low-dimensional clinical variables to predict disease outcome is important for improving treatment decisions. Such a combination may yield more accurate predictions than those obtained using microarray or clinical data alone. We propose a combination of logistic regression for clinical data and BinomialBoosting for microarray data, together with an extension designed for redundant sets of data. Our approach combines microarray and clinical data at the level of decision integration. The extension includes pre-validation of the models built with microarray and clinical data, followed by weight calculation; the weights determine the relevance of the microarray and clinical models in the combination. Evaluations are performed on several redundant and non-redundant simulated datasets, and further tests are applied to two real benchmark datasets. Our approach increases outcome prediction on non-redundant simulated datasets and does not decrease outcome prediction on redundant simulated datasets. Pre-validation of the built models improves the outcome prediction by up to 4% in the case of the real redundant dataset.

Paper Nr: 29
Title:

PROTEIN DOMAIN PHYLOGENIES - Information Theory and Evolutionary Dynamics

Authors:

K. Hamacher

Abstract: The ever-increasing wealth of whole-genome information prompts for phylogenies based on entire genomes. The quest for a good distance measure, however, poses a big challenge, e.g. because of large-scale evolutionary events such as genomic rearrangements or inversions. We introduce here an information-theory-driven measure based on the encoded protein domain composition of genomes, as protein domains are key evolutionary entities. The new method thus focuses on selectively advantageous events. As evolving different protein domain compositions is more complex than single point mutations, the method makes longer evolutionary times accessible. Illustrating the new methodology, we extract several phylogenetic trees for some 700 genomes, e.g. the separation of the three kingdoms of life, trees for mammals and bacillales, and a speculative result for plants (monocotyledons and dicotyledons). The method itself is shown to be robust against incomplete genome sampling. It has a consistent interpretation both in information space at the sequence level and at the level of stochastic evolutionary dynamics. In contrast to established protocols, it becomes more accurate as more organisms are taken into account. Finally, we show the equivalence to a (simplified) model of the evolutionary dynamics of proteomes.
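One standard information-theoretic distance between composition profiles is the Jensen-Shannon divergence, sketched below on hypothetical protein-domain-family frequencies of two genomes. This is only an example of the kind of measure involved; the paper's exact definition may differ.

```python
import math

def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2, in bits) between two discrete
    distributions p and q: JSD = (KL(p||m) + KL(q||m)) / 2, m = (p+q)/2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical domain-family frequency vectors of two genomes:
g1 = [0.5, 0.3, 0.2, 0.0]
g2 = [0.4, 0.3, 0.2, 0.1]
d = jensen_shannon(g1, g2)
print(round(d, 4))
```

JSD is symmetric, bounded between 0 and 1 bit, and well defined even when a domain family is absent from one genome, which makes it a convenient genome-genome distance for building trees.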

Paper Nr: 36
Title:

PROGNOSIS OF BREAST CANCER BASED ON A FUZZY CLASSIFICATION METHOD

Authors:

L. Hedjazi, T. Kempowsky-Hamon, M.-V. Le Lann and J. Aguilar-Martin

Abstract: Learning and classification techniques have shown their usefulness in the analysis of ana-cyto-pathological cancerous tissue data for developing tools for the diagnosis or prognosis of cancer. The use of these methods to process datasets containing different types of data has recently become a challenge for many researchers. This paper presents the fuzzy classification method LAMDA with recent developments that allow it to handle this problem efficiently by simultaneously processing quantitative, qualitative and interval data, without the preliminary transformation of the data that is generally required by other classification methods. The method is applied to breast cancer prognosis on two real-world datasets and compared with previously published results to demonstrate its efficiency.

Paper Nr: 39
Title:

ESTIMATING VIGILANCE IN DRIVING SIMULATION BASED ON DETECTION OF LIGHT DROWSINESS

Authors:

Hong-Jun Liu, Qing-Sheng Ren and Hong-Tao Lu

Abstract: Avoiding fatal accidents caused by a low vigilance level while driving is very important in our daily lives. Electroencephalography (EEG) has proved very effective for measuring the level of vigilance. In this paper, we distinguish the light drowsiness state from other states to estimate vigilance decline by using a support vector machine (SVM). Light-drowsiness EEG is marked by alpha activity increasing to 50%, alert EEG is marked by dominant beta activity, and other EEG is labeled as the sleep state. Samples of EEG data are used to train the SVM, with 4 features taken from each frequency band. A mutual-information-based feature selection method is used to reduce the dimension of the features. The accuracy in classifying alert and light drowsiness reaches 91.5% on average.

Paper Nr: 41
Title:

HOLY-II: IMPROVED HIERARCHICALLY ORGANIZED LAYOUT FOR VISUALIZATION OF BIOCHEMICAL COMPLEX PATHWAYS

Authors:

Jyh-Jong Tsay, Bo-Liang Wu and Guo-Gen Huang

Abstract: Many complex pathways are described as hierarchical structures in which a pathway is recursively partitioned into several sub-pathways and organized hierarchically as a tree. The hierarchical structure provides a natural way to visualize the global structure of a complex pathway. Recently, a hierarchically organized layout algorithm, HOLY, which takes advantage of the hierarchical structures inherent in complex pathways, has been proposed. In this paper, we present a new layout algorithm, HOLY-II, which follows the basic principle of HOLY but improves upon it by introducing a new algorithm for joining layouts, one of the crucial tasks in HOLY. Experiments show that HOLY-II is able to produce layouts which clearly render both the global and the local structures of complex pathways, and it gives better visualization for many examples from MetaCyc, CADLIVE and HOLY.

Paper Nr: 42
Title:

IN SILICO STUDY OF EXPRESSION PROFILES CORRELATION BETWEEN MICRORNAS AND CANCEROUS GENES

Authors:

Ka-Lok Ng and Chia-Wei Weng

Abstract: We investigate the possibility that microRNAs can act as oncogenes or tumor suppressor genes. Experimentally verified microRNA target gene information (TarBase) is integrated with microRNA and mRNA expression data (NCI-60) to study this hypothesis, with the Pearson correlation and Spearman rank coefficients used to quantify these relations for nine cancer types. Correlation coefficients with negative values are used to filter microRNA targets. Biological annotations of the targets are supplied using the TAG, GO and KEGG records. The above information is utilized to provide a platform for identifying potential cancer-related microRNAs. A web-based interface is set up for information query and data display.
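The filtering step can be sketched directly: compute the correlation between a microRNA's expression profile and that of a putative target across samples, and keep the pair when the correlation is negative (as expected if the microRNA represses its target). The profiles below are hypothetical five-sample vectors, not NCI-60 values.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical expression profiles across five cell lines: the microRNA
# is high where its putative target is low.
mirna  = [5.0, 4.2, 3.1, 2.0, 1.2]
target = [1.0, 1.8, 3.0, 4.1, 5.2]
r = pearson(mirna, target)
is_candidate = r < 0   # negative correlation keeps the miRNA-target pair
```

The same filter can be run with the Spearman rank coefficient by first replacing each profile with its within-profile ranks, which makes the test robust to monotone but nonlinear relationships.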

Paper Nr: 47
Title:

ON THE GRADIENT-BASED ALGORITHM FOR MATRIX FACTORIZATION APPLIED TO DIMENSIONALITY REDUCTION

Authors:

Vladimir Nikulin and Geoffrey J. McLachlan

Abstract: The high dimensionality of microarray data, the expressions of thousands of genes in a much smaller number of samples, presents challenges that affect the applicability of the analytical results. In principle, it would be better to describe the data in terms of a small number of metagenes, derived as a result of matrix factorization, which could reduce noise while still capturing the essential features of the data. We propose a fast and general method for matrix factorization, based on decomposition by parts, that can reduce the dimension of expression data from thousands of genes to several factors. Unlike classification and regression, matrix decomposition requires no response variable and thus falls into the category of unsupervised learning methods. We demonstrate the effectiveness of this approach for the supervised classification of gene expression data.

Paper Nr: 51
Title:

CODING BIOLOGICAL SYSTEMS IN A STOCHASTIC FRAMEWORK - The Case Study of Budding Yeast Cell Cycle

Authors:

Alida Palmisano

Abstract: In biology, modelling is mainly grounded in mathematics, and specifically in ordinary differential equations (ODEs). Using programming languages originally conceived to describe networks of computers that exchange information is a complementary and emerging approach to analyzing the dynamics of biological networks. In this work, we focus on the process algebra language BlenX and show that ODE models can easily be reused within this framework. In particular, we focus on a well-characterized biological network: the cell cycle of the budding yeast. This system has been studied in great detail in the deterministic framework, and data on many mutants are available for the chosen model. Interestingly, the experimental phenotypic characterization of some mutants cannot be explained by the deterministic solution of the model, so in this work we propose a translation of the model into the stochastic framework in order to verify whether the inconsistencies are due to the noise affecting the system.

Paper Nr: 52
Title:

CONCEPTUAL MODELING OF HUMAN GENOME MUTATIONS - A Dichotomy Between What we Have and What we Should Have

Authors:

M. Ángeles Pastor, Verónica Burriel Coll and Óscar Pastor

Abstract: It is well known in the bioinformatics domain that the millions of mutations and polymorphisms that occur in human populations are potential predictors of disease and of other human health related problems. Finding sound strategies for going from the Genotype to the Phenotype is probably the main challenge of modern bioinformatics. Only with the sound knowledge provided by information systems (IS) theory can a systematic approach to the large-scale analysis of Genotype-Phenotype correlations be developed. The conceptual expressiveness of a well-known and widely accepted database that stores current information about genome mutations, the Human Gene Mutation Database, is compared with the information that is relevant from a purely conceptual modelling perspective, and the result of this comparison is reported.

Paper Nr: 65
Title:

DETECTION OF NORMALITY/PATHOLOGY ON CHEST RADIOGRAPHS USING LBP

Authors:

Juan Manuel Carrillo-de-Gea and Ginés García-Mateos

Abstract: Since the discovery of X-rays and their applications, medical imaging has been a great help to radiologists in the diagnosis of diseases. In recent years, there has been a great effort in the computer vision community towards systems for the analysis and extraction of useful information from medical images. In this scenario, we have designed, implemented and validated a novel method to detect normality/pathology in chest radiographs, which constitutes the core of a computer-aided detection (CADe) system. Although the problem addressed is very complex and little explored, our approach is completely automatic, starting from the location of areas of interest using template matching techniques. The main novelty of our contribution is the application of a transformation known as local binary patterns (LBP) to these areas. LBP histograms are then used as input features for a classification system, which is ultimately responsible for the normality/pathology decision. The results of our preliminary experiments are quite promising. With success rates close to 90% in the best cases, we believe that increased performance could be obtained with bigger training sets and more advanced classification systems, which would make these systems fully viable in the near future.
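The LBP transform itself is simple: each pixel is replaced by an 8-bit code obtained by thresholding its 8 neighbours against the centre, and the histogram of these codes over a region becomes the feature vector. A minimal sketch on one 3x3 patch follows; the clockwise neighbour ordering is one common convention, and the pixel values are made up.

```python
def lbp_code(patch):
    """Basic 3x3 LBP: set bit b when neighbour b is >= the centre pixel.
    Neighbours are read clockwise from the top-left corner."""
    c = patch[1][1]
    neigh = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
             patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, v in enumerate(neigh):
        if v >= c:
            code |= 1 << bit
    return code

# Hypothetical grey-level patch:
patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_code(patch))  # -> 241
```

Because the code depends only on the sign of local differences, LBP is invariant to monotonic illumination changes, which is part of why it works well on radiographs.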

Paper Nr: 66
Title:

THE PLASMODIUM GLUTATHIONE S-TRANSFERASE - Bioinformatics Characterization and Classification into the Sigma Class

Authors:

Emilee E. Colón-Lorenzo, Adelfa E. Serrano, Hugh B. Nicholas Jr, Troy Wymore, Alexander J. Ropelewski and Ricardo González-Méndez

Abstract: Malaria is a global health problem caused by Plasmodium parasites. Glutathione S-transferase (GST) is involved in the conjugation of glutathione to drugs and toxic compounds, and it is postulated that GST plays an important role in the development of drug resistance. The three-dimensional (3D) structure of Plasmodium falciparum GST (PfGST) has been solved, and previous work indicates that PfGST cannot be assigned to any of the known GST classes. We performed sequence analyses, structural modeling and alignment of GSTs from Plasmodium against known GST structures from other organisms to classify PfGST into a GST family. Sequence alignments using ClustalW, motif analysis using MEME and phylogenetic analysis using MEGA4 were carried out on the Plasmodium GSTs and 38 other GST sequences. The alignments and motifs show a close relationship to the alpha and sigma classes of GSTs, and the phylogenetic analysis places the Plasmodium GSTs in the sigma class. A comparison of PfGST with known GST structures reveals high structural similarity to the sigma class GSTs, in particular within the H-site and the C-terminus of the protein. These findings allow PfGST to be classified into the sigma class of GSTs and may open new avenues for the development of novel antimalarials.

Paper Nr: 79
Title:

STUDY ON EFFECTS OF MICROORGANISM IN DEPOLYMERIZATION PROCESS OF XENOBIOTIC POLYMERS BY MODELING AND SIMULATION

Authors:

Masaji Watanabe and Fusako Kawai

Abstract: The effects of microorganisms in the biodegradation process of polyethylene glycol are studied by modeling and simulation. The population dynamics of the microorganisms are taken into consideration in modeling the depolymerization process of the exogenous type, and a mathematical model is described. A molecular factor of the degradation rate is obtained by solving an inverse problem, and a time factor of degradation is obtained by analyzing the population dynamics of the microorganisms. Once the time factor and the molecular factor of the degradation rate are determined, the depolymerization process is simulated by solving an initial value problem.

Paper Nr: 82
Title:

STRUCTURAL MOTIF ENUMERATION IN TRANSCRIPTIONAL REGULATION NETWORKS

Authors:

Claire Luciano and Chun-Hsi Huang

Abstract: Network motifs are small connected subnetworks within a larger network that occur in statistically significant quantities and may indicate functional regions of the network. Network motif software tools employ algorithms that compare a network to randomly generated networks in order to identify subnetworks that occur more frequently than would be expected by random chance. The transcriptional regulation network of E. coli has previously been represented as a network and evaluated using both full enumeration and an edge-sampling algorithm, and several significant network motifs were identified, including feedforward loops and bipartite graphs. This paper applies both full enumeration and a different sampling algorithm, randomized enumeration, to the E. coli network using the newer software tool FANMOD. Evaluating the E. coli transcriptional regulation network with FANMOD also identified feedforward loops and bipartite graphs as significant network motifs. Sampling identified fewer and less significant motifs than full enumeration; however, sampling enables the evaluation of larger subgraph sizes.

Paper Nr: 83
Title:

PROTEIN FOLDING, MOLECULAR DOCKING, DRUG DESIGN - The Role of the Derivative “Drift” in Complex Systems Dynamics

Authors:

Corrado Giannantoni

Abstract: The relevance of Protein Folding is widely recognized. It is also well known, however, that it is one of the dynamic problems in TDC considered to be intractable. In addition, even in the case of solutions obtainable in reasonable computation time, these always present a “drift” between the predicted behavior of the biological system analyzed and the corresponding experimental results, a drift which becomes more marked as the order of the system increases. Both the “intractability” of the problem and the above-mentioned “drifts”, as well as the insolubility of the problem in explicit terms (or at least in a closed form), can be overcome by starting from a different gnoseological approach, which suggests a new definition of derivative: the “incipient” derivative. The solution to the “Three-body Problem” obtained by means of IDC, and its extension to any number of bodies, allows us to assert that the folding of even a macroscopic protein, such as dystrophin, made up of about 100,000 atoms, could be carried out in a few minutes when the model is run on next-generation computers (1 petaflop). The same methodology can also be applied to both Molecular Docking and computer-aided Drug Design.

Posters
Paper Nr: 17
Title:

SUPPORTING SCIENTIFIC BIOLOGICAL APPLICATIONS WITH SEAMLESS DATABASE ACCESS IN INTEROPERABLE E-SCIENCE INFRASTRUCTURES

Authors:

Sonja Holl, Morris Riedel, Bastian Demuth, Mathilde Romberg and Achim Streit

Abstract: In the last decade, computational biological applications have become very well integrated into e-Science infrastructures. These distributed resources, comprising computing and data sources, provide a suitable environment for compute- and data-intensive applications. Access to e-Science infrastructures is mostly established via Grids, where Grid clients support scientists in using different types of resources. This paper extends an instance of the infrastructure interoperability reference model to address this gap by adding centralized access to distributed computational and database resources via a graphical Grid client.

Paper Nr: 32
Title:

DISTRIBUTED FREQUENCY SORTING IN SPECTRAL VIDEO ANALYSIS OF DNA SEQUENCES

Authors:

Anca Bucur, Jasper van Leeuwen and Nevenka Dimitrova

Abstract: DNA spectral analysis, i.e. the analysis of DNA spectrograms, has been proposed as a method to systematically investigate DNA patterns, which may correspond to relevant biological features. The Frequency Sorting method sorts the sequences in spectral domain based on their frequency content, and detects and groups those sequences exhibiting one or more strong patterns in the same frequencies. In this paper we propose a novel distributed algorithm for Frequency Sorting and report on the performance results of our implementation for the alignment in spectral domain of the human chromosome 21. Distributed Frequency Sorting enables efficient spectral alignment and allows for the easy detection of strong patterns in both single and multiple frequencies.
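The spectral representation this kind of analysis builds on can be sketched as follows: each base is mapped to a binary indicator sequence, the four indicator power spectra are summed, and a strong repeat of period p in a window of length N appears as a peak near frequency bin N/p. The Python sketch below is a generic illustration using a plain DFT, not the authors' distributed Frequency Sorting implementation.

```python
import cmath

def indicator(seq, base):
    """Binary indicator sequence for one base."""
    return [1.0 if b == base else 0.0 for b in seq]

def dft_power(x):
    """Power spectrum |X_k|^2 for k = 0 .. N/2 (plain O(N^2) DFT)."""
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                    for n in range(N))) ** 2
            for k in range(N // 2 + 1)]

def spectrum(seq):
    """Summed power spectra of the four base indicator sequences."""
    total = [0.0] * (len(seq) // 2 + 1)
    for base in "ACGT":
        for k, p in enumerate(dft_power(indicator(seq, base))):
            total[k] += p
    return total

def dominant_bin(seq):
    """Frequency bin (k >= 1) with the strongest power: the kind of key
    a frequency-sorting scheme could group windows by."""
    p = spectrum(seq)
    return max(range(1, len(p)), key=lambda k: p[k])
```

Sorting genomic windows by their dominant bins groups together sequences that share a periodicity; since each window's spectrum is computed independently, distributing the windows across compute nodes is embarrassingly parallel.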

Paper Nr: 40
Title:

PROTEINS POCKETS ANALYSIS AND DESCRIPTION

Authors:

Virginio Cantoni, Riccardo Gatti and Luca Lombardi

Abstract: The development of computational techniques to guide experimental processes is an important step toward determining protein functions. The purpose of the activity described here is the characterization of active sites on protein surfaces and their quantitative representation. A few pocket parameters, such as volume, travel depth, mouth area and perimeter, amplitude parameters, interfacial area ratio, summit density and mean summit curvature, are hierarchically accessible through a concavity tree that topologically represents the entire protein molecule. This structural representation is particularly useful for evaluating binding pockets, comparing morphological similarity and identifying potential ligand docking sites.

Paper Nr: 55
Title:

STATISTICAL ANALYSIS OF BIOMOLECULAR DATA USING UNICORE WORKFLOWS

Authors:

Marcelina Borcz, Rafał Kluszczyński and Piotr Bała

Abstract: Nowadays the role of e-Science is important, especially in the life sciences. Experiments and their analysis are carried out in collaboration by many scientific groups from institutes located all over the world. Moreover, they work with immense amounts of data which usually need to be processed statistically, so the need for computing power is increasing and usually cannot be met by a standard laboratory. That is why e-Science makes use of grid technology. UNICORE (Uniform Interface to Computing Resources) is a middleware enabling access to Grid resources in a seamless and secure way. In this paper we present a UNICORE GridBean for the R statistical environment which enables statistical processing of data on the Grid. Used as part of a more complex workflow task, it can analyze results produced by other applications and calculate the required statistics. By presenting an example workflow constructed in the UNICORE Rich Client application, the authors show the power of the Chemomentum workbench built on the UNICORE Grid system.

Paper Nr: 56
Title:

CONVEX SHAPE RETRIEVAL FROM EDGE MAPS BY THE USE OF AN EVOLUTIONARY ALGORITHM

Authors:

A. Nezhinsky, J. Kruisselbrink and F. Verbeek

Abstract: There is a need for a high-throughput approach to extracting biological shapes from images. The approach for automated extraction of convex biological shapes presented in this paper is an Evolutionary Algorithm. As opposed to existing model-based segmentation methods, this approach is uniform across different images, needs no training set and is initialized automatically. The process of finding the shape is treated as an optimization problem, which makes an Evolutionary Algorithm a good candidate for a solution. The results show that the proposed Evolutionary Algorithm gives a fast solution for pattern recognition and shape extraction.

Paper Nr: 58
Title:

IMPROVING SEARCH FOR LOW ENERGY PROTEIN STRUCTURES WITH AN ITERATIVE NICHE GENETIC ALGORITHM

Authors:

Glennie Helles

Abstract: In attempts to predict the tertiary structure of proteins, metaheuristics are used almost exclusively. However, despite known differences in the performance of metaheuristics on different problems, the effect of the choice of metaheuristic has received precious little attention in this field. In particular, parallel implementations have been demonstrated to generally outperform their sequential counterparts, yet they are used to a much lesser extent for protein structure prediction. In this work we focus strictly on parallel algorithms for protein structure prediction and propose a parallel algorithm that adds an iterative layer to the traditional niche genetic algorithm. We implement both the traditional niche genetic algorithm and the parallel tempering algorithm in a fashion that allows us to compare the algorithms and examine how they differ in performance. The results show that the iterative niche algorithm converges much faster to lower-energy structures than both the traditional niche genetic algorithm and the parallel tempering algorithm.

Paper Nr: 59
Title:

RECURSIVE BAYESIAN NETS FOR PREDICTION, EXPLANATION AND CONTROL IN CANCER SCIENCE - A Position Paper

Authors:

Lorenzo Casini, Phyllis McKay Illari, Federica Russo and Jon Williamson

Abstract: The Recursive Bayesian Net formalism was originally developed for modelling nested causal relationships. In this paper we argue that the formalism can also be applied to modelling the hierarchical structure of physical mechanisms. The resulting network contains quantitative information about probabilities, as well as qualitative information about mechanistic structure and causal relations. Since information about probabilities, mechanisms and causal relations is vital for prediction, explanation and control respectively, a Recursive Bayesian Net can be applied to all these tasks. We show how a Recursive Bayesian Net can be used to model mechanisms in cancer science. The highest level of the proposed model will contain variables at the clinical level, while a middle level will map the structure of the DNA damage response mechanism and the lowest level will contain information about gene expression.

Paper Nr: 61
Title:

PICNIC - Portal-based Platform for MRI Processing of Neurodegenerative Diseases

Authors:

J. Delgado-Mengual, Y. Vives-Gilabert, A. Sainz-Ruiz, M. Delfino-Reznicek and B. Gómez-Ansón

Abstract: The use of medical image processing techniques is increasing, especially those applied to the early diagnosis of diseases such as neurodegenerative diseases. The software tools involved are sometimes hard for medical researchers to use, and hospitals often have neither the hardware resources nor the personnel expertise to meet the requirements. Out of these necessities PICNIC was born: a technological hardware-software platform with a web portal interface that integrates MRI processing tools behind a user-friendly interface, a database to manage clinical data, and other services such as de-identification, image visualization and job monitoring.

Paper Nr: 62
Title:

MULTIVARIATE STUDY OF ACHEIS MOLECULES - Mapping Pharmacophoric Profile of AChEIs Via PCA

Authors:

Érica Cristina Moreno Nascimento and João Batista Lopes Martins

Abstract: Alzheimer's disease (AD) is a degenerative dementia. The causes of AD are not well determined, and the most popular strategy for AD treatment is based on the cholinergic hypothesis, which consists in using drugs with an inhibitory effect on the acetylcholinesterase (AChE) enzyme to prevent the decrease in the concentration of the neurotransmitter acetylcholine in synaptic clefts. Structural, electronic and spatial parameters of 10 drugs with a known inhibitory effect on AChE (AChEIs) were determined. The parameter values were obtained by means of calculations at the B3LYP/6-31+G(d,p) level. Principal component analysis (PCA), a multivariate method, was applied to 18 parameters to determine the pharmacophoric profile. The PCA study was performed to reduce the sample space of properties and identify those that are the major AChEI components.
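As a generic illustration of the dimensionality-reduction step, not the authors' actual descriptor set or software, PCA via eigendecomposition of the correlation matrix of a descriptor table (rows = molecules, columns = parameters) might look like:

```python
import numpy as np

def pca(X, n_components=2):
    """PCA of a descriptor matrix X (n_samples x n_features).
    Columns are standardized, so this diagonalizes the correlation matrix."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)          # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]            # largest variance first
    vals, vecs = vals[order], vecs[:, order]
    scores = Xc @ vecs[:, :n_components]      # projection of each molecule
    return scores, vecs[:, :n_components], vals / vals.sum()
```

The loading vectors (second return value) indicate which of the original parameters dominate each principal component, which is how a PCA study points to the descriptors carrying most of the pharmacophoric information.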

Paper Nr: 64
Title:

TOWARDS THE EVOLUTION OF LEGACY APPLICATIONS TO MULTICORE SYSTEMS - Experiences Parallelizing R

Authors:

Gonzalo Vera and Remo Suppi

Abstract: Current innovations in processor performance, aimed at keeping up the growth rate of recent years, are mainly based on providing several processing units within the same chip. With the new underlying multicore processors, traditional sequential applications have to be adapted with parallel programming techniques to take advantage of the new processing capabilities. A great variety of libraries, middleware and frameworks exists to assist the parallelization of such applications. However, in many cases, especially with classical scientific applications, this evolution cannot always be achieved, owing to limitations ranging from technical incompatibilities to a simple lack of knowledge. Here we present our experiences providing an alternative for two situations where previous contributions could not satisfy our needs: adapting a mature non-thread-safe C-coded application, the R language interpreter, and providing support for the automatic parallelization of R scripts on multicore systems.

Paper Nr: 74
Title:

SHANNON ENTROPY AND FRACTAL ANALYSIS FOR THE 16S RIBOSOMAL RNA AND COX2 MT-DNA SEQUENCES IN PRIMATES INCLUDING NEANDERTHAL

Authors:

N. Gadura, Todd Holden, G. Tremberger Jr, E. Cheung, P. Schneider, D. Lieberman and T. Cheung

Abstract: The primate mt-DNA 16S rRNA and COX2 sequences, including Neanderthal sequences, were studied using nucleotide frequency, mono- and di-nucleotide entropy, and fractal dimension. The fractal dimension was computed with the Higuchi method after expressing each nucleotide sequence as a numerical sequence in which each nucleotide is assigned its proton number. The results show that the C+G percentage correlates with the fractal dimension, with an R-squared value of around 0.88 (N = 8) for both gene sequences. The di- and mono-nucleotide entropies are also well correlated, with similar R-squared values. For the COX2 gene, the clustering of human and Neanderthal at high entropy suggests that chimp, gorilla and orangutan were subjected to a higher selection pressure for this gene. The human COX2 has lower entropy than the Neanderthal COX2, consistent with the presence of some selection pressure.
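Both measures are standard and easy to sketch. Below is a generic Python illustration (not the authors' code): k-mer Shannon entropy, and the Higuchi fractal dimension of the sequence after each base is replaced by the proton number of its free base (A = 70, C = 58, G = 78, T = 66); the `kmax` value is an arbitrary choice.

```python
import math
from collections import Counter

def shannon_entropy(seq, k=1):
    """Shannon entropy (bits) of the k-mer distribution; k=1 gives
    mono-nucleotide entropy, k=2 di-nucleotide entropy."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    n = len(kmers)
    return -sum((c / n) * math.log2(c / n) for c in Counter(kmers).values())

PROTON = {"A": 70, "C": 58, "G": 78, "T": 66}  # proton numbers of the free bases

def higuchi_fd(x, kmax=8):
    """Higuchi fractal dimension of a numeric series x."""
    N = len(x)
    logk, logL = [], []
    for k in range(1, kmax + 1):
        Lk = []
        for m in range(k):                    # k offset sub-series
            n_max = (N - m - 1) // k
            if n_max < 1:
                continue
            length = sum(abs(x[m + i * k] - x[m + (i - 1) * k])
                         for i in range(1, n_max + 1))
            Lk.append(length * (N - 1) / (n_max * k * k))
        if Lk:
            logk.append(math.log(1.0 / k))
            logL.append(math.log(sum(Lk) / len(Lk)))
    # least-squares slope of log L(k) against log(1/k) is the dimension
    n = len(logk)
    mx, my = sum(logk) / n, sum(logL) / n
    return (sum((a - mx) * (b - my) for a, b in zip(logk, logL))
            / sum((a - mx) ** 2 for a in logk))

def dna_fractal_dimension(seq, kmax=8):
    """Higuchi dimension of a DNA sequence under the proton-number mapping."""
    return higuchi_fd([float(PROTON[b]) for b in seq], kmax)
```

As sanity checks, a uniformly random base composition gives a mono-nucleotide entropy of 2 bits, and a straight-line numeric series has a Higuchi dimension of 1.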

Paper Nr: 81
Title:

METHYLMALONIC ACIDURIAS - mut0/mut- and cblC Defects in Portuguese Population

Authors:

Célia Nogueira, Marta Marques and Laura Vilarinho

Abstract: The methylmalonic acidurias (MMAs) are metabolic disorders resulting from deficient methylmalonyl-CoA mutase (MCM) activity, a vitamin B12-dependent enzyme that uses adenosylcobalamin (Ado-Cbl) as a cofactor. Several mutant genetic classes that cause MMA are known, based on biochemical, enzymatic and genetic complementation analysis. The mut0/mut- defects result from deficiency of MCM, while the cblA, cblB and the variant 2 form of cblD complementation groups are linked to processes unique to Ado-Cbl synthesis. The cblC, cblD and cblF complementation groups are associated with defective methyl-cobalamin synthesis as well. Mutations in the genes associated with most of these defects have been described. In this study we investigated at the molecular level four patients with the mut0/mut- MMA phenotype and 19 Portuguese patients with the cblC defect. We found four different mutations, all previously described in the literature, in each of the MUT and MMACHC genes, respectively. Our data show an evident difference in the prevalence of these two diseases compared with other countries worldwide.