# BIOINFORMATICS 2017 Abstracts

Full Papers
Paper Nr: 4
Title:

### Efficient Analysis of Homeostasis of Gene Networks with Compositional Approach

Authors:

#### Sohei Ito, Kenji Osari, Shigeki Hagihara and Naoki Yonezaki

Abstract: Homeostasis is an important property of life. Thanks to this property, living organisms keep their cellular conditions within an acceptable range to function normally. To understand the mechanisms of homeostasis and analyse it, the systems biology approach is indispensable. For this purpose, we proposed a qualitative approach that models gene regulatory networks with logical formulae and formulates homeostasis in terms of a logical property called realisability of linear temporal logic. This concise formulation naturally yields a method for analysing homeostasis of gene networks using realisability checkers. However, the realisability problem is well known for its high computational complexity (double-exponential in the size of the formula), so the applicability of this approach is limited to small gene networks, since the size of the formula grows with the network. To overcome this limitation, we leverage a compositional method to check realisability in which a formula is divided into a few sub-formulae. The difficulty of the compositional approach is that it is not known in advance how to obtain a good division. To tackle this issue, we introduce a new clustering algorithm based on a characteristic function on formulae, which calculates the size of formulae and the variation of propositions. The experimental results show that our method gives a good division that benefits from the compositional method.

Paper Nr: 12
Title:

### Reconstruction of Mitochondrial Genotypes from Diverse next Generation Sequencing Datasets

Authors:

#### Peter Ulz, Michael R. Speicher and Gerhard G. Thallinger

Abstract: The exponential growth of sequence databases in recent years opens up a lot of possibilities for reanalysis of public datasets. Here, we reanalyzed sequencing data from various experimental procedures to reconstruct the mitochondrial genome from sequence data of human samples. In a first step eight human cell lines were used to validate the approach and to ensure consistent genotype information across different library preparation techniques. Subsequently, 19,337 sequencing datasets were downloaded and checked for single-nucleotide variants and insertion or deletion events. We show that the mitochondrial genome can be inferred from many different library preparation techniques. We also generated reference mitochondrial genomes for eight cell lines. This approach may be used for sample identification as well as a general approach to study the mitochondrial genome from public sequencing data.

Paper Nr: 14
Title:

### CNV-LDC: An Optimized CNV Detection Method for Low Depth of Coverage Data

Authors:

#### Ayyoub Salmi, Sara El Jadid, Ismail Jamail, Taoufik Bensellak, Romain Philippe, Veronique Blanquet and Ahmed Moussa

Abstract: Recent improvements in technologies have revealed much greater variation in our genome than previously thought. Part of this variation is due to submicroscopic chromosomal deletions/duplications called Copy Number Variations (CNVs). Some of these CNVs have been clearly shown to play an important role in disease susceptibility, including complex diseases and Mendelian diseases. Recent advances in next-generation sequencing have accelerated the analysis of data for CNVs, as they promise to improve detection sensitivity. This has led to the development of several new bioinformatics approaches and algorithms for detecting CNVs from this data using the four common methods: Assembly Based, Split Read, Read-Paired mapping, and Read Depth (RD). Here we focus on the RD method that is able to detect the exact number of CNVs in comparison with the other methods. We propose an alternative method for detecting CNVs from short sequencing reads, CNV-LDC (Copy Number Variation-Low Depth of Coverage), that complements the existing method named CNV-TV (Copy Number Variation-Total Variation). We optimize the signal modeling and threshold step to improve performance at low depth of coverage. Results of this new approach have been compared to various recent methods on different simulated data using small and large CNVs.

Paper Nr: 18
Title:

### Protein Disorder Prediction using Information Theory Measures on the Distribution of the Dihedral Torsion Angles from Ramachandran Plots

Authors:

#### Jonny A. Uribe, Julián D. Arias-Londoño and Alexandre Perera-Lluna

Abstract: This paper addresses the problem of order/disorder prediction in protein sequences using alignment-free methods. The proposed approach is based on a set of 11 information theory measures estimated from the distribution of the dihedral torsion angles in the amino acid chain. The aim is to characterize the energetically allowed regions for amino acids in the protein structures, as a way of measuring the rigidity/flexibility of every amino acid in the chain, and the effect of such rigidity on the disorder propensity. The features are estimated from empirical Ramachandran Plots obtained using the Protein Geometry Database. The proposed features are used in conjunction with well-established features in the state of the art for disorder prediction. The classification is performed using two different strategies: one based on conventional supervised methods and the other one based on structural learning. The performance is evaluated in terms of AUC (Area Under the ROC Curve) and three suitable performance metrics for unbalanced classification problems. The results show that the proposed scheme using conventional supervised methods achieves results similar to those of well-known alignment-free methods for disorder prediction. Moreover, the scheme based on structural learning outperforms all the methods evaluated, including three alignment-based methods.

Paper Nr: 20
Title:

### Automatic Feature Selection in the SOPFs Dissolution Profiles Prediction Problem

Authors:

#### J. E. Salazar Jiménez, J. D. Sánchez Carvajal, B. Quiros-Gómez and J. D. Arias-Londoño

Abstract: This work addresses the problem of dimensionality reduction in the drug dissolution profile prediction task. The learning problem is treated as a multi-output learning task, since dissolution profiles are recorded at non-uniform sampling times, which precludes the use of basic function-on-scalar regression approaches. Ensemble-based tree methods are used for prediction, and also for the selection of the most relevant features, because they are able to deal with high-dimensional feature spaces when the number of training samples is small. All the drugs considered correspond to rapid-release solid oral pharmaceutical forms. Six different feature selection schemes were tested, including sequential feature selection and genetic algorithms, along with a feature scoring procedure, which was proposed in order to reach a consensus on the best subset of variables. The performance was evaluated in terms of the similitude factor used in the drug industry for dissolution profile comparison. The feature selection methods were able to reduce the dimensionality of the feature space by 79.2% without loss in the performance of the prediction system. The results confirm that in the dissolution profile prediction problem, especially for different solid oral pharmaceutical forms, variables from different components and phases of the drug development must be considered.
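
The abstract does not define its similitude factor. As a minimal sketch, assuming it refers to the f2 similarity factor commonly used to compare dissolution profiles, the metric could be computed as follows. The function name and the sample profiles are hypothetical, and both profiles are assumed to be sampled at the same time points (percent dissolved).

```python
import math

def f2_similarity(reference, test):
    """f2 similarity factor between two dissolution profiles (percent dissolved).

    Assumes both profiles are sampled at the same time points.
    """
    if len(reference) != len(test) or not reference:
        raise ValueError("profiles must be non-empty and of equal length")
    # Mean squared difference between the two profiles.
    msd = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    return 50 * math.log10(100 / math.sqrt(1 + msd))

# Hypothetical predicted vs. measured profiles.
measured = [20.0, 55.0, 80.0, 95.0]
predicted = [22.0, 53.0, 82.0, 94.0]
score = f2_similarity(measured, predicted)
```

Identical profiles give a value of 100, and by convention values of 50 or more are read as similar profiles.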

Paper Nr: 22
Title:

### SKraken: Fast and Sensitive Classification of Short Metagenomic Reads based on Filtering Uninformative k-mers

Authors:

#### Davide Marchiori and Matteo Comin

Abstract: The study of microbial communities is an emerging field that is revolutionizing many disciplines from ecology to medicine. The major problem when analyzing a metagenomic sample is to taxonomically annotate its reads in order to identify the species in the sample and their relative abundance. Many tools have been developed in recent years; however, their performance in terms of precision and speed is not always adequate for these very large datasets. In this work we present SKraken, an efficient approach to accurately classify metagenomic reads against a set of reference genomes, e.g. the NCBI/RefSeq database. SKraken is based on k-mer statistics combined with the taxonomic tree. Given a set of target genomes, SKraken is able to detect the most representative k-mers for each species, filtering out uninformative k-mers. The classification performance on several synthetic and real metagenomics datasets shows that SKraken achieves in most cases the best performance in terms of precision and recall with respect to Kraken. In particular, at species-level classification, the estimation of the abundance ratios improves by 6% and the precision by 8%. This behavior is also confirmed on a real stool metagenomic sample, where SKraken is able to detect species with high precision. Because of the efficient filtering of uninformative k-mers, SKraken requires less RAM and is faster than Kraken, one of the fastest tools. Availability: https://bitbucket.org/marchiori_dev/skraken Corresponding Author: comin@dei.unipd.it
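
The idea of filtering uninformative k-mers can be illustrated with a toy sketch. This is not SKraken's actual scoring, which the abstract says also uses the taxonomic tree. Here a k-mer is simply assumed to be informative when it occurs in exactly one reference genome, and all function names and sequences are hypothetical.

```python
from collections import defaultdict

def kmers(seq, k):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def informative_kmers(genomes, k):
    """Map each k-mer found in exactly one species to that species.

    Illustrative filter only: k-mers shared by two or more species are dropped.
    """
    owners = defaultdict(set)
    for species, seq in genomes.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(species)
    return {kmer: next(iter(sp)) for kmer, sp in owners.items() if len(sp) == 1}

def classify(read, index, k):
    """Assign a read to the species with the most informative k-mer hits, or None."""
    votes = defaultdict(int)
    for kmer in kmers(read, k):
        if kmer in index:
            votes[index[kmer]] += 1
    return max(votes, key=votes.get) if votes else None

# Hypothetical toy references; "ACGT" occurs in both and is filtered out.
genomes = {"A": "ACGTACGTTT", "B": "ACGTGGGCCC"}
index = informative_kmers(genomes, 4)
```

Dropping shared k-mers shrinks the index, which is one plausible reason a filtered index needs less memory than an unfiltered one.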

Paper Nr: 23
Title:

### Complementary Domain Prioritization: A Method to Improve Biologically Relevant Detection in Multi-Omic Data Sets

Authors:

#### Benjamin A. Neely and Paul E. Anderson

Abstract: As the speed and quality of different analytical platforms increase, it is more common to collect data across multiple biological domains in parallel (i.e., genomics, transcriptomics, proteomics, and metabolomics). There is a growing interest in algorithms and tools that leverage heterogeneous data streams in a meaningful way. Since these domains are typically non-linearly related, we evaluated whether results from one domain could be used to prioritize another domain to increase the power of detection, maintain type I error, and highlight biologically relevant changes in the secondary domain. To perform this feature prioritization, we developed a methodology called Complementary Domain Prioritization that utilizes the underpinning biology to relate complementary domains. Herein, we evaluate how proteomic data can guide transcriptomic differential expression analysis by analyzing two published colorectal cancer proteotranscriptomic data sets. The proposed strategy improved detection of cancer-related genes compared to standard permutation invariant filtering approaches and did not increase type I error. Moreover, this approach detected differentially expressed genes that would not have been detected using filtering alone, while also highlighting pathways that might have otherwise been overlooked. These results demonstrate how this strategy can effectively prioritize transcriptomic data and drive new hypotheses, though subsequent validation studies are still required.

Paper Nr: 29
Title:

### Prediction of Essential Genes based on Machine Learning and Information Theoretic Features

Authors:

#### Dawit Nigatu and Werner Henkel

Abstract: Computational tools have enabled a relatively simple prediction of essential genes (EGs), which would otherwise be done by costly and tedious gene knockout experimental procedures. We present a machine learning based predictor using information-theoretic features derived exclusively from DNA sequences. We used entropy, mutual information, conditional mutual information, and Markov chain models as features. We employed a support vector machine (SVM) classifier and predicted the EGs in 15 prokaryotic genomes. A fivefold cross-validation on the bacteria E. coli, B. subtilis, and M. pulmonis resulted in AUC scores of 0.85, 0.81, and 0.89, respectively. In cross-organism prediction, the EGs of a given bacterium are predicted using a model trained on the rest of the bacteria. AUC scores ranging from 0.66 to 0.9 and averaging 0.8 were obtained. The average AUC of the classifier on a one-to-one prediction among E. coli, B. subtilis, and Acinetobacter is 0.85. The performance of our predictor is comparable with recent and state-of-the-art predictors. Considering that we used only sequence information on a problem that is much more complicated, the achieved results are very good.
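
Two of the listed feature types, entropy and mutual information, can be sketched directly from a DNA string. The abstract does not give the exact feature definitions, so the choices below (single-nucleotide entropy, mutual information between bases a fixed gap apart) and the function names are assumptions for illustration only.

```python
import math
from collections import Counter

def shannon_entropy(seq):
    """Shannon entropy (bits) of the nucleotide distribution in seq."""
    counts = Counter(seq)
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def mutual_information(seq, gap=1):
    """Mutual information (bits) between bases separated by `gap` positions."""
    pairs = Counter(zip(seq, seq[gap:]))
    total = sum(pairs.values())
    left = Counter()   # marginal counts of the first base in each pair
    right = Counter()  # marginal counts of the second base in each pair
    for (a, b), c in pairs.items():
        left[a] += c
        right[b] += c
    # p(a,b) * log2(p(a,b) / (p(a) * p(b))), written with raw counts.
    return sum(
        c / total * math.log2(c * total / (left[a] * right[b]))
        for (a, b), c in pairs.items()
    )

# Hypothetical gene fragment turned into a small feature vector.
gene = "ATGGCGTACGTTAGCTAGGCTAACGTTAG"
features = [shannon_entropy(gene), mutual_information(gene, 1), mutual_information(gene, 3)]
```

A feature vector of this kind could then be passed to any classifier; the SVM mentioned in the abstract is one such choice.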

Paper Nr: 30
Title:

### A Qualitative Framework Dedicated to Toxicology

Authors:

#### Benjamin Miraglio, Gilles Bernot, Jean-Paul Comet and Christine Risso-de Faverney

Abstract: Emerging constraints have led the toxicology community to complete the classical paradigm of toxicology with the study of molecular events underlying the toxicity of a chemical substance. This evolution motivates the emergence of new modelling approaches for toxicology. In this article, we introduce a qualitative rule-based formalism dedicated to the domain of toxicology. This new formalism departs from other rule-based formalisms such as BioChAM because it directly encodes possible alterations of equilibrium, instead of making equilibria emerge from the dynamics of the model. Using a simple example of the energy metabolism, we show that this formalism is able to describe both the normal evolution of a biological system and its possible toxic disruptions.

Paper Nr: 46
Title:

### Graph-based Analysis of Genetic Features Associated with Mobile Elements in Crohn’s Disease and Healthy Gut Microbiomes

Authors:

#### Julia Warnke-Sommer and Hesham Ali

Abstract: Horizontal gene transfer is a major driver of bacterial evolution and adaptation to niche environments. This holds true for the complex microbiome of the human gut. Crohn’s disease is a debilitating condition characterized by inflammation and gut bacteria dysbiosis. In previous research, we analyzed transposase associated antibiotic resistance genes in Crohn’s disease and healthy gut microbiome metagenomics data sets using a graph mining approach. Results demonstrated that there were significant differences in the type and bacterial distribution of transposase-associated antibiotic resistance genes in the Crohn’s and healthy data sets. In this paper, we extend the previous research by considering all gene features associated with transposase sequences in the Crohn’s disease and healthy data sets. Results demonstrate that some transposase-associated features are more prevalent in Crohn’s disease data sets than healthy data sets. This study may provide insights into the adaptation of bacteria to gut conditions such as Crohn’s disease.

Short Papers
Paper Nr: 17
Title:

### Modeling of Cardiac Component of Subarachnoid Space Changes in Apnoea Resulting as a Function of Blood Pressure and Blood Flow Parameters - Two Mechanizm of Regulation

Authors:

#### Kamila Mazur, Renata Kalicka, Andrzej F. Frydrychowski and Pawel J. Winklewski

Abstract: Experiments were performed in a group of 19 healthy, non-smoking volunteers. The experiment consisted of three sequential apnoeas: a 30 s apnoea, a 60 s apnoea and a maximal apnoea. The breath-holds were separated by 5 minutes of rest. The following parameters were measured and obtained for further analysis: blood parameters, diameter of the internal carotid artery, end-tidal CO2 in expired air, and the cardiac (from 0.5 to 5.0 Hz) and slow (< 0.5 Hz) components of the subarachnoid space width signal. As a result of the experiment, we observed two different reactions under the same experimental procedure. This seems to indicate two different operating modes and two separate models. As a consequence, there are two subsets of slow subarachnoid space width responses to breath-hold in humans. A positive subarachnoid space width changes (slow) component depends on changes in heart rate, pulsatility index and cerebral blood flow velocity. A negative subarachnoid space width changes component is driven by heart rate changes and pulsatility index changes. The different heart-generated arterial pulsation response to experimental breath-hold provides new insights into our understanding of the complex mechanisms governing the adaptation to apnoea in humans. We propose a mathematical methodology that can be used in further clinical research.

Paper Nr: 28
Title:

### Splice Site Prediction: Transferring Knowledge Across Organisms

Authors:

#### Simos Kazantzidis, Anastasia Krithara and George Paliouras

Abstract: As more genomes are sequenced, there is an increasing need for automated gene prediction. One of the subproblems of gene prediction is splice site recognition. In eukaryotic genes, splice sites mark the boundaries between exons and introns. Although some organisms are well studied and their splice sites are known, plenty of others have not been studied well enough. In this work, we propose two transfer learning approaches for the splice site recognition problem, which take into account the knowledge we have from the well-studied organisms. We use different representations for the sequences, such as the n-gram graph representation and a representation based on biological motifs. Furthermore, we study the case where more than one organism is available for training, and we incorporate information from the phylogenetic analysis between organisms. An extensive evaluation has been carried out. The results indicate that the proposed representations and approaches are very promising.

Paper Nr: 33
Title:

### Consensus Clustering for Cancer Gene Expression Data - Large-Scale Analysis using Evidence Accumulation Approach

Authors:

#### Isidora Šašić, Sanja Brdar, Tatjana Lončar-Turukalo, Helena Aidos and Ana Fred

Abstract: Clustering algorithms are extensively used on patient tissue samples in order to group and visualize microarray data. The high dimensionality and probe-specific noise make the selection of the appropriate clustering algorithm a difficult task. This study presents a large-scale analysis of three clustering algorithms: k-means, hierarchical clustering (HC) and evidence accumulation clustering (EAC) on thirty-five cancer gene expression data sets selected to benchmark the performance of the clustering algorithms. Separate performance analyses were done on data sets from Affymetrix and cDNA chip platforms to examine the possible influence of the microarray technology. The study revealed that no consistent algorithm ranking can be inferred, though in general EAC presented the best compromise between adjusted Rand index (ARI) and variance. However, the results indicated that ARI variance under repeated k-means initializations offers useful information on the need to implement more complex clustering techniques. If repeated k-means converges to the same partition, also confirmed by HC, there is no need to run EAC. However, under moderately or highly variable ARI in repeated k-means, EAC should be used to reduce the uncertainty of clustering and unveil the data structure.
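
The ARI-variance criterion can be made concrete with a short sketch. The ARI below follows the standard pair-counting definition. The helper `ari_variance` is hypothetical and assumes each repeated run is scored against known reference labels, which the abstract does not state explicitly.

```python
from collections import Counter
from math import comb
from statistics import pvariance

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same samples."""
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:  # fewer than two samples: partitions trivially agree
        return 1.0
    # Pairs placed together in both partitions, and in each one separately.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

def ari_variance(runs, reference):
    """Variance of ARI across repeated clustering runs (hypothetical helper)."""
    return pvariance([adjusted_rand_index(run, reference) for run in runs])

# Hypothetical labels from three k-means restarts on six samples.
reference = [0, 0, 0, 1, 1, 1]
runs = [[0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0], [0, 0, 1, 1, 1, 1]]
spread = ari_variance(runs, reference)
```

A spread near zero corresponds to the stable case described above, while a larger spread is the situation in which the abstract recommends EAC.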

Paper Nr: 37
Title:

### A Branch and Bound for the Large Live Parsimony Problem

Authors:

#### Rogério Güths, Guilherme P. Telles, Maria Emilia M. T. Walter and Nalvo Almeida

Abstract: In character-based phylogeny reconstruction for $n$ objects and $m$ characters, the input is an $n \times m$ matrix such that position $(i, j)$ keeps the state of character $j$ for object $i$, and the output is a binary rooted tree, where the input objects are represented as leaves and each node $v$ is labeled with a string of $m$ symbols $v_1 \ldots v_m$, with $v_j$ representing the state of character $j$, such that the number of state changes along the edges of the tree is minimal, considering all characters. This is called the Large Parsimony Problem. Live Phylogeny theory generalizes the phylogeny theory by admitting living ancestors among the taxonomic objects. This theory suits cases of fast-evolving species like viruses, and phylogenies of non-biological objects like documents, images and database records. In this paper we analyze problems related to the most parsimonious tree using Live Phylogeny. We introduce the Large Live Parsimony Problem (LLPP), prove that it is NP-complete and provide a branch and bound solution. We also introduce and solve a simpler version, the Small Live Parsimony Problem (SLPP), which is used in the branch and bound.

Paper Nr: 42
Title:

### Homozygosity Mapping using Whole-Exome Sequencing: A Valuable Approach for Pathogenic Variant Identification in Genetic Diseases

Authors:

#### Jorge Oliveira, Rute Pereira, Rosário Santos and Mário Sousa

Abstract: In the human genome, there are homozygous regions presenting as sizeable stretches, or ‘runs’ of homozygosity (ROH). The length of these ROH is dependent on the degree of shared parental ancestry, being longer in individuals descending from consanguineous marriages or those from isolated populations. Homozygosity mapping is a powerful tool in clinical genetics. It relies on the assumption that, due to identity-by-descent, individuals affected by a recessive disease are likely to have homozygous markers surrounding the disease locus. Consequently, the analysis of ROH shared by affected individuals in the same kindred often helps to identify the disease-causing gene. However, scanning the entire genome for blocks of homozygosity, especially in sporadic cases, is not a straightforward task. Whole-exome sequencing (WES) has been shown to be an effective approach for finding pathogenic variants, particularly in highly heterogeneous genetic diseases. Nevertheless, the huge amount of data, especially variants of unknown clinical significance, and the presence of false positives due to sequencing artifacts make WES analysis complex. This paper briefly reviews the different algorithms and bioinformatics tools available for ROH identification. We emphasize the importance of performing ROH analysis using WES data as an effective way to improve diagnostic yield.

Paper Nr: 43
Title:

### How to Disassemble a Virus Capsid - A Computational Approach

Authors:

#### Claudio Alexandre Piedade, António E. N. Ferreira and Carlos Cordeiro

Abstract: In contrast with the assembly process of virus particles, which has been the focus of many experimental and theoretical studies, the disassembly of virus protein capsids, a key event during infection, has generally been overlooked. Although the nature of the intracellular triggers that promote subunit disassembly may be diverse, here we postulate that the order of subunit removal is mainly determined by each virus's structural geometry and the strength of subunit interactions. Following this assumption, we modelled the early stages of disassembly of T = 1 icosahedral viruses, predicting the sequence of removal of up to five subunits in a sample of 51 structures. We used combinatorics and geometry to find non-geometrically identical capsid fragments and estimated their energy by three different heuristics based on the number of weak inter-subunit contacts. We found a main disassembly pathway common to a large group of viruses, consisting of the removal of a triangular trimer. Densoviruses lose a square-shaped tetramer while Human Adenoviruses lose a pentagon-shaped pentamer. Results were virtually independent of the heuristic measure used. These findings suggest that particular subunit interactions might be an important target for novel antiviral drugs designed to interfere with capsid disassembly.

Posters
Paper Nr: 5
Title:

### Ensemble Learning-based Prediction of Drug-pathway Interactions based on Features Integration

Authors:

#### Mingyuan Xin, Jun Fan and Zhenran Jiang

Abstract: Recently, developing computational methods to explore drug-pathway interaction relationships has attracted attention for its potential in discovering unknown targets and mechanisms of drug action. However, mining suitable features of drugs and pathways is challenging for available prediction methods. This paper presents an ensemble learning-based method to predict potential drug-pathway interactions by integrating different drug-based and pathway-based features. The main characteristic of our method lies in using the Relief algorithm for feature selection and three ensemble methods (AdaBoost, Bagging and Random Subspace) as classifiers. Cross-validation results showed that the AdaBoost algorithm based on the Decision Tree classifier obtains a higher prediction accuracy, which indicates the effectiveness of ensemble learning. Moreover, some newly predicted interactions were validated by database searching, which demonstrates the method's potential for further biological experimental investigation.

Paper Nr: 7
Title:

### Search of Periodicity Regions in the Genome A.thaliana - Periodicity Regions in the A.thaliana Genomes

Authors:

#### E. V. Korotkov, F. E. Frenkel and M. A. Korotkova

Abstract: A mathematical method was developed in this study to determine tandem repeats in a DNA sequence. A multiple alignment of periods was calculated by direct optimization of the position-weight matrix (PWM) without using pairwise alignments or searching for similarity between periods. Random PWMs were used to develop a new mathematical algorithm for periodicity search. The developed algorithm was applied to analyze the DNA sequences of the A.thaliana genome. 13997 regions having a periodicity with a length of 2 to 50 bases were found. The average distance between regions with periodicity is ~9000 nucleotides. A significant portion of the revealed regions have periods consisting of 2 nucleotides, 10-11 nucleotides and periods in the vicinity of 30 nucleotides. No more than ~30% of the periods found had been discovered previously. The sequences found were collected in a data bank available from the website: http://victoria.biengi.ac.ru/cgi-in/indelper/index.cgi. This study discusses the origin of periodicity with insertions and deletions.

Paper Nr: 16
Title:

### Distinguishing between MicroRNA Targets from Diverse Species using Sequence Motifs and K-mers

Authors:

#### Malik Yousef, Waleed Khalifa, İlhan Erkin Acar and Jens Allmer

Abstract: A disease phenotype is often due to dysregulation of gene expression. Post-transcriptional regulation of protein abundance by microRNAs (miRNAs) is, therefore, of high importance in, for example, cancer studies. MicroRNAs provide a complementary sequence to their target messenger RNA (mRNA) as part of a complex molecular machinery. Known miRNAs and targets are listed in miRTarBase for a variety of organisms. The experimental detection of such pairs is convoluted and, therefore, their computational detection is desired, although it is complicated by missing negative data. For machine learning, many features for parameterization of the miRNA targets are available, and k-mers and sequence motifs have previously been used. Unrelated organisms like intracellular pathogens and their hosts may communicate via miRNAs and, therefore, we investigated whether miRNA targets from one species can be differentiated from miRNA targets of another. To this end, we employed target information of one species as positive and the other as negative training and testing data. Models of species with higher evolutionary distance generally achieved better results of up to 97% average accuracy (mouse versus Caenorhabditis elegans), while more closely related species did not lead to successful models (human versus mouse; 60%). In the future, when more targeting data becomes available, models can be established which will be able to more precisely determine miRNA targets in host-pathogen systems using this approach.

Paper Nr: 19
Title:

### Performance Analysis of Spatial Laser Speckle Contrast Implementations

Authors:

#### Pedro G. Vaz, Anne Humeau-Heurtier, Edite Figueiras, Carlos Correia and João Cardoso

Abstract: This work presents a performance analysis of four different implementations used to compute laser speckle contrast on images. Laser speckle contrast is a widely used imaging technique for biomedical applications. These implementations were tested using synthetic laser speckle patterns with different resolutions, speckle sizes, and contrasts. Of the applied methods, three implementations are already known in the literature. A new alternative is proposed herein, which relies on two-dimensional convolutions, in order to improve the image processing time without compromising the contrast assessment. The proposed implementation achieves a processing time two orders of magnitude lower than the analytical evaluation. The goal of this technical manuscript is to help developers and researchers in computing laser speckle contrast images.
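
Spatial speckle contrast is commonly defined as the local standard deviation divided by the local mean, and both can be obtained from two box-filter convolutions (of the image and of its square). The sketch below illustrates that idea in plain Python. It is not the authors' implementation, and the window size, the use of population variance and the "valid" border handling are assumptions.

```python
import math

def box_mean(image, w):
    """'Valid' 2-D convolution of image with a uniform w x w kernel (local mean)."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - w + 1):
        out_row = []
        for c in range(cols - w + 1):
            total = sum(image[r + i][c + j] for i in range(w) for j in range(w))
            out_row.append(total / (w * w))
        out.append(out_row)
    return out

def speckle_contrast(image, w=3):
    """Spatial contrast K = sigma / mean in each w x w window (population sigma)."""
    squared = [[v * v for v in row] for row in image]
    mean = box_mean(image, w)        # E[I] per window
    mean_sq = box_mean(squared, w)   # E[I^2] per window
    contrast = []
    for m_row, s_row in zip(mean, mean_sq):
        row = []
        for m, s in zip(m_row, s_row):
            var = max(s - m * m, 0.0)  # clamp tiny negative rounding errors
            row.append(math.sqrt(var) / m if m else 0.0)
        contrast.append(row)
    return contrast

# Hypothetical 4x4 intensity image.
image = [[5, 5, 5, 5], [5, 9, 1, 5], [5, 1, 9, 5], [5, 5, 5, 5]]
k_map = speckle_contrast(image, w=3)
```

The nested loops here are only for readability. In practice the two box filters would be delegated to an optimized convolution routine, which is where a speed-up of the kind reported in the abstract would come from.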

Paper Nr: 26
Title:

### DEACT: An Online Tool for Analysing Complementary RNA-Seq Studies - A Case Study of Knockdown and Upregulated FLI1 in Breast Cancer Cells

Authors:

#### Katherine Duchinski, Margaret Antonio, Dennis Watson and Paul Anderson

Abstract: Understanding the genetic basis of disease may lead to the development of life-saving diagnostics and therapeutics. RNA-sequencing (RNA-seq) gives a snapshot of cellular processes via high-throughput transcriptome sequencing. Meta-analysis of multiple RNA-Seq experiments has the potential to (a) elucidate gene function under different conditions and (b) compare results in replicate experiments. To simplify such meta-analyses, we created the Dataset Exploration And Curation Tool (DEACT), an interactive, user-friendly web application. DEACT allows users to (1) interactively visualize RNA-Seq data, (2) select genes of interest through the user interface, and (3) download subsets for downstream analyses. We tested DEACT using two complementary RNA-seq studies resulting from knockdown and gain-of-function FLI1 in an aggressive breast cancer cell line. We performed fixed gene-set enrichment analysis on four subsets of genes selected through DEACT. Each subset implicated different metabolic pathways, demonstrating the power of DEACT in driving downstream analysis of complementary RNA-Seq studies.

Paper Nr: 31
Title:

### Comparative Study on Data Mining Techniques Applied to Breast Cancer Gene Expression Profiles

Authors:

#### Sérgio Mosquim Júnior and Juliana de Oliveira

Abstract: Breast cancer has the second highest incidence among all cancer types and is the fifth leading cause of cancer-related death among women. In Brazil, breast cancer mortality rates have been rising. Cancer classification is intricate, mainly when differentiating subtypes. In this context, data mining becomes a fundamental tool to analyze genotypic data, improving diagnostics, treatment and patient care. As the data dimensionality is problematic, methods to reduce it must be applied. Hence, the present study aims at the analysis of two data mining methods (i.e., decision trees and artificial neural networks). Weka® and MATLAB® were used to implement these two methodologies. Decision trees identified important genes for the classification. The optimal artificial neural network architecture consists of two layers, one with 99 neurons and the other with 5. Both data mining techniques were able to classify data with high accuracy.

Paper Nr: 38
Title:

### On Robust Reachability of Input/State Switched Asynchronous Sequential Machines

Authors:

#### Seong Woo Kwak and Jung-Min Yang

Abstract: Switched asynchronous sequential machines are composite systems consisting of a number of single asynchronous machines, or submachines, and a rule that orchestrates switching operations between submachines. In this paper, we investigate robust reachability of switched asynchronous machines. If all submachines share an equivalent state space, switching between them can be used in fault recovery against any unauthorized state transition caused by transient faults. The robust reachability of switched asynchronous machines is addressed in terms of simple matrix expressions. The use of robust reachability in fault-tolerant corrective control is also outlined.

Paper Nr: 39
Title:

### Distance-based Live Phylogeny

Authors:

#### Graziela S. Araújo, Guilherme P. Telles, Maria Emília M. T. Walter and Nalvo F. Almeida

Abstract: The Distance-Based Live Phylogeny Problem generalizes the well-known Distance-Based Phylogeny Problem by admitting live ancestors among the taxonomic objects. This problem suits cases of fast-evolving species that co-exist and are ancestors/descendants at the same time, like viruses, and non-biological objects like documents, images and database records. For $n$ objects, the input is an $n \times n$ matrix where position $(i, j)$ represents the evolutionary distance between objects $i$ and $j$. The output is an unrooted, weighted tree where the objects may be represented either as leaves or as internal nodes, and the distances between pairs of objects in the tree are equal to the distances in the corresponding positions in the matrix. When the matrix is additive, it is easy to find such a tree. In this work we prove that the problem of minimizing the residual differences between path lengths along the tree and pairwise distances in the matrix is computationally hard when the matrix is not additive. We propose a heuristic, called Live-NJ, to solve the problem; it reconstructs the evolutionary history based on the well-known Neighbor-Joining algorithm. Results show that Live-NJ performs better than NJ, making it a promising approach to solve the Distance-Based Live Phylogeny Problem.
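
Whether a distance matrix is additive can be tested with the classic four-point condition: for every four objects, the two largest of the three pairwise sums must be equal. The sketch below shows that standard test, not the Live-NJ heuristic itself. It assumes a symmetric matrix with a zero diagonal, and the function name and tolerance are illustrative.

```python
from itertools import combinations

def is_additive(dist, tol=1e-9):
    """Check the four-point condition on a symmetric distance matrix."""
    n = len(dist)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted([
            dist[i][j] + dist[k][l],
            dist[i][k] + dist[j][l],
            dist[i][l] + dist[j][k],
        ])
        # Additive iff the two largest of the three sums are equal.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

# Hypothetical path tree A - B - C - D with unit edges: B and C act as
# live ancestors (internal nodes), and the matrix is still additive.
path = [[abs(i - j) for j in range(4)] for i in range(4)]
additive = is_additive(path)
```

The path example illustrates why live ancestors fit the additive setting: distances between internal nodes of a weighted tree satisfy the same condition as distances between leaves.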

Paper Nr: 41
Title:

### A Web- and Cloud- based Service for the Clinical Use of a CAD (Computer Aided Detection) System - Automated Detection of Lung Nodules in Thoracic CTs (Computed Tomographies)

Authors:

#### M. E. Fantacci, A. Traverso, S. Bagnasco, C. Bracco, D. Campanella, G. Chiara, E. Lopez Torres, A. Manca, D. Regge, M. Saletta, M. Stasi, S. Vallero, L. Vassallo and P. Cerello

Abstract: M5L, a Web-based Computer-Aided Detection (CAD) system to automatically detect lung nodules in thoracic Computed Tomographies, is based on a multi-thread analysis by independent subsystems and the combination of their results. The validation on 1043 scans of 3 independent data-sets showed consistency across data-sets, with a sensitivity of about 80% in the 4-8 range of False Positives per scan, despite varying acquisition and reconstruction parameters and annotation criteria. To make M5L CAD available to users without new hardware or software installation and configuration, a Software as a Service (SaaS) approach was adopted. A web front-end handles the work (image upload, results notification and direct on-line annotation by radiologists) and the communication with the OpenNebula-based cloud infrastructure, which allocates virtual computing and storage resources. The exams uploaded through the web interface are anonymised and analysis is performed in an isolated and independent cloud environment. The average processing time per case is about 20 minutes and up to 14 cases can be processed in parallel. Preliminary results of the ongoing clinical validation show that the M5L CAD adds 20% more nodules originally overlooked by radiologists, allowing a remarkable increase in the overall detection sensitivity.