HPCLatAm 2013

KEYNOTES

Hierarchical N-body algorithms for the Exascale era

AUTHOR: Prof. Lorena Barba (Boston University, USA)

SCHEDULE: Monday, July 29, 9:00

ABSTRACT: to be completed

SHORT BIO: to be completed
Download full presentation
HPC in the Cloud Computing era: challenges, models and tools

AUTHOR: Prof. Pascal Bouvry (Computer Science and Communications Research Unit, University of Luxembourg, Luxembourg)

SCHEDULE: Monday, July 29, 10:00

ABSTRACT: Recently, the paradigm of cloud computing emerged as a new way to present e-services. The newest offers claim that the infrastructure-as-a-service approach can also be used to provide High Performance Computing services. This presentation investigates this new opportunity by benchmarking the current generation of hypervisors and exploring the need for new tools, including simulators and computational models. In particular, an open-source cloud simulator called GreenCloud and a communication-aware Directed Acyclic Graph model are presented.

SHORT BIO: Pascal Bouvry earned his undergraduate degree in Economic and Social Sciences and his Master's degree in Computer Science with distinction ('91) from the University of Namur, Belgium. He went on to obtain his Ph.D. degree ('94) in Computer Science with great distinction at the University of Grenoble (INPG), France. His research at the IMAG laboratory focused on mapping and scheduling task graphs onto distributed-memory parallel computers. Next, he performed post-doctoral research on coordination languages and multi-agent evolutionary computing at CWI in Amsterdam.
Dr. Bouvry gained industrial experience as manager of the technology consultant team for FICS (SONE), a world leader in electronic financial services. Next, he worked as CEO and CTO of SDC, a Saigon-based joint venture between SPT (a major telecom operator in Vietnam), Spacebel SA (a Belgian leader in Space, GIS and Healthcare), and IOIT, a public research and training center. After that, Dr. Bouvry moved to Montreal as VP Production of Lat45 and Development Director for MetaSolv Software (ORCL), a world leader in Operation Support Systems for the telecom industry (e.g. AT&T, Worldcom, Bell Canada).
Dr. Bouvry is currently serving as Professor in the Computer Science and Communications (CSC) research unit of the Faculty of Sciences, Technology and Communication of the University of Luxembourg. He is also a faculty member of the Interdisciplinary Centre for Security, Reliability and Trust and is active in various scientific committees and technical workgroups (IEEE CIS Cloud Computing vice-chair, IEEE TCSC GreenIT steering committee, ERCIM WG, ANR, COST TIST, etc.).
Download full presentation
GPUs para la Computación de Alto Desempeño: Logros y Perspectivas Futuras (GPUs for High Performance Computing: Achievements and Future Prospects)

AUTHOR: Prof. Manuel Ujaldón (Universidad de Málaga, Spain)

SCHEDULE: Tuesday, July 30, 8:45

ABSTRACT: to be completed

SHORT BIO: to be completed
Cloud Computing: Helping Humanity to reach the next Final Frontier

AUTHOR: José Luis Vázquez Poletti (Universidad Complutense de Madrid, Spain)

SCHEDULE: Tuesday, July 30, 9:45

ABSTRACT: As another tool used by Humanity for expanding its frontiers, Cloud Computing was born and evolved in step with the different challenges to which it has been applied. Thanks to its seamless computing resource provision model, dynamism and elasticity, this paradigm has been brought into the spotlight by the Space scientific community, and in particular by the part of it devoted to the exploration of Planet Mars. This is the case for Space Agencies that need large amounts of on-demand computing resources and have a budget to take care of. The Red Planet represents the next limit to be reached by Humanity, attracting the attention of many countries as a destination for the next generation of manned spaceflights. However, there is still much research to do on Planet Mars and many computational needs to fulfill. This talk will review the cloud computing approach taken by NASA and will then focus on the Mars MetNet Mission, with which the speaker is actively collaborating. This Mission is being put together by Finland, Russia and Spain, and aims to deploy several tens of weather stations on the Martian surface. Atmospheric Science research is a crucial area in the exploration of the Red Planet and represents a great opportunity for harnessing and improving current computing tools, and for establishing interesting collaborations between countries.

SHORT BIO: Dr. José Luis Vázquez-Poletti is an Assistant Professor in Computer Architecture at Complutense University of Madrid (UCM, Spain), and a Cloud Computing Researcher at the Distributed Systems Architecture Research Group. He is (and has been) directly involved in EU-funded projects, such as EGEE (Grid Computing) and 4CaaSt (PaaS Cloud), as well as many Spanish national initiatives. From 2005 to 2009 his research focused on porting applications onto Grid Computing infrastructures, an activity that let him be "where the real action was". These applications pertained to a wide range of areas, from Fusion Physics to Bioinformatics. During this period he acquired the skills needed for profiling applications and making them benefit from distributed computing infrastructures. Additionally, he shared these skills in many training events organized within the EGEE Project and similar initiatives. Since 2010 his research interests have lain in different aspects of Cloud Computing, always with real-life applications in mind, especially those pertaining to the High Performance Computing domain.
Website: http://dsa-research.org/jlvazquez
Linkedin: http://www.linkedin.com/in/jlvazquezpoletti
Twitter: @jlvazpol
Download full presentation (29MB)
HPC for Biomedical Research

AUTHOR: Mariano Vázquez (Barcelona Supercomputing Center, Spain)

SCHEDULE: Tuesday, July 30, 15:00

ABSTRACT: Biomechanics is the application of mechanical principles to living organisms. This includes bioengineering, the research and analysis of the mechanics of living organisms and the application of engineering principles to and from biological systems. This research and analysis can be carried out on multiple levels, from the molecular, wherein biomaterials such as collagen and elastin are considered, all the way up to the tissue and organ level. Some simple applications of Newtonian mechanics can supply correct approximations on each level, but a fair balance between precise details and large scales demands the use of continuum mechanics.
Computational Biomechanics problems are so overwhelmingly complex that we cannot expect significant advances in the field until efficient simulation tools are running on parallel machines with at least hundreds of processors and are prepared to run on hundreds of thousands.
This talk addresses the creation of a simulation platform for Biomechanical problems based on HPC techniques, which can be used on supercomputing architectures to deal with complex biomedical problems, above all at the organ level. It is based on the simulation code Alya System, which is already programmed as an efficient tool for parallel environments. This platform is adapted, firstly, to the wide variety of specific problems found when modelling Biophysical systems, thanks to the participation of Bio-Engineers, Physiologists and Medical Doctors in the project. Secondly, it is continuously ported to the present and future architectures forming the supercomputing ecosystem, thanks to the expertise provided by HPCM researchers.

SHORT BIO: Mariano Vázquez heads the "High Performance Computational Mechanics" group at the Barcelona Supercomputing Center - Centro Nacional de Supercomputación (BSC-CNS), Spain. His group develops simulation tools for large computers, from the basic numerical formulation to the writing of parallel programs. Together with Guillaume Houzeaux, he is one of the main developers of Alya, a computational mechanics code based on HPC techniques. Alya solves fluids, solids, electromagnetism, colliding bodies, heat flow, radiation and combustion. At present, about 30 researchers work on Alya, led by Mariano Vázquez and Guillaume Houzeaux. His main research lines are numerical methods for compressible flow, stabilization of hyperbolic equations, large-deformation solid mechanics, fluid-structure interaction and parallel programming. His main application areas are Aeronautics, Biomechanics and Atmospheric Physics. He received his Licenciatura degree in Physics from the Universidad de Buenos Aires, Argentina, in 1993, and his doctorate in Physics from the Universitat Politécnica de Catalunya (UPC), Spain, in 1999. He held post-doctoral positions at the Pole Scientifique Univ. Paris VI / Dassault Aviation (on multigrid for compressible and incompressible turbulent flow, funded by an EC Marie Curie fellowship) and at INRIA Sophia Antipolis (shape optimization by the adjoint method), both in France, for 3 years. He was a consultant for the company Gridsystems (grid computing) in Palma de Mallorca (Spain) and a lecturer (profesor lector) at the Universitat de Girona (Spain).
Download full presentation


FEATURED TALKS

LARTop50 results

AUTHOR: Dra. Marcela Printista (Universidad Nacional de San Luis, Argentina)

SCHEDULE: Monday, July 29, 15:30

ABSTRACT: This presentation introduces the first list of statistics on the fastest computer systems in Latin America, ranked by their performance on the Linpack benchmark. The goal of LARTop50 is to collect and share information about the status of supercomputing in the region. The collected information can be very useful for the regional scientific community and for the productive sector specialized in this field.

SHORT BIO: Alicia Marcela Printista holds a Doctorate in Computer Science (Universidad Nacional de San Luis, Argentina, 2004), a Master's degree in Computer Science (Universidad Nacional de San Luis, Argentina, 2001) and a Licenciatura degree in Computer Science (Universidad Nacional de San Luis, Argentina, 1992). She has taken part in projects on parallel programming models, parallel prediction models, web search engines and high performance simulation. She is currently a tenured, full-time Profesora Adjunta in the Licenciatura en Ciencias de la Computación program (Universidad Nacional de San Luis, Argentina). She holds category 1 in the incentive program for teacher-researchers (SPU, Argentina). She teaches in the Master's and Doctorate programs in Computer Science and in the Master's program in Software Engineering. She served as Vice-Dean of the Facultad de Ciencias Físico Matemáticas y Naturales of UNSL for the periods 2007-2010 and 2010-2013. She is Co-Director of the Master's program in Computer Science.
Download full presentation
SNCAD Overview and Activities

AUTHORS: Dr. Carlos García Garino (ITIC and Facultad de Ingeniería - Universidad Nacional de Cuyo. Mendoza, Argentina) and Dra. Marcela Printista (Universidad Nacional de San Luis. San Luis, Argentina)

SCHEDULE: Monday, July 29, 16:20


EDITORS

A. Marcela Printista, Facultad de Ciencias Físico Matemáticas y Naturales, Universidad Nacional de San Luis. Ejército de los Andes 950. Tel: +54 (0266) 4520300. San Luis (D5700HHW). Argentina.
Carlos García Garino, Facultad de Ingeniería, Universidad Nacional de Cuyo (UNCuyo)/ITIC, Centro Universitario. Tel: +54 (0261) 4135000. Mendoza (5500). Argentina.

FULL PAPER PRESENTATIONS

Paper Session I: Evolutionary Computation and Scheduling

Dynamic Scheduling Based on Particle Swarm Optimization for Cloud-based Scientific Experiments

PRESENTER: Elina Pacini

SCHEDULE: Monday, July 29, 11:30

AUTHORS: Elina Pacini (ITIC and Facultad de Ingeniería - Universidad Nacional de Cuyo. Mendoza, Argentina), Cristian Mateos (ISISTAN-CONICET - UNICEN University. Tandil, Buenos Aires, Argentina) and Carlos García Garino (ITIC and Facultad de Ingeniería - Universidad Nacional de Cuyo. Mendoza, Argentina)

ABSTRACT: Parameter Sweep Experiments (PSEs) allow scientists to perform simulations by running the same code with different input data, which results in many CPU-intensive jobs, and thus computing environments such as Clouds must be used. Our goal is to study private Clouds to execute scientific experiments coming from multiple users, i.e., our work focuses on the Infrastructure as a Service (IaaS) model, where custom Virtual Machines (VMs) are launched on appropriate hosts available in a Cloud. Correctly scheduling Cloud hosts is therefore very important, and it is necessary to develop efficient scheduling strategies to appropriately allocate VMs to physical resources. Scheduling is, however, challenging due to its inherent NP-completeness. We describe and evaluate a Cloud scheduler based on Particle Swarm Optimization (PSO). The main performance metrics studied are the number of Cloud users that the scheduler is able to serve successfully and the total number of created VMs, in online (non-batch) scheduling scenarios. In addition, the number of intra-Cloud network messages sent is evaluated. Simulated experiments performed using CloudSim and job data from real scientific problems show that our scheduler succeeds in balancing the studied metrics compared to schedulers based on Random assignment and Genetic Algorithms.

Downloads: paper and presentation.
Optimizing Small-World Properties in VANETs with a Parallel Multi-Objective Coevolutionary Algorithm

PRESENTER: Grégoire Danoy

SCHEDULE: Monday, July 29, 11:50

AUTHORS: Grégoire Danoy, Julien Schleich (Computer Science and Communications Research Unit, University of Luxembourg, Luxembourg), Bernabé Dorronsoro (Laboratoire d'Informatique Fondamentale de Lille, University of Lille, France) and Pascal Bouvry (Computer Science and Communications Research Unit, University of Luxembourg, Luxembourg)

ABSTRACT: Cooperative coevolutionary evolutionary algorithms differ from the standard evolutionary algorithm architecture in that the population is split into subpopulations, each of them optimizing only a subvector of the global solution vector. All subpopulations cooperate by broadcasting their local partial solutions so that each subpopulation can evaluate complete solutions. Cooperative coevolution has recently been used in evolutionary multi-objective optimization, but few works have exploited its parallelization capabilities or tackled real-world problems. This article proposes to apply, for the first time, a state-of-the-art parallel asynchronous cooperative coevolutionary variant of the non-dominated sorting genetic algorithm II (NSGA-II), named CCNSGA-II, to the injection network problem in vehicular ad hoc networks (VANETs). This multi-objective optimization problem consists in finding the minimal set of nodes with backend connectivity, referred to as injection points, to constitute a fully connected overlay that will optimize the small-world properties of the resulting network. Recently, the well-known NSGA-II algorithm was used to tackle this problem on realistic instances in the city center of Luxembourg. In this work we compare the performance of CCNSGA-II to the original NSGA-II in terms of both the quality of the obtained Pareto front approximations and the execution time speedup.

Downloads: paper and presentation.
Two Models for Parallel Differential Evolution

PRESENTER: María Laura Tardivo

SCHEDULE: Monday, July 29, 12:10

AUTHORS: María Laura Tardivo (Departamento de Computación, Universidad Nacional de Río Cuarto, Córdoba. Laboratorio de Investigación en Cómputo Paralelo/Distribuido (LICPaD), Departamento de Ingeniería en Sistemas de Información, Facultad Regional Mendoza, Universidad Tecnológica Nacional, Mendoza. Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)), Paola Guadalupe Caymes-Scutari (Laboratorio de Investigación en Cómputo Paralelo/Distribuido (LICPaD), Departamento de Ingeniería en Sistemas de Información, Facultad Regional Mendoza, Universidad Tecnológica Nacional, Mendoza. Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)), Miguel Méndez-Garabetti (Laboratorio de Investigación en Cómputo Paralelo/Distribuido (LICPaD), Departamento de Ingeniería en Sistemas de Información, Facultad Regional Mendoza, Universidad Tecnológica Nacional, Mendoza. Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)) and Germán Bianchini (Laboratorio de Investigación en Cómputo Paralelo/Distribuido (LICPaD), Departamento de Ingeniería en Sistemas de Información, Facultad Regional Mendoza, Universidad Tecnológica Nacional, Mendoza. Argentina)

ABSTRACT: In scientific research there are countless optimization problems that cannot be solved exactly by a computer in a reasonable time. Advances in computing science have addressed these problems by developing different techniques that attempt to approximate the exact solutions. Among them, the Differential Evolution (DE) algorithm is a common choice. Numerous applications have demonstrated the potential of the method in problem solving, namely its efficiency, convergence and robustness. Moreover, by the nature of the algorithm, there are several approaches for transforming its sequential processing scheme into a parallel one, so as to increase the computational speed without neglecting solution quality. This paper presents two parallel alternatives to the classical Differential Evolution algorithm. Both proposals are based on an island model, a ring interconnection topology and a population migration strategy, whose advantages and drawbacks are presented. They have been tested on a set of benchmark functions considering different configurations for the parameters of DE, and they have also been analyzed according to explicit performance measurements.

Downloads: paper and presentation.
List Scheduling Heuristics for Virtual Machine Mapping in Cloud Systems

PRESENTER: Bernabé Dorronsoro

SCHEDULE: Monday, July 29, 12:30

AUTHORS: Sergio Nesmachnow, Santiago Iturriaga (Centro de Cálculo, Facultad de Ingeniería, Universidad de la República, Uruguay), Bernabé Dorronsoro (LIFL, University of Lille, France), El-Ghazali Talbi (LIFL, University of Lille, France) and Pascal Bouvry (Computer Science and Communications Research Unit, University of Luxembourg, Luxembourg)

ABSTRACT: This article introduces the formulation of the Virtual Machine Planning Problem in cloud computing systems. It deals with the efficient allocation of a set of virtual machine requests from customers to the available prebooked resources that the broker holds in a number of cloud providers, maximizing the broker's profit. Eight list scheduling heuristics are proposed to solve the problem, taking into account different criteria for mapping requests to available virtual machines. The experimental evaluation analyzes the profit, makespan, and flowtime results of the proposed methods over a set of 400 problem instances that account for realistic workloads and scenarios using real data from cloud providers.

Downloads: paper and presentation.
Heterogeneous Resource Allocation in the OurGrid Middleware: a Greedy Approach

PRESENTER: Miguel Da Silva

SCHEDULE: Monday, July 29, 12:50

AUTHORS: Miguel Da Silva and Sergio Nesmachnow (Centro de Cálculo, Facultad de Ingeniería, Universidad de la República, Uruguay)

ABSTRACT: OurGrid is an open source grid middleware that enables the creation of peer-to-peer computational grids to speed up the execution of bag-of-tasks applications. This article addresses the scheduling problem that arises when the participants of the grid contribute heterogeneous resources with different computing power, by studying the application of a greedy approach for selecting and assigning resources to jobs submitted for execution in a cooperative grid. The proposed method has been incorporated into the OurGrid code. The experimental analysis, performed over a set of 90 realistic problem instances following both the related and unrelated machines models, demonstrates that significant execution time improvements over the standard scheduling policy are obtained: about 30-35% overall, and 25-30% for large grid scenarios.

Downloads: paper and presentation.
Evolutionary-Statistical System with Island Model for Forest Fire Spread Prediction (Short Paper)

PRESENTER: Miguel Méndez-Garabetti

SCHEDULE: Monday, July 29, 13:10

AUTHORS: Miguel Méndez-Garabetti (Laboratorio de Investigación en Cómputo Paralelo/Distribuido (LICPaD), Departamento de Ingeniería en Sistemas de Información, Facultad Regional Mendoza, Universidad Tecnológica Nacional, Mendoza. Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)), Germán Bianchini (Laboratorio de Investigación en Cómputo Paralelo/Distribuido (LICPaD), Departamento de Ingeniería en Sistemas de Información, Facultad Regional Mendoza, Universidad Tecnológica Nacional, Mendoza. Argentina), María Laura Tardivo (Departamento de Computación, Universidad Nacional de Río Cuarto, Córdoba. Laboratorio de Investigación en Cómputo Paralelo/Distribuido (LICPaD), Departamento de Ingeniería en Sistemas de Información, Facultad Regional Mendoza, Universidad Tecnológica Nacional, Mendoza. Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)) and Paola Caymes-Scutari (Laboratorio de Investigación en Cómputo Paralelo/Distribuido (LICPaD), Departamento de Ingeniería en Sistemas de Información, Facultad Regional Mendoza, Universidad Tecnológica Nacional, Mendoza. Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET))

ABSTRACT: Models are used in many areas of science to represent different systems. These models must be fed with input parameters representing some particular conditions and provide an output representing system evolution. A particular case where models are useful is forest fire spread prediction. However, in most cases, models present a series of limitations. Such restrictions are due to the need for a large number of input parameters, which usually carry some degree of uncertainty because they cannot be measured in real time. To overcome this drawback and improve the quality of the prediction, several methods have been developed, among them S2F2M and ESS. This work proposes an improvement of the latter method that incorporates the Island Model into the Parallel Evolutionary Algorithm. As a result of this development, we expect to improve the quality of the prediction, because incorporating the Island Model increases the diversity of the cases generated.

Downloads: paper and presentation.

Paper Session II: GPU Architecture and Applications

Percolation Study of Samples on 2D Lattices using GPUs (Short Paper)

PRESENTER: D.A. Matoz Fernández

SCHEDULE: Tuesday, July 30, 11:15

AUTHORS: D.A. Matoz Fernández, P. M. Pasinetti and A.J. Ramirez-Pastor (Departamento de Física, Instituto de Física Aplicada, Universidad Nacional de San Luis-CONICET, San Luis, Argentina)

ABSTRACT: We study the site percolation problem on 2D lattices of various geometries, using general purpose graphics processing units (GPGPU). The implementation of a parallel component labeling algorithm in CUDA and its generalization to different geometries are discussed. The performance of this algorithm on a GPU is analyzed against the corresponding reference sequential implementation on a CPU. We present different implementation alternatives, considering the generation of samples both on the CPU host and on the GPU itself, and discuss the synchronization problems that arise. Finally, we present a new scheme that takes full advantage of the massive parallelism of the GPU by processing a large number of samples simultaneously, showing a significant improvement in overall performance.

Downloads: paper and presentation.
Towards a Distributed GPU-Accelerated Matrix Inversion

PRESENTER: Gerardo Ares

SCHEDULE: Tuesday, July 30, 11:35

AUTHORS: Gerardo Ares (Bull, São Paulo, Brazil and Instituto de Computación, Universidad de la República, Montevideo, Uruguay), Pablo Ezzatti (Instituto de Computación, Universidad de la República, Montevideo, Uruguay) and Enrique S. Quintana-Ortí (Dpto. de Ingeniería y Ciencia de Computadores, Universidad Jaume I, Spain)

ABSTRACT: We present an extension of a GPU-based matrix inversion algorithm for distributed memory contexts. Specifically, we implement and evaluate a message-passing variant of the Gauss-Jordan method (gje) for matrix inversion on a cluster of nodes equipped with GPU hardware accelerators. The experimental evaluation of the proposal shows a significant runtime reduction when compared with both the distributed non-GPU implementation of gje and a conventional method based on the LU factorization (as implemented in ScaLAPACK). In addition to this, our proposal leverages the aggregated capacity of the GPU memories in the cluster to overcome the constraints imposed by the reduced memory space of these devices.

Downloads: paper and presentation.
Solving 3D Viscous Incompressible Navier-Stokes Equations using CUDA

PRESENTER: Santiago Costarelli

SCHEDULE: Tuesday, July 30, 11:55

AUTHORS: Santiago Costarelli, Mario Storti, Rodrigo Paz, Lisandro Dalcín and Sergio Idelsohn (CIMEC - INTEC, Santa Fe, Argentina)

ABSTRACT: A CUDA implementation of a solver for the 3D viscous incompressible Navier-Stokes equations is proposed, using the BFECC (Back and Forth Error Compensation and Correction) scheme as the advection operator. The Poisson problem for pressure is solved with a CG (Conjugate Gradient) method, preconditioning the system with FFTs (Fast Fourier Transforms). Test cases such as the Lid-Driven Cavity and Flow Past a Circular Cylinder, both in 2D and 3D, are solved in order to check accuracy and obtain performance measurements.

Downloads: paper and presentation.
A GPU Implementation for Improved Granular Simulations using LAMMPS

PRESENTER: Emmanuel Nicolás Millán

SCHEDULE: Tuesday, July 30, 12:15

AUTHORS: Emmanuel Nicolás Millán, Christian Ringl, Carlos Bederián, María Fabiana Piccoli, Carlos García Garino and Herbert M.

ABSTRACT: Granular mechanics plays an important role in many branches of science and engineering, from astrophysics applications in planetary and interstellar dust clouds to the processing of industrial mixtures and powders. In this context, a granular simulation model with improved adhesion and friction is implemented within the open source code LAMMPS (lammps.sandia.gov). The performance of this model is tested on both CPU and GPU (Graphics Processing Unit) clusters and compared with the performance of the LAMMPS implementation of another frequently used interaction model, the Lennard-Jones potential. Timing shows speedups of ~4-10x for GPUs versus CPUs, with good parallel scaling in a hybrid GPU-CPU cluster.

Downloads: paper and presentation.
Permutation Index and GPU to efficiently Solve Many Queries

PRESENTER: Olga Mariela Lopresti

SCHEDULE: Tuesday, July 30, 12:35

AUTHORS: Olga Mariela Lopresti, Natalia Miranda, María Fabiana Piccoli and Nora Reyes (LIDIC. Universidad Nacional de San Luis, San Luis, Argentina)

ABSTRACT: Similarity search is a fundamental operation for applications that deal with multimedia data. For a query in a multimedia database it is meaningless to look for elements exactly equal to the one given as the query. Instead, we need to measure the similarity (or dissimilarity) between the query object and each object of the database. The similarity search problem can be formally defined through the concept of metric space, which provides a formal framework that is independent of the application domain. In a metric database, the objects from a metric space can be stored and similarity queries about them can be answered efficiently. In general, search efficiency is understood as minimizing the number of distance calculations required to answer a query. Therefore, the goal is to preprocess the dataset by building an index, such that queries can be answered with as few distance computations as possible. However, with very large metric databases it is not enough to preprocess the dataset by building an index; it is also necessary to speed up the queries by using high performance computing resources such as GPUs. In this work we show a pure GPU implementation that builds the Permutation Index, used for approximate similarity search on databases holding different kinds of data. Our proposal is able to solve many queries at the same time.

Downloads: paper and presentation.
Performance Analysis of a Symmetric Cryptography Algorithm on GPU and GPU Cluster

PRESENTER: Adrián Pousa

SCHEDULE: Tuesday, July 30, 12:55

AUTHORS: Adrián Pousa, Victoria Sanz and Armando De Giusti (Instituto de Investigación en Informática LIDI - Facultad de Informática, Universidad Nacional de La Plata, Buenos Aires, Argentina)

ABSTRACT: This article presents a performance analysis of the symmetric encryption algorithm AES (Advanced Encryption Standard) on a machine with one GPU and on a cluster of GPUs, for cases in which the memory required by the algorithm exceeds that of a single GPU. Two implementations were carried out, both based on the C language: one uses CUDA for the single-GPU case, and the other combines CUDA and MPI for the GPU cluster. The experimental work shows how communications in the GPU cluster negatively affect the algorithm's total computation time.

Downloads: paper and presentation.
Encrypting video streams using OpenCL code on-demand

PRESENTER: not presented

SCHEDULE: not presented

AUTHORS: Juan P. D'Amato (PLADEMA, Facultad de Ciencias Exactas, Universidad Nacional del Centro de la Provincia de Buenos Aires, Buenos Aires and CONICET, Argentina), Marcelo Vénere (PLADEMA, Facultad de Ciencias Exactas, Universidad Nacional del Centro de la Provincia de Buenos Aires, Buenos Aires, Argentina)

ABSTRACT: The amount of multimedia information transmitted through the web is very high and increasing. Generally, this kind of data is not properly protected, since users do not appreciate the information that images and videos may contain. In this work, we present an architecture for safely managing multimedia transmission channels. The idea is to encrypt and encode images or videos in an efficient and dynamic way. The main novelty is the use of on-demand parallel code written in OpenCL. The algorithms and data structures are known only at communication time, which we expect increases the robustness against possible attacks. We give a complete description of the proposal and report several performance tests with different known algorithms.

Downloads: paper

Paper Session III: CPU and Multicore Architectures and Applications

Many-core Tile64 vs. Multi-core Intel Xeon: Bioinformatics Performance Comparison

PRESENTER: Myriam Kurtz

SCHEDULE: Tuesday, July 30, 16:00

AUTHORS: Myriam Kurtz (Dep. Informática, Universidad Nacional de Misiones, Misiones, Argentina), Francisco J. Esteban (Servicio de Informática, Universidad de Córdoba, Córdoba, Spain), Pilar Hernández (Instituto de Agricultura Sostenible (IAS-CSIC), Córdoba, Spain), Juan A. Caballero (Dep. Estadística, Universidad de Córdoba, Córdoba, Spain), Antonio Guevara (Dep. Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain), Gabriel Dorado (Dep. Bioquímica y Biología Molecular, Universidad de Córdoba, Córdoba, Spain) and Sergio Gálvez (Dep. Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain)

ABSTRACT: The performance of the many-core Tile64 versus the multi-core Xeon x86 architecture on bioinformatics has been compared. For the benchmarking we used MC64-NW/SW, a pairwise algorithm that we previously developed to align nucleic acid (DNA and RNA) and peptide (protein) sequences; it is an enhanced, parallel implementation of the Needleman-Wunsch and Smith-Waterman algorithms. We have ported MC64-NW/SW (originally developed for the Tile64 processor) to the x86 architecture (Intel Xeon Quad Core and Intel i7 Quad Core processors) with excellent results. Hence, the evolution of x86-based architectures towards coprocessors like the Xeon Phi should bring significant performance improvements for bioinformatics.

Downloads: paper and presentation.
Towards a Massively Parallel Simulations with PFEM-2

PRESENTER: Juan Marcelo Gimenez

SCHEDULE: Tuesday, July 30, 16:20

AUTHORS: Juan Marcelo Gimenez and Norberto Marcelo Nigro (Centro de Investigaciones en Mecánica Computacional (CIMEC), UNL/CONICET, Santa Fe, Facultad de Ingeniería y Ciencias Hídricas - Universidad Nacional del Litoral, Santa Fe. Argentina)

ABSTRACT: In this work, an implementation of the Particle Finite Element Method Two (PFEM-2) for distributed-memory architectures is presented. PFEM-2 consists of a material-derivative-based formulation of the transport equations with a hybrid spatial discretization that uses an Eulerian mesh and Lagrangian particles. Parallelization strategies for mesh-based Eulerian methods and for particle-based Lagrangian methods in fluid dynamics have been widely studied separately; however, few works address the use of both approaches together. Typical domain-distribution solutions for Eulerian frames do not balance the workload properly in some Lagrangian stages, so achieving good performance requires analyzing the use of weighted decomposition in the partitioning. A performance analysis of the implementation running on a Beowulf cluster is presented. Weighted partitioning can improve the speed-up when the diffusion of the problem is low; with large diffusion, on the other hand, a classical Eulerian decomposition is the best choice. Nevertheless, the overall CPU time required to solve the presented incompressible flow cases with PFEM-2 is lower than with classical Eulerian solvers, which is promising for massively parallel simulations.

Downloads: paper, presentation and videos (72MB).
Trading Off Performance for Power-Energy in Dense Linear Algebra Operations

PRESENTER: Pablo Ezzatti

SCHEDULE: Tuesday, July 30, 16:40

AUTHORS: Peter Benner (Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany), Pablo Ezzatti (Instituto de Computación, Universidad de la República, Montevideo, Uruguay), Enrique Quintana-Ortí (Dpto. de Ingeniería y Ciencia de Computadores, Universidad Jaume I, Castellón, Spain) and Alfredo Ramón (Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany)

ABSTRACT: We analyze the performance-power-energy balance of a conventional Intel Xeon multicore processor and two low-power architectures (an Intel Atom processor and a system with a quad-core ARM Cortex A9+NVIDIA Quadro 1000M) using a high-performance implementation of Gauss-Jordan elimination (GJE) for matrix inversion. The blocked version of this algorithm employed in the experimental evaluation mostly comprises matrix-matrix products, so the results of the evaluation carry beyond simple matrix inversion and are representative of a wide variety of dense linear algebra operations/codes.
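
As a point of reference, the following is a minimal, unblocked sketch of matrix inversion by Gauss-Jordan elimination with partial pivoting. It is not the blocked, high-performance implementation evaluated in the paper; the function name and the singularity tolerance are assumptions made for illustration.

```python
def gauss_jordan_inverse(matrix, tol=1e-12):
    """Invert a square matrix (list of rows) by Gauss-Jordan elimination.

    Unblocked illustrative sketch; `tol` is an assumed singularity threshold.
    """
    n = len(matrix)
    # Build the augmented matrix [A | I].
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < tol:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot element becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row (above and below).
        for r in range(n):
            factor = aug[r][col]
            if r != col and factor != 0.0:
                aug[r] = [v - factor * pv for v, pv in zip(aug[r], aug[col])]
    # The right half of [I | A^-1] now holds the inverse.
    return [row[n:] for row in aug]
```

In a blocked variant the row updates are applied to panels of columns at once, which is how the computation comes to be dominated by matrix-matrix products.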

Downloads: paper and presentation.
Strategies to Optimize the LU Factorization Algorithm on Multicore Computers

PRESENTER: Gustavo Wolfmann

SCHEDULE: Tuesday, July 30, 17:00

AUTHORS: Janet Soler, Javier Ortiz and Gustavo Wolfmann (Laboratorio de Computación, Facultad Cs. Exactas Físicas y Naturales, Universidad Nacional de Córdoba)

ABSTRACT: The number of cores in multicore computers has an irreversible tendency to increase. Also, computers with multiple sockets for multicore chips are based on a complex hardware design and are becoming more common. Parallelizing the algorithms that run on this type of computer in order to obtain higher performance is a goal that can only be achieved by taking the hardware architecture into account. As hardware evolves, so must software. As a result, old parallelization strategies quickly become obsolete. This paper presents a series of alternatives for parallelizing the LU factorization algorithm on a multicore system, together with their results. Simple strategies lead to poor results. This study presents complex strategies that merge two levels of parallelism with asynchronous scheduling, and whose results match the state of the art in the field and even go further.

Downloads: paper and presentation.

Paper Session IV: Prospective and Ongoing Projects

Use of the PGAS Model for High Performance Computing in Beowulf Clusters (Short Paper)

PRESENTER: Jorge D'Elia

SCHEDULE: Tuesday, July 30, 17:50

AUTHORS: Jorge D'Elia, Lisandro Dalcín, Sofía Sarraf, Ezequiel López, Laura Battaglia, Gustavo Ríos Rodriguez, Victorio Sonzogni (Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC), Instituto de Desarrollo Tecnológico para la Industria Química (INTEC), Universidad Nacional del Litoral - CONICET, Santa Fe, Argentina)

ABSTRACT: The Partitioned Global Address Space (PGAS) is a parallel programming model that has been developed for distributed-memory computers. Furthermore, it can be used in High Performance Computing (HPC) on Beowulf clusters oriented to scientific and engineering applications in computational mechanics. As is known, the PGAS model is the basis for, among others, multi-paradigm programming languages such as UPC (Unified Parallel C) and Coarray Fortran (CAF or Fortran 2008), as well as the Global Arrays (GA) library. All these resources are extensions that provide one-sided communication. This work summarizes some of the activities carried out on one of the clusters available at CIMEC, as well as some ideas for a Message Passing Interface (MPI) implementation of coarrays in a Fortran compiler.

Downloads: paper and presentation.
Exploring the Use of Light Threads to Improve the Instruction Level Parallelism

PRESENTER: Esteban Mocskos

SCHEDULE: Tuesday, July 30, 18:05

AUTHORS: David Gonzalez Márquez (Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina), Adrián Cristal Kestelman (Barcelona Supercomputing Center, Artificial Intelligence Research Institute - CSIC, Barcelona, Spain) and Esteban Mocskos (Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina)

ABSTRACT: One of the main components of any general-purpose machine is the microprocessor. This component can be found at the heart of every machine, from standard servers and high performance computing nodes to portable mobile platforms. Its main task is to correctly execute programs as fast as it can, with production cost and power consumption as boundary conditions. Research in processor architectures centers on optimizing the processor design according to the specified functionality, taking into account present and future technologies. This optimization can be based on different parameters: performance, consumption, production cost, surface. In this work, we propose a novel mechanism combining software and hardware that makes it possible to improve Instruction Level Parallelism using simple cores and light threads. A modified processor is implemented using a simulation tool and two examples are presented: sorting an array and filtering a matrix. In both cases, promising results are obtained.

Downloads: paper and presentation.
High Performance Computing in Medical Physics (Short Paper)

PRESENTER: Roberto Isoardi

SCHEDULE: Tuesday, July 30, 18:20

AUTHORS: Roberto Isoardi and Pablo A. Cappagli (Comisión Nacional de Energía Atómica, Fundación Escuela de Medicina Nuclear Mendoza, Argentina)

ABSTRACT: Within the myriad applications of Physics in Medicine, there are two major fields in terms of their relevance in clinical practice: Medical Imaging and Radiation Therapy. Both areas make extensive use of computational resources in order to provide a prompt response to physicians, if possible in real time. Although execution times were dramatically reduced in the last decade with faster than ever CPUs, it is still common to wait several minutes, and on some occasions several hours, for certain processing tasks to yield clinically useful results. Some frequent examples include tomographic image reconstruction, internal dosimetry calculation and radiotherapy planning. Acceleration of such processes may sometimes be vital or extremely important, not only for the patient (whose quality-of-life improvement is the ultimate goal), but also for optimizing professional work in a busy hospital environment. In recent years, Medical Physics has benefited greatly from the implementation of new computing strategies for several applications, particularly those making use of GPUs. This short paper describes some of the current fields of our expertise in Medical Physics, where High Performance Computing (HPC) indeed plays a key role.

Downloads: paper and presentation.
Wireless Sensor Networks: A Software as a Service Approach

PRESENTER: Lucas Iacono

SCHEDULE: Tuesday, July 30, 18:35

AUTHORS: Lucas Iacono (Instituto de Microelectrónica. Facultad de Ingeniería, Universidad de Mendoza, Mendoza. ITIC, Universidad Nacional de Cuyo. Mendoza, Argentina), Carlos García Garino (ITIC, Universidad Nacional de Cuyo, Mendoza. Facultad de Ingeniería, Universidad Nacional de Cuyo. Mendoza, Argentina), Osvaldo Marianetti (Instituto de Microelectrónica. Facultad de Ingeniería, Universidad de Mendoza, Mendoza, Argentina) and Cristina Párraga (DICYTyV, Universidad de Mendoza, Mendoza, Argentina)

ABSTRACT: This paper presents a new integration system that provides remote access to Wireless Sensor Networks (WSNs) through Cloud Computing Services. The proposed system provides easy and reliable remote access to the data and settings of agro-meteorological WSNs. In order to implement the proposed system, experiments were conducted with two Cloud Computing Services, Globus Online and Google Drive. Different experiments are discussed in the paper in order to show the capabilities of the proposed system. The tests showed that agro-meteorological WSNs, composed of nodes with limited resources, can be easily accessed through Cloud Computing Services.

Downloads: paper and presentation.
