GARUDA: India’s National Grid Computing Initiative
N. Mohan Ram, Chief Investigator – GARUDA
S. Ramakrishnan, Director General – C-DAC
GARUDA[1] is a collaboration of science researchers and experimenters on a nationwide grid of computational nodes, mass storage and scientific instruments that aims to provide the technological advances required to enable data- and compute-intensive science in the 21st century. One of GARUDA’s most important challenges is to strike the right balance between research and the daunting task of deploying innovation into some of the most complex scientific and engineering endeavors being undertaken today.
Building a commanding position in Grid computing is crucial for India. By allowing researchers to easily access supercomputer-level processing power and knowledge resources, grids will underpin progress in Indian science, engineering and business. The challenge facing India today is to turn technologies developed for researchers into industrial-strength business tools.
The Department of Information Technology[2] (DIT), Government of India, has funded the Centre for Development of Advanced Computing[3] (C-DAC) to deploy the nationwide computational grid ‘GARUDA’. In its Proof of Concept (PoC) phase, GARUDA will connect 17 cities across the country, with the aim of bringing “Grid” networked computing to research labs and industry. GARUDA will accelerate India’s drive to turn its substantial research investment into tangible economic benefits.
GARUDA aims to strengthen and advance scientific and technological excellence in Grid and Peer-to-Peer technologies. The strategic objectives of GARUDA are to:
- Create a test bed for the research and engineering of technologies, architectures, standards and applications in Grid Computing
- Bring together all potential research, development and user groups who can help develop a national initiative on Grid computing
- Create the foundation for next-generation grids by addressing long-term research issues in the strategic areas of knowledge and data management, programming models, architectures, grid management and monitoring, problem solving environments, and grid tools and services
The following key deliverables have been identified as important to achieving the GARUDA objectives:
- Grid tools and services to provide an integrated infrastructure to applications and higher-level layers
- A Pan-Indian communication fabric to provide seamless and high-speed access to resources
- Aggregation of resources including compute clusters, storage and scientific instruments
- Creation of a consortium to collaborate on grid computing and contribute towards the aggregation of resources
- Grid enablement and deployment of select applications of national importance requiring aggregation of distributed resources
To achieve these objectives, GARUDA brings together a critical mass of well-established researchers from 45 research laboratories and academic institutions, who have formulated an ambitious program of activities.
The major components of GARUDA (Figure 1) include the computing resources, high-speed communication fabric, middleware & security mechanisms, tools to support program development, collaborative environments, data management and grid monitoring & management. Access portals and specialized problem solving environments provide a seamless user interface to the Grid.
In the initial phase, the PARAM[4] clusters at C-DAC labs in Bangalore, Pune, Hyderabad and Chennai will power the Grid. Together they form a heterogeneous resource environment, with clusters running AIX, Solaris and Linux. The PARAM clusters use the PARAMNet[5] interconnect and C-DAC’s HPCC[6] software. The PARAMNet system area network has 2.5 Gbps links and exports the Kshipra[7] lightweight communication protocol, which conforms to the Virtual Interface Architecture (VIA) and the MPI Application Programming Interface. The HPCC software provides a complete solution for creating and executing parallel programs on UNIX clusters, combining high-performance communication protocols with a rich set of program development, system management and software engineering tools. It is available on AIX, Solaris and Linux clusters. As the project progresses, GARUDA partners are expected to contribute further resources, including specialized scientific instruments.
The GARUDA network is a Layer 2/3 MPLS Virtual Private Network (VPN) that connects select institutions at 10/100 Mbps under stringent quality of service and service level agreements. The multi-service network, with a total backbone throughput of 2.43 Gbps, connects 17 cities (Figure 2) covering 45 research and academic institutions across the country. It is expected to support not only the traffic of high performance computing applications but also other requirements, such as IP-based collaborative environments enabled through video conferencing and the Access Grid.
A dedicated Grid monitoring and management centre at C-DAC, Bangalore, manages and monitors all components of the Grid. State-of-the-art display walls and advanced software such as Paryaveekshanam (Figure 3), developed at C-DAC, help monitor the health and utilization of the various Grid components. A mobile agent framework for monitoring Grid resources and for automatically updating software releases is also being explored.
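To make the idea of "health and utilization" monitoring concrete, the sketch below rolls per-node status reports up into per-site counts of the kind a display wall might show. It is a toy illustration, not Paryaveekshanam: the `NodeStatus` record, the `classify` and `site_summary` functions, and both thresholds are hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical status record; real monitoring software collects far richer metrics.
@dataclass
class NodeStatus:
    name: str
    site: str
    cpu_util: float                 # fraction of CPU in use, 0.0 to 1.0
    seconds_since_heartbeat: float  # age of the node's last report

HEARTBEAT_TIMEOUT = 60.0  # assumed: no report for a minute means "down"
BUSY_THRESHOLD = 0.9      # assumed: 90% CPU or more counts as "busy"

def classify(node: NodeStatus) -> str:
    """Classify one node as 'down', 'busy' or 'ok'."""
    if node.seconds_since_heartbeat > HEARTBEAT_TIMEOUT:
        return "down"  # a stale heartbeat overrides any utilization figure
    if node.cpu_util >= BUSY_THRESHOLD:
        return "busy"
    return "ok"

def site_summary(nodes):
    """Count node states per site."""
    summary = {}
    for node in nodes:
        counts = summary.setdefault(node.site, {"ok": 0, "busy": 0, "down": 0})
        counts[classify(node)] += 1
    return summary

if __name__ == "__main__":
    reports = [
        NodeStatus("n1", "Bangalore", 0.35, 5.0),
        NodeStatus("n2", "Bangalore", 0.97, 3.0),
        NodeStatus("n3", "Pune", 0.10, 300.0),
    ]
    print(site_summary(reports))
```

Checking heartbeat age before utilization means a node that has stopped reporting is shown as down rather than as idle, which is usually the safer reading for an operator.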
Proposed research activities include exploring advanced network services, developing novel architectures, integrating network services into the Grid middleware, and deploying IPv6 and alternative protocols to overcome the shortcomings of IP over high-speed networks. The communication fabric is a precursor to the next-generation Gigabit network and is being deployed in collaboration with ERNET[8], a scientific society under the Department of Information Technology. A simulation model of the network is being developed to understand how changes in traffic profiles affect performance and to provide inputs for deciding the architecture of the fabric in the next phase of the project.
Recent trends in Grid computing indicate that the standardization of the Grid programming model and associated management services is still in progress. The Open Grid Services Architecture[9] (OGSA) represents an evolution towards a Grid system architecture based on Web services, and OGSA-compliant higher-level functions are beginning to be implemented. GARUDA has therefore adopted a pragmatic approach that uses existing Grid infrastructure and Web Services technologies. The grid tools and services deployed for GARUDA will be based on a judicious mix of in-house components, the Globus Toolkit (GT) and industry-grade components. GT2[10] will be deployed on the GARUDA grid for operational requirements, while researchers will experiment with GT4[11] in the Grid labs.
Resource management and scheduling in GARUDA are based on industry-grade schedulers deployed in a hierarchical architecture. At the cluster level, scheduling is handled by LoadLeveler[12] on AIX platforms and Torque[13] on Solaris and Linux clusters. At the Grid level, the Moab[14] scheduler from Cluster Resources[15] interfaces with the various cluster-level schedulers to transparently map user requests onto available resources in the Grid. Moab supports advanced features including intelligent data staging, co-allocation and multi-sourcing, service monitoring and management, sovereignty (local vs. central management policies), virtual private clusters and virtual private grids. Moab interfaces with Globus for data and user management, job staging and security.
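The two-level arrangement can be sketched as a grid-level scheduler that filters cluster-level schedulers by platform and free capacity, then hands the job to one of them. This is a minimal sketch under stated assumptions, not Moab, LoadLeveler or Torque: the `Job`, `ClusterScheduler` and `GridScheduler` classes, the "most free cores" placement policy, and the per-site platforms and core counts in the usage lines are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    name: str
    cores: int
    platform: str  # "AIX", "Solaris" or "Linux"

@dataclass
class ClusterScheduler:
    """Hypothetical stand-in for a local batch scheduler on one cluster."""
    site: str
    platform: str
    free_cores: int
    queue: list = field(default_factory=list)

    def can_run(self, job: Job) -> bool:
        return job.platform == self.platform and job.cores <= self.free_cores

    def submit(self, job: Job) -> None:
        # The local scheduler owns placement on its own nodes; here we only
        # reserve cores and record the job name.
        self.free_cores -= job.cores
        self.queue.append(job.name)

class GridScheduler:
    """Hypothetical grid-level meta-scheduler layered over cluster schedulers."""

    def __init__(self, clusters):
        self.clusters = clusters

    def submit(self, job: Job) -> str:
        candidates = [c for c in self.clusters if c.can_run(job)]
        if not candidates:
            raise RuntimeError(f"no cluster can run {job.name}")
        # Assumed policy: send the job to the matching cluster with most free cores.
        target = max(candidates, key=lambda c: c.free_cores)
        target.submit(job)
        return target.site

if __name__ == "__main__":
    grid = GridScheduler([
        ClusterScheduler("Bangalore", "AIX", 64),
        ClusterScheduler("Hyderabad", "Solaris", 32),
        ClusterScheduler("Pune", "Linux", 128),
        ClusterScheduler("Chennai", "Linux", 32),
    ])
    print(grid.submit(Job("md-sim", 48, "Linux")))  # placed on Pune
    print(grid.submit(Job("cfd", 32, "AIX")))       # placed on Bangalore
```

Keeping node-level placement inside each cluster scheduler mirrors the sovereignty idea mentioned above: the grid layer decides *which* cluster, while local policy decides *how* the job runs there.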
To support data-oriented applications, GARUDA provides an integrated but distributed data storage architecture by deploying the Storage Resource Broker[16] (SRB) from Nirvana[17]. SRB creates and maintains a Global Namespace across multiple heterogeneous and distributed storage systems in the Grid. The Global Namespace is a hierarchical organization of all collections, sub-collections and data objects in the SRB Federation, independent of their physical storage infrastructure. Access is virtualized through single sign-on and one common set of APIs. SRB provides advanced services including transparent data load and retrieval, data replication, persistent migration, data backup and restore, and secure queries. Data security is ensured through authentication, authorization, tickets, encryption, access control lists, audit trails and role-based classification of users. SRB achieves performance through parallel I/O, bulk operations, latency minimization, a scalable implementation and a transaction-based architecture. Data management is further automated by daemons that are launched automatically at start-up.
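The core idea of a global namespace, logical names that stay fixed while the bytes live on one or more physical resources, can be illustrated with a small in-memory catalogue. This is not the SRB API: the `GlobalNamespace` class, its methods, and the resource names and paths in the usage lines are hypothetical, and real SRB adds metadata, access control and much more.

```python
from collections import defaultdict

class GlobalNamespace:
    """Toy logical namespace: each logical path maps to one or more physical replicas."""

    def __init__(self):
        # logical path -> list of (storage resource, physical path)
        self._replicas = defaultdict(list)

    def register(self, logical_path, resource, physical_path):
        """Record one physical copy of a logical data object."""
        self._replicas[logical_path].append((resource, physical_path))

    def resolve(self, logical_path, preferred=None):
        """Return a (resource, physical path) replica, honouring a preferred resource if present."""
        replicas = self._replicas.get(logical_path)  # .get avoids creating empty entries
        if not replicas:
            raise KeyError(logical_path)
        if preferred is not None:
            for resource, path in replicas:
                if resource == preferred:
                    return resource, path
        return replicas[0]

    def list_collection(self, collection):
        """List immediate children (sub-collections or objects) of a logical collection."""
        prefix = collection.rstrip("/") + "/"
        children = set()
        for path in self._replicas:
            if path.startswith(prefix):
                children.add(path[len(prefix):].split("/", 1)[0])
        return sorted(children)

if __name__ == "__main__":
    ns = GlobalNamespace()
    ns.register("/garuda/bio/genomes/chr1.fa", "pune-disk", "/data/g/chr1.fa")
    ns.register("/garuda/bio/genomes/chr1.fa", "blr-tape", "/archive/0042/chr1.fa")
    ns.register("/garuda/bio/proteins/p53.pdb", "pune-disk", "/data/p/p53.pdb")
    print(ns.list_collection("/garuda/bio"))
    print(ns.resolve("/garuda/bio/genomes/chr1.fa", preferred="blr-tape"))
```

Users only ever see the logical paths; replication or migration changes the replica list without changing the name an application uses.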
A Program Development Environment (PDE) enables users to carry out the entire program development life cycle for the Grid, during which they prototype, implement, debug and tune their applications. PDEs help users express, manipulate and manage complex workflows, and facilitate development in Grid-specific programming languages such as scripting and workflow languages. They also reduce the complexity of working with different environments by presenting a standard environment across all the resources of the Grid.
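Managing a workflow largely comes down to respecting task dependencies. The sketch below groups a hypothetical bio-informatics workflow into stages whose tasks could run in parallel on different Grid resources, using Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The task names and the `run_in_stages` helper are illustrative assumptions, not part of the GARUDA PDE.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "fetch_sequences": set(),
    "align": {"fetch_sequences"},
    "build_tree": {"align"},
    "score_motifs": {"align"},
    "report": {"build_tree", "score_motifs"},
}

def run_in_stages(graph):
    """Group tasks into stages; tasks within one stage do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        # A real workflow engine would dispatch these tasks and wait for them here.
        sorter.done(*ready)
    return stages

if __name__ == "__main__":
    for number, stage in enumerate(run_in_stages(workflow), start=1):
        print(number, stage)
```

Here `build_tree` and `score_motifs` land in the same stage because both depend only on `align`, so a scheduler is free to place them on different clusters at the same time.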
The GARUDA PDE includes basic program development tools such as editors and compilers; program analysis tools such as debuggers and profilers; workflow environments; and tools that help with porting, conversion and scalability. Ideally, all these components would be made available to the user through a single Integrated Development Environment (Grid IDE). In the initial phase of the project, work will focus on delivering a debugger for the grid environment. This debugger will have features similar to DIViA[18], an integrated debugging environment available on the PARAM clusters.
The GARUDA portal provides the user interface to the Grid resources and hides the complexity of the Grid from users. It allows submission of both sequential and parallel jobs and also provides job accounting facilities. Problem Solving Environments (PSEs) for bio-informatics, cryptanalysis and the Community Atmosphere Model support the entire problem-solving cycle in their domains, covering problem formulation, algorithm selection, numerical simulation and solution visualization.
The Grid resources can be accessed either through the high-speed communication fabric or over the Internet. Access through satellite-based communication channels is also being explored as part of a research initiative to integrate the GARUDA terrestrial grid with a satellite-based grid. Research on Semantic Grids is underway in collaboration with MIT[19], Chennai, with an initial focus on the publishing and intelligent discovery of Grid services. These capabilities will be integrated with the GARUDA portal to make access to the Grid seamless for users.
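One way "intelligent discovery" differs from plain keyword lookup is that a query for a general concept can also find services published under more specific concepts. The sketch below shows that idea with a hand-written is-a hierarchy; it is an illustrative assumption, not the GARUDA or MIT design, and the `ONTOLOGY` table, `ServiceRegistry` class and service endpoints are all hypothetical. A real semantic grid would describe services in a proper ontology language.

```python
# Hypothetical is-a hierarchy (child concept -> parent concept); must be acyclic.
ONTOLOGY = {
    "SequenceAlignment": "BioinformaticsService",
    "DockingSimulation": "BioinformaticsService",
    "BioinformaticsService": "GridService",
    "ClimateSimulation": "GridService",
}

def is_a(concept, target):
    """Return True if concept equals target or is a descendant of it."""
    while concept is not None:
        if concept == target:
            return True
        concept = ONTOLOGY.get(concept)  # walk up one level; None at the root
    return False

class ServiceRegistry:
    """Toy registry where services are published with a concept, not just a name."""

    def __init__(self):
        self._services = []  # list of (endpoint, concept)

    def publish(self, endpoint, concept):
        self._services.append((endpoint, concept))

    def discover(self, concept):
        """Find every service whose concept is the requested one or more specific."""
        return [endpoint for endpoint, c in self._services if is_a(c, concept)]

if __name__ == "__main__":
    registry = ServiceRegistry()
    registry.publish("svc://pune/blast", "SequenceAlignment")
    registry.publish("svc://blr/dock", "DockingSimulation")
    registry.publish("svc://chennai/cam", "ClimateSimulation")
    print(registry.discover("BioinformaticsService"))
```

A query for `BioinformaticsService` returns both the alignment and docking services even though neither was published under that exact term.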
Applications of national importance that require aggregation of geographically distributed resources will be developed and deployed on the GARUDA Grid. Natural Disaster Management and bio-informatics applications that are characterized by intensive computing and data access requirements are being targeted during this phase.
C-DAC, in association with a partner research institution, will mine data from a network of sensors deployed over vast disaster-prone regions and upload it to GARUDA as input to forecast models appropriate to the various stages of disaster management. This will enable timely dissemination of disaster information to user agencies for effective mitigation and relief measures.
C-DAC’s Bio-informatics Resource and Applications Facility[20] (BRAF) on the PARAM Supercomputing facility is accessible to the bio-informatics research community involved in in silico molecule identification and new drug discovery. The sheer volume of data and the complexity of the algorithms require tremendous computational cycles and storage, which demands effective use of grid resources beyond those available at any single location.
A public website www.garudaindia.in provides the required mechanism for the GARUDA Grid community to exchange and disseminate information periodically. GARUDA publications, technical reports and newsletters can be accessed through this site. C-DAC will organize a set of thematic workshops and conferences on a regular basis. Training activities will be organized to ensure that the users of GARUDA are kept abreast of the latest technological advancements in Grid Computing.
GARUDA will demonstrate the power of the Grid by deploying select applications of national importance over the test bed. It will eliminate the barriers to the coordinated use of national resources, regardless of the physical location of these resources and their users. For the first time in the country, it will provide a persistent and supported set of Grid infrastructure and deployable services. This infrastructure will provide a range of new Grid services addressing issues of resource discovery, secure access, resource monitoring and management, distributed data management and the like. Our goal in creating this national infrastructure is to enable novel approaches to scientific computing based on emerging concepts, the outcome of which will lead to revolutionary changes in a wide range of scientific disciplines across the country.