Title: GAMAcloud V 1.0 [a cloud technology architecture on the box]
Abstract:
Technology research and business activities need support and services from an information technology infrastructure at a reliable and qualified level. Cloud technology opens opportunities for enterprises and organizations to exploit this potential capacity and to discover new research areas and business channels. Cloud technology must address the challenges of high-performance computation and large-volume data management. This creates the challenge of delivering a cloud architecture that is well designed and well implemented to bridge the gap between research activities and business needs.
CloudBox is the design of a cloud technology platform in a single box. CloudBox is built from reusable technology, recovering unused PCs and configuring them as a single machine architecture. The research uses three development phases. The cluster collection phase designs and implements a cluster environment across many nodes. The cluster-on-the-box phase implements a single architecture that runs the cluster environment. The cloud-in-the-box phase establishes the prototype of a single system on the box. We believe this approach will address the need for a high-performance technology environment for many purposes.
Background:
The business environment has been changing rapidly. Many areas of business have adapted to changes in technology, trends, and the lifestyles of their customers. This happens because of the new, borderless communication channels between enterprises and customers. We have seen how social networks have become marketing channels, how newspapers report campaigns about the best and worst services of enterprises, and how blogs have become references for buying products.
Enterprises realize they cannot avoid this change; they need to transform and adapt the change into their business processes. They manage the change so it becomes an opportunity to scale up the business and scale out the sales strategy. Large enterprises may only need to adjust how they implement their corporate strategies, but small and medium enterprises need to develop themselves.
The change in the business environment has challenged small and medium enterprises to compete on the same line as large enterprises. They face the competition head to head and must reach potential customers among both loyal and new customers. Customers themselves simply want to enjoy the services without a big change in how they use them.
Cloud technology responds to this situation. It offers potential resources to help enterprises compete and to manage the associated risks. Cloud technology is already used by enterprises in many ways: for marketing activities, to learn competitive product and service specifications, to solve problems, and to discuss future directions. SMEs have the same needs; they just need a channel to deliver ideas, consult, act, monitor, and receive advice.
The world of science is very dynamic. One innovation or discovery in one area will trigger many discoveries in other areas. Innovation can be counted in every minute and every second. Innovation and scientific work involve data from many sources. Interdisciplinary and multidisciplinary research has specific needs for data collection, for example the need to share and reuse data. The need for long-term access to and use of digital scientific data has become a major interest in every organization. Data also become very sensitive when they must be shared with and accessed by other organizations. Data also need to change because of changes in the data landscape, in technology, and in user needs for dynamic data.
Digital scientific data face present and future challenges. Data produced by research work must answer the demand for data collaboration. Research needs to work with other research, whether similar or multidisciplinary. This is a must, because the world now needs problem-solving research and high-quality research. Lee Dirks (2009) gives an example of the future needs of digital scientific data: research reporting. Future research reports must offer multiple perspectives and be dynamically customizable to each user. A report can also be accessed and used to view or follow research workflows and outputs from lab experiments. It can be exported into an electronic lab workbench in order to reconstruct the same or a similar experiment. It also gives researchers the ability to work with multiple reports and to mash up data and workflows across experiments. Researchers have facilities to apply new analysis methods and visualizations and to perform new experiments.
Digital scientific data are proof of evidence. We need them as evidence of research experiments. We use digital data as proof of findings that present high-quality results. The data represent the work of the research, open opportunities for other research, and make it possible to enhance previous research by developing better research in the future.
Scientific work has become very challenging and very important in every organization. Scientific work runs as research and development of new solutions based on the organization's subject area. Government needs scientific work to deliver the best, most reliable, and most accountable services to citizens. Enterprises need scientific work to ensure they have new products and quality improvements. The education sector has made scientific work the leading environment for building an academic atmosphere and culture.
There is transient information and unfulfilled demand for storage because of the growth of information. The term Science 3.0 has become a hot issue in scientific work. The challenge behind this term is the demand for data collaboration in scientific work. Multidisciplinary research is necessary; it is needed to produce problem-solving, high-quality research. Scientific technology faces unique challenges in serving the explosion of digital data. Industry sets the standard requirements for the long-term use of and access to digital data. The needs for data integration, annotation capabilities, data provenance and quality, and security are the major issues that the Science 3.0 concept must answer.
In a data grid environment, many large-scale scientific experiments and simulations involving large data volumes generate data in large formats across multiple distributed storage sites. Furthermore, these data are shared between researchers, research organizations, users, and industries for data analysis, data visualization, and so forth. Several data replication techniques, including the Globus Toolkit, have been developed to support high-performance access to remotely produced scientific data. Data-intensive, high-performance computing applications require the efficient management and transfer of terabytes or petabytes of data in wide-area, distributed computing environments. They also need to transfer large subsets of these datasets to local sites or other remote resources for processing, creating local copies or replicas to overcome long wide-area data transfer latencies. The data management environment must provide replication services, storage services, metadata services, and security services such as authentication of users and control over who is allowed to access the data.
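As an illustration of the replica idea above, the following is a minimal sketch of staging a local copy of a remote dataset before processing. It assumes a hypothetical GridFTP source URL and local scratch path and simply shells out to the Globus Toolkit's globus-url-copy client; it is a conceptual sketch, not part of the proposed system.

```c
/* Minimal sketch: stage a local replica of a remote dataset before processing.
 * Assumes the Globus Toolkit's globus-url-copy client is installed; the
 * source URL and local path are hypothetical placeholders. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

int main(void) {
    const char *remote = "gsiftp://remote.example.org/data/experiment1.dat"; /* hypothetical */
    const char *local  = "/scratch/replicas/experiment1.dat";                /* hypothetical */
    struct stat st;

    /* Reuse an existing local replica to avoid a long wide-area transfer. */
    if (stat(local, &st) == 0) {
        printf("Local replica already present: %s\n", local);
        return 0;
    }

    /* Otherwise create the replica with a parallel GridFTP transfer. */
    char cmd[512];
    snprintf(cmd, sizeof(cmd), "globus-url-copy -p 4 %s file://%s", remote, local);
    if (system(cmd) != 0) {
        fprintf(stderr, "Replica transfer failed for %s\n", remote);
        return 1;
    }
    printf("Replica created: %s\n", local);
    return 0;
}
```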
A data grid environment requires that distributed, data-intensive applications be served by a data grid infrastructure that provides a set of orthogonal, application-independent services, which can then be combined and specialized in different ways to meet the specific requirements of applications. Data grid infrastructure can usefully build on capabilities provided by the emerging Grid, as Foster and Kesselman (2009) note, such as resource access, resource discovery, and authentication services. Digital information collections are heterogeneous, vast, and growing at a rate that outpaces our ability to manage and preserve them.
An increasing amount of information is being created and maintained in digital format, including objects from virtually every discipline and type. All of these objects need to be preserved and maintained for long periods, whether because of legal requirements, because they form the basis of business models, because they constitute valuable cultural heritage, because they are evidence and proof of scientific experiments, or for personal reasons and value.
Literature Study:
The nature of the digital data to be managed will change. Today, data archives manage relatively few, large datasets, and their holdings mostly comprise simple forms of data. Public archives and libraries rarely archive digital objects. There is a large number of system environments to address, and technology changes fast, making formats, hardware, and media obsolete. In the future, data archives will additionally be called upon to manage large numbers of small datasets. There will be demand to hold more complex data objects and interrelated or interdependent collections of objects. Public archives and libraries will have to handle vast numbers of small digital objects. The number of environments will continue to grow, as nobody will "uninvent" the old environments which spawned data. There is no reason to suspect this rate of change will decrease; it may even accelerate.
The world of science is very dynamic. One innovation or discovery in one area may lead to further discoveries in other areas. Innovation can be counted in every minute and every second. Scientific work also involves data from many sources. Interdisciplinary and multidisciplinary research has particular needs for data collection, for example the need to share data or to use the same data sources. Scientific data have become massive and rapidly growing, following the development and rapid growth of scientific work. IDC (2005) reported in The Exploding Digital Universe that the digital universe would grow roughly tenfold in five years, from about 160-170 exabytes in 2006 to more than 1,600 exabytes in 2011. The report also found that the need for storage will keep increasing along with the growth of information.
In the year 2005 a new particle accelerator, the Large Hadron Collider (LHC), was scheduled to come into operation at CERN, the European Organization for Nuclear Research. In this project, four High Energy Physics (HEP) experiments were expected to produce several petabytes of data per year over a lifetime of 15 to 20 years. CERN focused on establishing the technology components for the implementation of a new world-wide data grid and demonstrated the effectiveness of this new technology through the large-scale deployment of end-to-end application experiments involving real users. They demonstrated the ability to build, connect, and effectively manage large, general-purpose, data-intensive computer clusters constructed from low-cost components.
There are a few more projects related to data grids. Globus and Legion were directed towards computational grids but are now also adding support for distributed data management and integrating it with their infrastructures. Globus itself produced the Global Access to Secondary Storage (GASS) API, the component of the toolkit which performs tasks related to data management. It is limited to providing remote file I/O operations, management of local file caches, and file transfers in a client-server model with support for multiple protocols.
The Particle Physics Data Grid project focuses on developing a grid infrastructure that can support high-speed data transfers and transparent access. It addresses replica management, high-performance networking, and interfacing with different storage brokers. The Grid Physics Network (GriPhyN) project pursues an aggressive program of fundamental IT research focused on realizing the concept of virtual data. The Storage Resource Broker addresses issues related to providing a uniform interface to heterogeneous storage systems and to accessing replicated data over multiple sites. It provides ways to access datasets based on their attributes rather than their physical location, using a metadata catalog.
The main initiative of this research is to examine data management and preservation issues in a data grid environment, to work in collaboration with these ongoing efforts, and to use the middleware they have developed if it satisfies our requirements. Most important is a system that would integrate or interact with these efforts so that end-users can benefit from the work being contributed from all over the globe.
The term Science 3.0 is analogous to Web 3.0; it names the next generation of scientific work. Its implementation in research work can be explained through examples in research reporting methods. Lee Dirks (2009) gives an example of the future needs of research reporting. Science 3.0 can enable what we call live research reports. The reports offer multiple perspectives and can be dynamically customized to each user. A report can also be accessed and used to view or follow research workflows and outputs from lab experiments. It can be exported into an electronic lab workbench in order to reconstruct the same or a similar experiment. It also gives researchers the ability to work with multiple reports and to mash up data and workflows across experiments. Researchers have facilities to apply new analysis methods and visualizations and to perform new experiments.
In the technological environment, the demand of scientific work is an infrastructure that can manage massive, large, and distributed computing processes. It involves many large-volume scientific workloads, with huge data volumes and simulations that generate and render large data. The data are shared and open between scientists, research enterprises, end-users, and industries. Several application projects that support scientific collaboration already exist. Open source software is one example of how collaboration can produce high-quality software. The collaboration in software development shown on SourceForge involves the community from the beginning of a project, through every step of development, to the release and publication of the product.
Method:
The design architecture explains the system architecture of the framework, the software architecture, and the implementation. The architecture is adapted from the Science-Forge research, which introduced a collaborative scientific framework design. The system architecture consists of three components: the framework application, the PC-based cluster, and the storage. Storage is the component that handles the data archiving process; it works under the framework to manage and execute dynamic data presentation. The main component is the framework application. It handles data collection activities and data processing and directs the data archive to storage. The access interface is used to manage data administration so that users access the data through secure methods. The access node is the component that verifies and presents the data to users as needed. We developed a PC-based cluster in the implementation to run the designed system architecture. We use MySQL Cluster as the storage and the GForge system as the framework application. The implementation of the PC-based cluster uses an MPI configuration to distribute the workload of the processes and allocate the activities, as sketched below.
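To illustrate how an MPI configuration can distribute work across the PC-based cluster nodes, the following is a minimal master/worker-style sketch in C using standard MPI calls. The number of work items and the processing step are hypothetical placeholders; this is a sketch of the technique, not the final GamaCloudBox implementation.

```c
/* Minimal MPI sketch: every rank processes its own slice of the workload,
 * then partial results are combined with MPI_Reduce on rank 0.
 * Compile with mpicc and run with mpirun across the cluster nodes. */
#include <mpi.h>
#include <stdio.h>

#define TOTAL_ITEMS 1000   /* hypothetical number of work items */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each node takes an (almost) equal slice of the work items. */
    int chunk = TOTAL_ITEMS / size;
    int start = rank * chunk;
    int end   = (rank == size - 1) ? TOTAL_ITEMS : start + chunk;

    long local_sum = 0;
    for (int i = start; i < end; i++) {
        local_sum += i;   /* placeholder for the real data-processing task */
    }

    /* Gather the partial results on rank 0. */
    long global_sum = 0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("Processed %d items on %d nodes, result = %ld\n",
               TOTAL_ITEMS, size, global_sum);
    }
    MPI_Finalize();
    return 0;
}
```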
Design:
We defined workload management for distributed scheduling and resource management. Data management develops and integrates tools and middleware infrastructure to manage and share large data volumes in a high-throughput, production-quality grid environment.
Workload management consists of two major components: data management and preservation, and monitoring services. Data management and preservation manages the storing, access, and preservation tasks for each piece of data. The monitoring services provide an evaluation and logs whenever the data fail in any of these tasks. We use Globus middleware services for monitoring and for giving end-users and administrators access to status and error information in the data grid environment. The physical fabric layer consists of fabric management work packages, networking packages, and mass storage management packages. The management packages support everything from the compute and network hardware up through the operating system, workload, and application software. The network package uses the laboratory network infrastructure to provide a virtual private network between the computational and data resources forming the data grid testbed. Mass storage management interfaces the existing mass storage management systems to the multi-site grid data management system.
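To make the monitoring-services idea concrete, the following sketch shows, in plain C, how a data-management task could be wrapped so that every run appends a timestamped success-or-failure entry to a log that administrators can inspect. The task command and log path are hypothetical placeholders; the actual system would rely on the Globus monitoring services mentioned above.

```c
/* Minimal monitoring sketch: run a data-management task and log its outcome.
 * The command and log file below are hypothetical placeholders. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static void log_status(const char *task, int ok) {
    FILE *log = fopen("/var/log/gamacloud/data-tasks.log", "a"); /* hypothetical path */
    if (!log) return;
    time_t now = time(NULL);
    char stamp[32];
    strftime(stamp, sizeof(stamp), "%Y-%m-%d %H:%M:%S", localtime(&now));
    fprintf(log, "%s  %s  %s\n", stamp, ok ? "OK    " : "FAILED", task);
    fclose(log);
}

int main(void) {
    /* Hypothetical preservation task: copy a dataset into the archive area. */
    const char *task = "cp /data/incoming/run42.dat /archive/run42.dat";
    int status = system(task);
    log_status(task, status == 0);
    return status == 0 ? 0 : 1;
}
```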
Expected Results:
The research results will consist of:
- Two prototype products:
a. Cluster on the box, which we call GamaCloudBox v.0.1
b. GamaBox v.1.0
- One conference paper (submission)