Back in June, I posted some information on social graph engines triggered by my interest in Google Pregel. I still expect this area of social analytics to be a key capability organizations will need to focus on as they deal with large-scale processing of social graph data.I came across the info below (related to an Apache project) and thought I would pass it along:
Apache HAMA: An Introduction to Bulk Synchronization Parallel on Hadoop
Welcome to Hama project
Hama (means a hippopotamus in Korean) is a distributed scientific package on Hadoop for massive matrix and graph data. It is currently in incubation with Apache. The main goal of Hama is to provide computational tools for data-intensive scientific and industrial areas. It consists of two packages, which are the matrix package and the graph package.
- Scientific simulation and modeling
- Matrix-vector/matrix-matrix multiply
- Soving linear systems
- Scientific graphs
- Information retrieval
- Sorting
- Finding eigenvalues and eigenvectors
- Computer graphics and computational geometry
- Matrix multiply
- Computing matrix determinate
For more information about Hama, please see the Hama wiki.
FrontPage - Hama Wiki
Hama (means a hippopotamus in Korean) is a distributed scientific package on Hadoop for massive matrix and graph data. It is currently in incubation with Apache. The main goal of Hama is to provide computational tools for data-intensive scientific and industrial areas. It consists of two packages, which are the matrix package and the graph package.
Architecture
BSP
The BSP package is a implementation of BSP over Hadoop RPC(sockets). By using a BSP model which is based on the concept of a superstep, during which processes perform computations using local data, a more rapid and sensitive program will be allowed.
Matrix
Graph
Shell/DSL
Hama DSL (Domain Specific Language) in Groovy -- Work in progress
Hama Shell -- Work in progress
The Graph Package (Angrapa)
The graph package, called Angrapa, is an large-scale graph data management framework for analytical processing. It is still in heavy development. Angrapa will employ massive parallelism on Hadoop, and It aims to achieve the scalability for processing tera bytes or peta bytes graph data. Angrapa will be used in a variety of scientific and industrial areas, such as data mining, machine learning, information retrieval, bioinformatics, and social networks, required to process large-scale graph data.

Comments