- About the Manual
- Install R Base on Hadoop
- Install R Studio on Hadoop
- Install RHadoop packages
RHadoop is a collection of four R packages that allow users to manage and analyze data with Hadoop.
- plyrmr – higher-level, plyr-like data processing for structured data, powered by rmr
- rmr – functions providing Hadoop MapReduce functionality in R
- rhdfs – functions providing file management of HDFS from within R
- rhbase – functions providing database management for the HBase distributed database from within R
This manual describes how to integrate R with Hadoop 2.4.0 on Ubuntu 14.04.
We assume that the following two are already up and running before you start the R and Hadoop integration:
– Ubuntu 14.04
– Hadoop 2.x +
Read my blog post on setting up a single-node Hadoop cluster to learn more.
Prerequisite:
Once Hadoop installation is done, make sure that all the processes are running:
Run the command jps in your terminal; the output should look similar to the screenshot below:
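As a rough way to automate this check, the sketch below reads jps-style output and lists any of the usual Hadoop 2.x daemons that are absent. The function name missing_daemons is hypothetical, and the daemon list assumes a single-node setup running both HDFS and YARN.

```shell
# Hypothetical helper: reads jps-style output ("<pid> <name>" per line) on
# stdin and prints the name of every expected daemon that is not listed.
missing_daemons() {
  jps_output="$(cat)"
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # Compare the second field exactly, so that "NameNode" is not
    # satisfied by a "SecondaryNameNode" line.
    if ! printf '%s\n' "$jps_output" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$daemon"
    fi
  done
}

# Usage (on the Hadoop machine): jps | missing_daemons
```

No output means all five daemons were found; any names printed are the processes that still need to be started.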
Step 1: Open the Ubuntu Software Center.
Step 2: Switch the Ubuntu Software Center to full-screen mode; if the window is too small, the search box is not visible. Search for r-base, click the first result, and click Install.
Step 3: Once the installation has finished, open your terminal. Type the command R and the R console will open.
You can perform any operation in this R console, for example, plotting a graph of some variables:
The graph produced by the plot function is shown in the screenshot below:
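A minimal sketch of such a plot, run non-interactively from the shell. The variables x and y, the file names, and the use of a PDF output device are illustrative assumptions, not the values from the session described above.

```shell
# Write a small R script; the quoted 'EOF' keeps the shell from expanding
# anything inside it.
script="${TMPDIR:-/tmp}/plot_example.R"
cat > "$script" <<'EOF'
# Plot y = x^2 for x = 1..10 into the PDF named by the first argument.
out <- commandArgs(trailingOnly = TRUE)[1]
x <- 1:10
y <- x^2
pdf(out)
plot(x, y, type = "b", main = "y = x^2")
dev.off()
EOF

# Run the script only when R is installed.
if command -v Rscript >/dev/null 2>&1; then
  Rscript "$script" "${TMPDIR:-/tmp}/plot_example.pdf"
else
  echo "Rscript not found; install r-base first"
fi
```

Inside the interactive R console, the three lines that define x and y and call plot() are enough; the plot then opens in a graphics window instead of a file.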
To exit the R console, give the command q().
R then asks whether to save the workspace: type y to save it, or n to discard it.
Type c to cancel and continue in the same workspace.
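The Software Center installation in Steps 1 to 3 can also be done from the terminal. The sketch below assumes the r-base package from the standard Ubuntu repositories; the function name install_r_base is hypothetical.

```shell
# Hypothetical helper: install R from the Ubuntu repositories unless an
# R command is already available.
install_r_base() {
  if command -v R >/dev/null 2>&1; then
    echo "R is already installed"
  else
    sudo apt-get update && sudo apt-get install -y r-base
  fi
}

# Usage: install_r_base
```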
Step 7: Now we install RStudio on Ubuntu.
- Open your browser and download RStudio. I downloaded RStudio 0.98.953 for Debian 6+/Ubuntu 10.04+ (32-bit); the downloaded file is rstudio-0.98.953-i386.deb
Go to the download folder, right-click the downloaded file, open it with Ubuntu Software Center, and click Install.
Open a terminal and type R to check that the R console starts; RStudio is now available as well.
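The downloaded .deb file can also be installed from the terminal instead of the Software Center. This is a sketch under the assumption that dpkg and apt-get are used; install_deb is a hypothetical helper that takes the path to the downloaded file as its argument.

```shell
# Hypothetical helper: install a local .deb file given as the first argument.
install_deb() {
  if [ ! -f "$1" ]; then
    echo "File not found: $1" >&2
    return 1
  fi
  # dpkg does not resolve dependencies itself; if it fails,
  # "apt-get install -f" pulls in the missing packages and finishes the setup.
  sudo dpkg -i "$1" || sudo apt-get install -f -y
}

# Usage: install_deb ~/Downloads/<downloaded RStudio .deb file>
```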
Install RHadoop packages
Step 1: Install Thrift
$ sudo apt-get install libboost-dev libboost-test-dev libboost-program-options-dev libevent-dev automake libtool flex bison pkg-config g++ libssl-dev
$ cd /tmp
If the command below does not work, download the Thrift source tarball manually and extract it in /tmp.
$ wget -O - https://dist.apache.org/repos/dist/release/thrift/0.9.0/thrift-0.9.0.tar.gz | tar zx
$ cd thrift-0.9.0/
$ ./configure
$ make
$ sudo make install
$ thrift --help
Step 2: Install the supporting R packages:
install.packages(c("rJava", "Rcpp", "RJSONIO", "bitops", "digest", "functional", "stringr", "plyr", "reshape2", "dplyr", "R.methodsS3", "caTools", "Hmisc"), lib="/usr/local/R/library")
Step 3: Download the packages listed below from https://github.com/RevolutionAnalytics/RHadoop/wiki/Downloads
Run the install commands in the R console, replacing <path> with the location of your downloaded files.
sudo gedit /etc/R/Renviron
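The Renviron file is where environment variables for R sessions are set. rhdfs and rmr2 read HADOOP_CMD, and rmr2 also reads HADOOP_STREAMING. The sketch below only prints the two lines to add; the default /usr/local/hadoop location and the function name renviron_lines are assumptions, and the streaming jar path follows the standard Hadoop 2.4.0 layout.

```shell
# Hypothetical helper: print Renviron entries for a Hadoop installation.
# The first argument is the Hadoop home directory (assumed default below).
renviron_lines() {
  hadoop_home="${1:-/usr/local/hadoop}"
  echo "HADOOP_CMD=$hadoop_home/bin/hadoop"
  echo "HADOOP_STREAMING=$hadoop_home/share/hadoop/tools/lib/hadoop-streaming-2.4.0.jar"
}

# Usage: renviron_lines /usr/local/hadoop | sudo tee -a /etc/R/Renviron
```

Renviron uses plain NAME=value lines without the export keyword, which is why the helper prints them in that form.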
Install RHadoop (rhdfs, rhbase, rmr2 and plyrmr)
Install relevant packages:
install.packages("rhdfs_1.0.8.tar.gz", repos=NULL, type="source")
install.packages("rmr2_3.1.2.tar.gz", repos=NULL, type="source")
install.packages("plyrmr_0.3.0.tar.gz", repos=NULL, type="source")
install.packages("rhbase_1.2.1.tar.gz", repos=NULL, type="source")
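The same four tarballs can also be installed from the shell with R CMD INSTALL. This is a sketch that assumes all tarballs sit in one download directory, passed as the first argument; install_tarballs is a hypothetical helper name.

```shell
# Hypothetical helper: install the RHadoop tarballs found in directory $1,
# in the same order as the install.packages() calls above.
install_tarballs() {
  download_dir="$1"
  for pkg in rhdfs_1.0.8 rmr2_3.1.2 plyrmr_0.3.0 rhbase_1.2.1; do
    tarball="$download_dir/$pkg.tar.gz"
    if [ -f "$tarball" ]; then
      # May need sudo when installing into a system-wide R library.
      R CMD INSTALL "$tarball"
    else
      echo "missing: $tarball"
    fi
  done
}

# Usage: install_tarballs ~/Downloads
```

Reporting missing files instead of stopping makes it easy to see in one pass which downloads are still needed.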
You'll find a YouTube video and step-by-step instructions for installing R on Hadoop at the following links.
Rdatamining: R on Hadoop – Step by step instructions
Youtube: Word count map reduce program in R
Revolution Analytics: RHadoop packages
Install R-base Guide
In the next blog post, I'll show a sample sentiment analysis with MapReduce in R using the rmr package.