Over the weekend, I wanted to learn a little more about distributed computing, and Hadoop seemed like a good starting point.
To learn Hadoop, I really wanted to get my hands on it to give it a spin.
So let's see how we can get Hadoop running on macOS. There are many ways to install Java and Hadoop, but in this article we will use Homebrew as the installation method.
Modes of Hadoop
Hadoop can run in three different modes. In this article, we will focus only on the pseudo-distributed mode.
- Stand-alone mode
- Pseudo-distributed mode
- Distributed mode
Before installing anything, it is important to get SSH working locally on macOS first.
Check that you have SSH enabled properly.
$ ssh localhost
If it prompts you for your password and returns the Last login time, you are good to skip the next step.
To enable SSH on macOS, go to System Preferences > Sharing, enable "Remote Login" and set "Allow access for: All Users".
System Preference > Sharing. Enable Remote Login and Allow Access for All Users
Once everything is set up, test it with ssh localhost again. If you see the last login time, you are good to go; if not, you might run into the following problem.
ssh: connect to host localhost port 22: Connection refused
To fix this, first check whether remote login is actually off.
$ sudo systemsetup -getremotelogin
Remote Login: off
If you see the above message, remote login is off; proceed to turn it on.
$ sudo systemsetup -setremotelogin on
$ ssh localhost
If all is well, you should see the last login time. If you are not greeted with the Last login message, you might need to generate SSH keys (written to ~/.ssh/id_rsa.pub) and append the public key to ~/.ssh/authorized_keys.
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Finally, test with ssh localhost again; this time you should see the Last login message.
Install Java using brew cask. As of this writing, Java 9 is not yet compatible with Hadoop 2.8.2. To avoid errors such as the following:
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/usr/local/Cellar/hadoop/2.8.2/libexec/share/hadoop/common/lib/hadoop-auth-2.8.2.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
...
We will specifically be using Java version 8.
$ brew tap caskroom/versions
$ brew cask install java8
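Before moving on, it can save you trouble to confirm that Java 8 is actually the JDK in use. Here is a minimal sketch (the guard around the java binary is just a convenience so it degrades gracefully; on macOS, /usr/libexec/java_home -v 1.8 can also locate the JDK path for JAVA_HOME):

```shell
# Print the active Java version; it should mention 1.8.x after the
# install above. The guard avoids an error if java is not on PATH yet.
if command -v java >/dev/null 2>&1; then
  java -version 2>&1 | head -n 1
else
  echo "java not found on PATH -- install it first"
fi
```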
Install Hadoop using brew.
Depending on your Xcode version, you might need to update Xcode first. In my case, I updated it from the App Store.
$ brew install hadoop
After a successful installation, you should see the messages shown in the screenshot below.
The Hadoop version installed for me was 2.8.2. For the rest of this article do remember to replace this version number with what is applicable to your version.
Successful installation of Hadoop 2.8.2
Hadoop was installed under /usr/local/Cellar/hadoop. Under normal circumstances, brew would have automatically created a symlink from /usr/local/opt/hadoop to /usr/local/Cellar/hadoop/<your-version-of-hadoop>
For simplicity, we will refer to it as /usr/local/opt/hadoop from here on.
Next, you have to configure a couple of files. Go to /usr/local/opt/hadoop. There, you will need to edit or create the following files.
In hadoop-env.sh look for
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
Replace it with the following, remembering to change <JDK_VERSION> to the version you currently have.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
export JAVA_HOME="/Library/Java/JavaVirtualMachines/<JDK_VERSION>/Contents/Home"
In core-site.xml, you will configure the HDFS address and port number.
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
    <description>A base for other temporary directories</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
In mapred-site.xml you will configure the jobtracker address and port number in map-reduce. If you cannot find mapred-site.xml, copy from mapred-site.xml.template first.
$ sudo cp mapred-site.xml.template mapred-site.xml
Add the following into mapred-site.xml .
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
In hdfs-site.xml, change dfs.replication from its default value of 3 to 1, since we are running a single-node, pseudo-distributed setup.
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Finally, the last step before launching the different services is to format HDFS.
$ cd /usr/local/opt/hadoop
$ hdfs namenode -format
Go to /usr/local/opt/hadoop/sbin, where you can use the following scripts.
# To start HDFS service
$ ./start-dfs.sh
# To stop HDFS service
$ ./stop-dfs.sh
Next, open your browser and visit http://localhost:50070. You should see the following page.
Hadoop running successfully on http://localhost:50070
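If you prefer the command line over a browser, you can also probe the NameNode web UI with curl. A small sketch (50070 is the default NameNode HTTP port in Hadoop 2.x; the guard is just there so the snippet degrades gracefully):

```shell
# Print the HTTP status code from the NameNode web UI, or a note
# if nothing is listening on the port yet.
if command -v curl >/dev/null 2>&1; then
  curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070 \
    || echo "NameNode UI not reachable -- is HDFS started?"
else
  echo "curl not available"
fi
```

A 200 status code means the NameNode is up and serving its status page.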
MapReduce Framework Services
Remember to execute all of these in /usr/local/opt/hadoop/sbin as well.
# To start Yarn
$ ./start-yarn.sh
# To stop Yarn
$ ./stop-yarn.sh
To check that Yarn is running properly, visit http://localhost:8088.
MapReduce Framework running properly
Constantly having to type each command to start and stop services can be bothersome, so instead you can use:
# to start all services
$ ./start-all.sh
# to stop all services
$ ./stop-all.sh
Lastly, you can add the environment variables to /etc/profile
export HADOOP_HOME="/usr/local/opt/hadoop"
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
With this, you can now use the scripts anywhere. For example,
# you can use these in any directory
$ start-dfs.sh
$ stop-dfs.sh
$ start-yarn.sh
$ stop-yarn.sh
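To confirm the variables actually took effect, a quick sketch you can paste into a shell (run source /etc/profile or open a new terminal first; here the exports are repeated so the snippet is self-contained):

```shell
# Export the variables in the current shell, then confirm that
# Hadoop's sbin directory ended up on PATH.
export HADOOP_HOME="/usr/local/opt/hadoop"
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
echo "HADOOP_HOME=$HADOOP_HOME"
case ":$PATH:" in
  *":$HADOOP_HOME/sbin:"*) echo "sbin is on PATH" ;;
  *) echo "sbin is NOT on PATH" ;;
esac
```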
Once installation is complete, you can also try out some simple operations on HDFS.
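As a taste of those operations, here is a sketch of a few basic HDFS commands. It assumes the HDFS daemons are running; the /user/<username> directory and the hdfs-demo.txt file are hypothetical examples, and the guard lets the snippet exit gracefully when Hadoop is not installed:

```shell
# A few basic HDFS operations, guarded so the sketch degrades
# gracefully when hadoop is not on PATH.
if ! command -v hadoop >/dev/null 2>&1; then
  echo "hadoop not on PATH -- complete the installation steps above first"
else
  hadoop fs -mkdir -p "/user/$(whoami)"          # create a home directory in HDFS
  echo "hello hdfs" > /tmp/hdfs-demo.txt
  hadoop fs -put -f /tmp/hdfs-demo.txt "/user/$(whoami)/"  # copy a local file in
  hadoop fs -ls "/user/$(whoami)"                # list the directory
  hadoop fs -cat "/user/$(whoami)/hdfs-demo.txt" # read the file back
fi
```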