In this article I will elaborate on the steps to install a single-node pseudo-distributed Hadoop cluster (a local Hadoop cluster with YARN, NameNode, DataNode and HDFS) on Windows. This guide follows the typical installation path, which requires compiling Hadoop from source. Preparing to install Hadoop: once you have a server ready (multiple servers if you want to build an actual multi-node cluster), you'll want to install Cygwin. If you aren't familiar with it, Cygwin provides a Linux-style bash shell and command-line tools on Windows. Cygwin is open source and available as a free download.
This article will show you how to install Hadoop and Hive on Windows 7. Information on installing Hadoop on Windows 7 without Cloudera is relatively rare, so I thought I'd write it up. Let's look at the software we need for the Hadoop installation. Supported Windows versions: Hadoop supports Windows Server 2008 and Windows Server 2008 R2, Windows Vista and Windows 7. For this installation we are going to use Windows 7 Professional Edition, SP1. Microsoft Windows SDK: download and install Microsoft Windows SDK v7.1 to get the tools, compilers, headers and libraries that are necessary to build Hadoop.
Cygwin: download and install the Cygwin command-line tools (32-bit or 64-bit, matching your Windows) to run Unix commands on Windows. Cygwin is a distribution of popular GNU and other open-source tools running on Microsoft Windows. Maven: download and install Maven 3.1.1. Installing Apache Maven is a simple matter of extracting the archive and adding its bin folder, which contains the mvn command, to the PATH.
Open a new command prompt and run "mvn -v" to verify the installation. Protocol Buffers 2.5.0: download Google's Protocol Buffers and extract to a folder on the C: drive. The version must strictly be 2.5.0 for installing Hive. Setting environment variables: check an environment variable's value from the command prompt, e.g. "echo %JAVA_HOME%" should print something like C:\Program Files\Java\jdk1.7.0_51. If nothing is shown on executing the above command, you need to set the JAVA_HOME variable. Go to My Computer > right-click > Properties > Advanced system settings > System Properties > Advanced tab > Environment Variables button. Click the 'New' button under System variables and add — Variable name: JAVA_HOME, Variable value: C:\Program Files\Java\jdk1.7.0_51. Note: edit the Path environment variable very carefully. Select the whole path and go to the end to append the new entries.
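The environment-variable check and setup above can be sketched in a Command Prompt session like this (the JDK path is an example; substitute your own install location):

```bat
:: Check the current value; empty output means JAVA_HOME is not set
echo %JAVA_HOME%

:: Set it persistently for the current user
setx JAVA_HOME "C:\Program Files\Java\jdk1.7.0_51"

:: Or set it for the current session only
set JAVA_HOME=C:\Program Files\Java\jdk1.7.0_51
```

Note that setx takes effect only in newly opened command prompts, which is a common source of confusion when a freshly set variable appears to be missing.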
Deleting an existing Path entry may break other programs. Adding to PATH: add the unpacked Maven distribution's bin directory to your PATH environment variable by opening My Computer > right-click > Properties > Advanced system settings > System Properties > Advanced tab > Environment Variables button, then adding or editing the Path variable under 'System variables' to include C:\Program Files\apache-maven-3.3.9\bin. Also edit the Path variable to add the bin directory of Cygwin (say C:\cygwin64\bin) and the installation path of Protocol Buffers (say C:\protoc-2.5.0-win32). Download and install CMake (Windows installer). Official Apache Hadoop releases do not include Windows binaries, so you have to download the sources and build a Windows package yourself. Download the Hadoop source tarball hadoop-2.6.4-src.tar.gz and extract it to a folder with a short path (say C:\hdp) to avoid runtime problems due to the maximum path length limitation in Windows. Note: do not use the prebuilt Hadoop binary release, as it lacks winutils.exe and the hadoop.dll files. Native IO is mandatory on Windows, and without it the Hadoop installation will not work.
Instead, build from the source code using Maven, which will download all the required components. To build Hadoop with Native IO support: 1. Extract hadoop-2.6.4-src.tar.gz to a folder (say C:\hdp) 2.
Add the environment variable HADOOP_HOME=C:\hdp\hadoop-2.6.4-src and edit the Path variable to add the bin directory of HADOOP_HOME, e.g. C:\hdp\hadoop-2.6.4-src\bin.
Before moving to the next step, make sure you have the following variables set in your Environment Variables window. JAVA_HOME = C:\hdp\Java\jdk1.7.0_65 PATH = C:\Windows\Microsoft.NET\Framework64\v4.0.30319;C:\Program Files\CMake\bin;C:\protoc-2.5.0-win32;C:\Program Files\apache-maven-3.3.9\bin;C:\cygwin64\bin;C:\cygwin64\usr\sbin;%JAVA_HOME%\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\system32\WindowsPowerShell\v1.0; Note: if the JAVA_HOME environment variable is set improperly, Hadoop will not run. Set the environment variables properly for the JDK, Maven, Cygwin and Protocol Buffers. If you still get a "JAVA_HOME not set properly" error, edit the C:\hadoop\bin\hadoop-env.cmd file, locate "set JAVA_HOME=" and provide the JDK path (with no spaces).
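If JAVA_HOME has to be set inside hadoop-env.cmd as just described, a space-free line would look roughly like this (using the Windows 8.3 short name for "Program Files"; the exact short name can vary by system):

```bat
:: In hadoop-env.cmd — the JDK path must contain no spaces
set JAVA_HOME=C:\Progra~1\Java\jdk1.7.0_65
```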
Running the Maven package build: select Start > All Programs > Microsoft Windows SDK v7.1 (as administrator) and open the Windows SDK 7.1 Command Prompt. Change directory to the Hadoop source code folder (C:\hdp\hadoop-2.6.4-src). Execute the Maven package goal with the options -Pdist,native-win -DskipTests -Dtar to create the Windows binary tar distribution: mvn package -Pdist,native-win -DskipTests -Dtar You will see a long list of commands scroll by while the build process runs. If everything goes well, the native distribution hadoop-2.6.4.tar.gz will be created inside the C:\hdp\hadoop-dist\target\hadoop-2.6.4 directory. Extract the newly created Hadoop Windows package to a directory of your choice (e.g.
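The build steps above, run from the Windows SDK 7.1 Command Prompt, amount to the following (paths are the examples used in this article):

```bat
:: Change to the Hadoop source tree
cd C:\hdp\hadoop-2.6.4-src

:: Build the Windows binary tar distribution, skipping tests
mvn package -Pdist,native-win -DskipTests -Dtar
```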
C:\hdp\hadoop-2.6.4). Testing and configuring the Hadoop installation 1. Configuring Hadoop for a single-node (pseudo-distributed) cluster. As part of configuring HDFS, update the files: 1. Near the end of C:\hdp\hadoop-2.6.4\etc\hadoop\hadoop-env.cmd add the following lines: set HADOOP_PREFIX=C:\hdp\hadoop-2.6.4 set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop set YARN_CONF_DIR=%HADOOP_CONF_DIR% set PATH=%PATH%;%HADOOP_PREFIX%\bin 2.
Modify C:\hdp\hadoop-2.6.4\etc\hadoop\core-site.xml to set fs.default.name to hdfs://0.0.0.0:19000. 3. Modify C:\hdp\hadoop-2.6.4\etc\hadoop\hdfs-site.xml to set dfs.replication to 1. 4. Finally, make sure C:\hdp\hadoop-2.6.4\etc\hadoop\slaves has the following entry: localhost. Create a C:\tmp directory, as the default configuration puts HDFS metadata and data files under \tmp on the current drive. As part of configuring YARN, update the files: 1.
Add the following entries to C:\hdp\hadoop-2.6.4\etc\hadoop\mapred-site.xml, replacing %USERNAME% with your Windows user name: mapreduce.job.user.name = %USERNAME%, mapreduce.framework.name = yarn, yarn.apps.stagingDir = /user/%USERNAME%/staging, mapreduce.jobtracker.address = local 2.
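The single-node configuration described above was flattened by the page scrape; collected in one place, the hadoop-env.cmd additions and the XML files would look roughly like this (standard Hadoop configuration format; property names and values are the ones given above):

```bat
:: Appended near the end of etc\hadoop\hadoop-env.cmd
set HADOOP_PREFIX=C:\hdp\hadoop-2.6.4
set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop
set YARN_CONF_DIR=%HADOOP_CONF_DIR%
set PATH=%PATH%;%HADOOP_PREFIX%\bin
```

```xml
<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://0.0.0.0:19000</value>
  </property>
</configuration>

<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- mapred-site.xml (replace %USERNAME% with your Windows user name) -->
<configuration>
  <property>
    <name>mapreduce.job.user.name</name>
    <value>%USERNAME%</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.apps.stagingDir</name>
    <value>/user/%USERNAME%/staging</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>local</value>
  </property>
</configuration>
```

The slaves file in the same directory needs just the single line "localhost".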
Build and Install Hadoop 2.x or newer on Windows
1. Introduction
Hadoop version 2.2 onwards includes native support for Windows. The official Apache Hadoop releases do not include Windows binaries (yet, as of January 2014). However, building a Windows package from the sources is fairly straightforward. Hadoop is a complex system with many components. Some familiarity at a high level is helpful before attempting to build or install it for the first time. Familiarity with Java is necessary in case you need to troubleshoot.
Building Hadoop Core for Windows
2.1. Choose target OS version
The Hadoop developers have used Windows Server 2008 and Windows Server 2008 R2 during development and testing. Windows Vista and Windows 7 are also likely to work because of the Win32 API similarities with the respective server SKUs. We have not tested on Windows XP or any earlier versions of Windows, and these are not likely to work.
Any issues reported on Windows XP or earlier will be closed as Invalid. Do not attempt to run the installation from within Cygwin.
Cygwin is neither required nor supported. Choose a Java version and set JAVA_HOME: Oracle JDK versions 1.7 and 1.6 have been tested by the Hadoop developers and are known to work. Make sure that JAVA_HOME is set in your environment and does not contain any spaces.
If your default Java installation directory has spaces, then you must use the Windows 8.3 short path instead, e.g. C:\Progra~1\Java instead of C:\Program Files\Java. Getting the Hadoop sources: the current stable release as of August 2014 is 2.5.
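To discover the 8.3 short name for a directory whose long name contains spaces, the dir /x command can be used (short names can vary by system, so check rather than assume):

```bat
:: Lists 8.3 short names alongside long names;
:: "Program Files" is typically shown as PROGRA~1
dir /x C:\
```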
The source distribution can be retrieved from the ASF download server or a mirror, or via Subversion or Git.
The Subversion URL is listed on the Apache Hadoop site; the Git repository URL is git://git.apache.org/hadoop-common.git. After downloading the sources via Git, switch to the stable 2.5 line using git checkout branch-2.5, or use the appropriate branch name if you are targeting a newer version. Installing dependencies and setting up the environment for building: the BUILDING.txt file in the root of the source tree has detailed information on the list of requirements and how to install them. It also includes information on setting up the environment and a few quirks that are specific to Windows. It is strongly recommended that you read and understand it before proceeding.
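The Git route described above amounts to the following command sequence (the branch name assumes you are targeting the 2.5 stable line):

```bat
git clone git://git.apache.org/hadoop-common.git
cd hadoop-common
git checkout branch-2.5
```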
A few words on Native IO support: Hadoop on Linux includes optional Native IO support. However, Native IO is mandatory on Windows, and without it you will not be able to get your installation working. You must follow all the instructions from BUILDING.txt to ensure that Native IO support is built correctly. Build and copy the package files: to build a binary distribution, run the following command from the root of the source tree: mvn package -Pdist,native-win -DskipTests -Dtar Note that this command must be run from a Windows SDK command prompt as documented in BUILDING.txt.
A successful build generates a binary hadoop .tar.gz package in hadoop-dist\target. The Hadoop version is part of the package file name; if you are targeting a different version, the package name will differ. Installation: pick a target directory for installing the package.
We use c:\deploy as an example. Extract the tar.gz file (e.g. hadoop-2.5.0.tar.gz) under c:\deploy.
This will yield a directory structure like the following. If installing a multi-node cluster, then repeat this step on every node.
C:\deploy>dir
 Volume in drive C has no label.
 Volume Serial Number is 9D1F-7BAC

 Directory of C:\deploy

08:11 AM    <DIR>          .
08:28 AM    <DIR>          bin
08:28 AM    <DIR>          etc
08:28 AM    <DIR>          include
08:28 AM    <DIR>          libexec
08:28 AM    <DIR>          sbin
08:28 AM    <DIR>          share
               0 File(s)              0 bytes

3. Starting a Single Node (pseudo-distributed) Cluster
This section describes the absolute minimum configuration required to start a Single Node (pseudo-distributed) cluster and also run an example job.
Example HDFS configuration: before you can start the Hadoop daemons, you will need to make a few edits to the configuration files. The configuration file templates are all found in c:\deploy\etc\hadoop, assuming your installation directory is c:\deploy. First, edit the file hadoop-env.cmd to add the following lines near the end of the file.
set HADOOP_PREFIX=c:\deploy set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop set YARN_CONF_DIR=%HADOOP_CONF_DIR% set PATH=%PATH%;%HADOOP_PREFIX%\bin Edit or create the file core-site.xml and make sure it has the following configuration key: fs.default.name = hdfs://0.0.0.0:19000. Edit or create the file hdfs-site.xml and add the following configuration key: dfs.replication = 1. Finally, edit or create the file slaves and make sure it has the following entry: localhost. The default configuration puts the HDFS metadata and data files under \tmp on the current drive; in the above example this would be c:\tmp. For your first test setup you can just leave it at the default. Example YARN configuration: edit or create mapred-site.xml under %HADOOP_PREFIX%\etc\hadoop and add the following entries, replacing %USERNAME% with your Windows user name.
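The article is cut off before listing those entries. Based on the identical single-node setup earlier in this article, the mapred-site.xml entries would be the following (values as given earlier; replace %USERNAME% with your Windows user name):

```xml
<configuration>
  <property>
    <name>mapreduce.job.user.name</name>
    <value>%USERNAME%</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.apps.stagingDir</name>
    <value>/user/%USERNAME%/staging</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>local</value>
  </property>
</configuration>
```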