Getting started with Hivemall on Docker
This page explains how to run Hivemall on Docker.
Caution
This Docker image contains a single-node Hadoop environment for evaluating Hivemall. It is not suited for production use.
Requirements
- Docker Engine 1.6+
- Docker Compose 1.10+
Build image
There are two ways to build a Hivemall Docker image:
Using docker-compose
$ docker-compose -f resources/docker/docker-compose.yml build
Using docker command
$ docker build -f resources/docker/Dockerfile .
Note
You can skip building the image if you use a pre-built Docker image from Docker Hub. However, since the Docker Hub repository is experimental, the distributed image is NOT built from the latest commit in our master branch.
Run container
If you built the image yourself, you can launch it with either docker-compose or the docker command:
By docker-compose
$ docker-compose -f resources/docker/docker-compose.yml up -d && docker attach hivemall
You can edit resources/docker/docker-compose.yml as needed.
For example, the volumes option mounts local directories into the container:
volumes:
  - "../../:/opt/hivemall/" # mount current hivemall dir to `/opt/hivemall` ($HIVEMALL_PATH) on the container
  - "/path/to/data/:/root/data/" # mount resources to container-side  `/root/data` directory
By docker command
Find the ID of your local Docker image with docker images, then run:
$ docker run -p 8088:8088 -p 50070:50070 -p 19888:19888 -it ${docker_image_id}
Refer to the Docker reference for details of the command.
Like the volumes option in the docker-compose file, docker run has a --volume (-v) option:
$ docker run ... -v /path/to/local/hivemall:/opt/hivemall
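The port and volume options above can be combined into a single command. The following is a minimal sketch that only prints the command for review rather than running it. The helper names `port_flags` and `hivemall_run_command`, and the variables `IMAGE_ID` and `LOCAL_HIVEMALL`, are illustrative assumptions and not part of Hivemall.

```shell
#!/bin/sh
# Sketch: assemble the `docker run` command from the port and volume
# settings used on this page. IMAGE_ID and LOCAL_HIVEMALL are placeholders
# (assumptions); replace them with your own image ID and checkout path.
IMAGE_ID="${IMAGE_ID:-your_image_id}"
LOCAL_HIVEMALL="${LOCAL_HIVEMALL:-/path/to/local/hivemall}"

# Print "-p PORT:PORT" for every port given as an argument.
port_flags() {
  flags=""
  for port in "$@"; do
    flags="${flags}${flags:+ }-p ${port}:${port}"
  done
  printf '%s\n' "$flags"
}

# Print the full command instead of executing it.
hivemall_run_command() {
  printf 'docker run %s -v %s:/opt/hivemall -it %s\n' \
    "$(port_flags 8088 50070 19888)" "$LOCAL_HIVEMALL" "$IMAGE_ID"
}

hivemall_run_command
```

Once the printed command looks right, copy it into your terminal to start the container.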
Running pre-built Docker image in Docker Hub
Caution
This part is experimental. Hivemall in the pre-built image might be out-of-date compared to the latest version in our master branch.
You can find pre-built Hivemall docker images in this repository.
- Check the latest tag first
- Pull the pre-built Docker image from Docker Hub: $ docker pull hivemall/latest:20170517
- Launch the pre-built image: $ docker run -p 8088:8088 -p 50070:50070 -p 19888:19888 -it hivemall/latest:20170517
Run Hivemall on Docker
- Type hive to start the Hive CLI (.hiverc automatically loads Hivemall functions)
- Try your Hivemall queries!
Accessing Hadoop management GUIs
- YARN http://localhost:8088/
- HDFS http://localhost:50070/
- MR jobhistory server http://localhost:19888/
Note that you need to publish the container ports to the host, e.g., with -p 8088:8088 -p 50070:50070 -p 19888:19888, when running the Docker image.
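To confirm from the host that the three management UIs listed above are reachable, something like the following sketch may help. It assumes curl is installed on the host, which this page does not state. `ui_url` and `check_uis` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: probe the Hadoop web UIs from the host.
# Assumption: curl is available on the host machine.

# Map a short service name to the URL listed on this page.
ui_url() {
  case "$1" in
    yarn)       echo "http://localhost:8088/" ;;
    hdfs)       echo "http://localhost:50070/" ;;
    jobhistory) echo "http://localhost:19888/" ;;
    *)          echo "unknown service: $1" >&2; return 1 ;;
  esac
}

# Report whether each UI answers within 5 seconds.
check_uis() {
  for svc in yarn hdfs jobhistory; do
    url="$(ui_url "$svc")"
    if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
      echo "$svc: reachable at $url"
    else
      echo "$svc: not reachable at $url (was the port published with -p?)"
    fi
  done
}

# Usage, once the container is running:
#   check_uis
```

A "not reachable" line usually means either the container is not running or the corresponding -p mapping was omitted.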
Load data into HDFS (optional)
You can find an example script that loads data into HDFS at $HOME/bin/prepare_iris.sh.
The script loads the iris dataset into the iris database:
# cd $HOME && ./bin/prepare_iris.sh
# hive
hive> use iris;
hive> select * from iris_raw limit 5;
OK
1       Iris-setosa     [5.1,3.5,1.4,0.2]
2       Iris-setosa     [4.9,3.0,1.4,0.2]
3       Iris-setosa     [4.7,3.2,1.3,0.2]
4       Iris-setosa     [4.6,3.1,1.5,0.2]
5       Iris-setosa     [5.0,3.6,1.4,0.2]
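The same check can also be run without entering the interactive prompt by passing the query to Hive's -e option. The sketch below builds the query string for a given row count. `preview_query` is a hypothetical helper, not something shipped with the image.

```shell
#!/bin/sh
# Sketch: build the preview query shown above for an arbitrary row count.
# preview_query is a hypothetical helper, not part of Hivemall.
preview_query() {
  limit="${1:-5}"
  # Reject anything that is not a plain non-negative integer.
  case "$limit" in
    *[!0-9]*) echo "limit must be a non-negative integer" >&2; return 1 ;;
  esac
  printf 'use iris; select * from iris_raw limit %s;\n' "$limit"
}

# Inside the container, pass the query to Hive non-interactively:
#   hive -e "$(preview_query 5)"
preview_query 5
```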
Once you have prepared the iris database, you are ready to move on to our multi-class classification tutorial.
Build Hivemall (optional)
In the container, Hivemall resources are stored in $HIVEMALL_PATH.
You can build the Hivemall package with cd $HIVEMALL_PATH && ./bin/build.sh.
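If you script the build, it can be useful to fail early when $HIVEMALL_PATH is missing, for example when the script is run on the host by mistake. The following is a minimal sketch. `build_hivemall` is a hypothetical wrapper around the command above.

```shell
#!/bin/sh
# Sketch: run the build only when $HIVEMALL_PATH points at a directory
# that contains bin/build.sh. build_hivemall is a hypothetical helper.
build_hivemall() {
  if [ -z "${HIVEMALL_PATH:-}" ]; then
    echo "HIVEMALL_PATH is not set; are you inside the container?" >&2
    return 1
  fi
  if [ ! -x "$HIVEMALL_PATH/bin/build.sh" ]; then
    echo "no executable bin/build.sh under $HIVEMALL_PATH" >&2
    return 1
  fi
  # The subshell keeps the caller's working directory unchanged.
  (cd "$HIVEMALL_PATH" && ./bin/build.sh)
}

# Usage, inside the container:
#   build_hivemall
```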