To create a working Plumb installation you’ll need a YARN/HDFS cluster (both come with Apache Hadoop; you’ll need version 3.1 or above) and a MySQL server (MariaDB is recommended). Please refer to the Hadoop and MariaDB documentation for installation instructions.
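As a quick sanity check of the Hadoop requirement, the sketch below compares the installed version against 3.1. It assumes the hadoop command is on the PATH and that GNU sort (with its -V version-sort option) is available; version_at_least is a hypothetical helper written for this guide, not something shipped with Plumb.

```shell
# Hypothetical helper (not part of Plumb): succeeds when version $1 >= version $2.
# Relies on the -V (version sort) option of GNU sort.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

if command -v hadoop >/dev/null 2>&1; then
    # The first line of "hadoop version" has the form "Hadoop X.Y.Z".
    have=$(hadoop version | awk 'NR==1 {print $2}')
    if version_at_least "$have" 3.1; then
        echo "Hadoop $have: OK"
    else
        echo "Hadoop $have: older than 3.1"
    fi
else
    echo "hadoop: not found on PATH"
fi
```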
In order to run Plumb you’ll need to set up two daemons:
HLE (Hard-Link Emulation) daemon. Written in Python. We recommend using a Python virtual environment to install dependencies and run this daemon. In this guide we’ll refer to it as hled.
Plumb Application Server. Written in Java. We’ll refer to it as plumbd. It can run on the same system as hled or on a different one.
For simplicity, we’ll run both daemons under the same user id as the one configured to run Hadoop.
In this document, we’ll install code under /opt/plumb:
mkdir /opt/plumb
cd /opt/plumb
wget https://ant.isi.edu/software/plumb/plumb-1.5.2.tar.gz
tar zxvf plumb-1.5.2.tar.gz
The above will create the /opt/plumb/lander_hard_link_emulation and /opt/plumb/BigDataProcessing directories, containing the HLE and Plumb codebases respectively.
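To confirm that the archive unpacked as described, a small check along these lines may help. It is only a sketch: check_plumb_tree is a hypothetical helper name, and /opt/plumb is the install location used in this guide.

```shell
# Hypothetical helper: report whether both expected directories exist under $1.
check_plumb_tree() {
    base=${1:-/opt/plumb}
    rc=0
    for d in lander_hard_link_emulation BigDataProcessing; do
        if [ -d "$base/$d" ]; then
            echo "ok: $base/$d"
        else
            echo "missing: $base/$d"
            rc=1
        fi
    done
    return $rc
}

check_plumb_tree /opt/plumb || echo "extraction looks incomplete"
```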
Create a dedicated MySQL database to hold all HLE records. Run the following script on the database server, passing three parameters:
lander_hard_link_emulation/scripts/create_database.sh <dbname> <username> <password>
When prompted, enter the MySQL root user’s password for that server.
We recommend installing hled in a Python 3 virtual environment. To set one up, run:
lander_hard_link_emulation/scripts/create_venv.sh [VENV_PATH]
This will create a virtual environment under the given VENV_PATH (VENV_PATH defaults to ./venv).
If you’re installing under the (existing) directory /opt/plumb, you’ll run:
cd /opt/plumb
lander_hard_link_emulation/scripts/create_venv.sh
Copy and edit the hled system configuration:
cp lander_hard_link_emulation/wip/conf/sysConfInfo.py-template lander_hard_link_emulation/wip/conf/sysConfInfo.py
#edit lander_hard_link_emulation/wip/conf/sysConfInfo.py
Edit the values marked xxx to record the details of your installation. In particular, update the values of <dbname>, <username>, and <password> to the values you entered when you created the database above.
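One way to catch a value that was missed is to search the edited file for any remaining xxx marker. The sketch below assumes the markers appear literally as lowercase xxx, as this guide writes them; find_placeholders is a hypothetical helper, and the same check could be pointed at any other file filled in from a template.

```shell
# Hypothetical helper: list lines still containing "xxx" in file $1.
# Returns 0 when none remain, 1 when some remain, 2 when the file is unreadable.
find_placeholders() {
    if [ ! -r "$1" ]; then
        echo "cannot read $1" >&2
        return 2
    fi
    if grep -n 'xxx' "$1"; then
        return 1
    fi
    return 0
}

conf=/opt/plumb/lander_hard_link_emulation/wip/conf/sysConfInfo.py
if find_placeholders "$conf"; then
    echo "no xxx placeholders left in $conf"
fi
```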
Copy and edit the hled (admin) user configuration:
cp lander_hard_link_emulation/wip/conf/usrConfInfo.py-template lander_hard_link_emulation/wip/conf/usrConfInfo.py
#edit lander_hard_link_emulation/wip/conf/usrConfInfo.py
Edit the values marked xxx to record the details of your installation. (TODO: add more details here)
Run:
cd lander_hard_link_emulation/wip/conf/
VENV_PATH/bin/python3 sysInit.py
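Replace VENV_PATH with the path used when creating the virtual environment. Because the commands above change directory first, use an absolute path (for example /opt/plumb/venv if you accepted the default while in /opt/plumb).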
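For illustration, the initialization step could be wrapped so that it fails early when the virtual environment is not where you expect. This is a sketch under assumptions: the default value /opt/plumb/venv corresponds to running create_venv.sh without arguments from /opt/plumb, and venv_python is a hypothetical helper.

```shell
# Assumption: the default ./venv was created while in /opt/plumb.
VENV_PATH=${VENV_PATH:-/opt/plumb/venv}

# Hypothetical helper: print the interpreter path of a venv, or fail if absent.
venv_python() {
    if [ -x "$1/bin/python3" ]; then
        echo "$1/bin/python3"
    else
        echo "no python3 under $1/bin" >&2
        return 1
    fi
}

if py=$(venv_python "$VENV_PATH"); then
    # Run in a subshell so the caller's working directory is left unchanged.
    ( cd /opt/plumb/lander_hard_link_emulation/wip/conf/ && "$py" sysInit.py )
fi
```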
Both the system and its users need keys to use the service: every individual Plumb user, as well as the admin user, must have a key. Key generation is described in a separate key generation document.
Copy and edit the plumbd configuration:
cp BigDataProcessing/JobServer/ApplicationServer/config-template BigDataProcessing/JobServer/ApplicationServer/config
#edit BigDataProcessing/JobServer/ApplicationServer/config
Edit the values marked xxx to record the details of your installation.
These commands build JARs necessary to run plumbd:
cd BigDataProcessing/JobServer/ApplicationServer
make
(This assumes you have java-jdk, maven, xxx-anything-else? installed OR should we distribute pre-built code?)
We recommend using systemd to start/stop hled and plumbd. Templates for these daemons are included in the distribution.
First, edit the systemd/hled.service and systemd/plumbd.service files and set the user and group names to those used to run Plumb (we recommend using the same user/group as the YARN/HDFS superuser). When this is done:
Put them into place:
sudo cp systemd/*.service /etc/systemd/system/
Reload the systemd list of daemons:
sudo systemctl daemon-reload
Start HLE and Plumb daemons using systemd:
sudo systemctl start hled
sudo systemctl start plumbd
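After starting the daemons, you may want to confirm that systemd considers them active. The following sketch uses systemctl is-active and assumes the unit names hled and plumbd, matching the template file names; the SYSTEMCTL variable and the unit_state helper are hypothetical conveniences, not part of the distribution.

```shell
# Allow the systemctl binary to be overridden (hypothetical hook, e.g. for dry runs).
SYSTEMCTL=${SYSTEMCTL:-systemctl}

# Hypothetical helper: print whether unit $1 is currently active.
unit_state() {
    if "$SYSTEMCTL" is-active --quiet "$1" 2>/dev/null; then
        echo "$1: running"
    else
        echo "$1: not running"
    fi
}

for unit in hled plumbd; do
    unit_state "$unit"
done
```

If a unit is reported as not running, its log output can usually be inspected with journalctl -u followed by the unit name.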