The general flow for setting up a Kerberized environment is as follows:
-
Create keytab files for services (Vertica and HDFS)
-
Create /etc/krb5.conf files for services
-
Kerberize services
On the client side:
-
Setup JAAS configuration file
-
Pass in Kerberos options to the connector
We will start with the service configuration. Click here to go to the Client configuration guide. Note that the steps are meant to be followed in order.
Depending on whether you use Linux KDC or Active Directory as your KDC, you should follow one of the two following guides. In both guides, we assume that the Kerberos realm is example.com.
On CentOS, yum install krb5-server krb5-libs krb5-workstation (These packages will be different depending on which Linux distro you use)
In the /etc folder, create a file named krb5.conf with the following contents:
[logging]
default = FILE:/var/logkrb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
[libdefaults]
default_realm = EXAMPLE.COM
dns_lookup_realm = false
dns_lookup_kdc = false
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
[realms]
EXAMPLE.COM = {
kdc = localhost
admin_server = localhost
}
[domain_realm]
.example.com = EXAMPLE.COM
example.com = EXAMPLE.COM
Neither Vertica or the connector will automatically renew Kerberos tickets. You should make sure the ticket lifetime is greater than a given Spark operation will take to complete, or have your own mechanism for renewing such tickets.
Run the following commands:
kdb5_util -P 'admin' create
systemctl start kadmin.service
systemctl start krb5kdc.service
# Create admin
kadmin.local -q "addprinc -pw admin admin/admin"
echo "*/[email protected] *" | tee -a /var/kerberos/krb5kdc/kadm5.acl
# Add user principal. Here, we have decided to create a user named user1 with password user1.
kadmin.local -q "addprinc -pw user1 user1"
kadmin.local -q "addprinc -randkey vertica/[email protected]"
kadmin.local -q "ktadd -norandkey -k vertica.keytab vertica/[email protected]"
# Create service principals and keytab files for Vertica and HDFS
kadmin.local -q "addprinc -randkey hdfs/[email protected]"
kadmin.local -q "ktadd -norandkey -k hdfs.keytab hdfs/[email protected]"
Before moving on to the next step, copy the newly created vertica.keytab file to the / directory on the Vertica host. Copy the hdfs.keytab file to the /root directory on the HDFS host.
Run the following powershell script to create the Vertica and HDFS service users and keytab files:
# Remove vertica user if it exists already
dsrm "CN=vertica,CN=Users,DC=example,DC=com" -noprompt
# Create service user named vertica in the example.com realm
dsadd user "CN=vertica,CN=Users,DC=example,DC=com" -pwd $$V3rtic4$$
# Add the service principal name to a user by adding the ldap attribute to the cn of the user. The SPN will be vertica/vertica.example.com.
setspn -a vertica/vertica.example.com vertica
# Export keytab as vertica.keytab and map user
ktpass /princ vertica/[email protected] /out vertica.keytab /mapuser example\vertica /crypto ALL /pass $$V3rtic4$$ /ptype KRB5_NT_PRINCIPAL
# Remove hdfs user if it exists already
dsrm "CN=hdfs,CN=Users,DC=example,DC=com" -noprompt
# Create service user named hdfs in the example.com realm
dsadd user "CN=hdfs,CN=Users,DC=example,DC=com" -pwd $$V3rtic4$$
# Add the service principal name to a user by adding the ldap attribute to the cn of the user. The SPN will be hdfs/hdfs.example.com.
setspn -a hdfs/hdfs.example.com hdfs
# Export keytab as hdfs.keytab and map user
ktpass /princ hdfs/[email protected] /out hdfs.keytab /mapuser example\hdfs /crypto ALL /pass $$V3rtic4$$ /ptype KRB5_NT_PRINCIPAL
Before moving on to the next step, copy the newly created vertica.keytab file to the / directory on the Vertica host. Copy the hdfs.keytab file to the /root directory on the HDFS host. Also, create a user principal for testing Kerberos connections later. Here, we have decided to create a user named "user1".
dsadd user "CN=user1,CN=Users,DC=example,DC=com" -pwd $$V3rtic4$$
In the /etc folder, create a file named krb5.conf with the following contents:
[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
[libdefaults]
default_realm = EXAMPLE.COM
dns_lookup_realm = false
dns_lookup_kdc = false
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
[realms]
EXAMPLE.COM = {
kdc = kerberos.example.com
admin_server = kerberos.example.com
}
[domain_realm]
.example.com = EXAMPLE.COM
example.com = EXAMPLE.COM
Note that kerberos.example.com is the hostname of our KDC. Yours may be different, so change this accordingly.
Execute the following SQL commands in vsql:
ALTER DATABASE exampledb SET KerberosServiceName = 'vertica';
ALTER DATABASE exampledb SET KerberosHostName = 'vertica.example.com';
ALTER DATABASE exampledb SET KerberosRealm = 'EXAMPLE.COM';
ALTER DATABASE exampledb SET KerberosKeytabFile = '/vertica.keytab';
CREATE USER user1;
GRANT ALL ON SCHEMA PUBLIC TO user1;
CREATE AUTHENTICATION kerberos METHOD 'gss' HOST '0.0.0.0/0';
ALTER AUTHENTICATION kerberos enable;
GRANT AUTHENTICATION kerberos TO user1;
CREATE AUTHENTICATION debug METHOD 'trust' HOST '0.0.0.0/0';
ALTER AUTHENTICATION debug enable;
GRANT AUTHENTICATION debug TO dbadmin;
Restart the Vertica database:
/opt/vertica/bin/adminTools -t stop_db -d exampledb
/opt/vertica/bin/adminTools -t start_db -d exampledb
Make sure that the dbadmin user owns the keytab file: chown dbadmin /vertica.keytab
In the /etc folder, create a file named krb5.conf with the following contents:
[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
[libdefaults]
default_realm = EXAMPLE.COM
dns_lookup_realm = false
dns_lookup_kdc = false
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
[realms]
EXAMPLE.COM = {
kdc = kerberos.example.com
admin_server = kerberos.example.com
}
[domain_realm]
.example.com = EXAMPLE.COM
example.com = EXAMPLE.COM
Update core-site.xml to set the following properties:
hadoop.security.authentication = kerberos
hadoop.security.authorization = truehadoop.security.auth_to_local =
RULE:[2:$1/$2@$0](.*/.*@EXAMPLE.COM)s/.*/root/
DEFAULT
hadoop.rpc.protection = authentication
hadoop.proxyuser.root.groups = *
hadoop.proxyuser.root.hosts = *
hadoop.proxyuser.superuser.hosts = *
hadoop.proxyuser.superuser.groups = *
Below is an example core-site.xml file that we have successfully tested Kerberos with:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hdfs.example.com:8020</value>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
</property>
<property>
<name>hadoop.security.authorization</name>
<value>true</value>
</property>
<property>
<name>hadoop.security.auth_to_local</name>
<value>
RULE:[2:$1/$2@$0](.*/.*@EXAMPLE.COM)s/.*/root/
DEFAULT
</value>
</property>
<property>
<name>hadoop.rpc.protection</name>
<value>authentication</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.superuser.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.superuser.groups</name>
<value>*</value>
</property>
</configuration>
Update hdfs-site.xml to set the following properties:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.secondary.http.address</name>
<value>hdfs.example.com:50090</value>
<description>
The secondary namenode http server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.datanode.address</name>
<value>hdfs.example.com:50010</value>
<description>
The address where the datanode server will listen to.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>hdfs.example.com:50075</value>
<description>
The datanode http server address and port.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.data.transfer.protection</name>
<value>authentication</value>
</property>
<property>
<name>dfs.http.policy</name>
<value>HTTPS_ONLY</value>
</property>
<property>
<name>dfs.http.address</name>
<value>hdfs.example.com:50070</value>
<description>
The address and the base port where the dfs namenode web ui will listen on.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.namenode.https-address</name>
<value>hdfs.example.com:50070</value>
<description>
The address and the base port where the dfs namenode web ui will listen on.
If the port is 0 then the server will start on a free port.
</description>
</property>
<property>
<name>dfs.namenode.rpc-bind-host</name>
<value>hdfs</value>
<description>
The actual address the RPC server will bind to. If this optional address is
set, it overrides only the hostname portion of dfs.namenode.rpc-address.
It can also be specified per name node or name service for HA/Federation.
This is useful for making the name node listen on all interfaces by
setting it to 0.0.0.0.
</description>
</property>
<property>
<name>dfs.namenode.servicerpc-bind-host</name>
<value>hdfs</value>
<description>
The actual address the service RPC server will bind to. If this optional address is
set, it overrides only the hostname portion of dfs.namenode.servicerpc-address.
It can also be specified per name node or name service for HA/Federation.
This is useful for making the name node listen on all interfaces by
setting it to 0.0.0.0.
</description>
</property>
<property>
<name>dfs.namenode.http-bind-host</name>
<value>hdfs</value>
<description>
The actual adress the HTTP server will bind to. If this optional address
is set, it overrides only the hostname portion of dfs.namenode.http-address.
It can also be specified per name node or name service for HA/Federation.
This is useful for making the name node HTTP server listen on all
interfaces by setting it to 0.0.0.0.
</description>
</property>
<property>
<name>dfs.namenode.https-bind-host</name>
<value>hdfs</value>
<description>
The actual adress the HTTPS server will bind to. If this optional address
is set, it overrides only the hostname portion of dfs.namenode.https-address.
It can also be specified per name node or name service for HA/Federation.
This is useful for making the name node HTTPS server listen on all
interfaces by setting it to 0.0.0.0.
</description>
</property>
<!-- Enable NFS -->
<!-- https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html -->
<property>
<name>dfs.namenode.accesstime.precision</name>
<value>3600000</value>
<description>
The access time for HDFS file is precise up to this value.
The default value is 1 hour. Setting a value of 0 disables
access times for HDFS.
</description>
</property>
<property>
<name>dfs.datanode.data.dir.perm</name>
<value>700</value>
</property>
<property>
<name>dfs.namenode.kerberos.principal</name>
<value>hdfs/[email protected]</value>
</property>
<property>
<name>dfs.namenode.keytab.file</name>
<value>/root/hdfs.keytab</value>
</property>
<property>
<name>dfs.web.authentication.kerberos.principal</name>
<value>hdfs/[email protected]</value>
</property>
<property>
<name>dfs.web.authentication.keytab.file</name>
<value>/root/hdfs.keytab</value>
</property>
<property>
<name>dfs.datanode.kerberos.principal</name>
<value>hdfs/[email protected]</value>
</property>
<property>
<name>dfs.datanode.keytab.file</name>
<value>/root/hdfs.keytab</value>
</property>
<property>
<name>dfs.block.access.token.enable</name>
<value>true</value>
</property>
<property>
<name>dfs.nfs3.dump.dir</name>
<value>/data/hdfs-nfs/</value>
</property>
<property>
<name>dfs.nfs.exports.allowed.hosts</name>
<value>* rw</value>
</property>
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>true</value>
</property>
</configuration>
In order for Vertica to communicate with HDFS, you will need to make HDFS an SSL server and Vertica an SSL client.
On the HDFS host, in the same directory as the other HDFS configuration files, create/update ssl-server.xml with the following properties:
<configuration>
<property>
<name>ssl.server.keystore.keypassword</name>
<value>password</value>
</property>
<property>
<name>ssl.server.keystore.password</name>
<value>password</value>
</property>
<property>
<name>ssl.server.keystore.location</name>
<value>/root/.keystore</value>
</property>
</configuration>
Still on the HDFS host, run the following commands:
keytool -genkey -keyalg RSA -alias hdfs -keystore /root/.keystore -validity 500 -keysize 2048
keytool -export -alias hdfs -keystore /root/.keystore -rfc -file hdfs.cert
Copy hdfs.cert to the / directory on the Vertica host.
On the Vertica host, run the following commands:
mkdir -p /etc/pki/tls/certs/
cp /etc/ssl/certs/ca-certificates.crt /etc/pki/tls/certs/ca-bundle.crt
cat /hdfs.cert >> /etc/pki/tls/certs/ca-bundle.crt
In the /etc folder, create a file named krb5.conf with the following contents:
[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
[libdefaults]
default_realm = EXAMPLE.COM
dns_lookup_realm = false
dns_lookup_kdc = false
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
[realms]
EXAMPLE.COM = {
kdc = kerberos.example.com
admin_server = kerberos.example.com
}
[domain_realm]
.example.com = EXAMPLE.COM
example.com = EXAMPLE.COM
Add -Djava.security.auth.login.config=/jaas.config
to your JAVA_OPTS environment variable.
Create the JAAS config in the / directory (the path specified in the previous step):
Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=false
useTicketCache=true
doNotPrompt=true;
};
Since we specified in the JAAS config to use the ticket cache, we will need to run kinit user1. When prompted for the password, enter "user1", as we specified earlier when creating the user principal.
Run your client application code passing in the following options to Spark:
user="user1"
kerberos_service_name="vertica"
kerberos_host_name="vertica.example.com"
jaas_config_name="Client"
Note: This is not a complete list. You will still need to pass in the host name, db, port, etc.
If Kerberos is properly configured, then you'll be able to successfully read/write from/to Vertica without needing to provide a password.