
3FS Setup Guide

This guide describes a manual deployment of a six-node cluster with the cluster ID stage.

Installation prerequisites

Hardware specifications

| Node     | OS           | IP          | Memory | SSD       | RDMA |
|----------|--------------|-------------|--------|-----------|------|
| meta     | Ubuntu 22.04 | 192.168.1.1 | 128GB  | -         | RoCE |
| storage1 | Ubuntu 22.04 | 192.168.1.2 | 512GB  | 14TB × 16 | RoCE |
| storage2 | Ubuntu 22.04 | 192.168.1.3 | 512GB  | 14TB × 16 | RoCE |
| storage3 | Ubuntu 22.04 | 192.168.1.4 | 512GB  | 14TB × 16 | RoCE |
| storage4 | Ubuntu 22.04 | 192.168.1.5 | 512GB  | 14TB × 16 | RoCE |
| storage5 | Ubuntu 22.04 | 192.168.1.6 | 512GB  | 14TB × 16 | RoCE |

RDMA Configuration

  1. Assign IP addresses to RDMA NICs. Multiple RDMA NICs (InfiniBand or RoCE) are supported on each node.
  2. Check RDMA connectivity between nodes using ib_write_bw.
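
For step 2, a minimal check with ib_write_bw from the perftest package looks like the sketch below; the device name mlx5_0 is an assumption, substitute your own NIC.

    # On one node, start the ib_write_bw server
    ib_write_bw -d mlx5_0
    # On another node, run the client against the first node's RDMA IP
    ib_write_bw -d mlx5_0 192.168.1.1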

Third-party dependencies

In a production environment, it is recommended to install FoundationDB and ClickHouse on dedicated nodes. In this guide, both are installed on the meta node:

| Service      | Node |
|--------------|------|
| ClickHouse   | meta |
| FoundationDB | meta |

FoundationDB

  1. Ensure that the version of the FoundationDB client matches the server version, or copy the matching version of libfdb_c.so to maintain compatibility.
  2. On nodes with FoundationDB installed, the fdb.cluster file and libfdb_c.so can be found at /etc/foundationdb/fdb.cluster and /usr/lib/libfdb_c.so respectively.
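
As a sketch of both points, assuming FoundationDB runs on the meta node as in the table above: check the server version there, then distribute the matching client library and cluster file to the other nodes.

    # On the meta node: print the FoundationDB version
    fdbcli --version
    # From any other node: pull the matching client library and cluster file
    rsync -avz meta:/usr/lib/libfdb_c.so /usr/lib/
    rsync -avz meta:/etc/foundationdb/fdb.cluster /etc/foundationdb/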

Step 0: Build 3FS

Follow the build instructions in the 3FS repository to build 3FS. Binaries can be found in build/bin.
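
For reference, a condensed sketch of the build, assuming the steps in the main 3FS README; consult that document for the authoritative dependency list and compiler versions.

    git clone https://github.com/deepseek-ai/3FS ~/3fs
    cd ~/3fs
    git submodule update --init --recursive
    ./patches/apply.sh
    cmake -S . -B build -DCMAKE_CXX_COMPILER=clang++-14 -DCMAKE_C_COMPILER=clang-14 -DCMAKE_BUILD_TYPE=RelWithDebInfo
    cmake --build build -j 32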

Services and clients

The following steps show how to install 3FS services in /opt/3fs/bin and the config files in /opt/3fs/etc.

| Service   | Binary                 | Config files | NodeID | Node |
|-----------|------------------------|--------------|--------|------|
| monitor   | monitor_collector_main | monitor_collector_main.toml | - | meta |
| admin_cli | admin_cli              | admin_cli.toml<br>fdb.cluster | - | meta<br>storage1<br>storage2<br>storage3<br>storage4<br>storage5 |
| mgmtd     | mgmtd_main             | mgmtd_main_launcher.toml<br>mgmtd_main.toml<br>mgmtd_main_app.toml<br>fdb.cluster | 1 | meta |
| meta      | meta_main              | meta_main_launcher.toml<br>meta_main.toml<br>meta_main_app.toml<br>fdb.cluster | 100 | meta |
| storage   | storage_main           | storage_main_launcher.toml<br>storage_main.toml<br>storage_main_app.toml | 10001~10005 | storage1<br>storage2<br>storage3<br>storage4<br>storage5 |
| client    | hf3fs_fuse_main        | hf3fs_fuse_main_launcher.toml<br>hf3fs_fuse_main.toml | - | meta |

Step 1: Create ClickHouse tables for metrics

Import the SQL file into ClickHouse:

clickhouse-client -n < ~/3fs/deploy/sql/3fs-monitor.sql
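
To confirm the import succeeded, list the created tables; this assumes 3fs-monitor.sql creates them in a database named 3fs, matching the db setting used in Step 2.

    clickhouse-client -q 'SHOW DATABASES'
    clickhouse-client -d 3fs -q 'SHOW TABLES'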

Step 2: Monitor service

Install the monitor_collector service on the meta node.

  1. Copy monitor_collector_main to /opt/3fs/bin and config files to /opt/3fs/etc, and create log directory /var/log/3fs.
    mkdir -p /opt/3fs/{bin,etc}
    mkdir -p /var/log/3fs
    cp ~/3fs/build/bin/monitor_collector_main /opt/3fs/bin
    cp ~/3fs/configs/monitor_collector_main.toml /opt/3fs/etc
  2. Update monitor_collector_main.toml to add a ClickHouse connection:
    [server.monitor_collector.reporter]
    type = 'clickhouse'
    
    [server.monitor_collector.reporter.clickhouse]
    db = '3fs'
    host = '<CH_HOST>'
    passwd = '<CH_PASSWD>'
    port = '<CH_PORT>'
    user = '<CH_USER>'
  3. Start monitor service:
    cp ~/3fs/deploy/systemd/monitor_collector_main.service /usr/lib/systemd/system
    systemctl start monitor_collector_main

Note that

  • Multiple instances of monitor services can be deployed behind a virtual IP address to share the traffic.
  • Other services communicate with the monitor service over a TCP connection.
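
Standard systemd checks can be used to confirm the service came up; the unit name matches the service file installed above.

    systemctl status monitor_collector_main
    journalctl -u monitor_collector_main -e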

Step 3: Admin client

Install admin_cli on all nodes.

  1. Copy admin_cli to /opt/3fs/bin and config files to /opt/3fs/etc.
    mkdir -p /opt/3fs/{bin,etc}
    rsync -avz meta:~/3fs/build/bin/admin_cli /opt/3fs/bin
    rsync -avz meta:~/3fs/configs/admin_cli.toml /opt/3fs/etc
    rsync -avz meta:/etc/foundationdb/fdb.cluster /opt/3fs/etc
  2. Update admin_cli.toml to set cluster_id and clusterFile:
    cluster_id = "stage"
    
    [fdb]
    clusterFile = '/opt/3fs/etc/fdb.cluster'

The full help documentation for admin_cli can be displayed by running the following command:

/opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml help

Step 4: Mgmtd service

Install the mgmtd service on the meta node.

  1. Copy mgmtd_main to /opt/3fs/bin and config files to /opt/3fs/etc.

    cp ~/3fs/build/bin/mgmtd_main /opt/3fs/bin
    cp ~/3fs/configs/{mgmtd_main.toml,mgmtd_main_launcher.toml,mgmtd_main_app.toml} /opt/3fs/etc
  2. Update config files:

    cluster_id = "stage"
    
    [fdb]
    clusterFile = '/opt/3fs/etc/fdb.cluster'
    [common.monitor.reporters.monitor_collector]
    remote_ip = "192.168.1.1:10000"
  3. Initialize the cluster:

    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml "init-cluster --mgmtd /opt/3fs/etc/mgmtd_main.toml 1 1048576 16"

    The trailing parameters of init-cluster:

    • 1: the chain table ID
    • 1048576: the chunk size in bytes
    • 16: the file stripe size

    Run help init-cluster for full documentation.

  4. Start mgmtd service:

    cp ~/3fs/deploy/systemd/mgmtd_main.service /usr/lib/systemd/system
    systemctl start mgmtd_main
  5. Run the list-nodes command to check whether the cluster has been successfully initialized:

    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "list-nodes"

If multiple instances of the mgmtd service are deployed, one is elected as the primary; the others are secondaries. Automatic failover occurs when the primary fails.


Step 5: Meta service

Install the meta service on the meta node.

  1. Copy meta_main to /opt/3fs/bin and config files to /opt/3fs/etc.
    cp ~/3fs/build/bin/meta_main /opt/3fs/bin
    cp ~/3fs/configs/{meta_main_launcher.toml,meta_main.toml,meta_main_app.toml} /opt/3fs/etc
  2. Update config files:
    cluster_id = "stage"
    
    [mgmtd_client]
    mgmtd_server_addresses = ["RDMA://192.168.1.1:8000"]
    [server.mgmtd_client]
    mgmtd_server_addresses = ["RDMA://192.168.1.1:8000"]
    
    [common.monitor.reporters.monitor_collector]
    remote_ip = "192.168.1.1:10000"
    
    [server.fdb]
    clusterFile = '/opt/3fs/etc/fdb.cluster'
  3. The config file of the meta service is managed by the mgmtd service. Use admin_cli to upload the config file to mgmtd:
    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "set-config --type META --file /opt/3fs/etc/meta_main.toml"
  4. Start meta service:
    cp ~/3fs/deploy/systemd/meta_main.service /usr/lib/systemd/system
    systemctl start meta_main
  5. Run the list-nodes command to check whether the meta service has joined the cluster:
    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "list-nodes"

If multiple instances of the meta service are deployed, meta requests are evenly distributed across all instances.


Step 6: Storage service

Install the storage service on each storage node.

  1. Format the 16 attached SSDs as XFS, mount them at /storage/data{1..16}, then create the data directories /storage/data{1..16}/3fs and the log directory /var/log/3fs.
    mkdir -p /storage/data{1..16}
    mkdir -p /var/log/3fs
    for i in {1..16}; do
      mkfs.xfs -L data${i} /dev/nvme${i}n1
      mount -o noatime,nodiratime -L data${i} /storage/data${i}
    done
    mkdir -p /storage/data{1..16}/3fs
  2. Increase the maximum number of asynchronous I/O requests (see the persistence note after this list):
    sysctl -w fs.aio-max-nr=67108864
  3. Copy storage_main to /opt/3fs/bin and config files to /opt/3fs/etc.
    rsync -avz meta:~/3fs/build/bin/storage_main /opt/3fs/bin
    rsync -avz meta:~/3fs/configs/{storage_main_launcher.toml,storage_main.toml,storage_main_app.toml} /opt/3fs/etc
  4. Update config files:
    cluster_id = "stage"
    
    [mgmtd_client]
    mgmtd_server_addresses = ["RDMA://192.168.1.1:8000"]
    [server.mgmtd]
    mgmtd_server_addresses = ["RDMA://192.168.1.1:8000"]
    
    [common.monitor.reporters.monitor_collector]
    remote_ip = "192.168.1.1:10000"
    
    [server.targets]
    target_paths = [
        "/storage/data1/3fs",
        "/storage/data2/3fs",
        "/storage/data3/3fs",
        "/storage/data4/3fs",
        "/storage/data5/3fs",
        "/storage/data6/3fs",
        "/storage/data7/3fs",
        "/storage/data8/3fs",
        "/storage/data9/3fs",
        "/storage/data10/3fs",
        "/storage/data11/3fs",
        "/storage/data12/3fs",
        "/storage/data13/3fs",
        "/storage/data14/3fs",
        "/storage/data15/3fs",
        "/storage/data16/3fs",
    ]
  5. The config file of the storage service is managed by the mgmtd service. Use admin_cli to upload the config file to mgmtd:
    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "set-config --type STORAGE --file /opt/3fs/etc/storage_main.toml"
  6. Start storage service:
    rsync -avz meta:~/3fs/deploy/systemd/storage_main.service /usr/lib/systemd/system
    systemctl start storage_main
  7. Run the list-nodes command to check whether the storage services have joined the cluster:
    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "list-nodes"
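
As noted in step 2, a value set with sysctl -w does not survive a reboot. A standard way to persist it (the drop-in file name is an arbitrary choice):

    echo 'fs.aio-max-nr = 67108864' > /etc/sysctl.d/99-3fs.conf
    sysctl --system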
    

Step 7: Create admin user, storage targets and chain table

  1. Create an admin user:
    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "user-add --root --admin 0 root"
    The admin token is printed to the console; save it to /opt/3fs/etc/token.txt.
  2. Generate admin_cli commands to create storage targets on the 5 storage nodes (16 SSDs per node, 6 targets per SSD).
    • Install the required Python packages:
    pip install -r ~/3fs/deploy/data_placement/requirements.txt
    python ~/3fs/deploy/data_placement/src/model/data_placement.py \
       -ql -relax -type CR --num_nodes 5 --replication_factor 3 --min_targets_per_disk 6
    python ~/3fs/deploy/data_placement/src/setup/gen_chain_table.py \
       --chain_table_type CR --node_id_begin 10001 --node_id_end 10005 \
       --num_disks_per_node 16 --num_targets_per_disk 6 \
       --target_id_prefix 1 --chain_id_prefix 9 \
       --incidence_matrix_path output/DataPlacementModel-v_5-b_10-r_6-k_3-λ_2-lb_1-ub_1/incidence_matrix.pickle
    The following 3 files will be generated in the output directory: create_target_cmd.txt, generated_chains.csv, and generated_chain_table.csv.
  3. Create storage targets:
    /opt/3fs/bin/admin_cli --cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' --config.user_info.token $(<"/opt/3fs/etc/token.txt") < output/create_target_cmd.txt
  4. Upload chains to mgmtd service:
    /opt/3fs/bin/admin_cli --cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' --config.user_info.token $(<"/opt/3fs/etc/token.txt") "upload-chains output/generated_chains.csv"
  5. Upload chain table to mgmtd service:
    /opt/3fs/bin/admin_cli --cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' --config.user_info.token $(<"/opt/3fs/etc/token.txt") "upload-chain-table --desc stage 1 output/generated_chain_table.csv"
  6. List chains and chain tables to check if they have been correctly uploaded:
    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "list-chains"
    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "list-chain-tables"

Step 8: FUSE client

For simplicity, the FUSE client is deployed on the meta node in this guide. However, we strongly advise against deploying clients on service nodes in a production environment.

  1. Copy hf3fs_fuse_main to /opt/3fs/bin and config files to /opt/3fs/etc.
    cp ~/3fs/build/bin/hf3fs_fuse_main /opt/3fs/bin
    cp ~/3fs/configs/{hf3fs_fuse_main_launcher.toml,hf3fs_fuse_main.toml,hf3fs_fuse_main_app.toml} /opt/3fs/etc
  2. Create the mount point:
    mkdir -p /3fs/stage
  3. Set the cluster ID, mountpoint, token file, and mgmtd address in hf3fs_fuse_main_launcher.toml:
    cluster_id = "stage"
    mountpoint = '/3fs/stage'
    token_file = '/opt/3fs/etc/token.txt'
    
    [mgmtd_client]
    mgmtd_server_addresses = ["RDMA://192.168.1.1:8000"]
  4. Set the mgmtd and monitor addresses in hf3fs_fuse_main.toml:
    [mgmtd]
    mgmtd_server_addresses = ["RDMA://192.168.1.1:8000"]
    
    [common.monitor.reporters.monitor_collector]
    remote_ip = "192.168.1.1:10000"
  5. The config file of the FUSE client is also managed by the mgmtd service. Use admin_cli to upload the config file to mgmtd:
    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "set-config --type FUSE --file /opt/3fs/etc/hf3fs_fuse_main.toml"
  6. Start FUSE client:
    cp ~/3fs/deploy/systemd/hf3fs_fuse_main.service /usr/lib/systemd/system
    systemctl start hf3fs_fuse_main
  7. Check if 3FS has been mounted at /3fs/stage:
    mount | grep '/3fs/stage'
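
A quick smoke test of the new mount; the file name is arbitrary.

    echo hello > /3fs/stage/hello.txt
    cat /3fs/stage/hello.txt
    rm /3fs/stage/hello.txt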

FAQ

How to troubleshoot admin_cli init-cluster error?

If mgmtd fails to start after running init-cluster, the most likely cause is an error in mgmtd_main.toml. Any change to this file requires clearing all FoundationDB data and re-running init-cluster.
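
One way to clear the data, assuming the FoundationDB cluster is dedicated to 3FS; this is destructive and erases everything stored in it.

    # Wipe the entire FoundationDB keyspace, then re-run init-cluster
    fdbcli --exec 'writemode on; clearrange "" \xff'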


How to build a single-node cluster?

A minimum of two storage services is required for data replication. If --num_nodes is set to 1, the gen_chain_table.py script will fail. In a test environment, this limitation can be bypassed by deploying multiple storage services on a single machine, as sketched below.
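
As an illustrative sketch only: with two storage services on one machine (each with its own node ID, port, and target paths), a matching placement could be generated with the same script from Step 7; the parameter values here are hypothetical and may need tuning before the solver finds a feasible layout.

    python ~/3fs/deploy/data_placement/src/model/data_placement.py \
       -ql -relax -type CR --num_nodes 2 --replication_factor 2 --min_targets_per_disk 6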


How to update config files?

All config files are managed by mgmtd. Whenever one of the *_main.toml files is updated, e.g. storage_main.toml, the modified file should be uploaded to mgmtd using admin_cli set-config, as shown below.
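
For example, to push an updated storage_main.toml, re-run the same command used in Step 6:

    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://192.168.1.1:8000"]' "set-config --type STORAGE --file /opt/3fs/etc/storage_main.toml"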


How to troubleshoot common deployment issues?

If you encounter an error during deployment:

  • Check the log messages in stdout/stderr using journalctl, especially during service startup (see the example after this list).
  • Check log files stored in /var/log/3fs/ on service and client nodes.
  • Ensure that the directory /var/log/3fs/ exists before starting any service.
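
For example, to follow the startup logs of the storage service on a storage node (unit names match the systemd service files installed in this guide):

    journalctl -u storage_main -f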