From acdab39d9d02688edf5647ab69d73cba9fda40c4 Mon Sep 17 00:00:00 2001 From: jorgeantonio21 Date: Wed, 18 Sep 2024 22:35:10 +0100 Subject: [PATCH 1/3] first commit --- website/atoma-basics.md | 135 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 135 insertions(+) create mode 100644 website/atoma-basics.md diff --git a/website/atoma-basics.md b/website/atoma-basics.md new file mode 100644 index 0000000..7d549aa --- /dev/null +++ b/website/atoma-basics.md @@ -0,0 +1,135 @@ +# Compute Layer + +Atoma is revolutionizing the AI landscape with its innovative decentralized compute infrastructure. This section outlines the core components and unique features of Atoma's Compute Layer, highlighting how it addresses the growing demand for secure, efficient, and scalable AI services. + +## Atoma's decentralized Verifiable and Private AI Cloud + +Atoma's Compute Layer is powered by a decentralized network of execution nodes that handle AI workloads. This network pools compute power from permissionless nodes, equipped with GPUs or AI-specific hardware such as TPUs and XPUs. The architecture is designed to meet the growing demand for decentralized AI services, with a focus on performance and security tailored to AI computation. + +Atoma's Compute Layer is built for efficiency and is driven by a combination of economic incentives, robust tokenomics, and the increasing demand for decentralized AI services. Unlike conventional GPU-based DePiN networks, Atoma introduces advanced performance and security mechanisms tailored specifically to AI computation. + +We are aggregating compute from professional data centers equipped with the latest high-performance GPUs, as well as from retail-grade machines equipped with consumer retail GPUs, including Macbooks pros, by using MLX and Metal kernels, etc. + +## Key Differentiators from DePiN Networks + +While DePiN networks generally concentrate on pooling computational resources and managing transactions, Atoma adopts a more tailored strategy. Nodes within the Atoma Network opt into particular AI processing tasks, including AI inference (executing models on input data), model refinement, AI data embedding, and model development. + +Additionally, Atoma stands out with its robust security protocols. By utilizing a Sampling Consensus protocol and Trusted Execution Environments (TEEs), the network ensures that every computation is safeguarded from tampering. This is essential for the integrity of generative AI outputs, particularly for end-user-facing applications where reliable results are critical. + +## Atoma's Free Market for Compute + +Atoma implements a dynamic, efficient marketplace for AI compute resources: + +- **Intelligent Request Routing**: User requests are automatically directed to the most suitable nodes based on a multi-dimensional criteria set, which includes: + - Cost + - Uptime + - Privacy features + - Response times + - Hardware capabilities + - Current workload + +- **Optimized Performance**: This smart routing ensures each request is processed efficiently, balancing performance and cost-effectiveness. This will ultimately lead to a fairer market for having access to AI compute resources. + +- **Sampling Consensus for Trust**: Atoma's own Sampling Consensus algorithm combined with TEEs provides high-assurance verification of node reliability, fostering a trustworthy ecosystem. + +- **Transparent Pricing**: Node operators set competitive rates, while users benefit from clear, market-driven pricing. 
In this way, nodes can bid their compute power at a fair market price, while users can have the flexibility to pick and choose the best node for their needs. + +- **Flexible Resource Allocation**: The network adapts in real-time to fluctuating demand, scaling resources as needed. + +This approach creates a robust, decentralized marketplace for AI compute power, combining reliability, efficiency, and economic incentives for all participants. + +## Node Reputation and Incentives + +### Node Reputation Mechanisms + +Atoma's network employs a sophisticated reputation system to ensure high-quality service and network integrity: + +- **Performance Metrics**: Nodes are evaluated on key factors including: + - Availability + - Execution speed + - Task completion rate + - Output accuracy + - Hardware capabilities + +- **Reward System**: Nodes earn rewards for: + - Successful task completion + - Maintaining high uptime + - Consistently meeting performance benchmarks + +- **Collateral Requirement**: Nodes must stake collateral to participate, which can be: + - Increased for higher-tier tasks + - Slashed for malicious behavior or repeated poor performance + +- **Dynamic Task Allocation**: Higher-reputation nodes receive priority for: + - More complex AI workloads + - Higher-value tasks + - Sensitive or privacy-focused computations + +### Trust and Security Measures + +- **Sampling Consensus**: Randomly selected nodes verify computations, ensuring result integrity without centralized oversight. + +- **Trusted Execution Environments (TEEs)**: Hardware-level isolation protects sensitive data and ensures tamper-proof execution. + +- **Transparent Reporting**: Node performance metrics are publicly available, fostering trust and enabling informed user choices. + +This multi-faceted approach creates a self-regulating ecosystem that incentivizes high performance, security, and reliability across the Atoma network. + + +## Atoma's optimized infrastructure + +Atoma leverages Rust's low-level speed and memory safety to power its decentralized AI infrastructure. Known for system efficiency, Rust is the de facto language for high-performance systems programming, integration with high security technologies such as TEEs, and integration with GPU programming frameworks such as CUDA and Metal programming. The combination of these features makes Rust the ideal language for Atoma's decentralized AI infrastructure. Moreover, instead of utilizing large legacy libraries such as PyTorch, which often leads to high memory usage and lower execution speed, Atoma adopts Candle, a lightweight, Rust-native AI framework maintained by HuggingFace. The compact binaries of Candle allow nodes, even at the network edge, to execute AI tasks with greater efficiency. + +For large-scale AI processing, such as processing large context-window LLM inference by the largest AI models, Atoma incorporates advanced techniques such as CUDA-based FlashAttention and PagedAttention, enhancing performance for both inference and training tasks. These optimizations ensure efficient scheduling of workloads, maximizing GPU utilization and enabling nodes to handle parallel requests seamlessly. Atoma's network scales both vertically and horizontally, supporting a growing number of nodes and cores to accommodate increasing computational demand. 
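To make the Candle choice concrete, below is a minimal sketch (not part of the Atoma node codebase) of the kind of GPU tensor operation that sits at the core of inference, written against the publicly available `candle-core` crate. It prefers a CUDA device when one is present and falls back to the CPU otherwise.

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    // Prefer a CUDA device when one is available; otherwise fall back to the CPU.
    let device = Device::cuda_if_available(0)?;

    // A toy projection step (y = x · W), the basic building block of transformer inference.
    let x = Tensor::randn(0f32, 1.0, (1, 4096), &device)?;
    let w = Tensor::randn(0f32, 1.0, (4096, 4096), &device)?;
    let y = x.matmul(&w)?;

    println!("output shape: {:?} on {:?}", y.shape(), y.device());
    Ok(())
}
```

Because Candle compiles to a single compact binary, a node only ships the kernels it actually needs, which is what allows edge-grade machines to serve models efficiently.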
+ +## Atoma at the Edge: Empowering Local AI + +Atoma extends its reach beyond decentralized cloud infrastructure to the edge, enabling powerful AI capabilities directly on users' devices: + +- **WASM and WebGPU Compatibility**: We are building a cutting-edge software stack that leverages WebAssembly (WASM) and WebGPU technologies, allowing for high-performance AI applications to run natively in browsers and on local devices. + +- **Edge LLM Deployment**: Users can run compact yet powerful Language Models directly on their devices, ensuring privacy and reducing latency for AI-driven tasks. + +- **Comprehensive SDK**: Atoma provides developers with a robust toolkit to create innovative edge AI applications that seamlessly integrate with our decentralized compute layer. + +- **Data Ownership and Monetization**: This edge-centric approach empowers users and developers to retain control over AI-generated data. Through Atoma's tokenomics, this data can be ethically monetized in decentralized data marketplaces. + +- **Fueling Next-Gen AI**: The aggregated edge data becomes a valuable resource for training future generations of AI models, creating a virtuous cycle of innovation within the Atoma ecosystem. + +By bridging edge computing with our decentralized infrastructure, Atoma is fostering a new paradigm of accessible, private, and user-centric AI applications. + +## AI Infrastructure + +### Inference, Text Embeddings, and Fine-tuning + +Atoma's infrastructure is fully optimized to handle AI tasks like inference, text embeddings, fine-tuning etc. The network implements advanced techniques to accelerate inference, including: + +- **Flash Attention 2 and 3**: These techniques reduce the number of reads and writes from HBM (High Bandwidth Memory) on GPUs, leading to significant speed improvements in AI inference and training workloads. This results in faster processing times and more efficient use of hardware resources, particularly for large language models (LLMs). +- **vAttention**: A memory management mechanism that allocates large amounts of virtual memory for models, but efficiently assigns physical memory at runtime using minimal CPU and GPU resources. This allows for optimized memory usage, reducing overhead while running AI models. +- **vLLM**: Inspired by OS pagination techniques, vLLM handles the memory management of AI inference requests more efficiently. It uses virtual memory to ensure that large model requests are processed smoothly. + +### Multi-GPU Serving and Quantization Techniques + +Atoma enables multi-GPU serving, allowing the deployment of large language models (LLMs) across multiple GPUs to handle more extensive computations. This capability makes it possible to serve some of the largest available open-source models. + +To further enhance performance, the network utilizes various quantization techniques, such as: + +- INT8/INT4 Quantization +- FP8/FP4 Quantization + +These techniques enable more efficient model execution by reducing memory usage and computation costs, all while maintaining high performance. + +### RAG (Retrieval-Augmented Generation) Implementation + +Atoma will incorporate Retrieval-Augmented Generation (RAG) to enhance AI model performance by combining data retrieval with content generation. This approach improves the accuracy of AI outputs by using relevant external data during inference, making responses more contextually rich and reliable. 
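As an illustration of the retrieval step that RAG adds in front of generation, the following self-contained Rust sketch scores a small in-memory corpus against a query embedding by cosine similarity, keeps the top-k documents, and prepends them to the prompt. The function and field names are purely illustrative; in practice the embeddings would come from an embedding model served by the network.

```rust
/// Illustrative only: the retrieval step of a RAG pipeline.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b + 1e-12)
}

/// Return the `k` documents whose embeddings are closest to the query embedding.
fn retrieve_top_k<'a>(
    query_embedding: &[f32],
    corpus: &'a [(String, Vec<f32>)], // (document text, its embedding)
    k: usize,
) -> Vec<&'a str> {
    let mut scored: Vec<(f32, &str)> = corpus
        .iter()
        .map(|(doc, emb)| (cosine_similarity(query_embedding, emb), doc.as_str()))
        .collect();
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.into_iter().take(k).map(|(_, doc)| doc).collect()
}

/// Build the augmented prompt passed to the generation model.
fn build_augmented_prompt(question: &str, context: &[&str]) -> String {
    format!("Context:\n{}\n\nQuestion: {}", context.join("\n---\n"), question)
}

fn main() {
    let corpus = vec![
        ("Atoma nodes execute AI workloads.".to_string(), vec![0.9, 0.1, 0.0]),
        ("Prompts are settled on the Sui blockchain.".to_string(), vec![0.1, 0.8, 0.1]),
    ];
    let query_embedding = [0.85, 0.15, 0.0];
    let context = retrieve_top_k(&query_embedding, &corpus, 1);
    println!("{}", build_augmented_prompt("What do Atoma nodes do?", &context));
}
```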
## Future Roadmap: Decentralized AI Training and Data Production

### Integration of Decentralized AI Training

Atoma plans to introduce decentralized AI training, leveraging the latest NVIDIA GPUs, such as the Hopper and Blackwell families, integrated with TEEs to ensure secure and efficient AI training processes.

### Real and Synthetic Data Production

Through the Atoma Network, vast amounts of real and synthetic data will be generated. Such data can be utilized for decentralized AI training. This data will be carefully labeled and curated through specialized mechanisms, further supporting the network's long-term AI training initiatives.

From 29e5f9bee6c3b0387aba004358ffa400a5c9b923 Mon Sep 17 00:00:00 2001
From: jorgeantonio21
Date: Wed, 18 Sep 2024 23:56:07 +0100
Subject: [PATCH 2/3] add node node docs

---
 website/atoma-node.md | 217 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 217 insertions(+)

diff --git a/website/atoma-node.md b/website/atoma-node.md
index e69de29..f8edd63 100644
--- a/website/atoma-node.md
+++ b/website/atoma-node.md
@@ -0,0 +1,217 @@

# Atoma Node

Atoma Nodes are the backbone of Atoma's decentralized compute layer. They are responsible for executing AI workloads and providing compute resources to the network. Nodes are rewarded with native TOMA tokens for their contributions to the network.

In this section, we explain how to set up a node and connect it to the Atoma Network, so that anyone with available computing resources can participate in the network.

**Note:** The Atoma Node is currently under development, and the following documentation applies solely to Atoma's alpha release.

## Requirements

- Have Rust and Cargo installed; for more details, please consult [here](https://www.rust-lang.org/tools/install).
- Have a machine with one or more Nvidia GPUs, or a MacBook Pro that supports Metal.
- For Nvidia GPUs, it is recommended to have at least CUDA 12.1 installed.
- To use the optimized CUDA kernels with Flash Attention2 support, it is recommended to have an Nvidia Ampere or newer GPU architecture (see more details below).
- It is possible to run the Atoma Node on a CPU alone, although performance will likely be far from optimal.
- Clone the Atoma node [repository](https://github.com/atoma-network/atoma-node).
- A Hugging Face API key, used to download the models for inference.
- For IPFS compatibility, it is recommended to have a local IPFS node running (see more details below).
- For Gateway compatibility, it is recommended to have a Gateway account and access to a valid API key (see more details [here](https://docs.mygateway.xyz/developer-guide/api-reference/authentication)).
- To support the current Atoma native UI application, it is recommended to have a Supabase account and access to a valid API key (see more details [here](https://supabase.com/docs/reference/javascript/auth-signup)).
- Have a Sui wallet with some `SUI` tokens in it. Please visit the official Sui CLI [website](https://docs.sui.io/references/cli/client) to install the Sui CLI and follow the instructions to set up a wallet.
- Have Atoma's native faucet `TOMA` token available in your Sui wallet. You can request faucet tokens [here]().

## Configuration

After you have installed Rust and Cargo and cloned the Atoma node repository, you must specify a set of configuration parameters that allow the node to connect to the Atoma Network.
We recommend you to create a `config.toml` file in the root of the Atoma node's repository with the following parameters: + +```toml +[client] +config_path = "YOUR_LOCAL_PATH_TO_SUI_CONFIG_FILE" +atoma_db_id = "0x0ee50a4ef345ffec5c58906e7f6a7f569fddbf6696c3d7b5b305b72e2683f304" +node_badge_id = "0xbc093a0daf2d5ba7ed287a7e1cf4fac6973523beca462531174647588bfcc4ec" +package_id = "0x8fc663315a07208e86473b808d902c9b97a496a3d2c3779aa6839bd9d26272b8" +small_id = YOUR_SMALL_ID + +max_concurrent_requests = 1000 + +[client.request_timeout] +secs = 300 +nanos = 0 + +[inference] +api_key = "HUGGING_FACE_API_KEY" +cache_dir="./models" +flush_storage=true +tracing=true +jrpc_port=INFERENCE_JRPC_PORT + +[[inference.models]] +device_ids = [YOUR_DEVICE_IDS] +dtype="DTYPE" +model_id="HOST_MODEL_ID" +revision="HOST_MODEL_REVISION" +use_flash_attention=BOOLEAN + +[input_manager] +firebase_url = "https://atoma-testing-default-rtdb.europe-west1.firebasedatabase.app" +firebase_email = "testing@atoma.network" +firebase_password = "testing" +firebase_api_key = "YOUR_FIREBASE_API_KEY " +ipfs_api_key = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9." +ipfs_port = IPFS_DAEMON_PORT +small_id = YOUR_SMALL_ID + +[output_manager] +firebase_url = "https://atoma-testing-default-rtdb.europe-west1.firebasedatabase.app" +firebase_email = "testing@atoma.network" +firebase_password = "testing" +firebase_api_key = "YOUR_FIREBASE_API_KEY" +gateway_api_key = "YOUR_GATEWAY_API_KEY" +gateway_bearer_token = "YOUR_GATEWAY_BEARER_TOKEN" +ipfs_port = IPFS_DAEMON_PORT +small_id = YOUR_SMALL_ID + +[event_subscriber] +http_url = "SUI_RPC_PROVIDER_HTTP_URL" +ws_url = "SUI_RPC_PROVIDER_WS_URL" +package_id = "0x8fc663315a07208e86473b808d902c9b97a496a3d2c3779aa6839bd9d26272b8" +small_id = YOUR_SMALL_ID + +[event_subscriber.request_timeout] +secs = 300 +nanos = 0 + +[streamer] +firebase_url = "https://atoma-testing-default-rtdb.europe-west1.firebasedatabase.app" +firebase_email = "testing@atoma.network" +firebase_password = "testing" +firebase_api_key = "YOUR_FIREBASE_API_KEY" +small_id = YOUR_SMALL_ID +``` + +In order to fill the above configuration file, you will need to: + +1. Fill the Sui `client` parameters with the path to your Sui config file, together with your node registration small id. In order to register your node on the Atoma contract, please follow the instruction below. As an example, if my Sui config file is located at `~/.sui/sui_config/client.yaml` (standard file path), and your registration node small id is 1234, your `config.toml` file should look like this: + +```toml +[client] +config_path = "~/.sui/sui_config/client.yaml" +atoma_db_id = "0x0ee50a4ef345ffec5c58906e7f6a7f569fddbf6696c3d7b5b305b72e2683f304" +node_badge_id = "0xbc093a0daf2d5ba7ed287a7e1cf4fac6973523beca462531174647588bfcc4ec" +package_id = "0x8fc663315a07208e86473b808d902c9b97a496a3d2c3779aa6839bd9d26272b8" +small_id = 1234 +``` + +2. Fill the `inference` parameters with your Hugging Face API key, and the model you want to use for inference. Your node must download a specific model weight from Hugging Face. Your node will download the model weights to the `./models` directory, and will use the `DTYPE` and `USE_FLASH_ATTENTION` parameters to select the best inference configuration to serve the specific model inference. Please refer to our [supported model page]() for more information about the supported models, for the current alpha release. 
If you have a machine with an Nvidia GPU with CUDA 12.1 or newer and an Ampere or newer GPU architecture, you can use the optimized CUDA kernels and Flash Attention2 by setting `USE_FLASH_ATTENTION` to `true`; otherwise, you should set it to `false`.

You can create a free Hugging Face account [here](https://huggingface.co/join). You will be able to find your API key on your account page, under the `Settings` tab.

If you wish to deploy a Llama3.1 8b instruct model for inference with `bf16` precision (we suggest either `bf16` or `fp16` precision for most models, or otherwise a quantized precision type), you should set the `DTYPE` parameter to `bf16`. In this case, your `config.toml` file should look like this:

```toml
[inference]
api_key = "HUGGING_FACE_API_KEY"
cache_dir="./models"
flush_storage=true
tracing=true
jrpc_port=INFERENCE_JRPC_PORT

[[inference.models]]
device_ids = [0]
dtype="bf16"
model_id="llama31_instruct8b"
revision=""
use_flash_attention=true
```

The model id for a specific AI model can be found on the model's page on Hugging Face's website. For example, the model id for Llama3.1 8b instruct can be found [here](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct). If your machine has multiple GPU devices, you can specify which GPU device ids to use by setting the `device_ids` parameter to a list of integers, e.g. `device_ids = [0, 1]` to use the first and second GPU devices.

With more than one GPU device, the model weights are automatically split across the available GPUs, with tensor weight parallelism applied. For example, with 1 NVIDIA RTX 3090/4090 it is possible to run a Llama3.1 8b model with `bf16` or `fp16` precision. With 1 NVIDIA A100 it is possible to run a Llama3.1 8b model with `fp32` precision, or a Llama3.1 70b model quantized to `int4` or `int8`. With 2 NVIDIA A100 GPUs it is possible to run a Llama3.1 70b model with `bf16` or `fp16` precision, split across both GPUs.

With 8 NVIDIA A100 or H100 GPUs it is possible to run a Llama3.1 405b model with `fp8` precision, across all 8 GPUs, whereas with 16 NVIDIA A100 or H100 GPUs it is possible to run a Llama3.1 405b model with `bf16` or `fp16` precision, across all 16 GPUs.

3. Fill the `input_manager`, `output_manager` and `streamer` parameters with your Firebase project credentials, and the URL to your local IPFS daemon. To run a local IPFS daemon, you can follow the instructions [here](https://docs.ipfs.tech/install/); you can also find more details below. The Supabase credentials are required for the node to support the current Atoma alpha native UI application. This is not strictly necessary if you are only interested in contributing to Atoma's compute layer at the smart contract level.

4. Fill the `event_subscriber` parameters with the URL to your Sui RPC provider. We suggest using a Sui RPC provider that is geographically close to your node, to reduce latency and improve performance. We recommend providers such as [BlockVision](https://blockvision.org/) and [Shinami](https://www.shinami.com/).

## Run the Atoma Node

To run the Atoma Node, run the following commands from the root of the Atoma node repository:

```bash
$ cd atoma-node/
$ RUST_LOG=info cargo run --release --features FEATURE -- --config-path "YOUR_CONFIG_TOML_FILE_PATH"
```

The `FEATURE` flag can be one of the following:

- `cuda`: to run the Atoma Node with CUDA support.
- `metal`: to run the Atoma Node with Metal support.
- `flash-attn`: to run the Atoma Node with Flash Attention2 support (for Ampere or newer GPU architectures).
- `cpu`: to run the Atoma Node with CPU support.

The `YOUR_CONFIG_TOML_FILE_PATH` should be the path to the `config.toml` file you created in the previous step. If you created it at the root of the Atoma node's repository, it should be `../config.toml`.

Once your node is running, you should be able to see the node's logs in the terminal. If you have set up the node correctly, you should see logs starting with `INFO` within a few seconds. You can also check how long it takes for the node to load the model weights into the GPU device.

Your node will start listening for incoming inference requests once you have registered it on the Atoma smart contract on the Sui network.

## Node registration

To register your node on the Atoma smart contract, you first need some SUI tokens in your Sui wallet (whose keypair client information is specified in the `config.toml` file above). You will also need some faucet TOMA tokens in your wallet. You can request TOMA tokens from the Atoma faucet [here]().

The first step is to clone Atoma's smart contract repository and build the smart contract package:

```bash
$ cd ~
$ git clone https://github.com/atoma-network/atoma-contracts
$ cd atoma-contracts/sui/cli
```

To register your node on the Atoma smart contract, run the following command:

```bash
$ ./cli db register-node \
    --package "0x8fc663315a07208e86473b808d902c9b97a496a3d2c3779aa6839bd9d26272b8"
```

Then you need to subscribe your node to the AI model you are currently hosting, as follows:

```bash
$ ./cli db add-node-to-model \
    --package "0x8fc663315a07208e86473b808d902c9b97a496a3d2c3779aa6839bd9d26272b8" \
    --model "YOUR_MODEL_ID" \
    --echelon ECHELON
```

To find the right value for `YOUR_MODEL_ID`, consult [here](). You can find your `ECHELON` value by consulting [here](), depending on your GPU hardware specs.

## Flash attention requirements

One or more Nvidia GPUs with the Ampere or a newer architecture are required to use the optimized CUDA kernels and Flash Attention2.

## CUDA requirements

We support any NVIDIA series 20xx or newer. It is recommended to have a CUDA 12.1 or newer driver installed. For more details on how to update your NVIDIA driver, please refer to the [NVIDIA website](https://www.nvidia.com/Download/index.aspx).

## Metal requirements

We support any Apple Silicon M-series chip or newer. It is recommended to have a machine compatible with Metal 3.0 or newer.

## IPFS compatibility

It is recommended to have a local IPFS node running, to store AI-generated model outputs on behalf of the user, if the user has requested it. You can run a local IPFS node by following these steps:

1. Install the IPFS daemon by following the instructions [here](https://docs.ipfs.tech/install/command-line/#install-official-binary-distributions).
2. In a separate terminal, run `ipfs init` to initialize the IPFS node.
3. In that same terminal, run `ipfs daemon` to start the IPFS daemon.

## Gateway compatibility

It is recommended to have a Gateway account, to store AI-generated model outputs on behalf of the user, if the user has requested it. It is possible to create a free Gateway account [here](https://www.mygateway.xyz/).
Once logged in, you will be able to create a new API key by navigating to the `API Keys` section and clicking on the `Create API Key` button. We will be using the API key to authenticate requests to the Gateway API. + +## Supabase compatibility + From 5c0798923179aa07973ddc237dce228312415f41 Mon Sep 17 00:00:00 2001 From: jorgeantonio21 Date: Thu, 19 Sep 2024 12:43:26 +0100 Subject: [PATCH 3/3] update smart contract docs --- website/atoma-contracts.md | 356 ++++++++++++++++++++++++++----------- 1 file changed, 256 insertions(+), 100 deletions(-) diff --git a/website/atoma-contracts.md b/website/atoma-contracts.md index d97634c..559bd70 100644 --- a/website/atoma-contracts.md +++ b/website/atoma-contracts.md @@ -1,37 +1,46 @@ # Atoma's Contracts -Currently the Atoma Network protocol is supported on the Sui blockchain. We will expand Atoma's reach to both EVM chains and Solana. - -## Sui - -### Atoma Contract Features - -We have a current implementation of the Atoma contract on Sui. This contract is responsible for the following features: - -1. `Node registration` - Nodes operating on the Atoma Network should first register on the Atoma contract. Once registered on the Atoma contract, nodes can receive newly submitted requests and run the required computation to resolve the request. -2. `Submit collateral` - Upon registration, nodes should deposit a given amount of collateral. The collateral is indexed in Atoma's native token, the `TOMA` token. -3. `Accrue fees` - Nodes accrue fees, indexed in `TOMA` token, based on the number and the type of requests they process. Accrued fees can only be withdrawn two epochs later. -4. `Subscribe to AI models and other forms of compute types` - Upon registration, nodes should specify which AI models these nodes subscribe to (i.e., which models do the nodes currently host). Once a node registers for a given set of models, it can't change these, unless it deregisters and registers a second type, specifying the new set of models. -5. `Node deregistration` - Once a node decides to stop providing compute to the Atoma Network, it can deregister itself directly on the smart contract. -6. `Specifying node hardware features` - The Atoma Network protocol runs on a `sampling consensus` mechanism. This mechanism requires multiple different nodes to reach consensus on the execution of a given output state. To achieve this, it is required that nodes, on a selected `quorum`, generate outputs in a deterministic fashion. However, most AI requests are `non-deterministic in nature`. It is possible to achieve determinism, if the selected nodes for a given request, have the same GPU hardware with the same congifuration. For this reason, nodes must submit the type of GPU card(s) these can hold. -7. `Echelon specification` - Upon 5., the Atoma contract specifies compute `echelons`. These can be thought of as shards of the network. Compute across echelons should be as homogeneous as possible in compute and memory requirements, process time, and determinism. -8. `Request submission` - Every request to the Atoma Network (processed by Atoma tokens) is submitted via the Atoma contract. Requests are paid in `TOMA` token. -9. `Load balancing` - Based on fine grained echelon performance, the Atoma contract is responsible for balancing request total volume across suitable echelons (based on their total available compute) and each echelon total `amount of work` at each time. -10. 
`Random sampling` - Each request submitted into the Atoma Network is processed across a finite number of nodes within the same suitable echelon. The Atoma contract is responsible for randomly selecting the requested number of nodes. We currently use Sui's on-chain random generation features. -11. `Timeouts` - The Atoma contract keeps a registry of the time it takes to process each request. If a node does not submit a request on time, a time out is triggered and a percentage of the node's deposited collateral is slashed automatically. -12. `Output commitment submission` - Upon generating a new output for a given request, a node must submit a cryptographic commitment back to the Atoma contract. This commitment is used by the Atoma contract to check if there is `consensus on the state of the output`. Once consensus is reached, all nodes that generated a commitment are entitled to accrue fees (paid by the user on request submission). -13. `Dispute` - If consensus is not reached on the state of the output (that is, different nodes submit different commitments), a dispute mechanism is put forth by the Atoma contract, by selecting additional high reputation (running trusted hardware) to resolve the dispute. -14. `Staking` - Registered nodes are entitled for staking rewards based on their average node performance, in each echelon (future feature). -15. `Governance` - Will allow `TOMA` holders to vote and decide which models to operate on the Atoma Network, as well as other types of compute (future feature). - -### Future Features -We plan to add other features to the Atoma contract. These include - -1. `Staking`; -2. `Governance`; -3. `Dispute` - we are in the process of establishing different types of dispute resolving (i.e BFT dispute resolution and trusted hardware oracle nodes) -4. `General compute tasks`, this will include general WASM applications that can be run on Atoma nodes. Due to potential security issues for both the user and the node, we will require such applications to run in trusted execution environments (TEEs). +The Atoma Network is supported by an on-chain smart contract on the Sui blockchain. That said, the Atoma protocol is chain agnostic, in particular, we have future plans to expand Atoma's functionality to other chains, such as EVM compatible chains, Solana, Near, etc. We will also explore the possibility of integrating as an EigenLayer AVS, or building our own L1/L2 for native payments. +This document outlines the key features, upcoming developments, and usage instructions for interacting with Atoma's smart contracts on Sui. +## Atoma Contract Features + +The Atoma contract on Sui implements the following key features: + +1. **Node Registration**: Nodes must register to participate in the Atoma Network and process requests. + +2. **Collateral Management**: Nodes deposit `TOMA` tokens as collateral upon registration. + +3. **Fee Accrual**: Nodes earn fees in `TOMA` tokens based on processed requests, withdrawable after two epochs. + +4. **Model Subscription**: Nodes specify which AI models they host and can process. + +5. **Node Deregistration**: Allows nodes to exit the network and withdraw collateral. + +6. **Hardware Specification**: Nodes declare their GPU configurations to ensure deterministic outputs within quorums. + +7. **Echelon System**: Organizes nodes into compute shards (echelons) based on hardware capabilities. + +8. **Request Handling**: Manages submission and payment (in `TOMA`) for network requests. + +9. 
**Load Balancing**: Distributes requests across suitable echelons based on performance and workload. + +10. **Random Node Sampling**: Selects a subset of nodes within an echelon to process each request. + +11. **Timeout Enforcement**: Monitors request processing times and slashes collateral for late responses. + +12. **Output Consensus**: Nodes submit cryptographic commitments of outputs to reach consensus. + +13. **Dispute Resolution**: Handles disagreements on output state using high-reputation nodes. + +### Upcoming Features + +1. **Staking**: Reward system for nodes based on performance within echelons. +2. **Governance**: Voting mechanism for `TOMA` holders to influence network decisions. +3. **Enhanced Dispute Resolution**: Implementing BFT and trusted hardware oracle solutions. +4. **General Compute Tasks**: Support for WASM applications running inside Trusted Execution Environments (TEEs). + +This contract design ensures a robust, scalable, and secure decentralized compute network for AI and other intensive tasks. ### Atoma Contract Documentation The following instructions provide a detailed description on how to interact with the Atoma contract, on the Sui blockchain. @@ -72,12 +81,29 @@ The Atoma contract emits various types of events: - `settlement::SettledEvent` is emitted when a ticket is settled and fee is distributed. - `settlement::NewlySampledNodesEvent` is emitted when a new set of nodes is sampled for a prompt because of timeout. -#### Create a Sui wallet +#### Create a Sui Wallet + +To interact with the Atoma contract on the Sui blockchain, you'll need a Sui wallet. If you already have one, you can skip to the next section. Otherwise, follow these steps to create a new wallet: + +1. Choose a Sui wallet: + - For browser extensions: [Sui Wallet](https://chrome.google.com/webstore/detail/sui-wallet/opcgpfmipidbgpenhmajoajpbobppdil) or [Ethos Wallet](https://chrome.google.com/webstore/detail/ethos-sui-wallet/mcbigmjiafegjnnogedioegffbooigli) + - For mobile: [Suiet](https://suiet.app/) or [Morphis Wallet](https://morphiswallet.com/) -As a first step, in order to interact with the Atoma contract, a user must have a wallet on the Sui blockchain. If the reader already has one, it can skip to the next section, otherwise we recommend following the official Sui [docs](https://blog.sui.io/sui-wallets/). +2. Install your chosen wallet and follow the setup instructions. +3. Securely store your recovery phrase (seed words) in a safe place. -#### How to use the atoma protocol +4. Fund your wallet: + - For testnet: Use the [Sui Faucet](https://discord.com/channels/916379725201563759/971488439931392130) in the official Sui Discord. + - For mainnet: Purchase SUI tokens from a supported exchange. + +5. Verify your wallet balance using the Sui Explorer or your wallet interface. + +For more detailed instructions and additional wallet options, refer to the [official Sui documentation on wallets](https://docs.sui.io/learn/about-sui/sui-wallets). + + + +#### How to use the Atoma protocol To interact with the Atoma protocol, utilize the `gate` module within the `atoma` package, responsible for prompt submission. @@ -100,12 +126,15 @@ As of now, the supported modalities are: We discuss pricing below. - `model`: a string identifier of the model for text-to-text generation. Refer to our website for supported models. + - `pre_prompt_tokens`: For in-context applications, this is the number of tokens already generated before the user's current prompt. - `prompt`: input text prompt. 
There's no limit to the prompt length at the protocol level, but a Sui transaction can be at most 128KB. - `random_seed`: any random number to seed the random generator for consistent output across nodes. Before Sui stabilizes random generator, you can use `atoma::utils::random_u64`. - `repeat_last_n`: instructs the model to avoid reusing tokens within the last `n` tokens. - `repeat_penalty`: a float number determining token repetition avoidance. + - `should_stream_output`: a boolean indicating whether the output should be streamed or not + to a suitable output destination. - `temperature`: a float number determining randomness in the output. - `top_k`: an integer determining token consideration for the next generation. - `top_p`: a float number determining token consideration for the next generation. @@ -146,10 +175,17 @@ If no nodes can generate the prompt within the budget, the transaction fails. `submit_text2image_prompt` has a `max_fee_per_input_token` and `max_fee_per_output_token` parameters. These apply to input and output token prices, respectively. -The last parameter is `nodes_to_sample`. -It's optional and defaults to a sensible value. -Higher number of nodes means higher confidence in the generated output. -However, the price is also higher as nodes multiply the prompt price. +The last parameter is `nodes_to_sample`, as an optional parameter. If specified, a +higher number of nodes means higher confidence in the generated output, overall. +However, the price is also higher as nodes multiply the prompt price. This behavior +is part of our standard `Sampling Consensus` protocol. +If the value of `nodes_to_sample` is not specified, then the protocol will advance +with the Cross-Validation Sampling Consensus mechanism. That is, a single node will +be sampled by the contract and once the node generates the response, the contract +will sample more nodes to attest to the response's correctness, with some probability `p`, +specified at the protocol level. This approach reduces the cost of verifiable inference, +while guaranteeing that the protocol converges to game-theoretical Nash equilibrium, where +honest nodes are incentivized to act honestly. Refer to the `atoma::prompts` module for sample implementations. If you are developing a custom smart contract for prompt submission, this module is a great starting point. 
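As a rough, back-of-the-envelope illustration of why the Cross-Validation Sampling Consensus mechanism is cheaper, the snippet below compares the expected number of node executions per prompt under both modes. The probability value used here is hypothetical; the actual attestation probability `p` is fixed at the protocol level.

```rust
/// Expected number of nodes that execute a prompt when one node runs it and
/// the remaining `n - 1` sampled attesters are only invoked with probability `p`.
fn expected_nodes_cross_validation(n: u32, p: f64) -> f64 {
    1.0 + p * (n as f64 - 1.0)
}

fn main() {
    let n = 3; // e.g. nodes_to_sample = 3 under plain Sampling Consensus
    let p = 0.1; // hypothetical protocol-level attestation probability
    println!("plain sampling: {} node executions per prompt", n);
    println!(
        "cross-validation: {:.2} expected node executions per prompt",
        expected_nodes_cross_validation(n, p)
    );
}
```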
@@ -258,9 +294,77 @@ Current node echelons are the following (based on the node's type of GPU): | 34 | 2 x NVIDIA H100 (80GB) | | 35 | 4 x NVIDIA H100 (80GB) | | 36 | 8 x NVIDIA H100 (80GB) | -| 100 | MACBOOK PRO M2 (Metal) | -| 101 | MACBOOK PRO M3 (Metal) | -| 200 | AMD | +| 37 | 1 x NVIDIA RTX 2060 | +| 38 | 2 x NVIDIA RTX 2060 | +| 39 | 4 x NVIDIA RTX 2060 | +| 40 | 1 x NVIDIA RTX 2070 | +| 41 | 2 x NVIDIA RTX 2070 | +| 42 | 4 x NVIDIA RTX 2070 | +| 43 | 1 x NVIDIA RTX 2080 | +| 44 | 2 x NVIDIA RTX 2080 | +| 45 | 4 x NVIDIA RTX 2080 | +| 46 | 1 x NVIDIA RTX 2080 Ti | +| 47 | 2 x NVIDIA RTX 2080 Ti | +| 48 | 4 x NVIDIA RTX 2080 Ti | +| 49 | 1 x NVIDIA RTX 3060 | +| 50 | 2 x NVIDIA RTX 3060 | +| 51 | 4 x NVIDIA RTX 3060 | +| 52 | 1 x NVIDIA RTX 3070 | +| 53 | 2 x NVIDIA RTX 3070 | +| 54 | 4 x NVIDIA RTX 3070 | +| 55 | 1 x NVIDIA RTX 3080 | +| 56 | 2 x NVIDIA RTX 3080 | +| 57 | 4 x NVIDIA RTX 3080 | +| 58 | 1 x NVIDIA Titan V (Volta) | +| 59 | 2 x NVIDIA Titan V (Volta) | +| 60 | 4 x NVIDIA Titan V (Volta) | +| 61 | 1 x NVIDIA Quadro RTX 8000 (Turing) | +| 62 | 2 x NVIDIA Quadro RTX 8000 (Turing) | +| 63 | 4 x NVIDIA Quadro RTX 8000 (Turing) | +| 64 | 1 x NVIDIA RTX 4060 | +| 65 | 2 x NVIDIA RTX 4060 | +| 66 | 4 x NVIDIA RTX 4060 | +| 67 | 1 x NVIDIA RTX 4070 | +| 68 | 2 x NVIDIA RTX 4070 | +| 69 | 4 x NVIDIA RTX 4070 | +| 70 | 1 x NVIDIA RTX 4070 Ti | +| 71 | 2 x NVIDIA RTX 4070 Ti | +| 72 | 4 x NVIDIA RTX 4070 Ti | +| 1000 | 1 x AMD Radeon RX 6600 | +| 1001 | 2 x AMD Radeon RX 6600 | +| 1002 | 4 x AMD Radeon RX 6600 | +| 1003 | 1 x AMD Radeon RX 6700 XT | +| 1004 | 2 x AMD Radeon RX 6700 XT | +| 1005 | 4 x AMD Radeon RX 6700 XT | +| 1006 | 1 x AMD Radeon RX 6800 XT | +| 1007 | 2 x AMD Radeon RX 6800 XT | +| 1008 | 4 x AMD Radeon RX 6800 XT | +| 1009 | 1 x AMD Radeon RX 6900 XT | +| 1010 | 2 x AMD Radeon RX 6900 XT | +| 1011 | 4 x AMD Radeon RX 6900 XT | +| 1012 | 1 x AMD Radeon RX 7600 | +| 1013 | 2 x AMD Radeon RX 7600 | +| 1014 | 4 x AMD Radeon RX 7600 | +| 1015 | 1 x AMD Radeon RX 7700 XT | +| 1016 | 2 x AMD Radeon RX 7700 XT | +| 1017 | 4 x AMD Radeon RX 7700 XT | +| 1018 | 1 x AMD Radeon RX 7800 XT | +| 1019 | 2 x AMD Radeon RX 7800 XT | +| 1020 | 4 x AMD Radeon RX 7800 XT | +| 1021 | 1 x AMD Radeon RX 7900 XT | +| 1022 | 2 x AMD Radeon RX 7900 XT | +| 1023 | 4 x AMD Radeon RX 7900 XT | +| 1024 | 1 x AMD Radeon RX 7900 XTX | +| 1025 | 2 x AMD Radeon RX 7900 XTX | +| 1026 | 4 x AMD Radeon RX 7900 XTX | +| 1027 | 1 x AMD Instinct MI100 | +| 1028 | 2 x AMD Instinct MI100 | +| 1029 | 4 x AMD Instinct MI100 | +| 1030 | 1 x AMD Instinct MI200 | +| 1031 | 2 x AMD Instinct MI200 | +| 1032 | 4 x AMD Instinct MI200 | +| 2000 | MACBOOK PRO M2 (Metal) | +| 2001 | MACBOOK PRO M3 (Metal) | #### Node model subscription @@ -269,7 +373,7 @@ In order to subscribe to a given model, the node operator can run the following ```sh ./cli db add-node-to-model \ - --package "TODO(add package id here)" \ + --package "0x8fc663315a07208e86473b808d902c9b97a496a3d2c3779aa6839bd9d26272b8" \ --model "MODEL" \ ``` @@ -282,73 +386,125 @@ The available list of supported models is: | Model Type | Hugging Face model name | |------------------------------------|------------------------------------------| -| falcon_7b | tiiuae/falcon-7b | -| falcon_40b | tiiuae/falcon-40b | -| falcon_180b | tiiuae/falcon-180b | -| llama_v1 | Narsil/amall-7b | -| llama_v2 | meta-llama/Llama-2-7b-hf | -| llama_solar_10_7b | upstage/SOLAR-10.7B-v1.0 | -| llama_tiny_llama_1_1b_chat | TinyLlama/TinyLlama-1.1B-Chat-v1.0 | -| llama3_8b | 
meta-llama/Meta-Llama-3-8B | -| llama3_instruct_8b | meta-llama/Meta-Llama-3-8B-Instruct | -| llama3_70b | meta-llama/Meta-Llama-3-70B | -| mamba_130m | state-spaces/mamba-130m | -| mamba_370m | state-spaces/mamba-370m | -| mamba_790m | state-spaces/mamba-790m | -| mamba_1-4b | state-spaces/mamba-1.4b | -| mamba_2-8b | state-spaces/mamba-2.8b | -| mistral_7bv01 | mistralai/Mistral-7B-v0.1 | -| mistral_7bv02 | mistralai/Mistral-7B-v0.2 | -| mistral_7b-instruct-v01 | mistralai/Mistral-7B-Instruct-v0.1 | -| mistral_7b-instruct-v02 | mistralai/Mistral-7B-Instruct-v0.2 | -| mixtral_8x7b | mistralai/Mixtral-8x7B-v0.1 | -| phi_3-mini | microsoft/Phi-3-mini-4k-instruct | -| stable_diffusion_v1-5 | runwayml/stable-diffusion-v1-5 | -| stable_diffusion_v2-1 | stabilityai/stable-diffusion-2-1 | -| stable_diffusion_xl | stabilityai/stable-diffusion-xl-base-1.0 | -| stable_diffusion_turbo | stabilityai/sdxl-turbo | -| quantized_7b | TheBloke/Llama-2-7B-GGML | -| quantized_13b | TheBloke/Llama-2-13B-GGML | -| quantized_70b | TheBloke/Llama-2-70B-GGML | -| quantized_7b-chat | TheBloke/Llama-2-7B-Chat-GGML | -| quantized_13b-chat | TheBloke/Llama-2-13B-Chat-GGML | -| quantized_70b-chat | TheBloke/Llama-2-70B-Chat-GGML | -| quantized_7b-code | TheBloke/CodeLlama-7B-GGUF | -| quantized_13b-code | TheBloke/CodeLlama-13B-GGUF | -| quantized_32b-code | TheBloke/CodeLlama-34B-GGUF | -| quantized_7b-leo | TheBloke/leo-hessianai-7B-GGUF | -| quantized_13b-leo | TheBloke/leo-hessianai-13B-GGUF | -| quantized_7b-mistral | TheBloke/Mistral-7B-v0.1-GGUF | -| quantized_7b-mistral-instruct | TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF | -| quantized_7b-mistral-instruct-v0.2 | TheBloke/Mistral-7B-Instruct-v0.2-GGUF | -| quantized_7b-zephyr-a | TheBloke/zephyr-7B-alpha-GGUF | -| quantized_7b-zephyr-b | TheBloke/zephyr-7B-beta-GGUF | -| quantized_7b-open-chat-3.5 | TheBloke/openchat_3.5-GGUF | -| quantized_7b-starling-a | TheBloke/Starling-LM-7B-alpha-GGUF | -| quantized_mixtral | TheBloke/Mixtral-8x7B-v0.1-GGUF | -| quantized_mixtral-instruct | TheBloke/Mistral-7B-Instruct-v0.1-GGUF | -| quantized_llama3-8b | QuantFactory/Meta-Llama-3-8B-GGUF | -| qwen_w0.5b | Qwen/Qwen1.5-0.5B | -| qwen_w1.8b | Qwen/Qwen1.5-1.8B | -| qwen_w4b | Qwen/Qwen1.5-4B | -| qwen_w7b | qwen/Qwen1.5-7B | -| qwen_w14b | qwen/Qwen1.5-14B | -| qwen_w72b | qwen/Qwen1.5-72B | -| qwen_moe_a2.7b | qwen/Qwen1.5-MoE-A2.7B | - +| falcon_7b_f16 | tiiuae/falcon-7b | +| falcon_7b_bf16 | tiiuae/falcon-7b | +| falcon_40b_f16 | tiiuae/falcon-40b | +| falcon_40b_bf16 | tiiuae/falcon-40b | +| falcon_180b_f16 | tiiuae/falcon-180b | +| falcon_180b_bf16 | tiiuae/falcon-180b | +| flux_dev_f16 | black-forest-labs/FLUX.1-dev | +| flux_dev_bf16 | black-forest-labs/FLUX.1-dev | +| flux_schnell_f16 | black-forest-labs/FLUX.1-schnell | +| flux_schnell_bf16 | black-forest-labs/FLUX.1-schnell | +| llama_v1_f16 | Narsil/amall-7b | +| llama_v1_bf16 | Narsil/amall-7b | +| llama_v2_f16 | meta-llama/Llama-2-7b-hf | +| llama_v2_bf16 | meta-llama/Llama-2-7b-hf | +| llama_solar_10_7b_f16 | upstage/SOLAR-10.7B-v1.0 | +| llama_solar_10_7b_bf16 | upstage/SOLAR-10.7B-v1.0 | +| llama_tiny_llama_1_1b_chat_f16 | TinyLlama/TinyLlama-1.1B-Chat-v1.0 | +| llama_tiny_llama_1_1b_chat_bf16 | TinyLlama/TinyLlama-1.1B-Chat-v1.0 | +| llama3_8b_f16 | meta-llama/Meta-Llama-3-8B | +| llama3_8b_bf16 | meta-llama/Meta-Llama-3-8B | +| llama3_instruct_8b_f16 | meta-llama/Meta-Llama-3-8B-Instruct | +| llama3_instruct_8b_bf16 | meta-llama/Meta-Llama-3-8B-Instruct | +| llama3_70b_f16 | meta-llama/Meta-Llama-3-70B | 
+| llama3_70b_bf16 | meta-llama/Meta-Llama-3-70B | +| mamba_130m_f16 | state-spaces/mamba-130m | +| mamba_130m_bf16 | state-spaces/mamba-130m | +| mamba_370m_f16 | state-spaces/mamba-370m | +| mamba_370m_bf16 | state-spaces/mamba-370m | +| mamba_790m_f16 | state-spaces/mamba-790m | +| mamba_790m_bf16 | state-spaces/mamba-790m | +| mamba_1-4b_f16 | state-spaces/mamba-1.4b | +| mamba_1-4b_bf16 | state-spaces/mamba-1.4b | +| mamba_2-8b_f16 | state-spaces/mamba-2.8b | +| mamba_2-8b_bf16 | state-spaces/mamba-2.8b | +| mistral_7bv01_f16 | mistralai/Mistral-7B-v0.1 | +| mistral_7bv01_bf16 | mistralai/Mistral-7B-v0.1 | +| mistral_7bv02_f16 | mistralai/Mistral-7B-v0.2 | +| mistral_7bv02_bf16 | mistralai/Mistral-7B-v0.2 | +| mistral_7b-instruct-v01_f16 | mistralai/Mistral-7B-Instruct-v0.1 | +| mistral_7b-instruct-v01_bf16 | mistralai/Mistral-7B-Instruct-v0.1 | +| mistral_7b-instruct-v02_f16 | mistralai/Mistral-7B-Instruct-v0.2 | +| mistral_7b-instruct-v02_bf16 | mistralai/Mistral-7B-Instruct-v0.2 | +| mixtral_8x7b_f16 | mistralai/Mixtral-8x7B-v0.1 | +| mixtral_8x7b_bf16 | mistralai/Mixtral-8x7B-v0.1 | +| phi_3-mini_f16 | microsoft/Phi-3-mini-4k-instruct | +| phi_3-mini_bf16 | microsoft/Phi-3-mini-4k-instruct | #### Atoma's request submission -To submit a request to the Atoma network, a user can run the following command: -TODO: replace with `text` or `image` requests. +##### Text Prompt Request + +To submit a text prompt request to the Atoma network, say on Llama3.18b instruct model, while sampling 3 nodes for verifiability, a user can run the following command: + +```sh +./cli gate send-prompt-to-ipfs \ + --package "your package id can be found when publishing" \ + --model "llama3_8b_instruct" \ + --prompt "YOUR_PROMPT" \ + --max-tokens 512 \ + --max-fee-per-token 1 \ + --nodes-to-sample 3 +``` + +The above command will submit a text prompt request to the Atoma network and print the corresponding transaction digest, the output text will be stored on IPFS and the user can retrieve it with the correct IPFS `cid`. We also +support storage on Gateway. To do so, the user can run the following command: + +```sh +./cli gate send-prompt-to-gateway \ + --package "your package id can be found when publishing" \ + --model "llama3_8b_instruct" \ + --prompt "YOUR_PROMPT" \ + --max-tokens 512 \ + --max-fee-per-token 1 \ + --gateway-user-id "YOUR_GATEWAY_USER_ID" \ + --nodes-to-sample 3 +``` + +Where you need to provide your Gateway user ID, which you have set once registering to Atoma Gateway portal. + +##### Image Prompt Request + +###### Image Prompt Request to IPFS + +To submit an image prompt request to the Atoma network, say on Flux-dev model, while sampling 3 nodes for verifiability, a user can run the following command: + +```sh +./cli gate send-image-prompt-to-ipfs \ + --package "your package id can be found when publishing" \ + --model "flux_dev" \ + --prompt "YOUR_PROMPT" \ + --height 512 \ + --width 512 \ + --max_fee_per_input_token 1 \ + --max_fee_per_output_token 1 \ + --nodes-to-sample 3 +``` + +where `max_fee_per_input_token` and `max_fee_per_output_token` are the maximum fees to be paid to nodes per text input token and output image pixel, respectively. 
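Once the sampled nodes settle a prompt submitted to IPFS, the generated output can be fetched by its `cid`. The helper below is an illustrative sketch only (it is not part of the Atoma CLI); it assumes a local IPFS daemon exposing its gateway on the default port 8080 and the `reqwest` crate compiled with its `blocking` feature.

```rust
use std::error::Error;

/// Illustrative helper (not part of the Atoma CLI): fetch a settled output by CID.
/// Assumes a local IPFS daemon with its gateway on the default port 8080;
/// a public gateway such as https://ipfs.io/ipfs/<cid> also works.
fn fetch_output_by_cid(cid: &str) -> Result<String, Box<dyn Error>> {
    let url = format!("http://127.0.0.1:8080/ipfs/{cid}");
    let body = reqwest::blocking::get(url)?.text()?;
    Ok(body)
}

fn main() -> Result<(), Box<dyn Error>> {
    let output = fetch_output_by_cid("YOUR_OUTPUT_CID")?;
    println!("{output}");
    Ok(())
}
```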
+ +###### Image Prompt Request to Gateway + +To submit an image prompt request to the Atoma network, say on Flux-dev model, while sampling 3 nodes for verifiability, a user can run the following command: ```sh -./cli gate submit-tell-me-a-joke-prompt \ +./cli gate send-image-prompt-to-gateway \ --package "your package id can be found when publishing" \ - --model "llama" + --model "flux_dev" \ + --prompt "YOUR_PROMPT" \ + --height 512 \ + --width 512 \ + --max_fee_per_input_token 1 \ + --max_fee_per_output_token 1 \ + --gateway-user-id "YOUR_GATEWAY_USER_ID" \ + --nodes-to-sample 3 ``` +Where you need to provide your Gateway user ID, which you have set once registering to Atoma Gateway portal. +