arangodb · Simran-B · Apr 25, 2025 · Mar 27, 2025 · Mar 27, 2025 · Mar 27, 2025
diff --git a/site/content/3.11/components/tools/arango-datasets.md b/site/content/3.11/components/tools/arango-datasets.md
@@ -3,9 +3,9 @@ title: ArangoDB Datasets
 menuTitle: ArangoDB Datasets
 weight: 60
 description: >-
-  `arango_datasets` is a Python package for loading sample datasets into ArangoDB
+  `arango-datasets` is a Python package for loading sample datasets into ArangoDB
 ---
-You can use the `arango_datasets` package in conjunction with the `python-arango`
+You can use the `arango-datasets` package in conjunction with the `python-arango`
 driver to load example data into your ArangoDB deployments. The data is hosted
 on AWS S3. There are a number of existing datasets already available and you can
 view them by calling the `list_datasets()` method as shown below.
@@ -24,7 +24,7 @@ You can find the source code repository of the module on GitHub:
 
 ## Usage
 
-Once you have installed the `arango_datasets` package, you can use it to
+Once you have installed the `arango-datasets` package, you can use it to
 download and import datasets into your deployment with `arango_datasets.Datasets`.
 
 The `Datasets` constructor requires a valid [python-arango](../../develop/drivers/python.md)

diff --git a/site/content/3.11/data-science/arangographml/_index.md b/site/content/3.11/data-science/arangographml/_index.md
@@ -7,33 +7,37 @@ description: >-
 aliases:
   - graphml
 ---
-Traditional machine learning overlooks the connections and relationships
+Traditional Machine Learning (ML) overlooks the connections and relationships
 between data points, which is where graph machine learning excels. However,
 accessibility to GraphML has been limited to sizable enterprises equipped with
-specialized teams of data scientists. ArangoGraphML, on the other hand,
-simplifies the utilization of GraphML, enabling a broader range of personas to
-extract profound insights from their data.
+specialized teams of data scientists. ArangoGraphML simplifies the utilization of GraphML,
+enabling a broader range of personas to extract profound insights from their data.
 
 ## How GraphML works
 
-GraphML focuses on the utilization of neural networks specifically for
-graph-related tasks. It is well-suited for addressing vague or fuzzy problems
-and facilitating their resolution. The process involves incorporating a graph's
-topology (node and edge structure) and the node and edge characteristics and
-features to create a numerical representation known as an embedding.
+Graph machine learning leverages the inherent structure of graph data, where
+entities (nodes) and their relationships (edges) form a network. Unlike
+traditional ML, which primarily operates on tabular data, GraphML applies
+specialized algorithms like Graph Neural Networks (GNNs), node embeddings, and
+link prediction to uncover complex patterns and insights.
+
+1. **Graph Construction**:
+   Raw data is transformed into a graph structure, defining nodes and edges based
+   on real-world relationships.
+2. **Featurization**:
+   Nodes and edges are enriched with features that help in training predictive models.
+3. **Model Training**:
+   Machine learning techniques such as GNNs are applied to the graph to identify patterns and make predictions.
+4. **Inference & Insights**:
+   The trained model is used to classify nodes, detect anomalies, recommend items,
+   or predict future connections.
+
+ArangoGraphML streamlines these steps, providing an intuitive and scalable
+framework to integrate GraphML into various applications, from fraud detection
+to recommendation systems.
 
 ![GraphML Embeddings](../../../images/GraphML-Embeddings.webp)
 
-Graph Neural Networks (GNNs) are explicitly designed to learn meaningful
-numerical representations, or embeddings, for nodes and edges in a graph.
-
-By applying a series of steps, GNNs effectively create graph embeddings,
-which are numerical representations that encode the essential information
-about the nodes and edges in the graph. These embeddings can then be used
-for various tasks, such as node classification, link prediction, and
-graph-level classification, where the model can make predictions based on the
-learned patterns and relationships within the graph.
-
 ![GraphML Workflow](../../../images/GraphML-How-it-works.webp)
 
 It is no longer necessary to understand the complexities involved with graph
@@ -45,71 +49,133 @@ The platform comes preloaded with all the tools needed to prepare your graph
 for machine learning, high-accuracy training, and persisting predictions back
 to the database for application use.
 
-### Classification
-
-Node classification is a natural fit for graph databases as it can leverage
-existing graph analytics insights during model training. For instance, if you
-have performed some community detection, potentially using ArangoDB's built-in
-Pregel support, you can use these insights as inputs for graph machine learning. 
-
-#### What is Node Classification
-
-The goal of node classification is to categorize the nodes in a graph based on
-their neighborhood connections and characteristics in the graph. Based on the
-behaviors or patterns in the graph, the Graph Neural Network (GNN) will be able
-to learn what makes a node belong to a category.
-
-Node classification can be used to solve complex problems such as:
-- Entity Categorization 
-  - Email
-  - Books
-  - WebPage
-  - Transaction
-- Social Networks
-  - Events
-  - Friends
-  - Interests
-- BioPharmaceutical
-  - Protein-protein interaction
-  - Drug Categorization
-  - Sequence grouping
-- Behavior
-  - Fraud 
-  - Purchase/decision making
-  - Anomaly 
-
-Many use cases can be solved with node classification. With many challenges,
-there are multiple ways to attempt to solve them, and that's why the
-ArangoGraphML node classification is only the first of many techniques to be
-introduced. You can sign up to get immediate access to our latest stable
-features and also try out other features included in the pipeline, such as
-embedding similarity or link prediction.
-
-For more information, [get in touch](https://www.arangodb.com/contact/)
-with the ArangoDB team.
-
-### Metrics and Compliance
-
-#### Training Performance
-
-Before using a model to provide predictions to your application, there needs
-to be a way to determine its level of accuracy. Additionally, a mechanism must
-be in place to ensure the experiments comply with auditor requirements.
-
-ArangoGraphML supports these objectives by storing all relevant training data
-and metrics in a metadata graph, which is only available to you and is never
-viewable by ArangoDB. This metagraph contains valuable training metrics such as
-average accuracy (the general metric for determining model performance), F1,
-Recall, Precision, and confusion matrix data. This graph links all experiments
+## Supported Tasks
+
+### Node Classification
+
+Node classification is a **supervised learning** task where the goal is to
+predict the label of a node based on both its own features and its relationships
+within the graph. It requires a set of labeled nodes to train a model, which then
+classifies unlabeled nodes based on learned patterns.
+
+**How it works in ArangoGraphML**
+
+- A portion of the nodes in a graph is labeled for training.
+- The model learns patterns from both **node features** and
+  **structural relationships** (neighboring nodes and connections).
+- It predicts labels for unlabeled nodes based on these learned patterns.
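The idea of predicting labels from a node's connections can be illustrated with a deliberately simple stand-in. The sketch below is not how ArangoGraphML trains its models (it uses GNNs); it is a plain neighbor majority vote on a hypothetical toy graph, and every name in it (`edges`, `labels`, `predict_by_neighbor_vote`) is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical toy transaction graph (undirected edges between users).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
# Only a portion of the nodes is labeled; "c" and "d" are unlabeled.
labels = {"a": "fraud", "b": "fraud", "e": "legit", "f": "legit"}


def build_adjacency(edges):
    """Turn an edge list into a node -> set-of-neighbors mapping."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def predict_by_neighbor_vote(adjacency, labels):
    """Give each unlabeled node the most common label among its labeled neighbors."""
    predictions = {}
    for node in sorted(adjacency):
        if node in labels:
            continue
        votes = Counter(labels[n] for n in adjacency[node] if n in labels)
        if votes:  # skip nodes without any labeled neighbor
            predictions[node] = votes.most_common(1)[0][0]
    return predictions


adjacency = build_adjacency(edges)
predictions = predict_by_neighbor_vote(adjacency, labels)
print(predictions)  # {'c': 'fraud', 'd': 'legit'}
```

A real model would also weigh node features and multi-hop structure. The vote only shows why labeled neighbors carry signal for unlabeled nodes.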
+
+**Example Use Cases**
+
+- **Fraud Detection in Financial Networks**
+  - **Problem:** Fraudsters often create multiple accounts or interact within
+    suspicious clusters to evade detection.
+  - **Solution:** A transaction graph is built where nodes represent users and
+    edges represent transactions. The model learns patterns from labeled
+    fraudulent and legitimate users, detecting hidden fraud rings based on
+    **both user attributes and transaction relationships**.
+
+- **Customer Segmentation in E-Commerce & Social Media**
+  - **Problem:** Businesses need to categorize customers based on purchasing
+    behavior and engagement.
+  - **Solution:** A graph is built where nodes represent customers and edges
+    represent interactions (purchases, reviews, social connections). The model
+    predicts the category of each user based on how similar they are to other users,
+    **not just by their personal data, but also by how they are connected to others**.
+
+- **Disease Classification in Biomedical Networks**
+  - **Problem:** Identifying proteins or genes associated with a disease.
+  - **Solution:** A protein interaction graph is built where nodes are proteins
+    and edges represent biochemical interactions. The model classifies unknown
+    proteins based on their interactions with known disease-related proteins,
+    rather than just their individual properties.
+
+### Node Embedding Generation
+
+Node embedding is an **unsupervised learning** technique that converts nodes
+into numerical vector representations, preserving their **structural relationships**
+within the graph. Unlike simple feature aggregation, node embeddings
+**capture the influence of neighboring nodes and graph topology**, making
+them powerful for downstream tasks like clustering, anomaly detection,
+and link prediction. These combinations can provide valuable insights.
+Consider using [ArangoDB's Vector Search](https://arangodb.com/2024/11/vector-search-in-arangodb-practical-insights-and-hands-on-examples/)
+capabilities to find similar nodes based on their embeddings.
+
+**Feature Embeddings versus Node Embeddings**
+
+**Feature Embeddings** are vector representations derived from the attributes or
+features associated with nodes. These embeddings aim to capture the inherent
+characteristics of the data. For example, in a social network, a
+feature embedding might encode user attributes like age, location, and
+interests. Techniques like **Word2Vec**, **TF-IDF**, or **autoencoders** are
+commonly used to generate such embeddings.
+
+In the context of graphs, **Node Embeddings** are a
+**combination of a node's feature embedding and the structural information from its connected edges**.
+Essentially, they aggregate both the node's attributes and the connectivity patterns
+within the graph. This fusion helps capture not only the individual properties of
+a node but also its position and role within the network.
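As a rough illustration of that combination, the sketch below concatenates a node's own feature vector with the mean of its neighbors' feature vectors. This is a single, hand-written aggregation step on hypothetical data, not the embedding method ArangoGraphML uses; the node names, features, and helper functions are assumptions.

```python
# Hypothetical per-node feature vectors (for example, two scaled attributes).
features = {
    "alice": [1.0, 0.0],
    "bob": [0.0, 1.0],
    "carol": [1.0, 1.0],
}
# Hypothetical adjacency list; every node here has at least one neighbor.
neighbors = {
    "alice": ["bob", "carol"],
    "bob": ["alice"],
    "carol": ["alice"],
}


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(values) / count for values in zip(*vectors)]


def node_embedding(node):
    """Concatenate the node's own features with its neighborhood average."""
    own = features[node]
    neighborhood = mean_vector([features[n] for n in neighbors[node]])
    return own + neighborhood  # list concatenation, not element-wise addition


embeddings = {node: node_embedding(node) for node in features}
print(embeddings["alice"])  # [1.0, 0.0, 0.5, 1.0]
```

The first half of each vector reflects the node itself and the second half reflects its surroundings, so two nodes with identical attributes but different neighbors end up with different embeddings.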
+
+**How it works in ArangoGraphML**
+
+- The model learns an embedding (a vector representation) for each node based on its
+  **position within the graph and its connections**.
+- It **does not rely on labeled data** – instead, it captures structural patterns
+  through graph traversal and aggregation of neighbor information.
+- These embeddings can be used for similarity searches, clustering, and predictive tasks.
+
+**Example Use Cases**
+
+- **Recommendation Systems (E-commerce & Streaming Platforms)**
+  - **Problem:** Platforms like Amazon, Netflix, and Spotify need to recommend products,
+    movies, or songs.
+  - **Solution:** A user-item interaction graph is built where nodes are users
+    and products, and edges represent interactions (purchases, ratings, listens).
+    **Embeddings encode relationships**, allowing the system to recommend similar
+    items based on user behavior and network influence rather than just individual
+    preferences.
+
+- **Anomaly Detection in Cybersecurity & Finance**
+  - **Problem:** Detecting unusual activity (e.g., cyber attacks, money laundering)
+    in complex networks.
+  - **Solution:** A network of IP addresses, users, and transactions is represented as
+    a graph. Nodes with embeddings that significantly deviate from normal patterns
+    are flagged as potential threats. The key advantage here is that anomalies are
+    detected based on **network structure, not just individual activity logs**.
+
+- **Link Prediction (Social & Knowledge Graphs)**
+  - **Problem:** Predicting new relationships, such as suggesting friends on
+    social media or forecasting research paper citations.
+  - **Solution:** A social network graph is created where nodes are users, and
+    edges represent friendships. **Embeddings capture the likelihood of
+    connections forming based on shared neighborhoods and structural
+    similarities, even if users have never interacted before**.
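Several of the use cases above reduce to comparing embeddings. A minimal sketch, assuming the embeddings have already been generated and exported as plain lists of floats, ranks candidates by cosine similarity. The vectors and user names are invented for illustration, and a production setup would more likely rely on a vector index than on this linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(target, embeddings, top_k=2):
    """Return the top_k (name, score) pairs most similar to the target node."""
    scores = [
        (other, cosine_similarity(embeddings[target], vector))
        for other, vector in embeddings.items()
        if other != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical node embeddings; real ones would come from a trained model.
embeddings = {
    "user_1": [1.0, 0.0, 0.5],
    "user_2": [0.9, 0.1, 0.4],
    "user_3": [0.0, 1.0, 0.0],
    "user_4": [0.5, 0.5, 0.5],
}

ranked = most_similar("user_1", embeddings)
for name, score in ranked:
    print(name, round(score, 3))
```

Here `user_2` ranks first because its vector points in nearly the same direction as that of `user_1`, which is the sense in which embeddings can suggest likely connections or similar items.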
+
+### Key Differences
+
+| Feature               | Node Classification | Node Embedding Generation  |
+|-----------------------|---------------------|----------------------------|
+| **Learning Type**     | Supervised          | Unsupervised               |
+| **Input Data**        | Labeled nodes       | Graph structure & features |
+| **Output**            | Predicted labels    | Node embeddings (vectors)  |
+| **Key Advantage**     | Learns labels based on node connections and attributes | Learns structural patterns and node relationships |
+| **Use Cases**         | Fraud detection, customer segmentation, disease classification | Recommendations, anomaly detection, link prediction |
+
+ArangoGraphML provides the infrastructure to efficiently train and apply these
+models, helping users extract meaningful insights from complex graph data.
+
+## Metrics and Compliance
+
+ArangoGraphML supports tracking your ML pipeline by storing all relevant metadata
+and metrics in a graph called ArangoPipe. This graph is only available to you and is never
+viewable by ArangoDB. This metadata graph links all experiments
 to the source data, feature generation activities, training runs, and prediction
-jobs. Having everything linked across the entire pipeline ensures that, at any
-time, anything done that could be considered associated with sensitive user data,
-it is logged and easily accessible.
+jobs, allowing you to track the entire ML pipeline without having to leave ArangoDB.
 
 ### Security
 
 Each deployment that uses ArangoGraphML has an `arangopipe` database created,
-which houses all this information. Since the data lives with the deployment,
+which houses all ML metadata. Since this data lives within the deployment,
 it benefits from the ArangoGraph SOC 2 compliance and Enterprise security features.
 All ArangoGraphML services live alongside the ArangoGraph deployment and are only
-accessible within that organization.
+accessible within that organization.