docs: add docs (#1)
* docs: add docs
- quickstart(zh)
- introduction(zh,en)
- blogs(zh)

Signed-off-by: 赵安家 <[email protected]>
anjia0532 authored Feb 24, 2022
1 parent b975b05 commit e8b0d50
Showing 60 changed files with 8,666 additions and 6,372 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -19,3 +19,4 @@ npm-debug.log*
yarn-debug.log*
yarn-error.log*
package-lock.json
.idea/
325 changes: 325 additions & 0 deletions blog/2018-12-18-china-mobile-practice/index.md
157 changes: 157 additions & 0 deletions blog/2020-10-23-announcing-nydus/index.md
@@ -0,0 +1,157 @@
---
title: Introducing Dragonfly Container Image Service
authors:
- name: Peng Tao
- name: Liu Bo
tags: [dragonfly, container image, OCI, nydus]
description: Announcing Dragonfly container image service to greatly improve container startup pulling speed and image access security.
hide_table_of_contents: false
---

## Introducing Dragonfly Container Image Service

### Small is Fast, Large is Slow

With containers, it is relatively fast to deploy web apps, mobile backends, and API services right out of the box.
Why? Because the container images they use are generally small (hundreds of MBs).

A larger challenge is deploying applications with huge container images (several GBs). It takes a good amount of
time to get these images ready to use. We want to shorten that time significantly so that we can leverage the powerful
container abstractions to run and scale applications fast.

Dragonfly has been doing well at distributing container images. However, users still have to download an entire
container image before creating a new container.
Another big challenge is the rising security concern around container images.

Conceptually, we pack an application's environment into a single image that is easily shared with consumers.
The image is then turned into a local filesystem on top of which the application runs. The pieces now being
launched as nydus are the culmination of years of work and experience of our team in building filesystems.
Here we introduce the [dragonfly image service (codename nydus)](https://github.com/dragonflyoss/image-service) as
an extension to the Dragonfly project. It is software that minimizes download time and provides image integrity checks
across the whole lifetime of a container, enabling users to manage applications fast and safely.

nydus is co-developed by engineers from Alibaba Cloud and Ant Group. It is widely used in internal production
deployments. From our experience, we value its container-creation speedup and image-isolation enhancement the most,
and we see interesting use cases for it from time to time.

### Nydus: Dragonfly Image Service

The nydus project designs and implements a user-space filesystem on top of a container image format that improves over
the current OCI image specification. Its key features include:

* Container images are downloaded on demand
* Chunk-level data deduplication
* Flattened image metadata and data, removing all intermediate layers
* Only usable image data is saved when building a container image
* Only usable image data is downloaded when running a container
* End-to-end image data integrity
* Compatible with the OCI artifacts spec and distribution spec
* Integrated with the existing CNCF project Dragonfly to support image distribution in large clusters
* Support for multiple container image storage backends

Nydus mainly consists of a new container image format and a FUSE (Filesystem in USErspace) daemon that translates it into
a container-accessible mountpoint.

![nydus-architecture| center | 768x356](nydus-architecture.png)

The FUSE daemon speaks either the [FUSE](https://www.kernel.org/doc/html/latest/filesystems/fuse.html)
or the [virtiofs](https://virtio-fs.gitlab.io/) protocol to serve PODs created by conventional runc
containers or [Kata Containers](https://katacontainers.io/). It supports pulling container image data from a container
image registry, [OSS](https://www.alibabacloud.com/product/oss), NAS, as well as Dragonfly supernodes and node peers.
It can also optionally use a local directory to cache all container image data to speed up future container creation.
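The optional local cache described above can be sketched as a content-addressed chunk store that only hits the remote backend on first access. This is an illustrative model, not nydusd's actual implementation; `ChunkCache` and `fetch_from_backend` are hypothetical names.

```python
from pathlib import Path

class ChunkCache:
    """Minimal sketch of an on-demand chunk cache (hypothetical API):
    a chunk is fetched from the remote backend only once, then served
    from a local directory on every later access."""

    def __init__(self, cache_dir, fetch_from_backend):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        # backend could stand in for a registry, OSS, NAS, or a Dragonfly peer
        self.fetch_from_backend = fetch_from_backend

    def read_chunk(self, chunk_id: str) -> bytes:
        cached = self.cache_dir / chunk_id
        if cached.exists():                       # cache hit: no network round trip
            return cached.read_bytes()
        data = self.fetch_from_backend(chunk_id)  # cache miss: fetch on demand
        cached.write_bytes(data)                  # keep it for future container starts
        return data

# Usage: a dict stands in for the remote backend
backend = {"c1": b"hello", "c2": b"world"}
cache = ChunkCache("/tmp/nydus-cache-demo", lambda cid: backend[cid])
print(cache.read_chunk("c1"))  # b'hello'
```

A second `read_chunk("c1")` call returns the same bytes without touching the backend, which is exactly why repeated container creation on the same node gets faster.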

Internally, nydus splits a container image into two parts: a metadata layer and a data layer. The metadata layer is
a self-verifiable [merkle tree](https://en.wikipedia.org/wiki/Merkle_tree). Each file and directory
is a node in the merkle tree with a hash alongside it. A file's hash is the hash of its file content,
and a directory's hash is the hash of all of its descendants. Each file is divided into even-sized chunks saved
in a data layer. File chunks can be shared among different container images by letting the file nodes that reference
them point to the same chunk location in the shared data layer.
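The metadata/data split above can be illustrated in a few lines: files are cut into even-sized, content-addressed chunks, identical chunks collapse into one entry in a shared data layer, and merkle-style hashes cover files and directories. This is a toy model under stated assumptions (tiny chunk size, SHA-256, sorted-concatenation directory hash), not nydus's actual on-disk format.

```python
import hashlib

CHUNK_SIZE = 4  # tiny, for illustration only; real chunk sizes are far larger

def chunk_file(data: bytes):
    """Split file content into even-sized, content-addressed chunks."""
    refs = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        refs.append((hashlib.sha256(chunk).hexdigest(), chunk))
    return refs

def file_hash(data: bytes) -> str:
    """A file node's hash covers its whole content."""
    return hashlib.sha256(data).hexdigest()

def dir_hash(child_hashes) -> str:
    """A directory node's hash covers all of its descendants."""
    return hashlib.sha256("".join(sorted(child_hashes)).encode()).hexdigest()

# Two files that share one chunk ("bbbb")
files = {"/a.txt": b"aaaabbbb", "/b.txt": b"bbbbcccc"}
data_layer = {}   # shared, deduplicated chunk store
metadata = {}     # per-file chunk references (the "metadata layer")
for path, content in files.items():
    refs = chunk_file(content)
    metadata[path] = [digest for digest, _ in refs]
    for digest, chunk in refs:
        data_layer[digest] = chunk   # identical chunks collapse here

root = dir_hash(file_hash(c) for c in files.values())
print(len(data_layer))  # 3 chunks stored for 4 chunk references
```

Four chunk references resolve to only three stored chunks because both files point at the same `bbbb` chunk in the shared data layer, which is the chunk-level deduplication the format relies on.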

![nydus-format| center | 768x356](nydus-format.png)

### How can you benefit from nydus?

The immediate benefit of running the nydus image service is that users can launch containers almost instantly.
In our tests, we found that nydus can reduce container creation time from minutes to seconds.

![nydus-performance| center | 768x356](nydus-performance.png)

Another less obvious but important benefit is runtime data integrity checking. With OCIv1 container images,
image data cannot be verified after being unpacked to a local directory. If some files in those local
directories are tampered with, intentionally or not, containers will simply take them as is, incurring a
data-leak risk. In contrast, a nydus image is never unpacked to a local directory at all. What's more,
since verification can be enforced on every data access to a nydus image, the data-leak risk
can be completely avoided by re-fetching any corrupted data from the trusted image registry.
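The verify-on-access idea above can be sketched as follows: every read is checked against the digest recorded in the merkle tree, and a failed check triggers a re-fetch from the trusted source instead of serving the bad bytes. The function name and `refetch` callback are hypothetical, not nydus's real API.

```python
import hashlib

def read_verified(local_chunk: bytes, expected: str, refetch) -> bytes:
    """Sketch of per-access integrity checking (hypothetical API):
    serve local data only if it matches the recorded digest,
    otherwise fall back to the trusted registry."""
    if hashlib.sha256(local_chunk).hexdigest() == expected:
        return local_chunk
    fresh = refetch()  # e.g. pull the chunk from the registry again
    if hashlib.sha256(fresh).hexdigest() != expected:
        raise IOError("chunk fails verification even after refetch")
    return fresh

chunk = b"application data"
digest = hashlib.sha256(chunk).hexdigest()
# A tampered local copy is silently repaired from the trusted source:
print(read_verified(b"tampered!", digest, lambda: chunk))  # b'application data'
```

Because the check runs on every access rather than once at unpack time, tampering that happens after the image lands on disk is still caught.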

![nydus-integrity| center | 768x356](nydus-integrity.png)

### The Future of Nydus

The above examples showcase the power of nydus. For the last year, we've worked alongside the production team,
laser-focused on making nydus stable, secure, and easy to use.

Now that the foundation for nydus has been laid, our new focus is the ecosystem it aims to serve broadly.
We envision a future where users install Dragonfly and nydus on their clusters, run containers with large
images as fast as they do with regular-size images today, and feel confident about the safety of the data in their
container images.

### For the community

While we have widely deployed nydus in our production, we believe a proper upgrade to OCI image spec shouldn’t be
built without the community. To this end, we propose nydus as a reference implementation that aligns well with the
OCI image spec v2 proposal [1], and we look forward to working with other industry leaders should
this project come to fruition.

### FAQ

#### Q: What are the challenges with OCI image spec v1?

* ["The Road to OCIv2 Images: What's Wrong with Tar?"](https://www.cyphar.com/blog/post/20190121-ociv2-images-i-tar)
by Aleksa Sarai covers the challenges with OCIv1. A quick summary of his article is that tar is a legacy format
that does not fit well as a container image format.

#### Q: How is this different than crfs?

* The basic ideas of the two are quite similar. Deep down, the nydus image format supports chunk-level data deduplication
and end-to-end data integrity at runtime, which is an improvement over the stargz format used by crfs.

#### Q: How is this different than Teleport of Azure?

* Azure Teleport is like the current OCI image format plus an SMB-enabled snapshotter. It supports container image
lazy-fetching but suffers from all the Tar format's defects. In contrast, nydus deprecates the legacy
Tar format and uses a merkle-tree-based format to address those defects.

#### Q: What if network is down while container is running with nydus?

* With OCIv1, a container fails to start at all if the network goes down before the container image is fully
downloaded. Nydus changes the picture: because it uses a lazy fetch/load mechanism, a network failure may
instead take down a running container. Nydus addresses the problem with a prefetch mechanism, which can be configured
to run in the background right after a container starts.

### [1]: OCI Image Specification V2 Requirements

Meanwhile, the OCI (Open Container Initiative) community has been actively discussing the emergence of an OCI
image spec v2 aimed at addressing the challenges with OCI image spec v1.

Starting in June 2020, the OCI community spent more than a month discussing the requirements for OCI image
specification v2. It is important to note that OCIv2 is just a marketing term for updating the OCI
specification to better address some use cases. It is not a brand-new specification.

The discussion went from an email thread ([Proposal Draft for OCI Image Spec V2](https://groups.google.com/a/opencontainers.org/g/dev/c/Zk3yf45HIdA))
and [a shared document](https://hackmd.io/@cyphar/ociv2-brainstorm) to several OCI community online meetings,
and the result is quite inspiring. The concluded OCIv2 requirements are:

* Reduced Duplication
* Canonical Representation (Reproducible Image Building)
* Explicit (and Minimal) Filesystem Objects and Metadata
* Mountable Filesystem Format
* Bill of Materials
* Lazy Fetch Support
* Extensibility
* Verifiability and/or Repairability
* Reduced Uploading
* Untrusted Storage

For the detailed meaning of each requirement, please refer to [the original shared document](https://hackmd.io/@cyphar/ociv2-brainstorm).
We actively joined the community discussions and found that the nydus project fits these requirements nicely.
This further encouraged us to open-source the nydus project to support the community discussion with a working code base.
Binary file added blog/2020-10-23-announcing-nydus/nydus-format.png
76 changes: 76 additions & 0 deletions blog/2022-01-17-containerd-accepted-nydus-snapshotter/index.md
@@ -0,0 +1,76 @@
---
title: Containerd Accepted Nydus-snapshotter
author: Changwei Ge
tags: [dragonfly, container image, OCI, nydus, nydus-snapshotter, containerd]
description: Announcing Containerd accepted nydus-snapshotter as a sub-project
hide_table_of_contents: false
---


## Containerd Accepted Nydus-snapshotter

In early January, the containerd community accepted nydus-snapshotter as a sub-project. Check out the code,
detailed introductions, and tutorials in its [new repository](https://github.com/containerd/nydus-snapshotter).
We believe that the donation to containerd will attract more users and developers
to nydus itself and bring great value to community users.

Nydus-snapshotter is a remote snapshotter for containerd. It runs as a standalone process outside containerd,
pulls only a nydus image's bootstrap from the remote registry, and forks another process called **nydusd**.
Nydusd has a unified architecture: it can work as a FUSE user-space filesystem daemon,
a virtio-fs daemon, or an fscache user-space daemon. Nydusd is responsible for fetching data blocks
from remote storage, such as object storage or a standard image registry, to fulfill containers'
requests to read their rootfs.

Nydus is an excellent container image acceleration solution that significantly reduces container startup time.
It was originally developed by a virtual team from Alibaba Cloud and Ant Group and is deployed at very large scale:
millions of containers are created from nydus images each day at Alibaba Cloud and Ant Group.
The underlying technique is a newly designed, container-optimized and -oriented read-only filesystem named **Rafs**.
Several approaches are provided to create Rafs-format container images.
The image can be pushed to and stored in a standard registry since it is compatible with the OCI image
and distribution specifications. A nydus image can be converted from an OCI source image, whose metadata and file
data are split into a "bootstrap" and one or more "blobs", together with the necessary manifest.json and config.json.
Integration with BuildKit is in progress.

![rafs disk layout](rafs_disk_layout.png)

Nydus provides the following key features:

- Chunk-level data deduplication among layers in a single repository to reduce storage, transport, and memory costs
- Deleted (whiteout) files in a layer aren't packed into the nydus image, so image size may be reduced
- End-to-end image data integrity checking, so security issues like supply-chain attacks can be detected at runtime
- Integration with the CNCF incubating project Dragonfly to distribute container images in P2P fashion and mitigate
  the pressure on container registries
- Support for multiple container image storage backends, such as Registry, NAS, and Aliyun OSS; plugging in other
  remote storage backends like AWS S3 is also possible
- File access patterns can be recorded at runtime as access traces/logs, by which users' abnormal behaviors
  are easily caught, helping ensure the image can be trusted
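The whiteout behavior in the feature list above can be sketched as a layer-flattening step: later layers override earlier ones, and a whiteout marker removes a file instead of being packed into the final image. This is an illustrative model of OCI-style `.wh.` whiteouts, not nydus's builder code; `flatten_layers` is a hypothetical name.

```python
def flatten_layers(layers):
    """Sketch of layer flattening with whiteout handling: `layers` is a
    list of {path: content} dicts, lowest layer first. A ".wh.<name>"
    entry deletes the corresponding file, so neither the deleted file
    nor the deletion marker reaches the flattened image."""
    merged = {}
    for layer in layers:
        for path, content in layer.items():
            name = path.rsplit("/", 1)[-1]
            if name.startswith(".wh."):
                victim = path.replace(".wh.", "", 1)
                merged.pop(victim, None)   # deleted file never gets packed
            else:
                merged[path] = content     # later layers override earlier ones
    return merged

layers = [
    {"/etc/conf": b"v1", "/app/bin": b"binary"},   # base layer
    {"/etc/.wh.conf": b"", "/app/lib": b"lib"},    # upper layer deletes /etc/conf
]
print(sorted(flatten_layers(layers)))  # ['/app/bin', '/app/lib']
```

Since `/etc/conf` and its whiteout marker both vanish from the flattened result, the built image carries only data a container can actually read, which is where the size reduction comes from.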

Beyond the essential features above, nydus can be flexibly configured as a FUSE-based user-space filesystem or
as in-kernel EROFS with an on-demand user-space loader daemon, and integrating nydus with VM-based container
runtimes is much easier.

- Lightweight integration with VM-based container runtimes like Kata Containers. In fact, Kata Containers
  is considering supporting nydus as a **native** image acceleration solution.
- Nydus cooperates closely with the Linux **in-kernel** disk filesystem EROFS: a container's rootfs can be set up
  directly by EROFS with lazy-pulling capability. The corresponding changes have been merged into the Linux kernel
  since v5.16.

To run with runc, nydusd works as FUSE user-space daemon:

![runc nydus](nydus_runc.png)

To work with KataContainers, it works as a virtio-fs daemon:

![kata nydus](nydus_kata.png)

Nydus community is working together with Linux Kernel to develop erofs+fscache based user-space on-demand read.

![runc erofs nydus](nydus_runc_erofs.svg)

Nydus and eStargz developers are working together on a new project named [acceld](https://github.com/goharbor/acceleration-service)
in the **Harbor** community to provide a general service that converts OCI v1 images into various acceleration
image formats for different accelerator providers, keeping the upgrade path from OCI v1 images smooth. In addition to
the conversion service acceld and the conversion tool nydusify, nydus is also adding BuildKit support to enable
exporting nydus images directly from a Dockerfile as a compression type.

In the future, the nydus community will work closely with the containerd community on fast and efficient methods of
distributing container images, container image security, container image storage efficiency, and more.
6 changes: 0 additions & 6 deletions blog/2022-02-23-example.md

This file was deleted.

6 changes: 6 additions & 0 deletions blog/authors.yml
@@ -3,3 +3,9 @@ gaius:
title: Maintainer of Dragonfly
url: https://github.com/gaius-qi
image_url: https://avatars.githubusercontent.com/u/15955374?s=96&v=4

anjia0532:
name: anjia0532
title: Maintainer of Dragonfly
url: https://github.com/anjia0532
image_url: https://avatars.githubusercontent.com/u/15098916?s=400&u=23a4b699baa0ed924cf1db40b9edb614d0263621&v=4
97 changes: 95 additions & 2 deletions docs/getting-started/introduction.md
@@ -1,6 +1,99 @@
---
sidebar_position: 1
id: introduction
title: What Is Dragonfly?
description: Dragonfly is an intelligent P2P-based image and file distribution tool. It aims to improve the efficiency and success rate of file transferring, and maximize the usage of network bandwidth, especially for the distribution of large amounts of data, such as application distribution, cache distribution, log distribution, and image distribution.
slug: /
---

## What Is Dragonfly?

Dragonfly is an intelligent P2P-based image and file distribution tool. It aims to improve the efficiency and success
rate of file transferring, and maximize the usage of network bandwidth, especially for the distribution of large
amounts of data, such as application distribution, cache distribution, log distribution, and image distribution.

At Alibaba, every month Dragonfly is invoked two billion times and distributes 3.4PB of data. Dragonfly has become one
of the most important pieces of infrastructure at Alibaba.

While container technologies make DevOps life easier most of the time, they surely bring some challenges: for example,
the efficiency of image distribution, especially when you have to replicate image distribution across several hosts.

Dragonfly works extremely well with both Docker and [PouchContainer](https://github.com/alibaba/pouch) in this scenario.
It's also compatible with containers of other formats. It delivers up to 57 times the throughput of native Docker
and saves up to 99.5% of the outbound bandwidth of the registry.

Dragonfly makes it simple and cost-effective to set up, operate, and scale any kind of file, image,
or data distribution.

## Why Dragonfly

This project is an open-source version of the Dragonfly used at Alibaba. It has the following features:

**Note:** More Alibaba-internal features will be made available to open-source users soon. Stay tuned!

- **P2P-based file distribution**: By using the P2P technology for file transmission, it makes the most out of the
bandwidth resources of each peer to improve downloading efficiency, and saves a lot of cross-IDC bandwidth,
especially the costly cross-board bandwidth.
- **Non-invasive support to all kinds of container technologies**: Dragonfly can seamlessly support various containers
for distributing images.
- **Host-level speed limit**: In addition to rate limiting the current download task, as many other download tools do
  (for example wget and curl), Dragonfly also provides rate limiting for the entire host.
- **Passive CDN**: The CDN mechanism can avoid repetitive remote downloads.
- **Strong consistency**: Dragonfly can make sure that all downloaded files are consistent even if users do not provide
any check code (MD5).
- **Disk protection and highly efficient IO**: Prechecking disk space, delaying synchronization, writing file blocks
in the best order, isolating net-read/disk-write, and so on.
- **High performance**: SuperNode is completely closed-loop, which means that it doesn't rely on any database
or distributed cache, processing requests with extremely high performance.
- **Auto-isolation of exceptions**: Dragonfly automatically isolates exception nodes (peers or SuperNodes)
  to improve download stability.
- **No pressure on file source**: Generally, only a few SuperNodes will download files from the source.
- **Standard HTTP header support**: Authentication information can be submitted through HTTP headers.
- **Effective concurrency control of Registry Auth**: Reduce the pressure on the Registry Auth Service.
- **Simple and easy to use**: Very few configurations are needed.

## How Does It Stack Up Against Traditional Solution?

We carried out an experiment to compare the performance of Dragonfly and wget.

|Test Environment ||
|---|---|
|Dragonfly Server|2 * (24-Core 64GB-RAM 2000Mb/s)|
|File Source Server|2 * (24-Core 64GB-RAM 2000Mb/s)|
|Client|4-Core 8GB-RAM 200Mb/s|
|Target File Size|200MB|
|Experiment Date|April 20, 2016|

The experiment result is shown in the following figure.

![How it stacks up](/img/docs/intro/performance.png)

As you can see in the chart, for Dragonfly, no matter how many clients are downloading, the average download
time is always about 12 seconds. For wget, the download time keeps increasing with the number of clients.
When the number of wget clients reached 1,200, the file source crashed and could no longer serve any client.

## How Does It Work?

Dragonfly works slightly differently when downloading general files and downloading container images.

### Downloading General Files

The SuperNode plays the role of CDN and schedules the transfer of blocks between each peer. dfget is the P2P client,
which is also called a "peer". It's mainly used to download and share blocks.

![Downloading General Files](/img/docs/intro/dfget.png)

### Downloading Container Images

Registry is similar to the file server above. dfget proxy, also called dfdaemon, intercepts HTTP requests
from docker pull or docker push, and then decides which requests to process with dfget.

![Downloading Container Images](/img/docs/intro/dfget-combine-container.png)

### Downloading Blocks

Every file is divided into multiple blocks, which are transferred between peers. Each peer is a P2P client.
The SuperNode checks whether the corresponding file exists on its local disk. If not,
it downloads the file from the file server.

![How file blocks are downloaded](/img/docs/intro/distributing.png)
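The block-transfer scheme above can be sketched with two toy classes: a SuperNode that serves blocks from local disk and hits the file source only on first request, and peers that prefer fetching blocks from neighbors. This is a simplified model of the behavior described here, not Dragonfly's actual code; `SuperNode` and `Peer` are illustrative names.

```python
class SuperNode:
    """Sketch of the CDN role: serve a block from local disk,
    fetching it from the file source only on first request."""
    def __init__(self, file_source):
        self.file_source = file_source  # maps block_id -> bytes
        self.disk = {}                  # passive CDN cache

    def get_block(self, block_id):
        if block_id not in self.disk:   # only the first request hits the source
            self.disk[block_id] = self.file_source[block_id]
        return self.disk[block_id]

class Peer:
    """Sketch of a dfget peer: download blocks, preferring neighbors."""
    def __init__(self, supernode):
        self.supernode = supernode
        self.blocks = {}

    def download(self, block_id, neighbors=()):
        for peer in neighbors:          # P2P path: reuse a neighbor's block
            if block_id in peer.blocks:
                self.blocks[block_id] = peer.blocks[block_id]
                return self.blocks[block_id]
        # fall back to the SuperNode (which in turn may hit the source)
        self.blocks[block_id] = self.supernode.get_block(block_id)
        return self.blocks[block_id]

# Usage: the second peer gets the block from its neighbor, not the source
supernode = SuperNode({"blk-0": b"payload"})
p1, p2 = Peer(supernode), Peer(supernode)
p1.download("blk-0")
print(p2.download("blk-0", neighbors=[p1]))  # b'payload'
```

However many peers join, the file source is read at most once per block, which is why the source stays unloaded in the experiment above while wget clients eventually crash it.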