feat: New example transferring data to ORNL DAAC #57
Open: wildintellect wants to merge 29 commits into main from demo/daac-transfer
29 commits
26e2a20 feat: ATL08 to COPC (wildintellect)
f907a92 fix:PDAL COPC options (wildintellect)
506ca91 docs:COPC Conversion notes/todo (wildintellect)
ef23927 Update copc/pdal_setup.ipynb (wildintellect)
0cbc453 Update copc/ATL08_to_COPC.ipynb (wildintellect)
b9a2aa4 fix: update maap-py function (wildintellect)
f7f7e8d chore:pull latest (wildintellect)
298dd43 Add files via upload (abarciauskas-bgse)
63ef6ed rename file (abarciauskas-bgse)
168d7f3 Add files via upload (abarciauskas-bgse)
11c34a9 Add files via upload (abarciauskas-bgse)
79bc4b2 Delete edl-token-example.ipynb (abarciauskas-bgse)
57d3878 Rename edl-token-example (1).ipynb to edl-token-example.ipynb (abarciauskas-bgse)
5bf1f7c Update perf_testing.ipynb (abarciauskas-bgse)
ed63859 revert change (abarciauskas-bgse)
8e60f86 feat: New example transferring data to ORNL DAAC (wildintellect)
82ece45 fix: Address issues from PR #57 (wildintellect)
c1d516c fix: Remove confusing link to file that's now in this repo (wildintellect)
1c2ce12 feat: Readme for COPC example (wildintellect)
5657780 Merge pull request #42 from MAAP-Project/feat/copc_atl (wildintellect)
1a8f867 Update notebook with instructions (abarciauskas-bgse)
fc0144b Merge pull request #53 from MAAP-Project/ab/edl-token-example (abarciauskas-bgse)
45577ab part1&2 testing data migration (sdradsb)
0a219c3 new testing data migration script - last cell (sdradsb)
7deb2d1 small change to the last cell (sdradsb)
8a4f880 Merge pull request #59 from sdradsb/main (sdradsb)
24bf30e feat: New example transferring data to ORNL DAAC (wildintellect)
cccb88d fix: Address issues from PR #57 (wildintellect)
4c37a34 rebase from main (wildintellect)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,232 @@ | ||
| { | ||
| "cells": [ | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "d1862af9", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "# Upload to ORNL DAAC\n", | ||
| "\n", | ||
| "This notebook demonstrates transferring data from MAAP to the ORNL DAAC. Before transferring anything, identify the correct DAAC for publishing your data and start its submission process. For the ORNL DAAC, that starts at https://daac.ornl.gov/submit/\n", | ||
| "\n", | ||
| "The transfer currently pushes data out of MAAP, which incurs egress cost; for this particular dataset that was ~$30. In the future we plan to explore having the DAAC pull data between AWS buckets to avoid egress.\n", | ||
| "\n", | ||
| "\n", | ||
| "## Install Rclone\n", | ||
| "\n", | ||
| "On the MAAP ADE you need to have [rclone](https://rclone.org/). We chose rclone because it verifies file integrity on upload, can resume uploads, and supports both S3 and FTPS.\n", | ||
| "\n", | ||
| "```\n", | ||
| "# Install rclone\n", | ||
| "apt install unzip\n", | ||
| "curl https://rclone.org/install.sh | bash\n", | ||
| "```\n", | ||
| "\n", | ||
| "## Set up S3 as the source\n", | ||
| "```\n", | ||
| "rclone config\n", | ||
| "\n", | ||
| "# Settings to pick (based on the rclone config file)\n", | ||
| "[s3]\n", | ||
| "type = s3\n", | ||
| "provider = AWS\n", | ||
| "env_auth = true\n", | ||
| "region = us-west-2\n", | ||
| "location_constraint = us-west-2\n", | ||
| "```\n", | ||
| "\n", | ||
| "\n", | ||
| "## Set up the DAAC as the FTPS destination\n", | ||
| "```\n", | ||
| "rclone config\n", | ||
| "\n", | ||
| "# Settings to pick (based on the rclone config file)\n", | ||
| "[ornl]\n", | ||
| "type = ftp\n", | ||
| "host = daacupload.ornl.gov\n", | ||
| "# username is all lowercase, even if you signed up differently\n", | ||
| "user = <username>\n", | ||
| "explicit_tls = true\n", | ||
| "no_check_certificate = true\n", | ||
| "ask_password = true\n", | ||
| "```\n", | ||
| "\n", | ||
| "You can check your rclone config (and save a copy for later):\n", | ||
| "```\n", | ||
| "cat /projects/.config/rclone/rclone.conf\n", | ||
| "```" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "95ee0ed2", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "# A simple test to verify permissions and the upload destination\n", | ||
| "#!rclone copyto -P s3:nasa-maap-data-store/file-staging/icesat2-boreal/boreal_agb_202302151676439579_1326.tif ornl:/407161fd93/" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "343b787f", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "# Set Up the Transfer List\n", | ||
| "\n", | ||
| "Initially we thought we could use a STAC query to select the files to transfer. This is the ideal method, since external groups like DAACs can reliably repeat the same query.\n", | ||
| "\n", | ||
| "In the end, however, the BBOX query was too crude to select the complete, correct set for this particular case, so Paul provided an explicit list of files in the same format, derived by other means." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "105ef242", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "## You need pystac_client\n", | ||
| "#%pip install pystac_client" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": 14, | ||
| "id": "222bea04", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "from pystac_client import Client\n", | ||
| "import os" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": 4, | ||
| "id": "81fcde17", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "# make a list of granules meeting the criteria\n", | ||
| "# https://stac.maap-project.org/collections/icesat2-boreal/items?bbox=-180,51.6,180,78\n", | ||
| "api = Client.open('https://stac.maap-project.org/')\n", | ||
| "\n", | ||
| "granule_results = api.search(\n", | ||
| "    max_items=5000,\n", | ||
| "    collections=['icesat2-boreal'],\n", | ||
| "    bbox=[-180, 51.6, 180, 78]\n", | ||
| ")\n", | ||
| "# the resulting list is saved to a text file in a later cell" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": 5, | ||
| "id": "f2359f07", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "# fetch all matching items as an ItemCollection\n", | ||
| "test = granule_results.get_all_items()" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": 13, | ||
| "id": "026fc208", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "# build a list of asset paths (bucket/key) by stripping the s3:// prefix from each href\n", | ||
| "assets = [item.assets.get('cog_default').href.replace(\"s3://\",\"\") for item in granule_results.get_all_items()]" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": 7, | ||
| "id": "0c482ca5", | ||
| "metadata": {}, | ||
| "outputs": [ | ||
| { | ||
| "data": { | ||
| "text/plain": [ | ||
| "3556" | ||
| ] | ||
| }, | ||
| "execution_count": 7, | ||
| "metadata": {}, | ||
| "output_type": "execute_result" | ||
| } | ||
| ], | ||
| "source": [ | ||
| "# check the number of assets selected\n", | ||
| "len(assets)" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": 15, | ||
| "id": "e5501e38", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "# reduce each asset path to its basename and save the names to a text file for rclone to use\n", | ||
| "# rclone --files-from then copies only the files named in that list\n", | ||
| "# https://rclone.org/filtering/#files-from-read-list-of-source-file-names\n", | ||
| "txt_file = 'icesat2_boreal_granules.txt'\n", | ||
| "with open(txt_file, 'w') as filehandle:\n", | ||
| " filehandle.writelines([f\"{os.path.basename(granule)}\\n\" for granule in assets])" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "9d0725af", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "# Do the Rclone transfer\n", | ||
| "Run this in a terminal (the password prompt may not work inside a notebook). The `--dry-run` flag only previews the transfer; remove it to actually copy the files.\n", | ||
| "```\n", | ||
| "rclone copy --dry-run --no-update-modtime -P --files-from icesat2_boreal_granules.txt s3:nasa-maap-data-store/file-staging/icesat2-boreal ornl:/407161fd93/\n", | ||
| "```\n", | ||
| "\n", | ||
| "To use an updated list of tiles:\n", | ||
| "```\n", | ||
| "rclone copy --dry-run --no-update-modtime -P --files-from /projects/shared-buckets/nathanmthomas/boreal_agb_tiles_DAAC.txt s3:nasa-maap-data-store/file-staging/icesat2-boreal ornl:/407161fd93/\n", | ||
| "```\n", | ||
| "\n", | ||
| "Example output:\n", | ||
| "```\n", | ||
| "2023-03-17 16:31:52 ERROR : ftp://daacupload.ornl.gov:21/407161fd93: SetModTime is not supported\n", | ||
| "Transferred: 27.839 GiB / 27.839 GiB, 100%, 39.908 MiB/s, ETA 0s\n", | ||
| "Checks: 3556 / 3556, 100%\n", | ||
| "Transferred: 335 / 335, 100%\n", | ||
| "Elapsed time: 11m40.2s\n", | ||
| "```\n", | ||
| "You can ignore the SetModTime error messages." | ||
| ] | ||
| } | ||
| ], | ||
| "metadata": { | ||
| "kernelspec": { | ||
| "display_name": "Python [conda env:root] *", | ||
| "language": "python", | ||
| "name": "conda-root-py" | ||
| }, | ||
| "language_info": { | ||
| "codemirror_mode": { | ||
| "name": "ipython", | ||
| "version": 3 | ||
| }, | ||
| "file_extension": ".py", | ||
| "mimetype": "text/x-python", | ||
| "name": "python", | ||
| "nbconvert_exporter": "python", | ||
| "pygments_lexer": "ipython3", | ||
| "version": "3.7.8" | ||
| } | ||
| }, | ||
| "nbformat": 4, | ||
| "nbformat_minor": 5 | ||
| } | ||
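The transfer-list cells and the `rclone copy` step in the notebook could be tied together with a small helper. The following is only a sketch, not part of the PR: `write_files_from` and `build_rclone_copy` are hypothetical names, and the example hrefs stand in for the STAC search results. It writes one basename per line (the format `--files-from` reads) and assembles the same command shown in the notebook as an argument list, without running it.

```python
import os
import tempfile


def write_files_from(hrefs, txt_file):
    """Write one basename per line, the format rclone's --files-from reads."""
    # dict.fromkeys drops duplicate names while keeping first-seen order
    names = list(dict.fromkeys(os.path.basename(href) for href in hrefs))
    with open(txt_file, "w") as filehandle:
        filehandle.writelines(f"{name}\n" for name in names)
    return names


def build_rclone_copy(txt_file, source, destination, dry_run=True):
    """Assemble the notebook's rclone copy command as an argument list."""
    command = ["rclone", "copy", "--no-update-modtime", "-P", "--files-from", txt_file]
    if dry_run:
        # --dry-run only reports what would be copied; nothing is transferred
        command.insert(2, "--dry-run")
    return command + [source, destination]


# Hypothetical hrefs standing in for the STAC search results;
# the same tile appears twice to show the de-duplication.
example_hrefs = [
    "s3://nasa-maap-data-store/file-staging/icesat2-boreal/boreal_agb_202302151676439579_1326.tif",
    "nasa-maap-data-store/file-staging/icesat2-boreal/boreal_agb_202302151676439579_1326.tif",
]

# Write the list into a scratch directory so the sketch leaves no files behind
list_path = os.path.join(tempfile.mkdtemp(), "icesat2_boreal_granules.txt")
written = write_files_from(example_hrefs, list_path)

command = build_rclone_copy(
    list_path,
    "s3:nasa-maap-data-store/file-staging/icesat2-boreal",
    "ornl:/407161fd93/",
)
print(written)
print(" ".join(command))
```

Passing `dry_run=False` drops the `--dry-run` flag. The resulting list could be handed to `subprocess.run` from a terminal session, where the FTP password prompt is available.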