h5boss version 2 #1

Open
wants to merge 199 commits into
base: master
Changes from 105 commits
Commits
89ce43a
h5boss subselect
valiantljk Mar 14, 2016
15def44
update
valiantljk Mar 15, 2016
d78cd1e
update
valiantljk Apr 15, 2016
c14db91
update
valiantljk Apr 18, 2016
81ee2fc
update
valiantljk Apr 18, 2016
6c7f37a
fix but in select.py
valiantljk Apr 18, 2016
7385b40
add timing
valiantljk Apr 18, 2016
ba68934
add pmj test
valiantljk Apr 18, 2016
b4f2ee5
add pmj check
valiantljk Apr 18, 2016
b0a02de
add metadata check
valiantljk Apr 18, 2016
47281d9
update parallel converter
valiantljk Apr 25, 2016
56c6a72
update gitignore
valiantljk Apr 25, 2016
c0fdb4f
update add pmf
valiantljk May 2, 2016
ea11b36
h5boss select function
valiantljk May 2, 2016
1e72067
Create README.md
valiantljk May 2, 2016
1875387
boss2hdf5
valiantljk May 2, 2016
fdab5c9
Merge branch 'master' of https://github.com/valiantljk/h5boss-util
valiantljk May 2, 2016
d093808
update
valiantljk May 2, 2016
96bbda4
update select
valiantljk May 3, 2016
3dd8d54
update demo and scritps
valiantljk May 5, 2016
5884ac4
update random
valiantljk May 6, 2016
05a1192
update h5boss subset-selection
valiantljk May 25, 2016
9ea4163
delete h5 files
valiantljk May 25, 2016
117c23e
h5boss_uitl.ipynb
valiantljk May 25, 2016
b8e3045
update
valiantljk May 25, 2016
63c18e7
update xlsx
valiantljk May 26, 2016
731c627
update add
valiantljk May 26, 2016
dca3f76
update
valiantljk May 27, 2016
428ca13
update update.py
valiantljk Jun 1, 2016
d52f82b
update remove.py
valiantljk Jun 2, 2016
bcf07b2
delete files
valiantljk Jun 2, 2016
c8e20ed
update tdd
valiantljk Jun 2, 2016
01d0bda
Update README.md
valiantljk Jun 2, 2016
3b17d1f
Update README.md
valiantljk Jun 2, 2016
99acd41
update test cases
valiantljk Jun 2, 2016
27263ec
Merge branch 'master' of github.com:valiantljk/h5boss-util
valiantljk Jun 2, 2016
7b933ac
update test_remove.py
valiantljk Jun 2, 2016
c73b510
pmf full list and csv read
valiantljk Jun 5, 2016
c147555
random pmf list
valiantljk Jun 5, 2016
19a8766
Update README.md
valiantljk Jun 5, 2016
d72f4f7
Update README.md
valiantljk Jun 5, 2016
1498a3a
Update README.md
valiantljk Jun 5, 2016
41759dd
Update README.md
valiantljk Jun 5, 2016
f04560a
update functions
valiantljk Jun 5, 2016
c515c11
Merge branch 'master' of github.com:valiantljk/h5boss-util
valiantljk Jun 5, 2016
f2e870c
update
valiantljk Jun 6, 2016
a74fcf7
update
valiantljk Jun 6, 2016
7f7bd0f
update multiprocessing subset
valiantljk Jun 24, 2016
eb4bb99
start h5boss select mpi version
valiantljk Jun 24, 2016
09aff6a
add cori script
valiantljk Jun 24, 2016
0892557
update mpi version
valiantljk Jun 24, 2016
3b44d35
added repack option in remove
valiantljk Jun 24, 2016
9018209
update boss2hdf5 parallel
valiantljk Jun 27, 2016
6531811
parallel dev
valiantljk Jul 1, 2016
7ff0346
update parallel meta query
valiantljk Jul 7, 2016
13e3a15
subset function bug fixed, catalog is a 1d table that contains compou…
valiantljk Jul 9, 2016
7d7e6c9
update mpi version
valiantljk Jul 11, 2016
8796ea7
update parallel subset with virtual dataset optimization
valiantljk Jul 11, 2016
15c583e
update
valiantljk Jul 20, 2016
269e250
update h5boss
valiantljk Jul 22, 2016
a67dea6
support parallel read write to shared file
valiantljk Aug 3, 2016
aa400d8
update h5boss
valiantljk Aug 9, 2016
1e621f9
remove h5check
valiantljk Aug 9, 2016
1ba57f1
update
valiantljk Aug 9, 2016
31c3399
update h5boss c
valiantljk Aug 9, 2016
aea23cb
ch5boss parallel and pyh5boss catalog function
valiantljk Aug 12, 2016
6eb624e
update
valiantljk Aug 14, 2016
19de353
change code structure
valiantljk Aug 15, 2016
5e559cf
catalog parallel smart write
valiantljk Aug 18, 2016
5e06af7
update catalog copy
valiantljk Aug 19, 2016
00b5f56
update
valiantljk Aug 19, 2016
cab6926
update catalog write
valiantljk Aug 21, 2016
60b4f2d
update h5bosspy timing
valiantljk Aug 22, 2016
82f73ad
update gitignore
valiantljk Aug 22, 2016
0169d48
before adding catalog c code
valiantljk Aug 25, 2016
47aa06f
fix bug in parsing parameters, break
valiantljk Aug 29, 2016
45790e5
update catalog parsing and read
valiantljk Aug 30, 2016
8170c5f
update catalog read
valiantljk Aug 30, 2016
7ff1c91
update 1 read fields by fiberid, 2 read records by offsets list, 3 wr…
valiantljk Aug 31, 2016
a9120fd
finish h5boss c catalog parallel read write debugging
valiantljk Sep 1, 2016
25768cb
update h5boss python
valiantljk Sep 9, 2016
03b99be
update h5boss map initial version
valiantljk Sep 14, 2016
24d863b
update new format converter
valiantljk Sep 14, 2016
ab53b00
update h5boss format version 2, done
valiantljk Sep 14, 2016
e308057
update h5boss converter v2, fix exposures part, change 1d to 2d array
valiantljk Sep 15, 2016
044cb4b
update boss2hdf5 v1
valiantljk Sep 15, 2016
cafdcb9
update conversion 2 script
valiantljk Sep 21, 2016
8fb7c3e
update h5boss query
valiantljk Sep 24, 2016
c502dca
update checkfiber
valiantljk Sep 24, 2016
9088033
update h5boss v2 query
valiantljk Sep 24, 2016
12b45ed
Update README.md
valiantljk Sep 25, 2016
3ee5cc1
update h5map: get dataset type and shape. fiber template create for n…
valiantljk Sep 25, 2016
6a14020
Merge branch 'master' of github.com:valiantljk/h5boss-util
valiantljk Sep 25, 2016
74612d4
readd h5boss subset v1
valiantljk Sep 25, 2016
fa1b46f
regorganize the files
valiantljk Sep 25, 2016
a7d652b
update subset_mpi.py
valiantljk Sep 25, 2016
dce5894
update h5boss v2 query
valiantljk Sep 26, 2016
8cb15bc
figured out python list buffer reusing issue
valiantljk Sep 26, 2016
23e5f78
create template has bug, start to rewrite fiber copy
valiantljk Sep 26, 2016
63358ba
template creation is ok, parallel fiber copy is under debugging
valiantljk Sep 27, 2016
1a3b6d4
found a bug in datamap, the assumption that flux in different plate/m…
valiantljk Sep 27, 2016
a908031
need to fix the bug in datamap
valiantljk Sep 27, 2016
d8ff2e8
update selectmpi and h5map
valiantljk Sep 27, 2016
6784a14
parallel read for v2 is done
valiantljk Sep 30, 2016
e37d94b
update subset_mpi interface
valiantljk Oct 3, 2016
1066f55
unify the submit_subset script
valiantljk Oct 3, 2016
f4e84bc
achived 2.5X in template creation by turning on latest -libver- option
valiantljk Oct 4, 2016
ef56095
update gitignore
valiantljk Oct 4, 2016
bee3c25
clean h5boss c
valiantljk Oct 4, 2016
b32589e
clean h5boss c parser
valiantljk Oct 4, 2016
f7cbf21
re-org h5boss py
valiantljk Oct 4, 2016
de4c8d8
add converter notebook and fibercheck slurm script
valiantljk Oct 4, 2016
6eed9b2
rename files for version 2
valiantljk Oct 5, 2016
1e106e7
rename files for version 2
valiantljk Oct 5, 2016
045c675
clean scripts
valiantljk Oct 5, 2016
1795a93
update test scripts
valiantljk Oct 5, 2016
2992060
update random pmf script
valiantljk Oct 5, 2016
79e50c8
h5boss clean codes version 0.1
valiantljk Oct 14, 2016
b414ac3
update clean version 0.1
valiantljk Oct 14, 2016
45aec5d
update
valiantljk Oct 14, 2016
f9a2ef2
update
valiantljk Oct 14, 2016
708aade
Update README.md
valiantljk Oct 14, 2016
bd503a0
Update README.md
valiantljk Oct 14, 2016
069169a
update
valiantljk Oct 14, 2016
6d5cf9a
update
valiantljk Oct 14, 2016
b0bc3eb
update
valiantljk Oct 14, 2016
0e51fa0
add setup.py
valiantljk Oct 14, 2016
a8eafe8
update docs for parallel subset
valiantljk Oct 14, 2016
c964de8
update
valiantljk Oct 14, 2016
9aaf5d5
update table contents on the page
valiantljk Oct 14, 2016
fc99d09
update converter
valiantljk Oct 23, 2016
2c14611
update h5boss doc
valiantljk Oct 23, 2016
40396f0
fix boss2hdf5
valiantljk Oct 23, 2016
057a16f
update select
valiantljk Oct 24, 2016
d27182f
update select
valiantljk Oct 24, 2016
98be44f
reverse subset
valiantljk Oct 24, 2016
33c6730
update subset output
valiantljk Oct 24, 2016
66b5786
change select output timing
valiantljk Oct 24, 2016
4820b45
update page of subset
valiantljk Oct 24, 2016
c4a88ed
fix sql
valiantljk Oct 24, 2016
d51d8fc
fix sql
valiantljk Oct 24, 2016
f41363f
fix sql
valiantljk Oct 24, 2016
7b46de8
fix sql
valiantljk Oct 24, 2016
3adca57
fix sql
valiantljk Oct 24, 2016
ff6dd82
fix sql
valiantljk Oct 24, 2016
fe7c73f
fix sql
valiantljk Oct 24, 2016
70d6a49
fix sql
valiantljk Oct 24, 2016
2f18885
fix sql
valiantljk Oct 24, 2016
a9102c3
fix sql
valiantljk Oct 24, 2016
a7d95f5
fix sql
valiantljk Oct 24, 2016
fd4f79d
fix sql
valiantljk Oct 24, 2016
dfb4b27
fix sql
valiantljk Oct 24, 2016
90f7f87
fix sql
valiantljk Oct 24, 2016
16d1135
fix sql
valiantljk Oct 24, 2016
14c508b
fix sql
valiantljk Oct 24, 2016
49f76e3
fix sql
valiantljk Oct 24, 2016
4654068
fix sql
valiantljk Oct 24, 2016
44c4236
fix sql
valiantljk Oct 24, 2016
b64274a
add base file or function(add) demo
valiantljk Oct 24, 2016
2d9e4bc
add base.h5
valiantljk Oct 24, 2016
9465105
update select and sql
valiantljk Oct 24, 2016
583d427
select_update fix
valiantljk Oct 24, 2016
f9b9a8f
fix update
valiantljk Oct 24, 2016
dde7816
fix update
valiantljk Oct 24, 2016
5a2769e
update remove
valiantljk Oct 24, 2016
c94418d
update remove.py
valiantljk Oct 24, 2016
c6d2fea
update timing report
valiantljk Oct 24, 2016
f43e9d6
update timing of repack
valiantljk Oct 24, 2016
55f7da9
repack option
valiantljk Oct 24, 2016
b839e8d
error handling
valiantljk Oct 24, 2016
2d70890
clean codes in select.py
valiantljk Oct 24, 2016
5befd4a
timer report
valiantljk Oct 24, 2016
09a504b
updating docs
valiantljk Oct 24, 2016
5e48475
bug fixed in select_update.py
valiantljk Oct 24, 2016
c804108
update add/subset/update docs
valiantljk Oct 24, 2016
b45ab80
fix convertion code
valiantljk Oct 25, 2016
b876a22
boss2hdf5
valiantljk Oct 25, 2016
0dc1346
boss2hdf5
valiantljk Oct 25, 2016
84e74b5
fix bugs in two converters
valiantljk Oct 25, 2016
0a45939
select.py
valiantljk Oct 25, 2016
d9d4173
update doc for select
valiantljk Oct 25, 2016
324275a
update sample file
valiantljk Oct 25, 2016
ed64567
doc for update
valiantljk Oct 25, 2016
6b6a429
typo in update.rst
valiantljk Oct 25, 2016
c962f4b
update fits2hdf5
valiantljk Oct 27, 2016
b6be763
update fits2hdf
valiantljk Oct 27, 2016
b7fb347
update fits2hdf
valiantljk Oct 27, 2016
c6dc976
update fits2hdf image
valiantljk Oct 27, 2016
7c3b5ad
update fits2hdf
valiantljk Oct 29, 2016
1791541
example code for reading from fits versus from hdf5
valiantljk Nov 3, 2016
38347c0
h5boss missing features
valiantljk Nov 7, 2016
287f6e4
update missing features
valiantljk Nov 7, 2016
a673844
h5boss missing features
valiantljk Nov 7, 2016
2a51d49
update fits2hdf5
valiantljk Nov 17, 2016
0019e0b
update examples for reading multiple HDUs
valiantljk Dec 5, 2016
b7da5c8
typo in h5boss.add function
valiantljk Dec 12, 2016
584f5be
Update README.md
valiantljk Apr 23, 2017
9dc49a6
Update README.md
valiantljk Jul 27, 2017
998e928
Update README.md
valiantljk Jul 27, 2017
6 changes: 6 additions & 0 deletions .classpath
@@ -0,0 +1,6 @@
<?xml version="1.0" encoding="UTF-8"?>
<classpath>
<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER/Python 2.7.10 (/anaconda/bin/python2.7)"/>
<classpathentry exported="true" kind="con" path="org.eclipse.jdt.USER_LIBRARY/h5boss"/>
<classpathentry kind="output" path="out/production/h5boss-util"/>
</classpath>
Owner

Is this .classpath file needed in h5boss, or was it accidentally added to the repo?

38 changes: 36 additions & 2 deletions .gitignore
@@ -1,11 +1,45 @@
*.err
*.out
h5boss_py/demo/*.txt
h5boss_c/*.txt
h5boss_c/checkendian
h5boss_c/checkendian.c
h5boss_c/iniaa
h5boss_c/iniaa.c
h5boss_c/str
h5boss_c/strtoltest
h5boss_c/strtoltest.c
h5boss_py/demo/2979928_10k.out-rank
h5boss_py/demo/2979975_10k.out-rank
h5boss_py/demo/2979984_10k.out-rank
h5boss_py/h5boss/boss2hdf5_v2.ipynb
h5boss_py/demo/log
h5boss_py/demo/drop_cache/drop_file_from_page_cache
h5boss_py/demo/drop_cache/is_file_in_page_cache
h5boss_py/demo/drop_cache/pagesize
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

*.hdf5
*.h5
# C extensions
*.so

h5boss_c/nodes10k.txt
h5boss_c/nodes10k_fiber.txt
h5boss_c/testh5g
h5boss_c/*.exe
h5boss_c/*.o
h5boss_py/demo/drop_file_from_page_cache
h5boss_py/demo/input-full-cori-output
h5boss_py/demo/is_file_in_page_cache
h5boss_py/demo/nodes.txt
h5boss_py/demo/nodes10k.txt
h5boss_py/demo/nodes10k_fiber.txt
h5boss_py/demo/nodes1k.txt
h5boss_py/demo/pagesize
h5boss_py/demo/test1.sh
h5boss_py/demo/pmf-list/*
# Distribution / packaging
.Python
env/
9 changes: 9 additions & 0 deletions .idea/libraries/h5boss.xml

Some generated files are not rendered by default.

124 changes: 124 additions & 0 deletions .idea/uiDesigner.xml

Some generated files are not rendered by default.

25 changes: 23 additions & 2 deletions README.md
@@ -1,2 +1,23 @@
# h5boss
Exploratory tools for reformatting BOSS spectra as hdf5 files
# H5Boss
Exploratory tools for managing BOSS spectra.

BOSS data was originally maintained as millions of FITS files in thousands of different folders. Accessing and analyzing them is inefficient in terms of I/O bandwidth and programming productivity. In h5boss, we focus on:

1. Reformatting: Preserve the FITS file structure and spectra hierarchy using HDF5
2. Object I/O: Design an object interface for accessing the files as pmf-indexed objects
3. Query Caching: Develop a transparent cache for restoring analysis workflows and reducing metadata overhead
4. Subset Selection: Support subset selection, moving the fiber objects of interest and their catalog together and saving them in one file
5. Data mover: Design an API for moving data through various storage tiers.

Currently, h5boss is implemented in both Python and C; the Python version is actively maintained and supported. The C version is mainly for I/O-sensitive users/applications.
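
As a rough illustration of the Subset Selection idea, the sketch below groups a list of requested fibers by plate and MJD so that each source file would need to be opened only once. It assumes that "pmf" stands for plate, MJD, and fiber; the function name `group_by_plate_mjd` and the sample values are hypothetical and are not part of h5boss, whose actual selection code is not shown here.

```python
from collections import defaultdict


def group_by_plate_mjd(pmf_list):
    """Group (plate, mjd, fiber) triples by their (plate, mjd) pair.

    Hypothetical helper for illustration only; not the h5boss API.
    Duplicate requests for the same fiber are collapsed.
    """
    groups = defaultdict(set)
    for plate, mjd, fiber in pmf_list:
        groups[(plate, mjd)].add(fiber)
    # Return plain dict with sorted fiber lists for stable output.
    return {key: sorted(fibers) for key, fibers in groups.items()}


if __name__ == "__main__":
    # Made-up sample request; the numbers are placeholders.
    wanted = [(3665, 55247, 12), (3665, 55247, 3),
              (4857, 55711, 100), (3665, 55247, 12)]
    for (plate, mjd), fibers in sorted(group_by_plate_mjd(wanted).items()):
        print(plate, mjd, fibers)
```

Grouping first keeps the number of file opens proportional to the number of distinct plate/MJD pairs rather than to the number of requested fibers.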

# Demo with h5boss_py
1. source cori-setup
2. cd h5boss_py/demo
3. subset -h
4. add -h
5. update -h

# Demo with h5boss_c
1. cd h5boss_c
2. make
5 changes: 5 additions & 0 deletions cori-bb
@@ -0,0 +1,5 @@
#!/bin/bash
module load python
module load h5py-parallel
export PYTHONPATH=`pwd`/h5boss_py:$PYTHONPATH
export PATH=`pwd`/h5boss_py/scripts:$PATH
9 changes: 9 additions & 0 deletions cori-conda-serial
@@ -0,0 +1,9 @@
#!/bin/bash
export PYTHONPATH=`pwd`/h5boss_py:$PYTHONPATH
export PATH=`pwd`/h5boss_py/scripts:$PATH
export basedir=/global/projecta/projectdirs/sdss/data/sdss/dr12/boss/spectro/redux/v5_7_0
export PYTHONPATH=/global/homes/j/jialin/anaconda2/lib/python2.7/site-packages:$PYTHONPATH
export HDF5_PATH=/global/homes/j/jialin/packages/serial-hdf5/hdf5-1.8.16/hdf5path
export H5PY_PATH=/global/homes/j/jialin/packages/h5py-2.6.0/h5pypath
export PYTHONPATH=$H5PY_PATH/lib/python2.7/site-packages:$PYTHONPATH
export PATH=/global/homes/j/jialin/anaconda2/bin:$PATH
5 changes: 5 additions & 0 deletions cori-setup
@@ -0,0 +1,5 @@
#!/bin/bash
module load python
module load h5py-parallel
export PYTHONPATH=`pwd`/h5boss_py:$PYTHONPATH
export PATH=`pwd`/h5boss_py/scripts:$PATH
6 changes: 6 additions & 0 deletions cori-setup-serial
@@ -0,0 +1,6 @@
#!/bin/bash
#module load python/3.4-anaconda
module load python/2.7-anaconda
export PYTHONPATH=`pwd`/h5boss_py:$PYTHONPATH
export PATH=`pwd`/h5boss_py/scripts:$PATH
export basedir=/global/projecta/projectdirs/sdss/data/sdss/dr12/boss/spectro/redux/v5_7_0
9 changes: 9 additions & 0 deletions h5boss-util.eml
@@ -0,0 +1,9 @@
<?xml version="1.0" encoding="UTF-8"?>
<component inherit-compiler-output="true" jdk="Python 2.7.10 (/anaconda/bin/python2.7)" jdk_type="Python SDK">
<output-test url="file://$MODULE_DIR$/out/test/h5boss-util"/>
<exclude-output/>
<contentEntry url="file://$MODULE_DIR$"/>
<levels>
<level name="h5boss" value="project"/>
</levels>
</component>
121 changes: 0 additions & 121 deletions h5boss/io.py

This file was deleted.
