
@ijjorama

Open logfile for checksum records in XrdCephOss, write to it only in XrdCephPosix.
Move readback attribute logging into ceph_posix_setxattr.
Don't calculate Adler32 checksum unless writing data (unset "streamed" value was being written for readback checksum,
causing values with patterns like 0*[24]0*).
The result of concatenating pool, ":" and pathname must be strdup'd before it can be used as the char *path value.
Remove unnecessary definition of ts_rfc3339 function from XrdCephOss.
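As a rough illustration of two points in the description, the sketch below uses hypothetical helper names (`makeCephPath`, `ChecksumState`) that are not taken from XrdCeph. It copies the concatenated `pool:pathname` string with `strdup`, so the `char *` outlives the temporary `std::string` and must later be released with `free`, and it reports an Adler-32 value only if data was actually written, so an unset value is never recorded.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: build "pool:pathname" and return a heap copy.
// The joined std::string is destroyed when the function returns, so its
// c_str() cannot be handed out; strdup makes an independent copy that the
// caller must release with free(). Returns nullptr if allocation fails.
char *makeCephPath(const std::string &pool, const std::string &pathname) {
  const std::string joined = pool + ":" + pathname;
  return strdup(joined.c_str());
}

// Hypothetical running Adler-32 (RFC 1950) that is only reported if some
// data was actually written through the file handle.
struct ChecksumState {
  uint32_t a = 1;
  uint32_t b = 0;
  bool dataWritten = false;

  void update(const unsigned char *buf, std::size_t len) {
    const uint32_t kMod = 65521;  // largest prime below 2^16
    for (std::size_t i = 0; i < len; ++i) {
      a = (a + buf[i]) % kMod;
      b = (b + a) % kMod;
    }
    if (len > 0) dataWritten = true;
  }

  // Returns false and leaves `out` untouched when nothing was written, so
  // no placeholder value ends up in the checksum record.
  bool value(uint32_t &out) const {
    if (!dataWritten) return false;
    out = (b << 16) | a;
    return true;
  }
};
```

Returning `false` from `value()` is just one way to keep an unset checksum out of the log; the real code may signal this differently.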

Jo-stfc pushed a commit that referenced this pull request Mar 18, 2025
* Buffer implementation for XrdCeph

* Better error return code values

* Add timing into BufferIO

* Add timing into BufferSimple

* Utils code area

* Update raw data access and copy

* Adding Extents

* ReadV simple logic

* Add to own files the readV implementations

* Add to own files the readV implementations; cmake updated

* Logging improvements and write buffer updates

* Add IOadapter with blocking aio access

* Use IOadapter with blocking aio access

* Small logging update

* Reduce logging information; fix timing to ms

* Reduce logging information;

* Reduced logging, and better use of aggregated metrics

* comment clean and typo fixes

* Remove unnecessary file close

* Additional logging in case of problems

* Additional logging in case of problems

* allow option for buffering with IO or AIO buffer

Co-authored-by: james <[email protected]>
Co-authored-by: root <[email protected]>

merge variable rpm name into bufferedIO (#19)

* variable rpm name

* Update xrootd-ceph.spec.in

* Update makesrpm.sh

* Update makesrpm.sh

Fixes to remove warnings from devtoolset-9 compilation

Master buffered ceph io (#20)

* Buffer implementation for XrdCeph

* Better error return code values

* Add timing into BufferIO

* Add timing into BufferSimple

* Utils code area

* Update raw data access and copy

* Adding Extents

* ReadV simple logic

* Add to own files the readV implementations

* Add to own files the readV implementations; cmake updated

* Logging improvements and write buffer updates

* Add IOadapter with blocking aio access

* Use IOadapter with blocking aio access

* Small logging update

* Reduce logging information; fix timing to ms

* Reduce logging information;

* Reduced logging, and better use of aggregated metrics

* comment clean and typo fixes

* Remove unnecessary file close

* Additional logging in case of problems

* Additional logging in case of problems

* allow option for buffering with IO or AIO buffer

* fix conflicts

* Allow for finite retries on EBUSY, else fail with EIO.

It is possible for a read/write from the buffer to return EBUSY due to an underlying issue.
In these cases, if -EBUSY is returned out of XrdCeph, a large number of retries can result.
It is better at this point for the transfer to be flagged as failed, and retried properly.
The code allows for 5 retries with a 1s sleep between them. If this does not succeed (which it may not),
then an -EIO error is returned to xrootd.
Other error messages are not affected.

* Better summary stats output for CephIOAdapterRaw

* Comment out a comment

Co-authored-by: james <[email protected]>
Co-authored-by: root <[email protected]>
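The EBUSY policy described in #20 could look roughly like the following sketch. `retryOnEbusy` is a hypothetical wrapper, and it assumes that "5 retries" means five further attempts after the initial call; the pause is a parameter (defaulting to 1 s) so it can be shortened in tests.

```cpp
#include <cerrno>
#include <chrono>
#include <functional>
#include <sys/types.h>
#include <thread>

// Hypothetical wrapper: retry an operation that returns -EBUSY a bounded
// number of times, sleeping between attempts, and turn a persistent -EBUSY
// into -EIO. Any other result (success or a different error) passes through.
ssize_t retryOnEbusy(const std::function<ssize_t()> &op,
                     int maxRetries = 5,
                     std::chrono::milliseconds pause = std::chrono::seconds(1)) {
  ssize_t rc = op();
  for (int attempt = 0; rc == -EBUSY && attempt < maxRetries; ++attempt) {
    std::this_thread::sleep_for(pause);
    rc = op();
  }
  return rc == -EBUSY ? -EIO : rc;
}
```

Mapping only -EBUSY keeps other error codes visible to xrootd unchanged, matching "Other error messages are not affected."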

variable version/release for template (#21)

Update bufferedIO with updates from master (#26)

* variable rpm name (#17)

* variable rpm name

* Update xrootd-ceph.spec.in

* Update makesrpm.sh

* Update makesrpm.sh

* Master cephnamelib (#16)

* Allow ceph.namelib to take params and apply translation to full path

* Reduce logging

Remove extraneous logging messages

* simplify parsing of namelib and added a log line for any remapped file

Co-authored-by: James <[email protected]>

* XRD-22 Fix ensuring the correct filename is passed to the CephFile instance. (#24)

A regression in a previous commit meant that the filename was not correctly passed
to the CephFile instance. This fix ensures that the filename is set correctly.

Co-authored-by: james <[email protected]>

* re-introduce variable names to spec input (#27)

Co-authored-by: Jo-stfc <[email protected]>
Co-authored-by: James <[email protected]>

Decreased logging for bufferedIO operations. (#25)

Reduced printouts. Only summary stats are now produced, rather than per-read logging.

Co-authored-by: James Walder <[email protected]>

Updates from master to buffered io needed for 550 2 (#32)

* XRD-12 Add timestamp information for ceph logging methods

Update the logwrapper method to print out the current timestamp in the initial section of output.

* Return permission denied on write attempt on existing file with EXCL set (#31)

Co-authored-by: James Walder <[email protected]>

* disable posc (#30)

posc is disabled for proxies, but not for a unified setup. XrdCeph does not support the posc flag, as it misinterprets objects as folders.

Co-authored-by: James Walder <[email protected]>
Co-authored-by: Jo-stfc <[email protected]>
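A minimal sketch of prefixing log output with a timestamp, assuming an RFC 3339 UTC format at second precision. The `ts_rfc3339` name in the PR description suggests this, but the exact format the logwrapper uses is not shown here. `rfc3339Utc` and `withTimestamp` are hypothetical names; `gmtime_r` is POSIX.

```cpp
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical helper: format a UTC time as an RFC 3339 timestamp
// (second precision), e.g. "1970-01-01T00:00:00Z".
std::string rfc3339Utc(std::time_t t) {
  std::tm tmUtc{};
  gmtime_r(&t, &tmUtc);  // thread-safe POSIX variant of gmtime
  char buf[32];
  std::strftime(buf, sizeof(buf), "%Y-%m-%dT%H:%M:%SZ", &tmUtc);
  return buf;
}

// Hypothetical log wrapper step: prepend the timestamp to a message.
std::string withTimestamp(const std::string &msg,
                          std::time_t now = std::time(nullptr)) {
  return rfc3339Utc(now) + " " + msg;
}
```

Using `gmtime_r` rather than `gmtime` avoids sharing a static `tm` buffer between threads that log concurrently.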

Buffered io multibuffers (#38)

* Add multiple buffer support for reads in case of simultaneous threads reading the same file.

* Further refinements to the simultaneous file reads code

 - Ensure all relevant read / write methods will create a buffer if needed
 - Validity check on close that a buffer was actually created (or bypass code if not)
 - Bugfix in case of odd read sizes combined with multi/split buffer reads (critical)
 - Clean of comments included for development

* Enhanced logging for cluster metrics and readV layer improvements (#35)

- dumpCLusterInfo to check on the rados connection info
  - extra logging in a delete to give info on delete times
  - update the readV basic alg to do a simple bulk request

Co-authored-by: James Walder <[email protected]>

* Add time taken to unlink a file in the logging message

  - Logging an unlink now includes the time taken, in cases of (un)successful deletes
  - Remove some extraneous comments

* - Fix issue with buffer passthrough read
 - Add maximum number of simultaneous buffers for a given file
Once a given number of opens have been made against the same file, don't
create a large buffer; instead, only create a 1MiB buffer for each new file.
This should avoid issues with small paged reads, although normally the
passthrough mode would be expected to trigger on each read.

* Additional statistics on buffered reading added.

 - Will report bytes read from ceph, bytes read but bypassed the cache, and the cache hit fraction

---------

Co-authored-by: James Walder <[email protected]>
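The cap on simultaneous buffers per file could be tracked with a per-path open counter, sketched below under assumptions: `BufferRegistry` is a hypothetical class, and the threshold and large buffer size are caller-supplied rather than the values XrdCeph actually uses. Only the 1 MiB fallback size comes from the commit message.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Hypothetical per-file registry: the first maxLarge opens of a path get a
// large buffer; any further simultaneous opens get only a 1 MiB buffer.
class BufferRegistry {
 public:
  BufferRegistry(int maxLarge, std::size_t largeSize)
      : maxLarge_(maxLarge), largeSize_(largeSize) {}

  // Call on open: returns the buffer size to allocate for this handle.
  std::size_t acquire(const std::string &path) {
    std::lock_guard<std::mutex> lock(mu_);
    int &opens = opens_[path];
    std::size_t size = largeSize_;
    if (opens >= maxLarge_) size = kSmallBuffer;
    ++opens;
    return size;
  }

  // Call on close so that later opens can get a large buffer again.
  void release(const std::string &path) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = opens_.find(path);
    if (it != opens_.end() && --it->second == 0) opens_.erase(it);
  }

 private:
  static constexpr std::size_t kSmallBuffer = 1024 * 1024;  // 1 MiB
  int maxLarge_;
  std::size_t largeSize_;
  std::map<std::string, int> opens_;
  std::mutex mu_;
};
```

The mutex matters because the scenario being addressed is several threads opening and reading the same file at once.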

Bug fix for writes with bufferedIO when extending over buffer range.  (#40)

* Bug fix for writes with bufferedIO when extending over buffer range.
 - Fix for case where multiple writes to the buffer are needed for a given xrd write request
 - Previously threw an error; now will correctly perform the multiple writes as required.
 - Set the Simple Data buffer capacity to the input size, rather than the capacity of the vector, which could be larger.

---------

Co-authored-by: James Walder <[email protected]>
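The multi-write fix in #40 can be pictured with a hypothetical fixed-capacity buffer: a request larger than the remaining space fills the buffer, flushes it, and continues with the remainder instead of failing. The capacity is taken from the requested size (`buf_.size()`), not from `std::vector::capacity()`, mirroring the last bullet. `WriteBuffer` and its in-memory flush target are stand-ins, not XrdCeph classes.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical fixed-capacity write buffer. A write larger than the free
// space is split across several buffer fills and flushes instead of failing.
class WriteBuffer {
 public:
  explicit WriteBuffer(std::size_t capacity) : buf_(capacity) {}

  // Returns the number of bytes accepted (len, unless capacity is zero).
  std::size_t write(const char *data, std::size_t len) {
    if (buf_.empty()) return 0;  // nothing can be buffered
    std::size_t done = 0;
    while (done < len) {
      // Capacity is the requested size (buf_.size()), not vector::capacity().
      std::size_t n = std::min(len - done, buf_.size() - used_);
      std::memcpy(buf_.data() + used_, data + done, n);
      used_ += n;
      done += n;
      if (used_ == buf_.size()) flush();
    }
    return done;
  }

  // Stand-in for writing the buffered bytes to the backend.
  void flush() {
    if (used_ == 0) return;
    backend_.insert(backend_.end(), buf_.begin(),
                    buf_.begin() + static_cast<std::ptrdiff_t>(used_));
    ++flushes_;
    used_ = 0;
  }

  const std::vector<char> &backend() const { return backend_; }
  int flushes() const { return flushes_; }

 private:
  std::vector<char> buf_;
  std::size_t used_ = 0;
  std::vector<char> backend_;
  int flushes_ = 0;
};
```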

variable rpm name (#17)

* variable rpm name

* Update xrootd-ceph.spec.in

* Update makesrpm.sh

* Update makesrpm.sh

re-introduce variable names to spec input (#27)
Jo-stfc added a commit that referenced this pull request Mar 18, 2025
* variable rpm name (#17)

* variable rpm name

* Update xrootd-ceph.spec.in

* Update makesrpm.sh

* Update makesrpm.sh

* Master cephnamelib (#16)

* Allow ceph.namelib to take params and apply translation to full path

* Reduce logging

Remove extraneous logging messages

* simplify parsing of namelib and added a log line for any remapped file

Co-authored-by: James <[email protected]>

* XRD-22 Fix ensuring the correct filename is passed to the CephFile instance. (#24)

A regression in a previous commit meant that the filename was not correctly passed
to the CephFile instance. This fix ensures that the filename is set correctly.

Co-authored-by: james <[email protected]>

* XRD-12 Add timestamp information for ceph logging methods

Update the logwrapper method to print out the current timestamp in the initial section of output.

* re-introduce variable names to spec input (#27)

* Return permission denied on write attempt on existing file with EXCL set (#31)

Co-authored-by: James Walder <[email protected]>

* disable posc (#30)

posc is disabled for proxies, but not for a unified setup. XrdCeph does not support the posc flag, as it misinterprets objects as folders.

* Disk space reporting (#36)

* Provide XrdCephOss::StatLS and ceph_posix_stat_pool to enable disk space reporting. Responds to the 'xrdfs query space' command as requested by ALICE VO

* Remove ts() timestamp function and unnecessary #defines

* Read ceph.poolnames setting from XRootD config to specify reportable pools.

* Support 'xrdfs spaceinfo' via Stat() method returning XrdOssOK for stat'ing 'pool:'

* Tidy up tracing of Stat* calls

* Remove unwanted method isPathReportablePool

* Add comments for need to support stat-ing '/'

* Return -ENOMEM if malloc fails

* Return -ENOMEM if malloc fails

* Rename disk space reporting config item to ceph.reportingpools and log if the list of names is not present. Report if the ceph_posix_stat_pool call to get the amount of used space fails

* Sanitize incoming pool name and allow for MonALISA format

* Optional tracing of Stat* incoming paths and response. Remove double logging of ceph.reporting pools.

* Check that sanitized pool name is not marked invalid

* Use ceph namelib translation at Oss level by copying translateFileName logic from Posix level. More error checking if stat can't find pool name.

* Remove superfluous comments

* Ensure tracing of path arguments to Stat() and StatLS(). Add Doxygen-style comments to changed methods

* Make source tarball only as minimum output

* Add make-src-tar.sh to additionally place required source tarball in '--output' destination

* Change back usedSpace to totalSpace in ceph_posix_statfs

* feat: improve (vector) read implementation (#37)

Try to avoid usage of libradosstriper for readv operations
since it may impact performance significantly. To do so we explicitly
determine the objects that constitute a file and read from them using
rados only. Reads are async.

To do these async reads conveniently we introduce a class for handling
multiple async read requests.

* Initial implementation of ReadV at the XrdOss level

* Correct the signature of ReadV to XrdCephOssFile

* feat: do not use libradosstriper for readv operation

* feat: use atomic operations for readv requests

This should be the most efficient way of handling multiple read ops.

* feat: use nonstriper reads for pread requests

* feat: use nonstriper reads for read operations also

This required a complete refactoring: the bulkAioRead class was moved to a
separate file and its features were extended. Namely, it can now read
from files, not only objects.

* feat: print warning message if waiting for aio reads from ceph takes too long

This is useful for debugging the reasons of failures for read(v) requests.

* Added some comments

* fix: use size_t for start_block

We can use "%zx" in sprintf, so let's unify the types of variables in
the function. This will also allow us to raise the limit on the
file size.

* feat: refactor BulkAioRead::read method, suggested during review

1. Rename end_block to last_block
2. Move variable definitions closer to its usage
3. Use 'std::min' instead of 'if' for chunk_len determination
4. Use more efficient chunk_start calculation

* feat: add options to allow one to switch to standard read mechanisms

This may be useful for testing.

* feat: rename block_size to object_size in BulkAioRead

New name better describes reality, since we are talking about the size
of ceph objects.

* feat: rename wait_for_complete to submit_and_wait_for_complete

New name describes this function better.

* feat: use more meaningful names for variables that loop over operations map

op_data should describe the contents of the variables better.

* feat: move type definitions into the class

* feat: added comments with method's description

* feat: remove unnecessary semicolons

* feat: convert wait_for_complete method from void to int

This allows several improvements. Here we change the key of the
operations map to the object number instead of the object's full name.

* fix: fixed comment

* fix: fixed comments

* feat: refactor bulkAioRead class

Pointers were dropped from objectReadOperation and ceph_bufferlist objects.
The objects are moved to appropriate classes to simplify memory management
and usage.

* feat: take into account completion's return value

We can retrieve the return code from the completion and get a meaningful status
of the whole operation with this value.

* feat: allow reading of sparse file

Since we do not really expect sparse files, we use a fallback mechanism:
if a read(v) failed with -ENOENT exit status, then just resubmit it using
striper-based functions.

* lint: remove trailing whitespaces

* feat: use meaningful names for read(v) functions

The name now indicates whether read(v)s are striper or non-striper
ones.

* feat: fallback to striper-based read if number of stripes > 1

This is just a safeguard: such files should not be present in our production setup.

* feat: allow zero-sized reads

In principle, this is a correct request, so we should support it.

* fix: make sure we do not delete completion objects until submitted operation is completed

This is done to prevent some nasty side-effects, e.g. writing to a deleted buffer.

* fix: remove move constructor from bulkAioRead

We do not use it.

* fix: handle failure to allocate completion

Completion allocation can fail; we should take that into account.

* feat: use file reference to construct readOp objects

There is no need to extract (and then copy) the file name and object size
from the file reference to construct a read object; we can use the file reference
directly.

* feat: replace conversion operator with explicit method

Implicit conversion was making code less readable.

* feat: remove call to is_complete() in completion wrapper destructor

There is no need to check for completion; we can call wait_for_complete
multiple times.

* feat: put warning threshold in config file

It is better to have this value configurable rather than hardcoded.

* fix: initialize return code variable in ReadOpData

* Added comment

* feat: add comment for future optimization.

We should use `aio_cancel` to cancel all pending read operations in the future.

* fix: remove vim's swp file

Committed by accident

* feat: improve logging

Add file descriptor to sparse file's logging, fix typos.

* fix: minor fixes

Remove unnecessary include, move variable declaration closer to the
usage, fix spelling in the comment.

* feat: BulkAioRead::read method refactoring

Refactoring was made to increase (hopefully) readability.

* fix: better wording for comment

* feat: BulkAioRead::read -- change loop exit condition

We can exit when `to_read == 0`. This allows us to drop the `end_block`
variable.

* fix: add call to `clear` after getting results

This is to allow clients to use the same readOp object for future
operations.

---------

Co-authored-by: Ian Johnson <[email protected]>
Co-authored-by: Alexander Rogovskiy <[email protected]>

* duplicate struct definition

* move struct definition to headers

* use bufferedIO version of path

* remove MAXPATHLEN redefinition

---------

Co-authored-by: snafus <[email protected]>
Co-authored-by: James <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: Ian Johnson <[email protected]>
Co-authored-by: alex-rg <[email protected]>
Co-authored-by: Alexander Rogovskiy <[email protected]>
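The core of the non-striper read path in #37 is mapping a byte range of a file onto the Ceph objects that hold it. The sketch below assumes a single stripe (stripe count 1, so each object holds a consecutive `object_size` byte range), which matches the fallback to striper reads when the number of stripes is greater than 1. The loop follows the refactored logic described in the commit (`std::min` for the chunk length, exit when nothing is left to read, zero-sized reads allowed). `splitRead`, `Chunk` and the `<file>.<16 hex digits>` object name are assumptions to check against the real striper layout.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// One piece of a file read that falls inside a single object.
struct Chunk {
  std::size_t block;        // object index within the file
  std::size_t offsetInObj;  // where the read starts inside that object
  std::size_t len;          // bytes to read from that object
};

// Split [offset, offset + len) into per-object chunks, assuming objects of
// objectSize bytes (> 0) laid out with a single stripe.
std::vector<Chunk> splitRead(std::size_t offset, std::size_t len,
                             std::size_t objectSize) {
  std::vector<Chunk> chunks;
  std::size_t block = offset / objectSize;
  std::size_t chunkStart = offset % objectSize;
  std::size_t toRead = len;
  while (toRead > 0) {  // a zero-sized read yields no chunks
    std::size_t chunkLen = std::min(toRead, objectSize - chunkStart);
    chunks.push_back({block, chunkStart, chunkLen});
    toRead -= chunkLen;
    ++block;
    chunkStart = 0;  // later objects are read from their beginning
  }
  return chunks;
}

// Hypothetical object naming "<file>.<block as 16 hex digits>", formatted
// with %zx as mentioned in the commit message.
std::string objectName(const std::string &file, std::size_t block) {
  char suffix[32];
  std::snprintf(suffix, sizeof(suffix), ".%016zx", block);
  return file + suffix;
}
```

Each `Chunk` would then become one asynchronous rados read on `objectName(file, chunk.block)`, which is what lets the reads bypass libradosstriper.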

Buffered io nonstriperbuffer (xrootd#43)

* Add capability for buffer io raw to use striperless reads

* Add capability for buffer io raw to use striperless reads

* Add a maybe striper for reading in ceph posix

* Use striperless reads when bypassing the buffer

feat: improve (vector) read implementation (#37)

Try to avoid usage of libradosstriper for readv operations
since it may impact performance significantly. To do so we explicitly
determine the objects that constitute a file and read from them using
rados only. Reads are async.

To do these async reads conveniently we introduce a class for handling
multiple async read requests.

* Initial implementation of ReadV at the XrdOss level

* Correct the signature of ReadV to XrdCephOssFile

* feat: do not use libradosstriper for readv operation

* feat: use atomic operations for readv requests

This should be the most efficient way of handling multiple read ops.

* feat: use nonstriper reads for pread requests

* feat: use nonstriper reads for read operations also

This required a complete refactoring: the bulkAioRead class was moved to a
separate file and its features were extended. Namely, it can now read
from files, not only objects.

* feat: print warning message if waiting for aio reads from ceph takes too long

This is useful for debugging the reasons of failures for read(v) requests.

* Added some comments

* fix: use size_t for start_block

We can use "%zx" in sprintf, so let's unify the types of variables in
the function. This will also allow us to raise the limit on the
file size.

* feat: refactor BulkAioRead::read method, suggested during review

1. Rename end_block to last_block
2. Move variable definitions closer to its usage
3. Use 'std::min' instead of 'if' for chunk_len determination
4. Use more efficient chunk_start calculation

* feat: add options to allow one to switch to standard read mechanisms

This may be useful for testing.

* feat: rename block_size to object_size in BulkAioRead

New name better describes reality, since we are talking about the size
of ceph objects.

* feat: rename wait_for_complete to submit_and_wait_for_complete

New name describes this function better.

* feat: use more meaningful names for variables that loop over operations map

op_data should describe the contents of the variables better.

* feat: move type definitions into the class

* feat: added comments with method's description

* feat: remove unnecessary semicolons

* feat: convert wait_for_complete method from void to int

This allows several improvements. Here we change the key of the
operations map to the object number instead of the object's full name.

* fix: fixed comment

* fix: fixed comments

* feat: refactor bulkAioRead class

Pointers were dropped from objectReadOperation and ceph_bufferlist objects.
The objects are moved to appropriate classes to simplify memory management
and usage.

* feat: take into account completion's return value

We can retrieve the return code from the completion and get a meaningful status
of the whole operation with this value.

* feat: allow reading of sparse file

Since we do not really expect sparse files, we use a fallback mechanism:
if a read(v) failed with -ENOENT exit status, then just resubmit it using
striper-based functions.

* lint: remove trailing whitespaces

* feat: use meaningful names for read(v) functions

The name now indicates whether read(v)s are striper or non-striper
ones.

* feat: fallback to striper-based read if number of stripes > 1

This is just a safeguard: such files should not be present in our production setup.

* feat: allow zero-sized reads

In principle, this is a correct request, so we should support it.

* fix: make sure we do not delete completion objects until submitted operation is completed

This is done to prevent some nasty side-effects, e.g. writing to a deleted buffer.

* fix: remove move constructor from bulkAioRead

We do not use it.

* fix: handle failure to allocate completion

Completion allocation can fail; we should take that into account.

* feat: use file reference to construct readOp objects

There is no need to extract (and then copy) the file name and object size
from the file reference to construct a read object; we can use the file reference
directly.

* feat: replace conversion operator with explicit method

Implicit conversion was making code less readable.

* feat: remove call to is_complete() in completion wrapper destructor

There is no need to check for completion; we can call wait_for_complete
multiple times.

* feat: put warning threshold in config file

It is better to have this value configurable rather than hardcoded.

* fix: initialize return code variable in ReadOpData

* Added comment

* feat: add comment for future optimization.

We should use `aio_cancel` to cancel all pending read operations in the future.

* fix: remove vim's swp file

Committed by accident

* feat: improve logging

Add file descriptor to sparse file's logging, fix typos.

* fix: minor fixes

Remove unnecessary include, move variable declaration closer to the
usage, fix spelling in the comment.

* feat: BulkAioRead::read method refactoring

Refactoring was made to increase (hopefully) readability.

* fix: better wording for comment

* feat: BulkAioRead::read -- change loop exit condition

We can exit when `to_read == 0`. This allows us to drop the `end_block`
variable.

* fix: add call to `clear` after getting results

This is to allow clients to use the same readOp object for future
operations.

---------

Co-authored-by: Ian Johnson <[email protected]>
Co-authored-by: Alexander Rogovskiy <[email protected]>
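Two behaviours from #37 are combined in the sketch below: the `-ENOENT` fallback for sparse files and the warning for slow reads with a configurable threshold. The `ReadFn` signature, `readWithFallback`, and the log text are hypothetical. The real code times the wait on the aio completions rather than a whole synchronous call.

```cpp
#include <cerrno>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <sys/types.h>

// Hypothetical read signature: (fd, buf, count, offset) -> bytes or -errno.
using ReadFn = std::function<ssize_t(int, char *, size_t, off_t)>;

// Try the non-striper read first; if it reports -ENOENT (e.g. a missing
// object in a sparse file), resubmit the same request via the striper read.
// Warn on stderr when the first read exceeds warnThreshold.
ssize_t readWithFallback(const ReadFn &nonStriperRead, const ReadFn &striperRead,
                         int fd, char *buf, size_t count, off_t offset,
                         std::chrono::milliseconds warnThreshold,
                         bool *warned = nullptr) {
  auto start = std::chrono::steady_clock::now();
  ssize_t rc = nonStriperRead(fd, buf, count, offset);
  auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
      std::chrono::steady_clock::now() - start);
  bool slow = elapsed > warnThreshold;
  if (slow) {
    std::fprintf(stderr, "fd %d: read took %lld ms\n", fd,
                 static_cast<long long>(elapsed.count()));
  }
  if (warned) *warned = slow;
  if (rc == -ENOENT) {
    std::fprintf(stderr, "fd %d: -ENOENT, retrying with striper read\n", fd);
    rc = striperRead(fd, buf, count, offset);
  }
  return rc;
}
```

Falling back only on `-ENOENT` keeps the fast path for normal files while still serving the rare sparse file correctly.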

Update XrdCephBufferAlgSimple.cc (xrootd#45)

Remove verbose logging for case when cache is bypassed, as the read size is at least the size of the buffer.

XRD-22 Fix ensuring the correct filename is passed to the CephFile instance. (#24)

A regression in a previous commit meant that the filename was not correctly passed
to the CephFile instance. This fix ensures that the filename is set correctly.

Co-authored-by: james <[email protected]>
amadio and others added 27 commits April 11, 2025 09:04
Follow recommendation from
https://cmake.org/cmake/help/latest/command/install.html and make the
destination path relative
Mac OS X's bash does not support associative arrays.  By going to
more primitive indexed arrays (which are admittedly less pretty),
we can enable the TPC test on Mac OS X, broadening the overall test
coverage.
The type definition of the throttle class makes it appear as if the
`Features` function is a virtual that can be forwarded along; this
is not true.  Instead, it's defined in the base and accesses a
protected member.  That must be copied forward to have the right
features reported.

This was noticed when a throttle plugin was run on top of a cache:
the cache bit got dropped, causing the `Age` header to go missing.
The `Features()` function was implemented as if it was an override
but it is not a virtual function in the base class; this caused a
real bug in the combination of the Throttle and Pfc.

Markup the throttle class with overrides to prevent such a bug
from recurring.
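A self-contained illustration (simplified classes, not the real `XrdOss` hierarchy) of why the throttle lost the cache bit. `Features()` is non-virtual in the base, so a same-named function in a wrapper only hides it, and calls through a base pointer still read the base's protected member. Marking the wrapper's function `override` would turn this into a compile error, and copying the wrapped features into the protected member makes the base accessor report them.

```cpp
#include <cstdint>

// Simplified stand-in for a base class with a non-virtual accessor
// that reads a protected member.
class Base {
 public:
  virtual ~Base() = default;
  uint64_t Features() const { return myFeatures; }  // NOT virtual
 protected:
  uint64_t myFeatures = 0;
};

// Stand-in for a cache layer that sets its feature bit.
class Cache : public Base {
 public:
  static constexpr uint64_t kCacheBit = 0x1;
  Cache() { myFeatures = kCacheBit; }
};

// Stand-in for a forwarding wrapper such as a throttle layer.
class Wrapper : public Base {
 public:
  explicit Wrapper(Base *wrapped) : wrapped_(wrapped) {}
  // This hides Base::Features(); it does not override it. Declaring it
  // `uint64_t Features() const override` would fail to compile.
  uint64_t Features() const { return wrapped_->Features(); }
  // The fix: copy the wrapped features into the protected member.
  void CopyFeatures() { myFeatures = wrapped_->Features(); }
 private:
  Base *wrapped_;
};
```

Calling `Features()` through a `Base *` pointing at a `Wrapper` returns 0 until `CopyFeatures()` runs, which is the same symptom as the missing `Age` header.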
This adds a simple integration test that downloads a file from an
XCache server, using another fixture as the origin, then verifies
the file is unchanged.
This makes the placement of binaries and libraries independent of
the directory structure of the CMake build system, which makes it
easier to set paths to client and server binaries in tests, as well
as ensuring that libraries and plugins from the build itself are
always used when running the test suite. Moreover, this allows us to
migrate the build system to use more idiomatic CMake by using, for
example, add_subdirectory(Xrd*) and src/Xrd*/CMakeLists.txt instead
of the current include(Xrd*) and src/Xrd*.cmake, to keep plugins in
the same place as other libraries.

Note: These variables must be set in the CACHE to allow Python builds
using setup.py to override them, so that libraries are placed in the
correct directory for later installation by pip.
This is the first version that provides the multi interface, so it
allows us to remove checks and ifdefs for this from the project.
This is to make the headers, source files, namespaces, etc all the
same name as the plugin itself, for consistency.
dynamic-entropy and others added 30 commits September 9, 2025 22:24
Move operations from a manager should succeed
after redirection.
- Add tests for rename operation with HTTP
  - The MOVE operation currently fails
    after redirection
  - Thus the tests assert against HTTP code 401;
    a fix would change it to 201 instead
  - For redirection from a meta-manager, move fails
    for a manager node
…peration; this allows rename after redirections
uint16_t to time_t. Since time_t can be 64 bits, the Python bindings
need to use unsigned long long.
We test for overflow, but now timeouts use time_t, which may be
64 bits wide, so use 2^65 to force an overflow in that case too.
Reverts commit eb4295e as the hack
introduced there is no longer needed after the current fix.

Fixes: xrootd#2357
This adds support to get/put void* values in the client environment.
Since pointers are only defined during runtime, no support for shell
import is provided for pointers.

Issue: xrootd#2522
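A hypothetical sketch of keeping pointer values alongside string settings in a client environment. `ClientEnv`, `PutPtr`, `GetPtr` and `ImportString` are illustrative names, not the actual XrdCl API. Strings can be imported from the shell with `std::getenv`, but there is no equivalent for pointers, which only exist at runtime.

```cpp
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical client environment: string settings plus a separate table
// of raw pointers. Only strings can be imported from the shell.
class ClientEnv {
 public:
  void PutString(const std::string &key, const std::string &value) {
    strings_[key] = value;
  }
  bool GetString(const std::string &key, std::string &out) const {
    auto it = strings_.find(key);
    if (it == strings_.end()) return false;
    out = it->second;
    return true;
  }

  // Pointers are stored as-is; the environment does not own them.
  void PutPtr(const std::string &key, void *value) { ptrs_[key] = value; }
  bool GetPtr(const std::string &key, void *&out) const {
    auto it = ptrs_.find(key);
    if (it == ptrs_.end()) return false;
    out = it->second;
    return true;
  }

  // Shell import exists only for strings: a pointer value has no
  // meaningful textual form in another process's environment.
  bool ImportString(const std::string &key, const char *shellVar) {
    const char *v = std::getenv(shellVar);
    if (!v) return false;
    strings_[key] = v;
    return true;
  }

 private:
  std::map<std::string, std::string> strings_;
  std::map<std::string, void *> ptrs_;
};
```

Because the environment does not own the pointers, whoever calls `PutPtr` must keep the object alive for as long as the client may read it.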