Add internal URI handling API #19073
base: master
Some first remarks. Did not yet look at everything.
static zend_string *parse_url_uri_to_string(void *uri, uri_recomposition_mode_t recomposition_mode, bool exclude_fragment)
{
	ZEND_UNREACHABLE();
}
Might be better to simply NULL the pointer in the uri_handler_t struct instead.
I got the same comment from @DanielEScherzer in the original PR, and I told him that I would like to avoid making the handlers optional if possible, because that way the existence of the handlers doesn't have to be checked before use, which helps both maintainability and performance.
The parse_url based implementation is special because it's not directly exposed to userland: it's just an internal URI "backend" for BC, and these handlers aren't strictly needed for now. We could of course expose the to_string handlers later for 3rd party extensions if we want to. Then the code should probably be changed to something else.
A function that triggers undefined behavior when called (this is what ZEND_UNREACHABLE implies for production builds) and not having a function (i.e. dereferencing a NULL pointer when trying to call the function) are functionally the same. In both cases the PHP binary will do something bad (ideally just crash).
Thus it seems to be preferable to clearly indicate that the handler is not available by using NULL rather than pretending there is a handler when calling it is unsafe.
OK, I see what you mean now. I can't comment on which method is preferable, but intentionally passing a NULL value instead of a handler function, while callers of the handlers never expect NULL, also seems wrong. Normally, static analyzers would emit an error in this case (in PHP for sure; I don't know about C), which is why I didn't even think about this solution.
TBH the code which uses ZEND_UNREACHABLE() is indeed unreachable if one uses the internal API: currently, no function is exposed that would make use of the relevant handlers.
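For readers following this thread, here is a minimal sketch of the NULL-handler alternative being discussed. Everything in it (demo_handler_t, demo_can_clone, demo_clone) is a hypothetical stand-in, not the PR's uri_handler_t, and it only illustrates the idea that a missing handler is visible in the table and can be checked, whereas a stub that is undefined behavior when called cannot be detected by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified handler table; not the PR's uri_handler_t. */
typedef struct {
	const char *name;
	/* NULL when the backend does not support cloning. */
	void *(*clone_uri)(void *uri);
} demo_handler_t;

/* Trivial "clone" used only to have a non-NULL handler in the demo. */
static void *demo_identity_clone(void *uri)
{
	return uri;
}

static const demo_handler_t demo_full = { "full", demo_identity_clone };
static const demo_handler_t demo_legacy = { "legacy", NULL };

/* Callers that can cope with a missing handler test for it explicitly. */
static bool demo_can_clone(const demo_handler_t *handler)
{
	return handler->clone_uri != NULL;
}

/* Callers that require the handler state the precondition in one place. */
static void *demo_clone(const demo_handler_t *handler, void *uri)
{
	assert(handler->clone_uri != NULL && "backend has no clone handler");
	return handler->clone_uri(uri);
}
```

The trade-off raised above still applies: with this shape every call site either checks first or relies on the assertion, which is exactly the extra check the PR author wanted to avoid.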
static void *parse_url_clone_uri(void *uri)
{
	ZEND_UNREACHABLE();
}
Ditto
ext/uri/php_uri.c
Outdated
if (uri_handler_name == NULL) {
	return uri_handler_by_name("parse_url", sizeof("parse_url") - 1);
}
Defaulting to parse_url in a new API is probably not a good idea. Instead the “legacy” users should just pass "parse_url" explicitly.
Defaulting to parse_url here works because that is indeed the default wherever php_uri_get_handler() is called; the other "backends" can only be used if the config is explicitly passed (not null).
The other reason why I opted for this approach is that it would be inconvenient to create and free a new zend_string whenever the legacy implementation is needed, and I wanted to avoid adding a known string just for this purpose, or exposing the C string based uri_handler_by_name function instead.
I've looked at this again and I must say that I'm having trouble meaningfully reviewing this. It adds a large amount of code with unclear purpose and confusing (to me) naming.
ext/uri/php_uri.c
Outdated
PHPAPI zend_result php_uri_get_scheme(const uri_internal_t *internal_uri, uri_component_read_mode_t read_mode, zval *zv)
{
	return php_uri_get_property(internal_uri, URI_PROPERTY_NAME_SCHEME, read_mode, zv);
}

PHPAPI zend_result php_uri_get_username(const uri_internal_t *internal_uri, uri_component_read_mode_t read_mode, zval *zv)
{
	return php_uri_get_property(internal_uri, URI_PROPERTY_NAME_USERNAME, read_mode, zv);
}

PHPAPI zend_result php_uri_get_password(const uri_internal_t *internal_uri, uri_component_read_mode_t read_mode, zval *zv)
{
	return php_uri_get_property(internal_uri, URI_PROPERTY_NAME_PASSWORD, read_mode, zv);
}

PHPAPI zend_result php_uri_get_host(const uri_internal_t *internal_uri, uri_component_read_mode_t read_mode, zval *zv)
{
	return php_uri_get_property(internal_uri, URI_PROPERTY_NAME_HOST, read_mode, zv);
}

PHPAPI zend_result php_uri_get_port(const uri_internal_t *internal_uri, uri_component_read_mode_t read_mode, zval *zv)
{
	return php_uri_get_property(internal_uri, URI_PROPERTY_NAME_PORT, read_mode, zv);
}

PHPAPI zend_result php_uri_get_path(const uri_internal_t *internal_uri, uri_component_read_mode_t read_mode, zval *zv)
{
	return php_uri_get_property(internal_uri, URI_PROPERTY_NAME_PATH, read_mode, zv);
}

PHPAPI zend_result php_uri_get_query(const uri_internal_t *internal_uri, uri_component_read_mode_t read_mode, zval *zv)
{
	return php_uri_get_property(internal_uri, URI_PROPERTY_NAME_QUERY, read_mode, zv);
}

PHPAPI zend_result php_uri_get_fragment(const uri_internal_t *internal_uri, uri_component_read_mode_t read_mode, zval *zv)
{
	return php_uri_get_property(internal_uri, URI_PROPERTY_NAME_FRAGMENT, read_mode, zv);
}
The addition of these new helpers is not clear to me. It feels like just another layer of indirection by moving the enum into the function name. There's also already uri_property_handler_from_internal_uri(), why doesn't it work here?
These functions come from the time when the property was passed as a zend_string, so having separate methods used to make sense. You are right, these are not really needed anymore, so I'm fine with removing them.
Although the alternative code is considerably longer, and a bit more difficult to use:
- zend_result result = php_uri_get_host(internal_uri, URI_COMPONENT_READ_RAW, &host_zv);
+ zend_result result = php_uri_property_handler_from_internal_uri(internal_uri, URI_PROPERTY_NAME_HOST)->read_func(internal_uri, URI_COMPONENT_READ_RAW, &host_zv);
Plus, it makes the handlers directly available for use, which I wanted to avoid for now (because write handlers are not always available).
I don't really mind the extra helpers
I had already implemented the suggestion when I realized that the helpers really simplify usage, so I got rid of my changes after all.
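To make the trade-off in this thread concrete, here is a small sketch of the pattern under discussion: thin named getters over an enum-indexed table of read functions. All names and types (demo_uri_t, demo_prop_t, demo_get_property and so on) are hypothetical simplifications, not the PR's uri_internal_t or its property handlers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types; the real uri_internal_t and handlers differ. */
typedef enum { DEMO_PROP_SCHEME, DEMO_PROP_HOST, DEMO_PROP_COUNT } demo_prop_t;

typedef struct {
	const char *scheme;
	const char *host;
} demo_uri_t;

typedef const char *(*demo_read_func_t)(const demo_uri_t *uri);

static const char *demo_read_scheme(const demo_uri_t *uri) { return uri->scheme; }
static const char *demo_read_host(const demo_uri_t *uri) { return uri->host; }

/* Internal table indexed by the property enum; callers never touch it directly. */
static const demo_read_func_t demo_readers[DEMO_PROP_COUNT] = {
	[DEMO_PROP_SCHEME] = demo_read_scheme,
	[DEMO_PROP_HOST] = demo_read_host,
};

/* Generic accessor: one entry point, property selected by enum. */
static const char *demo_get_property(const demo_uri_t *uri, demo_prop_t prop)
{
	assert(prop < DEMO_PROP_COUNT);
	return demo_readers[prop](uri);
}

/* Thin named helpers: the enum value moves into the function name. */
static const char *demo_get_scheme(const demo_uri_t *uri)
{
	return demo_get_property(uri, DEMO_PROP_SCHEME);
}

static const char *demo_get_host(const demo_uri_t *uri)
{
	return demo_get_property(uri, DEMO_PROP_HOST);
}
```

The helpers add no behavior, which is the reviewer's point, but they keep call sites short and keep the table of handlers private, which is the author's point.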
Preliminary review round
ext/uri/php_uri.c
Outdated
uri->scheme = zend_string_copy(Z_STR(tmp));
zval_ptr_dtor(&tmp);
You can just "transfer the lifetime" instead of copy+dtor:
- uri->scheme = zend_string_copy(Z_STR(tmp));
- zval_ptr_dtor(&tmp);
+ uri->scheme = Z_STR(tmp);
This occurs a couple of times in this function.
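As an illustration of why the two forms are equivalent, here is a toy model of the refcounting involved. demo_str is a hypothetical stand-in for a refcounted string and does not use the Zend API; the sketch assumes the temporary holds the only reference and is not used after the hand-over.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy refcounted string, standing in for zend_string for illustration only. */
typedef struct {
	unsigned refcount;
	size_t len;
	char val[];
} demo_str;

static demo_str *demo_str_init(const char *s, size_t len)
{
	demo_str *str = malloc(sizeof(*str) + len + 1);
	assert(str != NULL);
	str->refcount = 1;
	str->len = len;
	memcpy(str->val, s, len);
	str->val[len] = '\0';
	return str;
}

static demo_str *demo_str_copy(demo_str *s)
{
	s->refcount++;
	return s;
}

static void demo_str_release(demo_str *s)
{
	if (--s->refcount == 0) {
		free(s);
	}
}

/* Variant A: take a new reference, then drop the temporary's reference. */
static demo_str *demo_store_copy_then_release(demo_str *tmp)
{
	demo_str *field = demo_str_copy(tmp); /* refcount 1 -> 2 */
	demo_str_release(tmp);                /* refcount 2 -> 1 */
	return field;
}

/* Variant B: hand the temporary's only reference over to the field. */
static demo_str *demo_store_move(demo_str *tmp)
{
	return tmp;                           /* refcount stays 1 */
}
```

Both variants leave the field owning a string with a refcount of 1; the second simply skips the increment and decrement.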
ext/uri/php_uri.c
Outdated
result = php_uri_get_scheme(uri_internal, read_mode, &tmp);
if (result == FAILURE) {
	php_uri_free(uri_internal);
Is it possible to avoid the repeated error blocks by using a single "error" label that you jump to with goto?
I'd say one of the only proper use-cases for goto in C is this kind of repeated error handling.
Yes, agreed! This is also a very good idea :)
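A minimal sketch of the single-label pattern suggested above, using plain C and hypothetical names (demo_uri, demo_dup, demo_uri_build) rather than the PR's actual function. A NULL input is used here to simulate a failed component read.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified struct; not the PR's php_uri. */
typedef struct {
	char *scheme;
	char *host;
} demo_uri;

/* Duplicates a string; a NULL input simulates a failed component read. */
static char *demo_dup(const char *s)
{
	if (s == NULL) {
		return NULL;
	}
	size_t n = strlen(s) + 1;
	char *copy = malloc(n);
	if (copy != NULL) {
		memcpy(copy, s, n);
	}
	return copy;
}

static void demo_uri_free(demo_uri *uri)
{
	if (uri == NULL) {
		return;
	}
	free(uri->scheme); /* free(NULL) is a no-op, so partial structs are fine */
	free(uri->host);
	free(uri);
}

static demo_uri *demo_uri_build(const char *scheme, const char *host)
{
	demo_uri *uri = calloc(1, sizeof(*uri));
	if (uri == NULL) {
		return NULL;
	}

	uri->scheme = demo_dup(scheme);
	if (uri->scheme == NULL) {
		goto error;
	}

	uri->host = demo_dup(host);
	if (uri->host == NULL) {
		goto error;
	}

	return uri;

error:
	/* Single cleanup path shared by every failure above. */
	demo_uri_free(uri);
	return NULL;
}
```

Because the struct is zero-initialized, the cleanup code can run regardless of how far construction got, which is what makes one shared label sufficient.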
ext/standard/http_fopen_wrapper.c
Outdated
if (uri_handler == NULL) {
	return NULL;
}
zend_string *tmp_uri = zend_string_init(path, strlen(path), false);
Bah the fact that we need to make a copy is sad :-(
Yes, it's a pity... Should I then add C string based helper functions for parsing? 🤔
If the parse_uri handlers took a const char*, size_t pair instead of a zend_string*, we wouldn't have this problem. So perhaps the internal API should be changed. FWIW I don't think that we should have one for both a zend_string* and a ptr-length pair, just one should be enough. The pair one is more flexible and since we don't take copies of the input string it should not make any other case worse.
I've just implemented this.
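A small sketch of why the pointer-length signature avoids the temporary copy. demo_scheme_len is a hypothetical toy "parser" that only locates the scheme; it is not the PR's parse handler. The point is that the function borrows the caller's bytes, so both a NUL-terminated C string and a slice of a larger buffer can be passed without allocating.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical toy parser entry point: returns the length of the scheme
 * (the bytes before the first ':'), or 0 if there is no ':'.
 * It only reads uri[0..uri_len) and never takes ownership of the buffer.
 */
static size_t demo_scheme_len(const char *uri, size_t uri_len)
{
	const char *colon = memchr(uri, ':', uri_len);
	return colon != NULL ? (size_t)(colon - uri) : 0;
}

/* Convenience wrapper for callers that hold a NUL-terminated string. */
static size_t demo_scheme_len_cstr(const char *uri)
{
	return demo_scheme_len(uri, strlen(uri));
}
```

A caller holding a header line such as "Location: ftp://host/x" could pass a pointer into the middle of that buffer plus a length, where a string-object based signature would first need a copy of that slice.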
ext/openssl/xp_ssl.c
Outdated
zend_string *resource = zend_string_init(resourcename, resourcenamelen, false);
uri_internal_t *internal_uri = php_uri_parse(uri_handler, resource, true);
if (internal_uri == NULL) {
	zend_string_release(resource);
Can be zend_string_efree even
ext/soap/php_http.h
Outdated
char *soapaction,
int soap_version,
zval *response);
int make_http_soap_request(zval *this_ptr,
Indentation is a bit weird here
@@ -1143,39 +1152,48 @@ int make_http_soap_request(zval *this_ptr,
char *loc;

if ((loc = get_http_header_value(ZSTR_VAL(http_headers), "Location:")) != NULL) {
	php_url *new_url = php_url_parse(loc);
	uri_handler_t *uri_handler = php_uri_get_handler(uri_parser_class);
	if (uri_handler == NULL) {
Ditto here
ext/soap/php_http.c
Outdated
zend_string *loc_str = zend_string_init(loc, strlen(loc), false);
php_uri *new_uri = php_uri_parse_to_struct(uri_handler, loc_str, URI_COMPONENT_READ_NORMALIZED_ASCII, true);
zend_string_release(loc_str);
Could be zend_string_release_ex with persistent=false
The switch from zend_string to the pointer-length pair seems to have been a good idea
@@ -14,6 +14,8 @@
+----------------------------------------------------------------------+
*/

#include "ext/uri/php_uri.h"
I wonder if this should be a forward declaration instead, so as not to complicate the headers further
@@ -22,6 +22,37 @@
extern zend_module_entry uri_module_entry;
#define phpext_uri_ptr &uri_module_entry

PHPAPI void php_uri_implementation_set_object_handlers(zend_class_entry *ce, zend_object_handlers *object_handlers);
typedef struct {
- typedef struct {
+ typedef struct php_uri {
The reason being that it's more convenient for tools like pahole
php_uri *uri = ecalloc(1, sizeof(*uri));
zval tmp;
ZVAL_UNDEF(&tmp);
I wonder why this line is actually needed
char *soapaction,
int soap_version,
zval *response);
zend_string *buf,
Indentation still looks weird on GitHub
char *soapaction,
int soap_version,
zval *return_value)
zend_string *buf,
Indentation still looks weird on GitHub
zval host_zv;
zend_result result = php_uri_get_host(internal_uri, URI_COMPONENT_READ_RAW, &host_zv);
if (result == SUCCESS && Z_TYPE(host_zv) == IS_STRING) {
	const char * host = Z_STRVAL(host_zv);
	char * url_name = NULL;
If you move this declaration outside this if block, then you can remove lines 2655-2657 and change the return in line 2662 to return url_name;
The nice thing is that all the cleanup is then centralised
RETURN_VALIDATION_FAILED
}

if ((url->user != NULL && !is_userinfo_valid(url->user))
|| (url->pass != NULL && !is_userinfo_valid(url->pass))
if (strcmp(uri_handler->name, "parse_url") == 0 &&
- if (strcmp(uri_handler->name, "parse_url") == 0 &&
+ if (strcmp(uri_handler->name, URI_PARSER_PHP) == 0 &&
if (url->host) {
const char * host = ZSTR_VAL(url->host);
zval host_zv;
zend_result result = php_uri_get_host(internal_uri, URI_COMPONENT_READ_RAW, &host_zv);
For all these property reads: why do you actually check the type? If reading was successful (i.e. returned SUCCESS) I would assume that the type is correct.