@miguelleitao
Contributor

Added a function to encode strings.

@ooxi
Owner

ooxi commented Dec 27, 2017

Thanks for your contribution, @miguelleitao! I'm very interested in adding this, since we do not have any encoding functionality yet. Please see my inline comments on your pull request for details (they will follow in a couple of minutes) :-)

Owner

@ooxi ooxi left a comment

Needs some work, but looks promising! Please also add a test case for this functionality.

entities.c Outdated
return (size_t)(to - dest);
}

int encode_html_entities(char *dest, const char *src) {
Owner

Please return size_t instead of int

Contributor Author

Completely agree. I'll commit the correction.

entities.c Outdated
int encode_html_entities(char *dest, const char *src) {
char *to = dest;
for( const char *from = src ; *from ; from++ ) {
int i = 9999;
Owner

Please declare i as late as possible and do not assign a magic unused value to it

Contributor Author

Completely agree. I'll commit the correction.
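
A minimal sketch of what the two requests above could look like in practice: the index is declared in the loop header, needs no placeholder value, and uses size_t so it compares cleanly against the sizeof expression. SAMPLE_ENTITIES and find_first_byte are hypothetical names for illustration; the table only assumes the same { name, UTF-8 text } shape that the NAMED_ENTITIES[i][1][0] access in the diff suggests.

```c
#include <stddef.h>

/* Hypothetical stand-in for the library's table: { entity name, UTF-8 text }. */
static const char *const SAMPLE_ENTITIES[][2] = {
    { "amp;", "&" },
    { "gt;",  ">" },
    { "lt;",  "<" },
};

/* Returns the index of the first entry whose UTF-8 text starts with c,
 * or the number of entries when nothing matches. */
static size_t find_first_byte(char c) {
    const size_t count = sizeof SAMPLE_ENTITIES / sizeof *SAMPLE_ENTITIES;

    /* i lives only inside the loop and needs no magic initial value. */
    for (size_t i = 0; i < count; i++) {
        if (SAMPLE_ENTITIES[i][1][0] == c) return i;
    }
    return count;
}
```

Returning the entry count as the "not found" value keeps the result in range for size_t without a sentinel such as 9999.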

entities.c Outdated
}
//if ( *from=='\r' || *from=='\n' ) continue;
for( i=0 ; i<sizeof NAMED_ENTITIES / sizeof *NAMED_ENTITIES ; i++ )
if ( *from == NAMED_ENTITIES[i][1][0] ) break;
Owner

I don't really get this logic :-) Are you comparing just the first character, or am I missing something? Or is this sufficient because of the available entities? (If so, please make sure this remains the case by adding an appropriate test case.)

Contributor Author

Yes, I'm just comparing the first character. I found this sufficient for the final application I was developing. Maybe this encoder should be classified as an ASCII encoder instead of a UTF-8 encoder. I'll think about this during the next week. The comparison can be extended to more bytes, but this solution will still be inefficient since it uses a top-down search on an unsorted table.
If I don't improve the comparison, I'll try to make the restriction clear to future users and developers.
Anyway, I'll provide a test case.
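
To illustrate why the first byte alone is not enough beyond ASCII: many two-byte UTF-8 sequences share a lead byte (é and è both start with 0xC3), so a first-byte comparison picks whichever entry comes first in the table. The sketch below compares the whole sequence instead. SAMPLE_ENTITIES and match_full_sequence are hypothetical names, and the three-entry table is an assumption standing in for the real one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in table: { entity name, UTF-8 text }. */
static const char *const SAMPLE_ENTITIES[][2] = {
    { "amp;",    "&" },
    { "eacute;", "\xC3\xA9" },  /* U+00E9 */
    { "egrave;", "\xC3\xA8" },  /* U+00E8 */
};

/* Finds the first entry whose complete UTF-8 text is a prefix of `from`.
 * On success stores the matched byte count in *matched_len and returns the
 * entry index; otherwise stores 0 and returns the number of entries. */
static size_t match_full_sequence(const char *from, size_t *matched_len) {
    const size_t count = sizeof SAMPLE_ENTITIES / sizeof *SAMPLE_ENTITIES;

    for (size_t i = 0; i < count; i++) {
        const char *utf8 = SAMPLE_ENTITIES[i][1];
        const size_t len = strlen(utf8);

        /* strncmp stops at the terminator, so a short input cannot overrun. */
        if (strncmp(from, utf8, len) == 0) {
            *matched_len = len;
            return i;
        }
    }
    *matched_len = 0;
    return count;
}
```

This is still a linear scan, and it returns the first match rather than the longest one, which would matter if one table entry were a byte prefix of another.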

entities.c Outdated
}
}
*to = 0;
return strlen(dest);
Owner

Since you know dest and to, no call to strlen is necessary

Contributor Author

Completely agree. I'll commit the correction.
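
A small sketch of the point about strlen, assuming the same dest/to pointer pair as in the diff: once the terminator has been written at `to`, the output length is simply the pointer difference, so no second pass over the buffer is needed. copy_and_count is a hypothetical helper that copies bytes unchanged; it only demonstrates the return expression.

```c
#include <stddef.h>

/* Copies src into dest and returns the number of bytes written,
 * not counting the terminating zero. dest must be large enough. */
static size_t copy_and_count(char *dest, const char *src) {
    char *to = dest;

    for (const char *from = src; *from; from++) {
        *to++ = *from;
    }
    *to = 0;

    /* `to` points at the terminator, so the difference is the length. */
    return (size_t)(to - dest);
}
```

The cast is safe here because `to` never moves below `dest`, and it matches the size_t return type requested earlier in the review.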

@miguelleitao
Contributor Author

Hi ooxi. Thank you for your reviews. I just pushed a new version of the encoding function.
The character matching test now compares all the bytes of the UTF-8 sequence, and I included a simple test in t-entities.c.
I removed the percent-encoding that was previously included, because I believe this solution is closer to what is expected from this package. When required, percent-encoding can be provided by another package or by a separately developed function. This way, the encoding function should produce the opposite result to the already provided decoding function.
Remaining issues:

  • the encoding function uses a top-down search, which is not efficient.
  • the result is unspecified when the character is not found in the table.

Hope you find this simple implementation usable.
Let me know if you have any questions.
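
One possible way to pin down the second remaining issue, sketched under assumptions rather than taken from the pushed commit: bytes that match no table entry are copied through unchanged, so the result is always defined. encode_sketch and SAMPLE_ENTITIES are hypothetical names, the four-entry table stands in for the real one, and the search is still the linear top-down scan described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in table: { entity name, UTF-8 text }. */
static const char *const SAMPLE_ENTITIES[][2] = {
    { "amp;",    "&" },
    { "lt;",     "<" },
    { "gt;",     ">" },
    { "eacute;", "\xC3\xA9" },  /* U+00E9 */
};

/* Encodes src into dest and returns the output length without the
 * terminator. dest must be sized by the caller for the worst case. */
static size_t encode_sketch(char *dest, const char *src) {
    const size_t count = sizeof SAMPLE_ENTITIES / sizeof *SAMPLE_ENTITIES;
    char *to = dest;
    const char *from = src;

    while (*from) {
        size_t len = 0;
        size_t i;

        /* Linear scan comparing the full UTF-8 sequence of each entry. */
        for (i = 0; i < count; i++) {
            len = strlen(SAMPLE_ENTITIES[i][1]);
            if (strncmp(from, SAMPLE_ENTITIES[i][1], len) == 0) break;
        }

        if (i < count) {
            /* Found: emit '&' followed by the stored name, e.g. "lt;". */
            const size_t name_len = strlen(SAMPLE_ENTITIES[i][0]);
            *to++ = '&';
            memcpy(to, SAMPLE_ENTITIES[i][0], name_len);
            to += name_len;
            from += len;
        } else {
            /* Not in the table: copy the byte through unchanged. */
            *to++ = *from++;
        }
    }
    *to = 0;
    return (size_t)(to - dest);
}
```

With this table, the input `a<b & caf\xC3\xA9` would come out as `a&lt;b &amp; caf&eacute;`. The efficiency issue is left open here; sorting the table and searching it by key would be one direction, but that depends on how the real table is laid out.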

@ooxi
Owner

ooxi commented Dec 29, 2017

Thanks for splitting your changes into topic commits, of which I already have merged several into master! The rest can be found in the feature/encode branch and will be merged as soon as I have had a chance to look at it :-)

@gjtorikian

Thanks for splitting your changes into topic commits, of which I already have merged several into master! The rest can be found in the feature/encode branch and will be merged as soon as I have had a chance to look at it :-)

Was this ever merged? 😅

@ooxi
Owner

ooxi commented May 21, 2022

@gjtorikian unfortunately, no. But feel free to use it as is :-)
