Function to encode strings #3

miguelleitao · 2017-12-27T01:56:02Z

Added a function to encode strings.

ooxi · 2017-12-27T11:48:30Z

Thanks for your contribution @miguelleitao! I'm very interested in adding this functionality, since we do not have any encoding functionality yet. Please see my inline comments on your merge request for details (they will follow in a couple of minutes) :-)

ooxi

Needs some work, but looks promising! Please also add a test case for this functionality.

ooxi · 2017-12-27T11:49:02Z

entities.c

 	return (size_t)(to - dest);
 }

+int encode_html_entities(char *dest, const char *src) {


Please return size_t instead of int

Completely agree. I'll commit the correction.

ooxi · 2017-12-27T11:50:12Z

entities.c

+int encode_html_entities(char *dest, const char *src) {
+        char *to = dest;
+        for( const char *from = src ; *from ; from++ ) {
+            int i = 9999;


Please declare i as late as possible and do not assign a magic unused value to it

Completely agree. I'll commit the correction.

ooxi · 2017-12-27T11:51:52Z

entities.c

+            }
+            //if ( *from=='\r' || *from=='\n' ) continue;
+            for( i=0 ; i<sizeof NAMED_ENTITIES / sizeof *NAMED_ENTITIES ; i++ )
+                if ( *from == NAMED_ENTITIES[i][1][0] ) break;


I don't really get this logic :-) are you comparing just the first character or am I missing something? Or is this sufficient because of the available entities (then please make sure this remains the case by adding an appropriate test case)

Yes, I'm just comparing the first character. I found this sufficient for the final application I was developing. Maybe this encoder should be classified as an ASCII encoder instead of a UTF-8 encoder. I'll think about this during the next week. The comparison can be extended to cover more bytes, but this solution will still be inefficient, since it uses a top-down search on an unsorted table.
If I don't improve the comparison, I'll try to make the restriction clear to future users and developers.
Anyway, I'll provide a test case.
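
For illustration, the difference between a first-byte comparison and a full-sequence comparison can be sketched as below. This is a minimal sketch, not the code from the pull request: `SAMPLE_ENTITIES` and `match_entity` are hypothetical names, and the `{ name, UTF-8 bytes }` layout is only assumed from the `NAMED_ENTITIES[i][1]` indexing visible in the diff. The euro sign and the ellipsis share the same first byte (0xE2), so a first-byte test alone cannot tell them apart.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the library's table: { entity name, UTF-8 bytes }. */
static const char *const SAMPLE_ENTITIES[][2] = {
    { "amp",    "&" },
    { "lt",     "<" },
    { "euro",   "\xE2\x82\xAC" },  /* U+20AC */
    { "hellip", "\xE2\x80\xA6" },  /* U+2026, same first byte as euro */
};

/*
 * Returns the index of the first entry whose complete UTF-8 sequence is a
 * prefix of `from` and stores the sequence length in `*len`.
 * Returns -1 (and leaves `*len` untouched) when nothing matches.
 */
static int match_entity(const char *from, size_t *len)
{
    assert(from != NULL && len != NULL);

    size_t count = sizeof SAMPLE_ENTITIES / sizeof *SAMPLE_ENTITIES;
    for (size_t i = 0; i < count; i++) {
        const char *bytes = SAMPLE_ENTITIES[i][1];
        size_t n = strlen(bytes);

        /* Compare every byte of the sequence, not just bytes[0]. */
        if (strncmp(from, bytes, n) == 0) {
            *len = n;
            return (int)i;
        }
    }
    return -1;
}
```

The search is still linear over an unsorted table, so it addresses only the correctness concern, not the efficiency one.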

ooxi · 2017-12-27T11:52:49Z

entities.c

+            }
+        }
+        *to = 0;
+        return strlen(dest);


Since you know dest and to, no call to strlen is necessary

Completely agree. I'll commit the correction.
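
As a sketch of the two return-value comments above (return `size_t`, and derive the length from `to - dest` instead of calling `strlen`), consider the reduced example below. `encode_ampersands` is a hypothetical helper that escapes only `&`; it is not the function from this pull request, and it assumes the caller provides a `dest` buffer of at least `5 * strlen(src) + 1` bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration: escapes only '&' as "&amp;". */
static size_t encode_ampersands(char *dest, const char *src)
{
    assert(dest != NULL && src != NULL);

    char *to = dest;
    for (const char *from = src; *from; from++) {
        if (*from == '&') {
            memcpy(to, "&amp;", 5);
            to += 5;
        } else {
            *to++ = *from;
        }
    }
    *to = '\0';

    /* `to` already points at the terminator, so no strlen() is needed. */
    return (size_t)(to - dest);
}
```

Because `to` is advanced with every write, the pointer difference equals the number of bytes written, excluding the terminator.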

miguelleitao · 2017-12-28T19:29:16Z

Hi ooxi. Thank you for your reviews. I just pushed a new version of the encoding function.
I implemented the character matching test using all the UTF-8 bytes and included a simple test in t-entities.c.
I removed the previously included percent-encoding because I believe this solution is closer to what is expected from this package. When required, percent-encoding can be provided by another package or a separately developed function. This way, the encoding function should produce the opposite result to the already provided decoding function.
Remaining issues:

- the encoding function uses top-down search, which is not efficient.
- the result is unspecified when the character is not found in the table.
Hope you find this simple implementation usable.
Let me know of any questions.
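
One possible way to address both remaining issues is sketched below, under stated assumptions: a table sorted by UTF-8 byte sequence and searched with the standard `bsearch`, plus an explicit pass-through for characters that have no entry. The names `struct entity`, `SORTED`, `utf8_len` and `encode_sorted` are hypothetical and are not part of this package. The sketch handles only entities that map to a single code point, and it assumes the caller sizes `dest` for the worst case (here at most 5 output bytes per input byte, plus the terminator).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct entity { const char *utf8; const char *name; };

/* Hypothetical sample table; must stay sorted by the bytes of `utf8`. */
static const struct entity SORTED[] = {
    { "&",            "amp"  },
    { "<",            "lt"   },
    { ">",            "gt"   },
    { "\xC2\xA9",     "copy" },
    { "\xE2\x82\xAC", "euro" },
};

/* bsearch callback: the key is a NUL-terminated UTF-8 sequence. */
static int cmp_utf8(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const struct entity *)elem)->utf8);
}

/* Sequence length implied by a lead byte; 1 for ASCII and non-lead bytes. */
static size_t utf8_len(unsigned char c)
{
    if (c >= 0xF0) return 4;
    if (c >= 0xE0) return 3;
    if (c >= 0xC0) return 2;
    return 1;
}

static size_t encode_sorted(char *dest, const char *src)
{
    assert(dest != NULL && src != NULL);

    char *to = dest;
    while (*src) {
        /* Extract one sequence, stopping early on truncated input. */
        char key[5] = { 0 };
        size_t n = utf8_len((unsigned char)*src);
        size_t k = 0;
        while (k < n && src[k]) {
            key[k] = src[k];
            k++;
        }

        const struct entity *e = bsearch(key, SORTED,
            sizeof SORTED / sizeof *SORTED, sizeof *SORTED, cmp_utf8);

        if (e) {
            size_t len = strlen(e->name);
            *to++ = '&';
            memcpy(to, e->name, len);
            to += len;
            *to++ = ';';
        } else {
            /* Not in the table: copy the bytes through unchanged. */
            memcpy(to, key, k);
            to += k;
        }
        src += k;
    }
    *to = '\0';
    return (size_t)(to - dest);
}
```

Passing unknown characters through unchanged is one design choice among several; emitting a numeric character reference would be another.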

ooxi · 2017-12-29T14:26:06Z

Thanks for splitting your changes into topic commits, of which I already have merged several into master! The rest can be found in the feature/encode branch and will be merged as soon as I have had a chance to look at it :-)

gjtorikian · 2022-05-20T23:44:32Z

> Thanks for splitting your changes into topic commits, of which I already have merged several into master! The rest can be found in the feature/encode branch and will be merged as soon as I have had a chance to look at it :-)

Was this ever merged? 😅

ooxi · 2022-05-21T13:06:19Z

@gjtorikian unfortunately, no. But feel free to use it as is :-)

miguelleitao added 3 commits October 23, 2017 23:59

encoding

234c272

stdio added

665707d

encode declaration

92cd2bc

ooxi requested changes Dec 27, 2017

View reviewed changes

miguelleitao added 5 commits December 27, 2017 22:10

size_t return

087ef2b

const cast in cmp() to improve compatibility

28ea301

encoding with full test

20d7f5e

added commat; entitie

4626491

encoding test

1e13e72

miguelleitao added 6 commits December 29, 2017 15:21

encoding

e6a49b5

stdio added

81026a7

encode declaration

d33c982

size_t return

0fe8c7f

encoding with full test

3a3b17a

encoding test

b15161a

ooxi and others added 2 commits December 29, 2017 15:38

Misc stylistic unification

eeb211f

Merge branch 'feature/encode' of https://github.com/ooxi/entities

aea9a89

Merge branch 'ooxi:master' into master

96769f8

3 participants