Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Looks good, but no conditions without functions? #3

Open
Norcoen opened this issue Oct 15, 2019 · 17 comments
Open

Looks good, but no conditions without functions? #3

Norcoen opened this issue Oct 15, 2019 · 17 comments
Assignees
Labels
enhancement New feature or request

Comments

@Norcoen
Copy link

Norcoen commented Oct 15, 2019

Just found this little gem from this article:
https://tqdev.com/2019-php-templating-engine-in-165-lines

So far I really like it but as far as I can see it is not possible to have conditions in the template without calling a dedicated functions which must be provided to the sandbox.

This seems to be impossible:
{{if:SOME_VARIABLE!="sometext")}} Some output {{endif}}

That makes it unpleasant to work with, because it is basic functionality you are probably used to from other engines like FatFreeFramework

@mevdschee mevdschee self-assigned this Oct 17, 2019
@mevdschee mevdschee added the enhancement New feature or request label Oct 17, 2019
@mevdschee
Copy link
Member

mevdschee commented Oct 17, 2019

as far as I can see it is not possible to have conditions in the template without calling a dedicated functions which must be provided to the sandbox

Correct, but this is not hard. You specify the following functions:

[
  'eq' => function ($a, $b) {return $a == $b;},
  'neq' => function ($a, $b) {return $a != $b;}
]

And then you can write:

{{if:SOME_VARIABLE|eq("sometext")}} Some output {{endif}}

This is how we use it ourselves:

https://github.com/Usecue/InvoiceLion/blob/master/lib/InvoiceTemplate.php#L40

I hope this helps.

@Norcoen
Copy link
Author

Norcoen commented Oct 18, 2019

Yeah..., No...
What I meant was, that I know how to use it (eq, neq, lt, gt, lte, gte functions for comparison)

But it is such a basic functionality, that you want to use it without explicitly passing the needed functions every time. Right now I just integrated them into the template class with a third (and experimental fourth) parameter

$if = function($a, $b, $op='==', $mod='') {
  $ret = false;
  switch($op) {
    case '==':
      $ret = ($a == $b);
      break;
    case '===':
      $ret ($a === $b);
      break;
    case '!=':
      $ret = ($a != $b);
      break;
    case '!==':
      $ret = ($a !== $b);
      break;
    case '>':
      $ret = ($a > $b);
      break;
    case '<':
      $ret = ($a < $b);
      break;
    case '>=':
       $ret = ($a >= $b);
       break;
    case '<=':
       $ret = ($a <= $b);
       break;
  }
  if($mod == '!') {
    $ret = !$ret;
  }
  return $ret;
};

Right now I am experimenting using this for .htaccess file generation based on environmental variables - so I just searched for a small, self-sustained class capable of inserting variables based on some conditions or even doing loops.

Regarding your example

Are you trying to filter out <script>-Tags and Content to prevent XSS or something?
Because something like this will slip through:

<scr<script>ipt>Test</script>ipt>
alert("Test");
</scr<script>ipt>Test</script>ipt>

@mevdschee
Copy link
Member

mevdschee commented Oct 18, 2019

Right now I just integrated them into the template class with a third (and experimental fourth) parameter

Sounds like a good idea to me. Can you do a PR?

I just searched for a small, self-sustained class capable of inserting variables based on some conditions or even doing loops.

Great that you found this lib.

Are you trying to filter out <script>-Tags and Content to prevent XSS or something?

Yes, but mainly to avoid execution of accidentally copy-pasted script tags.

Because something like this will slip through:

Yes it does. Good point. Do you think this will help?

https://github.com/Usecue/InvoiceLion/blob/master/lib/InvoiceTemplate.php

Thank you for your help with making this tiny template engine even better.

Kind regards, Maurits

@Norcoen
Copy link
Author

Norcoen commented Oct 18, 2019

Sounds like a good idea to me. Can you do a PR?

Will do if I produce something useful, I just played around with it for a bit, changing dozens of lines with no real goal

Are you trying to filter out <script>-Tags and Content to prevent XSS or something?

Yes, but mainly to avoid execution of accidentally copy-pasted script tags.

Because something like this will slip through:

Yes it does. Good point. Do you think this will help?

https://github.com/Usecue/InvoiceLion/blob/master/lib/InvoiceTemplate.php

I only have some experience with finding flaws like this in other peoples code, because my job mainly resolves around keeping legacy code alive and fixing bugs that pop up.

When I look at sources like this

https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet#XSS_using_HTML_quote_encapsulation

the only save way that comes to my mind, is to have a whitelist with allowed tags and encode everything else with htmlentities (so non-whitelisted <tag> becomes &lt;tag&gt;) same should be true for partial tags like I used in the example above)

@mevdschee
Copy link
Member

Thank you for your kind reply! Please keep finding those bugs :-)

@Norcoen
Copy link
Author

Norcoen commented Oct 18, 2019

Thank you for encouraging me :)
English is not my mother tongue, so I always fear being misunderstood

Being me, I just had to do some research and found this:

https://blog.uploadcare.com/vulnerability-in-html-design-the-script-tag-33d24642359e

The interesting part:

you can forbid executing any scripts but those you allow explicitly. To do so, add the nonce (number used once) attribute holding some unique value and a special header that forbids executing scripts without the attribute.

Then, even if an intruder happens to inject a malicious script into your page, the script will not be executed. This is called Content-Security-Policy

I guess Content-Security-Policy is the right way to do this, instead of trying to filter out <script>tags - I knew those were around for quite some time now, just nobody ever used them in the code I'm responsible for, so I seem to have missed too realize how useful those are.

So I realize now, your code seems to be good for your purpose, but I should stop trying to filter out such things from user generated content and just use the new possibilities

@mevdschee
Copy link
Member

Yeah.. but this is a script in a PDF template engine, not in HTML, so the header won't help here, unfortunately. You can disable scripts altogether, but the page numbers rely on it.

@Norcoen
Copy link
Author

Norcoen commented Oct 18, 2019

oh sorry, I did not think about any other output than html when I saw the <script> tag - I guess that makes my last statement even more true, that your approach is good for your purpose 👍

@Norcoen
Copy link
Author

Norcoen commented Oct 21, 2019

Hi, sorry to bother you again in here, but github seems to not have direct messages anymore.

I read your article about writing a parser for template languages and found a small error

<?php
function token_with_quote($str, $quote = "'", $escape = '^', $separator = '|')
{
    $tokens = [];
    $token = '';
    $escaped = false;
    $quoted = false;
    for ($i = 0; $i < strlen($str); $i++) {
        $c = $str[$i];
        if (!$quoted) {
            if ($c == $quote) {
                $quoted = true;
-            } elseif (substr($str, $i, strlen($separator)) == $separator) {
+            } elseif ($seplen = substr($str, $i, strlen($separator)) == $separator) {
                $tokens[] = $token;
                $token = '';
                $i += $seplen - 1;
                continue;
            }
        } else {
            if (!$escaped) {
                if ($c == $quote) {
                    $quoted = false;
                } elseif ($c == $escape) {
                    $escaped = true;
                }
            } else {
                $escaped = false;
            }
        }
        $token .= $c;
    }
    $tokens[] = $token;
    return $tokens;
}

-$input = "'one|uno'||'three^'^''|'four^^^'^cuatro'|"
+$input = "'one|uno'||'three^'^''|'four^^^'^cuatro'|";
$output = token_with_quote($input);
echo json_encode($output) . "\n";
function token_unquote($arr, $quote = "'", $escape = '^')
{
    for ($i = 0; $i < count($arr); $i++) {
        $str = trim($arr[$i]);
        if (strlen($str) > 1 && $str[0] == $quote && $str[strlen($str) - 1] == $quote) {
            $escaped = false;
            $token = '';
            $str = substr($str, 1, strlen($str) - 2);
            for ($j = 0; $j < strlen($str); $j++) {
                $c = $str[$j];
                if (!$escaped) {
                    if ($c == $escape) {
                        $escaped = true;
                        continue;
                    }
                } else {
                    $escaped = false;
                }
                $token .= $c;
            }
            $arr[$i] = $token;
        }
    }
    return $arr;
}

-$input = "'one|uno'||'three^'^''|'four^^^'^cuatro'|"
+$input = "'one|uno'||'three^'^''|'four^^^'^cuatro'|";
$output = token_unquote(token_with_quote($input));
echo json_encode($output) . "\n";

The missing semicolon was reported by php, but the missing variable took me a few minutes, because it always just resulted in Fatal error: Allowed memory size of 134217728 bytes exhausted 😅

Hope this helps

@mevdschee
Copy link
Member

Hope this helps

Yes, thank you very much! Sorry for any inconvenience this may have caused. :-)

@Norcoen
Copy link
Author

Norcoen commented Oct 22, 2019

woops, there was an error in mine, too
but lucky me, you did not copy it :)
$seplen = substr($str, $i, strlen($separator)) will never work correctly, i found that after making $escape strings possible instead of char only:

<?php
<?php
function token_with_quote($str, $quote = "'", $escape = '###', $separator = '|')
{
    $tokens = [];
    $token = '';
    $escaped = false;
    $quoted = false;
    $strlen = strlen($str);
    $seplen = strlen($separator);
    $esclen = strlen($escape);
    for ($i = 0; $i < $strlen; $i++) {
        $c = $str[$i];
        if (!$quoted) {
            if ($c == $quote) {
                if ($token == '') {
                    $quoted = true;
                }
                else {
                    // unescaped quote in unquoted token, insert escape sequence, ignore quote
                    $token .= $escape;
                }
            } elseif (($i + $seplen <= $strlen) && substr($str, $i, $seplen) == $separator) {
                $tokens[] = $token;
                $token = '';
                $i += $seplen - 1;
                continue;
            } elseif ($str[$i-1] == $quote && substr($str, ($i-1-$esclen), $esclen) != $escape && (strlen($token) > $esclen && substr($token, (-1-$esclen), $esclen) != $escape)) {
                // [last character was the quote sign], [it was not escaped and escape was not inserted above], [current char is not the separator (elseif above)]
                for ($j = $i; $j < $strlen; $j++) {
                    if (ctype_space($str[$j])) {
                        if (($j + $seplen <= $strlen) && substr($str, $j, $seplen) == $separator) {
                            // there was only whitespace until next separator, so closing the quote was correct and we do nothing
                            break;
                        }
                    }
                    else {
                        // there was an unescaped quote in the data, insert escape sequence in front of it and set quoted status back to true
                        $token = substr($token, 0, -1) . $escape . substr($token, -1);
                        $quoted = true;
                        break;
                    }
                }
            }
        } else {
            if (substr($str, $i, $esclen) == $escape) {
                $escaped = true;
                $token .= $escape;
                $i += $esclen -1;
                continue;
            }
            if (!$escaped) {
                if ($c == $quote) {
                    $quoted = false;
                }
            } else {
                $escaped = false;
            }
        }
        $token .= $c;
    }
    $tokens[] = $token;
    return $tokens;
}

$input = "'one|un'o'|bl'abla|'three###'###''|'four#########'###cuatro'|token|'token2'|'to'|ken4'|tok'en5";
$output = token_with_quote($input);
$out =  json_encode($output);
echo $out . PHP_EOL;
echo "Expected:" . PHP_EOL;
$expected =  "[\"'one|un###'o'\",\"bl###'abla\",\"'three###'###''\",\"'four#########'###cuatro'\",\"token\",\"'token2'\",\"'to'\",\"ken4###'\",\"tok###'en5\"]";
echo $expected . PHP_EOL;
if ($expected == $out)
        echo "!!! YAY !!!" . PHP_EOL;
else
        echo "!!! NOPE !!!" . PHP_EOL;
echo "Input: " . PHP_EOL;

This also contains some changes, to handle unescaped quotes that are in $str by mistake but could be escaped on-the-fly because they should not start or end a quote on this position anyway
(dont start a quote inside an unquoted token and dont close a quote if there is no $seperator to follow it, with exceptions to whitespace)

I really love fiddling with your work, maybe you find some of it useful

@mevdschee
Copy link
Member

This also contains some changes

Can you be more precise? I see that you are trying to support escape sequences that are longer than one character, which is a nice addon. Anything I missed?

I really love fiddling with your work, maybe you find some of it useful

Thank you very much, I really appreciate the bugs you report!

@Norcoen
Copy link
Author

Norcoen commented Oct 22, 2019

Can you be more precise? I see that you are trying to support escape sequences that are longer than one character, which is a nice addon. Anything I missed?

Yes, sorry - I did multiple changes and this was just the final result

First I thought there might be unescaped quotes inside unquoted as well as quoted data - we need to somehow handle this correctly or ignore it, so this was my testcase:

<?php
function token_with_quote($str, $quote = "'", $escape = '^', $separator = '|')
{
    $tokens = [];
    $token = '';
    $escaped = false;
    $quoted = false;
    $seplen = strlen($separator);
    for ($i = 0; $i < strlen($str); $i++) {
        $c = $str[$i];
        if (!$quoted) {
            if ($c == $quote) {
                if ($token == '') {
                    $quoted = true;
                }
                else {
                    // unescaped quote in unquoted token, insert escape char, ignore quote
                    $token .= $escape;
                }
            } elseif (substr($str, $i, $seplen) == $separator) {
                $tokens[] = $token;
                $token = '';
                $i += $seplen - 1;
                continue;
            } elseif ($str[$i-1] == $quote && $str[$i-2] != $escape && (strlen($token) > 1 && substr($token, -2, 1) != $escape)) {
                // [last character was the quote sign], [it was not escaped and escape was not inserted above], [current char is not the separator (elseif above)]
                for ($j = $i; $j < strlen($str); $j++) {
                    if (ctype_space($str[$j])) {
                        if (substr($str, $j, strlen($separator)) == $separator) {
                            // there was only whitespace until next separator, so closing the quote was correct and we do nothing
                            break;
                        }
                    }
                    else {
                        // there was an unescaped quote in the data, insert escape sequence in front of it and set quoted status back to true
                        $token = substr($token, 0, -1) . $escape . substr($token, -1);
                        $quoted = true;
                        break;
                    }
                }
            }
        } else {
            if (!$escaped) {
                if ($c == $quote) {
                    $quoted = false;
                } elseif ($c == $escape) {
                    $escaped = true;
                }
            } else {
                $escaped = false;
            }
        }
        $token .= $c;
    }
    $tokens[] = $token;
    return $tokens;
}

$input = "'one|un'o'|bl'abla|'three^'^''|'four^^^'^cuatro'|token|'token2'|'to'|ken4'|tok'en5";
$output = token_with_quote($input);
echo json_encode($output) . "\n";
echo "Erwartung:" . PHP_EOL;
echo "[\"'one|un^'o'\",\"bl^'abla\",\"'three^'^''\",\"'four^^^'^cuatro'\",\"token\",\"'token2'\",\"'to'\",\"ken4^'\",\"tok^'en5\"]" .PHP_EOL;
echo "Input: " . PHP_EOL;
echo $input . PHP_EOL;

The important changed I did are marked here:

<?php
function token_with_quote($str, $quote = "'", $escape = '^', $separator = '|')
{
    $tokens = [];
    $token = '';
    $escaped = false;
    $quoted = false;
    $seplen = strlen($separator);
    for ($i = 0; $i < strlen($str); $i++) {
        $c = $str[$i];
        if (!$quoted) {
            if ($c == $quote) {
-               $quoted = true;
+               if ($token == '') {
+                   $quoted = true;
+               }
+               else {
+                   // unescaped quote in unquoted token, insert escape char, ignore quote
+                   $token .= $escape;
+               }
            } elseif (substr($str, $i, $seplen) == $separator) {
                $tokens[] = $token;
                $token = '';
                $i += $seplen - 1;
                continue;
+           } elseif ($str[$i-1] == $quote && $str[$i-2] != $escape && (strlen($token) > 1 && substr($token, -2, 1) != $escape)) {
+               // [last character was the quote sign], [it was not escaped and escape was not inserted above], [current char is not the separator (elseif above)]
+               for ($j = $i; $j < strlen($str); $j++) {
+                   if (ctype_space($str[$j])) {
+                       if (substr($str, $j, strlen($separator)) == $separator) {
+                           // there was only whitespace until next separator, so closing the quote was correct and we do nothing
+                           break;
+                       }
+                   }
+                   else {
+                       // there was an unescaped quote in the data, insert escape sequence in front of it and set quoted status back to true
+                       $token = substr($token, 0, -1) . $escape . substr($token, -1);
+                       $quoted = true;
+                       break;
+                   }
+               }
            }
        } else {
            if (!$escaped) {
                if ($c == $quote) {
                    $quoted = false;
                } elseif ($c == $escape) {
                    $escaped = true;
                }
            } else {
                $escaped = false;
            }
        }
        $token .= $c;
    }
    $tokens[] = $token;
    return $tokens;
}

$input = "'one|un'o'|bl'abla|'three^'^''|'four^^^'^cuatro'|token|'token2'|'to'|ken4'|tok'en5";
$output = token_with_quote($input);
echo json_encode($output) . "\n";
echo "Erwartung:" . PHP_EOL;
echo "[\"'one|un^'o'\",\"bl^'abla\",\"'three^'^''\",\"'four^^^'^cuatro'\",\"token\",\"'token2'\",\"'to'\",\"ken4^'\",\"tok^'en5\"]" .PHP_EOL;
echo "Input: " . PHP_EOL;
echo $input . PHP_EOL;

Basically we don't set $quoted to true, if $token is not empty, because in this case we should be handling unquoted data and a sudden quote should be a quote that was forgotten to be escaped.

For the second part, we check after $quote was set to false in quoted data, because from my understanding the only valid follow up should be the separator or some whitespace if you allow it.
So if there is no separator or whitespace until the next separator, chances are high hat we have a forgotten zu escape our quote here, too.

The second part is not perfect, because you could forget to escape a quote AND have whitespace and/or $separator follow it, so the token will be split in two and the closing quote will be escaped by the first change of escaping quotes in unquoted data.

So I'm not sure if this is better or worse from error handling point of view, but the test cases in the echo's below looked good to me

The next step led to the result you see above your reply, because I wanted escape strings instead of escape chars, so I had to work with new $esclen variable to mimic your style from $separator.
In most parts that was just replacing something like $c == $escape with substr($str, $i, $esclen) == $escape

But I had to change the part where you only set $escape to true if it was not escaped previously, because that led tom mixup if separators followed one after another and being made up of multiple times the same char, so [###][###][###]' would not become ###' but be interpreted as [###]#[###]##(unescaped)'
No simply everytime we find $escape, we set $escaped to true and jump over it

I hope I could clear up most of my intentions to change the code to those states. Please ask back if there is something unclear

@Norcoen
Copy link
Author

Norcoen commented Oct 22, 2019

Follow up for the dependent token_unquotefunction:

<?php
function token_unquote($arr, $quote = "'", $escape = '###')
{
    $esclen = strlen($escape);
    for ($i = 0; $i < count($arr); $i++) {
        $str = trim($arr[$i]);
        if (strlen($str) > 1 && $str[0] == $quote && $str[strlen($str) - 1] == $quote) {
            // if the token has more than one char and the first char is the $quote and the last char is the $quote, then we just remove the quotes before moving on
            $str = substr($str, 1, strlen($str) - 2);  // den Teil des Tokens zwischen den quotes holen
        }
        $escaped = false;
        $token = '';
        for ($j = 0; $j < strlen($str); $j++) {
            $c = $str[$j];
            if (substr($str, $j, $esclen) == $escape) {
                // jump over escape sequence - remove it from token
                $escaped = true;
                $j += $esclen -1;
                continue;
            } else {
                $escaped = false;
            }
            $token .= $c;
        }
        $arr[$i] = $token;
    }
    return $arr;
}

$input = ["'one|un###'o'","bl###'abla","'three###'###''","'four#########'###cuatro'","token","'token2'","'to'","ken4###'","tok###'en5"];
$output = token_unquote($input);
echo json_encode($output) . "\n";
echo "Expected:" . PHP_EOL;
echo "[\"one|un'o\",\"bl'abla\",\"three''\",\"four'cuatro\",\"token\",\"token2\",\"to\",\"ken4'\",\"tok'en5\"]" .PHP_EOL;
echo "Input: " . PHP_EOL;
echo json_encode($input) . PHP_EOL;

With this it is now possible to unquote and remove escape strings from formerly created tokens with multi-char $escape strings

I closed the if-condition for removing start and end quotes earlier, to everything after is just a string which might contain escaped quotes.
Then I changed the condition in the loop to respect multi-char $escape strings and just jump over them

@Norcoen
Copy link
Author

Norcoen commented Oct 24, 2019

I had some time to take a deeper look into the Template-Class, now that I got the general idea, how you parse templates.

Now I have some questions of why and how.

i found that something like
{{if:10|eq(10)}} does not work, because you always expect a variable in the first part

Wouldn't it be better to prefix variables with a $ sign (for example)?

So we could have {{$varname}} and {{if:$varname|eq(10)}} as well as {{if:10|eq($varname)}} or {{if:'hello|eq('hello')}} with a small change

Also in applyFunctions()you only check for double quotes and is_numeric - is there a reason to not support single quotes and just not apply stripcslashes()?

What I could not make sense of yet, was why you set $node->value = $value;just before return in renderIfNode() and renderElseIfNode(). Was $node supposed to be passed as reference to keep track of the results?

@mevdschee
Copy link
Member

I had some time to take a deeper look into the Template-Class, now that I got the general idea, how you parse templates.

Great!

Also in applyFunctions()you only check for double quotes and is_numeric - is there a reason to not support single quotes and just not apply stripcslashes()?

No there is not.

What I could not make sense of yet, was why you set $node->value = $value;just before return in renderIfNode() and renderElseIfNode(). Was $node supposed to be passed as reference to keep track of the results

It is used to reference the "value" parameter for comparison in the next block (from 'if' to 'elseif' or from 'elseif' to 'else').

i found that something like
{{if:10|eq(10)}} does not work, because you always expect a variable in the first part
Wouldn't it be better to prefix variables with a $ sign (for example)?

There are very good templating languages out there (such as Blade and Twig), but really.. would you have been inspired by this code if it was a lot longer?

Adding features/code is not a virtue.. leaving it out is. It allows people to quickly grasp the essence of the code. Things like multi character escapes or literals in the left hand side of a comparison are not essential to a templating language.

Why don't you fork the project and make it more full-featured? You seem to like adding features and people may like it: I think there is room for a zero-dependency Twig or Blade.

@Norcoen
Copy link
Author

Norcoen commented Oct 24, 2019

It is used to reference the "value" parameter for comparison in the next block (from 'if' to 'elseif' or from 'elseif' to 'else').

Ooooh, I did not see createNode() casting the array to object, so it is always implicitly passed as reference - now it makes sense to me

Adding features/code is not a virtue.. leaving it out is. It allows people to quickly grasp the essence of the code. Things like multi character escapes or literals in the left hand side of a comparison are not essential to a templating language.

Why don't you fork the project and make it more full-featured? You seem to like adding features and people may like it: I think there is room for a zero-dependency Twig or Blade.

My target is to have a zero-dependency template engine I fully understand and that I can support for many years to come. For that to happen I would have to try and write one from scratch (not exactly my strong suit) or play around with yours or others long enough to grasp it 😳

I tried to make the changed as minimal as possible, because I hoped you might be interested in some to implement them in your own style.
Multi character escaped removed 3 lines and added 6, so only 3 additional lines, 4 if you want to use $esclenvariable.

Trying to make parsing a bit more robust for malformed templates added 19 lines, but that was just trying to understand the parser better - I dont know if it makes things better or worse

Prefixing variables with $ or other sign might make it easier to parse the template, reduce code and enable literals on the left hand side of comparison - or so I hope - I will try it out when I have some time on my hand again

So I just share some thoughts and am thankful for your input/replies - if any of that is useful for your library is only for you to decide - in the end I need to make a fork anyway, because I only need the template class, not the whole core (for now anyhow)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants