-
Notifications
You must be signed in to change notification settings - Fork 52
optional
doesn't propagate fail when used with <|>
#235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
That's a serious problem… I just tried to remove as many excess |
Is this an issue with left recursion maybe? It could definitely be a bug also, but that's something to look at / consider if you've not encountered material about it before. |
I don't see recursion in the steps-to-reproduce, unless I'm missing something… |
EDIT: nm, I take that back, it doesn't use |
Just to clarify: even if it would, that would be a bug because it's documented to only return |
That doesn't seem to be true for me, dropping a parseTypes :: TextParser String
parseTypes = do
_ <- lexeme (char 'f' *> char 'o' *> char 'o')
fail "test failure" Also produces |
@garyb oh, interesting, let me cross off that line… The situation that quote was referring to was tested when instead of sequence of |
…although, now that I test, even with |
While re-checking the code, I just managed to reduce the problem even further! I guess I'll post it here, because at this point re-editing the issue may introduce confusion…? So, I just found that Here's the smaller steps to reproduce, no module Main where
import Prelude
import Data.Maybe (Maybe(..))
import Effect (Effect)
import Effect.Console (logShow)
import Parsing (Parser, fail, runParser)
import Parsing.Combinators (optionMaybe)
import Parsing.String (string)
import Parsing.String.Basic (whiteSpace)
type TextParser = Parser String
parseTypes :: TextParser String
parseTypes = do
_ <- string "foo" <* whiteSpace
fail "test failure"
parseImpl :: TextParser String
parseImpl = do
optionMaybe parseTypes >>= case _ of
Nothing -> pure ""
Just x -> pure x
main :: Effect Unit
main = logShow $ runParser "foo" parseImpl |
I’m afk so can’t check just now, but it’s kinda pointing to the functor or
applicative instance doing something wrong with the consumed state.
|
optionMaybe
doesn't propagate fail when used with 2 optional
soptionMaybe
doesn't propagate fail when used whiteSpace
So, an even smaller reproducer is replacing whole paragraph matching on module Main where
import Prelude
import Effect (Effect)
import Effect.Console (logShow)
import Parsing (Parser, fail, runParser)
import Parsing.Combinators ((<|>))
import Parsing.String (string)
import Parsing.String.Basic (whiteSpace)
type TextParser = Parser String
parseTypes :: TextParser String
parseTypes = do
_ <- string "foo" <* whiteSpace
fail "test failure"
parseImpl :: TextParser String
parseImpl = parseTypes <|> pure ""
main :: Effect Unit
main = logShow $ runParser "foo" parseImpl I guess this points out the problem is certainly in |
CC: @jamesdbrock as the I don't have enough understanding of
takeWhile :: forall m. (CodePoint -> Boolean) -> ParserT String m String
takeWhile predicate =
consumeWith \s ->
let
value = String.takeWhile predicate s
in
Right
{ consumed: value
, remainder: SCU.drop (SCU.length value) s
, value
} Now, in What should be done here instead is that code should test previous To test this idea, I took the previous comment code, and added a space as |
Nice find! That sounds like the culprit. |
Thank you! Any takers to send a PR? This sounds trivial to implement, I just don't know where and how to pattern-match on the older |
It's the last argument to purescript-parsing/src/Parsing/String.purs Line 285 in c38e6ea
Then update: purescript-parsing/src/Parsing/String.purs Line 290 in c38e6ea
to something like: -- prevConsumed is the consumed slot from ParseState
runFn2 done (ParseState remainder (updatePosString pos consumed remainder) (prevConsumed || not (String.null consumed))) value |
@natefaubion just to clarify: are you saying, the bug should be fixed in |
Well… apparently, there's more than one bug. I'll split this to a separate issue for the purpose of PR. |
optionMaybe
doesn't propagate fail when used whiteSpace
optionMaybe
doesn't propagate fail when used with 2 optional
s
So, on that case I sent a PR. Regarding the case here, here's the minimal steps that are buggy even with the fix (it boils down to having module Main where
import Prelude
import Effect (Effect)
import Effect.Console (logShow)
import Parsing (Parser, fail, runParser)
import Parsing.Combinators (optional, (<|>))
import Parsing.String (string)
import Parsing.String.Basic (whiteSpace)
type TextParser = Parser String
parseTypes :: TextParser String
parseTypes = do
_ <- string "foo" <* optional whiteSpace
fail "test failure"
parseImpl :: TextParser String
parseImpl = parseTypes <|> pure ""
main :: Effect Unit
main = logShow $ runParser "foo" parseImpl |
optionMaybe
doesn't propagate fail when used with 2 optional
soptional
doesn't propagate fail when used with <|>
Okay, folks, I think I more or less figured this one too, but something's doesn't add up, probably just because I don't know the code. In above code both After some digging I figured the bug seems to be in Alt implementation: instance Alt (ParserT s m) where
alt (ParserT k1) (ParserT k2) = ParserT
( mkFn5 \state1@(ParseState input pos _) more lift throw done ->
more \_ ->
runFn5 k1 (ParseState input pos false) more lift
( mkFn2 \state2@(ParseState _ _ consumed) err ->
more \_ ->
if consumed then
runFn2 throw state2 err
else
runFn5 k2 state1 more lift throw done
)
done
) Per my understanding, it should be changed to propagate previous consumed as well: --- a/src/Parsing.purs
+++ b/src/Parsing.purs
@@ -350,9 +350,9 @@ instance MonadError ParseError (ParserT s m) where
-- | section __2.3 Backtracking__.
instance Alt (ParserT s m) where
alt (ParserT k1) (ParserT k2) = ParserT
- ( mkFn5 \state1@(ParseState input pos _) more lift throw done ->
+ ( mkFn5 \state1@(ParseState input pos oldConsumed) more lift throw done ->
more \_ ->
- runFn5 k1 (ParseState input pos false) more lift
+ runFn5 k1 (ParseState input pos oldConsumed) more lift
( mkFn2 \state2@(ParseState _ _ consumed) err ->
more \_ ->
if consumed then It fixes the bug discussed, however it also breaks tests, so I'm wondering if perhaps I am misunderstanding something? |
An even smaller code-to-reproduce: module Main where
import Prelude
import Effect (Effect)
import Effect.Console (logShow)
import Parsing (Parser, fail, runParser)
import Parsing.Combinators ((<|>))
import Parsing.String (string)
type TextParser = Parser String
parseTypes :: TextParser Unit
parseTypes = do
_ <- string "foo"
void (fail "fail after consumption") <|> pure unit
main :: Effect Unit
main = logShow $ runParser "foo" parseTypes |
Okay guys, I have re-studied everything, and I am certain the change I described one comment above is correct. So now the question is: why after this change running tests results in: TESTS Indentation
file:///home/constantine/Projects/purescript-parsing/output/Test.Assert/foreign.js:4
if (!success) throw new Error(message);
^
Error: error: (ParseError "Predicate unsatisfied" (Position { column: 2, index: 1, line: 1 }))
Predicate unsatisfied at position index:1 (line:1, column:2)
▼
k
at file:///home/constantine/Projects/purescript-parsing/output/Test.Assert/foreign.js:4:27
at Module.__do (file:///home/constantine/Projects/purescript-parsing/output/Test.IndentationTests/index.js:158:511)
at __do (file:///home/constantine/Projects/purescript-parsing/output/Test.Main/index.js:539:27)
at file:///home/constantine/Projects/purescript-parsing/.spago/run.js:3:1
at ModuleJob.run (node:internal/modules/esm/module_job:271:25)
at async onImport.tracePromise.__proto__ (node:internal/modules/esm/loader:547:26)
at async asyncRunEntryPointWithESMLoader (node:internal/modules/run_main:98:5)
? From this output I don't even understand what test is failing. It refers to foreign JS for some reason, and "Predicate unsatisfied" message isn't even in tests but in core library 🤷♂️ |
I think the Any combinator that makes sequential progress (such as consumeWith) needs to take the previous consumed flag into account. That is, it's not a local decision of "this particular step consumed something". The consumed state means "this particular branch consumed something". |
Well, I don't have anything to say here. Your comment is done from high-level knowledge of the library, whereas mine is from low-level investigation of the generated JS and sprinkling |
parseTypes :: TextParser Unit
parseTypes = do
_ <- string "foo"
void (fail "fail after consumption") <|> pure unit This isn't a correct example, or at least your expectation is incorrect. The bracketing of this is: parseTypes :: TextParser Unit
parseTypes = do
_ <- string "foo"
(void (fail "fail after consumption") <|> pure unit)
If you change the bracketing to: parseTypes :: TextParser Unit
parseTypes = (do
_ <- string "foo"
void (fail "fail after consumption")) <|> pure unit Then this will fail as expected. |
Part of the reason why this is so confusing is the semantics of |
Thank you! But you probably meant I tried that first and it did fail… but then I realized I had the changes to λ cat src/Main.purs
module Main where
import Prelude
import Effect (Effect)
import Effect.Console (logShow)
import Parsing (Parser, fail, runParser)
import Parsing.Combinators ((<|>))
import Parsing.String (string)
type TextParser = Parser String
parseTypes :: TextParser Unit
parseTypes = do
_ <- string "foo"
(void (fail "fail after consumption")) <|> pure unit
main :: Effect Unit
main = logShow $ runParser "foo" parseTypes
λ spago run
[info] Build succeeded.
[warn] None of your project files import modules from some projects that are in the direct dependencies of your project.
These dependencies are unused. To fix this warning, remove the following packages from the list of dependencies in your config:
- maybe
- unicode
(Right unit) |
No, I don't mean that. Let me reformat: parseTypes :: TextParser Unit
parseTypes = do
_ <- string "foo"
void (fail "fail after consumption") <|> pure unit This is working correctly. The parseTypes :: TextParser Unit
parseTypes = do
_ <- string "foo"
(void (fail "fail after consumption") <|> pure unit) Algebraically, parseTypes :: TextParser Unit
parseTypes = do
_ <- string "foo"
pure unit If you change the bracketing to: parseTypes :: TextParser Unit
parseTypes =
( do
_ <- string "foo"
void (fail "fail after consumption")
) <|> pure unit Then you will get the error you expect. That's because the |
Oooh, I see, so, even if I bracket it as: […]
(void (fail "fail after consumption")) <|> pure unit it will still be executed as if it was […]
(void (fail "fail after consumption") <|> pure unit) ? |
Welp… this is confusing, but apparently that's how even Parsec works: module Main where
import Control.Monad (void)
import Text.Parsec
import Text.Parsec.String (Parser)
type TextParser a = Parsec String () a
parseTypes :: TextParser ()
parseTypes = do
_ <- string "foo"
(void (fail "fail after consumption")) <|> return ()
main :: IO ()
main = print $ parse parseTypes "" "foo" Similarly, returns |
Okay, I closed too early. We found the problem in the last reduced steps-to-reproduce, which is basically a tricky thing that one shouldn't use module Main where
import Prelude
import Effect (Effect)
import Effect.Console (logShow)
import Parsing (Parser, fail, runParser)
import Parsing.Combinators (optional, (<|>))
import Parsing.String (string)
import Parsing.String.Basic (whiteSpace)
type TextParser = Parser String
parseTypes :: TextParser String
parseTypes = do
_ <- string "foo" <* optional whiteSpace
fail "test failure"
parseImpl :: TextParser String
parseImpl = parseTypes <|> pure ""
main :: Effect Unit
main = logShow $ runParser "foo" parseImpl Has no such problem. Here whole code that consumes and fails has no branching within do-expression. The nearest branching resides outside it. And yet it succeeds. |
To be clear, I still think there is the issue in consumeWith. |
|
In order to get your example to fail (ie, meet your expectation), I needed the fix to type TextParser = Parser String
parseTypes :: TextParser String
parseTypes = do
_ <- string "foo" <* whiteSpace
fail "test failure"
parseImpl :: TextParser String
parseImpl = parseTypes <|> pure ""
main :: Effect Unit
main = logShow $ runParser "foo" parseImpl
The parseInner :: TextParser String
parseInner = do
_ <- string "foo"
takeWhile isSpace <|> pure "" However, the following two alternatives behave expectedly: parseInner :: TextParser String
parseInner = do
_ <- string "foo"
takeWhile isSpace parseInner :: TextParser String
parseInner = do
_ <- string "foo"
takeWhile1 isSpace <|> pure "" So there's some confusing semantics around primitive parsers which can both:
That is, as long you consume input or fail, everything works as expected. One fix for this is to define The alternative (yuk yuk) fix would be to indeed fix the Alternative instance, but not in the way you highlighted. It would be to correct it to be in line with the semantics I noted above around writer, and we would need to consider the prevConsumed in the instance Alt (ParserT s m) where
alt (ParserT k1) (ParserT k2) = ParserT
( mkFn5 \state1@(ParseState input pos prevConsumed) more lift throw done ->
more \_ ->
runFn5 k1 (ParseState input pos false) more lift
( mkFn2 \state2@(ParseState _ _ consumed) err ->
more \_ ->
if consumed then
runFn2 throw state2 err
else
runFn5 k2 state1 more lift throw done
)
( mkFn2 \(ParseState input' pos' consumed) res ->
runFn2 done (ParseState input' pos' (prevConsumed || consumed)) res
)
) All of the three alternatives above are interchangeable then. I think if we wanted to go the route of fixing the alternative instance, I think we should just change the semantics internally to Writer. Then it becomes baked into the framework, and the consumed flag becomes a local decision rather than the awkward "whether the current branch has consumed anything", which is a source of endless confusion and subtlety. The downside is this is a breaking change to the internal representation. I don't know if we consider that part of the API though? We could also "for now" fix consumeWith to require a NonEmptyString for consumed, otherwise it must fail. This would uphold the "consume or fail" invariant. |
I got lost here. The reason the example you suggest works unexpectedly IIUC is the one discussed before I closed the issue. It boils down to the fact that in Parsing (and Parsec for that matter) the I understand you moved the
FTR, I tested this fix by running: myWhiteSpace :: TextParser String
myWhiteSpace = takeWhile1 isSpace <|> pure ""
parseTypes :: TextParser String
parseTypes = do
_ <- string "foo" <* optional myWhiteSpace
fail "test failure"
parseImpl :: TextParser String
parseImpl = parseTypes <|> pure ""
main :: Effect Unit
main = logShow $ runParser "foo" parseImpl But it still returns |
I don't really understand what you are getting at with "should never be used anywhere but the very top of a do-expression". I don't think this should be the takeaway at all. The reason the previous inaccurate example was incorrect was because I think you either had an incorrect assumption of how it's supposed to work, or you had an incorrect assumption about how the language was implicitly bracketing your code (ie, a question of precedence).
Then maybe changing the |
This effectively treats the consumed flag as a monoid appended on each sequential action (ie, Writer). This means it can always be a local decision so low-level combinators don't need to consult the previous state. Fixes purescript-contrib#235
Thank you! ❤️ |
I see, fixes have been accumulating since 2022, it is maybe worth making a new release, so that PureScript users would get the library from Registry with less bugs? |
Describe the bug
So, I'm not clear why, but when you execute underDown here are even simpler steps-to-reproduceoptionMaybe
a code that uses twooptional
s, then even if the code consumes input before failing,optionMaybe
thinks it didn't. The interesting feature of this bug is that if you remove at least oneoptional
, it will fail as expected.Spent a few hours digging, first to a parsing bug and then to the Parsing bug (pun intended), came down with the minimal steps below 😊
To Reproduce
UPD: simpler steps-to-reproduce are in this comment
Run the following code:
Expected behavior
Code should return
(Left "test failure")
because the paragraph being executed underoptionMaybe
has consumed the input (thrice even) before failing.Actual behavior
It returns
(Right "")
The text was updated successfully, but these errors were encountered: