Use of Network.setRequestInterception #31

Open
juba opened this issue Feb 28, 2019 · 13 comments
Labels
question: Further information is requested

Comments


juba commented Feb 28, 2019

Hi,

If I want to check a page and get some information about its network operations, I can do something like the following:

url <- "https://www.r-project.org/"

promise_all(
  chrome %>%
    Page.enable() %>%
    Page.navigate(url),
  chrome %>% 
    Network.enable() %>%
    Network.responseReceived() %...T>% {
      print("received")
      print(.$result$response$url)
    }
)

However, I'd like to use Network.setRequestInterception to capture only certain requests. I tried it this way, but it doesn't seem to work:

promise_all(
  chrome %>%
    Page.enable() %>%
    Page.navigate(url),
  chrome %>% 
    Network.enable() %>%
    Network.setRequestInterception(patterns = list(list(urlPattern="*"))) %>%
    Network.requestIntercepted() %...T>% {
      print("intercepted")
    }
)

Do you have any idea what I'm doing wrong?

Thanks!

Owner

RLesur commented Feb 28, 2019

For this kind of task, I prefer the following script, which works:

library(crrri)

chrome <- chr_connect()
url = "https://www.rstudio.com"

intercepted <- 
  chrome %>%
  Page.enable() %>%
  Network.enable() %>%
  Network.setRequestInterception(patterns = list(list(urlPattern="*"))) %>%
  Network.requestIntercepted() %...T>% {
    print("intercepted")
  }

chrome %>%
  Page.navigate(url)

chr_disconnect(chrome)

This first creates a promise for an intercepted network request, then opens the URL.

Owner

RLesur commented Feb 28, 2019

If you want to execute a callback for each intercepted network request, you can use the .callback argument:

library(crrri)

chrome <- chr_connect()
url = "https://www.rstudio.com"

chrome %>%
  Page.enable() %>%
  Network.enable() %>%
  Network.setRequestInterception(patterns = list(list(urlPattern="*"))) %>%
  Network.requestIntercepted(.callback = function(msg) {
    print(msg$params$request)
  })

chrome %>%
  Page.navigate(url)

chrome %>%
  Page.navigate("https://cdn.rawgit.com/juba/rmdformats/master/resources/examples/material/material.html")

chr_disconnect(chrome)

Author

juba commented Feb 28, 2019

Thanks!

One more question about your first example: if I run all the code at once, the request is not intercepted. I have to run the intercepted promise, wait a little, and then run Page.navigate. Putting a Sys.sleep() in between doesn't seem to work either.

And if I try with headless = FALSE, I get an error when running the Page.navigate promise:

[error] handle_read_frame error: websocketpp.transport:7 (End of File)
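
A likely reason Sys.sleep() does not help here: the promises used by crrri are driven by the later event loop, and later only runs pending callbacks when no other R code is on the call stack (or when later::run_now() is called explicitly). Sys.sleep() keeps R busy, so nothing that is queued gets a chance to run while it sleeps. The sketch below illustrates this with the later package alone, no Chrome needed; demo_sleep is a hypothetical name used only for this illustration.

```r
library(later)

# Hypothetical helper, for illustration only: schedule a callback,
# sleep, and record whether the callback ran before and after run_now().
demo_sleep <- function() {
  fired <- FALSE
  later::later(function() fired <<- TRUE, delay = 0)

  # Sleeping blocks R; the queued callback is not executed meanwhile.
  Sys.sleep(0.2)
  after_sleep <- fired

  # Explicitly run the callbacks that are due.
  later::run_now()

  c(after_sleep = after_sleep, after_run_now = fired)
}

result <- demo_sleep()
print(result)
```

Under these assumptions, after_sleep should be FALSE and after_run_now should be TRUE, which is consistent with the observation that waiting with Sys.sleep() between the two pipelines changes nothing.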

Owner

RLesur commented Feb 28, 2019

You're right, I wrote an "interactive" script. My mistake.

library(crrri)

chrome <- chr_connect()
url = "https://www.rstudio.com"

configured <- 
  chrome %>%
  Page.enable() %>%
  Network.enable() %>%
  Network.setRequestInterception(patterns = list(list(urlPattern="*"))) 

intercepted <- 
  configured %>%
  Network.requestIntercepted() %...T>% {
    print("intercepted")
  }

configured %>%
  Page.navigate(url) %...!% {
  }

intercepted %...>% {
  chr_disconnect(chrome)
}

Using a callback would be a better idea for this use case.

Collaborator

cderv commented Feb 28, 2019

Another great use case for the event emitter API! I'll post a new example soon, hoping it works OK.

Author

juba commented Mar 1, 2019

OK, one more thing, I'm afraid...

Your last example works fine when using promises only, but I run into trouble when I try to use callbacks:

library(crrri)
chrome <- chr_connect("google-chrome") 
url = "https://www.rstudio.com" 

configured <- chrome %>% 
  Page.enable() %>% 
  Network.enable() %>% 
  Network.setRequestInterception(patterns = list(list(urlPattern="*"))) %>% 
  Network.requestIntercepted(.callback = function(msg) { 
    print(msg$params$request) 
  }) 

ended <- configured %>% 
  Page.navigate(url) %>% 
  Page.loadEventFired() %>%
  DOM.getDocument() %>%
  DOM.getOuterHTML(nodeId = ~.$root$nodeId) %...T>%
  { print(.$result)}

ended %...>% { chr_disconnect(chrome) }

In this case I get an error: Unhandled promise error: objet de type 'closure' non indiçable (in English: object of type 'closure' is not subsettable). It seems that the promise returned when using .callback "breaks" the pipeline.

Once again, I'm sorry if this is a misunderstanding on my part.

Owner

RLesur commented Mar 1, 2019

There are two separate topics here: one related to crrri and the other to the use of Chrome DevTools.

Using a callback in an event listener

As stated in the documentation, an event listener returns:

An async value of class promise. The value and the completion of the promise differ according to the use of a callback function. [...] When .callback is not NULL, the promise is fulfilled as soon as the callback is created; the value is a function without any argument that can be called to cancel the callback. When you use the .callback argument, you cannot send the result to any other command or event listener.

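That means the configured promise cannot be piped into further commands as in your example: its value is the cancel function, not something a command can use, hence the "closure" error.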

Using Network.setRequestInterception()

In the example, every request is intercepted.
When the Page.navigate command is used, its request is intercepted and held, so it is never sent.
So the load event never fires (the request is not sent, so there is no response).

Here's a modified and commented version of the last example:

library(crrri)
chrome <- chr_connect() 
url = "https://www.rstudio.com" 

configured <- 
  chrome %>% 
  Page.enable() %>% 
  Network.enable() %>% 
  # intercept all the requests (why?):
  Network.setRequestInterception(patterns = list(list(urlPattern="*"))) %>% 
  Network.requestIntercepted(.callback = function(msg) { 
    print(msg$params$request) 
  }) 

remove_callback <- function() {
  configured %...>%
    do.call(list())
}

# If you want to remove the callback, use:
# remove_callback()

ended <- # WARNING: this promise will never be resolved (see below)
  configured %...>% {
  chrome %>%
  Page.navigate(url) %>% # send a request
  Page.loadEventFired() %>% # Since all the requests are intercepted, the load event will never fire
  DOM.getDocument() %>%
  DOM.getOuterHTML(nodeId = ~.$root$nodeId) %...T>%
  { print(.$result)}
}

# WARNING: The ended promise will never be resolved (use a timeout!)
ended %...>% { chr_disconnect(chrome) }
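
The "use a timeout" advice in the comments above could be sketched as follows, using only the promises and later packages. This is a minimal sketch under stated assumptions: with_timeout is a hypothetical helper, not a crrri function, and never stands in for a promise such as ended that is never settled.

```r
library(promises)
library(later)

# Hypothetical helper (not part of crrri): race `p` against a timer
# that rejects after `delay` seconds.
with_timeout <- function(p, delay) {
  timer <- promise(function(resolve, reject) {
    later::later(function() reject(simpleError("timed out")), delay)
  })
  promise_race(p, timer)
}

# A promise that is never settled, standing in for `ended`.
never <- promise(function(resolve, reject) {})

outcome <- NULL
then(
  with_timeout(never, 0.1),
  onFulfilled = function(value) outcome <<- "resolved",
  onRejected = function(err) outcome <<- conditionMessage(err)
)

# Outside an interactive session the event loop must be run by hand.
while (is.null(outcome)) later::run_now(timeoutSecs = 0.5)
print(outcome)
```

With a real connection, the same idea would let the rejection handler call chr_disconnect(chrome), so the session is closed even when the load event never fires.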

Author

juba commented Mar 1, 2019

OK, thanks for the explanation. I didn't find this in the DevTools documentation, but it matches what I had mostly understood.

The way I see it, the only way for it to work would be to issue a Network.continueInterceptedRequest after the requestIntercepted event. This works when using only promises, but with a callback in requestIntercepted I don't see how it could be possible, as the promise pipeline is "broken".

Anyway, I think I can achieve most of what I'm trying to do by using responseReceived, which seems to be non-blocking.

Owner

RLesur commented Mar 1, 2019

You can use Network.continueInterceptedRequest like this:

library(crrri)
chrome <- chr_connect() 
url = "https://www.rstudio.com" 

configured <- 
  chrome %>% 
  Page.enable() %>% 
  Network.enable() %>% 
  # intercept all the requests:
  Network.setRequestInterception(patterns = list(list(urlPattern="*"))) 


configured %>% 
  Network.requestIntercepted() %...>% {
    print(.$result$request)
    Network.continueInterceptedRequest(.$ws, interceptionId = .$result$interceptionId)
  }
  
configured %>%
  Page.navigate(url) %>% # send a request
  Page.loadEventFired() %>% 
  DOM.getDocument() %>%
  DOM.getOuterHTML(nodeId = ~ .res$root$nodeId) %...T>%
  { print(.$result)}

Author

juba commented Mar 1, 2019

Yes, this works, but can you add a callback to Network.requestIntercepted in this way?

Owner

RLesur commented Mar 1, 2019

I think this script will be fine:

library(crrri)
chrome <- chr_connect() 
url = "https://www.rstudio.com" 

configured <- 
  chrome %>% 
  Page.enable() %>% 
  Network.enable() %>% 
  Network.setRequestInterception(patterns = list(list(urlPattern="*"))) 


configured %>% 
  Network.requestIntercepted(.callback = function(msg) {
    print(msg$params$request)
    configured %>% Network.continueInterceptedRequest(interceptionId = msg$params$interceptionId)
  })
  
configured %>%
  Page.navigate(url) %>% # send a request
  Page.loadEventFired() %>% 
  DOM.getDocument() %>%
  DOM.getOuterHTML(nodeId = ~ .res$root$nodeId) %...T>%
  { print(.$result)}

Author

juba commented Mar 1, 2019

For the record, I think this works if you add an interceptionStage = "HeadersReceived" entry to the pattern passed to setRequestInterception, something like:

library(crrri) 
chrome <- chr_connect("google-chrome", headless = TRUE) 
url <- "https://rstudio.com" 
configured <- chrome %>% 
  Page.enable() %>% 
  Network.enable() %>% 
  Network.setRequestInterception(patterns = list(list(urlPattern="*", interceptionStage="HeadersReceived"))) 
configured %>% 
  Network.requestIntercepted(.callback = function(msg) { 
    print(msg$params$request$url) 
    configured %>% 
      Network.continueInterceptedRequest(interceptionId = msg$params$interceptionId) 
  }) 
ended <- configured %>% 
  Page.navigate(url) %>% 
  Page.loadEventFired() %>% 
  DOM.getDocument() %>% 
  DOM.getOuterHTML(nodeId = ~ .res$root$nodeId) %...T>% 
  { print("---"); print(substring(.$result, 0, 100))}
ended %...>% {print("Done"); chr_disconnect(chrome)}

This is great, many thanks!
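
Coming back to the original goal of capturing only certain requests: the patterns argument takes a list of request patterns, and per the Chrome DevTools Protocol each pattern may carry urlPattern, resourceType and interceptionStage. A minimal sketch of a narrower filter follows; the name css_and_images and the chosen globs and resource types are assumptions made for illustration only.

```r
# Illustrative patterns only: the URL globs and resource types below are
# assumptions, chosen to show the shape of the argument.
css_and_images <- list(
  # stylesheets whose URL ends in .css
  list(urlPattern = "*.css", resourceType = "Stylesheet",
       interceptionStage = "HeadersReceived"),
  # any image, whatever its URL
  list(urlPattern = "*", resourceType = "Image",
       interceptionStage = "HeadersReceived")
)

# It would be passed in place of the catch-all pattern, e.g.:
# Network.setRequestInterception(patterns = css_and_images)
str(css_and_images)
```

Requests that match none of the patterns are not intercepted, so fewer requests need to be continued from the callback.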

cderv added the question label Mar 2, 2019
Collaborator

cderv commented Mar 2, 2019

OK, for reference, I tested the new event emitter API on this use case:
https://gist.github.com/cderv/67d7ad8998559f2ce14b4eb4bb852fd1#file-request_interception-r
