This repository was archived by the owner on Jul 19, 2024. It is now read-only.
Open
32 commits
3f61eaa
add iexec-oracle-contract to deps
sulliwane Oct 4, 2017
e44acc0
add gitignore
sulliwane Oct 4, 2017
104105e
use IexecOracleAPI from npm instead of local
sulliwane Oct 4, 2017
0334db3
add DAPP_PRICE
Oct 17, 2017
d80832b
Update package.json
Oct 18, 2017
bf373ca
Update truffle.js
Oct 18, 2017
7485cf8
Update truffle.js
Oct 18, 2017
a8d115e
Update truffle.js
Oct 18, 2017
be235cb
Update MyContract.sol
Oct 25, 2017
8534944
Update iexec.js
Nov 10, 2017
188f445
Update package.json
Nov 10, 2017
d0627ab
missing coma
Nov 10, 2017
b1c5568
Adding missing comma to truffle.js
Andy92Pac Nov 11, 2017
4712e23
Merge pull request #6 from Andy92Pac/patch-1
sulliwane Nov 12, 2017
5830dc2
add README
Dec 8, 2017
3c54f1d
update oracle to v1.1.1
sulliwane Dec 13, 2017
5ac2792
immprove readme
sulliwane Dec 18, 2017
0172905
remove truffle.js
sulliwane Dec 18, 2017
19b5781
add example fields
sulliwane Dec 18, 2017
bdaaf5e
improve description
sulliwane Dec 19, 2017
527791d
improve desc
sulliwane Dec 19, 2017
58c6f39
improve readme
sulliwane Dec 19, 2017
55a95af
add License
sulliwane Dec 19, 2017
63b924e
remove deps
sulliwane Dec 19, 2017
fadd08b
Improve readme
sulliwane Dec 19, 2017
4b16baa
update init dapp readme
Dec 20, 2017
2d2e861
update init dapp readme
Dec 20, 2017
11fc114
update init dapp readme
Dec 20, 2017
b4db05b
Merge pull request #15 from iExecBlockchainComputing/IEXPROD-212-init
sulliwane Dec 20, 2017
148a971
[feature] initial scrapto commit
Jan 13, 2018
401d849
[feature] removed worker and moved to new repo
Jan 13, 2018
90e2880
[feature] updated readme
Jan 13, 2018
32 changes: 32 additions & 0 deletions .gitignore
@@ -0,0 +1,32 @@
# Logs
logs
*.log
npm-debug.log*

# Runtime data
pids
*.pid
*.seed

# Coverage directory used by tools like istanbul
coverage

# node-waf configuration
.lock-wscript

# Dependency directory
node_modules

# Compiled JS directory
/dist/*
!/dist/iexec.js

# Optional npm cache directory
.npm

# Optional REPL history
.node_repl_history

docs/
wallet.json
account.json
165 changes: 165 additions & 0 deletions LICENSE
@@ -0,0 +1,165 @@
GNU LESSER GENERAL PUBLIC LICENSE
Version 3, 29 June 2007

Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.


This version of the GNU Lesser General Public License incorporates
the terms and conditions of version 3 of the GNU General Public
License, supplemented by the additional permissions listed below.

0. Additional Definitions.

As used herein, "this License" refers to version 3 of the GNU Lesser
General Public License, and the "GNU GPL" refers to version 3 of the GNU
General Public License.

"The Library" refers to a covered work governed by this License,
other than an Application or a Combined Work as defined below.

An "Application" is any work that makes use of an interface provided
by the Library, but which is not otherwise based on the Library.
Defining a subclass of a class defined by the Library is deemed a mode
of using an interface provided by the Library.

A "Combined Work" is a work produced by combining or linking an
Application with the Library. The particular version of the Library
with which the Combined Work was made is also called the "Linked
Version".

The "Minimal Corresponding Source" for a Combined Work means the
Corresponding Source for the Combined Work, excluding any source code
for portions of the Combined Work that, considered in isolation, are
based on the Application, and not on the Linked Version.

The "Corresponding Application Code" for a Combined Work means the
object code and/or source code for the Application, including any data
and utility programs needed for reproducing the Combined Work from the
Application, but excluding the System Libraries of the Combined Work.

1. Exception to Section 3 of the GNU GPL.

You may convey a covered work under sections 3 and 4 of this License
without being bound by section 3 of the GNU GPL.

2. Conveying Modified Versions.

If you modify a copy of the Library, and, in your modifications, a
facility refers to a function or data to be supplied by an Application
that uses the facility (other than as an argument passed when the
facility is invoked), then you may convey a copy of the modified
version:

a) under this License, provided that you make a good faith effort to
ensure that, in the event an Application does not supply the
function or data, the facility still operates, and performs
whatever part of its purpose remains meaningful, or

b) under the GNU GPL, with none of the additional permissions of
this License applicable to that copy.

3. Object Code Incorporating Material from Library Header Files.

The object code form of an Application may incorporate material from
a header file that is part of the Library. You may convey such object
code under terms of your choice, provided that, if the incorporated
material is not limited to numerical parameters, data structure
layouts and accessors, or small macros, inline functions and templates
(ten or fewer lines in length), you do both of the following:

a) Give prominent notice with each copy of the object code that the
Library is used in it and that the Library and its use are
covered by this License.

b) Accompany the object code with a copy of the GNU GPL and this license
document.

4. Combined Works.

You may convey a Combined Work under terms of your choice that,
taken together, effectively do not restrict modification of the
portions of the Library contained in the Combined Work and reverse
engineering for debugging such modifications, if you also do each of
the following:

a) Give prominent notice with each copy of the Combined Work that
the Library is used in it and that the Library and its use are
covered by this License.

b) Accompany the Combined Work with a copy of the GNU GPL and this license
document.

c) For a Combined Work that displays copyright notices during
execution, include the copyright notice for the Library among
these notices, as well as a reference directing the user to the
copies of the GNU GPL and this license document.

d) Do one of the following:

0) Convey the Minimal Corresponding Source under the terms of this
License, and the Corresponding Application Code in a form
suitable for, and under terms that permit, the user to
recombine or relink the Application with a modified version of
the Linked Version to produce a modified Combined Work, in the
manner specified by section 6 of the GNU GPL for conveying
Corresponding Source.

1) Use a suitable shared library mechanism for linking with the
Library. A suitable mechanism is one that (a) uses at run time
a copy of the Library already present on the user's computer
system, and (b) will operate properly with a modified version
of the Library that is interface-compatible with the Linked
Version.

e) Provide Installation Information, but only if you would otherwise
be required to provide such information under section 6 of the
GNU GPL, and only to the extent that such information is
necessary to install and execute a modified version of the
Combined Work produced by recombining or relinking the
Application with a modified version of the Linked Version. (If
you use option 4d0, the Installation Information must accompany
the Minimal Corresponding Source and Corresponding Application
Code. If you use option 4d1, you must provide the Installation
Information in the manner specified by section 6 of the GNU GPL
for conveying Corresponding Source.)

5. Combined Libraries.

You may place library facilities that are a work based on the
Library side by side in a single library together with other library
facilities that are not Applications and are not covered by this
License, and convey such a combined library under terms of your
choice, if you do both of the following:

a) Accompany the combined library with a copy of the same work based
on the Library, uncombined with any other library facilities,
conveyed under the terms of this License.

b) Give prominent notice with the combined library that part of it
is a work based on the Library, and explaining where to find the
accompanying uncombined form of the same work.

6. Revised Versions of the GNU Lesser General Public License.

The Free Software Foundation may publish revised and/or new versions
of the GNU Lesser General Public License from time to time. Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns.

Each version is given a distinguishing version number. If the
Library as you received it specifies that a certain numbered version
of the GNU Lesser General Public License "or any later version"
applies to it, you have the option of following the terms and
conditions either of that published version or of any later version
published by the Free Software Foundation. If the Library as you
received it does not specify a version number of the GNU Lesser
General Public License, you may choose any version of the GNU Lesser
General Public License ever published by the Free Software Foundation.

If the Library as you received it specifies that a proxy can decide
whether future versions of the GNU Lesser General Public License shall
apply, that proxy's public statement of acceptance of any version is
permanent authorization for you to choose that version for the
Library.
91 changes: 82 additions & 9 deletions README.md
@@ -1,13 +1,86 @@
# iexec dapps samples
# Scrapto
# Idea Proposal
Scrapto is developing technology to provide industry-leading services for data extraction.
It aims to take full advantage of the decentralized cloud by using:
* iExec computing marketplace to perform data extraction tasks
* blockchain tech to enable per-request billing via microtransactions
## Why now
Three main components are vital in the design of our solution:
##### Micropayment model
The latest advances in cryptocurrency facilitate micropayments. Per-request billing is enabled by iExec RLC.
##### Computing marketplace
Without the iExec computing marketplace, we would have to implement a network of client/worker nodes specialized in scraping tasks ourselves. Most of the difficult work is already done by iExec:
* Creating software for the worker nodes
* Scheduling
* Proof of contribution
* and a lot of other things
##### Web scraping advances
Until recently, it was difficult to fully simulate a user session during scraping. The launch of Chrome headless changed that: it is a normal Chrome browser that can be controlled programmatically. This enables us to:
* bypass web-scraping detection mechanisms more easily
* provide users with advanced scraping scenarios
## Our solution
### Data Extraction Tech
There are several important components in web scraping. In a nutshell:
* Having multiple IPs. Requests coming from the same IP are easily detected and then blocked.
* Using screenscrapers (headless browsers). Harder to detect, as they run scripts, render HTML, and can behave like a real human browsing a site.

This is the registry for sample iexec dapps, used by the iexec-sdk cli.
**Professional web scraping services** use large networks of proxies and ever-changing IP addresses to get around limits and blocks.
We are looking into the iExec architecture to see whether we can use the IPs of individual worker nodes to make data extraction requests, and whether doing so is allowed.
This would mean we do not need to invest in proxy pools, and the traffic might seem more organic.

Each branch name of this repo can be used as an argument to iexec init command.
**Headless parsing**: The ultimate goal of a web scraper session is to be indistinguishable from a normal user session. A headless browser is a browser that can be used without a graphical interface. It can be controlled programmatically to automate tasks, such as running tests or taking screenshots of web pages. We are planning to use Chrome headless to run our data extraction tasks on web pages.
### Components
#### Visual data extraction tool
A powerful visual tool to create data extraction flows. It is available as a Chrome extension: extraction flow results can be previewed live, then exported and used as input to a Chrome headless instance in an iExec worker. The same data extraction code that runs in the extension also runs in the headless browser.
#### Scrapto Dashboard
The Dashboard is the frontend dAPP of our application. In the Dashboard users can:
* create new scraper projects
* input data extraction flows
* devise scraper schedules
#### iExec Worker App
The worker runs the data extraction flows configured in the visual helper tool. The same code that powers the extraction results preview in the extension is used here.
Depending on the configured job, the worker can run the flow in:
* jsdom: a JavaScript implementation of the WHATWG DOM and HTML standards, for use with Node.js
* Chrome headless: runs code in a Chrome browser without a GUI
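
The choice between the two environments could be sketched roughly as follows. This is only an illustration: the `provider` field, the job shape, and the stub runners are assumptions made for the example, not part of the Scrapto or iExec code.

```javascript
// Hypothetical provider registry. The provider names and the job shape are
// assumptions made for this sketch, not taken from the Scrapto code base.
const providers = new Map([
  // Stand-in for a jsdom-backed runner.
  ["jsdom", (flow) => ({ provider: "jsdom", steps: flow.length })],
  // Stand-in for a Chrome headless runner.
  ["chrome-headless", (flow) => ({ provider: "chrome-headless", steps: flow.length })],
]);

// Pick the runner named in the job, falling back to jsdom when none is given.
function runJob(job) {
  const name = job.provider || "jsdom";
  const run = providers.get(name);
  if (!run) {
    throw new Error(`Unknown provider: ${name}`);
  }
  return run(job.flow);
}

console.log(runJob({ provider: "chrome-headless", flow: ["open", "extract"] }));
```

Keeping the runners behind one lookup means a new headless provider can be added without touching the code that dispatches jobs.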

ex:
# Roadmap
## Current work
* We already have a working prototype of the visual data extraction tool:
* Runs as an extension in Chrome
* Runs the same data extraction flow in jsdom (Node.js) as in Chrome
* We did not start from scratch. We are modifying a fork of an open source extension.
## Version 0.1 [Done]
* Bootstrap iExec project
* Create a binary from the Node.js app using nexe
* Simple Dapp smart contract
## Version 0.2 (3 months)
* Make job scheduling work (1 month)
* Invoke Ethereum iExec task from back-end
* Research and implement jobs results persistence (1 month)
* Add new features to visual extraction tool (1 month)
* Enable passing options to JSDOM to toggle script execution inside pages based on job details
* Add new headless provider as an alternative to jsdom: Google Chrome headless
* Research how to package app with new provider
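
As a rough sketch of the JSDOM option toggle mentioned above: jsdom does not execute page scripts by default, and its `runScripts` and `resources` options turn script execution and subresource loading on. The `executeScripts` job field is a hypothetical name used only for this example.

```javascript
// Build the options object that would be passed to `new JSDOM(html, options)`.
// `job.executeScripts` is a hypothetical job field, assumed for this sketch.
function jsdomOptionsFor(job) {
  if (!job.executeScripts) {
    // jsdom's default behaviour: scripts in the page are not executed.
    return {};
  }
  return {
    runScripts: "dangerously", // execute scripts found in the page
    resources: "usable", // also load external scripts and other subresources
  };
}

console.log(jsdomOptionsFor({ executeScripts: true }));
console.log(jsdomOptionsFor({ executeScripts: false }));
```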
## Version 0.3 (1 month)
* Develop Scrapto Dashboard initial version

```bash
iexec init
iexec init factorial
iexec init echo
```
## Version 1.0 (2 months)
* Add notification streams when content changes. A change in a specific selector value will trigger this.
* Useful for:
* ecommerce price alignment and monitoring.
* competitor SEO monitoring
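
A minimal sketch of how such a change trigger might work, assuming each run produces a plain object that maps a CSS selector to its extracted value. The snapshot shape and the function name are assumptions for illustration.

```javascript
// Compare two snapshots of extracted values, keyed by CSS selector, and
// report every selector whose value differs between the two runs.
function detectChanges(previous, current) {
  const changes = [];
  for (const selector of Object.keys(current)) {
    if (previous[selector] !== current[selector]) {
      changes.push({
        selector,
        before: previous[selector],
        after: current[selector],
      });
    }
  }
  return changes;
}

// Example: a price monitored on an e-commerce page changes between two runs.
const lastRun = { ".price": "19.99", ".title": "Mug" };
const thisRun = { ".price": "17.49", ".title": "Mug" };
console.log(detectChanges(lastRun, thisRun));
```

A notification stream would then emit one event per entry in the returned list.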
## Version 2.0 (1 month)
* Add a plugin-like architecture and let third-party plugins run on extracted content
* various layers of functionality can be added to the project
* content diffing
* sentiment analysis
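
One possible shape for such a plugin layer is sketched below. The plugin interface (`name` plus `run`) and the two toy plugins are hypothetical stand-ins; real plugins such as content diffing or sentiment analysis would replace them.

```javascript
// Run every registered plugin on the extracted content and collect the
// results under each plugin's name.
function runPlugins(plugins, content) {
  const results = {};
  for (const plugin of plugins) {
    results[plugin.name] = plugin.run(content);
  }
  return results;
}

// Toy plugins, used here only to show the interface.
const wordCount = {
  name: "wordCount",
  run: (content) => content.text.split(/\s+/).filter(Boolean).length,
};
const charCount = {
  name: "charCount",
  run: (content) => content.text.length,
};

console.log(runPlugins([wordCount, charCount], { text: "price dropped today" }));
```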
# Component diagram
![component diagram](https://raw.githubusercontent.com/skunkworkscryptolab/iexec-dapp-samples/scrapto/img/scrapto-component.png "Component Diagram")
# Sequence diagram
![sequence diagram](https://raw.githubusercontent.com/skunkworkscryptolab/iexec-dapp-samples/scrapto/img/scrapto-sequence.png "Sequence Diagram")

# Resources
* Our visual tool and headless worker repo can be found here: https://github.com/skunkworkscryptolab/scrapto-client-worker.

# Copyright
Our visual tool extension is a fork of the amazing work done by: https://github.com/furstenheim/web-scraper-chrome-extension/
7 changes: 7 additions & 0 deletions apps/README.md
@@ -0,0 +1,7 @@
Your iExec Dapp is composed of two parts:

* under the ```apps``` directory :
Put the offchain app (any kind of legacy application). The offchain app will be executed by the iExec decentralized cloud.

* under the ```contracts``` directory :
A smart contract that interfaces with your iExec Dapp. It serves as a gateway from Ethereum to your offchain app.
Binary file added apps/Scrapto
Binary file not shown.