Pre/beta #21

Open: wants to merge 22 commits into base: main

Commits (22):
8c841b3
Merge pull request #15 from ScrapeGraphAI/main
PeriniM Dec 5, 2024
8701eb2
feat: add localScraper functionality
DPende Dec 8, 2024
671161d
style: Improve formatting and style
DPende Dec 8, 2024
cca2d8c
Merge pull request #19 from ScrapeGraphAI/main
VinciGit00 Dec 10, 2024
89d30ff
Merge pull request #18 from ScrapeGraphAI/js-localScraper-implementation
VinciGit00 Dec 10, 2024
0b972c6
fix: minor fix version
VinciGit00 Dec 10, 2024
09257e0
fix: add revert
VinciGit00 Dec 10, 2024
24366b0
fix: python version
VinciGit00 Dec 10, 2024
d88a3ac
feat: revert to old release
VinciGit00 Dec 10, 2024
e719881
fix: .toml file
VinciGit00 Dec 10, 2024
2440f7f
fix: pyproject
VinciGit00 Dec 10, 2024
236d55b
ci(release): 1.9.0-beta.1 [skip ci]
semantic-release-bot Dec 10, 2024
77b67f6
fix: add new python compatibility
VinciGit00 Dec 10, 2024
59611f6
ci(release): 1.9.0-beta.2 [skip ci]
semantic-release-bot Dec 10, 2024
26d3a75
fix: come back to py 3.10
VinciGit00 Dec 10, 2024
a1bf542
Merge branch 'pre/beta' of https://github.com/ScrapeGraphAI/scrapegra…
VinciGit00 Dec 10, 2024
cbf2da4
ci(release): 1.9.0-beta.3 [skip ci]
semantic-release-bot Dec 10, 2024
62243f8
fix: improve api desc
PeriniM Jan 3, 2025
05d57ae
ci(release): 1.9.0-beta.4 [skip ci]
semantic-release-bot Jan 3, 2025
740933a
fix: updated hatchling version
PeriniM Jan 3, 2025
eb86328
Merge branch 'pre/beta' of https://github.com/ScrapeGraphAI/scrapegra…
PeriniM Jan 3, 2025
d03b9bf
ci(release): 1.9.0-beta.5 [skip ci]
semantic-release-bot Jan 3, 2025
4 changes: 0 additions & 4 deletions .github/workflows/python-publish.yml
@@ -6,15 +6,11 @@ name: Upload Python Package
on:
release:
types: [published]
- paths:
- - 'scrapegraph-py/**'

jobs:
deploy:

runs-on: ubuntu-latest
- # Only run if scrapegraph-py has changes
- if: contains(github.event.release.body, 'scrapegraph-py/')

steps:
- uses: actions/checkout@v4
4 changes: 0 additions & 4 deletions .github/workflows/release.yml
@@ -4,15 +4,11 @@ on:
branches:
- main
- pre/*
- paths:
- - 'scrapegraph-py/**'

jobs:
build:
name: Build
runs-on: ubuntu-latest
- # Only run if scrapegraph-py has changes
- if: contains(github.event.head_commit.modified, 'scrapegraph-py/') || contains(github.event.head_commit.added, 'scrapegraph-py/') || contains(github.event.head_commit.removed, 'scrapegraph-py/')
steps:
- name: Install git
run: |
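
Both removed `if:` conditions use the GitHub Actions `contains(search, item)` expression function. GitHub documents it as a case-insensitive substring test when `search` is a string, but as an element-equality test when `search` is an array. `github.event.release.body` is a string, so the `python-publish.yml` check was a substring search of the release notes. `head_commit.modified`, `added` and `removed` are arrays of file paths, so the `release.yml` check could only match an element equal to `scrapegraph-py/`, which a file path will not be. Separately, GitHub documents `paths` filters for `push` and `pull_request`-type triggers, not for `release`. The hypothetical `actionsContains` below is a minimal JavaScript emulation of those documented rules for illustration only (not GitHub's implementation); the sample paths and release text are made up.

```javascript
// Hypothetical emulation of the documented contains(search, item) rules.
function actionsContains(search, item) {
  // Expression functions cast to string and compare case-insensitively.
  const needle = String(item).toLowerCase();
  if (Array.isArray(search)) {
    // Array search: true only if some element equals item.
    return search.some((element) => String(element).toLowerCase() === needle);
  }
  // String search: true if item occurs as a substring.
  return String(search).toLowerCase().includes(needle);
}

// Illustrative value shaped like head_commit.modified: a list of file paths.
const modified = ['scrapegraph-py/pyproject.toml', 'scrapegraph-py/uv.lock'];
console.log(actionsContains(modified, 'scrapegraph-py/')); // false: no element equals the prefix

// Illustrative release body: free text, so this is a substring test.
const releaseBody = 'Changes in scrapegraph-py/ for 1.9.0-beta.5';
console.log(actionsContains(releaseBody, 'scrapegraph-py/')); // true
```

If a path-based condition is still wanted, joining the array first, e.g. `contains(join(github.event.head_commit.modified, ' '), 'scrapegraph-py/')`, would turn it into a substring test, although `head_commit` covers only the most recent commit of a push.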
35 changes: 33 additions & 2 deletions scrapegraph-js/README.md
@@ -35,6 +35,7 @@ yarn add scrapegraph-js

```javascript
import { smartScraper } from 'scrapegraph-js';
import 'dotenv/config';

// Initialize variables
const apiKey = process.env.SGAI_APIKEY; // Set your API key as an environment variable
@@ -105,12 +106,43 @@ const schema = z.object({
})();
```
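
The example above reads the key from `process.env.SGAI_APIKEY` after `import 'dotenv/config'`, which loads variables from a `.env` file into `process.env`. If the variable is missing, `apiKey` is simply `undefined` and the problem only surfaces later as an API error. A minimal fail-fast sketch; `requireEnv` is a hypothetical helper, not part of scrapegraph-js:

```javascript
// Hypothetical helper (not part of scrapegraph-js): return a required,
// non-empty environment variable (trimmed) or throw a descriptive error.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== 'string' || value.trim() === '') {
    throw new Error(`Missing environment variable ${name}; set it in your shell or .env file.`);
  }
  return value.trim();
}

// Usage, after `import 'dotenv/config'`:
// const apiKey = requireEnv('SGAI_APIKEY');
```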

### Scraping local HTML

Extract structured data from local HTML content.

```javascript
import { localScraper } from 'scrapegraph-js';

const apiKey = 'your_api_key';
const prompt = 'What does the company do?';

const websiteHtml = `<html>
  <body>
    <h1>Company Name</h1>
    <p>We are a technology company focused on AI solutions.</p>
    <div class="contact">
      <p>Email: [email protected]</p>
    </div>
  </body>
</html>`;

(async () => {
  try {
    const response = await localScraper(apiKey, websiteHtml, prompt);
    console.log(response);
  } catch (error) {
    console.error(error);
  }
})();
```
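
In practice the HTML often comes from a file saved on disk rather than an inline string. A minimal sketch, assuming Node.js running ES modules; `scrapeHtmlFile` is a hypothetical wrapper, not part of scrapegraph-js. It takes the scraper as a parameter (pass `localScraper`) and an injectable reader, so it can be exercised without network or disk access:

```javascript
import { readFile } from 'node:fs/promises';

// Hypothetical wrapper (not part of scrapegraph-js): read an HTML file and
// pass its contents to a scraper using localScraper's argument order
// (apiKey, websiteHtml, prompt, schema).
async function scrapeHtmlFile({
  scrape,
  apiKey,
  filePath,
  prompt,
  schema = null,
  // Injectable reader; defaults to reading the file as UTF-8 text.
  read = (path) => readFile(path, 'utf8'),
}) {
  const websiteHtml = await read(filePath);
  if (websiteHtml.trim() === '') {
    // Fail before spending an API call on an empty document.
    throw new Error(`No HTML content found in ${filePath}`);
  }
  return scrape(apiKey, websiteHtml, prompt, schema);
}
```

With the SDK it would be called as `scrapeHtmlFile({ scrape: localScraper, apiKey, filePath: './page.html', prompt })`, where `./page.html` is a placeholder path.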

### Markdownify

Converts a webpage into clean, well-structured markdown format.

```javascript
import { markdownify } from 'scrapegraph-js';

- const apiKey = "your_api_key";
+ const apiKey = 'your_api_key';
const url = 'https://scrapegraphai.com/';

(async () => {
@@ -123,7 +155,6 @@ const url = 'https://scrapegraphai.com/';
})();
```


### Checking API Credits

```javascript
33 changes: 33 additions & 0 deletions scrapegraph-js/examples/localScraper_example.js
@@ -0,0 +1,33 @@
import { localScraper, getLocalScraperRequest } from 'scrapegraph-js';
import 'dotenv/config';

// Uses top-level await, so this file must run as an ES module.

// localScraper function example
const apiKey = process.env.SGAI_APIKEY;
const prompt = 'What does the company do?';

const websiteHtml = `<html>
  <body>
    <h1>Company Name</h1>
    <p>We are a technology company focused on AI solutions.</p>
    <div class="contact">
      <p>Email: [email protected]</p>
    </div>
  </body>
</html>`;

try {
  const response = await localScraper(apiKey, websiteHtml, prompt);
  console.log(response);
} catch (error) {
  console.error(error);
}

// getLocalScraperRequest function example
// requestId identifies a previous localScraper request
const requestId = 'b8d97545-9ed3-441b-a01f-4b661b4f0b4c';

try {
  const response = await getLocalScraperRequest(apiKey, requestId);
  console.log(response);
} catch (error) {
  console.error(error);
}
28 changes: 28 additions & 0 deletions scrapegraph-js/examples/schema_localScraper_example.js
@@ -0,0 +1,28 @@
import { localScraper } from 'scrapegraph-js';
import { z } from 'zod';
import 'dotenv/config';

// localScraper example with a Zod output schema
const apiKey = process.env.SGAI_APIKEY;
const prompt = 'extract contact';

const websiteHtml = `<html>
  <body>
    <h1>Company Name</h1>
    <p>We are a technology company focused on AI solutions.</p>
    <div class="contact">
      <p>Email: [email protected]</p>
    </div>
  </body>
</html>`;

// Desired output shape; localScraper converts it to JSON Schema
const schema = z.object({
  contact: z.string().describe('email contact'),
});

try {
  const response = await localScraper(apiKey, websiteHtml, prompt, schema);
  console.log(response);
} catch (error) {
  console.error(error);
}
1 change: 1 addition & 0 deletions scrapegraph-js/index.js
@@ -1,4 +1,5 @@
export { smartScraper, getSmartScraperRequest } from './src/smartScraper.js';
export { markdownify, getMarkdownifyRequest } from './src/markdownify.js';
export { localScraper, getLocalScraperRequest } from './src/localScraper.js';
export { getCredits } from './src/credits.js';
export { sendFeedback } from './src/feedback.js';
66 changes: 66 additions & 0 deletions scrapegraph-js/src/localScraper.js
@@ -0,0 +1,66 @@
import axios from 'axios';
import handleError from './utils/handleError.js';
import { ZodType } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

/**
 * Extract structured data from local HTML content using ScrapeGraph AI.
 *
 * @param {string} apiKey - The API key for ScrapeGraph AI.
 * @param {string} websiteHtml - HTML content of the local web page to scrape, as a string.
 * @param {string} prompt - A natural language description of the data to extract.
 * @param {ZodType} [schema] - (Optional) Zod schema defining the structure of the desired output; it is converted to JSON Schema and sent as `output_schema`.
 * @returns {Promise<Object>} The parsed JSON response containing the extracted data, formatted to match the schema.
 * @throws {Error} If the schema is not a Zod schema or an HTTP error occurs.
 */
export async function localScraper(apiKey, websiteHtml, prompt, schema = null) {
  const endpoint = 'https://api.scrapegraphai.com/v1/localscraper';
  const headers = {
    'accept': 'application/json',
    'SGAI-APIKEY': apiKey,
    'Content-Type': 'application/json',
  };

  const payload = {
    website_html: websiteHtml,
    user_prompt: prompt,
  };

  if (schema) {
    if (schema instanceof ZodType) {
      payload.output_schema = zodToJsonSchema(schema);
    } else {
      throw new Error('The schema must be an instance of a valid Zod schema');
    }
  }

  try {
    const response = await axios.post(endpoint, payload, { headers });
    return response.data;
  } catch (error) {
    handleError(error);
  }
}

/**
 * Retrieve the status or result of a previous localScraper request.
 *
 * @param {string} apiKey - The API key for ScrapeGraph AI.
 * @param {string} requestId - The unique ID associated with the localScraper request.
 * @returns {Promise<Object>} The parsed JSON response with the status or result of the scraping request.
 * @throws {Error} If an error occurs while retrieving the request details.
 */
export async function getLocalScraperRequest(apiKey, requestId) {
  const endpoint = 'https://api.scrapegraphai.com/v1/localscraper/' + requestId;
  const headers = {
    'accept': 'application/json',
    'SGAI-APIKEY': apiKey,
  };

  try {
    const response = await axios.get(endpoint, { headers });
    return response.data;
  } catch (error) {
    handleError(error);
  }
}
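
`localScraper` and `getLocalScraperRequest` wrap two HTTP calls: a `POST` to `/v1/localscraper` with `website_html`, `user_prompt` and an optional `output_schema`, and a `GET` to `/v1/localscraper/{requestId}`, both authenticated with the `SGAI-APIKEY` header. The sketch below reproduces those requests with the built-in `fetch` (Node 18+) instead of axios, assuming the endpoints accept exactly the payload built above. The function names are hypothetical, `outputSchema` is assumed to already be a JSON Schema object (for example the output of `zodToJsonSchema`), and unlike the axios version the request ID is URL-encoded.

```javascript
const BASE_URL = 'https://api.scrapegraphai.com/v1/localscraper';

// Hypothetical fetch-based equivalent of localScraper. fetchImpl is
// injectable so requests can be inspected without network access.
async function localScraperFetch(apiKey, websiteHtml, prompt, outputSchema = null, fetchImpl = fetch) {
  const payload = { website_html: websiteHtml, user_prompt: prompt };
  if (outputSchema) {
    // Assumed to be a plain JSON Schema object, not a Zod schema.
    payload.output_schema = outputSchema;
  }
  const response = await fetchImpl(BASE_URL, {
    method: 'POST',
    headers: {
      accept: 'application/json',
      'SGAI-APIKEY': apiKey,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(payload),
  });
  return parseJsonResponse(response);
}

// Hypothetical fetch-based equivalent of getLocalScraperRequest.
async function getLocalScraperRequestFetch(apiKey, requestId, fetchImpl = fetch) {
  // Encode the ID so it cannot change the URL path.
  const url = `${BASE_URL}/${encodeURIComponent(requestId)}`;
  const response = await fetchImpl(url, {
    headers: { accept: 'application/json', 'SGAI-APIKEY': apiKey },
  });
  return parseJsonResponse(response);
}

async function parseJsonResponse(response) {
  if (!response.ok) {
    // fetch resolves on HTTP error statuses (axios rejects), so check explicitly.
    throw new Error(`ScrapeGraph AI request failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

Injecting `fetchImpl` keeps the sketch dependency-free and testable; in normal use the global `fetch` default applies.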
10 changes: 5 additions & 5 deletions scrapegraph-js/src/markdownify.js
@@ -9,7 +9,7 @@ import handleError from './utils/handleError.js';
* @returns {Promise<string>} A promise that resolves to the markdown representation of the webpage.
* @throws {Error} Throws an error if the HTTP request fails.
*/
- export async function markdownify(apiKey, url){
+ export async function markdownify(apiKey, url) {
const endpoint = 'https://api.scrapegraphai.com/v1/markdownify';
const headers = {
'accept': 'application/json',
@@ -24,7 +24,7 @@ export async function markdownify(apiKey, url){
const response = await axios.post(endpoint, payload, { headers });
return response.data;
} catch (error) {
- handleError(error)
+ handleError(error);
}
}

@@ -36,7 +36,7 @@ export async function markdownify(apiKey, url){
* @returns {Promise<string>} A promise that resolves with details about the status or outcome of the specified request.
* @throws {Error} Throws an error if the HTTP request fails.
*/
- export async function getMarkdownifyRequest(apiKey, requestId){
+ export async function getMarkdownifyRequest(apiKey, requestId) {
const endpoint = 'https://api.scrapegraphai.com/v1/markdownify/' + requestId;
const headers = {
'accept': 'application/json',
@@ -47,6 +47,6 @@ export async function getMarkdownifyRequest(apiKey, requestId){
const response = await axios.get(endpoint, { headers });
return response.data;
} catch (error) {
- handleError(error)
+ handleError(error);
}
- }
\ No newline at end of file
+ }
45 changes: 45 additions & 0 deletions scrapegraph-py/CHANGELOG.md
@@ -1,3 +1,48 @@
## [1.9.0-beta.5](https://github.com/ScrapeGraphAI/scrapegraph-sdk/compare/v1.9.0-beta.4...v1.9.0-beta.5) (2025-01-03)


### Bug Fixes

* updated hatchling version ([740933a](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/740933aff79a5873e6d1c633afcedb674d1f4cf0))

## [1.9.0-beta.4](https://github.com/ScrapeGraphAI/scrapegraph-sdk/compare/v1.9.0-beta.3...v1.9.0-beta.4) (2025-01-03)


### Bug Fixes

* improve api desc ([62243f8](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/62243f84384ae238c0bd0c48abc76a6b99376c74))

## [1.9.0-beta.3](https://github.com/ScrapeGraphAI/scrapegraph-sdk/compare/v1.9.0-beta.2...v1.9.0-beta.3) (2024-12-10)


### Bug Fixes

* come back to py 3.10 ([26d3a75](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/26d3a75ed973590e21d55c985bf71f3905a3ac0e))

## [1.9.0-beta.2](https://github.com/ScrapeGraphAI/scrapegraph-sdk/compare/v1.9.0-beta.1...v1.9.0-beta.2) (2024-12-10)


### Bug Fixes

* add new python compatibility ([77b67f6](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/77b67f646d75abd3a558b40cb31c52c12cc7182e))

## [1.9.0-beta.1](https://github.com/ScrapeGraphAI/scrapegraph-sdk/compare/v1.8.0...v1.9.0-beta.1) (2024-12-10)


### Features

* add localScraper functionality ([8701eb2](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/8701eb2ca7f108b922eb1617c850a58c0f88f8f9))
* revert to old release ([d88a3ac](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/d88a3ac6969a0abdf1f6b8eccde9ad8284d41d20))


### Bug Fixes

* .toml file ([e719881](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/e7198817d8dac802361ab84bc4d5d961fb926767))
* add revert ([09257e0](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/09257e08246d8aee96b3944ac14cc14b88e5f818))
* minor fix version ([0b972c6](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/0b972c69a9ea843d8ec89327f35c287b0d7a2bb4))
* pyproject ([2440f7f](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/2440f7f2a5179c6e3a86faf4eefa1d5edf7524c8))
* python version ([24366b0](https://github.com/ScrapeGraphAI/scrapegraph-sdk/commit/24366b08eefe0789da9a0ccafb8058e8744ee58b))

## [1.8.0](https://github.com/ScrapeGraphAI/scrapegraph-sdk/compare/v1.7.0...v1.8.0) (2024-12-08)


2 changes: 1 addition & 1 deletion scrapegraph-py/pyproject.toml
@@ -92,7 +92,7 @@ disallow_untyped_calls = true
ignore_missing_imports = true

[build-system]
- requires = ["hatchling"]
+ requires = ["hatchling==1.26.3"]
build-backend = "hatchling.build"

[tool.poe.tasks]
2 changes: 1 addition & 1 deletion scrapegraph-py/scrapegraph_py/utils/helpers.py
@@ -17,7 +17,7 @@ def validate_api_key(api_key: str) -> bool:
UUID(uuid_part)
except ValueError:
raise ValueError(
- "Invalid API key format. API key must be 'sgai-' followed by a valid UUID."
+ "Invalid API key format. API key must be 'sgai-' followed by a valid UUID. You can get one at https://dashboard.scrapegraphai.com/"
)
return True
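
The Python SDK's `validate_api_key` requires, per its error message, `sgai-` followed by a valid UUID. A client-side pre-check for the JavaScript SDK could look like the sketch below; `isValidApiKey` and `assertValidApiKey` are hypothetical and not part of scrapegraph-js. The regex accepts only the canonical hyphenated 8-4-4-4-12 hex form, so it is stricter than Python's `uuid.UUID()`, which also accepts variants such as 32 hex digits without hyphens or a value wrapped in braces.

```javascript
// Hypothetical check: 'sgai-' prefix followed by a canonical UUID
// (8-4-4-4-12 hex digits, either case).
const API_KEY_PATTERN =
  /^sgai-[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$/;

function isValidApiKey(apiKey) {
  return typeof apiKey === 'string' && API_KEY_PATTERN.test(apiKey);
}

function assertValidApiKey(apiKey) {
  if (!isValidApiKey(apiKey)) {
    // Same guidance as the Python SDK's error message.
    throw new Error(
      "Invalid API key format. API key must be 'sgai-' followed by a valid UUID. " +
        'You can get one at https://dashboard.scrapegraphai.com/'
    );
  }
  return apiKey;
}
```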

6 changes: 3 additions & 3 deletions scrapegraph-py/uv.lock

Generated file: diff not rendered by default.