Merge pull request #62 from microsoft/jamesqa
Updated Sample Data README and tiktoken version.
andrewldesousa authored Aug 30, 2024
2 parents b88d4f6 + 360b0d1 commit 0e25de3
Showing 2 changed files with 5 additions and 5 deletions.
2 changes: 1 addition & 1 deletion requirements-dev.txt
@@ -3,7 +3,7 @@ azure-ai-documentintelligence==1.0.0b2
Markdown==3.4.4
requests==2.32.3
tqdm==4.66.1
-tiktoken==0.4.0
+tiktoken
langchain==0.2.12
bs4==0.0.1
urllib3==2.2.2
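
With the `tiktoken` pin removed, pip installs the newest release compatible with the other requirements, so environments built at different times may end up with different versions. A minimal sketch for recording which version was actually resolved, using only the standard library; the helper name `installed_version` is hypothetical and not part of this repository:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Print the tiktoken version that pip resolved for this environment.
    print("tiktoken:", installed_version("tiktoken"))
```

Running this inside the activated virtual environment after `pip install -r requirements-dev.txt` shows the version in use, which can help when comparing results across machines.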
8 changes: 4 additions & 4 deletions scripts/SAMPLE_DATA.md
@@ -11,7 +11,7 @@
- Copy and paste the contents from the scripts/.env.sample file.
- Replace the values for `<AZURE_OPENAI_RESOURCE>` and `<AZURE_OPENAI_KEY>` with the name of the Azure OpenAI resource and either KEY 1 or KEY 2.
- Save the .env file.
-- Within the scripts folder, create a config file `config.json`. The format will be a list of JSON objects, with each object specifying a configuration of local data path and target search service and index. Assuming you used "Deploy to Azure" to deploy this solution accelerator, these values can be found within the resources themselves. Copy and paste the following script block into the config.json file and update accordingly.
+- Within the scripts folder, create a config file `config.json`. The format will be a list of JSON objects, with each object specifying a configuration of local data path and target search service and index. Assuming you used "Deploy to Azure" to deploy this solution accelerator, these values can be found within the resources themselves. If you did not change the Search Index name, the default value is: promissory-notes-index. Copy and paste the following script block into the config.json file and update accordingly.

```
[
@@ -21,7 +21,7 @@
"subscription_id": "<subscription id>",
"resource_group": "<resource group name>",
"search_service_name": "<search service name to use>",
-"index_name": "promissory-notes-index",
+"index_name": "<search index name to use>",
"chunk_size": 1024,
"token_overlap": 128,
"semantic_config_name": "default",
@@ -36,8 +36,8 @@
- Create a virtual environment for the sample data preparation
- Open a terminal window.
- Create the virtual environment: `python -m venv scriptsenv`
-- Activate the virtual environment: `.\scriptsenv\bin\activate`
-- Install the necessary packages listed in scripts/requirements-dev.txt, e.g. `pip install --user -r requirements-dev.txt`
+- Activate the virtual environment: `.\scriptsenv\Scripts\activate`
+- Install the necessary packages listed in scripts/requirements-dev.txt, e.g. `pip install -r requirements-dev.txt`
- Create the index and ingest PDF data with Form Recognizer
- Replace `<form-rec-resource-name>` with the name of the existing or recently created Azure Document Intelligence (Form Recognizer) resource and replace `<form-rec-key>` with key 1 or key 2 of the existing or recently created Azure Document Intelligence (Form Recognizer) resource:
`python data_preparation.py --config config.json --njobs=1 --form-rec-resource <form-rec-resource-name> --form-rec-key <form-rec-key>`
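
The `chunk_size` and `token_overlap` values in the config above suggest a sliding window over tokens. The sketch below illustrates that idea on a plain list of token ids. It assumes both values are counted in tokens and is not taken from `data_preparation.py`, which this diff does not show; the function name `chunk_with_overlap` is hypothetical.

```python
from typing import List, Sequence


def chunk_with_overlap(
    tokens: Sequence[int],
    chunk_size: int = 1024,
    token_overlap: int = 128,
) -> List[List[int]]:
    """Split `tokens` into windows of at most `chunk_size` tokens.

    Each window after the first starts `token_overlap` tokens before the
    end of the previous one, so neighbouring chunks share that many tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= token_overlap < chunk_size:
        raise ValueError("token_overlap must satisfy 0 <= token_overlap < chunk_size")

    step = chunk_size - token_overlap  # distance between window starts
    chunks: List[List[int]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    # Assumption: integers stand in for token ids from a real tokenizer.
    demo = chunk_with_overlap(list(range(2000)))
    print([len(c) for c in demo])
```

With the defaults, consecutive windows start 896 tokens apart, so the last 128 tokens of one chunk reappear at the start of the next.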