Lab5 fixes #53

Open
wants to merge 2 commits into base: master

Deaponn (Contributor) commented Dec 27, 2024

The first commit fixes some typos.
The second commit fixes the code not using every paragraph in the train_data and dev_data datasets; details below.

Sometimes an article in train_data or dev_data has more than one paragraph.
In that case, data["paragraphs"][0] uses only the first one.
The code
set([len(article["paragraphs"]) for article in train_data]) outputs {1, 2}
for both train_data and dev_data.
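
To illustrate the check above, here is a minimal sketch on toy data. The SQuAD-like field names (`title`, `context`, `qas`) and the `toy_data` list are assumptions made for illustration; only the `paragraphs` key appears in this PR.

```python
# Toy stand-in for train_data; field names other than "paragraphs" are assumed.
toy_data = [
    {"title": "A", "paragraphs": [{"context": "a1", "qas": []}]},
    {"title": "B", "paragraphs": [{"context": "b1", "qas": []},
                                  {"context": "b2", "qas": []}]},
]

# Same check as in the description: distinct paragraph counts per article.
paragraph_counts = {len(article["paragraphs"]) for article in toy_data}

# Paragraphs reached when only index 0 is read vs. when all are read.
first_only = [article["paragraphs"][0] for article in toy_data]
every_paragraph = [p for article in toy_data for p in article["paragraphs"]]

# Number of paragraphs silently dropped by the [0] indexing.
skipped = len(every_paragraph) - len(first_only)
```

Any article whose paragraph count is above 1 contributes to `skipped`, which is the data the old code never saw.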

Fixed the code to use every paragraph in the data source. Changes:

In the cell right below the header "Ładowanie danych" ("Loading data"), line 1376:
Train data articles: 8553 -> 11624
Dev data articles: 1402 -> 1453
Train questions: 41577 -> 56618
Dev questions: 6809 -> 7060

In the cell calculating all_contexts, line 1415:
len(all_contexts): 9955 -> 13077

In the cell that converts the dataset to the Pytanie: Kontekst: ("Question: Context:") format, line 1466:
Total count in train/dev: 75605/12372 -> 102805/12824
Positive count in train/dev: 34028/5563 -> 46187/5764
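
A rough sketch of the idea behind the fix, not the notebook's actual code: iterate over every paragraph instead of indexing `[0]`. The helper names (`iter_paragraphs`, `collect_contexts`, `count_questions`) and the `context`/`qas` fields are hypothetical and assume a SQuAD-like layout. The reported numbers fit a context list built from both splits, since 11624 + 1453 = 13077.

```python
def iter_paragraphs(articles):
    """Yield every paragraph of every article, not just the first one."""
    for article in articles:
        yield from article["paragraphs"]


def collect_contexts(*datasets):
    """Gather the context text of each paragraph across all given splits."""
    return [p["context"] for ds in datasets for p in iter_paragraphs(ds)]


def count_questions(articles):
    """Count questions over all paragraphs (assumes a 'qas' list per paragraph)."""
    return sum(len(p["qas"]) for p in iter_paragraphs(articles))


# Toy splits standing in for train_data and dev_data.
toy_train = [
    {"paragraphs": [{"context": "t1", "qas": [{"question": "q1"}]},
                    {"context": "t2", "qas": [{"question": "q2"},
                                              {"question": "q3"}]}]},
]
toy_dev = [
    {"paragraphs": [{"context": "d1", "qas": [{"question": "q4"}]}]},
]

all_contexts = collect_contexts(toy_train, toy_dev)
train_questions = count_questions(toy_train)
```

Using a generator keeps the traversal in one place, so the context list and the question counts cannot drift apart the way they would if one cell still indexed `[0]`.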

Sometimes an article in `train_data` or `dev_data` has more than one paragraph.
In that case, `data["paragraphs"][0]` uses only the first one, so the second is ignored.
The code `set([len(article["paragraphs"]) for article in train_data])` outputs `{1, 2}` for both `train_data` and `dev_data`.

Fixed the code to use every paragraph in the data source.
Changes:

In the cell right below the header "Ładowanie danych" ("Loading data"):
Train data articles: 8553 -> 11624
Dev data articles: 1402 -> 1453
Train questions: 41577 -> 56618
Dev questions: 6809 -> 7060

In the cell calculating `all_contexts`:
`len(all_contexts)`: 9955 -> 13077

In the cell that converts the dataset to the Pytanie: Kontekst: ("Question: Context:") format:
Total count in train/dev: 75605/12372 -> 102805/12824
Positive count in train/dev: 34028/5563 -> 46187/5764
Deaponn changed the title from "Fix typos" to "Lab5 fixes" on Dec 30, 2024