Wrong main.py file inside pdf_reader folder. #1

Open
Ahmad-mufied opened this issue Jul 25, 2023 · 1 comment
@Ahmad-mufied

The main.py file inside the pdf_reader folder is the wrong one: it should contain the PDF reader code, but it contains pygame code instead.

carolinavc99 commented Nov 15, 2023

Here is the code. The only change I made was turning print('-' * 10) into a function called divide().

import re
from collections import Counter
from PyPDF2 import PdfReader


def extract_text_from_pdf(pdf_file: str) -> list[str]:
    with open(pdf_file, 'rb') as pdf:
        # Pass the opened binary handle to the reader instead of the path again
        reader = PdfReader(pdf, strict=False)

        print('Pages:', len(reader.pages))
        divide()

        return [page.extract_text() for page in reader.pages]


def divide():
    print('-' * 75)


def count_words(text_list: list[str]) -> Counter:
    all_words: list[str] = []
    for text in text_list:
        split_text: list[str] = re.split(r'\s+|[,;?!.-]\s*', text.lower())
        all_words += [word for word in split_text if word] # exclude empty string
    return Counter(all_words)


def main():
    extracted_text: list[str] = extract_text_from_pdf('sample.pdf')
    counter: Counter = count_words(text_list=extracted_text)

    for page in extracted_text:
        print(page)

    divide()

    for word, mentions in counter.most_common(5):
        print(f'{word:10} : {mentions} uses')


if __name__ == '__main__':
    main()
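
To see what the splitting pattern in count_words() does without needing a PDF, here is a minimal sketch that runs the same function on two made-up strings. The sample strings and the `pages` and `counter` names are only illustrative assumptions, not part of the original script. One thing it shows is that the hyphen in the character class splits hyphenated words into separate tokens, which may or may not be what you want.

```python
import re
from collections import Counter


def count_words(text_list: list[str]) -> Counter:
    """Count lower-cased words across a list of page strings."""
    all_words: list[str] = []
    for text in text_list:
        # Split on whitespace runs, or on one of , ; ? ! . - plus any trailing spaces
        split_text: list[str] = re.split(r'\s+|[,;?!.-]\s*', text.lower())
        # Trailing punctuation leaves an empty string behind, so filter those out
        all_words += [word for word in split_text if word]
    return Counter(all_words)


# Hypothetical stand-ins for the text of two extracted pages
pages = ['The cat sat. The dog sat, too!', 'A well-known cat.']
counter = count_words(pages)

print(counter['the'], counter['cat'], counter['sat'])
print('well-known' in counter, counter['well'], counter['known'])
```

If hyphenated words should stay together, dropping the `-` from the character class is one option, though that is a change to the posted script rather than something it does today.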
