Text summarization for Ukrainian language

Relevant information:

No suitable dataset of Ukrainian-language news was available, so we parsed the articles ourselves from http://texty.org.ua/.

In the first approach, we perform our own lemmatization and tokenization of the dataset in text_processing.py.

The embeddings used in the second approach were downloaded from the lang-uk website; specifically, we used the 300d lowercase news Word2Vec model.

Extractive Summarization

Extractive methods summarize an article by selecting the parts of the original text that retain its most important points: the approach weights the important parts of the text and builds the summary from them. We assign a weight to each sentence and then rank the sentences by importance and by their similarity to each other. The algorithm goes as follows:

Input article → split into sentences → lemmatize → remove stop words → build a similarity matrix → generate rank based on matrix → pick top N sentences for summary
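
The pipeline above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from this repository: the stop-word list is a tiny hypothetical sample, `lemmatize` is a placeholder that returns the token unchanged, sentence similarity is a bag-of-words cosine, and the ranking step is assumed to be a PageRank-style power iteration (the pipeline only says "generate rank based on matrix"). All function names are made up for the example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); a real list would be much longer.
STOP_WORDS = {"і", "та", "в", "у", "на", "з", "що", "це", "до", "за"}


def split_sentences(text):
    """Split on sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def lemmatize(token):
    """Placeholder (assumption): a real pipeline would return the lemma."""
    return token


def preprocess(sentence):
    """Lowercase, tokenize, drop stop words, lemmatize, and count tokens."""
    tokens = re.findall(r"[^\W\d_]+(?:['’ʼ][^\W\d_]+)*", sentence.lower())
    return Counter(lemmatize(t) for t in tokens if t not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two bag-of-words counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(bags):
    """Pairwise sentence similarities with a zero diagonal."""
    n = len(bags)
    return [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
            for i in range(n)]


def rank(matrix, damping=0.85, iterations=50):
    """PageRank-style power iteration over the similarity graph."""
    n = len(matrix)
    row_sums = [sum(row) for row in matrix]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(matrix[j][i] / row_sums[j] * scores[j]
                            for j in range(n) if row_sums[j])
            for i in range(n)
        ]
    return scores


def summarize(text, top_n=2):
    """Return the top_n highest-ranked sentences in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    scores = rank(similarity_matrix([preprocess(s) for s in sentences]))
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]
```

Calling `summarize(article_text, top_n=3)` returns up to three sentences. Keeping the selected sentences in their original order is a design choice made here so that the summary reads naturally; a sentence that shares no vocabulary with the rest of the article receives only the baseline score and is unlikely to be picked.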

Deep Learning for Abstractive Summarization

Abstractive summarization, ideally, is closer to how a human would summarize a large document. In practice, it is very hard to implement. The goal of abstractive summarization is to generate sentences that describe the content of the document, often using words and phrases that do not appear in the original. In our case, we used a traditional sequence-to-sequence model with attention, customized for the text summarization task.

Seq2Seq model:
- the encoder is a multilayer RNN with LSTM
- the decoder is built using a Bahdanau Attention model
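
To make the attention step concrete, here is a minimal sketch of Bahdanau (additive) attention for a single decoder step, written in plain Python rather than a deep learning framework. It is not the model code from this repository: the function names, the parameter names `W_dec`, `W_enc`, and `v`, and the use of nested lists for matrices are all assumptions made for illustration. The score for each encoder state is `v · tanh(W_dec · s + W_enc · h_i)`, the scores are normalized with a softmax, and the context vector is the weighted sum of the encoder states.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(decoder_state, encoder_states, W_dec, W_enc, v):
    """Return (weights, context) for one decoder step.

    score_i = v . tanh(W_dec @ decoder_state + W_enc @ encoder_states[i])
    """
    projected_dec = matvec(W_dec, decoder_state)
    scores = []
    for h in encoder_states:
        projected_enc = matvec(W_enc, h)
        hidden = [math.tanh(d + e) for d, e in zip(projected_dec, projected_enc)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder states.
    size = len(encoder_states[0])
    context = [sum(w * h[k] for w, h in zip(weights, encoder_states))
               for k in range(size)]
    return weights, context
```

In a trained model, `W_dec`, `W_enc`, and `v` are learned parameters, and the context vector is combined with the decoder input at every output step so the decoder can focus on different parts of the article as it generates the summary.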

Example of extractive summarization from our code:
