Commit

Merge pull request #87 from GeorgeKontsevik/dev
Dev
Sandrro authored Jul 29, 2024
2 parents 6e41e0d + 1e34ba3 commit 03ac0d5
Showing 179 changed files with 28,821 additions and 37,575 deletions.
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -139,4 +139,4 @@ cache/
# idea
.idea/
.graphml
.geojson
.geojson
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
[![Documentation Status](https://readthedocs.org/projects/soika/badge/?version=latest)](https://soika.readthedocs.io/en/latest/?badge=latest)
[![PythonVersion](https://img.shields.io/badge/python-3.11-blue)](https://pypi.org/project/scikit-learn/)
[![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Tests](https://github.com/GeorgeKontsevik/sloyka/actions/workflows/2dev_ci_on_pr.yaml/badge.svg?branch=dev)](https://github.com/GeorgeKontsevik/sloyka/actions/workflows/2dev_ci_on_pr.yaml)
[![Tests](https://github.com/GeorgeKontsevik/sloyka/.github/workflows/2dev_ci_on_pr.yaml/badge.svg?branch=dev)](https://github.com/GeorgeKontsevik/sloyka/.github/workflows/2dev_ci_on_pr.yaml)

[![sloyka_community_chat](https://img.shields.io/badge/-community-blue?logo=telegram)](https://t.me/sloyka_community)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wCUJAqlq9GMKw1wpTsWrzYwr10pWDeHv?usp=sharing)
Expand Down Expand Up @@ -38,5 +38,5 @@ [email protected] (Александр Антонов, Project Lead)
[email protected] (just in case).

## Citation

Antonov, A., Gornova, G., Kontsevik, G., Turkov, L., Vorona, V., & Mityagin, S. (2024, July). Transformation of Local Communities from Neighborhoods to Urban Commons in the Production of Social Representations of Space. In International Conference on Computational Science and Its Applications (pp. 436-447). Cham: Springer Nature Switzerland.
---
Binary file added docs/build/doctrees/404.doctree
Binary file not shown.
Binary file added docs/build/doctrees/about/installation.doctree
Binary file not shown.
Binary file added docs/build/doctrees/about/introduction.doctree
Binary file not shown.
Binary file added docs/build/doctrees/about/pipeline.doctree
Binary file not shown.
Binary file added docs/build/doctrees/environment.pickle
Binary file not shown.
Binary file added docs/build/doctrees/index.doctree
Binary file not shown.
Binary file added docs/build/doctrees/modules/GeoDataGetter.doctree
Binary file not shown.
Binary file added docs/build/doctrees/modules/Streets.doctree
Binary file not shown.
Binary file added docs/build/doctrees/modules/VKParser.doctree
Binary file not shown.
Binary file added docs/build/doctrees/modules/data_getter.doctree
Binary file not shown.
Binary file added docs/build/doctrees/modules/geocoder.doctree
Binary file not shown.
4 changes: 4 additions & 0 deletions docs/build/html/.buildinfo
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 34fe29b96d534ed85aad7b7243c9abd1
tags: 645f666f9bcd5a90fca523b33c5a78b7
349 changes: 349 additions & 0 deletions docs/build/html/404.html

Large diffs are not rendered by default.

Binary file added docs/build/html/_images/etap.png
Binary file added docs/build/html/_images/sloyka_map.png
335 changes: 335 additions & 0 deletions docs/build/html/_modules/index.html

Large diffs are not rendered by default.

927 changes: 927 additions & 0 deletions docs/build/html/_modules/sloyka/src/geocoder/geocoder.html

Large diffs are not rendered by default.

829 changes: 829 additions & 0 deletions docs/build/html/_modules/sloyka/src/geocoder/street_extractor.html

Large diffs are not rendered by default.

439 changes: 439 additions & 0 deletions docs/build/html/_modules/sloyka/src/risks/emotion_classifier.html

Large diffs are not rendered by default.

824 changes: 824 additions & 0 deletions docs/build/html/_modules/sloyka/src/risks/event_detector.html

Large diffs are not rendered by default.

400 changes: 400 additions & 0 deletions docs/build/html/_modules/sloyka/src/risks/text_classifier.html

Large diffs are not rendered by default.

402 changes: 402 additions & 0 deletions docs/build/html/_modules/sloyka/src/visual/graph_visualization.html

Large diffs are not rendered by default.

12 changes: 12 additions & 0 deletions docs/build/html/_sources/404.rst.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
Page Not Found
--------------

.. raw:: html

<script id="404-page-script">
const isPage404 = true
</script>

Sorry, we couldn't find that page.

Try using the search box or go to the homepage.
5 changes: 5 additions & 0 deletions docs/build/html/_sources/about/includes/pipeline.rst.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
Main pipeline
=============


You can get more info about each step in:
8 changes: 8 additions & 0 deletions docs/build/html/_sources/about/installation.rst.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
Installation
============

To install Sloyka, run:

.. code-block:: bash

    pip install sloyka
26 changes: 26 additions & 0 deletions docs/build/html/_sources/about/introduction.rst.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
Introduction
============
Sloyka documentation
Date: June, 2024 Version: 0.6
SLOYKA is a library for enriching digital city models with data extracted from the texts of citizens' digital footprints and for modeling vernacular assessments of urban environment quality.

Its main element is a constructible spatial semantic hypergraph, augmented by machine recognition of urban entities and locations.

SLOYKA's final result is a spatial semantic hypergraph, which is generated in two main stages: data receiving
(collecting social network messages that mention particular city objects) and data tagging, which adds new columns to the resulting GeoDataFrame.
The resulting hypergraph can be used to predict events within existing urban objects (module :ref:`regional_activity`)
or to visualize existing nodes and links for further interpretation (module :ref:`graph_visualization`).

SLOYKA also provides methods for modeling social risks regarding the emotional evaluation of mentioned places.

Main features
-------------
* Social media parsing: getting posts, comments and replies
* City services and places extraction
* Emotion and text classification
* City topic modelling
* Spatial-semantic graph building
* Regional activity evaluation

SLOYKA's Community chat:
https://t.me/sloyka_community
25 changes: 25 additions & 0 deletions docs/build/html/_sources/about/pipeline.rst.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
Main pipeline
=============

Given a bounded urbanized area and a list of online communities in a social network,
you can run the resulting dataset through all major library functions. In some cases, however,
the order in which the functions run matters.

.. figure:: /image/etap.png
:align: center
:alt: photo

SLOYKA's sections

The main sections are:

* Data receiving: :ref:`data_getter` and :ref:`geocoder`. This step can be skipped only if you already have geolocated
  text data mentioning urban sites; otherwise it is required.

* Data tagging: characterization of messages and urban objects, which can be carried out in any order: :ref:`emotion_classifier`, :ref:`text_classifier`, :ref:`city_services`, :ref:`topic_modeler`

* Data modelling: further synthesis of the obtained data, risk assessment and forecasting.
  Each method in this group requires certain labeling columns: :ref:`sem_graph`, :ref:`regional_activity`

* Data visualization: the last step, applied to the already generated semantic graph: :ref:`graph_visualization`
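
The ordering constraints in the list above can be sketched as a small dependency check. The stage names reuse the reference labels from this page, but the ``REQUIRES`` mapping and the ``valid_order`` helper are hypothetical illustrations and not part of the Sloyka API; the exact prerequisites of each stage are assumptions based on the descriptions in these docs.

```python
# Hypothetical sketch: stage names mirror the labels on this page,
# but REQUIRES and valid_order() are illustrative, not Sloyka's API.
REQUIRES = {
    "data_getter": set(),
    "geocoder": {"data_getter"},
    # Assumption: tagging steps only need geolocated messages,
    # so they may run in any order relative to each other.
    "emotion_classifier": {"geocoder"},
    "text_classifier": {"geocoder"},
    "city_services": {"geocoder"},
    "topic_modeler": {"geocoder"},
    # Assumption: modelling steps need the columns produced earlier.
    "sem_graph": {"geocoder"},
    "regional_activity": {
        "geocoder",
        "emotion_classifier",
        "text_classifier",
        "city_services",
    },
    "graph_visualization": {"sem_graph"},
}


def valid_order(stages):
    """Return True if every stage runs after all of its prerequisites."""
    done = set()
    for stage in stages:
        if not REQUIRES[stage] <= done:
            return False
        done.add(stage)
    return True
```

For example, ``valid_order(["data_getter", "geocoder", "sem_graph", "graph_visualization"])`` passes the check, while running ``regional_activity`` straight after ``geocoder`` does not, because the tagging columns are missing.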

You can get more info about each step in!
61 changes: 61 additions & 0 deletions docs/build/html/_sources/index.rst.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
Welcome to Sloyka's documentation!
==================================

Sloyka is a library for analyzing city identity using social media data.

.. toctree::
:maxdepth: 1
:caption: General:

about/introduction
about/installation
about/pipeline

.. figure:: /image/sloyka_map.png
:align: center
:alt: photo

SLOYKA'S ROADMAP

.. toctree::
:maxdepth: 1
:caption: Receiving:
:hidden:

modules/data_getter
modules/geocoder

.. toctree::
:maxdepth: 1
:caption: Tagging:
:hidden:

modules/city_services_extract
modules/emotion_classifier
modules/text_classifier
modules/topic_modeler

.. toctree::
:maxdepth: 1
:caption: Modelling:
:hidden:

modules/semantic_graph
modules/regional_activity
modules/event_dynamic_prediction

.. toctree::
:maxdepth: 1
:caption: Visualization:
:hidden:

modules/visualize_graph
404


Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
9 changes: 9 additions & 0 deletions docs/build/html/_sources/modules/GeoDataGetter.rst.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
.. _GeoDataGetter:

GeoDataGetter
==========================
This class is used to retrieve geospatial data from OpenStreetMap (OSM) based on a given OSM ID and tags.

.. autoclass:: sloyka.src.utils.data_getter.GeoDataGetter
:members:
:undoc-members:
9 changes: 9 additions & 0 deletions docs/build/html/_sources/modules/Geocoder_special.rst.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
.. _Geocoder_special:

Geocoder
==================
.. autoclass:: sloyka.src.geocoder.geocoder.Geocoder
:members:
:undoc-members:
:exclude-members: run

Back to all :ref:`geocoder`
15 changes: 15 additions & 0 deletions docs/build/html/_sources/modules/OtherGeoObjects.rst.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
.. _geo_objects:

OtherGeoObjects
==================
.. currentmodule:: sloyka.src.geocoder

.. autoclass:: city_objects_extractor.OtherGeoObjects
:members:
:undoc-members:

.. automethod:: sloyka.src.geocoder.city_objects_extractor.OtherGeoObjects.get_and_process_osm_data
.. automethod:: sloyka.src.geocoder.city_objects_extractor.OtherGeoObjects.run_osm_dfs
.. automethod:: sloyka.src.geocoder.city_objects_extractor.OtherGeoObjects.calculate_centroid
.. automethod:: sloyka.src.geocoder.city_objects_extractor.OtherGeoObjects.extract_geo_obj
.. automethod:: sloyka.src.geocoder.city_objects_extractor.OtherGeoObjects.restoration_of_normal_form
8 changes: 8 additions & 0 deletions docs/build/html/_sources/modules/StreetExtractor.rst.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
.. _street_extractor:

StreetExtractor
==================

.. autoclass:: sloyka.src.geocoder.street_extractor.StreetExtractor
:members:
:undoc-members: process_pipeline
8 changes: 8 additions & 0 deletions docs/build/html/_sources/modules/Streets.rst.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
.. _Streets:

Streets
==========================
A class for working with street data.

.. autoclass:: sloyka.src.utils.data_getter.Streets
:members:
:undoc-members:
7 changes: 7 additions & 0 deletions docs/build/html/_sources/modules/VKParser.rst.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
.. _vkparser:

VKParser
==========================
.. autoclass:: sloyka.src.utils.data_getter.VKParser
:members:
:undoc-members:
11 changes: 11 additions & 0 deletions docs/build/html/_sources/modules/city_services_extract.rst.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
.. _city_services:

Services extraction
==========================
The City_services class extracts city service names from text using a string comparison algorithm that accounts for
changing word endings. Using the flair library, the City_services.run() method extracts named entities from each message's
Sentence object as a list, determines the most probable service type, and stores both in new columns of the original DataFrame.

.. automodule:: sloyka.src.utils.data_processing.city_services_extract
:members:
:undoc-members:
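
The idea of matching service names despite changing word endings can be illustrated with a minimal sketch. It uses ``difflib`` from the Python standard library and compares only the leading part of each word; this is a stand-in technique, not the algorithm implemented in City_services, and the ``SERVICES`` list, the ``find_services`` helper and the threshold value are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical service list; the real module ships its own vocabulary.
SERVICES = ["school", "pharmacy", "library"]


def find_services(text, threshold=0.8):
    """Return services whose name fuzzily matches a word in the text."""
    found = []
    for word in text.lower().split():
        word = word.strip(".,!?;:")
        for service in SERVICES:
            # Compare only the leading part so a changed ending still matches.
            prefix = word[: len(service)]
            ratio = SequenceMatcher(None, prefix, service).ratio()
            if ratio >= threshold and service not in found:
                found.append(service)
    return found
```

With this sketch, "pharmacies" and "schools" are both recognized even though their endings differ from the dictionary forms. English words are used here only for readability.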

27 changes: 27 additions & 0 deletions docs/build/html/_sources/modules/data_getter.rst.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
.. _data_getter:

Data getter
==========================

This module contains classes for retrieving and working with various types of data.
Sloyka uses the :ref:`vkparser` class to collect data from the social network VK (VKontakte) and the
:ref:`GeoDataGetter` class to retrieve data from OpenStreetMap (OSM).

* :ref:`GeoDataGetter`: retrieves geospatial data from OpenStreetMap (OSM) based on a given OSM ID and tags.
* :ref:`vkparser`: parses and works with VK comments and posts, and combines posts and comments into one dataframe.
* :ref:`Streets`: a class for working with street data.


more:
-------------------------------------
.. toctree::
:maxdepth: 1
:caption: Advanced geocoding

VKParser
Streets
GeoDataGetter
17 changes: 17 additions & 0 deletions docs/build/html/_sources/modules/emotion_classifier.rst.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
.. _emotion_classifier:

Emotion classifier
==================

.. automodule:: sloyka.src.risks.emotion_classifier
:members:
:undoc-members:


Example
-------
.. code-block:: python

    import pandas as pd
    from sloyka.src.risks.emotion_classifier import EmotionRecognizer

    df = pd.read_csv('data.csv')
    recognizer = EmotionRecognizer()
    df['emotion'] = df['text'].apply(recognizer.recognize_emotion)
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
.. _modules:

Event detector
==================

.. automodule:: sloyka.src.risks.event_detector
:members:
:undoc-members:

31 changes: 31 additions & 0 deletions docs/build/html/_sources/modules/geocoder.rst.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
.. _geocoder:

==================
Geocoding
==================
For the complete reference of the Geocoder class, see
:ref:`Geocoder_special`.

Geocoder
-------------------
.. autoclass:: sloyka.src.geocoder.geocoder.Geocoder
:members: run

OtherGeoObjects
---------------------
.. autoclass:: sloyka.src.geocoder.city_objects_extractor.OtherGeoObjects
:members: run

StreetExtractor
---------------------
.. autoclass:: sloyka.src.geocoder.street_extractor.StreetExtractor
:members: process_pipeline

more:
-------------------------------------
.. toctree::
:maxdepth: 1
:caption: Advanced geocoding

Geocoder_special
OtherGeoObjects
StreetExtractor
9 changes: 9 additions & 0 deletions docs/build/html/_sources/modules/regional_activity.rst.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
.. _regional_activity:

Regional activity
==================
The regional_activity module aggregates data by region and provides information about user activity.
The RegionalActivity class creates a GeoDataFrame with basic information about user activity, using other modules such as geocoder,
text_classifier, city_services_extract and emotion_classifier to process the data. The processed data is stored in the class attribute
processed_geodata and can be accessed after the class is initialized as RegionalActivity.processed_geodata. The class also includes the get_risks()
method, which returns a DataFrame with social risk information based on the provided texts.
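
As a rough illustration of aggregating tagged messages by region, the sketch below groups rows and computes the share of negative messages per region. It does not use RegionalActivity itself; the ``messages`` rows, the column names and the ``activity_by_region`` helper are hypothetical and only stand in for the kind of GeoDataFrame the class produces.

```python
from collections import defaultdict

# Hypothetical rows: the column names are illustrative, not Sloyka's schema.
messages = [
    {"region": "Center", "emotion": "negative"},
    {"region": "Center", "emotion": "positive"},
    {"region": "North", "emotion": "negative"},
]


def activity_by_region(rows):
    """Count messages and the share of negative ones for each region."""
    stats = defaultdict(lambda: {"messages": 0, "negative": 0})
    for row in rows:
        entry = stats[row["region"]]
        entry["messages"] += 1
        entry["negative"] += row["emotion"] == "negative"
    return {
        region: {
            "messages": s["messages"],
            "negative_share": s["negative"] / s["messages"],
        }
        for region, s in stats.items()
    }
```

A per-region summary of this kind is one plausible input for a risk estimate, although how get_risks() actually scores risk is not described here.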
13 changes: 13 additions & 0 deletions docs/build/html/_sources/modules/semantic_graph.rst.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
.. _sem_graph:

Semantic graph
-------------------------------------------------

.. automodule:: sloyka.src.semantic_graph.semantic_graph_builder
:members:
:undoc-members:

The main method, Semgraph.build_graph(), cleans the input set of messages of duplicates, digits, identified place names
and references. For each message, a given number of keywords is extracted using a KeyBERT model; using PyTorch,
the semantic proximity between keywords is computed as the cosine distance between the resulting embeddings. The final result of the module is a graph
whose nodes are toponyms (obtained by the geolocation module) and keywords.
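
The proximity step can be sketched in plain Python. The example below computes cosine similarity (one minus the cosine distance) between toy vectors and links keywords whose similarity reaches a threshold. The vectors, the threshold and the ``build_edges`` helper are assumptions made for illustration; the module itself works with KeyBERT embeddings and PyTorch tensors.

```python
import math
from itertools import combinations

# Toy 3-dimensional vectors standing in for real keyword embeddings.
embeddings = {
    "park": [1.0, 0.0, 0.0],
    "garden": [0.9, 0.1, 0.0],
    "traffic": [0.0, 1.0, 0.0],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def build_edges(vectors, threshold=0.8):
    """Link every pair of keywords whose similarity reaches the threshold."""
    return [
        (a, b)
        for a, b in combinations(vectors, 2)
        if cosine_similarity(vectors[a], vectors[b]) >= threshold
    ]
```

Here only "park" and "garden" end up connected, since their vectors point in nearly the same direction, while "traffic" is orthogonal to "park".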
14 changes: 14 additions & 0 deletions docs/build/html/_sources/modules/text_classifier.rst.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
.. _text_classifier:

Text Classifier
==========================
The text_classifier module classifies texts by city functions, such as housing and utilities, public amenities, transportation,
health care, and others. It uses a pre-trained Russian-language model from the BERT family, rubert-tiny2,
trained on 90,000 labeled samples. The main method, run_text_classifier(), takes text as input, calls the model, and returns up to three predicted
city functions, each with its predicted probability.


.. automodule:: sloyka.src.risks.text_classifier
:members:
:undoc-members:
:show-inheritance:
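
Returning up to three city functions with probabilities amounts to a softmax followed by a top-k selection. The sketch below shows that generic technique on made-up scores; it does not call the rubert-tiny2 model, and the ``top_predictions`` helper and the example logits are hypothetical.

```python
import math


def top_predictions(logits, labels, k=3):
    """Apply softmax to raw scores and return the k most probable labels."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


labels = ["housing and utilities", "public amenities", "transportation", "health care"]
predictions = top_predictions([2.0, 0.5, 1.0, -1.0], labels)
```

Each item in ``predictions`` is a ``(label, probability)`` pair, ordered from most to least probable.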