Commit
Dev
@@ -139,4 +139,4 @@ cache/
# idea
.idea/
.graphml
-.geojson
+.geojson
@@ -3,7 +3,7 @@
[![Documentation Status](https://readthedocs.org/projects/soika/badge/?version=latest)](https://soika.readthedocs.io/en/latest/?badge=latest)
[![PythonVersion](https://img.shields.io/badge/python-3.11-blue)](https://pypi.org/project/scikit-learn/)
[![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
-[![Tests](https://github.com/GeorgeKontsevik/sloyka/actions/workflows/2dev_ci_on_pr.yaml/badge.svg?branch=dev)](https://github.com/GeorgeKontsevik/sloyka/actions/workflows/2dev_ci_on_pr.yaml)
+[![Tests](https://github.com/GeorgeKontsevik/sloyka/.github/workflows/2dev_ci_on_pr.yaml/badge.svg?branch=dev)](https://github.com/GeorgeKontsevik/sloyka/.github/workflows/2dev_ci_on_pr.yaml)

[![sloyka_community_chat](https://img.shields.io/badge/-community-blue?logo=telegram)](https://t.me/sloyka_community)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wCUJAqlq9GMKw1wpTsWrzYwr10pWDeHv?usp=sharing)
@@ -38,5 +38,5 @@ [email protected] (Александр Антонов, Project Lead)
[email protected] (just in case).

## Citation

Antonov, A., Gornova, G., Kontsevik, G., Turkov, L., Vorona, V., & Mityagin, S. (2024, July). Transformation of Local Communities from Neighborhoods to Urban Commons in the Production of Social Representations of Space. In International Conference on Computational Science and Its Applications (pp. 436-447). Cham: Springer Nature Switzerland.
@@ -0,0 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 34fe29b96d534ed85aad7b7243c9abd1
tags: 645f666f9bcd5a90fca523b33c5a78b7
@@ -0,0 +1,12 @@
Page Not Found
--------------

.. raw:: html

   <script id="404-page-script">
   const isPage404 = true
   </script>

Sorry, we couldn't find that page.

Try using the search box or go to the homepage.
@@ -0,0 +1,5 @@
Main pipeline
=============


You can get more info about each step in:
@@ -0,0 +1,8 @@
Installation
============

To install Sloyka, run:

.. code-block:: bash

   pip install sloyka
@@ -0,0 +1,26 @@
Introduction
============
Sloyka documentation
Date: June, 2024 Version: 0.6
SLOYKA is a library for enriching digital city models with data extracted from the texts that make up citizens' digital footprints, and for modeling the vernacular assessment of urban environment quality.

Its core element is a spatial semantic hypergraph that is constructed from the data and augmented by machine recognition of urban entities and locations.

SLOYKA's final result is this spatial semantic hypergraph, which is generated in two main stages: data receiving
(collecting social network messages that mention particular city objects) and data tagging (labeling the collected data to add new columns to the resulting GeoDataFrame).
The resulting hypergraph can be used to predict events within existing urban objects (module :ref:`regional_activity`)
or to visualize the existing nodes and links for further interpretation (module :ref:`graph_visualization`).

SLOYKA also provides methods for modeling social risks based on the emotional evaluation of mentioned places.

Main features
-------------
* Social media parsing: getting posts, comments and replies
* Extraction of city services and places
* Emotion and text classification
* Topic modelling of city-related texts
* Spatial-semantic graph building
* Regional activity evaluation

SLOYKA's community chat:
https://t.me/sloyka_community
@@ -0,0 +1,25 @@
Main pipeline
=============

Once a bounded urbanized area and a list of online communities in a social network have been selected,
the resulting dataset can be run through all of the library's major functions. In some cases, however,
the order in which they run matters.

.. figure:: /image/etap.png
   :align: center
   :alt: photo

   SLOYKA's sections

The main sections are:

* Data receiving: this step can be skipped only if you already have geolocated
  text data mentioning urban sites; otherwise :ref:`data_getter` and :ref:`geocoder` are essential.

* Data tagging: characterization of messages and urban objects, which can be carried out in any order: :ref:`emotion_classifier`, :ref:`text_classifier`, :ref:`city_services`, :ref:`topic_modeler`.

* Data modelling: further synthesis of the obtained data, risk assessment and forecasting.
  Each method in this group requires certain labeling columns: :ref:`sem_graph`, :ref:`regional_activity`.

* Data visualization: the last step, applied to the already generated semantic graph: :ref:`graph_visualization`.

More details on each step are available on the module pages referenced above.
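The ordering rules in the list above can be expressed as a small dependency graph. This is a minimal sketch using Python's standard graphlib module; the stage names are illustrative labels borrowed from the reference targets, not Sloyka functions. The dependencies of the modelling steps are assumptions, except that the regional activity edges follow the Regional activity page.

```python
from graphlib import TopologicalSorter

# Illustrative stage names only (not Sloyka APIs).
# Each stage maps to the set of stages that must finish before it runs.
dependencies = {
    # Data receiving
    "data_getter": set(),
    "geocoder": {"data_getter"},
    # Data tagging runs after receiving; the taggers do not depend on
    # each other, so they may run in any relative order.
    "emotion_classifier": {"geocoder"},
    "text_classifier": {"geocoder"},
    "city_services": {"geocoder"},
    "topic_modeler": {"geocoder"},
    # Data modelling (assumed dependencies)
    "sem_graph": {"geocoder"},
    "regional_activity": {"geocoder", "text_classifier", "city_services", "emotion_classifier"},
    # Data visualization
    "graph_visualization": {"sem_graph"},
}

# static_order() yields every stage after all of its prerequisites.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Any order returned by static_order() respects these constraints; the four tagging steps may appear in any relative order.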
@@ -0,0 +1,61 @@
Welcome to Sloyka's documentation!
==================================

Sloyka is a library for analyzing city identity using social media data.

.. toctree::
   :maxdepth: 1
   :caption: General:

   about/introduction
   about/installation
   about/pipeline

.. figure:: /image/sloyka_map.png
   :align: center
   :alt: photo

   SLOYKA'S ROADMAP

.. toctree::
   :maxdepth: 1
   :caption: Receiving:
   :hidden:

   modules/data_getter
   modules/geocoder

.. toctree::
   :maxdepth: 1
   :caption: Tagging:
   :hidden:

   modules/city_services_extract
   modules/emotion_classifier
   modules/text_classifier
   modules/topic_modeler

.. toctree::
   :maxdepth: 1
   :caption: Modelling:
   :hidden:

   modules/semantic_graph
   modules/regional_activity
   modules/event_dynamic_prediction

.. toctree::
   :maxdepth: 1
   :caption: Visualization:
   :hidden:

   modules/visualize_graph
   404


Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
@@ -0,0 +1,9 @@
.. _GeoDataGetter:

GeoDataGetter
==========================
This class retrieves geospatial data from OpenStreetMap (OSM) for a given OSM ID and set of tags.

.. autoclass:: sloyka.src.utils.data_getter.GeoDataGetter
   :members:
   :undoc-members:
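The page does not show how the OSM request is built. As an illustration of querying OSM "based on a given OSM ID and tags", the sketch below assembles an Overpass QL query for the area of an OSM relation. build_overpass_query is a hypothetical helper, and GeoDataGetter may well use a different client (for example osmnx) instead.

```python
def build_overpass_query(osm_relation_id: int, tags: dict[str, list[str]]) -> str:
    """Build an Overpass QL query for objects with the given tags inside an OSM relation.

    Hypothetical helper for illustration; it is not part of Sloyka.
    """
    # Overpass derives an area ID from a relation ID by adding 3,600,000,000.
    area_id = 3_600_000_000 + osm_relation_id
    selectors = []
    for key, values in tags.items():
        for value in values:
            # nwr = nodes, ways and relations carrying key=value inside the area.
            selectors.append(f'  nwr["{key}"="{value}"](area.searchArea);')
    body = "\n".join(selectors)
    return (
        "[out:json];\n"
        f"area({area_id})->.searchArea;\n"
        f"(\n{body}\n);\n"
        "out center;"
    )


query = build_overpass_query(337422, {"amenity": ["school", "hospital"]})
print(query)
```

The resulting string can be sent to any Overpass API endpoint; `out center;` asks for a single representative point per way or relation.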
@@ -0,0 +1,9 @@
.. _Geocoder_special:

Geocoder
==================
.. autoclass:: sloyka.src.geocoder.geocoder.Geocoder
   :members:
   :undoc-members:
   :exclude-members: run

Back to :ref:`geocoder`
@@ -0,0 +1,15 @@
.. _geo_objects:

OtherGeoObjects
==================
.. currentmodule:: sloyka.src.geocoder

.. autoclass:: city_objects_extractor.OtherGeoObjects
   :members:
   :undoc-members:

.. automethod:: sloyka.src.geocoder.city_objects_extractor.OtherGeoObjects.get_and_process_osm_data
.. automethod:: sloyka.src.geocoder.city_objects_extractor.OtherGeoObjects.run_osm_dfs
.. automethod:: sloyka.src.geocoder.city_objects_extractor.OtherGeoObjects.calculate_centroid
.. automethod:: sloyka.src.geocoder.city_objects_extractor.OtherGeoObjects.extract_geo_obj
.. automethod:: sloyka.src.geocoder.city_objects_extractor.OtherGeoObjects.restoration_of_normal_form
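The page lists a calculate_centroid method without describing it. For intuition, here is a stand-alone sketch of the standard polygon centroid (shoelace) formula in planar coordinates. This is an assumption about the technique only; the real method may instead rely on shapely or geopandas geometry objects, and raw longitude/latitude pairs are only approximately planar.

```python
def polygon_centroid(points: list[tuple[float, float]]) -> tuple[float, float]:
    """Centroid of a simple (non-self-intersecting) polygon via the shoelace formula.

    `points` lists the vertices in order; a repeated closing vertex is allowed.
    """
    if len(points) > 1 and points[0] == points[-1]:
        points = points[:-1]  # drop the closing vertex of a closed ring
    if len(points) < 3:
        raise ValueError("a polygon needs at least three vertices")
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # C = sum(...) / (6 * A), and 6 * A == 3 * area2
    return cx / (3 * area2), cy / (3 * area2)


print(polygon_centroid([(0, 0), (4, 0), (4, 2), (0, 2)]))
```

The signed area makes the result independent of vertex orientation: clockwise and counter-clockwise rings give the same centroid.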
@@ -0,0 +1,8 @@
.. _street_extractor:

StreetExtractor
==================

.. autoclass:: sloyka.src.geocoder.street_extractor.StreetExtractor
   :members:
   :undoc-members:
@@ -0,0 +1,8 @@
.. _Streets:

Streets
==========================
A class for working with street data.

.. autoclass:: sloyka.src.utils.data_getter.Streets
   :members:
   :undoc-members:
@@ -0,0 +1,7 @@
.. _vkparser:

VKParser
==========================
.. autoclass:: sloyka.src.utils.data_getter.VKParser
   :members:
   :undoc-members:
@@ -0,0 +1,11 @@
.. _city_services:

Services extraction
==========================
The City_services class extracts city service names from text with a string comparison algorithm that accounts for
the changing endings of service names in the text. Using the flair library, the City_services.run() method takes each message as a flair Sentence object,
extracts the named entities from it as a list together with the most probable service type, and stores them in new columns of the original DataFrame.

.. automodule:: sloyka.src.utils.data_processing.city_services_extract
   :members:
   :undoc-members:
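The string comparison that tolerates changing endings can be approximated in a few lines. In the minimal sketch below, which is an assumption rather than City_services' actual algorithm, both the service names and the words of a message are cut down to crude stems (up to two trailing characters are dropped) before comparison. The SERVICES dictionary is hypothetical; the Russian example shows why endings matter (школа vs. школы).

```python
import re

# Hypothetical service vocabulary: canonical name -> service type.
SERVICES = {
    "школа": "education",        # school
    "больница": "healthcare",    # hospital
    "поликлиника": "healthcare", # outpatient clinic
    "парк": "recreation",        # park
}


def stem(word: str, trim: int = 2) -> str:
    """Crude stem: lowercase and drop up to `trim` trailing characters, keeping at least 3."""
    word = word.lower()
    return word[: max(3, len(word) - trim)]


def find_services(text: str) -> list[tuple[str, str]]:
    """Return (canonical service name, service type) pairs whose stems occur in `text`."""
    stems = {stem(w) for w in re.findall(r"\w+", text)}
    return [(name, kind) for name, kind in SERVICES.items() if stem(name) in stems]


# "Near the school they closed the clinic": both nouns appear in inflected forms.
print(find_services("Возле школы закрыли поликлинику"))
```

Fixed-length trimming is deliberately simple and will produce false positives for short words; a morphological analyzer would be the more robust choice.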
@@ -0,0 +1,27 @@
.. _data_getter:

Data getter
==========================

This module contains classes for retrieving and working with various types of data.
Sloyka uses the :ref:`vkparser` class to collect data from the social network VK (VKontakte), as well as OSM data retrieved with
:ref:`GeoDataGetter`.

* :ref:`GeoDataGetter`: retrieves geospatial data from OpenStreetMap (OSM) based on a given OSM ID and tags.
* :ref:`vkparser`: parses VK posts and comments and combines them into one DataFrame.
* :ref:`Streets`: works with street data.
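VKParser is described as combining posts and comments into one DataFrame. A minimal sketch of that merge on plain records follows; the function name and the field names (id, text, post_id) are assumptions for illustration, not VKParser's actual columns or the VK API schema.

```python
from collections import defaultdict


def combine_posts_and_comments(posts: list[dict], comments: list[dict]) -> list[dict]:
    """Flatten posts and their comments into one list of rows.

    Hypothetical sketch: field names are assumptions, not VKParser's schema.
    """
    # Group comments by the post they belong to.
    comments_by_post = defaultdict(list)
    for comment in comments:
        comments_by_post[comment["post_id"]].append(comment)

    rows = []
    for post in posts:
        rows.append({"id": post["id"], "text": post["text"], "type": "post", "parent_id": None})
        # Each post is followed by its own comments; comments whose post
        # is not in `posts` are dropped.
        for comment in comments_by_post[post["id"]]:
            rows.append(
                {"id": comment["id"], "text": comment["text"], "type": "comment", "parent_id": post["id"]}
            )
    return rows


posts = [{"id": 1, "text": "New park opened"}, {"id": 2, "text": "Road works downtown"}]
comments = [
    {"id": 10, "post_id": 1, "text": "Great news"},
    {"id": 11, "post_id": 2, "text": "Again?"},
    {"id": 12, "post_id": 1, "text": "Finally"},
]
rows = combine_posts_and_comments(posts, comments)
print([(row["type"], row["id"]) for row in rows])
```

With pandas installed, pandas.DataFrame(rows) turns the result into a single table with a type column distinguishing posts from comments.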

More
-------------------------------------
.. toctree::
   :maxdepth: 1
   :caption: Data getter classes

   VKParser
   Streets
   GeoDataGetter
@@ -0,0 +1,17 @@
.. _emotion_classifier:

Emotion classifier
==================

.. automodule:: sloyka.src.risks.emotion_classifier
   :members:
   :undoc-members:


Example
-------
.. code-block:: python

   import pandas as pd
   from sloyka.src.risks.emotion_classifier import EmotionRecognizer

   df = pd.read_csv('data.csv')  # the file is expected to have a 'text' column
   recognizer = EmotionRecognizer()
   df['emotion'] = df['text'].apply(recognizer.recognize_emotion)
@@ -0,0 +1,9 @@
.. _modules:

Event detector
==================

.. automodule:: sloyka.src.risks.event_detector
   :members:
   :undoc-members:
@@ -0,0 +1,31 @@
.. _geocoder:

==================
Geocoding
==================
For the full Geocoder class reference, see
:ref:`Geocoder_special`.

Geocoder
-------------------
.. autoclass:: sloyka.src.geocoder.geocoder.Geocoder
   :members: run

OtherGeoObjects
---------------------
.. autoclass:: sloyka.src.geocoder.city_objects_extractor.OtherGeoObjects
   :members: run

StreetExtractor
---------------------
.. autoclass:: sloyka.src.geocoder.street_extractor.StreetExtractor
   :members: process_pipeline

More
-------------------------------------
.. toctree::
   :maxdepth: 1
   :caption: Advanced geocoding

   Geocoder_special
   OtherGeoObjects
   StreetExtractor
@@ -0,0 +1,9 @@
.. _regional_activity:

Regional activity
==================
The regional_activity module aggregates data by region and provides information about user activity.
The RegionalActivity class builds a GeoDataFrame with basic information about user activity, using other modules (geocoder,
text_classifier, city_services_extract and emotion_classifier) to process the data. The processed data is stored in the class attribute
processed_geodata and can be accessed as RegionalActivity.processed_geodata once the class is initialized. The class also includes the get_risks()
method, which returns a DataFrame with social risk information based on the provided texts.
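To make "aggregating user activity by region" concrete, here is a minimal sketch over messages that have already been tagged. The function name, the record fields (region, emotion) and the summary keys are hypothetical and not taken from RegionalActivity; they only illustrate the kind of per-region roll-up the module produces.

```python
from collections import Counter, defaultdict


def aggregate_by_region(messages: list[dict]) -> dict[str, dict]:
    """Summarize tagged messages per region.

    Hypothetical sketch: each message is assumed to carry a "region" label
    (e.g. from geocoding) and an "emotion" label (e.g. from emotion_classifier).
    """
    emotions_by_region = defaultdict(Counter)
    for message in messages:
        emotions_by_region[message["region"]][message["emotion"]] += 1

    summary = {}
    for region, emotions in emotions_by_region.items():
        summary[region] = {
            "messages": sum(emotions.values()),
            "emotions": dict(emotions),
            # Most frequent emotion; ties resolve to the one seen first.
            "top_emotion": emotions.most_common(1)[0][0],
        }
    return summary


sample = [
    {"region": "A", "emotion": "anger"},
    {"region": "A", "emotion": "joy"},
    {"region": "A", "emotion": "anger"},
    {"region": "B", "emotion": "neutral"},
]
print(aggregate_by_region(sample))
```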
@@ -0,0 +1,13 @@
.. _sem_graph:

Semantic graph
-------------------------------------------------

.. automodule:: sloyka.src.semantic_graph.semantic_graph_builder
   :members:
   :undoc-members:

The main method, Semgraph.build_graph(), first cleans the input messages of duplicates, digits, identified place names
and references. It then extracts a given number of keywords from each message with a KeyBERT model and, using PyTorch,
measures the semantic proximity between keywords as the cosine distance between their embeddings. The module's final result is a graph
whose nodes are toponyms (obtained by the geolocation module) and keywords.
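The keyword-linking idea can be sketched independently of KeyBERT and PyTorch. This sketch assumes the embeddings are already computed (toy 3-D vectors stand in for them) and links two keywords when their cosine similarity, that is 1 minus the cosine distance, reaches a threshold. The function names and the threshold value are assumptions, not Semgraph parameters.

```python
import math
from itertools import combinations


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (cosine distance = 1 - similarity)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat zero vectors as unrelated


def keyword_edges(embeddings: dict[str, list[float]], threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Link every pair of keywords whose cosine similarity is at least `threshold`."""
    edges = []
    for (kw1, vec1), (kw2, vec2) in combinations(embeddings.items(), 2):
        similarity = cosine_similarity(vec1, vec2)
        if similarity >= threshold:
            edges.append((kw1, kw2, similarity))
    return edges


# Toy 3-D vectors standing in for KeyBERT embeddings.
toy = {"park": [1.0, 0.0, 0.1], "garden": [0.9, 0.1, 0.1], "traffic": [0.0, 1.0, 0.0]}
print(keyword_edges(toy))
```

In a full graph, these keyword-keyword edges would sit alongside edges from each toponym to the keywords of the messages that mention it.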
@@ -0,0 +1,14 @@
.. _text_classifier:

Text Classifier
==========================
The text_classifier module classifies texts by city function (such as housing and utilities, public amenities, transportation,
health care, and others) using a pre-trained Russian-language model from the BERT family. It processes the input text and assigns it to specific urban functions using a
rubert-tiny2 model fine-tuned on 90,000 labeled citizen appeals. The main method, run_text_classifier(), calls the model, takes text as input, and returns up to three predicted
city functions, each with the probability that it was identified correctly.

.. automodule:: sloyka.src.risks.text_classifier
   :members:
   :undoc-members:
   :show-inheritance:
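run_text_classifier() is described as returning up to three city functions with probabilities. Assuming the model produces one raw score (logit) per class, the post-processing could look like the softmax plus top-k sketch below. The function name, the min_prob cut-off and the scores are made up for illustration, and the real method may work differently (for example through a Hugging Face pipeline).

```python
import math


def top_k_functions(
    logits: list[float], labels: list[str], k: int = 3, min_prob: float = 0.0
) -> list[tuple[str, float]]:
    """Turn raw per-class scores into at most k (label, probability) pairs.

    Hypothetical post-processing sketch; not the actual run_text_classifier() code.
    """
    # Numerically stable softmax: subtracting the max logit does not change the result.
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort classes by probability, keep the k best, and drop unlikely ones.
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return [(label, p) for label, p in ranked[:k] if p >= min_prob]


labels = ["housing and utilities", "public amenities", "transportation", "health care"]
print(top_k_functions([2.0, 1.0, 0.5, -1.0], labels))
```

The min_prob cut-off is one way to get "up to three" rather than always three predictions.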