diff --git a/README.md b/README.md index 2b32e9f..c1fd674 100644 --- a/README.md +++ b/README.md @@ -1,20 +1,18 @@ A Search Tool for (parallel) Universal Dependencies treebanks that runs [in your browser](https://demo.spraakbanken.gu.se/stund). -![STUnD GUI](img/replacement2.png) +![STUnD GUI](docs/img/replacement2.png) +While STUnD can also be used on single dependency treebanks, its most unique feature is that it allows running parallel queries on sentence-aligned UD treebanks by combining [UD-based subtree alignment](https://github.com/harisont/concept-alignment) with [UD tree pattern matching](https://github.com/harisont/deptreehs/blob/main/pattern_matching_and_replacement.md). +## Learn more - [live demo](https://demo.spraakbanken.gu.se/stund) -- [tutorial](tutorial.md) -- [installation](installation.md) -- [source code](https://github.com/harisont/STUnD) - -While STUnD can also be used on single dependency treebanks, its most unique feature is that it allows running parallel queries on sentence-aligned UD treebanks by combining [UD-based subtree alignment](https://github.com/harisont/concept-alignment) with [UD tree pattern matching](https://github.com/harisont/deptreehs/blob/main/pattern_matching_and_replacement.md). +- [docs](https://harisont.github.io/STUnD/) +## Citing STUnD is a Haskell+JavaScript web application built by Herbert Lange and Arianna Masciolini based on an initial prototype by Arianna Masciolini. -## Citing If you use this tool in your research, you are welcome to cite -> [Arianna Masciolini and Márton A Tóth. _STUnD: ett Sökverktyg för Tvåspråkiga Universal Dependencies-trädbanker._ In Proceedings of the Huminfra Conference, pages 95–109, Gothenburg, Sweden, 2024](https://doi.org/10.3384/ecp205013) ([bibtex](stund.bib)). +> [Arianna Masciolini and Márton A Tóth.
_STUnD: ett Sökverktyg för Tvåspråkiga Universal Dependencies-trädbanker._ In Proceedings of the Huminfra Conference, pages 95–109, Gothenburg, Sweden, 2024](https://doi.org/10.3384/ecp205013) ([bibtex](docs/stund.bib)). A more extensive and up-to-date publication in English is currently in preparation. \ No newline at end of file diff --git a/_config.yaml b/docs/_config.yaml similarity index 76% rename from _config.yaml rename to docs/_config.yaml index 0add056..f4cd65a 100644 --- a/_config.yaml +++ b/docs/_config.yaml @@ -13,17 +13,15 @@ header_pages: show_excerpts: false -remote_theme: jekyll/minima +remote_theme: "jekyll/minima@1e8a445" minima: skin: dark date_format: "%-d %B %Y" social_links: - { platform: github, user_url: https://github.com/harisont } - - { platform: stackoverflow, user_url: https://stackoverflow.com/users/7729724/harisont } - { platform: instagram, user_url: https://www.instagram.com/unottica/ } - { platform: youtube, user_url: https://www.youtube.com/c/ImparalHaskellemettilodaparte } - - { platform: rss, user_url: https://harisont.github.io/feed.xml} plugins: - jekyll-feed \ No newline at end of file diff --git a/_includes/custom-head.html b/docs/_includes/custom-head.html similarity index 100% rename from _includes/custom-head.html rename to docs/_includes/custom-head.html diff --git a/_includes/footer.html b/docs/_includes/footer.html similarity index 64% rename from _includes/footer.html rename to docs/_includes/footer.html index 48bae9a..0187b6f 100644 --- a/_includes/footer.html +++ b/docs/_includes/footer.html @@ -5,13 +5,6 @@ - - diff --git a/_includes/image.html b/docs/_includes/image.html similarity index 100% rename from _includes/image.html rename to docs/_includes/image.html diff --git a/_sass/minima/_base.scss b/docs/_sass/minima/_base.scss similarity index 100% rename from _sass/minima/_base.scss rename to docs/_sass/minima/_base.scss diff --git a/_sass/minima/_layout.scss b/docs/_sass/minima/_layout.scss similarity index 
100% rename from _sass/minima/_layout.scss rename to docs/_sass/minima/_layout.scss diff --git a/_sass/minima/initialize.scss b/docs/_sass/minima/initialize.scss similarity index 100% rename from _sass/minima/initialize.scss rename to docs/_sass/minima/initialize.scss diff --git a/_sass/minima/skins/dark.scss b/docs/_sass/minima/skins/dark.scss similarity index 100% rename from _sass/minima/skins/dark.scss rename to docs/_sass/minima/skins/dark.scss diff --git a/android-chrome-192x192.png b/docs/android-chrome-192x192.png similarity index 100% rename from android-chrome-192x192.png rename to docs/android-chrome-192x192.png diff --git a/android-chrome-512x512.png b/docs/android-chrome-512x512.png similarity index 100% rename from android-chrome-512x512.png rename to docs/android-chrome-512x512.png diff --git a/apple-touch-icon.png b/docs/apple-touch-icon.png similarity index 100% rename from apple-touch-icon.png rename to docs/apple-touch-icon.png diff --git a/browserconfig.xml b/docs/browserconfig.xml similarity index 100% rename from browserconfig.xml rename to docs/browserconfig.xml diff --git a/deployment.md b/docs/deployment.md similarity index 100% rename from deployment.md rename to docs/deployment.md diff --git a/favicon-16x16.png b/docs/favicon-16x16.png similarity index 100% rename from favicon-16x16.png rename to docs/favicon-16x16.png diff --git a/favicon-32x32.png b/docs/favicon-32x32.png similarity index 100% rename from favicon-32x32.png rename to docs/favicon-32x32.png diff --git a/favicon.ico b/docs/favicon.ico similarity index 100% rename from favicon.ico rename to docs/favicon.ico diff --git a/docs/img/bilingual_query.png b/docs/img/bilingual_query.png new file mode 100644 index 0000000..42f3ab2 Binary files /dev/null and b/docs/img/bilingual_query.png differ diff --git a/docs/img/conllu.png b/docs/img/conllu.png new file mode 100644 index 0000000..ea1e98c Binary files /dev/null and b/docs/img/conllu.png differ diff --git a/docs/img/diff_tree.png 
b/docs/img/diff_tree.png new file mode 100644 index 0000000..a99d4c6 Binary files /dev/null and b/docs/img/diff_tree.png differ diff --git a/docs/img/format_error.png b/docs/img/format_error.png new file mode 100644 index 0000000..9299a37 Binary files /dev/null and b/docs/img/format_error.png differ diff --git a/docs/img/null_search.png b/docs/img/null_search.png new file mode 100644 index 0000000..9d61ccd Binary files /dev/null and b/docs/img/null_search.png differ diff --git a/docs/img/presens_perfekt.png b/docs/img/presens_perfekt.png new file mode 100644 index 0000000..3a3266c Binary files /dev/null and b/docs/img/presens_perfekt.png differ diff --git a/docs/img/replacement1.png b/docs/img/replacement1.png new file mode 100644 index 0000000..87e655e Binary files /dev/null and b/docs/img/replacement1.png differ diff --git a/docs/img/replacement2.png b/docs/img/replacement2.png new file mode 100644 index 0000000..9837ec4 Binary files /dev/null and b/docs/img/replacement2.png differ diff --git a/docs/img/single.png b/docs/img/single.png new file mode 100644 index 0000000..41f73f8 Binary files /dev/null and b/docs/img/single.png differ diff --git a/docs/img/start.png b/docs/img/start.png new file mode 100644 index 0000000..decff17 Binary files /dev/null and b/docs/img/start.png differ diff --git a/docs/img/tree.png b/docs/img/tree.png new file mode 100644 index 0000000..57dd6dc Binary files /dev/null and b/docs/img/tree.png differ diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 0000000..2b32e9f --- /dev/null +++ b/docs/index.md @@ -0,0 +1,20 @@ +A Search Tool for (parallel) Universal Dependencies treebanks that runs [in your browser](https://demo.spraakbanken.gu.se/stund). 
+ +![STUnD GUI](img/replacement2.png) + + +- [live demo](https://demo.spraakbanken.gu.se/stund) +- [tutorial](tutorial.md) +- [installation](installation.md) +- [source code](https://github.com/harisont/STUnD) + +While STUnD can also be used on single dependency treebanks, its most unique feature is that it allows running parallel queries on sentence-aligned UD treebanks by combining [UD-based subtree alignment](https://github.com/harisont/concept-alignment) with [UD tree pattern matching](https://github.com/harisont/deptreehs/blob/main/pattern_matching_and_replacement.md). + +STUnD is a Haskell+JavaScript web application built by Herbert Lange and Arianna Masciolini based on an initial prototype by Arianna Masciolini. + +## Citing +If you use this tool in your research, you are welcome to cite + +> [Arianna Masciolini and Márton A Tóth. _STUnD: ett Sökverktyg för Tvåspråkiga Universal Dependencies-trädbanker._ In Proceedings of the Huminfra Conference, pages 95–109, Gothenburg, Sweden, 2024](https://doi.org/10.3384/ecp205013) ([bibtex](stund.bib)). + +A more extensive and up-to-date publication in English is currently in preparation. \ No newline at end of file diff --git a/installation.md b/docs/installation.md similarity index 63% rename from installation.md rename to docs/installation.md index 35a9c8c..11f1f66 100644 --- a/installation.md +++ b/docs/installation.md @@ -3,13 +3,13 @@ title: Installation layout: base --- -# Installation +# Local installation -To compile and run STUnD directly on your computer, you can use either [the Haskell Tool Stack](https://docs.haskellstack.org/en/stable/) or build and run it inside a [Docker](https://www.docker.com/) container. +To compile and run STUnD on your computer, you can either use [the Haskell Tool Stack](https://docs.haskellstack.org/en/stable/) or build and run it inside a [Docker](https://www.docker.com/) container. In either case, start by downloading the [source code](https://github.com/harisont/STUnD).
-## Installation via Stack +## Installing STUnD via Stack If you have Stack, run ``` @@ -33,4 +33,9 @@ If you want to use Docker containers, the simplest way is to use `docker compose docker compose up stund-gui ``` -This will take a while for the first time because the image has to be built. Afterwards, running the container can be started directly with the same command. \ No newline at end of file +This will take a while the first time because the image has to be built. Afterwards, the container can be started directly with the same command. + +--- + +For troubleshooting Windows installations, see [here](win.md). +For details about deployment at SBX, see [here](deployment.md). \ No newline at end of file diff --git a/mstile-150x150.png b/docs/mstile-150x150.png similarity index 100% rename from mstile-150x150.png rename to docs/mstile-150x150.png diff --git a/pub.md b/docs/pub.md similarity index 60% rename from pub.md rename to docs/pub.md index d0cbbc2..c52cd6b 100644 --- a/pub.md +++ b/docs/pub.md @@ -3,8 +3,8 @@ title: Publications layout: base --- -# Publications using STUnD +# Publications -1. [Arianna Masciolini and Márton A Tóth. _STUnD: ett Sökverktyg för Tvåspråkiga Universal Dependencies-trädbanker._ In Proceedings of the Huminfra Conference, pages 95–109, Gothenburg, Sweden, 2024](https://doi.org/10.3384/ecp205013) +1. [Arianna Masciolini and Márton A Tóth. _STUnD: ett Sökverktyg för Tvåspråkiga Universal Dependencies-trädbanker._ In Proceedings of the Huminfra Conference, pages 95–109, Gothenburg, Sweden, 2024](https://doi.org/10.3384/ecp205013) ([bibtex](stund.bib)): this paper describes STUnD's first prototype and exemplifies its usage through a small case study on a bilingual treebank. A more extensive and up-to-date publication in English is currently in preparation.
diff --git a/safari-pinned-tab.svg b/docs/safari-pinned-tab.svg similarity index 100% rename from safari-pinned-tab.svg rename to docs/safari-pinned-tab.svg diff --git a/site.webmanifest b/docs/site.webmanifest similarity index 100% rename from site.webmanifest rename to docs/site.webmanifest diff --git a/stund.bib b/docs/stund.bib similarity index 100% rename from stund.bib rename to docs/stund.bib diff --git a/docs/tutorial.md b/docs/tutorial.md new file mode 100644 index 0000000..4c439fe --- /dev/null +++ b/docs/tutorial.md @@ -0,0 +1,140 @@ +--- +title: Tutorial +layout: base +--- + +# Getting started with STUnD + +## Specifying the input files +![start screen](img/start.png) + +The "Browse..." buttons are used to specify one or two (parallel) input files, which have to be in either __strict [CoNLL-U format](https://universaldependencies.org/format.html)__ or __"horizontal" plain text__ (one sentence per line). + +Plain text files are parsed with the [UDPipe 2 API](https://lindat.mff.cuni.cz/services/udpipe/api-reference.php), using the default model for the language they are written in. +The language can be made explicit by prefixing the file name with a two-letter language code followed by an underscore (for example, an input file called `sv_rawtext.txt` would be assumed to be in Swedish). +Otherwise, the language is inferred automatically. + +If you only specify the input file(s), leaving the other fields blank, clicking "Search" will run a default query that retrieves all sentences in the treebank(s): + +![null search](img/null_search.png) + +## Running a query +Queries are specified in the first text input field: + +![query for presens perfekt](img/presens_perfekt.png) + +(note that double-clicking on it will show the query history).
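The file-name convention for plain-text input described under "Specifying the input files" (a two-letter code plus an underscore, as in `sv_rawtext.txt`) can be sketched in Haskell as follows. This is a minimal illustration and not STUnD's actual code: `languageFromFileName` is a hypothetical helper, and it assumes that the code consists of two lowercase ASCII letters. A result of `Nothing` corresponds to the case where the language has to be inferred automatically.

```haskell
import Data.Char (isAsciiLower)

-- Hypothetical helper (not part of STUnD): extract an explicit language code
-- from a plain-text input file name that follows the "xx_" prefix convention,
-- e.g. "sv_rawtext.txt" -> Just "sv". Assumes two lowercase ASCII letters.
-- Nothing means no prefix was found, so the language must be inferred.
languageFromFileName :: FilePath -> Maybe String
languageFromFileName path =
  case baseName of
    (a : b : '_' : _) | all isAsciiLower [a, b] -> Just [a, b]
    _ -> Nothing
  where
    -- Keep only the last path component (handles both / and \ separators).
    baseName = reverse (takeWhile (`notElem` "/\\") (reverse path))
```

For instance, `languageFromFileName "data/en_corpus.txt"` yields `Just "en"`, while `languageFromFileName "rawtext.txt"` yields `Nothing`.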
+ +### Monolingual queries +The example query in the picture is + +```haskell +TREE_ (FEATS_ "VerbForm=Sup") [AND [LEMMA "ha", FEATS_ "Tense=Pres"]] +``` + +This is a _simple_ or _monolingual_ query, looking for present perfect constructions in the Swedish treebank. +It reads as + +> Look for (sub)trees (`TREE_`) where the root is a supine (`(FEATS_ "VerbForm=Sup")`) and one of its direct dependents is the present tense of the verb "ha" (`AND [LEMMA "ha", FEATS_ "Tense=Pres"]`). + +Now only the subtrees matching the query (often full trees in this case) are highlighted in bold (cf. last row). + +With some knowledge of Swedish, this particular query can be rewritten more concisely as + +```haskell +TREE_ (FEATS_ "VerbForm=Sup") [FORM "har"] +``` + +It is then very easy to modify the query for other structurally similar tenses: + +- `TREE_ (FEATS_ "VerbForm=Sup") [FORM "hade"]` (pluperfect) +- `TREE_ (FEATS_ "VerbForm=Sup") [FORM "ha"]` (perfect infinitive) + +Compared to the original "present perfect" query, these make it easier to see how queries work: first, the program tries to align the two treebanks to identify semantically equivalent subtrees; then the query is run on the left (Swedish) treebank (T1) and matching subtrees are returned alongside their English counterpart (T2). + +__Unlike query matching, the alignment step is not guaranteed to find _all_ correspondences__: in the picture above, you can see that sometimes a match is found in the Swedish treebank but nothing is highlighted in the corresponding English sentence (cf. row 4). +In other cases, a correspondence is found but is incorrect. +Both outcomes can be caused by annotation errors as well as by limitations of the alignment algorithm, which relies on a set of syntax-based heuristics. + +Of course, monolingual queries can also be run on single treebanks: + +![monolingual query on single treebank](img/single.png) + +### Parallel queries +Queries can also be _parallel_ or _bilingual_.
For instance, we can use the following pattern to search for sentences where a Swedish present perfect corresponds to a passive present tense in English: + +```haskell +TREE_ (FEATS_ "VerbForm={Sup->Part}") [AND [LEMMA "{ha->be}", FEATS_ "Tense=Pres"]] +``` + +This produces the following results: + +![bilingual query](img/bilingual_query.png) + +Note that the second hit here is a false positive: "are" in the clause "there are already..." is also a direct dependent of the main lexical verb "dropped". +This is unfortunate, but difficult to avoid given how conjuncts are treated in UD. + +The basic query language ("UD patterns") is described [here](https://github.com/harisont/deptreehs/blob/main/docs/patterns.md), while its extended version for parallel (bilingual) queries (`{X -> Y}` syntax) is documented [here](https://github.com/harisont/L2-UD#l1-l2-patterns). + +## Refining the search results +The second input field can be used to specify a _replacement rule_ to be applied to all matching subtrees in both languages. +This can help highlight and manipulate the relevant parts of each query result. + +Understanding replacement rules, which are described [alongside the basic query language](https://github.com/harisont/deptreehs/blob/main/docs/patterns.md), can be slightly more challenging. + +As a first example, + +```haskell +PRUNE (UPOS "VERB") 0 +``` + +decreases the depth of trees rooted in a verb to 0, eliminating all dependents: + +![drastic pruning](img/replacement1.png) + +The more complex pattern + +```haskell +CHANGES [FILTER_SUBTREES TRUE (OR [DEPREL_ "aux", DEPREL_ "cop"]), PRUNE TRUE 1] +``` + +uses dependency labels to isolate verb constructions of maximum depth 1, thus producing, in conjunction with the first query, the following output: + +![replacement rule](img/replacement2.png) + +## CoNLL-U and tree mode +So far, we have discussed how to use STUnD in plain text mode.
+Switching to CoNLL-U mode allows inspecting the CoNLL-U (sub)trees __corresponding to bold text in the default text mode__: + +![CoNLL-U mode](img/conllu.png) + +Tree mode renders them as SVG trees: + +![Tree mode](img/tree.png) + +## Saving the search results +Query results can be saved as plain text/TSV, CoNLL-U, or HTML-embedded SVG trees. +The output format depends on the mode in which STUnD is used: in the example above, for instance, results would be saved as SVG trees. + +Results obtained on parallel treebanks can be saved as two separate files, one per treebank, by clicking on "T1 file" and "T2 file" respectively, or as a single file by choosing "parallel file": + +- in text mode, the T1 and T2 files are horizontal text files, while the parallel file is TSV. This makes it easy to import search results into any spreadsheet program +- in CoNLL-U mode, the output is always a new CoNLL-U treebank that can be used, for instance, as input for more refined queries in STUnD, or simply imported into [another CoNLL-U viewer](https://universaldependencies.org/conllu_viewer.html) for further inspection. In parallel files, sentences from the two input treebanks are interleaved +- similarly, parallel files in tree mode alternate T1 with T2 trees + +## Other functionalities + +### Validation +STUnD performs some basic (i.e. __less strict than the [official UD validator](https://github.com/UniversalDependencies/tools/blob/master/validate.py)__) validation of the input data, checking that all word lines contain 10 tab-separated fields and that all token IDs, UPOS tags and DEPRELs are valid as per the [UD universal annotation guidelines](https://universaldependencies.org/guidelines.html). +If your treebank contains one or more format errors, these are listed in the user interface: + +![format errors](img/format_error.png) + +### Diff mode +Diff mode, activated by checking the "diff" box, helps you identify discrepancies between two similar treebanks.
+This feature can be useful when exploring parallel (error-correction) learner treebanks, as well as when comparing alternative analyses of the same text, as in the following example: + +![diff mode](img/diff_tree.png) + +### Manual editing +On top of replacement patterns, which can be used to apply systematic changes to the input treebank(s), basic manual editing is available in CoNLL-U mode. \ No newline at end of file diff --git a/win.md b/docs/win.md similarity index 93% rename from win.md rename to docs/win.md index dbec657..7ceb79b 100644 --- a/win.md +++ b/docs/win.md @@ -1,4 +1,4 @@ -# Compiling on Windows +# Known issues (Windows) It appears that `curl`, __but not `curllib`__, is installed by default on Windows10+. Therefore it is necessary to: 1. Download and extract `curl` binaries for windows(https://curl.se/windows/) diff --git a/img/bilingual_query.png b/img/bilingual_query.png deleted file mode 100644 index cff58d3..0000000 Binary files a/img/bilingual_query.png and /dev/null differ diff --git a/img/conllu.png b/img/conllu.png deleted file mode 100644 index e4a61b4..0000000 Binary files a/img/conllu.png and /dev/null differ diff --git a/img/download.png b/img/download.png deleted file mode 100644 index 755edea..0000000 Binary files a/img/download.png and /dev/null differ diff --git a/img/invalid_path.png b/img/invalid_path.png deleted file mode 100644 index a5561bd..0000000 Binary files a/img/invalid_path.png and /dev/null differ diff --git a/img/it_gender.png b/img/it_gender.png deleted file mode 100644 index c4abf6c..0000000 Binary files a/img/it_gender.png and /dev/null differ diff --git a/img/null_search.png b/img/null_search.png deleted file mode 100644 index 1703d99..0000000 Binary files a/img/null_search.png and /dev/null differ diff --git a/img/papers_please.png b/img/papers_please.png deleted file mode 100644 index b5ae53c..0000000 Binary files a/img/papers_please.png and /dev/null differ diff --git a/img/perf_inf.png
b/img/perf_inf.png deleted file mode 100644 index 2624bcb..0000000 Binary files a/img/perf_inf.png and /dev/null differ diff --git a/img/presens_perfekt.png b/img/presens_perfekt.png deleted file mode 100644 index 6ed2fc5..0000000 Binary files a/img/presens_perfekt.png and /dev/null differ diff --git a/img/replacement1.png b/img/replacement1.png deleted file mode 100644 index 9038419..0000000 Binary files a/img/replacement1.png and /dev/null differ diff --git a/img/replacement2.png b/img/replacement2.png deleted file mode 100644 index 4861df2..0000000 Binary files a/img/replacement2.png and /dev/null differ diff --git a/img/saving.png b/img/saving.png deleted file mode 100644 index d52bec3..0000000 Binary files a/img/saving.png and /dev/null differ diff --git a/img/single.png b/img/single.png deleted file mode 100644 index d545992..0000000 Binary files a/img/single.png and /dev/null differ diff --git a/img/start.png b/img/start.png deleted file mode 100644 index be70325..0000000 Binary files a/img/start.png and /dev/null differ diff --git a/img/tree.png b/img/tree.png deleted file mode 100644 index bdeb059..0000000 Binary files a/img/tree.png and /dev/null differ diff --git a/static/langid b/static/langid deleted file mode 160000 index 92b670e..0000000 --- a/static/langid +++ /dev/null @@ -1 +0,0 @@ -Subproject commit 92b670ec3bed2aae5a6ba4c6321a169493dd7033 diff --git a/stund.desktop b/stund.desktop deleted file mode 100644 index fd3c1d5..0000000 --- a/stund.desktop +++ /dev/null @@ -1,9 +0,0 @@ -[Desktop Entry] -Version=1.0 -Name=STUnD -Comment=A GUI Search Tool for (bilingual) parallel UD treebanks -Exec=bash -c 'stund-gui & sleep 1 && xdg-open http://127.0.0.1:8023;$SHELL' -Icon=utilities-terminal -Terminal=false -Type=Application -Categories=Application; \ No newline at end of file diff --git a/stund.ps1 b/stund.ps1 deleted file mode 100644 index 6d32fa2..0000000 --- a/stund.ps1 +++ /dev/null @@ -1 +0,0 @@ -start stund-gui; Start-Process http://127.0.0.1:8023 
\ No newline at end of file diff --git a/tutorial.md b/tutorial.md deleted file mode 100644 index f193226..0000000 --- a/tutorial.md +++ /dev/null @@ -1,139 +0,0 @@ ---- -title: Tutorial -layout: base ---- - -# Getting started with STUnD - -## Specifying input files -![start screen](img/start.png) - -The "Browse..." buttons are used to specify one or two (parallel) input files, which have to be in strict [CoNNL-U format](https://universaldependencies.org/format.html). - -If you only specify the input file(s), leaving the other fields blank, clicking "search" will run a default query that returns the full treebank: - -![null search](img/null_search.png) - -## Running a query -Queries are specified in first text input field: - -![query for presens perfekt](img/presens_perfekt.png) - -(note that double clicking on it will show the query history). - -### Monolingual queries -The example query in the picture is - -```haskell -TREE_ (FEATS_ "VerbForm=Sup") [AND [LEMMA "ha", FEATS_ "Tense=Pres"]] -``` - -This is a _simple_ or _monolingual_ query, looking for present perfect constructions in the Swedish treebank. -It reads as - -> Look for (sub)trees (`TREE_`) where the root is a supinum (`(FEATS_ "VerbForm=Sup")`) and one of its direct dependents is the present of the verb "ha" (`AND [LEMMA "ha", FEATS_ "Tense=Pres"]`). - -Now only the subtrees matching the query (often full trees in this case) are highlighted in bold (cf. last row). 
- -With some knowledge of Swedish, this particular query can be rewritten more concisely as - -```haskell -TREE_ (FEATS_ "VerbForm=Sup") [FORM "har"] -``` - -It is then very easy to modify the query for other structurally similar tenses: - -- `TREE_ (FEATS_ "VerbForm=Sup") [FORM "hade"]` (pluperfect) -- `TREE_ (FEATS_ "VerbForm=Sup") [FORM "ha"]` (perfect infinitive) - -Compared to the original "present perfect" query, these make it easier to see how queries work: first, the program tries to align the two treebanks to identify semantically equivalent subtrees; then the query is run on the left (Swedish) treebank and matching subtrees are returned alongside their English counterpart. -For this reason, it can find correspondences such as "att ha sagt"-"as saying" (row 4), even though the English construction is not at all similar to the Swedish one. -Unlike query matching, the alignment step is not guaranteed to find all correspondences: in the picture below, you can see that sometimes a match is found in the Swedish treebank but nothing is highlighted in the corresponding English sentence (this is the case, for instance, in the last few rows). Some other times, a correspondence is found but is incorrect. - -![perfect infinitive](img/perf_inf.png) - -Of course, monolingual queries can also be run on single treebanks: - -![monolingual query on single treebank](img/single.png) - -### Parallel queries -Queries can also be _parallel_ or _bilingual_. For instance, we can use the following pattern to serach for sentences where a Swedish present perfect corresponds to a passive present tense in English: - -```haskell -TREE_ (FEATS_ "VerbForm={Sup->Part}") [AND [LEMMA "{ha->be}", FEATS_ "Tense=Pres"]] -``` - -This produces the following results: - -![bilingual query](img/bilingual_query.png) - -Note that the second hit here is a false positive, due to the fact that "are" in the clause "there are already..." is also a direct dependent of the main lexical verb "dropped". 
-This is unfortunate, but difficult to avoid given how conjuncts are treated in UD. - -The basic query language ("UD patterns") is described [here](https://github.com/harisont/deptreehs/blob/main/pattern_matching_and_replacement.md), while the extended version for parallel (bilingual) queries (`{X -> Y}` syntax) is documented [here](https://github.com/harisont/L2-UD#l1-l2-patterns). - -## Adding a replacement rule -The last input field can be used to specify a _replacement rule_ to be applied to all matching subtrees in both languages. -This can help highlighting the relevant parts of each query result and manipulate them. - -Understanding replacement rules, which are described [alongside the basic query language](https://github.com/harisont/deptreehs/blob/main/pattern_matching_and_replacement.md), can be slightly more challenging. - -As a first example, - -```haskell -PRUNE (UPOS "VERB") 0 -``` - -decreases the depth of trees rooted in a verb to 0, eliminating all dependents: - -![drastic pruning](img/replacement1.png) - -The more complex pattern - -```haskell -CHANGES [FILTER_SUBTREES TRUE (OR [DEPREL_ "aux", DEPREL_ "cop"]), PRUNE TRUE 1] -``` - -uses dependency labels to isolate verb constructions of maximum depth 1, thus producing, in conjunction with the first query, the following output: - -![replacement rule](img/replacement2.png) - -## CoNNL-U and tree mode -So far, we have seen how to use STUnD in plain text mode. -Switching to CoNNL-U mode allows inspecting the CoNNL-U (sub)trees __corresponding to bold text in the default text mode__: - -![CoNNL-U mode](img/conllu.png) - -Tree mode renders them as SVG trees: - -![Tree mode](img/tree.png) - -## Saving the search results -Query results can be saved as plain text/TSV, CoNNL-U and HTML-embedded SVG trees. -The output format depends on the mode (in the example below, for instance, results would be saved as graphical trees). 
- -![download links for the search results](img/download.png) - -If two treebanks are being compared, results can be saved as two separate files, one per treebank, or as a single "parallel" file: - -- in text mode, the T1 and T2 files are simple text (one sentence per line), while the parallel file is tab-separated. This makes it easy to import search results in any spreadsheet program -- in CoNNL-U mode, the output is always a new treebank, treebank that can, for instance, be used as input for more refined queries in StUnD, or simply imported into [a CoNNL-U viewer](https://universaldependencies.org/conllu_viewer.html) for further inspection.[^1] In the parallel file, sentences from the two input treebanks are alternated -- similarly, parallel files in tree mode consist in alternating T1-T2 trees - -## Other use cases -So far, we have shown how to use STUnD on multilingual treebanks. -Many of the tool's functionalities, however, are also relevant in other scenarios, such as comparing learner sentences with corrections: - -![example of L1-L2 query on VALICO](img/it_gender.png) - -In the image above, you can see STUnD in action on the [VALICO treebank of L2 Italian](https://github.com/UniversalDependencies/UD_Italian-Valico), looking for feminine nouns incorrectly inflected as masculine. - -By checking "highlight discrepancies", in addition, the tool will highlight all sentence pairs matching the query that present any difference: - -![highlight discrepancies](img/papers_please.png) - -In the case of a learner corpus, "discrepant" means "erroneous", but highlighting discrepancies can also be useful in other settings, such as when comparing different analyses of the same text to resolve disagreement in a linguistic annotation project. -This functionality is, however, still very rudimentary. -In the future, the plan is to refine it to only highlight sentences where the discrepancy occurs in the subtree matching the query. 
- -[^1]: technical note: this works because all extracted subtrees are adjusted so that they have a root node and valid (sequential) IDs.
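The footnote removed from the old tutorial explains why extracted subtrees can be saved as a valid CoNLL-U treebank: each one is adjusted so that it has a root node and sequential IDs. Below is a minimal sketch of that adjustment. It makes simplifying assumptions. The `Node` type and the `renumberSubtree` function are hypothetical and are not taken from STUnD's source. Multiword-token ranges and empty nodes are also ignored.

```haskell
import Data.List (sortOn)
import Data.Maybe (fromMaybe)

-- Simplified CoNLL-U word line: only the ID and HEAD columns matter here.
-- Multiword-token ranges (e.g. "3-4") and empty nodes (e.g. "5.1") are
-- deliberately ignored in this sketch.
data Node = Node { nodeId :: Int, nodeHead :: Int }
  deriving (Show, Eq)

-- Hypothetical helper: renumber the nodes of an extracted subtree so that
-- IDs run 1..n in original word order. Any node whose HEAD points outside
-- the subtree (normally just the subtree's root) gets HEAD = 0, i.e. it
-- becomes the root of the new sentence.
renumberSubtree :: [Node] -> [Node]
renumberSubtree nodes = zipWith renumber [1 ..] sorted
  where
    sorted = sortOn nodeId nodes
    -- Old ID -> new sequential ID.
    mapping = zip (map nodeId sorted) [1 ..]
    renumber newId (Node _ oldHead) =
      Node newId (fromMaybe 0 (lookup oldHead mapping))
```

Consider a subtree made of original tokens 4, 5 and 7, where token 7 heads the other two and itself depended on token 2. In the output these become IDs 1, 2 and 3, and node 3 is the new root (HEAD 0). A full implementation would also have to rewrite the enhanced dependencies (DEPS) column. It would also need to decide what to do when several nodes point outside the subtree: this sketch would simply turn each of them into a root.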