Update from upstream repo github/linguist #2

Open
wants to merge 1,545 commits into
base: master

Conversation

backstroke-bot
Hello!
The remote github/linguist has some new changes that aren't in this fork.

So, here they are, ready to be merged! 🎉

If this pull request can be merged without conflict, you can publish your software
with these new changes. Otherwise, if you have merge conflicts, this
is the place to fix them.

Have fun!


Created by Backstroke. Oh yea, I'm a bot.

@ghost
ghost commented Oct 3, 2016

Hello!
The remote github/linguist has some new changes that aren't in this fork.

root = true

[*]
charset = utf-8

So much gained

Hi, I'd like to open a pull request

xiaq and others added 9 commits September 2, 2022 11:53
* Add Gemini language

* Remove .gemini extension

Co-authored-by: printfn <[email protected]>
Co-authored-by: Colin Seymour <[email protected]>
* Add generic .tag to JSP

* Update lib/linguist/heuristics.yml

Co-authored-by: John Gardner <[email protected]>
* Update all grammars

* Update all cached licenses

* Version 7.23.0

* Update all grammars

* Update cached licenses
This PR does not introduce any changes to installed packages, but it does change the filesystem by deleting the package caches.

- The `linux-headers` package is already in the base image, so adding it is not required
- The removal of `build-base`, `libc-dev`, and `cmake` is handled by `apk del build_deps`, the virtual package we created. `linux-headers` cannot be deleted, as it is required by `libffi-dev` in the base image.
- `--no-cache` is removed from the second `apk add` because the package caches were already downloaded and retained by the first `apk add`, so there's no need to fetch them again.
- Removing the caches from `/var/cache/apk/*` saves some KBees 🐝
Add Brewfile for bootstrapping deps on Mac

Co-authored-by: Colin Seymour <[email protected]>
* Repoint razor-plus grammar submodule at new repo

* Run list-grammars and update README

* Move razor-plus project into github-linguist org

* Update grammar README for last change
* Update linguist CLI to analyze specific revisions
* Update README to document new --rev option
Paranoid46 pushed a commit to Paranoid46/linguist that referenced this pull request Oct 6, 2022
Update from upstream repo github/linguist
Alhadis and others added 17 commits October 19, 2022 15:31
* Feat/cypher (#3)

* add cypher grammar
* add samples
* add missing file extension sample
* upd ordering
* add extra sample
* making license type precise

* remove license update mistake

* more samples (#4)

* Chore/add more samples (#5)

* more samples

* trim examples

* Chore/add more samples (#6)

* trim examples

* Delete graph_alg.cql

* remove cql (#7)

* Update languages.yml

* remove cql

* typo

* new examples

Co-authored-by: benf <benf@local>
adding support for .jsh extension

Co-authored-by: Colin Seymour <[email protected]>
* Add SDC and XDC to TCL language

* Add better constraint samples

* Add aliases

Co-authored-by: Colin Seymour <[email protected]>
* Add PDDL

* Update lib/linguist/languages.yml

Co-authored-by: Colin Seymour <[email protected]>

* remove large examples

* Add PDDL to README

Co-authored-by: Colin Seymour <[email protected]>
starlark: support recognition of WORKSPACE.bazel

Both WORKSPACE and WORKSPACE.bazel are valid names for the WORKSPACE
file. The latter takes precedence, even though it is an alias, because
other projects may have a similarly named file.

Alternatively, *.bazel could be added to extensions, but .bzl is
the recommended extension.

This adds a WORKSPACE.bazel file from
https://github.com/google/skia/blob/main/WORKSPACE.bazel
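The trade-off described above (listing the exact filename instead of adding a `*.bazel` extension) can be sketched as a lookup that tries exact filenames before extensions. This is only an illustration: the `LANGUAGES` table is a hypothetical, trimmed-down stand-in for a `languages.yml` entry, and `detect_by_name` is not a Linguist function.

```python
import os

# Hypothetical, trimmed-down stand-in for one languages.yml entry.
LANGUAGES = {
    "Starlark": {
        "extensions": [".bzl"],
        "filenames": ["WORKSPACE", "WORKSPACE.bazel"],
    },
}


def detect_by_name(path):
    """Return a language for `path` by exact filename, then by extension."""
    base = os.path.basename(path)

    # Exact filename matches are checked first.
    for lang, entry in LANGUAGES.items():
        if base in entry["filenames"]:
            return lang

    # Fall back to the file extension, if there is one.
    ext = os.path.splitext(base)[1]
    for lang, entry in LANGUAGES.items():
        if ext and ext in entry["extensions"]:
            return lang

    return None
```

With this table, `WORKSPACE.bazel` and `defs.bzl` resolve to Starlark, while an unrelated `config.bazel` is left undetected because `.bazel` is not registered as an extension.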
* Add language: Just

* Submodules: Update

* Update Justfile

* Rename to justfile

* Add license snapshot

Source: https://github.com/skellock/vscode-just/commit/e781b35a3ca38d8a3c4a0650f6982b5712b23406#diff-c693279643b8cd5d248172d9c22cb7cf4ed163a3c98c8a3f69c2717edd3eacb7

* Update lib/linguist/languages.yml

Co-authored-by: Casey Rodarmor <[email protected]>

* Update grammars.yml

Co-authored-by: Casey Rodarmor <[email protected]>

* Fix just language id

* Update license

* Remove extensions

* Rerun ./script/list-grammars

* Apply suggestions from code review

* Update grammars.yml

* Samples/Justfiles -> samples/just

* Rerun tests

* Fix order

Co-authored-by: Casey Rodarmor <[email protected]>
Co-authored-by: Colin Seymour <[email protected]>
* Add OASv2 and OASv3 languages

* Add test fixtures for OASv2 and v3

Co-authored-by: Colin Seymour <[email protected]>
* Add Language: Imba

* Update lib/linguist/languages.yml

Co-authored-by: Colin Seymour <[email protected]>
* Add Scenic language

* Update Scenic grammar

* forgot to update metadata

* Update Scenic grammar
* Add VB6

Adding the VB6 language and removing it as an alias of VBA.

* Remove .vb6 extension

No samples for .vb6 found on GitHub

* Add samples

* Update ids

* Change language name and adjust aliases

In response to requested change: #6124 (comment)

* Change .dsr to .Dsr

* Add additional sample for .Dsr

* Change folder name

* Fix order

* Add missing VB6 line

Co-authored-by: Colin Seymour <[email protected]>
* Generate samples during bootstrap

* This isn't needed as the rake does it
RitikShah and others added 19 commits August 29, 2024 14:21
replace `language-mcfunction` -> `syntax-mcfunction`
* Change Cairo grammar repo to software-mansion-labs/cairo-tm-grammar

Signed-off-by: Marek Kaput <[email protected]>

* Rename `Cairo` to `Cairo 0` to reflect official language name change

Signed-off-by: Marek Kaput <[email protected]>

* Add Cairo language and heuristics to disambiguate it from Cairo 0

Signed-off-by: Marek Kaput <[email protected]>

* Add CASM samples to help classifier identify them as Cairo 0

Signed-off-by: Marek Kaput <[email protected]>

* Add more samples for Cairo and Cairo 0

* Remove Cairo 0 heuristics

This commit partially reverts a718fec

* Change language IDs for Cairo langs as asked in review, and group them

* Revert "Remove Cairo 0 heuristics"

This reverts commit 25dd32a.

* Add `ap++` sequence to Cairo 0 heuristic

* Assume Cairo if no Cairo 0 heuristic match

* Rename `Cairo 0` to `Cairo Zero`

This change has been suggested by the StarkWare Product Team,
so here it is.

---------

Signed-off-by: Marek Kaput <[email protected]>
* Update the references to the modern qsharp repository.

* Update yml files
* Add entry to language.yml and grammar

* Add samples

* Run script/update-ids

* Fix iCalendar's language.yml entry

* Add missing trailing newlines

* Update lib/linguist/languages.yml

Co-authored-by: John Gardner <[email protected]>

---------

Co-authored-by: John Gardner <[email protected]>
* Add vCard with sample

* Add sample + remove comment in yml

* Add vCard grammar

* Add id

* Edit aliases

* Add vcf to TSV + heuristics

* Add test
* Add language

* Add sample 1

https://github.com/AnywhereSoftware/B4X-Pleroma/blob/master/OAuth.bas

* Add sample 2

https://github.com/AnywhereSoftware/B4X-Pleroma/blob/master/RequestsManager.bas

* Language Id + Sample + Grammar

* Add heuristic

* Edit .bas heuristic test

* Edit heuristic

* Handle BOM issue with heuristic

* Limit search in the first 10 lines

* Simplify heuristic

* Simplify heuristic further

* Adjust heuristic

This commit moves the check for BOM at the start of the file and fixes a potential problem of compatibility with re2.
Note that `{3}?` in re2 is interpreted as matching the previous token exactly 3 times, while the Oniguruma engine interprets this as matching 3 or 0 times.

* Remove redundant `^`

* Use portable version
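The `{3}?` ambiguity mentioned in the "Adjust heuristic" commit can be illustrated with a small sketch. Python's `re` module is used here purely as a stand-in: it is neither re2 nor Oniguruma, but it also reads `{3}?` as a lazy "exactly 3 times" quantifier, which is the reading the commit message attributes to re2. The explicit optional group is an assumption about what a portable spelling could look like; the commits do not show the pattern that was actually used.

```python
import re

# Ambiguous across engines: a lazy "exactly 3" here, "3 or 0 times" in Oniguruma.
AMBIGUOUS = re.compile(r"a{3}?")

# Explicit optional group: "3 or 0 times" regardless of how `{n}?` is read.
PORTABLE_OPTIONAL = re.compile(r"(?:a{3})?")


def accepts(pattern, text):
    """Return True if `pattern` matches the whole of `text`."""
    return pattern.fullmatch(text) is not None
```

Under Python's reading, `accepts(AMBIGUOUS, "")` is false and `accepts(AMBIGUOUS, "aaa")` is true, whereas `PORTABLE_OPTIONAL` accepts both the empty string and `"aaa"`.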
revise: updates the WDL language grammar
* Update heuristics.yml

Fix .yy heuristic to account for changes in property name in GMStudio 2.3

* Add sample

* Fix generated detection (WIP)

* Relax the constraint that the property has to be on the 3rd line

* Targeting JSON's heuristic directly

* Remove outdated comment
* chore: update grammar

* Revert "chore: update grammar"

This reverts commit c756098.

* Re-replace grammar

---------

Co-authored-by: Colin Seymour <[email protected]>
* Use match?

* Remove double-negation
* Add uv.lock to languages.yml as a TOML file

* Use a smaller sample file
* Add initial support for carbon

* Apply custom language ID to carbon

* Carbon classes

* example window creation

* Removing the .cb file extension for Carbon

* Puts V/Go syntax in Carbon syntax highlighting

* thanks for the fix

Co-authored-by: John Gardner <[email protected]>

* Carbon in vendor/README.md

---------

Co-authored-by: John Gardner <[email protected]>
* Add support for `HOSTS.TXT` files

* Update license hash
* Add `.peggy` for PEG.js

* Swap `semver` sample for `abnfp` for peggy
* Add extra aliases for vimscript

* Update lib/linguist/languages.yml

---------

Co-authored-by: Colin Seymour <[email protected]>
* New Centroid-based Classifier

Training:

* A fixed vocabulary is set to all tokens that appear in at least 2
  samples.
* All out-of-vocabulary tokens are discarded.
* For every token, we set its Inverse Class Frequency (ICF) to
`log(ct / cf) + 1` where `ct` is the total number of classes and `cf` is
the number of classes where the token occurs.
* Each sample is converted to a vector of `tf * icf` for every token in
the vocabulary. `tf` is `1 + log(freq)`, where `freq` is the
number of occurrences of the token in the given sample.
* Samples are L2-normalized.
* For each class (language), we compute the centroid of all its training
samples by averaging them and L2-normalizing the result.
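Written out, the training weights above look roughly as follows. The symbols `v_s` (weighted vector of sample `s`), `S_k` (training samples of class `k`), `m_k` (their mean) and `c_k` (the centroid) are notation introduced here for illustration; the commit message does not state the logarithm base.

```latex
\begin{aligned}
\mathrm{icf}(t) &= \log\frac{ct}{cf(t)} + 1 \\
\mathrm{tf}(t, s) &= 1 + \log \mathrm{freq}(t, s) \\
v_s[t] &= \mathrm{tf}(t, s) \cdot \mathrm{icf}(t), \qquad \hat{v}_s = \frac{v_s}{\lVert v_s \rVert_2} \\
m_k &= \frac{1}{|S_k|} \sum_{s \in S_k} \hat{v}_s, \qquad c_k = \frac{m_k}{\lVert m_k \rVert_2}
\end{aligned}
```

For example, a token that occurs in every class has `icf = log(1) + 1 = 1`, and a token that occurs once in a sample has `tf = 1 + log(1) = 1`.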

Classification:

* For a new sample, we get the L2-normalized vector with `tf * icf`
terms for every known token, then classify the sample using the nearest
centroid. Cosine similarity is the similarity measure.
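The training and classification steps described in this commit message can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Linguist's implementation: samples are assumed to be already tokenized, vectors are plain dicts, natural logarithms are used, and the names `l2_normalize`, `vectorize`, `train` and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def l2_normalize(vec):
    """Scale a sparse vector (token -> weight) to unit L2 norm."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else dict(vec)


def vectorize(tokens, icf):
    """Build the L2-normalized tf * icf vector; unknown tokens are dropped."""
    freq = Counter(t for t in tokens if t in icf)
    return l2_normalize({t: (1 + math.log(f)) * icf[t] for t, f in freq.items()})


def train(samples):
    """samples: dict mapping class name -> list of token lists."""
    sample_count = Counter()          # number of samples containing each token
    classes_with = defaultdict(set)   # classes in which each token occurs
    for lang, docs in samples.items():
        for tokens in docs:
            for tok in set(tokens):
                sample_count[tok] += 1
                classes_with[tok].add(lang)

    # Vocabulary: tokens seen in at least 2 samples.
    vocab = {t for t, n in sample_count.items() if n >= 2}
    ct = len(samples)
    icf = {t: math.log(ct / len(classes_with[t])) + 1 for t in vocab}

    # Centroid per class: mean of normalized sample vectors, re-normalized.
    centroids = {}
    for lang, docs in samples.items():
        total = defaultdict(float)
        for tokens in docs:
            for tok, w in vectorize(tokens, icf).items():
                total[tok] += w
        centroids[lang] = l2_normalize({t: w / len(docs) for t, w in total.items()})
    return icf, centroids


def classify(tokens, icf, centroids):
    """Return the class whose centroid has the highest cosine similarity."""
    vec = vectorize(tokens, icf)

    def cosine(centroid):
        # Both vectors are unit length, so the dot product is the cosine.
        return sum(w * centroid.get(t, 0.0) for t, w in vec.items())

    return max(centroids, key=lambda lang: cosine(centroids[lang]))
```

Because the sample vector and every centroid are unit length, the dot product alone gives the cosine similarity, so no further normalization is needed at classification time.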

* Fixture file is now detected as Raku

* Update lib/linguist/samples.rb

Co-authored-by: Colin Seymour <[email protected]>

* Update test/test_classifier.rb

Co-authored-by: Colin Seymour <[email protected]>

* Add exec bit

* Adjust acceptable errors

* Remove two useless samples

* Add a better R sample

* Remove fixmes

* Remove empty lines

---------

Co-authored-by: Colin Seymour <[email protected]>
Co-authored-by: John Gardner <[email protected]>
* Add the "LiveCode Script" language.

* Add examples for the `*.lc` extension

* Removing the ".lc" extension and its samples

* Update vendor/licenses/git_submodule/vscode-livecodescript.dep.yml

---------

Co-authored-by: Colin Seymour <[email protected]>
* Switch PEG.js TM Scope to `source.peggy`

* Add missing license

* Re-gen grammar list

---------

Co-authored-by: Colin Seymour <[email protected]>
* Add Dune

* Remove dune-file which only has one use

* Merge all Dune entries into the same languages

Since they all share the same grammar, they should just be considered as
one language. The grammar used also only defines one source.dune scope.

* Reduce scope to just dune-project

- `dune` is only used by a bit over 100 repositories (5 pages); the
  1.8k in the search results isn't what we're counting here
- The two workspace files have even fewer uses
* add .resource extension to robot

* add resource file example

* docs

* add heuristics for RF resource files

* fix typo

* add robotframework keywords heuristic

---------

Co-authored-by: Colin Seymour <[email protected]>
@Bambi66669 Bambi66669 left a comment

@@ -0,0 +1,3 @@
# Available versions: https://github.com/devcontainers/images/tree/main/src/ruby
FROM mcr.microsoft.com/devcontainers/ruby
RUN apt update && apt install -y cmake

Suggested change
RUN apt update && apt install -y cmake
RUN apt update && apt install -y cmake

lildude and others added 3 commits September 2, 2024 09:23
* Update all grammars

* Update cached licenses

* v8.0.0

* Correct license type

* Update grammars

* Update cached licenses
* Update number of acceptable classification errors.

* Update number of acceptable errors when using --all
* Update Move grammar

* Update cached license

* v8.0.1
@lildude lildude deleted the branch octocat:master September 17, 2024 15:31
@lildude lildude deleted the master branch September 17, 2024 15:31