Test that license texts match SPDX plain license texts #636
The latest SPDX version changes the text of the MIT license slightly compared to the version currently on choosealicense.com. Do you have plans for handling old and new versions of licenses that change over time? I use spdx-license-list when scaffolding new projects, and its latest version updates its list to v3.4, which includes the change. Since updating to this version, my new projects show unrecognized licenses, such as this one.
@travi thanks for pointing that out. The change is the optional text added at spdx/license-list-XML@ca17b91#diff-a3960b442eb635386ec51a5d6d15af2d. For better or worse, SPDX doesn't (AFAIK) distinguish between optional-but-uncommon and optional-but-preferred text, and it outputs all optional text in https://github.com/spdx/license-list-data/blob/2d27e4c31441af8f343eba0293d03d27707d9c02/text/MIT.txt. I don't think we can or should move to an MIT text that includes the optional text here. We can't, because that would cause license detection problems for most existing MIT licenses, given the way licensee (which GitHub uses) is tied to the texts curated here (choosealicense.com). We shouldn't, because I'd rather encourage adoption of the most widely used text, which doesn't include the added optional text. There are tons of variations on the MIT text; I've linked a paper about that a few times. I don't have a plan to implement, but here's what I'd like to see:
I would recommend not including the optional text now published by SPDX. For anyone who insists on including it: yes, GitHub will identify that there is a license, but not which one, and will show "View license" rather than "MIT". Presently the only way for licensee to deal with optional text is to normalize it away before matching, but I think a variation would need to be super common to justify doing that. Feel free to open an issue in licensee/licensee if you want to pursue it there.
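For illustration, here is a rough Python sketch of what "normalize it away before matching" could look like. It is not licensee's actual code (licensee is written in Ruby); the lowercasing, the whitespace collapsing, and the optional phrase used in the usage example are all assumptions made for the example, and that phrase is not necessarily the text SPDX added.

```python
import re


def normalize(text: str, optional_phrases: list[str]) -> str:
    """Lowercase, strip the listed optional phrases, and collapse whitespace."""
    # Collapse whitespace first so a phrase still matches when line wrapping differs.
    text = re.sub(r"\s+", " ", text).strip().lower()
    for phrase in optional_phrases:
        phrase = re.sub(r"\s+", " ", phrase).strip().lower()
        text = text.replace(phrase, " ")
    # Removing a phrase leaves extra spaces behind, so collapse again.
    return re.sub(r"\s+", " ", text).strip()


def matches(candidate: str, reference: str, optional_phrases: list[str]) -> bool:
    """True if the texts are identical once optional phrases are normalized away."""
    return normalize(candidate, optional_phrases) == normalize(reference, optional_phrases)


# Hypothetical optional phrase, for illustration only.
OPTIONAL = ["(including the next paragraph)"]
reference = "The above copyright notice and this permission notice shall be included."
candidate = ("The above copyright notice and this permission notice\n"
             "(including the next paragraph) shall be included.")
print(matches(candidate, reference, OPTIONAL))  # True
print(matches(candidate, reference, []))        # False
```

Every optional phrase has to be listed explicitly, which is why this only seems worth doing for very common variations.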
My tool may help with this issue. The two enhancements will make it easier to test continuously.
My tool can output the degree of similarity between documents and their word counts, using a library called gensim. Currently there is no spdx/LGPL-3.0. For example, the similarity between the following two license files was 0.796; the difference in word count equals the number of words in the header section. The comparison results for all other license texts are as follows. You can confirm with the following search that the files in choosealicense.com-gh-pages/_licenses and the "spdx license plain texts" were all similar.
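gensim isn't in the standard library, so this minimal sketch swaps in Python's difflib to produce a similar pair of numbers: a similarity score over the two word sequences and the difference in word counts. SequenceMatcher.ratio() is a different measure from gensim's, so its scores can't be compared with the 0.796 figure above; compare_texts is a hypothetical helper name.

```python
from difflib import SequenceMatcher


def compare_texts(a: str, b: str) -> tuple[float, int]:
    """Return (similarity in [0, 1], absolute difference in word count)."""
    words_a = a.split()
    words_b = b.split()
    # ratio() is 2*M/T, where M is the number of matching words and T the total
    # number of words in both sequences; 1.0 means the word sequences are identical.
    # autojunk=False stops common words from being ignored in long texts.
    similarity = SequenceMatcher(None, words_a, words_b, autojunk=False).ratio()
    return similarity, abs(len(words_a) - len(words_b))


# A text with one extra header word: the word-count difference is the header length.
print(compare_texts("header a b c", "a b c"))
```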
@darkmorpher licensee can recognize both the GNU-hosted text and the SPDX version. You're pointing to a non-master branch in the MT repo. There is no LICENSE or COPYING file in the root of the master branch; that's why no license is detected. If you find a bug that you can reproduce in licensee, please open an issue in the licensee repo.
IMO this is not a desirable goal as long as SPDX tampers with the original plain-text version (if any) of a license; also see https://github.com/spdx/license-list-data/issues/44. This is because SPDX does not take a plain-text license as-is, but regenerates it from its own XML representation (as also described in the original post of this issue).
@sschuberth yes, I'm well aware of that. As I've written before (but am too lazy to search for now), I'd love to see the SPDX plain-text renderings be as close as possible to the canonical plain-text versions of licenses, and over the years I have contributed a few small fixes toward that. As I wrote in the issue comment above:
RE: @mlinksva (if this is still an issue) As a test case, can you add one of these GitHub Actions to compare the plain-text license and SPDX data files in a new branch? Granted, all required files would have to be copied there too, and the repo would end up with duplicate files.
We should have a test that each license text in _licenses is the same as the plain-text license in the SPDX collection, to automate the requirement described at https://github.com/github/choosealicense.com/blob/gh-pages/CONTRIBUTING.md#adding-a-license. The test could clone spdx/license-list-data and compare each license we have cataloged in this project. Many existing licenses would probably have to be marked as expected failures due to bugs in SPDX output and discrepancies in how this project has cataloged some licenses. But we should address these upfront for any new license cataloged here, and continue to chip away at the existing inconsistencies.
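A minimal Python sketch of such a test, assuming spdx/license-list-data has already been cloned and its text/ directory is passed in as spdx_text_dir, and assuming each file in _licenses starts with YAML front matter containing an spdx-id key. The front-matter parsing is deliberately naive (a real test would use a YAML parser), and the comparison only collapses whitespace, so other differences, such as placeholder conventions, still count as mismatches; that is where the expected-failures set comes in. split_front_matter and find_mismatches are hypothetical names.

```python
import re
from pathlib import Path

FRONT_MATTER = re.compile(r"\A---\s*\n(.*?)\n---\s*\n", re.DOTALL)


def split_front_matter(raw: str) -> tuple[dict[str, str], str]:
    """Naively return (top-level `key: value` pairs, license body)."""
    match = FRONT_MATTER.match(raw)
    if not match:
        return {}, raw
    meta = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        # Skip list items and indented continuation lines.
        if sep and not line.startswith((" ", "-")):
            meta[key.strip()] = value.strip().strip("\"'")
    return meta, raw[match.end():]


def normalize(text: str) -> str:
    """Collapse all runs of whitespace so line wrapping doesn't matter."""
    return re.sub(r"\s+", " ", text).strip()


def find_mismatches(licenses_dir: Path, spdx_text_dir: Path,
                    expected_failures=frozenset()) -> tuple[list[str], list[str]]:
    """Return (unexpected mismatches, expected failures that now match)."""
    unexpected, now_passing = [], []
    for path in sorted(licenses_dir.glob("*.txt")):
        meta, body = split_front_matter(path.read_text(encoding="utf-8"))
        # Fall back to the file name if there is no spdx-id (an assumption).
        spdx_id = meta.get("spdx-id", path.stem)
        spdx_path = spdx_text_dir / f"{spdx_id}.txt"
        matches = spdx_path.is_file() and normalize(body) == normalize(
            spdx_path.read_text(encoding="utf-8"))
        if not matches and spdx_id not in expected_failures:
            unexpected.append(spdx_id)
        elif matches and spdx_id in expected_failures:
            now_passing.append(spdx_id)
    return unexpected, now_passing
```

Failing CI whenever either list is non-empty would enforce the rule for new licenses while also flagging entries that can be dropped from the expected-failures set as the existing inconsistencies get fixed.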