forked from cleanlab/cleanlab
Regression ranking #1
Open
krmayankb
wants to merge
269
commits into
master
from
regression
Changes from 3 commits
f2c5862
rank.py updates
krmayankb 7d97c26
outre: code updated for name for second method
krmayankb 305d11b
Apply Docstring suggestions from code review
krmayankb e9468dd
Support for array_like labels and predictions
krmayankb 6dbe3fb
Merge branch 'regression' of https://github.com/krmayankb/cleanlab in…
krmayankb 48c3f57
doctring for method modified
krmayankb 65e1a3c
datapoint -> example
krmayankb 7b15cba
check_valid_inputs update
krmayankb b9b9104
tutorial removed
krmayankb c05f1fe
support for array_like
krmayankb d6ac642
unit tests to factor array_like
krmayankb 2b327c3
Update docs/source/tutorials/index.rst
krmayankb 4283a67
added basic regression ranking
krmayankb 53455bf
minor fixes, docstring modified
krmayankb 987ae0e
tutorial added, added to docs index pages
krmayankb 7f9372b
unit tests added
krmayankb 581c1f0
reindexed tutorial, punctuation fix for docstring
krmayankb 13ab45e
plots changed in tutorial notebook
krmayankb 0eac776
typo fix
krmayankb 1a65c9a
cleanlab outlier based scoring method added
krmayankb e8a9a49
regression_utils created
krmayankb 98930fc
pred_labels changed to predictions
krmayankb e4e6307
unit tests for new scoring method
krmayankb af2454b
init merge conflict resolved
krmayankb be8afaa
tutorial draft1
krmayankb ea2f723
tutorial draft1
krmayankb f9af6eb
merge conflict
krmayankb 00bcf61
default modified for method in docstring
krmayankb 542e30f
grammatical correction in rank.py
krmayankb 3958b58
Update cleanlab/regression/rank.py
krmayankb db0bb5d
rank.py updates
krmayankb 0ea2981
outre: code updated for name for second method
krmayankb 9ab2092
Support for array_like labels and predictions
krmayankb c078d67
Apply Docstring suggestions from code review
krmayankb d1518da
doctring for method modified
krmayankb a819fe4
datapoint -> example
krmayankb ac52da7
check_valid_inputs update
krmayankb 8394ee1
tutorial removed
krmayankb 569b2ff
support for array_like
krmayankb 86532b0
unit tests to factor array_like
krmayankb cb596a9
Update docs/source/tutorials/index.rst
krmayankb 27ccc26
merge master to regression
krmayankb 313faee
Merge branch 'regression' of https://github.com/krmayankb/cleanlab in…
krmayankb 6c16a1a
unused imports removed
krmayankb f2b5f4e
Merge branch 'cleanlab:master' into regression
krmayankb 54ae993
tutorial added
krmayankb a25b236
default, frac_neighbors 0.1 -> 0.5
krmayankb 6bf61c6
updated tutorial notebook
krmayankb 15bfa43
review suggestion updated
krmayankb feb5797
suggestion in test corrected
krmayankb 3a49cfd
copyright updated
krmayankb cdfa82d
Error message suggestions updated
krmayankb 7a25675
Copyright update
krmayankb f4571b9
Suggestions from code review
krmayankb 22728d8
black formatting
krmayankb 79a509f
import cell correction
krmayankb 96d4ae0
estimator update to 10
krmayankb 1fe5b37
example in docstring updated
krmayankb 56bc771
chracterization test added
krmayankb 3d71ae1
notebook output cleared
krmayankb 9bca415
Fix broken link
anishathalye 36b46c1
mention applications beyond label error detection in readme (#580)
jwmueller aa1542b
Drop python 3.6 support from dependencies in setup.py (#579)
sanjanag 1e078e0
specify better default values
jwmueller 0d9fee8
add maximum line length (#583)
cgnorthcutt 044c5aa
ignore flake8 flagging unused submodule imports
jwmueller 7caed93
Update github actions (#589)
ulya-tkch 248bb91
Revamp text tutorial to use cleanlab Keras wrapper (#584)
huiwengoh fdfb029
clarify thresholding in issues_from_scores (#582)
jwmueller d911acc
Remove temp scaling from single annotator case (#590)
huiwengoh fe9efda
update docs dependencies (#593)
huiwengoh ebadffd
Use euclidean distance for identifying outliers for lower dimensional…
ulya-tkch 7b589f6
updating copyright year to include 2023 (#594)
aditya1503 4a5d065
Handle missing type parameters for generic type "ndarray" (#587)
elisno feb3696
decide -> suggest
jwmueller 71e21a9
remove temp scaling from ensemble active learning when data has singl…
huiwengoh 503a57a
Adding type hints for mypy strict compatibility (#585)
unna97 1cf563c
fix typo in outliers.ipynb (#603)
eltociear a31b3ca
tags in links (#604)
cmauck10 2b6c564
10x speedup in find_label_issues on linux via better multiprocessing …
clu0 b9b981d
add cleanlab/projects to ignored URLs check list
jwmueller 9f655a7
remove slash at end (#606)
jwmueller f9d32b1
Update tabular tutorial with better language (#609)
cmauck10 888246e
Improve num_label_issues usage of confident_joint to match find_label…
ulya-tkch 5f9fd95
update crowdlab paper name
jwmueller d6f40d8
crowdlab paper name update
jwmueller 5c1ba64
Removed duplicate classifier from setup.py (#612)
sanjanag 0c02ec9
Add two methods to filter.find_label_issues (#595)
cgnorthcutt 35d5323
Fix dictionary type annotation for OutOfDistribution object (#616)
ulya-tkch 266d947
shorten notebook name in link (#617)
jwmueller 6ed61e0
cap black version in CI (#618)
jwmueller c191d87
Fix format compatibility with latest black==23. release (#620)
ulya-tkch 6aaee83
Create new cleanlab.models module (#601)
huiwengoh e2611e7
upgrade torch in docs (#607)
jwmueller 5f6493f
fix bug: confidences -> confidence (#623)
jwmueller d99788b
Fixed duplicate issue removal in find_label_issues (#624)
ulya-tkch 686cbf6
Method to estimate label issues with limited memory via mini-batches …
jwmueller 5319a12
Make label_issues_batched documentation appear (#627)
jwmueller ac98282
add example script for find_label_issues_batched (#629)
jwmueller 4d2753a
Fix KerasWrapper summary method (#631)
huiwengoh 6acc7ae
Clarify rank.py not for multi-label classification (#626)
ulya-tkch 2adb8b6
Removed $ from shell commands to avoid it being copied (#625)
sanjanag 4850050
label_issues_batched multiprocessing (#630)
clu0 f4572dc
bugfix: missing self.n_jobs
jwmueller 8c424d3
mypy tpye annotations
jwmueller 731dd41
Support Zarr files in find_label_issues_batched (#632)
jwmueller c85bc7e
default n_jobs in label issues batched to =1 (#633)
jwmueller 3757637
Fix batched multiprocessing being slower on tall matrices (#634)
clu0 6eb6019
update tests to be more stringent (#635)
jwmueller 570ecbd
Switch to typing.Self (#489)
anishathalye 1deb349
list more tasks in quickstart (#640)
jwmueller 497947a
Fix tutorial hyperlinks in docs (#642)
huiwengoh dbc8711
Documentation improvements (#643)
huiwengoh 38e1dab
bump version number
jwmueller b8c034f
add 2.3.0 to release versions (#644)
jwmueller 4606fec
update readme links to point to latest docs (#645)
jwmueller b1bfdec
resize readme image (#650)
jwmueller 2a16c47
post v2.3.0 release version bump (#646)
jwmueller 44081c6
add activelab name to docs (#648)
jwmueller 70a2ed2
Add clipping of small probabilities to address issue #639 (#647)
ulya-tkch 0077664
Fix bug with call to find_overlapping_issues without specifying label…
huiwengoh fa1db6e
Bug fixes + improvements to multiannotator module (#654)
huiwengoh a5f35f3
additional faq question on handling train vs test data (#655)
jwmueller bed94f1
Update readme to better reflect current package (#656)
jwmueller 229718d
formatting+typos
jwmueller 7519e5d
update version for 2.3.1 release (#658)
jwmueller 29a930d
bump git version past stable version (#659)
jwmueller 6e247d5
link faq again at bottom of readme
jwmueller 8cf089e
add section on practicing data-centric ai to readme (#660)
jwmueller c464848
clarify no class is also an option
jwmueller 64c6cfc
Pass confident joint computed in CleanLearning to filter.find_label_i…
huiwengoh 352d904
Add Example codeblock to the docstrings of important functions in the…
Steven-Yiran 6d67175
Extract function for computating ood scores from distances (#664)
elisno 2435a5a
Added code block examples for remaining methods in the cleanlab.datas…
Steven-Yiran 1e653de
remove min batch size restriction in LabelInspector (#665)
huiwengoh f8c1866
move methods to multilabel_classification module (#657)
aditya1503 0fdf398
move int2onehot, onehot2int to top of multilabel tutorial (#666)
jwmueller d45a508
Update softmax to be more numerically stable (#667)
ulya-tkch ee5ed9b
Ensure multilabel docs appear in the documentation (#669)
jwmueller fd5fd2c
update fasttext installation requirement (#671)
huiwengoh b22b179
add dcai course to resources
cgnorthcutt 64edc95
Introduce Datalab (#614)
elisno 6353e54
Improve Datalab docstrings and fix "See also" block (#678)
elisno 9c32337
Update text and tabular tutorial (#673)
huiwengoh ff3bdd1
Add a less restrictive `Reporter.report()` method for Datalab (#680)
elisno 5650a87
Remove the "health_summary_kwargs" dictionary argument from LabelIssu…
elisno 1883cd6
Add descriptions of issues that Datalab can detect to docs (#682)
elisno aa2423b
Add docs link to description of issue types in tutorial (#684)
elisno d5e81f0
allow for kwargs in token find_label_issues (#686)
jwmueller 9ee78be
Fix report output formatting (#687)
elisno 2d00c91
Update numpy.typing import and annotations (#688)
elisno e33b757
MAINT: standardize documentation and simplify code for outlier (#689)
DerWeh ac64c9b
Resize plots in Datalab tutorials (#690)
elisno f8b11d3
refactor(datalab): ♻️ rename get_summary -> get_issue_summary (#691)
elisno a95a447
Update tests for custom issue manager example (#692)
jwmueller 036dcf3
Add testable example codeblock to functions in the dataset module (#668)
Steven-Yiran c999d84
change transformer model name (#693)
huiwengoh f77ef9f
increase hypothesis deadline to 500ms for test_find_overlapping_class…
elisno 5ee2eb8
Drop summary scores from Datalab.report() (#699)
elisno 79cc3eb
DOC: use default rules for shorter, more readable links (#700)
DerWeh 8e83b37
test_all_close commented
krmayankb f1755e4
Merge branch 'regression' of https://github.com/krmayankb/cleanlab in…
krmayankb 6df4974
Merge branch 'master' into regression
krmayankb 9e840a3
unit test fixed
krmayankb 7354708
Merge branch 'regression' of https://github.com/krmayankb/cleanlab in…
krmayankb 49559ef
Add dev note about linking functions (#704)
huiwengoh be1bd69
Seed noniid permutation tests (#694)
elisno cf5d2a0
Added installation instructions for package extras (#697)
sanjanag a3ae30f
CI tests without optional dependencies and min-versions installed (#701)
sanjanag c41c837
Set non iid num issues with an extra rule (#707)
elisno d16cdd6
Suppress too_slow health check for dataset generation strategy (#706)
elisno fcc9417
Simplify reporting for outlier issues (#705)
elisno 6ff06c8
ignore twitter in link check (#708)
jwmueller 7400051
Fix unbound knn variable in NonIIDIssueManager (#709)
elisno 87d9b73
Clarify when datalab quality score is / isnt comparable (#710)
jwmueller 11f23d6
Update pseudocode for example issue type in datalab guide (#711)
jwmueller 1767668
Add section for inspecting near duplicate issues (#712)
elisno 525b4b2
Datalab docs improvements (#714)
huiwengoh 4198e0a
headers for multilabel_classification modules docs (#716)
jwmueller 70d70f2
multilabel docs formatting fixes (#717)
huiwengoh 832a544
update readme for datalab release (#713)
jwmueller 05d45fd
Update quickstart to reflect Datalab (#719)
jwmueller 2193656
more updates to quickstart to reflect datalab (#720)
jwmueller 5830714
update RELEASE_VERSIONS to include v2.4 (#715)
jwmueller d3c385f
readme space
jwmueller 147d383
update clipping to use value from constants.py
huiwengoh 5814b62
Merge branch 'cleanlab:master' into regression
huiwengoh 6e59db3
bump version past v2.4 (#721)
jwmueller 4eddcde
v0 of cleanlearning
huiwengoh 375c76f
Include non-iid in default issue checks (#723)
elisno 72d4595
Delete repeated lines in CleanLearning.save_space (#724)
huiwengoh 8dc6fd6
add sample weight
huiwengoh 0095c46
add save_space
huiwengoh 6f14791
add some type checking + error catching
huiwengoh 48fd0a0
add unittests
huiwengoh 7ed00cf
clarify order matters for non-IID issue type (#726)
jwmueller b367479
fix typing
huiwengoh 1c720ff
add docs structure
huiwengoh 2fad7cf
Add option to purchase commerical license (#725)
cgnorthcutt 999a1bd
phrasing
jwmueller 114201b
ENH: make clipping unnecessary for entropy (#703)
DerWeh 624d639
add docstrings
huiwengoh a6381b0
add docs for helper methods
huiwengoh d761159
fix mypy
huiwengoh 8cdcdf7
edit datalab opening lines (#731)
jwmueller 99be5af
add more tips to understand issue types in datalab tutorial (#734)
jwmueller 43e24ba
Make labels optional in Datalab (#730)
elisno 7f3008f
add tutorial
huiwengoh 48f83b1
fix mypy
huiwengoh da65a97
identifying label errors in Object Detection data (#676)
ulya-tkch f851cc2
Merge branch 'master' into regression
huiwengoh bf3b56c
Add note about tensorflow>=2.11 (#738)
huiwengoh 7d84aa4
Bugfix object detection visualize function(#739)
ulya-tkch 95eb858
Suggestions from review
huiwengoh 7359318
update notebook + misc cleanup
huiwengoh 40955ee
Merge branch 'cleanlab:master' into regression
huiwengoh a0f4906
add unittests
huiwengoh c1420a0
update uncertainty functions
huiwengoh 336dd1a
reorder tutorials (#741)
jwmueller 9b2dfc0
Fix CI (#742)
anishathalye d51fa09
more detailed blurb (#743)
jwmueller 2922d7e
Set more datalab tutorials as some of the main tutorials (#729)
elisno d3302c0
include n_jobs in cleanlearning kwargs docstring (#744)
jwmueller 299d66f
Update object detection tutorial to clarify out-of-sample predictions…
ulya-tkch 860bd6f
Dont print recompute joint warning when estimation_method = off_calib…
gordon-lim 5f3de05
Temporary Github Actions workaround: Pin Python 3.7.16 (#748)
ulya-tkch a95fc07
reduce datalab tutorial plots to better fit into scroll window (#751)
tataganesh a62dc55
clarify OHE is optional
jwmueller 050628a
clarify block is about your dataset
jwmueller 8f2f580
remove label issue terminology from quickstart
jwmueller 7310dbe
language for cleanlearning description
jwmueller 44bc19d
clarify what label is
jwmueller 7bc1563
clarify label issue
jwmueller ff10d47
second defining of label issue
jwmueller b2ab7ce
comma to slash
jwmueller 9bb39ca
erroneous
jwmueller 7eba836
rank module header language
jwmueller 46fb58f
rmv ...
jwmueller 53d73a2
clarify second set of label quality scores
jwmueller f12153f
typo fix
jwmueller c04fa09
discuss methods and paper
jwmueller 26e909d
cleanlearning docstring edits
jwmueller 35606fc
label error detection in semantic segmentation datasets (#677)
vdlad 2857333
list dmlr papers (#753)
jwmueller 66516d9
duplicated issues
huiwengoh 65c08c0
Merge branch 'master' into regression
huiwengoh eb514d5
make methods private
huiwengoh d53b6a6
clear notebook outputs
huiwengoh 3da126d
black formatting
huiwengoh f138ae8
update docs
huiwengoh
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -8,3 +8,4 @@ |
| | | from . import multiannotator |
| | | from . import outlier |
| | | from . import token_classification |
| | | from . import regression |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1 @@ |
| | | from . import rank |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,48 @@ |
| | | import numpy as np |
| | | |
| | | """ generate label quality score for regression dataset""" |
| | | |
| | | |
| | | def get_label_quality_scores(labels: np.ndarray, pred_labels: np.ndarray) -> np.ndarray: |
| | | """ |
| | | Returns label quality score for each example in the regression dataset. |
| | | |
| | | Each score is a continuous value in the range [0, 1]: |
| | | 1 - clean label (given label is likely correct). |
| | | 0 - dirty label (given label is likely incorrect). |
| | | |
| | | Parameters |
| | | ---------- |
| | | labels: |
| | | Raw labels from original dataset. |
| | | Array of shape ``(N,)`` consisting of the given labels, where N is the number of datapoints in the regression dataset. |
| | | |
| | | pred_labels: |
| | | Predicted labels from a regressor fitted on the dataset. |
| | | Array of shape ``(N,)`` consisting of the predicted labels, where N is the number of datapoints in the regression dataset. |
| | | |
| | | Returns |
| | | ------- |
| | | label_quality_scores: |
| | | Array of shape ``(N,)`` of scores between 0 and 1, one per datapoint in the dataset. |
| | | |
| | | Lower scores indicate datapoints more likely to contain a label issue. |
| | | |
| | | Examples |
| | | -------- |
| | | >>> import numpy as np |
| | | >>> from cleanlab.regression.rank import get_label_quality_scores |
| | | >>> labels = np.array([1,2,3,4]) |
| | | >>> pred_labels = np.array([2,2,5,4.1]) |
| | | >>> label_quality_scores = get_label_quality_scores(labels, pred_labels) |
| | | >>> label_quality_scores |
| | | array([0.36787944, 1. , 0.13533528, 0.90483742]) |
| | | """ |
| | | |
| | | assert ( |
| | | labels.shape == pred_labels.shape |
| | | ), f"shapes of labels {labels.shape} and predicted labels {pred_labels.shape} are not the same." |
| | | |
| | | residual = pred_labels - labels |
| | | quality_scores = np.exp(-abs(residual)) |
| | | return quality_scores |

huiwengoh marked this conversation as resolved.
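For reviewers who want to try the scoring rule outside the package: the diff computes `exp(-|pred_labels - labels|)` per example. The snippet below is only a sketch of that rule, not the code in this PR. The name `label_quality_scores_sketch` is hypothetical, and both the `np.asarray` coercion and the `ValueError` (used instead of `assert`, which is skipped under `python -O`) are assumptions added for illustration.

```python
import numpy as np


def label_quality_scores_sketch(labels, predictions):
    """Hypothetical stand-in for the scoring rule shown in the diff."""
    # Coerce array_like inputs (lists, tuples, ...) to float arrays.
    labels = np.asarray(labels, dtype=float)
    predictions = np.asarray(predictions, dtype=float)

    # Raise explicitly rather than assert, so the check survives `python -O`.
    if labels.shape != predictions.shape:
        raise ValueError(
            f"labels has shape {labels.shape} but predictions has shape {predictions.shape}"
        )

    # Score = exp(-|residual|): 1.0 for an exact match, decaying toward 0.
    return np.exp(-np.abs(predictions - labels))


# Same inputs as the docstring example in the diff.
scores = label_quality_scores_sketch([1, 2, 3, 4], [2, 2, 5, 4.1])

# Indices sorted so the lowest scores (most suspicious labels) come first.
ranked = np.argsort(scores)
print(scores)
print(ranked)
```

Note that the score uses the raw residual, so it depends on the units of the labels: rescaling the targets changes every score.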
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,8 @@ |
| | | regression |
| | | ============== |
| | | |
| | | .. automodule:: cleanlab.regression |
| | | :autosummary: |
| | | :members: |
| | | :undoc-members: |
| | | :show-inheritance: |
Large diffs are not rendered by default.