Open
Changes from 3 commits
Commits
269 commits
f2c5862
rank.py updates
krmayankb Dec 24, 2022
7d97c26
outre: code updated for name for second method
krmayankb Dec 24, 2022
305d11b
Apply Docstring suggestions from code review
krmayankb Dec 24, 2022
e9468dd
Support for array_like labels and predictions
krmayankb Dec 24, 2022
6dbe3fb
Merge branch 'regression' of https://github.com/krmayankb/cleanlab in…
krmayankb Dec 24, 2022
48c3f57
doctring for method modified
krmayankb Dec 24, 2022
65e1a3c
datapoint -> example
krmayankb Dec 24, 2022
7b15cba
check_valid_inputs update
krmayankb Dec 24, 2022
b9b9104
tutorial removed
krmayankb Dec 27, 2022
c05f1fe
support for array_like
krmayankb Dec 28, 2022
d6ac642
unit tests to factor array_like
krmayankb Dec 28, 2022
2b327c3
Update docs/source/tutorials/index.rst
krmayankb Dec 28, 2022
4283a67
added basic regression ranking
krmayankb Oct 11, 2022
53455bf
minor fixes, docstring modified
krmayankb Nov 4, 2022
987ae0e
tutorial added, added to docs index pages
krmayankb Nov 10, 2022
7f9372b
unit tests added
krmayankb Nov 10, 2022
581c1f0
reindexed tutorial, punctuation fix for docstring
krmayankb Nov 10, 2022
13ab45e
plots changed in tutorial notebook
krmayankb Nov 15, 2022
0eac776
typo fix
krmayankb Dec 8, 2022
1a65c9a
cleanlab outlier based scoring method added
krmayankb Dec 9, 2022
e8a9a49
regression_utils created
krmayankb Dec 9, 2022
98930fc
pred_labels changed to predictions
krmayankb Dec 12, 2022
e4e6307
unit tests for new scoring method
krmayankb Dec 22, 2022
af2454b
init merge conflict resolved
krmayankb Dec 23, 2022
be8afaa
tutorial draft1
krmayankb Dec 22, 2022
ea2f723
tutorial draft1
krmayankb Dec 22, 2022
f9af6eb
merge conflict
krmayankb Dec 23, 2022
00bcf61
default modified for method in docstring
krmayankb Dec 23, 2022
542e30f
grammatical correction in rank.py
krmayankb Dec 23, 2022
3958b58
Update cleanlab/regression/rank.py
krmayankb Dec 23, 2022
db0bb5d
rank.py updates
krmayankb Dec 24, 2022
0ea2981
outre: code updated for name for second method
krmayankb Dec 24, 2022
9ab2092
Support for array_like labels and predictions
krmayankb Dec 24, 2022
c078d67
Apply Docstring suggestions from code review
krmayankb Dec 24, 2022
d1518da
doctring for method modified
krmayankb Dec 24, 2022
a819fe4
datapoint -> example
krmayankb Dec 24, 2022
ac52da7
check_valid_inputs update
krmayankb Dec 24, 2022
8394ee1
tutorial removed
krmayankb Dec 27, 2022
569b2ff
support for array_like
krmayankb Dec 28, 2022
86532b0
unit tests to factor array_like
krmayankb Dec 28, 2022
cb596a9
Update docs/source/tutorials/index.rst
krmayankb Dec 28, 2022
27ccc26
merge master to regression
krmayankb Dec 29, 2022
313faee
Merge branch 'regression' of https://github.com/krmayankb/cleanlab in…
krmayankb Dec 29, 2022
6c16a1a
unused imports removed
krmayankb Dec 29, 2022
f2b5f4e
Merge branch 'cleanlab:master' into regression
krmayankb Dec 29, 2022
54ae993
tutorial added
krmayankb Dec 30, 2022
a25b236
default, frac_neighbors 0.1 -> 0.5
krmayankb Dec 30, 2022
6bf61c6
updated tutorial notebook
krmayankb Dec 30, 2022
15bfa43
review suggestion updated
krmayankb Dec 30, 2022
feb5797
suggestion in test corrected
krmayankb Dec 30, 2022
3a49cfd
copyright updated
krmayankb Dec 30, 2022
cdfa82d
Error message suggestions updated
krmayankb Dec 30, 2022
7a25675
Copyright update
krmayankb Dec 30, 2022
f4571b9
Suggestions from code review
krmayankb Dec 30, 2022
22728d8
black formatting
krmayankb Dec 30, 2022
79a509f
import cell correction
krmayankb Dec 30, 2022
96d4ae0
estimator update to 10
krmayankb Dec 30, 2022
1fe5b37
example in docstring updated
krmayankb Dec 30, 2022
56bc771
chracterization test added
krmayankb Dec 30, 2022
3d71ae1
notebook output cleared
krmayankb Dec 30, 2022
9bca415
Fix broken link
anishathalye Dec 31, 2022
36b46c1
mention applications beyond label error detection in readme (#580)
jwmueller Jan 1, 2023
aa1542b
Drop python 3.6 support from dependencies in setup.py (#579)
sanjanag Jan 2, 2023
1e078e0
specify better default values
jwmueller Jan 4, 2023
0d9fee8
add maximum line length (#583)
cgnorthcutt Jan 5, 2023
044c5aa
ignore flake8 flagging unused submodule imports
jwmueller Jan 5, 2023
7caed93
Update github actions (#589)
ulya-tkch Jan 6, 2023
248bb91
Revamp text tutorial to use cleanlab Keras wrapper (#584)
huiwengoh Jan 6, 2023
fdfb029
clarify thresholding in issues_from_scores (#582)
jwmueller Jan 6, 2023
d911acc
Remove temp scaling from single annotator case (#590)
huiwengoh Jan 6, 2023
fe9efda
update docs dependencies (#593)
huiwengoh Jan 6, 2023
ebadffd
Use euclidean distance for identifying outliers for lower dimensional…
ulya-tkch Jan 7, 2023
7b589f6
updating copyright year to include 2023 (#594)
aditya1503 Jan 7, 2023
4a5d065
Handle missing type parameters for generic type "ndarray" (#587)
elisno Jan 7, 2023
feb3696
decide -> suggest
jwmueller Jan 9, 2023
71e21a9
remove temp scaling from ensemble active learning when data has singl…
huiwengoh Jan 9, 2023
503a57a
Adding type hints for mypy strict compatibility (#585)
unna97 Jan 10, 2023
1cf563c
fix typo in outliers.ipynb (#603)
eltociear Jan 19, 2023
a31b3ca
tags in links (#604)
cmauck10 Jan 19, 2023
2b6c564
10x speedup in find_label_issues on linux via better multiprocessing …
clu0 Jan 20, 2023
b9b981d
add cleanlab/projects to ignored URLs check list
jwmueller Jan 20, 2023
9f655a7
remove slash at end (#606)
jwmueller Jan 20, 2023
f9d32b1
Update tabular tutorial with better language (#609)
cmauck10 Jan 25, 2023
888246e
Improve num_label_issues usage of confident_joint to match find_label…
ulya-tkch Jan 27, 2023
5f9fd95
update crowdlab paper name
jwmueller Jan 30, 2023
d6f40d8
crowdlab paper name update
jwmueller Jan 30, 2023
5c1ba64
Removed duplicate classifier from setup.py (#612)
sanjanag Jan 30, 2023
0c02ec9
Add two methods to filter.find_label_issues (#595)
cgnorthcutt Feb 1, 2023
35d5323
Fix dictionary type annotation for OutOfDistribution object (#616)
ulya-tkch Feb 2, 2023
266d947
shorten notebook name in link (#617)
jwmueller Feb 2, 2023
6ed61e0
cap black version in CI (#618)
jwmueller Feb 2, 2023
c191d87
Fix format compatibility with latest black==23. release (#620)
ulya-tkch Feb 7, 2023
6aaee83
Create new cleanlab.models module (#601)
huiwengoh Feb 7, 2023
e2611e7
upgrade torch in docs (#607)
jwmueller Feb 7, 2023
5f6493f
fix bug: confidences -> confidence (#623)
jwmueller Feb 10, 2023
d99788b
Fixed duplicate issue removal in find_label_issues (#624)
ulya-tkch Feb 10, 2023
686cbf6
Method to estimate label issues with limited memory via mini-batches …
jwmueller Feb 11, 2023
5319a12
Make label_issues_batched documentation appear (#627)
jwmueller Feb 11, 2023
ac98282
add example script for find_label_issues_batched (#629)
jwmueller Feb 11, 2023
4d2753a
Fix KerasWrapper summary method (#631)
huiwengoh Feb 13, 2023
6acc7ae
Clarify rank.py not for multi-label classification (#626)
ulya-tkch Feb 13, 2023
2adb8b6
Removed $ from shell commands to avoid it being copied (#625)
sanjanag Feb 13, 2023
4850050
label_issues_batched multiprocessing (#630)
clu0 Feb 13, 2023
f4572dc
bugfix: missing self.n_jobs
jwmueller Feb 13, 2023
8c424d3
mypy tpye annotations
jwmueller Feb 13, 2023
731dd41
Support Zarr files in find_label_issues_batched (#632)
jwmueller Feb 13, 2023
c85bc7e
default n_jobs in label issues batched to =1 (#633)
jwmueller Feb 14, 2023
3757637
Fix batched multiprocessing being slower on tall matrices (#634)
clu0 Feb 15, 2023
6eb6019
update tests to be more stringent (#635)
jwmueller Feb 16, 2023
570ecbd
Switch to typing.Self (#489)
anishathalye Feb 19, 2023
1deb349
list more tasks in quickstart (#640)
jwmueller Feb 27, 2023
497947a
Fix tutorial hyperlinks in docs (#642)
huiwengoh Feb 28, 2023
dbc8711
Documentation improvements (#643)
huiwengoh Mar 1, 2023
38e1dab
bump version number
jwmueller Mar 1, 2023
b8c034f
add 2.3.0 to release versions (#644)
jwmueller Mar 1, 2023
4606fec
update readme links to point to latest docs (#645)
jwmueller Mar 1, 2023
b1bfdec
resize readme image (#650)
jwmueller Mar 10, 2023
2a16c47
post v2.3.0 release version bump (#646)
jwmueller Mar 13, 2023
44081c6
add activelab name to docs (#648)
jwmueller Mar 13, 2023
70a2ed2
Add clipping of small probabilities to address issue #639 (#647)
ulya-tkch Mar 13, 2023
0077664
Fix bug with call to find_overlapping_issues without specifying label…
huiwengoh Mar 13, 2023
fa1db6e
Bug fixes + improvements to multiannotator module (#654)
huiwengoh Mar 21, 2023
a5f35f3
additional faq question on handling train vs test data (#655)
jwmueller Mar 24, 2023
bed94f1
Update readme to better reflect current package (#656)
jwmueller Mar 24, 2023
229718d
formatting+typos
jwmueller Mar 24, 2023
7519e5d
update version for 2.3.1 release (#658)
jwmueller Mar 28, 2023
29a930d
bump git version past stable version (#659)
jwmueller Mar 29, 2023
6e247d5
link faq again at bottom of readme
jwmueller Apr 4, 2023
8cf089e
add section on practicing data-centric ai to readme (#660)
jwmueller Apr 4, 2023
c464848
clarify no class is also an option
jwmueller Apr 5, 2023
64c6cfc
Pass confident joint computed in CleanLearning to filter.find_label_i…
huiwengoh Apr 6, 2023
352d904
Add Example codeblock to the docstrings of important functions in the…
Steven-Yiran Apr 6, 2023
6d67175
Extract function for computating ood scores from distances (#664)
elisno Apr 7, 2023
2435a5a
Added code block examples for remaining methods in the cleanlab.datas…
Steven-Yiran Apr 8, 2023
1e653de
remove min batch size restriction in LabelInspector (#665)
huiwengoh Apr 11, 2023
f8c1866
move methods to multilabel_classification module (#657)
aditya1503 Apr 14, 2023
0fdf398
move int2onehot, onehot2int to top of multilabel tutorial (#666)
jwmueller Apr 14, 2023
d45a508
Update softmax to be more numerically stable (#667)
ulya-tkch Apr 15, 2023
ee5ed9b
Ensure multilabel docs appear in the documentation (#669)
jwmueller Apr 17, 2023
fd5fd2c
update fasttext installation requirement (#671)
huiwengoh Apr 18, 2023
b22b179
add dcai course to resources
cgnorthcutt Apr 18, 2023
64edc95
Introduce Datalab (#614)
elisno Apr 25, 2023
6353e54
Improve Datalab docstrings and fix "See also" block (#678)
elisno Apr 25, 2023
9c32337
Update text and tabular tutorial (#673)
huiwengoh Apr 27, 2023
ff3bdd1
Add a less restrictive `Reporter.report()` method for Datalab (#680)
elisno Apr 27, 2023
5650a87
Remove the "health_summary_kwargs" dictionary argument from LabelIssu…
elisno Apr 28, 2023
1883cd6
Add descriptions of issues that Datalab can detect to docs (#682)
elisno Apr 28, 2023
aa2423b
Add docs link to description of issue types in tutorial (#684)
elisno Apr 28, 2023
d5e81f0
allow for kwargs in token find_label_issues (#686)
jwmueller Apr 28, 2023
9ee78be
Fix report output formatting (#687)
elisno Apr 28, 2023
2d00c91
Update numpy.typing import and annotations (#688)
elisno May 1, 2023
e33b757
MAINT: standardize documentation and simplify code for outlier (#689)
DerWeh May 1, 2023
ac64c9b
Resize plots in Datalab tutorials (#690)
elisno May 2, 2023
f8b11d3
refactor(datalab): ♻️ rename get_summary -> get_issue_summary (#691)
elisno May 2, 2023
a95a447
Update tests for custom issue manager example (#692)
jwmueller May 3, 2023
036dcf3
Add testable example codeblock to functions in the dataset module (#668)
Steven-Yiran May 3, 2023
c999d84
change transformer model name (#693)
huiwengoh May 3, 2023
f77ef9f
increase hypothesis deadline to 500ms for test_find_overlapping_class…
elisno May 4, 2023
5ee2eb8
Drop summary scores from Datalab.report() (#699)
elisno May 5, 2023
79cc3eb
DOC: use default rules for shorter, more readable links (#700)
DerWeh May 5, 2023
8e83b37
test_all_close commented
krmayankb May 7, 2023
f1755e4
Merge branch 'regression' of https://github.com/krmayankb/cleanlab in…
krmayankb May 7, 2023
6df4974
Merge branch 'master' into regression
krmayankb May 7, 2023
9e840a3
unit test fixed
krmayankb May 7, 2023
7354708
Merge branch 'regression' of https://github.com/krmayankb/cleanlab in…
krmayankb May 7, 2023
49559ef
Add dev note about linking functions (#704)
huiwengoh May 9, 2023
be1bd69
Seed noniid permutation tests (#694)
elisno May 9, 2023
cf5d2a0
Added installation instructions for package extras (#697)
sanjanag May 9, 2023
a3ae30f
CI tests without optional dependencies and min-versions installed (#701)
sanjanag May 9, 2023
c41c837
Set non iid num issues with an extra rule (#707)
elisno May 10, 2023
d16cdd6
Suppress too_slow health check for dataset generation strategy (#706)
elisno May 10, 2023
fcc9417
Simplify reporting for outlier issues (#705)
elisno May 10, 2023
6ff06c8
ignore twitter in link check (#708)
jwmueller May 10, 2023
7400051
Fix unbound knn variable in NonIIDIssueManager (#709)
elisno May 11, 2023
87d9b73
Clarify when datalab quality score is / isnt comparable (#710)
jwmueller May 11, 2023
11f23d6
Update pseudocode for example issue type in datalab guide (#711)
jwmueller May 11, 2023
1767668
Add section for inspecting near duplicate issues (#712)
elisno May 12, 2023
525b4b2
Datalab docs improvements (#714)
huiwengoh May 12, 2023
4198e0a
headers for multilabel_classification modules docs (#716)
jwmueller May 12, 2023
70d70f2
multilabel docs formatting fixes (#717)
huiwengoh May 12, 2023
832a544
update readme for datalab release (#713)
jwmueller May 12, 2023
05d45fd
Update quickstart to reflect Datalab (#719)
jwmueller May 13, 2023
2193656
more updates to quickstart to reflect datalab (#720)
jwmueller May 13, 2023
5830714
update RELEASE_VERSIONS to include v2.4 (#715)
jwmueller May 13, 2023
d3c385f
readme space
jwmueller May 13, 2023
147d383
update clipping to use value from constants.py
huiwengoh May 15, 2023
5814b62
Merge branch 'cleanlab:master' into regression
huiwengoh May 16, 2023
6e59db3
bump version past v2.4 (#721)
jwmueller May 16, 2023
4eddcde
v0 of cleanlearning
huiwengoh May 19, 2023
375c76f
Include non-iid in default issue checks (#723)
elisno May 19, 2023
72d4595
Delete repeated lines in CleanLearning.save_space (#724)
huiwengoh May 19, 2023
8dc6fd6
add sample weight
huiwengoh May 19, 2023
0095c46
add save_space
huiwengoh May 19, 2023
6f14791
add some type checking + error catching
huiwengoh May 22, 2023
48fd0a0
add unittests
huiwengoh May 22, 2023
7ed00cf
clarify order matters for non-IID issue type (#726)
jwmueller May 22, 2023
b367479
fix typing
huiwengoh May 23, 2023
1c720ff
add docs structure
huiwengoh May 23, 2023
2fad7cf
Add option to purchase commerical license (#725)
cgnorthcutt May 23, 2023
999a1bd
phrasing
jwmueller May 24, 2023
114201b
ENH: make clipping unnecessary for entropy (#703)
DerWeh May 25, 2023
624d639
add docstrings
huiwengoh May 25, 2023
a6381b0
add docs for helper methods
huiwengoh May 25, 2023
d761159
fix mypy
huiwengoh May 25, 2023
8cdcdf7
edit datalab opening lines (#731)
jwmueller May 26, 2023
99be5af
add more tips to understand issue types in datalab tutorial (#734)
jwmueller May 31, 2023
43e24ba
Make labels optional in Datalab (#730)
elisno Jun 5, 2023
7f3008f
add tutorial
huiwengoh Jun 7, 2023
48f83b1
fix mypy
huiwengoh Jun 7, 2023
da65a97
identifying label errors in Object Detection data (#676)
ulya-tkch Jun 7, 2023
f851cc2
Merge branch 'master' into regression
huiwengoh Jun 7, 2023
bf3b56c
Add note about tensorflow>=2.11 (#738)
huiwengoh Jun 7, 2023
7d84aa4
Bugfix object detection visualize function(#739)
ulya-tkch Jun 8, 2023
95eb858
Suggestions from review
huiwengoh Jun 8, 2023
7359318
update notebook + misc cleanup
huiwengoh Jun 8, 2023
40955ee
Merge branch 'cleanlab:master' into regression
huiwengoh Jun 8, 2023
a0f4906
add unittests
huiwengoh Jun 8, 2023
c1420a0
update uncertainty functions
huiwengoh Jun 9, 2023
336dd1a
reorder tutorials (#741)
jwmueller Jun 9, 2023
9b2dfc0
Fix CI (#742)
anishathalye Jun 10, 2023
d51fa09
more detailed blurb (#743)
jwmueller Jun 12, 2023
2922d7e
Set more datalab tutorials as some of the main tutorials (#729)
elisno Jun 13, 2023
d3302c0
include n_jobs in cleanlearning kwargs docstring (#744)
jwmueller Jun 13, 2023
299d66f
Update object detection tutorial to clarify out-of-sample predictions…
ulya-tkch Jun 15, 2023
860bd6f
Dont print recompute joint warning when estimation_method = off_calib…
gordon-lim Jun 19, 2023
5f3de05
Temporary Github Actions workaround: Pin Python 3.7.16 (#748)
ulya-tkch Jun 22, 2023
a95fc07
reduce datalab tutorial plots to better fit into scroll window (#751)
tataganesh Jun 23, 2023
a62dc55
clarify OHE is optional
jwmueller Jun 23, 2023
050628a
clarify block is about your dataset
jwmueller Jun 23, 2023
8f2f580
remove label issue terminology from quickstart
jwmueller Jun 23, 2023
7310dbe
language for cleanlearning description
jwmueller Jun 23, 2023
44bc19d
clarify what label is
jwmueller Jun 23, 2023
7bc1563
clarify label issue
jwmueller Jun 23, 2023
ff10d47
second defining of label issue
jwmueller Jun 23, 2023
b2ab7ce
comma to slash
jwmueller Jun 23, 2023
9bb39ca
erroneous
jwmueller Jun 23, 2023
7eba836
rank module header language
jwmueller Jun 23, 2023
46fb58f
rmv ...
jwmueller Jun 23, 2023
53d73a2
clarify second set of label quality scores
jwmueller Jun 23, 2023
f12153f
typo fix
jwmueller Jun 23, 2023
c04fa09
discuss methods and paper
jwmueller Jun 23, 2023
26e909d
cleanlearning docstring edits
jwmueller Jun 23, 2023
35606fc
label error detection in semantic segmentation datasets (#677)
vdlad Jun 23, 2023
2857333
list dmlr papers (#753)
jwmueller Jun 23, 2023
66516d9
duplicated issues
huiwengoh Jun 23, 2023
65c08c0
Merge branch 'master' into regression
huiwengoh Jun 23, 2023
eb514d5
make methods private
huiwengoh Jun 23, 2023
d53b6a6
clear notebook outputs
huiwengoh Jun 23, 2023
3da126d
black formatting
huiwengoh Jun 23, 2023
f138ae8
update docs
huiwengoh Jun 23, 2023
1 change: 1 addition & 0 deletions cleanlab/__init__.py
@@ -8,3 +8,4 @@
from . import multiannotator
from . import outlier
from . import token_classification
from . import regression
1 change: 1 addition & 0 deletions cleanlab/regression/__init__.py
@@ -0,0 +1 @@
from . import rank
48 changes: 48 additions & 0 deletions cleanlab/regression/rank.py
@@ -0,0 +1,48 @@
"""Generate label quality scores for regression datasets."""

import numpy as np


def get_label_quality_scores(labels: np.ndarray, pred_labels: np.ndarray) -> np.ndarray:
    """
    Returns a label quality score for each example in the regression dataset.

    Each score is a continuous value in the range [0, 1]:
    1 - clean label (given label is likely correct).
    0 - dirty label (given label is likely incorrect).

    Parameters
    ----------
    labels:
        Raw labels from the original dataset.
        Array of shape ``(N,)`` containing the given labels, where N is the number of datapoints in the regression dataset.

    pred_labels:
        Predicted labels from a regressor fitted on the dataset.
        Array of shape ``(N,)`` containing the predicted labels, where N is the number of datapoints in the regression dataset.

    Returns
    -------
    label_quality_scores:
        Array of shape ``(N,)`` of scores between 0 and 1, one per datapoint in the dataset.

        Lower scores indicate datapoints more likely to contain a label issue.

    Examples
    --------
    >>> import numpy as np
    >>> from cleanlab.regression.rank import get_label_quality_scores
    >>> labels = np.array([1,2,3,4])
    >>> pred_labels = np.array([2,2,5,4.1])
    >>> label_quality_scores = get_label_quality_scores(labels, pred_labels)
    >>> label_quality_scores
    array([0.36787944, 1.        , 0.13533528, 0.90483742])
    """

    assert (
        labels.shape == pred_labels.shape
    ), f"shapes of labels {labels.shape} and predicted labels {pred_labels.shape} are not the same."

    residual = pred_labels - labels
    quality_scores = np.exp(-abs(residual))
    return quality_scores
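The scoring rule in this hunk maps each absolute residual to exp(-|prediction - label|). The following is a minimal, dependency-free sketch of that rule and of how the resulting scores might be used to order examples for review. The names `label_quality_scores` and `rank_lowest_first` are hypothetical helpers introduced here for illustration; they are not part of cleanlab, and the length check raises `ValueError` where the hunk itself uses an `assert`.

```python
import math
from typing import List, Sequence


def label_quality_scores(labels: Sequence[float], predictions: Sequence[float]) -> List[float]:
    """Score each example as exp(-|prediction - label|); 1.0 means a perfect match."""
    if len(labels) != len(predictions):
        raise ValueError(
            f"labels ({len(labels)}) and predictions ({len(predictions)}) differ in length"
        )
    return [math.exp(-abs(p - y)) for y, p in zip(labels, predictions)]


def rank_lowest_first(scores: Sequence[float]) -> List[int]:
    """Return example indices ordered from lowest score (most suspicious) to highest."""
    return sorted(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    # Same inputs as the docstring example in the hunk above.
    scores = label_quality_scores([1, 2, 3, 4], [2, 2, 5, 4.1])
    print(rank_lowest_first(scores))  # prints [2, 0, 3, 1]
```

Because the residual enters the exponent unscaled, the scores depend on the units of the labels: the same relative error produces a much lower score when labels are expressed in larger units.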
8 changes: 8 additions & 0 deletions docs/source/cleanlab/regression.rst
@@ -0,0 +1,8 @@
regression
==============

.. automodule:: cleanlab.regression
   :autosummary:
   :members:
   :undoc-members:
   :show-inheritance:
1 change: 1 addition & 0 deletions docs/source/tutorials/index.rst
@@ -15,3 +15,4 @@ Tutorials
token_classification
pred_probs_cross_val
faq
regression
338 changes: 338 additions & 0 deletions docs/source/tutorials/regression.ipynb

Large diffs are not rendered by default.