data_labeling_examples/bulk_labeling_java/README.md (+60 -15)
@@ -1,8 +1,9 @@
# Annotate bulk number of records in OCI Data Labeling Service (DLS)

+Introduction to Bulk Labeling Utility [demo video](https://otube.oracle.com/media/Bulk+Labeling+Utility/1_6jv76ouj)
## Data Labeling Service (DLS) Bulk-Labeling tool

-Bulk-Labeling Tool provides the following scripts:
+Bulk-Labeling Tool provides the following scripts:

**1. Upload files to object storage bucket**

@@ -62,7 +63,6 @@ Result of CUSTOM_LABELS_MATCH algorithm:
dog/dog2.png will be labeled with dog and pup labels
```

-**Supported in bulklabelutility-v2.jar !!**
3. **BulkAssistedLabelingScript**: This script takes datasetId as input along with the labeling algorithm as ML_ASSISTED_LABELING. There are 3 different ways to use this script -
1. Use the pretrained model offered by the ai service to auto label records
2. Provide the OCID of the custom ML model that you have trained separately using OCI ai services to auto label records
@@ -88,7 +88,40 @@ Conditions -
TRAINING_DATASET_ID (Required only for training a new model)

```
+**3. Remove labels from records in Data Labeling Service**

+**RemoveLabelScript:** This script takes REMOVE_LABEL_PREFIX as input and removes, from every record, each label whose name starts with REMOVE_LABEL_PREFIX.
+REMOVE_LABEL_PREFIX can be a full label name, a label name prefix, or '*'.
+
+If REMOVE_LABEL_PREFIX is '*', the script removes all labels from all records.
+
+```
+Consider a dataset with the following records:
+cat1.jpeg, cat2.jpeg, dog1.jpeg, dog2.jpeg
+Labels in dataset: dog, pup, cat, kitten
+cat1.jpeg is labeled with the cat label
+cat2.jpeg is labeled with the cat and kitten labels
+dog1.jpeg is labeled with the dog label
+dog2.jpeg is labeled with the dog and pup labels
+
+1. If REMOVE_LABEL_PREFIX = 'c', the script removes the label 'cat' from all labeled records. The dataset becomes:
+cat1.jpeg -> unlabeled
+cat2.jpeg is labeled with the kitten label
+dog1.jpeg is labeled with the dog label
+dog2.jpeg is labeled with the dog and pup labels
+
+2. If REMOVE_LABEL_PREFIX = 'd', the script removes the label 'dog' from all labeled records. The dataset becomes:
+cat1.jpeg is labeled with the cat label
+cat2.jpeg is labeled with the cat and kitten labels
+dog1.jpeg -> unlabeled
+dog2.jpeg is labeled with the pup label
+
+3. If REMOVE_LABEL_PREFIX = '*', the script removes all labels from all labeled records. The dataset becomes:
+cat1.jpeg -> unlabeled
+cat2.jpeg -> unlabeled
+dog1.jpeg -> unlabeled
+dog2.jpeg -> unlabeled
+```
### Requirements
1. An Oracle Cloud Infrastructure account. <br/>
2. A user created in that account, in a group with a policy that grants the desired permissions. This can be a user for yourself, or another person/system that needs to call the API. <br/>
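The prefix rule used by RemoveLabelScript can be modeled in memory to check the expected outcome before running the script against a real dataset. The sketch below is a hypothetical illustration, not the script's code: it assumes that "matching" means the label name starts with REMOVE_LABEL_PREFIX (which is what the examples show), represents records as a map from record name to label names, and does not call the OCI Data Labeling API. The class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory model of the REMOVE_LABEL_PREFIX rule.
 * Records are a map of record name to label names; no OCI API calls are made.
 */
final class RemoveLabelPrefixSketch {

    /** Wildcard value that removes every label from every record. */
    static final String REMOVE_ALL = "*";

    private RemoveLabelPrefixSketch() {
    }

    /** True if the label should be removed: '*' matches everything, otherwise a prefix match. */
    static boolean matches(String label, String removeLabelPrefix) {
        return REMOVE_ALL.equals(removeLabelPrefix) || label.startsWith(removeLabelPrefix);
    }

    /**
     * Returns a copy of the records with every matching label dropped.
     * A record whose label list ends up empty is unlabeled.
     */
    static Map<String, List<String>> removeLabels(Map<String, List<String>> records,
                                                  String removeLabelPrefix) {
        // Assumption of this sketch: an empty prefix would match every label, so it is rejected;
        // "*" must be used to clear all labels explicitly.
        if (removeLabelPrefix == null || removeLabelPrefix.isEmpty()) {
            throw new IllegalArgumentException("REMOVE_LABEL_PREFIX must be a label name, a prefix, or '*'");
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> record : records.entrySet()) {
            List<String> kept = new ArrayList<>();
            for (String label : record.getValue()) {
                if (!matches(label, removeLabelPrefix)) {
                    kept.add(label);
                }
            }
            result.put(record.getKey(), kept);
        }
        return result;
    }
}
```

With the example dataset, `removeLabels(records, "d")` leaves dog1.jpeg with an empty label list (unlabeled) and dog2.jpeg with `[pup]`, while the cat records are unchanged. Rejecting an empty prefix is a design choice of this sketch, since the empty string is a prefix of every label name; the README does not say how the script treats that case.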
-8. Run the below command to bulk label by "ML_ASSISTED_LABELING" labeling algorithm.
+9. Run the below command to bulk label by "ML_ASSISTED_LABELING" labeling algorithm.

Before you run the command, please understand the limitations of this utility.
1. If you choose a pretrained model to predict labels on the records, the DLS dataset labels should be a part of the supported categories for the auto labeling to provide results.
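The README describes three ways to run BulkAssistedLabelingScript (a pretrained model, a custom model OCID, and, judging from the TRAINING_DATASET_ID condition, training a new model) and warns that a pretrained model only produces results when the dataset labels are among its supported categories. A minimal pre-flight check along those lines might look like the following. It is a hypothetical sketch: the `Mode` names, the `customModelId` parameter, and where the supported-category list comes from are assumptions, and the real utility may validate its inputs differently.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical pre-flight check for ML_ASSISTED_LABELING.
 * Mode names and parameters are illustrative assumptions, not the utility's API.
 */
final class AssistedLabelingPreflightSketch {

    /** The three usage modes described in the README (names are hypothetical). */
    enum Mode { PRETRAINED_MODEL, CUSTOM_MODEL, TRAIN_NEW_MODEL }

    private AssistedLabelingPreflightSketch() {
    }

    /** Returns, sorted, the dataset labels a pretrained model cannot predict (exact, case-sensitive match). */
    static Set<String> unsupportedLabels(Collection<String> datasetLabels,
                                         Collection<String> supportedCategories) {
        Set<String> missing = new TreeSet<>(datasetLabels);
        missing.removeAll(supportedCategories);
        return missing;
    }

    /** Throws IllegalArgumentException if the inputs required by the chosen mode are missing. */
    static void validate(Mode mode,
                         Collection<String> datasetLabels,
                         Collection<String> supportedCategories,
                         String customModelId,
                         String trainingDatasetId) {
        switch (mode) {
            case PRETRAINED_MODEL: {
                // Auto labeling only helps if every dataset label is a supported category.
                Set<String> missing = unsupportedLabels(datasetLabels, supportedCategories);
                if (!missing.isEmpty()) {
                    throw new IllegalArgumentException(
                            "Labels not in the pretrained model's categories: " + missing);
                }
                break;
            }
            case CUSTOM_MODEL:
                requirePresent(customModelId, "custom model OCID");
                break;
            case TRAIN_NEW_MODEL:
                // TRAINING_DATASET_ID is required only for training a new model.
                requirePresent(trainingDatasetId, "TRAINING_DATASET_ID");
                break;
            default:
                throw new IllegalStateException("Unhandled mode: " + mode);
        }
    }

    private static void requirePresent(String value, String name) {
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException(name + " is required for this mode");
        }
    }
}
```

For example, dataset labels `[dog, pup, cat]` checked against hypothetical categories `[dog, cat, bird]` report `pup` as unsupported, so `validate(Mode.PRETRAINED_MODEL, ...)` would throw. The exact, case-sensitive comparison is another assumption of this sketch.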
@@ -173,7 +206,12 @@ Known issues -
Language service text classification returns the dominant category to which a particular text belongs. So, auto labeling is not supported for multilabel text classification usecase.
#Number of Parallel Threads for Bulk Labeling. Default is 20
-THREAD_COUNT=30
+THREAD_COUNT=20

# Algorithm that will be used to assign labels to DLS Dataset records : FIRST_LETTER_MATCH, FIRST_REGEX_MATCH, CUSTOM_LABELS_MATCH, ML_ASSISTED_LABELING
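The configuration comment documents THREAD_COUNT as the number of parallel threads for bulk labeling, with a default of 20 (this commit also changes the sample value from 30 to 20), and lists the four labeling algorithms. The sketch below shows one way such settings could be read and applied, assuming they are loaded into a `java.util.Properties` object. The class, the `labelInParallel` helper, and its fixed-size thread pool are hypothetical and are not taken from the utility's source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/**
 * Hypothetical sketch of reading THREAD_COUNT and the labeling algorithm, then labeling
 * records concurrently. Not the utility's actual implementation.
 */
final class BulkLabelingConfigSketch {

    /** Default documented in the config comment ("Default is 20"). */
    static final int DEFAULT_THREAD_COUNT = 20;

    /** Algorithm names listed in the config comment. */
    enum LabelingAlgorithm { FIRST_LETTER_MATCH, FIRST_REGEX_MATCH, CUSTOM_LABELS_MATCH, ML_ASSISTED_LABELING }

    private BulkLabelingConfigSketch() {
    }

    /** Reads THREAD_COUNT, falling back to the default when it is absent or blank. */
    static int threadCount(Properties config) {
        String raw = config.getProperty("THREAD_COUNT");
        if (raw == null || raw.isBlank()) {
            return DEFAULT_THREAD_COUNT;
        }
        int value = Integer.parseInt(raw.trim());
        if (value < 1) {
            throw new IllegalArgumentException("THREAD_COUNT must be at least 1, got " + value);
        }
        return value;
    }

    /** Parses an algorithm name; throws IllegalArgumentException for unknown names. */
    static LabelingAlgorithm parseAlgorithm(String name) {
        return LabelingAlgorithm.valueOf(name.trim());
    }

    /** Applies labelOne to every record ID on a fixed-size pool and returns results in input order. */
    static <R> List<R> labelInParallel(List<String> recordIds, int threads, Function<String, R> labelOne) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (String id : recordIds) {
                futures.add(pool.submit(() -> labelOne.apply(id)));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get()); // waits for each record in submission order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while labeling", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Labeling a record failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting the futures in submission order keeps each result aligned with its input record ID even though the records are processed concurrently.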
data_labeling_examples/bulk_labeling_java/src/main/java/com/oracle/datalabelingservicesamples/constants/DataLabelingConstants.java (+2)
@@ -31,4 +31,6 @@ public class DataLabelingConstants {
data_labeling_examples/bulk_labeling_java/src/main/java/com/oracle/datalabelingservicesamples/labelingstrategies/FirstLetterMatch.java (+1 -1)
@@ -11,7 +11,7 @@ public class FirstLetterMatch implements RuleBasedLabelingStrategy {
data_labeling_examples/bulk_labeling_java/src/main/java/com/oracle/datalabelingservicesamples/requests/Config.java