Commit 49e9ef3

Added Remove label functionality and updated documentation and bug fixes (#282)
* dummy commit
* Added Remove label functionality and updated documentation and bug fixes
* changed DP service endpoint
* added dummy commit
1 parent a5eac7e commit 49e9ef3

15 files changed: +504 -41 lines changed

data_labeling_examples/bulk_labeling_java/README.md

+60-15
@@ -1,8 +1,9 @@
11
# Annotate bulk number of records in OCI Data Labeling Service (DLS)
22

3+
Introduction to Bulk Labeling Utility [demo video](https://otube.oracle.com/media/Bulk+Labeling+Utility/1_6jv76ouj)
34
## Data Labeling Service (DLS) Bulk-Labeling tool
45

5-
Bulk-Labeling Tool provides the following scripts:
6+
Bulk-Labeling Tool provides the following scripts :
67

78
**1. Upload files to object storage bucket**
89

@@ -62,7 +63,6 @@ Result of CUSTOM_LABELS_MATCH algorithm:
6263
dog/dog2.png will be labeled with dog and pup labels
6364
```
6465
65-
**Supported in bulklabelutility-v2.jar !!**
6666
3. **BulkAssistedLabelingScript**: This script takes datasetId as input along with the labeling algorithm as ML_ASSISTED_LABELING. There are 3 different ways to use this script -
6767
1. Use the pretrained model offered by the ai service to auto label records
6868
2. Provide the OCID of the custom ML model that you have trained separately using OCI ai services to auto label records
@@ -88,7 +88,40 @@ Conditions -
8888
TRAINING_DATASET_ID (Required only for training a new model)
8989

9090
```
91+
**3. Remove labels of records in Data Labeling Service**
9192
93+
**RemoveLabelScript:** This script takes REMOVE_LABEL_PREFIX as input and removes from the records every label whose name matches REMOVE_LABEL_PREFIX.
94+
REMOVE_LABEL_PREFIX can be a label name, a label name prefix, or '*'.
95+
96+
If '*' is given as REMOVE_LABEL_PREFIX, all labels are removed from all records.
97+
98+
```
99+
Consider a dataset with the following records:
100+
cat1.jpeg, cat2.jpeg, dog1.png, dog2.png
101+
Labels in dataset: dog, pup, cat, kitten
102+
cat1.jpeg will be labeled with cat label
103+
cat2.jpeg will be labeled with cat and kitten labels
104+
dog1.png will be labeled with dog label
105+
dog2.png will be labeled with dog and pup labels
106+
107+
1. If REMOVE_LABEL_PREFIX = 'c' then it will remove label 'cat' from all labeled records. Dataset will be as follows:
108+
cat1.jpeg -> unlabeled
109+
cat2.jpeg will be labeled with kitten label
110+
dog1.png will be labeled with dog label
111+
dog2.png will be labeled with dog and pup labels
112+
113+
2. If REMOVE_LABEL_PREFIX = 'd' then it will remove label 'dog' from all labeled records. Dataset will be as follows:
114+
cat1.jpeg will be labeled with cat label
115+
cat2.jpeg will be labeled with cat and kitten labels
116+
dog1.png -> unlabeled
117+
dog2.png will be labeled with pup label
118+
119+
3. If REMOVE_LABEL_PREFIX = '*' then it will remove all labels from all labeled records. Dataset will be as follows:
120+
cat1.jpeg -> unlabeled
121+
cat2.jpeg -> unlabeled
122+
dog1.png -> unlabeled
123+
dog2.png -> unlabeled
124+
```
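
The prefix rule described in this README section can be sketched in plain Java. This is only an illustration under stated assumptions, not the utility's actual implementation: the class `LabelPrefixFilter` and the method `remainingLabels` are hypothetical names, `'*'` is assumed to match every label, and any other value is assumed to be compared with `String.startsWith`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the bulk-labeling utility.
final class LabelPrefixFilter {

    // Returns the labels that would stay on a record after removal.
    // Assumption: "*" removes everything; otherwise a label is removed
    // when its name starts with the given prefix.
    static List<String> remainingLabels(List<String> labels, String removeLabelPrefix) {
        List<String> kept = new ArrayList<>();
        for (String label : labels) {
            boolean remove = "*".equals(removeLabelPrefix) || label.startsWith(removeLabelPrefix);
            if (!remove) {
                kept.add(label);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> cat2 = Arrays.asList("cat", "kitten");
        List<String> dog1 = Arrays.asList("dog");
        System.out.println(remainingLabels(cat2, "c")); // [kitten]
        System.out.println(remainingLabels(dog1, "d")); // []
        System.out.println(remainingLabels(cat2, "*")); // []
    }
}
```

An empty result corresponds to the "unlabeled" state shown in the README example.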
92125
### Requirements
93126
1. An Oracle Cloud Infrastructure account. <br/>
94127
2. A user created in that account, in a group with a policy that grants the desired permissions. This can be a user for yourself, or another person/system that needs to call the API. <br/>
@@ -125,32 +158,32 @@ java -version
125158
```
126159
git clone https://github.com/oracle-samples/oci-data-science-ai-samples.git
127160
```
128-
4. Go to data_labeling_examples directory
161+
4. Go to data_labeling_examples/bulk_labeling_java directory
129162
130163
```
131-
cd data_labeling_examples
164+
cd data_labeling_examples/bulk_labeling_java
132165
```
133166
5. Run the below command to upload files to object storage bucket.
134167
135168
```
136-
java -DCONFIG_FILE_PATH='~/.oci/config' -DCONFIG_PROFILE=DEFAULT -DOBJECT_STORAGE_URL=https://objectstorage.<REGION>.oraclecloud.com -DOBJECT_STORAGE_BUCKET_NAME=<BUCKET_NAME> -DOBJECT_STORAGE_NAMESPACE=<NAMESPACE> -DDATASET_DIRECTORY_PATH=<DIRECTORY_PATH> -cp libs/bulklabelutility-v1.jar com.oracle.datalabelingservicesamples.scripts.UploadToObjectStorageScript
169+
java -DCONFIG_FILE_PATH='~/.oci/config' -DCONFIG_PROFILE=DEFAULT -DOBJECT_STORAGE_URL=https://objectstorage.<REGION>.oraclecloud.com -DOBJECT_STORAGE_BUCKET_NAME=<BUCKET_NAME> -DOBJECT_STORAGE_NAMESPACE=<NAMESPACE> -DDATASET_DIRECTORY_PATH=<DIRECTORY_PATH> -cp libs/bulklabelutility-v3.jar com.oracle.datalabelingservicesamples.scripts.UploadToObjectStorageScript
137170
```
138171
6. Run the below command to bulk label by "FIRST_LETTER_MATCH" labeling algorithm.
139172
140173
```
141-
java -DCONFIG_FILE_PATH='~/.oci/config' -DCONFIG_PROFILE=DEFAULT -DDLS_DP_URL=https://dlsprod-dp.us-ashburn-1.oci.oraclecloud.com -DTHREAD_COUNT=20 -DDATASET_ID=ocid1.compartment.oc1..aaaaaaaawob4faujxaqxqzrb555b44wxxrfkcpapjxwp4s4hwjthu46idr5a -DLABELING_ALGORITHM=FIRST_LETTER_MATCH -DLABELS=cat,dog -cp libs/bulklabelutility-v2.jar com.oracle.datalabelingservicesamples.scripts.SingleLabelDatasetBulkLabelingScript
174+
java -DCONFIG_FILE_PATH='~/.oci/config' -DCONFIG_PROFILE=DEFAULT -DDLS_DP_URL=https://dlsprod-dp.<REGION>.oci.oraclecloud.com -DTHREAD_COUNT=20 -DDATASET_ID=ocid1.datalabelingdatasetint.oc1.phx.amaaaaaaniob46ia7ybplfjdfmohqxxmwpg4p6nftl4ypnuirvsljkzhlq3q -DLABELING_ALGORITHM=FIRST_LETTER_MATCH -DLABELS=cat,dog -cp libs/bulklabelutility-v3.jar com.oracle.datalabelingservicesamples.scripts.SingleLabelDatasetBulkLabelingScript
142175
```
143176
7. Run the below command to bulk label by "FIRST_REGEX_MATCH" labeling algorithm.
144177
145178
```
146-
java -DCONFIG_FILE_PATH='~/.oci/config' -DCONFIG_PROFILE=DEFAULT -DDLS_DP_URL=https://dlsprod-dp.us-ashburn-1.oci.oraclecloud.com -DTHREAD_COUNT=20 -DDATASET_ID=ocid1.compartment.oc1..aaaaaaaawob4faujxaqxqzrb555b44wxxrfkcpapjxwp4s4hwjthu46idr5a -DLABELING_ALGORITHM=FIRST_REGEX_MATCH -DFIRST_MATCH_REGEX_PATTERN=^abc* -DLABELS=cat,dog -cp libs/bulklabelutility-v2.jar com.oracle.datalabelingservicesamples.scripts.SingleLabelDatasetBulkLabelingScript
179+
java -DCONFIG_FILE_PATH='~/.oci/config' -DCONFIG_PROFILE=DEFAULT -DDLS_DP_URL=https://dlsprod-dp.<REGION>.oci.oraclecloud.com -DTHREAD_COUNT=20 -DDATASET_ID=ocid1.datalabelingdatasetint.oc1.phx.amaaaaaaniob46ia7ybplfjdfmohqxxmwpg4p6nftl4ypnuirvsljkzhlq3q -DLABELING_ALGORITHM=FIRST_REGEX_MATCH -DFIRST_MATCH_REGEX_PATTERN=^abc* -DLABELS=cat,dog -cp libs/bulklabelutility-v3.jar com.oracle.datalabelingservicesamples.scripts.SingleLabelDatasetBulkLabelingScript
147180
```
148181
8. Run the below command to bulk label by "CUSTOM_LABELS_MATCH" labeling algorithm.
149182
150183
```
151-
java -DCONFIG_FILE_PATH='~/.oci/config' -DCONFIG_PROFILE=DEFAULT -DDLS_DP_URL=https://dlsprod-dp.us-ashburn-1.oci.oraclecloud.com -DTHREAD_COUNT=20 -DDATASET_ID=ocid1.compartment.oc1..aaaaaaaawob4faujxaqxqzrb555b44wxxrfkcpapjxwp4s4hwjthu46idr5a -DLABELING_ALGORITHM=CUSTOM_LABELS_MATCH -DCUSTOM_LABELS='{"dog/": ["dog"], "cat/": ["cat"] }' -cp libs/bulklabelutility-v2.jar com.oracle.datalabelingservicesamples.scripts.CustomBulkLabelingScript
184+
java -DCONFIG_FILE_PATH='~/.oci/config' -DCONFIG_PROFILE=DEFAULT -DDLS_DP_URL=https://dlsprod-dp.<REGION>.oci.oraclecloud.com -DTHREAD_COUNT=20 -DDATASET_ID=ocid1.datalabelingdatasetint.oc1.phx.amaaaaaaniob46ia7ybplfjdfmohqxxmwpg4p6nftl4ypnuirvsljkzhlq3q -DLABELING_ALGORITHM=CUSTOM_LABELS_MATCH -DCUSTOM_LABELS='{"dog/": ["dog"], "cat/": ["cat"] }' -cp libs/bulklabelutility-v3.jar com.oracle.datalabelingservicesamples.scripts.CustomBulkLabelingScript
152185
```
153-
8. Run the below command to bulk label by "ML_ASSISTED_LABELING" labeling algorithm.
186+
9. Run the below command to bulk label by "ML_ASSISTED_LABELING" labeling algorithm.
154187
155188
Before you run the command, please understand the limitations of this utility.
156189
1. If you choose a pretrained model to predict labels on the records, the DLS dataset labels should be a part of the supported categories for the auto labeling to provide results.
@@ -173,7 +206,12 @@ Known issues -
173206
Language service text classification returns the dominant category to which a particular text belongs. So, auto labeling is not supported for multilabel text classification usecase.
174207
175208
```
176-
java -DCONFIG_FILE_PATH='~/.oci/config' -DCONFIG_PROFILE=DEFAULT -DTHREAD_COUNT=20 -DREGION=us-phoenix-1 -DLABELING_ALGORITHM=ML_ASSISTED_LABELING -DML_MODEL_TYPE=PRETRAINED -DCONFIDENCE_THRESHOLD=0.8 -DDATASET_ID=ocid1.datalabelingdataset.oc1.phx.amaaaaaaniob46ia4qae7hitbpxx6cmc6kmoowvxkckxmdlmdvtdprgibnsa -cp libs/bulklabelutility-v2.jar com.oracle.datalabelingservicesamples.scripts.BulkAssistedLabelingScript
209+
java -DCONFIG_FILE_PATH='~/.oci/config' -DCONFIG_PROFILE=DEFAULT -DTHREAD_COUNT=20 -DREGION=us-phoenix-1 -DLABELING_ALGORITHM=ML_ASSISTED_LABELING -DML_MODEL_TYPE=PRETRAINED -DCONFIDENCE_THRESHOLD=0.8 -DDATASET_ID=ocid1.datalabelingdataset.oc1.phx.amaaaaaaniob46ia4qae7hitbpxx6cmc6kmoowvxkckxmdlmdvtdprgibnsa -cp libs/bulklabelutility-v3.jar com.oracle.datalabelingservicesamples.scripts.BulkAssistedLabelingScript
210+
```
211+
212+
10. Run the below command to remove labels from records
213+
```
214+
java -DCONFIG_FILE_PATH='~/.oci/config' -DCONFIG_PROFILE=DEFAULT -DDLS_DP_URL=https://dlstest-dp.<REGION>.oci.oraclecloud.com -DTHREAD_COUNT=20 -DDATASET_ID=ocid1.datalabelingdatasetint.oc1.phx.amaaaaaaniob46ia7ybplfjdfmohqxxmwpg4p6nftl4ypnuirvsljkzhlq3q -DREMOVE_LABEL_PREFIX='cat' -cp libs/bulklabelutility-v3.jar com.oracle.datalabelingservicesamples.scripts.RemoveLabelScript
177215
```
178216
179217
Note: You can override any config using -D followed by the configuration name. The full list of configurations is given in the following section.
@@ -189,17 +227,23 @@ CONFIG_FILE_PATH=~/.oci/config
189227
#Config Profile
190228
CONFIG_PROFILE=DEFAULT
191229

230+
#region identifier
231+
REGION=us-phoenix-1
232+
192233
#DLS DP URL
193-
DLS_DP_URL=https://dlsprod-dp.uk-london-1.oci.oraclecloud.com
234+
DLS_DP_URL=https://dlsprod-dp.<REGION>.oci.oraclecloud.com
235+
236+
#DLS CP URL
237+
DLS_CP_URL=https://dlsprod-cp.<REGION>.oci.oraclecloud.com
194238

195239
#OBJECT STORAGE URL
196-
OBJECT_STORAGE_URL=https://objectstorage.uk-london-1.oraclecloud.com
240+
OBJECT_STORAGE_URL=https://objectstorage.<REGION>.oraclecloud.com
197241

198242
#Dataset Id whose record you want to bulk label
199-
DATASET_ID=ocid1.compartment.oc1..aaaaaaaawob4faujxaqxqzrb555b44wxxrfkcpapjxwp4s4hwjthu46idr5a
243+
DATASET_ID=ocid1.datalabelingdatasetint.oc1.phx.amaaaaaaniob46ia7ybplfjdfmohqxxmwpg4p6nftl4ypnuirvsljkzhlq3q
200244

201245
#Number of Parallel Threads for Bulk Labeling. Default is 20
202-
THREAD_COUNT=30
246+
THREAD_COUNT=20
203247

204248
# Algorithm that will be used to assign labels to DLS Dataset records : FIRST_LETTER_MATCH, FIRST_REGEX_MATCH, CUSTOM_LABELS_MATCH, ML_ASSISTED_LABELING
205249
LABELING_ALGORITHM=FIRST_REGEX_MATCH
@@ -222,7 +266,8 @@ OBJECT_STORAGE_BUCKET_NAME=bucket-20220629-0913
222266
#Namespace of the object storage bucket
223267
OBJECT_STORAGE_NAMESPACE=idgszs0xipmn
224268

225-
REGION=us-phoenix-1
269+
#Prefix will be a label name, or label name prefix
270+
REMOVE_LABEL_PREFIX=
226271

227272
## All the following inputs are only for ML_ASSISTED_LABELING algorithm :
228273

Binary file not shown.

data_labeling_examples/bulk_labeling_java/pom.xml

+2-2
@@ -25,12 +25,12 @@
2525
<dependency>
2626
<groupId>com.oracle.pic.commons</groupId>
2727
<artifactId>core</artifactId>
28-
<version>3.2.47</version>
28+
<version>3.2.50</version>
2929
</dependency>
3030
<dependency>
3131
<groupId>com.oracle.pic.commons</groupId>
3232
<artifactId>core-resources</artifactId>
33-
<version>1.2.225</version>
33+
<version>1.2.248</version>
3434
</dependency>
3535
<dependency>
3636
<groupId>com.oracle.oci.sdk</groupId>

data_labeling_examples/bulk_labeling_java/src/main/java/com/oracle/datalabelingservicesamples/constants/DataLabelingConstants.java

+2
@@ -31,4 +31,6 @@ public class DataLabelingConstants {
3131
public static final String TENANT = "TENANT";
3232
public static final String DLS = "DLS";
3333
public static final String OBJECT_STORAGE = "OBJECT_STORAGE";
34+
public static final String REMOVE_LABEL = "REMOVE_LABEL";
35+
public static final String REMOVE_LABEL_PREFIX = "REMOVE_LABEL_PREFIX";
3436
}

data_labeling_examples/bulk_labeling_java/src/main/java/com/oracle/datalabelingservicesamples/labelingstrategies/FirstLetterMatch.java

+1-1
@@ -11,7 +11,7 @@ public class FirstLetterMatch implements RuleBasedLabelingStrategy {
1111
@Override
1212
public List<String> getLabel(RecordSummary record) {
1313
for (String label : Config.INSTANCE.getLabels()) {
14-
if (record.getName().startsWith(label)) {
14+
if (record.getName().startsWith(String.valueOf(label.charAt(0)))) {
1515
return Arrays.asList(label);
1616
}
1717
}
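
The effect of this one-line change can be seen in a standalone sketch. It assumes plain strings in place of `RecordSummary` and `Config`, and it returns an empty list when nothing matches, which the hunk does not show. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Standalone sketch; plain strings stand in for RecordSummary and Config.
final class FirstLetterMatchSketch {

    // Behavior before the change: the record name must start with the whole label.
    static List<String> matchWholeLabel(String recordName, List<String> labels) {
        for (String label : labels) {
            if (recordName.startsWith(label)) {
                return Collections.singletonList(label);
            }
        }
        return Collections.emptyList();
    }

    // Behavior after the change: only the label's first character is compared.
    static List<String> matchFirstLetter(String recordName, List<String> labels) {
        for (String label : labels) {
            // Guard added for this sketch: charAt(0) throws on an empty label.
            if (!label.isEmpty() && recordName.startsWith(String.valueOf(label.charAt(0)))) {
                return Collections.singletonList(label);
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("cat", "dog");
        System.out.println(matchWholeLabel("c_001.jpeg", labels));  // []
        System.out.println(matchFirstLetter("c_001.jpeg", labels)); // [cat]
        System.out.println(matchFirstLetter("dog2.png", labels));   // [dog]
    }
}
```

Because the loop returns on the first hit, two labels that share an initial (for example `cat` and `cow`) would always resolve to whichever comes first in the list.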

data_labeling_examples/bulk_labeling_java/src/main/java/com/oracle/datalabelingservicesamples/requests/Config.java

+15
@@ -69,6 +69,8 @@ public enum Config {
6969
private String objectStorageBucket;
7070
private String datasetDirectory;
7171

72+
private String removeLabelPrefix;
73+
7274
private Config() {
7375
try {
7476
Properties config = new Properties();
@@ -128,6 +130,9 @@ private Config() {
128130
datasetDirectory = StringUtils.isEmpty(System.getProperty(DataLabelingConstants.DATASET_DIRECTORY_PATH))
129131
? config.getProperty(DataLabelingConstants.DATASET_DIRECTORY_PATH)
130132
: System.getProperty(DataLabelingConstants.DATASET_DIRECTORY_PATH);
133+
removeLabelPrefix = StringUtils.isEmpty(System.getProperty(DataLabelingConstants.REMOVE_LABEL_PREFIX))
134+
? config.getProperty(DataLabelingConstants.REMOVE_LABEL_PREFIX)
135+
: System.getProperty(DataLabelingConstants.REMOVE_LABEL_PREFIX);
131136
String threadConfig = StringUtils.isEmpty(System.getProperty(DataLabelingConstants.THREAD_COUNT))
132137
? config.getProperty(DataLabelingConstants.THREAD_COUNT)
133138
: System.getProperty(DataLabelingConstants.THREAD_COUNT);
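
Every setting in this hunk is resolved the same way: a non-empty `-D` system property wins, otherwise the value from the properties file is used. A minimal sketch of that lookup order follows. The `SettingResolver` class and `resolve` method are hypothetical, and a plain null/empty check replaces `StringUtils.isEmpty` so the example has no dependencies.

```java
import java.util.Properties;

// Hypothetical helper illustrating the lookup order used in Config.
final class SettingResolver {

    // A non-empty -D system property overrides the value from the properties file.
    static String resolve(Properties fileConfig, String key) {
        String override = System.getProperty(key);
        return (override == null || override.isEmpty())
                ? fileConfig.getProperty(key)
                : override;
    }

    public static void main(String[] args) {
        Properties fileConfig = new Properties();
        fileConfig.setProperty("REMOVE_LABEL_PREFIX", "cat");

        // Falls back to the file value when no -D override was passed.
        System.out.println(resolve(fileConfig, "REMOVE_LABEL_PREFIX"));

        // Same effect as launching with -DREMOVE_LABEL_PREFIX=dog.
        System.setProperty("REMOVE_LABEL_PREFIX", "dog");
        System.out.println(resolve(fileConfig, "REMOVE_LABEL_PREFIX")); // dog
    }
}
```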
@@ -159,6 +164,10 @@ private void validateAndInitialize(Properties config) {
159164
case DataLabelingConstants.OBJECT_STORAGE:
160165
performAssertionOnObjectStorageInput();
161166
initializeObjectStorageClient();
167+
break;
168+
case DataLabelingConstants.REMOVE_LABEL:
169+
performAssertionOnRemoveLabelInput();
170+
initializeDpClient();
162171
}
163172
}
164173

@@ -175,6 +184,12 @@ private void performAssertionOnDLSInput() {
175184
assert labelingAlgorithm != null : "Labeling Strategy cannot be empty";
176185
}
177186

187+
private void performAssertionOnRemoveLabelInput(){
188+
assert dpEndpoint != null : "DLS DP URL cannot be empty";
189+
assert datasetId != null : "Dataset Id cannot be empty";
190+
assert removeLabelPrefix != null : "Remove Label Prefix cannot be empty";
191+
}
192+
178193
private void initializeLabelingStrategy() {
179194
switch (labelingAlgorithm) {
180195
case "FIRST_LETTER_MATCH":
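
Java `assert` statements are evaluated only when the JVM is started with `-ea`, so the checks in `performAssertionOnRemoveLabelInput` are skipped under a default `java` launch. One possible alternative is an explicit check that always runs. The `RemoveLabelInputCheck` class and `requireSet` method below are hypothetical and not part of this commit.

```java
// Hypothetical alternative to assert-based validation.
final class RemoveLabelInputCheck {

    // Throws whether or not the JVM was started with -ea.
    static String requireSet(String value, String message) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException(message);
        }
        return value;
    }

    public static void main(String[] args) {
        requireSet("cat", "Remove Label Prefix cannot be empty"); // passes silently
        try {
            requireSet(null, "Remove Label Prefix cannot be empty");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Remove Label Prefix cannot be empty
        }
    }
}
```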
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
package com.oracle.datalabelingservicesamples.requests;
2+
3+
import com.oracle.datalabelingservicesamples.constants.DataLabelingConstants;
4+
5+
public class RemoveLabel {
6+
static {
7+
System.setProperty(DataLabelingConstants.TENANT, DataLabelingConstants.REMOVE_LABEL);
8+
}
9+
}
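
The new class relies on a static initializer to set the `TENANT` system property, which the `switch` in `Config.validateAndInitialize` then uses to pick the `REMOVE_LABEL` branch. The sketch below shows that mechanism in isolation. All names in it are hypothetical, and how `RemoveLabelScript` actually triggers the initializer is not shown in this diff, so the `init()` call is an assumption.

```java
// Standalone illustration of selecting a code path via a static initializer.
final class TenantSelectionSketch {

    static final String TENANT = "TENANT";

    // Mirrors the shape of the RemoveLabel class: the static block runs once,
    // when the class is first initialized.
    static final class RemoveLabelMarker {
        static {
            System.setProperty(TENANT, "REMOVE_LABEL");
        }

        // Invoking any static method forces class initialization.
        static void init() {
        }
    }

    // Stand-in for the switch in validateAndInitialize.
    static String validationBranch() {
        switch (System.getProperty(TENANT, "")) {
            case "DLS":
                return "dls checks";
            case "OBJECT_STORAGE":
                return "object storage checks";
            case "REMOVE_LABEL":
                return "remove-label checks";
            default:
                return "no checks";
        }
    }

    public static void main(String[] args) {
        RemoveLabelMarker.init();
        System.out.println(validationBranch()); // remove-label checks
    }
}
```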

0 commit comments

Comments
 (0)