Commit 144db11

shirshankachakru-r
authored and committed
feat(sdk): structured properties - add support for listing (datahub-project#12283)
1 parent 4960b4a commit 144db11

6 files changed

+663
-34
lines changed

docs/api/tutorials/structured-properties.md

+296-33
@@ -6,7 +6,7 @@ import TabItem from '@theme/TabItem';
66
## Why Would You Use Structured Properties?
77

88
Structured properties are a structured, named set of properties that can be attached to logical entities like Datasets, DataJobs, etc.
9-
Structured properties have values that are types. Conceptually, they are like “field definitions”.
9+
Structured properties have values that are typed and support constraints.
1010

1111
Learn more about structured properties in the [Structured Properties Feature Guide](../../../docs/features/feature-guides/properties/overview.md).
1212

@@ -15,6 +15,7 @@ Learn more about structured properties in the [Structured Properties Feature Gui
1515

1616
This guide will show you how to execute the following actions with structured properties.
1717
- Create structured properties
18+
- List structured properties
1819
- Read structured properties
1920
- Delete structured properties
2021
- Add structured properties to a dataset
@@ -32,7 +33,8 @@ Additionally, you need to have the following tools installed according to the me
3233
<Tabs>
3334
<TabItem value="CLI" label="CLI" default>
3435

35-
Install the relevant CLI version. Forms are available as of CLI version `0.13.1`. The corresponding DataHub Cloud release version is `v0.2.16.5`
36+
Install the relevant CLI version.
37+
Structured Properties were introduced in version `0.13.1`, but we continuously improve and add new functionality, so you should always [upgrade](https://datahubproject.io/docs/cli/#installation) to the latest CLI for best results.
3638
Connect to your instance via [init](https://datahubproject.io/docs/cli/#init):
3739

3840
- Run `datahub init` to update the instance you want to load into.
@@ -56,33 +58,8 @@ Requirements for OpenAPI are:
5658
The following code will create a structured property `io.acryl.privacy.retentionTime`.
5759

5860
<Tabs>
59-
<TabItem value="graphql" label="graphQL" default>
6061

61-
```graphql
62-
mutation createStructuredProperty {
63-
createStructuredProperty(
64-
input: {
65-
id: "retentionTime",
66-
qualifiedName:"retentionTime",
67-
displayName: "Retention Time",
68-
description: "Retention Time is used to figure out how long to retain records in a dataset",
69-
valueType: "urn:li:dataType:datahub.number",
70-
allowedValues: [
71-
{numberValue: 30, description: "30 days, usually reserved for datasets that are ephemeral and contain pii"},
72-
{numberValue: 90, description:"description: Use this for datasets that drive monthly reporting but contain pii"},
73-
{numberValue: 365, description:"Use this for non-sensitive data that can be retained for longer"}
74-
],
75-
cardinality: SINGLE,
76-
entityTypes: ["urn:li:entityType:datahub.dataset", "urn:li:entityType:datahub.dataFlow"],
77-
}
78-
) {
79-
urn
80-
}
81-
}
82-
```
83-
84-
</TabItem>
85-
<TabItem value="CLI" label="CLI">
62+
<TabItem value="CLI" label="CLI" default>
8663

8764
Create a yaml file representing the properties you’d like to load.
8865
For example, below file represents a property `io.acryl.privacy.retentionTime`. You can see the full example [here](https://github.com/datahub-project/datahub/blob/example-yaml-sp/metadata-ingestion/examples/structured_properties/struct_props.yaml).
@@ -108,13 +85,41 @@ For example, below file represents a property `io.acryl.privacy.retentionTime`.
10885
```
10986
11087
Use the CLI to create your properties:
111-
```commandline
88+
```shell
11289
datahub properties upsert -f {properties_yaml}
11390
```
11491

11592
If successful, you should see `Created structured property urn:li:structuredProperty:...`
11693

11794
</TabItem>
95+
96+
<TabItem value="Graphql" label="GraphQL" default>
97+
98+
```graphql
99+
mutation createStructuredProperty {
100+
createStructuredProperty(
101+
input: {
102+
id: "retentionTime",
103+
qualifiedName:"retentionTime",
104+
displayName: "Retention Time",
105+
description: "Retention Time is used to figure out how long to retain records in a dataset",
106+
valueType: "urn:li:dataType:datahub.number",
107+
allowedValues: [
108+
{numberValue: 30, description: "30 days, usually reserved for datasets that are ephemeral and contain pii"},
109+
{numberValue: 90, description:"description: Use this for datasets that drive monthly reporting but contain pii"},
110+
{numberValue: 365, description:"Use this for non-sensitive data that can be retained for longer"}
111+
],
112+
cardinality: SINGLE,
113+
entityTypes: ["urn:li:entityType:datahub.dataset", "urn:li:entityType:datahub.dataFlow"],
114+
}
115+
) {
116+
urn
117+
}
118+
}
119+
```
120+
121+
</TabItem>
122+
118123
<TabItem value="OpenAPI v2" label="OpenAPI v2">
119124

120125
```shell
@@ -236,9 +241,182 @@ Example Response:
236241
</TabItem>
237242
</Tabs>
238243

239-
## Read Structured Properties
244+
## List Structured Properties
245+
246+
You can list all structured properties in your DataHub instance using the following methods:
247+
248+
<Tabs>
249+
<TabItem value="CLI" label="CLI" default>
250+
251+
```shell
252+
datahub properties list
253+
```
254+
255+
This will show all properties with their full details.
256+
257+
Example Response:
258+
```json
259+
{
260+
"urn": "urn:li:structuredProperty:clusterName",
261+
"qualified_name": "clusterName",
262+
"type": "urn:li:dataType:datahub.string",
263+
"description": "Test Cluster Name Property",
264+
"display_name": "Cluster's name",
265+
"entity_types": [
266+
"urn:li:entityType:datahub.dataset"
267+
],
268+
"cardinality": "SINGLE"
269+
}
270+
{
271+
"urn": "urn:li:structuredProperty:projectNames",
272+
"qualified_name": "projectNames",
273+
"type": "urn:li:dataType:datahub.string",
274+
"description": "Test property for project name",
275+
"display_name": "Project Name",
276+
"entity_types": [
277+
"urn:li:entityType:datahub.dataset",
278+
"urn:li:entityType:datahub.dataFlow"
279+
],
280+
"cardinality": "MULTIPLE",
281+
"allowed_values": [
282+
{
283+
"value": "Tracking",
284+
"description": "test value 1 for project"
285+
},
286+
{
287+
"value": "DataHub",
288+
"description": "test value 2 for project"
289+
}
290+
]
291+
}
292+
```
293+
294+
295+
If you only want to see the URNs, you can use:
296+
297+
```shell
298+
datahub properties list --no-details
299+
```
300+
301+
Example Response:
302+
```
303+
[2025-01-08 22:23:00,625] INFO {datahub.cli.specific.structuredproperties_cli:134} - Listing structured property urns only, use --details for more information
304+
urn:li:structuredProperty:clusterName
305+
urn:li:structuredProperty:clusterType
306+
urn:li:structuredProperty:io.acryl.dataManagement.deprecationDate
307+
urn:li:structuredProperty:projectNames
308+
```
309+
310+
To download all the structured property definitions into a single file that you can use with the `upsert` command as described in the [create section](#create-structured-properties), you can run the list command with the `--to-file` option.
311+
312+
```shell
313+
datahub properties list --to-file structured_properties.yaml
314+
```
315+
316+
Example Response:
317+
```yaml
318+
- urn: urn:li:structuredProperty:clusterName
319+
qualified_name: clusterName
320+
type: urn:li:dataType:datahub.string
321+
description: Test Cluster Name Property
322+
display_name: Cluster's name
323+
entity_types:
324+
- urn:li:entityType:datahub.dataset
325+
cardinality: SINGLE
326+
- urn: urn:li:structuredProperty:clusterType
327+
qualified_name: clusterType
328+
type: urn:li:dataType:datahub.string
329+
description: Test Cluster Type Property
330+
display_name: Cluster's type
331+
entity_types:
332+
- urn:li:entityType:datahub.dataset
333+
cardinality: SINGLE
334+
- urn: urn:li:structuredProperty:io.acryl.dataManagement.deprecationDate
335+
qualified_name: io.acryl.dataManagement.deprecationDate
336+
type: urn:li:dataType:datahub.date
337+
display_name: Deprecation Date
338+
entity_types:
339+
- urn:li:entityType:datahub.dataset
340+
- urn:li:entityType:datahub.dataFlow
341+
- urn:li:entityType:datahub.dataJob
342+
- urn:li:entityType:datahub.schemaField
343+
cardinality: SINGLE
344+
- urn: urn:li:structuredProperty:io.acryl.privacy.enumProperty5712
345+
qualified_name: io.acryl.privacy.enumProperty5712
346+
type: urn:li:dataType:datahub.string
347+
description: The retention policy for the dataset
348+
entity_types:
349+
- urn:li:entityType:datahub.dataset
350+
cardinality: MULTIPLE
351+
allowed_values:
352+
- value: foo
353+
- value: bar
354+
... etc.
355+
```
356+
357+
</TabItem>
358+
359+
<TabItem value="OpenAPI v3" label="OpenAPI v3">
360+
361+
Example Request:
362+
```bash
363+
curl -X 'GET' \
364+
'http://localhost:9002/openapi/v3/entity/structuredproperty?systemMetadata=false&includeSoftDelete=false&skipCache=false&aspects=structuredPropertySettings&aspects=propertyDefinition&aspects=institutionalMemory&aspects=structuredPropertyKey&aspects=status&count=10&sortCriteria=urn&sortOrder=ASCENDING&query=*' \
365+
-H 'accept: application/json'
366+
```
367+
368+
Example Response:
369+
```json
370+
{
371+
"scrollId": "...",
372+
"entities": [
373+
{
374+
"urn": "urn:li:structuredProperty:clusterName",
375+
"propertyDefinition": {
376+
"value": {
377+
"immutable": false,
378+
"qualifiedName": "clusterName",
379+
"displayName": "Cluster's name",
380+
"valueType": "urn:li:dataType:datahub.string",
381+
"description": "Test Cluster Name Property",
382+
"entityTypes": [
383+
"urn:li:entityType:datahub.dataset"
384+
],
385+
"cardinality": "SINGLE"
386+
}
387+
},
388+
"structuredPropertyKey": {
389+
"value": {
390+
"id": "clusterName"
391+
}
392+
}
393+
}
394+
]
395+
}
396+
```
397+
398+
Key Query Parameters:
399+
- `count`: Number of results to return per page (default: 10)
400+
- `sortCriteria`: Field to sort by (default: urn)
401+
- `sortOrder`: Sort order (ASCENDING or DESCENDING)
402+
- `query`: Search query to filter properties (* for all)
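
The query parameters above can also be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper name `build_list_url` and its defaults are illustrative assumptions rather than part of any DataHub SDK, and the host is the `localhost:9002` address from the example request.

```python
from urllib.parse import urlencode


def build_list_url(
    base_url="http://localhost:9002",
    count=10,
    sort_criteria="urn",
    sort_order="ASCENDING",
    query="*",
    aspects=("propertyDefinition", "structuredPropertyKey"),
):
    """Build the list URL for structured properties (hypothetical helper)."""
    params = [
        ("count", count),
        ("sortCriteria", sort_criteria),
        ("sortOrder", sort_order),
        ("query", query),
    ]
    # `aspects` is repeated once per requested aspect, as in the curl example.
    params += [("aspects", aspect) for aspect in aspects]
    return f"{base_url}/openapi/v3/entity/structuredproperty?{urlencode(params)}"


url = build_list_url(count=25, sort_order="DESCENDING")
print(url)
```

`urlencode` percent-encodes the `*` wildcard as `%2A`, which is the standard URL encoding of the same character.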
403+
404+
</TabItem>
405+
</Tabs>
406+
407+
The list endpoint returns all structured properties in your DataHub instance. Each property includes:
408+
- URN: Unique identifier for the property
409+
- Qualified Name: The property's qualified name
410+
- Type: The data type of the property (string, number, date, etc.)
411+
- Description: A description of the property's purpose
412+
- Display Name: Human-readable name for the property
413+
- Entity Types: The types of entities this property can be applied to
414+
- Cardinality: Whether the property accepts single (SINGLE) or multiple (MULTIPLE) values
415+
- Allowed Values: If specified, the list of allowed values for this property
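
The CLI example response above shows one JSON object per property, printed back to back rather than wrapped in an array. Assuming the captured output has that shape, a sketch like the following can split it into individual properties and read the fields listed above. The helper `iter_json_objects` is a hypothetical name, and the sample text is a trimmed copy of the example response.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Trimmed copy of the example CLI response (assumed shape of the real output).
sample_output = """
{
  "urn": "urn:li:structuredProperty:clusterName",
  "qualified_name": "clusterName",
  "type": "urn:li:dataType:datahub.string",
  "cardinality": "SINGLE"
}
{
  "urn": "urn:li:structuredProperty:projectNames",
  "qualified_name": "projectNames",
  "type": "urn:li:dataType:datahub.string",
  "cardinality": "MULTIPLE",
  "allowed_values": [
    {"value": "Tracking", "description": "test value 1 for project"},
    {"value": "DataHub", "description": "test value 2 for project"}
  ]
}
"""

properties = list(iter_json_objects(sample_output))
for prop in properties:
    # allowed_values is optional, so fall back to an empty list.
    allowed = [entry["value"] for entry in prop.get("allowed_values", [])]
    print(prop["qualified_name"], prop["cardinality"], allowed)
```

If the captured output also contains log lines, such as the `INFO` line shown in the `--no-details` example, they would need to be filtered out before parsing.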
240416

241-
You can see the properties you created by running the following command:
417+
## Read a single Structured Property
418+
419+
You can read an individual property you created by running the following command:
242420

243421
<Tabs>
244422
<TabItem value="CLI" label="CLI" default>
@@ -279,6 +457,91 @@ If successful, you should see metadata about your properties returned.
279457
}
280458
```
281459

460+
</TabItem>
461+
<TabItem value="GraphQL" label="GraphQL">
462+
463+
Example Request:
464+
```graphql
465+
query {
466+
structuredProperty(urn: "urn:li:structuredProperty:projectNames") {
467+
urn
468+
type
469+
definition {
470+
qualifiedName
471+
displayName
472+
description
473+
cardinality
474+
allowedValues {
475+
value {
476+
... on StringValue {
477+
stringValue
478+
}
479+
... on NumberValue {
480+
numberValue
481+
}
482+
}
483+
description
484+
}
485+
entityTypes {
486+
urn
487+
info {
488+
type
489+
qualifiedName
490+
}
491+
}
492+
}
493+
}
494+
}
495+
```
496+
497+
Example Response:
498+
```json
499+
{
500+
"data": {
501+
"structuredProperty": {
502+
"urn": "urn:li:structuredProperty:projectNames",
503+
"type": "STRUCTURED_PROPERTY",
504+
"definition": {
505+
"qualifiedName": "projectNames",
506+
"displayName": "Project Name",
507+
"description": "Test property for project name",
508+
"cardinality": "MULTIPLE",
509+
"allowedValues": [
510+
{
511+
"value": {
512+
"stringValue": "Tracking"
513+
},
514+
"description": "test value 1 for project"
515+
},
516+
{
517+
"value": {
518+
"stringValue": "DataHub"
519+
},
520+
"description": "test value 2 for project"
521+
}
522+
],
523+
"entityTypes": [
524+
{
525+
"urn": "urn:li:entityType:datahub.dataset",
526+
"info": {
527+
"type": "DATASET",
528+
"qualifiedName": "datahub.dataset"
529+
}
530+
},
531+
{
532+
"urn": "urn:li:entityType:datahub.dataFlow",
533+
"info": {
534+
"type": "DATA_FLOW",
535+
"qualifiedName": "datahub.dataFlow"
536+
}
537+
}
538+
]
539+
}
540+
}
541+
},
542+
"extensions": {}
543+
}
544+
```
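
In the response above, each allowed value arrives as a union that carries either `stringValue` or `numberValue`. The sketch below shows one way to flatten that structure on the client side; the helper name `allowed_values` is hypothetical, and the sample is a trimmed copy of the example response.

```python
import json

# Trimmed copy of the example GraphQL response shown above.
response_text = """
{
  "data": {
    "structuredProperty": {
      "urn": "urn:li:structuredProperty:projectNames",
      "definition": {
        "qualifiedName": "projectNames",
        "cardinality": "MULTIPLE",
        "allowedValues": [
          {"value": {"stringValue": "Tracking"}, "description": "test value 1 for project"},
          {"value": {"stringValue": "DataHub"}, "description": "test value 2 for project"}
        ]
      }
    }
  }
}
"""


def allowed_values(response):
    """Return (value, description) pairs from a structuredProperty response."""
    definition = response["data"]["structuredProperty"]["definition"]
    pairs = []
    for entry in definition.get("allowedValues") or []:
        value = entry["value"]
        # The union exposes stringValue or numberValue, depending on the type.
        raw = value.get("stringValue", value.get("numberValue"))
        pairs.append((raw, entry.get("description")))
    return pairs


pairs = allowed_values(json.loads(response_text))
for raw, description in pairs:
    print(raw, "-", description)
```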
282545
</TabItem>
283546

284547
<TabItem value="OpenAPI v2" label="OpenAPI v2">
@@ -389,7 +652,7 @@ Example Response:
389652
This action will set/replace all structured properties on the entity. See PATCH operations to add/remove a single property.
390653

391654
<Tabs>
392-
<TabItem value="graphQL" label="GraphQL" default>
655+
<TabItem value="GraphQL" label="GraphQL" default>
393656

394657
```graphql
395658
mutation upsertStructuredProperties {
@@ -537,7 +800,7 @@ datahub dataset get --urn {urn}
537800
For reading all structured properties from a dataset:
538801

539802
<Tabs>
540-
<TabItem value="graphql" label="GraphQL" default>
803+
<TabItem value="Graphql" label="GraphQL" default>
541804

542805
```graphql
543806
query getDataset {
