APP-4737 Enhance MDIR with data domains and products #1239

Merged
merged 23 commits into from
Jan 29, 2025
Changes from 5 commits
8cb0b01
First Iteration of Report using Data Domains and Products
pavanmanishd Jan 21, 2025
82bf772
Added UI support and write to file
pavanmanishd Jan 21, 2025
41c63b2
Merge branch 'main' into APP-4737
pavanmanishd Jan 21, 2025
f78b9a0
Domain Creation and Asset Linking to Data Product
pavanmanishd Jan 21, 2025
9b38c14
Export to excel sheet and remove previous tests
pavanmanishd Jan 21, 2025
e2b9739
Merge branch 'main' into APP-4737
pavanmanishd Jan 21, 2025
4ad3eb0
Remove Domains, SubDomains and Products after tests
pavanmanishd Jan 22, 2025
cbdccb5
Remove unnecessary ui inputs
pavanmanishd Jan 22, 2025
e5d539b
Remove glossary from config
pavanmanishd Jan 22, 2025
cb08cc1
ImportReportTest using Domains and Products
pavanmanishd Jan 22, 2025
156b6e5
Remove unused code
pavanmanishd Jan 22, 2025
035ebca
Formatting Changes
pavanmanishd Jan 22, 2025
aaccdf0
Initial Conversion CSV Test
pavanmanishd Jan 22, 2025
a34b50e
ImpactReportCSVTest for Domains and Products
pavanmanishd Jan 22, 2025
32c02e9
Merge branch 'main' into APP-4737
pavanmanishd Jan 22, 2025
620013a
Unique Domain for each test
pavanmanishd Jan 23, 2025
59e8987
Check subdomains and Removed unique for subdomains
pavanmanishd Jan 27, 2025
a08d221
Check all data products
pavanmanishd Jan 27, 2025
0c2c746
Optimized the retrieval with queries and reformatted the code
pavanmanishd Jan 28, 2025
e13159d
Added queries instead of loops for efficient retrieval
pavanmanishd Jan 29, 2025
044ec89
Merge branch 'main' into APP-4737
cmgrote Jan 29, 2025
4423264
Removed unused code, optimized some queries
cmgrote Jan 29, 2025
d1b67ed
Wait until all the datadomains are resolved
pavanmanishd Jan 29, 2025
@@ -14,6 +14,8 @@ data class MetadataImpactReportCfg(
@JsonProperty("include_glossary") val includeGlossary: String = "TRUE",
@JsonProperty("glossary_name") val glossaryName: String = "Metadata metrics",
@JsonProperty("include_details") val includeDetails: Boolean = false,
@JsonProperty("include_data_products") val includeDataProducts: String = "TRUE",
@JsonProperty("data_domain") val dataDomain: String = "Metadata metrics",
@JsonProperty("file_format") val fileFormat: String = "XLSX",
@JsonProperty("delivery_type") val deliveryType: String = "DIRECT",
@JsonProperty("email_addresses") val emailAddresses: String? = null,
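The two new fields follow the existing convention of carrying a yes/no choice as the strings "TRUE" and "FALSE" rather than as a Boolean. A minimal sketch of how such a flag might gate the domain name is below. `ReportOptions` and `domainNameOrNull` are hypothetical names used only for illustration and are not part of the PR; only the two field names and their defaults come from the hunk above.

```kotlin
// Hypothetical stand-in holding only the two fields added in this hunk.
data class ReportOptions(
    val includeDataProducts: String = "TRUE",
    val dataDomain: String = "Metadata metrics",
)

// Hypothetical helper: yields the domain name only when the string flag is "TRUE".
fun ReportOptions.domainNameOrNull(): String? =
    if (includeDataProducts == "TRUE") dataDomain else null

fun main() {
    println(ReportOptions().domainNameOrNull())
    println(ReportOptions(includeDataProducts = "FALSE").domainNameOrNull())
}
```

Because the comparison is against the exact string "TRUE", any other value (including lowercase "true") is treated as "do not include" in this sketch.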
@@ -5,6 +5,8 @@ package com.atlan.pkg.mdir
import MetadataImpactReportCfg
import com.atlan.AtlanClient
import com.atlan.exception.NotFoundException
import com.atlan.model.assets.DataDomain
import com.atlan.model.assets.DataProduct
import com.atlan.model.assets.Glossary
import com.atlan.model.assets.GlossaryCategory
import com.atlan.model.assets.GlossaryTerm
@@ -90,6 +92,13 @@ object Reporter {
CAT_ADOPTION to "**Metrics that can be used to monitor Atlan's adoption within your organization.** You may want to consider these alongside some of the headline numbers to calculate percentages of enrichment points that are important to your organization.",
)

val SUBDOMAINS =
mapOf(
CAT_HEADLINES to "**Metrics that break down Atlan-managed assets as overall numbers.** These are mostly useful to contextualize the overall asset footprint of your data ecosystem.",
CAT_SAVINGS to "**Metrics that can be used to discover potential cost savings.** These are areas you may want to investigate for cost savings, though there are caveats with each one that are worth reviewing to understand potential limitations.",
CAT_ADOPTION to "**Metrics that can be used to monitor Atlan's adoption within your organization.** You may want to consider these alongside some of the headline numbers to calculate percentages of enrichment points that are important to your organization.",
)

private val reports =
listOf(
AUM::class.java,
@@ -133,14 +142,23 @@ object Reporter {
Paths.get(filePath).toFile().createNewFile()
}

- val glossary =
- if (ctx.config.includeGlossary == "TRUE") {
- createGlossaryIdempotent(ctx.client, ctx.config.glossaryName)
+ val domain =
+ if (ctx.config.includeDataProducts == "TRUE") {
+ createDomainIdempotent(ctx.client, ctx.config.dataDomain)
} else {
null
}
- val categoryNameToGuid = createCategoriesIdempotent(ctx.client, glossary)
- val fileOutputs = runReports(ctx, outputDirectory, batchSize, glossary, categoryNameToGuid)
+ val subdomainNameToGuid = createSubDomainsIdempotent(ctx.client, domain)
+ val fileOutputs = runReports(ctx, outputDirectory, batchSize, domain, subdomainNameToGuid)

+ // val glossary =
+ // if (ctx.config.includeGlossary == "TRUE") {
+ // createGlossaryIdempotent(ctx.client, ctx.config.glossaryName)
+ // } else {
+ // null
+ // }
+ // val categoryNameToGuid = createCategoriesIdempotent(ctx.client, glossary)
+ // val fileOutputs = runReports(ctx, outputDirectory, batchSize, glossary, categoryNameToGuid)

when (ctx.config.deliveryType) {
"EMAIL" -> {
@@ -377,4 +395,203 @@ object Reporter {
batch?.close()
}
}

private fun createDomainIdempotent(
client: AtlanClient,
domainName: String,
): DataDomain =
try {
DataDomain.findByName(client, domainName)[0]!!
} catch (e: NotFoundException) {
val create = DataDomain.creator(domainName).build()
val response = create.save(client)
response.getResult(create)
}
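The find-or-create pattern used by `createDomainIdempotent` can be sketched without the Atlan SDK. Everything in the block below (`FakeCatalog`, `Domain`, `NotFound`, `findOrCreateDomain`) is a hypothetical in-memory stand-in, not an SDK type; it only illustrates why running the report twice should not create a second domain.

```kotlin
import java.util.UUID

// Hypothetical stand-ins for illustration only; none of these are Atlan SDK types.
class NotFound(message: String) : Exception(message)

data class Domain(val guid: String, val name: String)

class FakeCatalog {
    private val byName = mutableMapOf<String, Domain>()

    // Counts how many times a domain was actually created.
    var saves = 0
        private set

    fun findByName(name: String): Domain =
        byName[name] ?: throw NotFound("No domain named '$name'")

    fun save(name: String): Domain {
        saves++
        return Domain(UUID.randomUUID().toString(), name).also { byName[name] = it }
    }
}

// Look the domain up first and create it only when the lookup fails.
fun findOrCreateDomain(catalog: FakeCatalog, name: String): Domain =
    try {
        catalog.findByName(name)
    } catch (e: NotFound) {
        catalog.save(name)
    }

fun main() {
    val catalog = FakeCatalog()
    val first = findOrCreateDomain(catalog, "Metadata metrics")
    val second = findOrCreateDomain(catalog, "Metadata metrics")
    println(first.guid == second.guid) // true
    println(catalog.saves) // 1
}
```

Note that the function in the diff takes element `[0]` of the lookup result, so if several domains share the name, the first match is the one reused.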

private fun createSubDomainsIdempotent(
client: AtlanClient,
domain: DataDomain?,
): Map<String, String> {
if (domain == null) return emptyMap()
val nameToResolved = mutableMapOf<String, String>()
val placeholderToName = mutableMapOf<String, String>()
AssetBatch(client, 20).use { batch ->
SUBDOMAINS.forEach { (name, description) ->
val builder =
try {
val found = DataDomain.findByName(client, name)[0]
found.trimToRequired().guid(found.guid)
} catch (e: NotFoundException) {
DataDomain.creator(name, domain.qualifiedName)
}
val subdomain = builder.description(description).build()
placeholderToName[subdomain.guid] = name
batch.add(subdomain)
}
batch.flush()
placeholderToName.forEach { (guid, name) ->
val resolved = batch.resolvedGuids.getOrDefault(guid, guid)
nameToResolved[name] = resolved
}
}
return nameToResolved
}
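The last step of `createSubDomainsIdempotent` maps each subdomain name to its final GUID: a GUID that the batch reports as resolved is swapped for its resolved value, and any other GUID is kept as-is. A minimal sketch of that mapping follows. `resolveGuids` is a hypothetical helper, and the sample GUIDs and subdomain names are invented for illustration (the "-1" placeholder format in particular is an assumption).

```kotlin
// Hypothetical helper mirroring the getOrDefault(guid, guid) step in the diff:
// use the resolved GUID when the batch reports one, otherwise keep the original.
fun resolveGuids(
    placeholderToName: Map<String, String>,
    resolvedGuids: Map<String, String>,
): Map<String, String> =
    placeholderToName.entries.associate { (guid, name) ->
        name to (resolvedGuids[guid] ?: guid)
    }

fun main() {
    // Invented sample data: one newly created subdomain with a placeholder GUID,
    // one pre-existing subdomain that already has a real GUID.
    val placeholderToName = mapOf("-1" to "Headlines", "guid-existing" to "Adoption")
    val resolvedGuids = mapOf("-1" to "guid-new")
    println(resolveGuids(placeholderToName, resolvedGuids))
}
```

Falling back to the original GUID is what lets the same code path serve both newly created and already existing subdomains.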

private fun runReports(
ctx: PackageContext<MetadataImpactReportCfg>,
outputDirectory: String,
batchSize: Int = 300,
domain: DataDomain? = null,
subdomainNameToGuid: Map<String, String>? = null,
): List<String> {
if (ctx.config.fileFormat == "XLSX") {
val outputFile = "$outputDirectory${File.separator}mdir.xlsx"
ExcelWriter(outputFile).use { xlsx ->
val overview = xlsx.createSheet("Overview")
overview.writeHeader(
mapOf(
"Metric" to "",
"Description" to "",
"Result" to "Numeric result for the metric",
"Caveats" to "Any caveats to be aware of with the metric",
"Notes" to "Any other information to be aware of with the metric",
// "Percentage" to "Percentage of total for the metric",
),
)
reports.forEach { repClass ->
val metric = Metric.get(repClass, ctx.client, batchSize, logger)
outputReportDomain(ctx, metric, overview, xlsx.createSheet(metric.getShortName()), batchSize, domain, subdomainNameToGuid)
}
}
return listOf(outputFile)
} else {
val overviewFile = "$outputDirectory${File.separator}${CSV_FILES["overview"]}"
val outputFiles = mutableListOf<String>()
CSVWriter(overviewFile).use { overview ->
overview.writeHeader(
mapOf(
"Metric" to "",
"Description" to "",
"Result" to "Numeric result for the metric",
"Caveats" to "Any caveats to be aware of with the metric",
"Notes" to "Any other information to be aware of with the metric",
// "Percentage" to "Percentage of total for the metric",
),
)
reports.forEach { repClass ->
val metric = Metric.get(repClass, ctx.client, batchSize, logger)
val metricFile = "$outputDirectory${File.separator}${CSV_FILES[metric.getShortName()]}"
CSVWriter(metricFile).use { details ->
outputReportDomain(ctx, metric, overview, details, batchSize, domain, subdomainNameToGuid)
}
outputFiles.add(metricFile)
}
}
return outputFiles
}
}

private fun outputReportDomain(
ctx: PackageContext<MetadataImpactReportCfg>,
metric: Metric,
overview: TabularWriter,
details: TabularWriter,
batchSize: Int,
domain: DataDomain? = null,
subdomainNameToGuid: Map<String, String>? = null,
) {
logger.info { "Quantifying metric: ${metric.name} ..." }
val quantified = metric.quantify()
val product =
if (ctx.config.includeDataProducts == "TRUE") {
writeMetricToDomain(ctx.client, metric, quantified, domain!!, subdomainNameToGuid!!)
} else {
null
}
writeMetricToFile(ctx.client, metric, quantified, overview, details, ctx.config.includeDetails, product, batchSize)
}

private fun writeMetricToDomain(
client: AtlanClient,
metric: Metric,
quantified: Double,
domain: DataDomain,
subdomainNameToGuid: Map<String, String>,
): DataProduct {
val builder =
try {
DataProduct.findByName(client, metric.name)[0]!!.trimToRequired()
} catch (e: NotFoundException) {
val qualifiedName = DataDomain.findByName(client, metric.category)[0]!!.qualifiedName
DataProduct.creator(client, metric.name, qualifiedName, metric.query().build())
}
val prettyQuantity = NumberFormat.getNumberInstance(Locale.US).format(quantified)
if (metric.caveats.isNotBlank()) {
builder
.announcementType(AtlanAnnouncementType.WARNING)
.announcementTitle("Caveats")
.announcementMessage(metric.caveats)
.certificateStatus(CertificateStatus.DRAFT)
} else {
builder.certificateStatus(CertificateStatus.VERIFIED)
}
if (metric.notes.isNotBlank()) {
builder
.announcementType(AtlanAnnouncementType.INFORMATION)
.announcementTitle("Note")
.announcementMessage(metric.notes)
}

val product =
builder
.displayName(metric.displayName)
.description(metric.description)
.certificateStatusMessage(prettyQuantity)
.build()
val response = product.save(client)
return response.getResult(product) ?: product.trimToRequired().guid(response.getAssignedGuid(product)).build()
}
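The decoration logic in `writeMetricToDomain` has an ordering subtlety: caveats are applied first and notes second, on the same builder. Assuming the builder's setters overwrite earlier values (typical for generated builders, but an assumption here), a metric with both caveats and notes would end up with the note as its announcement while keeping the DRAFT certificate. The sketch below models that with plain variables; `Decoration`, `decorate`, and the two enums are hypothetical stand-ins, not SDK types.

```kotlin
import java.text.NumberFormat
import java.util.Locale

// Hypothetical stand-ins for the SDK's announcement and certificate enums.
enum class AnnouncementType { WARNING, INFORMATION }
enum class Certificate { DRAFT, VERIFIED }

data class Decoration(
    val certificate: Certificate,
    val announcementType: AnnouncementType?,
    val announcementTitle: String?,
    val announcementMessage: String?,
    val certificateMessage: String,
)

// Applies caveats first, then notes, so notes win when both are present.
fun decorate(caveats: String, notes: String, quantified: Double): Decoration {
    var type: AnnouncementType? = null
    var title: String? = null
    var message: String? = null
    val certificate =
        if (caveats.isNotBlank()) {
            type = AnnouncementType.WARNING
            title = "Caveats"
            message = caveats
            Certificate.DRAFT
        } else {
            Certificate.VERIFIED
        }
    if (notes.isNotBlank()) {
        type = AnnouncementType.INFORMATION
        title = "Note"
        message = notes
    }
    // Same formatting call as the diff: US-style grouping separators.
    val pretty = NumberFormat.getNumberInstance(Locale.US).format(quantified)
    return Decoration(certificate, type, title, message, pretty)
}

fun main() {
    println(decorate("Approximate count", "", 1234567.0))
    println(decorate("Approximate count", "See the documentation", 10.0))
}
```

If caveats should stay visible when notes are also present, one option would be to combine both into a single announcement message; whether that is desirable is a design decision for the PR authors.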

private fun writeMetricToFile(
client: AtlanClient,
metric: Metric,
quantified: Double,
overview: TabularWriter,
details: TabularWriter,
includeDetails: Boolean,
product: DataProduct?,
batchSize: Int,
) {
overview.writeRecord(
listOf(
metric.name,
metric.description,
quantified,
metric.caveats,
metric.notes,
),
)
if (includeDetails) {
val batch =
if (product != null) {
AssetBatch(
client,
batchSize,
false,
AssetBatch.CustomMetadataHandling.IGNORE,
true,
false,
false,
false,
AssetCreationHandling.FULL,
false,
)
} else {
null
}
metric.outputDetailedRecords(details, product, batch)
batch?.flush()
batch?.close()
}
}
}
@@ -4,6 +4,7 @@ package com.atlan.pkg.mdir.metrics

import com.atlan.AtlanClient
import com.atlan.model.assets.Asset
import com.atlan.model.assets.DataProduct
import com.atlan.model.assets.GlossaryTerm
import com.atlan.model.search.AggregationBucketResult
import com.atlan.model.search.FluentSearch.FluentSearchBuilder
@@ -96,6 +97,21 @@ abstract class Metric(
}
}

fun outputDetailedRecords(
writer: TabularWriter,
product: DataProduct?,
batch: AssetBatch?,
) {
val header = getDetailedHeader()
if (header.isNotEmpty()) {
writer.writeHeader(header)
query().stream().forEach { asset ->
val row = getDetailedRecord(asset)
writer.writeRecord(row)
}
}
}
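As shown in this hunk, `outputDetailedRecords` writes a header only when one is defined and then streams one row per asset; the `product` and `batch` parameters are accepted but not referenced in the body. A minimal sketch of the header-then-rows flow is below. `RowWriter`, `InMemoryWriter`, and `writeDetails` are hypothetical stand-ins for `TabularWriter` and the metric's query, used only to make the control flow runnable.

```kotlin
// Hypothetical stand-in for TabularWriter.
interface RowWriter {
    fun writeHeader(header: Map<String, String>)
    fun writeRecord(row: List<Any?>)
}

// Collects output lines in memory so the behaviour is easy to inspect.
class InMemoryWriter : RowWriter {
    val lines = mutableListOf<String>()

    override fun writeHeader(header: Map<String, String>) {
        lines += header.keys.joinToString(",")
    }

    override fun writeRecord(row: List<Any?>) {
        lines += row.joinToString(",")
    }
}

// Writes nothing at all when the header is empty; otherwise header, then one row per item.
fun <T> writeDetails(
    writer: RowWriter,
    header: Map<String, String>,
    assets: Sequence<T>,
    toRow: (T) -> List<Any?>,
) {
    if (header.isEmpty()) return
    writer.writeHeader(header)
    assets.forEach { writer.writeRecord(toRow(it)) }
}

fun main() {
    val writer = InMemoryWriter()
    val header = mapOf("Name" to "Asset name", "Type" to "Asset type")
    val assets = sequenceOf("orders" to "Table", "customers" to "View")
    writeDetails(writer, header, assets) { listOf(it.first, it.second) }
    writer.lines.forEach(::println)
}
```

Using a lazily evaluated sequence keeps memory use flat for large result sets, which is the same reason the diff iterates over `query().stream()` rather than collecting results first.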

/**
* Query that defines the results for this particular report.
*
@@ -81,6 +81,24 @@ uiConfig {
helpText = "Whether to include detailed results (Yes), or only the headline metrics (No) in the Excel file produced."
fallback = false
}
["include_data_products"] = new Radio {
title = "Include data products"
required = true
possibleValues {
["TRUE"] = "Yes"
["FALSE"] = "No"
}
default = "TRUE"
helpText = "Whether to include data products in the report."
fallback = default
}
["data_domain"] = new TextInput {
title = "Data domain"
required = true
helpText = "Name of the data domain to which the metadata metrics belong."
placeholderText = "Metadata metrics"
fallback = placeholderText
}
}
}
["Delivery"] {