- Logging
- [SECURITY] Moved to Open Distro for Elasticsearch 1.13.3 (addresses LOG4J security vulnerability)
**UPDATE**: Due to incomplete remediation of the log4j issue, do not use 1.1.3; use a more recent version.
- Overall
- [CHANGE] The ingress sample is deprecated in favor of the TLS sample
- [FIX] The TLS Sample is now more consistent across monitoring/logging and host/path-based ingress
- [FIX] The CloudWatch sample has been updated to support IMDSv2, which is used by viya4-iac-aws
- [CHANGE] Samples have been reviewed and updated as needed for consistency and correctness
- Monitoring
- No changes this release
- Logging
- [FEATURE] A new Kibana user, `logadm`, has been created. This user is intended to be the primary Kibana user for routine, day-to-day log monitoring. See The logadm User and Its Access Controls.
- [CHANGE] Documentation on security and controlling access to log messages has been revised extensively. See Limiting Access to Logs.
- [CHANGE] The Event Router component is now deployed to the logging ($LOG_NS) namespace instead of to the `kube-system` namespace. During upgrades of existing deployments, Event Router will be removed from the `kube-system` namespace and redeployed in the logging ($LOG_NS) namespace.
- [FIX] Upgrade of an existing deployment using Open Distro for Elasticsearch 1.7.0 to the current release (which uses Open Distro for Elasticsearch 1.13.2) no longer fails.
**UPDATE**: Due to incomplete remediation of the log4j issue, do not use 1.1.2; use a more recent version.
- Overall
- [CHANGE] Samples now use Ingress v1 for Kubernetes 1.22 compatibility
- Monitoring
- [CHANGE] Monitoring components now use Ingress v1 for Kubernetes 1.22 compatibility
- [FIX] The SAS Jobs dashboards properly handle large numbers of jobs
- [FIX] The network metric recording rule for SAS Jobs has been fixed to support kube-state-metrics 2.x
- [FIX] Using `LOG_COLOR_ENABLE=false` now shows log levels in output
- [FIX] Deployments without an active TERM now run properly again
- [FIX] Perf/Utilization dashboard metrics display properly again
- Logging
- [SECURITY] Moved to Open Distro for Elasticsearch 1.13.3 (addresses LOG4J security vulnerability)
- [FEATURE] Access controls supporting a new class of users with access to all log messages are now created during the deployment process.
- [FEATURE] Kibana content in a given directory is loaded as a single 'batch' rather than individually during the deployment process.
- [TASK] Feature-flag logic controlling enablement of Kibana tenant spaces and other application multi-tenancy related capabilities has been removed since these capabilities are no longer optional.
- [FIX] The path-based ingress sample for accessing Kibana now works following the move to Open Distro for Elasticsearch 1.13.2.
- [FIX] The logging deployment scripts now handle failures when deploying specific components more robustly.
- [FIX] A new Fluent Bit configuration setting prevents "stale" Kubernetes metadata from being added to collected log messages.
- Overall
- [FIX] Running in a non-interactive shell (no `$TERM`) caused automated deployments to fail
- Known Issues
- On OpenShift clusters, upgrading an existing deployment that uses Open Distro for Elasticsearch 1.7.0 to this release (which uses Open Distro for Elasticsearch 1.13.2) fails. Deploying this release onto a new OpenShift cluster is possible.
- Overall
- [FEATURE] A new flag, `LOG_VERBOSE_ENABLE`, is now available to suppress detailed logging during script execution. The default setting of this flag is `true`.
- Monitoring
- [CHANGE] Most monitoring component versions have been updated
- kube-prometheus-stack Helm chart upgraded from 15.0.0 to 19.0.3
- Prometheus Operator upgraded from 0.47.0 to 0.51.2
- Prometheus upgraded from 2.26.1 to 2.30.3
- Alertmanager upgraded from 0.21.0 to 0.23.0
- Grafana upgraded from 7.5.4 to 8.2.1
- Node Exporter upgraded from 1.0.1 to 1.2.2
- kube-state-metrics upgraded from 1.9.8 to 2.2.1
- [FIX] Several dashboards were fixed to adjust to the kube-state-metrics 2.x metrics
- [FIX] The KubeHpaMaxedOut alert has been patched to not fire when max instances == current instances == 1
- Logging
- [CHANGE] Open Distro for Elasticsearch (i.e. Elasticsearch and Kibana) has been upgraded to version 1.13.2. This includes significant changes to the Kibana user interface; see Important Information About Kibana in the New Release for details.
- [FEATURE] A significant number of changes support application multi-tenancy in SAS Viya, including the ability to limit users to log messages from a specific SAS Viya deployment and tenant. See Tenant Logging for details.
- Known Issues
- On OpenShift clusters, upgrading an existing deployment that uses Open Distro for Elasticsearch 1.7.0 to this release (which uses Open Distro for Elasticsearch 1.13.2) fails. Deploying this release onto a new OpenShift cluster is possible.
- Logging
- [FIX] Addressed a serious issue (introduced in Version 1.0.12) that prevented the successful deployment of the logging components when ingress was configured
- Overall
- [CHANGE] The minimum supported version of OpenShift is now 4.7. OpenShift support itself is still experimental
- [FIX] There is now a check for the presence of the `sha256sum` utility in the `PATH`
- [FIX] There is now a timeout (default: 10 minutes) when deleting namespaces using `LOG_DELETE_NAMESPACE_ON_REMOVE` or `MON_DELETE_NAMESPACE_ON_REMOVE`. The timeout can be set via `KUBE_NAMESPACE_DELETE_TIMEOUT`.
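The timeout override above can be sketched as follows. This is a minimal sketch only: the value format expected by `KUBE_NAMESPACE_DELETE_TIMEOUT` (plain seconds vs. a duration string) is an assumption here, so check the project documentation before relying on it.

```shell
# Sketch: raise the namespace-deletion timeout before running a remove script.
# The plain-seconds value format is an assumption, not confirmed by this entry.
export MON_DELETE_NAMESPACE_ON_REMOVE=true
export KUBE_NAMESPACE_DELETE_TIMEOUT=1200   # assumed: seconds (20 minutes)

# A remove script would then pick these settings up, e.g.:
#   monitoring/bin/remove_monitoring_cluster.sh
echo "namespace delete timeout: $KUBE_NAMESPACE_DELETE_TIMEOUT"
```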
- Monitoring
- [FIX] Metrics are now properly collected from the SAS Deployment Operator
- [CHANGE] Internal improvements, refactoring and preparations for future support of application multi-tenancy in SAS Viya
- [FIX] The two SAS Jobs dashboards have been updated and slightly optimized
- Logging
- [CHANGE] Fluent Bit has been upgraded to version 1.8.7
- [CHANGE] Internal improvements, refactoring and preparations for future support of application multi-tenancy in SAS Viya
- Monitoring
- [FEATURE] SAS Job dashboards now support a 'queue' filter for SAS Workload Orchestrator
- [FEATURE] The 'Job' filter on the SAS Job dashboards now displays user-provided job names when available
- [DEPRECATION] In the next release, NodePorts will be disabled by default for Prometheus and Alertmanager for security reasons. Set the environment variable `PROM_NODEPORT_ENABLE=true` to maintain the current behavior, as this setting will default to 'false' in the next release.
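To preserve the current NodePort behavior across future upgrades, the flag can be recorded in a `user.env` file rather than exported each time. A minimal sketch, assuming the `USER_DIR` customization layout mentioned elsewhere in these notes (the directory path is a hypothetical example):

```shell
# Sketch: persist PROM_NODEPORT_ENABLE so future deployments keep
# NodePorts for Prometheus and Alertmanager enabled.
USER_DIR=./v4m-custom                        # hypothetical customization dir
mkdir -p "$USER_DIR/monitoring"
echo "PROM_NODEPORT_ENABLE=true" >> "$USER_DIR/monitoring/user.env"

# Redeploy for the setting to take effect, e.g.:
#   USER_DIR="$USER_DIR" monitoring/bin/deploy_monitoring_cluster.sh
cat "$USER_DIR/monitoring/user.env"
```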
- Logging
- Internal improvements, refactoring and preparations for application multi-tenancy in SAS Viya
- Overall
- [FEATURE] The version of `viya4-monitoring-kubernetes` deployed is now saved in-cluster for support purposes
- Monitoring
- [FIX] Grafana update fails with PVC multi-attach error
- Logging
- [FEATURE] SAS Update Checker Report added to Kibana
- [FIX] Enabled NodePort for Elasticsearch causes update-in-place to fail
- [FIX] Eventrouter references deprecated version of K8s authorization API
- Overall
- [FEATURE] OpenShift version checking has been added
- Version 4.6.x is supported
- Version 4.5 or lower generates an error
- Version 4.7 or higher generates a warning
- [FEATURE] Integration with the SAS Viya workload node placement strategy can be enabled with `NODE_PLACEMENT_ENABLE=true`
- [FEATURE] OpenShift: Path-based ingress can be enabled with `OPENSHIFT_PATH_ROUTES=true`
- Monitoring
- [FIX] OpenShift: Some of the Perf dashboards displayed empty charts
- [CHANGE] Prometheus version changed from 2.26.0 to 2.26.1
- [FEATURE] OpenShift: A custom route hostname can be set with `OPENSHIFT_ROUTE_HOST_GRAFANA`
- [FIX] The memory limit of the Prometheus Operator has been increased to 1 GiB
- Logging
- [CHANGE] Fluent Bit has been updated to version 1.7.9
- [FEATURE] Fluent Bit disk buffering is now enabled
- [FIX] Fluent Bit pods were not restarted properly during an upgrade-in-place
- [FIX] OpenShift: Upgrade-in-place now functions properly
- [FEATURE] OpenShift: A custom route hostname can be set with `OPENSHIFT_ROUTE_HOST_KIBANA` and `OPENSHIFT_ROUTE_HOST_ELASTICSEARCH`
- Monitoring
- [EXPERIMENTAL] OpenShift automation
- Deployment to OpenShift clusters is now supported via `monitoring/bin/deploy_monitoring_openshift.sh`
- OpenShift authentication for Grafana is enabled by default, but can be disabled using `OPENSHIFT_AUTH_ENABLE=false`
- TLS is always enabled for both ingress and in-cluster communication
- OpenShift support is still under development. Usage and features may change until the feature set is finalized.
- Documentation is available in Deploying Monitoring on OpenShift
- [FEATURE] The new `NGINX_DASH` environment variable now controls whether the NGINX dashboard gets deployed when using `deploy_monitoring_*.sh` or `deploy_dashboards.sh`.
- Logging
- [EXPERIMENTAL] OpenShift automation
- Deployment to OpenShift clusters is now supported via `logging/bin/deploy_logging_open_openshift.sh`
- OpenShift support is still under development. Usage and features may change until the feature set is finalized.
- Documentation is available in Deploying Log Monitoring on OpenShift
- [FEATURE] Container runtimes other than Docker are now supported. The container runtime is determined during script execution and is used to determine the format of container logs. However, the `KUBERNETES_RUNTIME_LOGFMT` environment variable can be used to explicitly identify the format of container logs (e.g. docker or cri-o).
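The explicit override described above can be sketched as follows; "docker" and "cri-o" are the format values named in this entry, and everything else is an assumption:

```shell
# Sketch: bypass container-runtime autodetection and state the container
# log format explicitly before running the logging deployment.
export KUBERNETES_RUNTIME_LOGFMT=cri-o   # or "docker"

# The logging deployment script would then configure log parsing
# accordingly, e.g.:
#   logging/bin/deploy_logging_open.sh
echo "container log format: $KUBERNETES_RUNTIME_LOGFMT"
```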
- Overall
- Research was completed that will enable OpenShift support in a future release
- Monitoring
- [CHANGE] Several component versions have been updated
- Grafana: 7.4.1 -> 7.5.4
- Prometheus: 2.24.1 -> 2.26.0
- Prometheus Operator: 0.45.0 -> 0.47.0
- Prometheus Operator Helm Chart: 13.7.2 -> 15.0.0
- kube-state-metrics: 1.9.7 -> 1.9.8
- [FIX] Upgrade-in-place of the Prometheus Pushgateway fails
- [FIX] CAS dashboard: Uptime widget format changed
- [FIX] CAS dashboard: Dashboard errors with some CAS configurations
- Instructions are now available for manual cleanup if the monitoring namespace is deleted instead of running the remove_* scripts
- Logging
- Instructions are now available for manual cleanup if the logging namespace is deleted instead of running the remove_* scripts
- [FIX] The change_internal_password.sh script no longer fails if Helm is not installed (Helm was never required)
- Overall
- [FEATURE] Custom names for the NGINX controller service are now supported via the `NGINX_SVCNAME` environment variable (or `user.env` setting).
- [CHANGE] Several updates to documentation have been made to improve clarity and organize the content in a more useful way.
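A minimal sketch of both ways to supply the service name; the service name and the `USER_DIR` path below are hypothetical examples:

```shell
# Option 1: export the variable for the current shell session.
export NGINX_SVCNAME=my-ingress-nginx-controller   # hypothetical name

# Option 2: record it in a user.env file (hypothetical USER_DIR layout).
USER_DIR=./v4m-custom
mkdir -p "$USER_DIR/monitoring"
echo "NGINX_SVCNAME=$NGINX_SVCNAME" >> "$USER_DIR/monitoring/user.env"

echo "NGINX controller service: $NGINX_SVCNAME"
```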
- Monitoring
- [FEATURE] There is a new sample that demonstrates how to enable Google Cloud's Operations Suite to collect metrics from a Prometheus instance that is scraping metrics from SAS Viya components
- [CHANGE] The Amazon CloudWatch sample has been updated to include many more metrics and mappings. Almost all metrics exposed by SAS Viya and third-party components are now mapped properly to sets of dimensions. A new reference documents the metrics by dimension, by source, and by metric name.
- Logging
- [FIX] Missing Kubernetes metadata on log messages from some pods (including the CAS server pod) has been fixed. Prior to the fix, the `kube.namespace` field was set to `missing_ns` and all other `kube.*` fields were not present.
- Overall
- There is a new document discussing support of various Cloud providers
- Monitoring
- [FEATURE] The `monitoring/bin/deploy_dashboards.sh` script now accepts a file or directory argument to deploy user-provided dashboards
- [FEATURE] A new `$USER_DIR/monitoring/dashboards` directory is now supported to supply user-provided dashboards at deployment time
- [FEATURE] The new CloudWatch sample provides instructions on configuring the CloudWatch agent to scrape metrics from SAS Viya components
- [FEATURE] The browser-accessible URL for Grafana is now included in the output of `monitoring/bin/deploy_monitoring_cluster.sh` (including when ingress is configured)
- [CHANGE] Several component versions have been upgraded
- Prometheus: v2.23.0 -> v2.24.0
- Grafana: 7.3.6 -> 7.4.1
- Prometheus Operator: 0.44.1 -> 0.45.0
- kube-prometheus-stack: 12.8.0 -> 13.7.2
- [CHANGE] The following optional Grafana plugins are no longer installed by default:
- grafana-piechart-panel
- grafana-clock-panel
- camptocamp-prometheus-alertmanager-datasource
- flant-statusmap-panel
- btplc-status-dot-panel
- [CHANGE] cert-manager resources now use 'v1' to align with their use in SAS Viya 4.x
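The two user-dashboard options above can be sketched as follows. The dashboard file here is a hypothetical placeholder; a real Grafana dashboard JSON export would be used instead, and the `USER_DIR` path is an assumed example:

```shell
# Sketch: stage a user-provided dashboard in the USER_DIR layout so it is
# picked up at deployment time.
USER_DIR=./v4m-custom                            # hypothetical location
mkdir -p "$USER_DIR/monitoring/dashboards"

# Placeholder dashboard; a real export from Grafana would go here.
cat > "$USER_DIR/monitoring/dashboards/my-team.json" <<'EOF'
{ "title": "My Team Dashboard", "panels": [] }
EOF

# Either redeploy monitoring, or deploy just the dashboards, e.g.:
#   monitoring/bin/deploy_dashboards.sh "$USER_DIR/monitoring/dashboards"
ls "$USER_DIR/monitoring/dashboards"
```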
- Logging
- [FEATURE] The browser-accessible URL for Kibana included in the output of `logging/bin/deploy_logging_open.sh` now takes into account the ingress configuration
- [EXPERIMENTAL] A new experimental script, `logging/bin/getlogs.sh`, allows exporting logs to CSV format (see Documentation)
- [FIX] The `logging/bin/change_internal_password.sh` script no longer outputs passwords as debug messages
- Overall
- Improved documentation for overall deployment process
- Improved documentation related to use of TLS
- Removed references to TLS in ingress sample (samples/ingress); TLS enabled ingress shown in TLS sample (samples/tls)
- Monitoring
- [FIX] ENABLE_TLS should set proper port and targetport for v4m-prometheus service
- [FIX] Remove memory limit on kube-state-metrics
- [FIX] Kubernetes Cluster Dashboard disk usage not working on EKS
- Logging
- Moved Helm chart from deprecated `stable/fluent-bit` to `fluent/fluent-bit`
- Fluent Bit version upgraded from 1.5.4 to 1.6.10
- Overall
- Significantly improved documentation for deployment customization
- `KEEP_TMP_DIR` option added to keep the temporary working directory around for troubleshooting purposes
- There is now an early check for `kubectl` cluster admin capabilities
- Monitoring
- Component versions upgraded
- Helm Chart: 11.1.3->12.8.0
- Prometheus Operator: 0.43.2->0.44.1
- Prometheus: v2.22.2-> v2.23.0
- Grafana: 7.3.1->7.3.6
- The application filter on the SAS Java Services dashboard is now sorted
- The Perf/Node Utilization dashboard now uses node names instead of IP addresses to identify nodes
- Logging
- Moved Helm chart from deprecated `stable/elasticsearch-exporter` to `prometheus-community/elasticsearch-exporter`
- Improved handling of log message fragments created due to excessively long log messages (>16 KB)
- [FIX] Eliminated the hard-coded namespace in the `change_internal_password.sh` script
- Fixed breaking script error in TLS
- Minor tweaks to SAS Java Services and Perf/Node Utilization dashboards
- Overall
- [BREAKING CHANGE] The default passwords for both Grafana and Kibana are now randomly generated. The generated password is logged during the initial deployment. It is possible to explicitly set each password via environment variables or `user.env` files.
- TLS support has been enhanced with improved logging and more accurate checking of when `cert-manager` is required
- Helm 2.x has reached end-of-life and support for it has been removed
- Monitoring
- The `KubeHpaMaxedOut` alert has been modified to only trigger if the max replicas is > 1
- Logging
- Refactored deployment/removal scripting internals
- Added new dashboards & visualizations to Kibana
- Added support for non-standard Docker root
This is the first public release.
- Overall
- Minor edits and cleanup to README files and sample user response files
- Monitoring
- Grafana version bumped to 7.3.1
- Prometheus Operator version bumped to 0.43.2
- Prometheus version bumped to 2.22.2
- Prometheus Pushgateway version bumped to 1.3.0
- Overall
- Helm 2.x has reached end-of-life and is no longer supported. Helm 3.x is now required.
- Support added for the SAS Viya Workload Node Placement
- By default, monitoring and logging pods are deployed to untainted nodes
- A new flag, `NODE_PLACEMENT_ENABLE`, supports deploying pods to appropriate workload node placement nodes
- Monitoring
- Several Helm charts have moved from stable to prometheus-community.
- The prometheus-operator Helm chart has been deprecated and moved to kube-prometheus-stack.
- SAS Java and Go ServiceMonitors converted to PodMonitors to properly support merged services
- Logging
- Support for SAS Viya move to Crunchy Data 4.5
- Support for changing retention period of log messages
- Node anti-affinity for Elasticsearch replicas
- Support for multi-role Elasticsearch nodes (including sample to demonstrate usage)
- Additional documentation on using TLS
- Removed traces of support for ODFE "demo" security configuration
- Alternate monitoring solution (proof of concept): Fluent Bit ==> Azure Monitor (Log Analytics workspace)
- Monitoring
- Support for sas-elasticsearch metric collection
- Refreshed Istio dashboard collection
- Samples refactored out of the monitoring/logging directories into a top-level `samples` directory. Additionally, each subdirectory is structured to be compatible with `USER_DIR` customizations.
- A new sample, `generic-base`, has been created as a template for customization. It contains a full set of user response files available to customize.
- Documentation for the samples has been improved
- Logging
- Kubernetes events are now stored in the index associated with the namespace of the source of the event instead of a global (cluster) index
- Multiple fixes to RBAC scripts
- Adjust to breaking Helm 3.3.2 change (Issue #1)
- Refactored samples into a top-level `samples` directory
- Force in-cluster TLS for logging (Issue #2)
- Initial versioned release