
Conversation

@mmnelemane

No description provided.

Rick Salevsky and others added 30 commits May 2, 2018 13:29
rabbitmq: create empty users list which is expected by some recipes
…ort_on_startup_backport

[4.0] rabbitmq: block client port on startup
[4.0] tempest: remove world-readable permission from tempest.conf
[4.0] keystone: Add retry loop to _get_token (bsc#1087466)
various encloud related backports for SOC7
(cherry picked from commit fd3505c)
The "always" setting has dubious negative performance impact
and the SLES default is "madvise", so we should be using that instead.

See https://www.kernel.org/doc/Documentation/vm/transhuge.txt for
details.

(cherry picked from commit f5bb2c5)
[4.0] transparent hugepage fixes (backport of PR 1659)
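
As a rough illustration, a Chef-style sketch of how the "madvise" setting could be applied at runtime (the resource name and guard are assumptions, not the actual recipe code):

```ruby
# Sketch only: switch transparent hugepages to "madvise" at runtime.
# The real backport may apply this differently (e.g. via boot options).
execute "set-thp-madvise" do
  command "echo madvise > /sys/kernel/mm/transparent_hugepage/enabled"
  not_if "grep -q '\\[madvise\\]' /sys/kernel/mm/transparent_hugepage/enabled"
end
```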
…ck_client_port_on_startup_backport

Revert "[4.0] rabbitmq: block client port on startup"
…829)

Issuing a new keystone token immediately after updating the
admin user password may sometimes return an invalid token.

In the context of crowbar, this issue can be triggered when calling
the keystone_register 'wakeup' action immediately after the admin
password has been updated. When triggered, it results in timeout errors
on non-founder nodes, while the founder node is stuck doing retry
iterations with an expired token.

As a workaround for bsc#1091829, the 'wakeup' action is updated with
an optional 'reissue_token_on_error' argument, which, when set, will
re-issue a token *before* checking the keystone API again, instead of
reusing the same token for subsequent attempts.

(cherry picked from commit 3d664ed)
[4.0] keystone: avoid race condition during admin password change
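
A minimal Ruby sketch of the workaround's shape (the method body and helper names here are hypothetical, not the actual crowbar code):

```ruby
# Hypothetical sketch: re-issue the token *before* retrying instead of
# reusing one that the admin password change may have invalidated.
def wakeup(attrs, reissue_token_on_error: false)
  token = get_token(attrs)        # assumed helper
  attempts = 0
  begin
    query_keystone_api(token)     # assumed helper
  rescue StandardError
    attempts += 1
    raise if attempts >= 5
    token = get_token(attrs) if reissue_token_on_error
    sleep 2
    retry
  end
end
```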
Nested virt defaults to off, but a lot of people rely on it being
available.

While in https://fate.suse.com/320082 the virtualisation team
declined to promote nested virt to fully supported status for SLE12,
we are using this since 2012 in all kinds of places without problems.

(cherry picked from commit afbcc5c)
nova: allow enabling nested virt on Intel
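
A hedged sketch of what enabling this could look like in a Chef recipe (the file path and guard are illustrative):

```ruby
# Sketch only: persist and load the nested option for kvm_intel.
file "/etc/modprobe.d/80-kvm-intel.conf" do
  content "options kvm_intel nested=1\n"
end

execute "reload-kvm_intel-with-nested" do
  command "modprobe -r kvm_intel && modprobe kvm_intel"
  only_if "grep -q N /sys/module/kvm_intel/parameters/nested"
end
```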
Disabled by default. It can be set to avoid filling up image-related
tables.

Though the tables are only filled by POST requests, this limit applies
to all request types.

See https://wiki.openstack.org/wiki/OSSN/OSSN-0076 for details.

(cherry picked from commit c4c1b8e)
Backport of crowbar#1677
…ate-limit

[4.0] Add rate limiting for glance api (bsc#1005886)
The alarm_history_ttl config option for aodh was
not previously configurable.

(cherry picked from commit 47a7d26)
aodh: Add config for alarm_history_ttl (bsc#1073703)
This commit adds various new elasticsearch tunables and passes them to
monasca-installer.

(cherry picked from commit a91f9c5)
[4.0] mariadb: Remove redundant config values
[4.0] mariadb: Add prefix to configs
The correct field name for the Ceph cluster name is cephfs_cluster_name;
correct it in the custom view so that configurations using CephFS can be
applied successfully.

(cherry picked from commit 72a0f7f)
As the resource agent for rabbitmq with cluster HA restarts the rabbitmq
service several times, the current check can fail to validate rabbitmq
status: it may happen to run during one of the moments rabbit is up
while creating/joining the cluster. If the check passes and the chef
execution continues, the next steps can fail because they are dependent
on having a running rabbitmq, while the rabbitmq server may still be
restarting.

Instead, expand the checks to first look for a rabbit master for the
resource, and expand the check for a local running rabbit to make sure
we are checking the local copy. Also add an extra check after the crm
checks to make sure there are no pending operations for the resource,
so we avoid continuing while a promotion is going on.

(cherry picked from commit 3060a3e)
…h-field-4.0

manila: Correct field name for cluster name
As the other checks are not enough, because pacemaker keeps restarting
rabbitmq, we need a more robust way of checking that rabbit has entered
a stable state.

So check that rabbit is up 5 times in a row with a delay of 2 seconds
between checks, to make sure pacemaker has left it alone.

Also, only trigger that check for rabbit if the pacemaker_transaction is
updated; otherwise there is no need to do so.

(cherry picked from commit 8b56894)
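
A rough Chef-style sketch of such a stability check (the counts mirror the description above; resource and command details are assumptions):

```ruby
# Sketch only: require 5 consecutive successful status checks,
# 2 seconds apart, before letting the chef run continue.
ruby_block "wait-for-stable-rabbitmq" do
  retries 10
  retry_delay 2
  block do
    5.times do
      raise "rabbitmq not stable yet" unless system("rabbitmqctl -q status > /dev/null 2>&1")
      sleep 2
    end
  end
end
```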
This commit improves the execution of monasca-installer in various ways:

* Run monasca-installer from dedicated wrapper script
* Determine whether to run monasca-installer in wrapper script
* Signal changed resources by deleting wrapper script's version information
  file (causes a re-run)
* Add time stamps to /var/log/monasca-installer.log

(cherry picked from commit 5e05e24)
…nstaller

monasca: various monasca-installer improvements
[4.0] rabbitmq: Make sure rabbitmq is running on cluster HA
apache2 reload causes 406 responses from keystone

bsc#1083093

(cherry picked from commit cfda234)
Such a scenario is not supported by the non-disruptive upgrade.
The user can still place the role on compute nodes manually; this only
affects automatic allocation.
toabctl and others added 26 commits May 15, 2019 13:34
It is sometimes useful to be able to add extra options (like
gcache.size, gcs.fc_debug, ...) to the galera wsrep_provider_options
configuration variable.
This can now be done via the Crowbar RAW view.
Also make the (currently hardcoded) "gcs.fc_limit" multiplier and
"gcs.fc_factor" configurable.

(cherry picked from commit 94eb559)
[4.0] database: Raise and align promote/demote timeouts (bsc#1131791)
[4.0] database: Make wsrep_provider_options configurable (fate#327745)
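
A hedged sketch of how the final option string could be assembled (all attribute values below are made-up examples for illustration):

```ruby
# Sketch only: merge computed defaults with extra options from the
# RAW view, then render the final wsrep_provider_options string.
cluster_size = 3  # example value; normally derived from the cluster
defaults = {
  "gcache.size"   => "512M",             # example attribute
  "gcs.fc_limit"  => 16 * cluster_size,  # configurable multiplier
  "gcs.fc_factor" => 0.8                 # now configurable too
}
extra = { "gcs.fc_debug" => 1 }          # user-supplied via RAW view
wsrep_provider_options =
  defaults.merge(extra).map { |k, v| "#{k}=#{v}" }.join(";")
```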
[4.0] Fix neutron-ha-tool restart on configuration changes
In a shared-storage-based HA setup, rabbitmq uses the fixed uid/gid 91.
This user/group modification was done after the (optional) SSL
certificate generation. The ACLs on the SSL key were therefore
incorrect, making rabbitmq unable to start, with EACCES errors.
The sync failed when certs and/or keys were located in non-default paths.
…se correct paths when syncing certs (crowbar#2146)

[4.0] rabbitmq: Fix ACL of SSL key after uid/gid change + keystone: Use correct paths when syncing certs
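
A minimal Chef sketch of the ownership fix (the key path is an example, not the actual one):

```ruby
# Sketch only: make sure the key is readable by the rabbitmq user
# (fixed uid/gid 91) after the user/group modification.
file "/etc/rabbitmq/ssl/private/rabbitmq.key" do
  owner "rabbitmq"
  group "rabbitmq"
  mode  "0640"
end
```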
In some cases the flavor create call succeeds but the client still
returns a non-zero status. Retries of the create call then fail with
"Flavor already exists" and the retry loop never succeeds. The added
check is executed on every loop turn and stops retrying if the flavor
already exists.

An example scenario where the flavor is created correctly but the
client doesn't return zero: one of the HA nodes executes the flavor
create commands while the others perform a delayed restart of the nova
API after config files are modified. If the "create" request hits the
API just before the restart, it can be accepted even though the client
never gets the correct response back.

(cherry picked from commit 8085fb9)
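
A hedged sketch of the retry shape (the flavor name and values are examples; the real recipe code differs):

```ruby
# Sketch only: check for the flavor on every loop turn, so a create
# that succeeded despite a non-zero client status ends the retries.
bash "create-flavor-m1.tiny" do
  code <<-EOS
    for i in 1 2 3 4 5; do
      openstack flavor show m1.tiny > /dev/null 2>&1 && exit 0
      openstack flavor create --id 1 --ram 512 --vcpus 1 m1.tiny && exit 0
      sleep 10
    done
    exit 1
  EOS
end
```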
l3-ha service doesn't use .openrc so it doesn't need to be restarted
when that file is modified.

(cherry picked from commit 56e4bed)
This openrc should no longer be needed now that we have the neutron
l3-ha-service which consumes its configuration from a yaml file.

(cherry picked from commit aa20716)
The agent status is not updated very quickly anyway, and if we retry
too quickly the underlying problem might still not be fixed. Let's be
more conservative and check every 30s instead of 10s.

(cherry picked from commit 59672c8)
[4.0] nova: Don't retry creating existing flavors
[4.0] neutron: Don't restart l3-ha on .openrc change + timeout extension
This patch allows the user to change
the ovs inactivity_probe timeout from the neutron barclamp, in the 'raw'
view. Previously, this value was always set to the OVS default, 5.

It provides crowbar support for this upstream patch:

https://review.opendev.org/#/c/663024/
[4.0] Make ovs of_inactivity_probe configurable from neutron barclamp
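
A hedged sketch of how the value could be passed down (the attribute path and template name are assumptions; 5 is the OVS default mentioned above):

```ruby
# Sketch only: render the configurable probe timeout into the agent
# config, falling back to the OVS default of 5.
probe = node[:neutron][:ovs][:of_inactivity_probe] || 5
template "/etc/neutron/plugins/ml2/openvswitch_agent.ini" do
  source "openvswitch_agent.ini.erb"
  variables(of_inactivity_probe: probe)
end
```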
Creating magnum flavors is done as a delayed action, alongside
all other delayed actions, such as restarting services.
If one of those restarted services is apache (e.g. because it
was triggered by another barclamp configuration change), then
keystone and other API services might not be available right
away, in which case the magnum flavor creation will fail.

Re-attempting magnum flavor creation fixes this issue.
[4.0] magnum: retry flavor creation (SOC-9991)
Creating the magnum image is done as a delayed action, alongside
all other delayed actions, such as restarting services.
If one of those restarted services is apache (e.g. because it
was triggered by another barclamp configuration change), then
keystone and other API services might not be available right
away, in which case the magnum image creation will fail.

Re-attempting magnum image creation fixes this issue.
[4.0] magnum: retry magnum image creation (SOC-10015)
In some cases, VMs may contain more threads than permitted by the
systemd default of 16000. This tuneable makes it possible to set a
higher limit in qemu.conf through the Nova barclamp.

(cherry picked from commit a62fd70)

Backport changes: migration renamed to 126_add_max_threads.rb and schema
                  revision adjusted accordingly.
SOC-10001: add max_threads_per_process tuneable
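
A hedged Chef sketch of rendering the tuneable (the attribute path and template are illustrative, not the actual barclamp code):

```ruby
# Sketch only: render the higher thread limit into qemu.conf when the
# barclamp attribute is set; libvirt picks it up on restart.
max_threads = node[:nova][:kvm][:max_threads_per_process] # assumed path
template "/etc/libvirt/qemu.conf" do
  source "qemu.conf.erb"
  variables(max_threads_per_process: max_threads)
  not_if { max_threads.nil? }
end
```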
It was replaced with dns_domain in 9e96051,
but newton neutron dhcp needs dhcp_domain.
neutron: restore dhcp_domain in stable/4.0 (bsc#1145867)
Sometimes there is a race condition and ohai hasn't collected the ruby
version yet. to_f then evaluates the version to 0.0 and zypper fails to
install the gem with `rubygem 'ruby0.0-rubygem-cstruct' not found in
package names`.

(cherry picked from commit 6c02091)
[4.0] database: Hardcode ruby version for package installation (SOC-10010)
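
A minimal sketch of the idea (the version shown is an example for SLES 12):

```ruby
# Sketch only: use a hardcoded version instead of the racy ohai value
# (node[:languages][:ruby][:version].to_f could become 0.0).
ruby_version = "2.1"
package "ruby#{ruby_version}-rubygem-cstruct"
```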
This commit provides the changes in plugin packages and config files
needed to integrate SOC with ACI 4.1 and higher versions.
ACI 4.1 uses a slightly different set of plugin packages and configs
for integration with OpenStack. This includes:
 - python-gbpclient renamed to python-group-based-policy-client
 - ovs-bridge-name removed from opflex-agent-ovs.conf
 - addition of int-bridge-name and access-bridge-name in opflex-agent-ovs.conf
 - renaming of agent-ovs to opflex-agent
 - for uniformity, the template for opflex-agent-ovs.conf renamed from
   10-opflex-agent-ovs.conf.erb to opflex-agent-ovs.conf.erb
 - the neutron template schema and json templates updated to provide
   integration_bridge and access_bridge details with default values,
   with the corresponding migration scripts updated accordingly

(cherry picked from commit cb5347d)
@mmnelemane mmnelemane changed the base branch from master to stable/4.0 September 6, 2019 14:31
@mmnelemane mmnelemane force-pushed the aci_4.1_soc7_multivmm branch from 5d0b3ad to 4ee7a94 on September 13, 2019 12:15
A single ACI fabric can support multiple VMM domains. Each VMM domain
can be governed by a different controller (e.g. VMware vCenter,
OpenStack, or Microsoft SCVMM). Several production data centers tend
to use multiple VMM domains and expect to be able to monitor and
control network policies from a single ACI fabric. Integrating
OpenStack with such a setup requires crowbar to provide parameters
specific to each VMM domain. This commit adds the additional
parameters, plus the logic to validate them and send them to the
correct config location. The changes allow providing "Vmware" or
"OpenStack" as the VMM type; multiple entries of either type are
possible.

- Also add "ssl_mode" as a configurable parameter, which needs to be
"encrypted" if ESXi is used as compute. Other use cases may need a
different value, hence it is included as a configurable parameter
within the opflex node structure.

(cherry picked from commit 1f16436)
@mmnelemane mmnelemane force-pushed the aci_4.1_soc7_multivmm branch from 4ee7a94 to dacd4ee on September 27, 2019 14:49