
Conversation

@mmnelemane

No description provided.

Rick Salevsky and others added 30 commits May 2, 2018 13:29
rabbitmq: create empty users list which is expected by some recipes
…ort_on_startup_backport

[4.0] rabbitmq: block client port on startup
[4.0] tempest: remove world-readable permission from tempest.conf
[4.0] keystone: Add retry loop to _get_token (bsc#1087466)
various encloud related backports for SOC7
(cherry picked from commit fd3505c)
The "always" setting has dubious negative performance impact
and the SLES default is "madvise", so we should be using that instead.

See https://www.kernel.org/doc/Documentation/vm/transhuge.txt for
details.

(cherry picked from commit f5bb2c5)
[4.0] transparent hugepage fixes (backport of PR 1659)
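
As a rough illustration, a Chef-style sketch of how the "madvise" setting could be applied at runtime (the resource name and guard are assumptions, not the actual recipe code):

```ruby
# Sketch only: switch transparent hugepages to "madvise" at runtime.
# The real backport may apply this differently (e.g. via boot options).
execute "set-thp-madvise" do
  command "echo madvise > /sys/kernel/mm/transparent_hugepage/enabled"
  not_if "grep -q '\\[madvise\\]' /sys/kernel/mm/transparent_hugepage/enabled"
end
```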
…ck_client_port_on_startup_backport

Revert "[4.0] rabbitmq: block client port on startup"
…829)

Issuing a new keystone token immediately after updating the
admin user password may sometimes return an invalid token.

In the context of crowbar, this issue can be triggered when calling
the keystone_register 'wakeup' action immediately after the admin
password has been updated. When triggered, it results in timeout errors
on non-founder nodes, while the founder node is stuck doing retry
iterations with an expired token.

As a workaround for bsc#1091829, the 'wakeup' action is updated with
an optional 'reissue_token_on_error' argument, which, when set, will
re-issue a token *before* checking the keystone API again, instead of
reusing the same token for subsequent attempts.

(cherry picked from commit 3d664ed)
[4.0] keystone: avoid race condition during admin password change
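
A minimal Ruby sketch of the workaround's shape (the method body and helper names here are hypothetical, not the actual crowbar code):

```ruby
# Hypothetical sketch: re-issue the token *before* retrying instead of
# reusing one that the admin password change may have invalidated.
def wakeup(attrs, reissue_token_on_error: false)
  token = get_token(attrs)        # assumed helper
  attempts = 0
  begin
    query_keystone_api(token)     # assumed helper
  rescue StandardError
    attempts += 1
    raise if attempts >= 5
    token = get_token(attrs) if reissue_token_on_error
    sleep 2
    retry
  end
end
```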
Nested virt defaults to off, but a lot of people rely on it being
available.

While in https://fate.suse.com/320082 the virtualisation team
declined to promote nested virt to fully supported status for SLE12,
we are using this since 2012 in all kinds of places without problems.

(cherry picked from commit afbcc5c)
nova: allow enabling nested virt on Intel
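
A hedged sketch of what enabling this could look like in a Chef recipe (the file path and guard are illustrative):

```ruby
# Sketch only: persist and load the nested option for kvm_intel.
file "/etc/modprobe.d/80-kvm-intel.conf" do
  content "options kvm_intel nested=1\n"
end

execute "reload-kvm_intel-with-nested" do
  command "modprobe -r kvm_intel && modprobe kvm_intel"
  only_if "grep -q N /sys/module/kvm_intel/parameters/nested"
end
```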
Disabled by default. It can be set to avoid filling up image-related
tables.

Though the tables are only filled by POST requests, this limit applies
to all request types.

See https://wiki.openstack.org/wiki/OSSN/OSSN-0076 for details.

(cherry picked from commit c4c1b8e)
Backport of crowbar#1677
…ate-limit

[4.0] Add rate limiting for glance api (bsc#1005886)
The alarm_history_ttl config option for aodh was
not previously configurable.

(cherry picked from commit 47a7d26)
aodh: Add config for alarm_history_ttl (bsc#1073703)
This commit adds various new elasticsearch tunables and passes them to
monasca-installer.

(cherry picked from commit a91f9c5)
[4.0] mariadb: Remove redundant config values
[4.0] mariadb: Add prefix to configs
The correct field name for the Ceph cluster name is cephfs_cluster_name;
correct it in the custom view so that configurations using CephFS can be
applied successfully.

(cherry picked from commit 72a0f7f)
As the resource agent for rabbitmq with cluster HA restarts the rabbitmq
service several times, the current check can fail to validate rabbitmq
status: it may happen to run during one of the moments rabbit is up
while creating/joining the cluster. If the check passes and the chef
execution continues, the next steps can fail because they are dependent
on having a running rabbitmq, while the rabbitmq server may still be
restarting.

Instead, expand the checks to first look for a rabbit master for the
resource, and expand the check for a local running rabbit to make sure
we are checking the local copy. Also add an extra check after the crm
checks to make sure there are no pending operations for the resource,
so we avoid continuing while a promotion is going on.

(cherry picked from commit 3060a3e)
…h-field-4.0

manila: Correct field name for cluster name
As the other checks are not enough, because pacemaker keeps restarting
rabbitmq, we need a more robust way of checking that rabbit has entered
a stable state.

So check that rabbit is up 5 times in a row with a delay of 2 seconds
between checks, to make sure pacemaker has left it alone.

Also, only trigger that check for rabbit if the pacemaker_transaction is
updated; otherwise there is no need to do so.

(cherry picked from commit 8b56894)
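
A rough Chef-style sketch of such a stability check (the counts mirror the description above; resource and command details are assumptions):

```ruby
# Sketch only: require 5 consecutive successful status checks,
# 2 seconds apart, before letting the chef run continue.
ruby_block "wait-for-stable-rabbitmq" do
  retries 10
  retry_delay 2
  block do
    5.times do
      raise "rabbitmq not stable yet" unless system("rabbitmqctl -q status > /dev/null 2>&1")
      sleep 2
    end
  end
end
```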
This commit improves the execution of monasca-installer in various ways:

* Run monasca-installer from dedicated wrapper script
* Determine whether to run monasca-installer in wrapper script
* Signal changed resources by deleting wrapper script's version information
  file (causes a re-run)
* Add time stamps to /var/log/monasca-installer.log

(cherry picked from commit 5e05e24)
…nstaller

monasca: various monasca-installer improvements
[4.0] rabbitmq: Make sure rabbitmq is running on cluster HA
apache2 reload causes 406 responses from keystone

bsc#1083093

(cherry picked from commit cfda234)
Such a scenario is not supported by the non-disruptive upgrade.
The user can still place the role on compute nodes manually; this only
affects automatic allocation.
toabctl and others added 26 commits May 15, 2019 13:34
It is sometimes useful to be able to add extra options (like
gcache.size, gcs.fc_debug, ...) to the galera wsrep_provider_options
configuration variable.
This can now be done via the Crowbar RAW view.
Also make the (currently hardcoded) "gcs.fc_limit" multiplier and
"gcs.fc_factor" configurable.

(cherry picked from commit 94eb559)
[4.0] database: Raise and align promote/demote timeouts (bsc#1131791)
[4.0] database: Make wsrep_provider_options configurable (fate#327745)
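
A hedged sketch of how the final option string could be assembled (all attribute values below are made-up examples for illustration):

```ruby
# Sketch only: merge computed defaults with extra options from the
# RAW view, then render the final wsrep_provider_options string.
cluster_size = 3  # example value; normally derived from the cluster
defaults = {
  "gcache.size"   => "512M",             # example attribute
  "gcs.fc_limit"  => 16 * cluster_size,  # configurable multiplier
  "gcs.fc_factor" => 0.8                 # now configurable too
}
extra = { "gcs.fc_debug" => 1 }          # user-supplied via RAW view
wsrep_provider_options =
  defaults.merge(extra).map { |k, v| "#{k}=#{v}" }.join(";")
```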
[4.0] Fix neutron-ha-tool restart on configuration changes
In a shared-storage-based HA setup, rabbitmq uses the fixed uid/gid 91.
This user/group modification was done after the (optional) SSL
certificate generation. The ACLs on the SSL key were therefore
incorrect, making rabbitmq unable to start, with EACCES errors.
The sync failed when certs and/or keys were located in non-default paths.
…se correct paths when syncing certs (crowbar#2146)

[4.0] rabbitmq: Fix ACL of SSL key after uid/gid change + keystone: Use correct paths when syncing certs
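
A minimal Chef sketch of the ownership fix (the key path is an example, not the actual one):

```ruby
# Sketch only: make sure the key is readable by the rabbitmq user
# (fixed uid/gid 91) after the user/group modification.
file "/etc/rabbitmq/ssl/private/rabbitmq.key" do
  owner "rabbitmq"
  group "rabbitmq"
  mode  "0640"
end
```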
In some cases the flavor create call succeeds but the client still
returns a non-zero status. Retries of the create call then fail with
"Flavor already exists" and the retry loop never succeeds. The added
check is executed on every loop turn and stops retrying if the flavor
already exists.

An example scenario where the flavor is created correctly but the
client doesn't return zero: one of the HA nodes executes the flavor
create commands while the others perform a delayed restart of the nova
API after config files are modified. If the "create" request hits the
API just before the restart, it can be accepted even though the client
never gets the correct response back.

(cherry picked from commit 8085fb9)
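
A hedged sketch of the retry shape (the flavor name and values are examples; the real recipe code differs):

```ruby
# Sketch only: check for the flavor on every loop turn, so a create
# that succeeded despite a non-zero client status ends the retries.
bash "create-flavor-m1.tiny" do
  code <<-EOS
    for i in 1 2 3 4 5; do
      openstack flavor show m1.tiny > /dev/null 2>&1 && exit 0
      openstack flavor create --id 1 --ram 512 --vcpus 1 m1.tiny && exit 0
      sleep 10
    done
    exit 1
  EOS
end
```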
l3-ha service doesn't use .openrc so it doesn't need to be restarted
when that file is modified.

(cherry picked from commit 56e4bed)
This openrc should no longer be needed now that we have the neutron
l3-ha-service which consumes its configuration from a yaml file.

(cherry picked from commit aa20716)
The agent status is not updated very quickly anyway, and if we retry
too quickly the underlying problem might still not be fixed. Let's be
more conservative and check every 30s instead of 10s.

(cherry picked from commit 59672c8)
[4.0] nova: Don't retry creating existing flavors
[4.0] neutron: Don't restart l3-ha on .openrc change + timeout extension
This patch allows the user to change
the ovs inactivity_probe timeout from the neutron barclamp, in the 'raw'
view. Previously, this value was always set to the OVS default, 5.

It provides crowbar support for this upstream patch:

https://review.opendev.org/#/c/663024/
[4.0] Make ovs of_inactivity_probe configurable from neutron barclamp
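
A hedged sketch of how the value could be passed down (the attribute path and template name are assumptions; 5 is the OVS default mentioned above):

```ruby
# Sketch only: render the configurable probe timeout into the agent
# config, falling back to the OVS default of 5.
probe = node[:neutron][:ovs][:of_inactivity_probe] || 5
template "/etc/neutron/plugins/ml2/openvswitch_agent.ini" do
  source "openvswitch_agent.ini.erb"
  variables(of_inactivity_probe: probe)
end
```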
Creating magnum flavors is done as a delayed action, alongside
all other delayed actions, such as restarting services.
If one of those restarted services is apache (e.g. because it
was triggered by another barclamp configuration change), then
keystone and other API services might not be available right
away, in which case the magnum flavor creation will fail.

Re-attempting magnum flavor creation fixes this issue.
[4.0] magnum: retry flavor creation (SOC-9991)
Creating the magnum image is done as a delayed action, alongside
all other delayed actions, such as restarting services.
If one of those restarted services is apache (e.g. because it
was triggered by another barclamp configuration change), then
keystone and other API services might not be available right
away, in which case the magnum image creation will fail.

Re-attempting magnum image creation fixes this issue.
[4.0] magnum: retry magnum image creation (SOC-10015)
In some cases, VMs may contain more threads than permitted by the
systemd default of 16000. This tuneable makes it possible to set a
higher limit in qemu.conf through the Nova barclamp.

(cherry picked from commit a62fd70)

Backport changes: migration renamed to 126_add_max_threads.rb and schema
                  revision adjusted accordingly.
SOC-10001: add max_threads_per_process tuneable
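
A hedged Chef sketch of rendering the tuneable (the attribute path and template are illustrative, not the actual barclamp code):

```ruby
# Sketch only: render the higher thread limit into qemu.conf when the
# barclamp attribute is set; libvirt picks it up on restart.
max_threads = node[:nova][:kvm][:max_threads_per_process] # assumed path
template "/etc/libvirt/qemu.conf" do
  source "qemu.conf.erb"
  variables(max_threads_per_process: max_threads)
  not_if { max_threads.nil? }
end
```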
It was replaced with dns_domain in 9e96051,
but newton neutron dhcp needs dhcp_domain.
neutron: restore dhcp_domain in stable/4.0 (bsc#1145867)
Sometimes there is a race condition and ohai hasn't collected the ruby
version yet. to_f then evaluates the version to 0.0 and zypper fails to
install the gem with `rubygem 'ruby0.0-rubygem-cstruct' not found in
package names`.

(cherry picked from commit 6c02091)
[4.0] database: Hardcode ruby version for package installation (SOC-10010)
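
A minimal sketch of the idea (the version shown is an example for SLES 12):

```ruby
# Sketch only: use a hardcoded version instead of the racy ohai value
# (node[:languages][:ruby][:version].to_f could become 0.0).
ruby_version = "2.1"
package "ruby#{ruby_version}-rubygem-cstruct"
```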
This commit provides the changes in plugin packages and config files
needed to integrate SOC with ACI 4.1 and higher versions.
ACI 4.1 uses a slightly different set of plugin packages and configs
for integration with OpenStack. This includes:
 - python-gbpclient renamed to python-group-based-policy-client
 - ovs-bridge-name removed from opflex-agent-ovs.conf
 - addition of int-bridge-name and access-bridge-name in opflex-agent-ovs.conf
 - renaming of agent-ovs to opflex-agent
 - for uniformity, the template for opflex-agent-ovs.conf renamed from
   10-opflex-agent-ovs.conf.erb to opflex-agent-ovs.conf.erb
 - the neutron template schema and json templates updated to provide
   integration_bridge and access_bridge details with default values,
   with the corresponding migration scripts updated accordingly

(cherry picked from commit cb5347d)
@mmnelemane mmnelemane changed the base branch from master to stable/4.0 September 6, 2019 14:31
@mmnelemane mmnelemane force-pushed the aci_4.1_soc7_multivmm branch from 5d0b3ad to 4ee7a94 on September 13, 2019 12:15
A single ACI fabric can support multiple VMM domains. Each VMM domain
can be governed by a different controller (e.g. VMware vCenter,
OpenStack, or Microsoft SCVMM). Several production data centers tend
to use multiple VMM domains and expect to be able to monitor and
control network policies from a single ACI fabric. Integrating
OpenStack with such a setup requires crowbar to provide parameters
specific to each VMM domain. This commit adds the additional
parameters, plus the logic to validate them and send them to the
correct config location. The changes allow providing "Vmware" or
"OpenStack" as the VMM type; multiple entries of either type are
possible.

- Also add "ssl_mode" as a configurable parameter, which needs to be
"encrypted" if ESXi is used as compute. Other use cases may need a
different value, hence it is included as a configurable parameter
within the opflex node structure.

(cherry picked from commit 1f16436)
@mmnelemane mmnelemane force-pushed the aci_4.1_soc7_multivmm branch from 4ee7a94 to dacd4ee on September 27, 2019 14:49