forked from crowbar/crowbar-openstack
Aci 4.1 soc7 multivmm #2
Open: mmnelemane wants to merge 331 commits into vvaradhan:stable/4.0 from mmnelemane:aci_4.1_soc7_multivmm
Conversation
rabbitmq: create empty users list which is expected by some recipes
…ort_on_startup_backport [4.0] rabbitmq: block client port on startup
[4.0] tempest: remove world-readable permission from tempest.conf
[4.0] keystone: Add retry loop to _get_token (bsc#1087466)
various encloud-related backports for SOC7
(cherry picked from commit fd3505c)
The "always" setting has a dubious negative performance impact, and the SLES default is "madvise", so we should use that instead. See https://www.kernel.org/doc/Documentation/vm/transhuge.txt for details. (cherry picked from commit f5bb2c5)
[4.0] transparent hugetable fixes (backport of PR 1659)
…ck_client_port_on_startup_backport Revert "[4.0] rabbitmq: block client port on startup"
…829) Issuing a new keystone token immediately after updating the admin user password may sometimes return an invalid token. In the context of crowbar, this issue can be triggered when calling the keystone_register 'wakeup' action immediately after the admin password has been updated. When triggered, it results in timeout errors on non-founder nodes, while the founder node is stuck doing retry iterations with an expired token. As a workaround for bsc#1091829, the 'wakeup' action is updated with an optional 'reissue_token_on_error' argument, which, when set, will re-issue a token *before* checking the keystone API again, instead of reusing the same token for subsequent attempts. (cherry picked from commit 3d664ed)
[4.0] keystone: avoid race condition during admin password change
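The workaround described above can be sketched as a retry loop that optionally fetches a fresh token before each attempt; `get_token` and `check_api` below are hypothetical stand-ins for the barclamp's internals, not the actual recipe code.

```ruby
# Sketch of the bsc#1091829 workaround: retry the keystone check, and when
# reissue_token_on_error is set, re-issue a token *before* each retry
# instead of reusing a possibly expired one. get_token and check_api are
# hypothetical callables, not the real keystone_register helpers.
def wakeup(get_token, check_api, retries: 5, reissue_token_on_error: false)
  token = get_token.call
  retries.times do
    return true if check_api.call(token)
    # Without this, every retry reuses the token issued right after the
    # admin password change, which may be invalid.
    token = get_token.call if reissue_token_on_error
  end
  false
end
```

With `reissue_token_on_error` unset the loop keeps hammering the API with the same (possibly expired) token, which is exactly the stuck-founder behaviour the commit describes.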
It defaults to off, but a lot of people rely on nested virt being available. While in https://fate.suse.com/320082 the virtualisation team declined to promote nested virt to fully supported status for SLE12, we have been using it since 2012 in all kinds of places without problems. (cherry picked from commit afbcc5c)
nova: allow enabling nested virt on Intel
Disabled by default. It can be set to avoid filling up image-related tables. Though the tables are only filled by POST requests, this limit applies to all request types. See https://wiki.openstack.org/wiki/OSSN/OSSN-0076 for details. (cherry picked from commit c4c1b8e) Backport of crowbar#1677
…ate-limit [4.0] Add rate limiting for glance api (bsc#1005886)
The alarm_history_ttl config option for aodh was not previously configurable. (cherry picked from commit 47a7d26)
aodh: Add config for alarm_history_ttl (bsc#1073703)
This commit adds various new elasticsearch tunables and passes them to monasca-installer. (cherry picked from commit a91f9c5)
[4.0] mariadb: Remove redundant config values
[4.0] mariadb: Add prefix to configs
The correct field name for the Ceph cluster name is cephfs_cluster_name, correct it for the custom view so configurations using CephFS can be successfully applied. (cherry picked from commit 72a0f7f)
As the resource agent for rabbitmq with cluster HA restarts the rabbitmq service several times, the current check can fail to validate rabbitmq status: it might happen to run during one of the moments rabbit is up while creating/joining the cluster. If the check passes and chef execution continues, the next steps can fail because they depend on a running rabbitmq, while the rabbitmq server may still be restarting. Instead, expand the checks to first look for a rabbit master for the resource, and extend the local check to make sure we are checking that the local copy of rabbit is running. Also add an extra check after the crm checks to make sure there are no pending operations for the resource, so we avoid continuing while a promotion is going on. (cherry picked from commit 3060a3e)
…h-field-4.0 manila: Correct field name for cluster name
As the other checks are not enough (pacemaker keeps restarting rabbitmq), we need a more robust way of checking that rabbit has reached a stable state. So check that rabbit is up 5 times in a row, with a delay of 2 seconds between checks, to make sure pacemaker has left it alone. Also, only trigger that check for rabbit if the pacemaker_transaction is updated; otherwise there is no need to do so. (cherry picked from commit 8b56894)
This commit improves the execution of monasca-installer in various ways:
* Run monasca-installer from a dedicated wrapper script
* Determine whether to run monasca-installer in the wrapper script
* Signal changed resources by deleting the wrapper script's version information file (causes a re-run)
* Add time stamps to /var/log/monasca-installer.log
(cherry picked from commit 5e05e24)
…nstaller monasca: various monasca-installer improvements
[4.0] rabbitmq: Make sure rabbitmq is running on cluster HA
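The consecutive-success check described above (rabbit must be seen up several times in a row before chef proceeds) can be sketched like this; `probe` is a hypothetical stand-in for the actual rabbitmqctl status check, and the 5/2 defaults mirror the commit message.

```ruby
# Only treat rabbitmq as stable once the probe succeeds `attempts` times in
# a row, sleeping `delay` seconds between probes so a pacemaker-driven
# restart between checks makes the loop start failing. `probe` is a
# hypothetical callable wrapping the real status check.
def rabbitmq_stable?(probe, attempts: 5, delay: 2)
  attempts.times do |i|
    return false unless probe.call
    sleep(delay) unless i == attempts - 1
  end
  true
end
```

A single failed probe anywhere in the window reports instability, which is what distinguishes this from the original one-shot check.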
apache2 reload causes 406 responses from keystone (bsc#1083093) (cherry picked from commit cfda234)
Such a scenario is not supported by non-disruptive upgrade. Users can still assign the role to compute nodes manually; this only affects automatic allocation.
It is sometimes useful to be able to add extra options (like gcache.size, gcs.fc_debug, ...) to the galera wsrep_provider_options configuration variable. This can now be done via the Crowbar RAW view. Also make (the currently hardcoded) "gcs.fc_limit" multiplier configurable. Also make (the currently hardcoded) "gcs.fc_factor" configurable. (cherry picked from commit 94eb559)
[4.0] database: Raise and align promote/demote timeouts (bsc#1131791)
[4.0] database: Make wsrep_provider_options configurable (fate#327745)
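Assembling the wsrep_provider_options value from raw-view extras plus the now-configurable fc_limit and fc_factor could look roughly like this; the hash shape and key names are illustrative, not the barclamp's actual schema.

```ruby
# Illustrative assembly of galera's wsrep_provider_options string from a
# hash of extra options (as could come from the Crowbar RAW view, e.g.
# gcache.size or gcs.fc_debug) plus the previously hardcoded gcs.fc_limit
# and gcs.fc_factor values. Attribute plumbing is assumed.
def wsrep_provider_options(extras, fc_limit:, fc_factor:)
  opts = { "gcs.fc_limit" => fc_limit, "gcs.fc_factor" => fc_factor }.merge(extras)
  # galera expects "key=value" pairs separated by semicolons
  opts.map { |k, v| "#{k}=#{v}" }.join("; ")
end
```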
[4.0] Fix neutron-ha-tool restart on configuration changes
In a shared-storage-based HA setup, rabbitmq uses a fixed uid/gid of 91. This user/group modification was done after the (optional) SSL certificate generation, so the ACLs on the SSL key were incorrect, making rabbitmq unable to start with EACCES errors.
The sync failed when certs and/or keys were located in non-default paths.
…se correct paths when syncing certs (crowbar#2146) [4.0] rabbitmq: Fix ACL of SSL key after uid/gid change + keystone: Use correct paths when syncing certs
In some cases the flavor create call succeeds but the client still returns a non-zero status. Retries of the create call then fail with "Flavor already exists" and the retry loop never succeeds. The added check is executed on every loop iteration and stops retrying if the flavor already exists. An example scenario where the flavor is correctly created but the client doesn't return zero: one of the HA nodes executes flavor create commands while the others perform a delayed restart of the nova API after config files are modified. If the "create" request hits the API just before the restart, it can be accepted, but the client might not get the correct response back. (cherry picked from commit 8085fb9)
l3-ha service doesn't use .openrc so it doesn't need to be restarted when that file is modified. (cherry picked from commit 56e4bed)
This openrc should no longer be needed now that we have the neutron l3-ha-service which consumes its configuration from a yaml file. (cherry picked from commit aa20716)
The agent status is not updated very quickly anyway, and if we retry too soon we might still not have the underlying problem fixed. Let's be more conservative and check every 30s instead of 10s. (cherry picked from commit 59672c8)
[4.0] nova: Don't retry creating existing flavors
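The flavor-creation fix can be sketched as a loop that consults an existence check before (re)trying the create, since the create may have succeeded even when the client reported failure; `create` and `exists` are hypothetical stand-ins for the actual nova client calls.

```ruby
# Stop retrying as soon as the flavor is visible: a "create" accepted just
# before a delayed nova API restart can succeed server-side while the
# client still returns non-zero. `create` and `exists` are hypothetical
# callables returning true on success / presence.
def ensure_flavor(create, exists, retries: 3)
  retries.times do
    return true if exists.call  # created earlier despite a non-zero status
    return true if create.call  # clean create
  end
  exists.call
end
```

Without the `exists` check, every retry after a half-successful create fails with "Flavor already exists" and the loop can never terminate successfully.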
[4.0] neutron: Don't restart l3-ha on .openrc change + timeout extension
This patch allows the user to change the ovs inactivity_probe timeout from the neutron barclamp, in the 'raw' view. Previously, this value was always set to the OVS default, 5. It provides crowbar support for this upstream patch: https://review.opendev.org/#/c/663024/
[4.0] Make ovs of_inactivity_probe configurable from neutron barclamp
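Rendering the now-configurable probe timeout into the neutron OVS agent configuration might look like this; the `[OVS]` section placement is an assumption based on the upstream change, and only the `of_inactivity_probe` option name is taken from the commit above.

```ruby
# Illustrative rendering of the configurable OVS inactivity probe into an
# agent config fragment. The [OVS] section name is an assumption; 5 is the
# OVS default the barclamp previously always left in place.
def ovs_agent_ini(of_inactivity_probe: 5)
  "[OVS]\nof_inactivity_probe = #{of_inactivity_probe}\n"
end
```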
Creating magnum flavors is done as a delayed action, alongside all other delayed actions, such as restarting services. If one of those restarted services is apache (e.g. because it was triggered by another barclamp configuration change), then keystone and other API services might not be available right away, in which case the magnum flavor creation will fail. Re-attempting magnum flavor creation fixes this issue.
[4.0] magnum: retry flavor creation (SOC-9991)
Creating the magnum image is done as a delayed action, alongside all other delayed actions, such as restarting services. If one of those restarted services is apache (e.g. because it was triggered by another barclamp configuration change), then keystone and other API services might not be available right away, in which case the magnum image creation will fail. Re-attempting magnum image creation fixes this issue.
[4.0] magnum: retry magnum image creation (SOC-10015)
In some cases, VMs may contain more threads than permitted by the systemd-set default of 16000. This tuneable makes it possible to set a higher limit in qemu.conf through the Nova barclamp. (cherry picked from commit a62fd70) Backport changes: migration renamed to 126_add_max_threads.rb and schema revision adjusted accordingly.
SOC-10001: add max_threads_per_process tuneable
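The tuneable above ends up as a line in libvirt's qemu.conf; a sketch of how a template might emit it, with the only-when-raised guard being an assumption rather than the barclamp's actual logic:

```ruby
# Illustrative rendering of the max_threads_per_process tuneable for
# qemu.conf. The guard (skip when unset or not above the 16000 systemd
# default) is an assumption for the sketch, not the real template logic.
def qemu_conf_max_threads(value)
  return "" if value.nil? || value <= 16000
  "max_threads_per_process = #{value}"
end
```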
It was replaced with dns_domain in 9e96051, but newton neutron dhcp needs dhcp_domain.
neutron: restore dhcp_domain in stable/4.0 (bsc#1145867)
Sometimes there is a race condition and ohai doesn't collect the ruby version. to_f then evaluates the version to 0.0 and zypper fails to install the rubygem: `ruby0.0-rubygem-cstruct' not found in package names. (cherry picked from commit 6c02091)
[4.0] database: Hardcode ruby version for package installation (SOC-10010)
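The failure mode is easy to demonstrate: `nil.to_f` is `0.0`, so a missing ohai value produces a nonsense package name. A sketch with a hardcoded fallback version, where the helper name and the "2.1" fallback are illustrative:

```ruby
# SOC-10010 failure mode: if ohai raced and never collected the ruby
# version, nil.to_f yields 0.0 and zypper is asked to install
# "ruby0.0-rubygem-cstruct". Falling back to a hardcoded version avoids
# that. Helper name and "2.1" fallback are illustrative assumptions.
def rubygem_package(gem, ohai_version)
  ruby_ver = (ohai_version || "2.1").to_f
  "ruby#{ruby_ver}-rubygem-#{gem}"
end
```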
This commit provides changes in plugin packages and config files needed for integration of SOC with ACI 4.1 and higher versions. ACI 4.1 uses a slightly different set of plugin packages and configs for integration with OpenStack. This includes:
- python-gbpclient renamed to python-group-based-policy-client
- ovs-bridge-name removed from opflex-agent-ovs.conf
- int-bridge-name and access-bridge-name added to opflex-agent-ovs.conf
- agent-ovs renamed to opflex-agent
For uniformity, the template for opflex-agent-ovs.conf is now renamed from 10-opflex-agent-ovs.conf.erb to opflex-agent-ovs.conf.erb. The neutron template schema and json templates are updated to provide integration_bridge and access_bridge details with default values. The corresponding migration scripts are also updated. (cherry picked from commit cb5347d)
A single ACI fabric can support multiple VMM domains, each governed by a different controller (e.g. VMware vCenter, OpenStack, or Microsoft SCVMM). Several production data centers use multiple VMM domains and expect to be able to monitor and control network policies from a single ACI fabric. Integrating OpenStack with such a setup requires crowbar to provide parameters specific to each VMM domain. This commit adds the additional parameters and the logic to validate them and send them to the correct config location. The changes now allow providing "Vmware" or "OpenStack" as the VMM type, and multiple entries of either type are possible. Also added "ssl_mode" as a configurable parameter, which needs to be "encrypted" if ESXi is used as compute; other use-cases may need to change it as required, hence it is included as a configurable parameter within the opflex node structure. (cherry picked from commit 1f16436)