SOLVED
(mostly)
… to handle failures at scale, we need to automatically restart VMs somehow.
… we have to resurrect very carefully in order to avoid any zombie pets!
Different cloud operators will want to support different SLAs with different workflows, e.g.
nova-compute fails, VMs are still perfectly healthy but unmanageable.
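To see that failure mode concretely (a hypothetical check, not one of this lab's steps; assumes the nova CLI and a sourced .openrc):

nova service-list    # nova-compute on the failed node shows State "down"
nova list            # ...but its instances still report status ACTIVE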
The obvious workarounds are ugly!
crowbar.c$cloud_number
blacher.arch.suse.de
ssh controller1
ssh compute2
etc.

root@crowbar:~ # tail -f /var/log/crowbar/production.log
root@crowbar:~ # tail -f /var/log/crowbar/chef-client/*.log
Login to one of the controller nodes, and do:
NovaCompute / NovaEvacuate OCF agents

- from the openstack-resource-agents repo
- NovaCompute wraps nova-compute, handling the case where nova fails during recovery
- NovaEvacuate handles nova setup and invokes nova evacuate, nova's recovery API
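For illustration only, a NovaEvacuate primitive might be configured via crmsh roughly like this; the parameter values are placeholders and the exact parameter names vary between releases:

# hypothetical sketch: Keystone credentials let the agent call nova's recovery API
primitive nova-evacuate ocf:openstack:NovaEvacuate \
    params auth_url="http://keystone.example.com:5000/v2.0" \
           username="admin" password="secret" tenant_name="admin" \
    op monitor interval="10s"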
nova evacuate does not really mean evacuation!

- In nova terminology: nova live-migration vs. nova evacuate ?!
- nova developers considered a rename
- In any nova-related context, pretend you saw “resurrect”
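For example, “evacuating” (i.e. resurrecting) an instance whose host has died is a single nova call; the instance name here is a placeholder:

nova evacuate myinstance    # rebuild the instance on another available host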
Two key areas:

- /var/lib/glance/images on controller nodes
- /var/lib/nova/instances on compute nodes

If /var/lib/nova/instances is shared: resurrected VMs keep their ephemeral disks.
Otherwise: VMs are rebuilt from their images, so anything on the ephemeral disks is lost.

Either way, /var/lib/glance/images should be shared across all controllers (unless using Swift / Ceph); otherwise nova might fail to retrieve the image from glance.
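As a hypothetical sketch of the shared case above: an NFS entry for the instance store in /etc/fstab on each compute node could look like this (server name and options are placeholders, not this lab's exact setup):

admin:/var/lib/nova/instances   /var/lib/nova/instances   nfs   defaults   0 0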
crowbar batch set up shared storage. We're using the admin server's NFS server:

- nfs_client barclamp
- /etc/exports on admin server
- /etc/fstab on controller / compute nodes
- mount on controller / compute nodes
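Roughly what that looks like, purely as an illustration (paths, network and options are placeholders rather than this lab's exact values):

# /etc/exports on the admin server
/var/lib/glance/images   192.168.124.0/24(rw,no_root_squash,sync)

# then on a controller node
mount -t nfs admin:/var/lib/glance/images /var/lib/glance/images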
crowbar batch
Unattended batch setup of barclamps:
root@crowbar:~ # crowbar batch build my-cloud.yaml
Dump current barclamps as YAML:
root@crowbar:~ # crowbar batch export
- barclamp: pacemaker
  name: services
  attributes:
    stonith:
      mode: libvirt
      libvirt:
        hypervisor_ip: 192.168.217.1
    drbd:
      enabled: true
  deployment:
    elements:
      hawk-server:
        - "@@controller1@@"
        - "@@controller2@@"
      pacemaker-cluster-member:
        - "@@controller1@@"
        - "@@controller2@@"
      pacemaker-remote:
        - "@@compute1@@"
        - "@@compute2@@"
- barclamp: nova
  attributes:
    use_migration: true
    kvm:
      ksm_enabled: true
  deployment:
    elements:
      nova-controller:
        - cluster:cluster1
      nova-compute-kvm:
        - remotes:cluster1
Let's boot a VM to test compute node HA!
Connect to one of the controller nodes, and get image / flavor / net names:
source .openrc
openstack image list
openstack flavor list
neutron net-list
Boot the VM using these ids:
nova boot --image image --flavor flavor --nic net-id=net testvm
Test it's booted:
nova show testvm
Create floating IP:
neutron floatingip-create floatingnet
Get VM IP:
nova list
Get port id:
neutron port-list | grep vmIP
Associate floating IP with VM port:
neutron floatingip-associate floatingipID portID
The VM uses the default security group. Make sure it allows ICMP.
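If it does not, a rule can be added from a controller node (assuming the neutron CLI used elsewhere in this lab, with .openrc sourced):

neutron security-group-rule-create --protocol icmp --direction ingress default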
Ping VM:
ping vmFloatingIP
Ping host where the VM is running:
nova list --fields host,name
ping hostIP
Check log messages for the NovaEvacuate workflow:
tail -f /var/log/messages | grep NovaEvacuate
Monitor cluster status:
crm_mon
Login to compute node where VM runs, and type:
pkill -9 -f pacemaker_remoted
This will cause fencing! (Why?)
In /var/log/messages on the DC, you should see:

NovaEvacuate [...] Initiating evacuation
NovaEvacuate [...] Completed evacuation

Also check:

- crm status shows the compute node offline (then back online)
- nova list --fields host,name
Pacemaker monitors compute nodes via pacemaker_remote.
If compute node failure is detected:

- crm_mon etc. will show the node unclean / offline
- fence-nova acts as a secondary fencing resource (crm configure show fencing_topology)
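As an illustration only (the primary fencing device name here is an assumption, not taken from this lab), a two-level fencing topology in crmsh could look like:

# level 1: normal STONITH device; level 2: fence-nova for the compute remotes
fencing_topology \
    compute1: stonith-libvirt fence-nova \
    compute2: stonith-libvirt fence-nova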
Find the node running fence_compute:

crm resource show fence-nova
The fence_compute script:

- tells the nova server that the node is down

Log files:

- /var/log/nova/fence_compute.log
- /var/log/messages on the DC and on the node running fence-nova
Verify attribute state via:
attrd_updater --query --all --name=evacuate
NovaEvacuate spots the attribute and calls nova evacuate:
root@controller1:~ # crm resource show nova-evacuate
resource nova-evacuate is running on: d52-54-77-77-77-02
nova resurrects the VM on the other node:

root@controller2:~ # grep nova-evacuate /var/log/messages
NovaEvacuate [...] Initiating evacuation
NovaEvacuate [...] Completed evacuation
Warning: no retries if resurrection fails!
pacemaker_remote looks after key compute node services.

Use crmsh on cl-g-nova-compute to find out which services it looks after.
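For example, using standard crmsh (only the resource name cl-g-nova-compute comes from this setup):

crm configure show cl-g-nova-compute    # shows the cloned group of compute node services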