Donal Lafferty: July 2013

Monday, July 29, 2013

Using CloudMonkey to Automate CloudStack Operations

Background:

The CloudStack GUI does not suit repetitive tasks. There is no macro mechanism in the GUI to allow an admin to record and replay long workflows. Multi-step tasks such as the setup of a new zone or the registration of a template must be done by hand and are error prone.

Developers can automate CloudStack workflows with the CloudMonkey tool. CloudMonkey provides a means of making CloudStack API calls from the command line, and thus from a script.

Problem:

The GUI does not tell you which API calls and parameters it is using for a task. This makes it difficult to reproduce the same functionality in a CloudMonkey script.

Solution:

Parse the management server log file to see the sequence of commands executed during a GUI task. Once the commands and parameters are known, reconstruct the steps in CloudMonkey.

Parse the CloudStack log file:

The management server logs the beginning and end of all API calls in a log file. In the case of a development system, the log file is usually the file vmops.log in the root of the source tree.

Use grep to obtain a list of API call log entries:

 grep 'command=' vmops.log > all_api_logs.txt

The result is quite raw. It will require additional clean up. E.g.:

  root@mgmtserver:~/github/cshv3# grep 'command=' vmops_createtmplt_sh_problem.log > all_api_calls.txt   
  root@mgmtserver:~/github/cshv3# cat all_api_calls.txt   
  ...   
  2013-07-17 08:59:50,522 DEBUG [cloud.api.ApiServlet] (343904103@qtp-1389504071-7:null) ===START=== 10.70.176.29 -- GET command=listCapabilities&response=json&sessionkey=null&_=1374047990517   
  2013-07-17 08:59:50,540 DEBUG [cloud.api.ApiServlet] (343904103@qtp-1389504071-7:null) ===END=== 10.70.176.29 -- GET command=listCapabilities&response=json&sessionkey=null&_=1374047990517   
  ...

Next, remove uninteresting log entries using sed:

 sed -e '/^.*command=log/d; /^.*===END===/d; /^.*command=queryAsyncJobResult/d' all_api_logs.txt > ./reqd_api_logs.txt

How does this work?

Using the -e parameter, we pass sed a list of commands separated by semicolons. The meaning of each command is as follows:

/^.*command=log/d deletes login and logout commands.

/^.*===END===/d removes the second log message for a call, which is made at the end of the API call.

/^.*command=queryAsyncJobResult/d removes polling commands that the GUI uses to determine if an asynchronous command has completed. We will use CloudMonkey in blocking mode, which means it will do the queryAsyncJobResult calls for us.
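As a quick check of the filter, here is a minimal sketch that runs the same sed expressions over four log lines. The two listCapabilities lines are copied from the log excerpt above. The login and queryAsyncJobResult lines are made up for illustration, as is the helper name filter_api_logs.

```shell
# filter_api_logs: hypothetical helper. Reads raw log lines on stdin and keeps
# only the ===START=== entries of calls that are worth replaying.
filter_api_logs() {
    sed -e '/^.*command=log/d; /^.*===END===/d; /^.*command=queryAsyncJobResult/d'
}

# Lines marked (made-up) are NOT real log output; they only exercise the filter.
sample_log='(made-up) ===START=== 10.70.176.29 -- POST command=login&response=json
2013-07-17 08:59:50,522 DEBUG [cloud.api.ApiServlet] (343904103@qtp-1389504071-7:null) ===START=== 10.70.176.29 -- GET command=listCapabilities&response=json&sessionkey=null&_=1374047990517
2013-07-17 08:59:50,540 DEBUG [cloud.api.ApiServlet] (343904103@qtp-1389504071-7:null) ===END=== 10.70.176.29 -- GET command=listCapabilities&response=json&sessionkey=null&_=1374047990517
(made-up) ===START=== 10.70.176.29 -- GET command=queryAsyncJobResult&jobId=hypothetical-job-id&response=json'

# Only the ===START=== listCapabilities line should survive.
printf '%s\n' "$sample_log" | filter_api_logs
```

Note that the first pattern matches any command whose name begins with "log", not just login and logout.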

Next, convert log entries to commands:

 sed -e 's/^.*command=//; s/&/ /g; s/_=.*//; s/sessionkey=[^ ]*//; s/response=[^ ]*//' ./reqd_api_logs.txt > ./encoded_api_calls.txt

How does this work?

s/^.*command=// removes from start of line to and including "command=". We want everything after command=, because that is the actual command.

s/&/ /g replaces the '&' used to separate arguments in the API call with a space. It's more readable, and CloudMonkey wants us to separate parameters with a space.

s/_=.*// removes the 'cache buster' that prevents network infrastructure from responding to the HTTP request with a cached result.

s/sessionkey=[^ ]*// removes the session key. CloudMonkey uses API keys. Besides, the sessionkey will have expired by now!

s/response=[^ ]*// removes the response encoding parameter from the request. CloudMonkey will insert a suitable version of this parameter automatically.
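To make the effect concrete, here is a small sketch that applies the same substitutions to a single log line. The line is made up: it is assembled from the createPhysicalNetwork values used later in this post, and both the parameter order and the session key are assumptions. The helper name log_to_command is also hypothetical.

```shell
# log_to_command: hypothetical helper. Reads filtered log lines on stdin and
# strips everything except the command name and its parameters.
log_to_command() {
    sed -e 's/^.*command=//; s/&/ /g; s/_=.*//; s/sessionkey=[^ ]*//; s/response=[^ ]*//'
}

# Made-up log line; parameter order and the session key are assumptions.
line='(made-up) ===START=== 10.70.176.29 -- GET command=createPhysicalNetwork&zoneid=28444ba3-1405-4872-b23c-015cf5116415&name=Physical%20Network%201&isolationmethods=VLAN&response=json&sessionkey=abc123&_=1374047990517'

printf '%s\n' "$line" | log_to_command
```

The removed parameters leave their separating spaces behind, so the result ends in a run of spaces. The next step collapses these into one.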

Next, enclose parameter values in single and double quotes:

 sed -e 's/ \+/ /g; s/=/='"'"'"/g; s/ /"'"'"' /g; s/"'"'"'//' ./encoded_api_calls.txt > delimited_encoded_api_calls.txt

We want to put double quotes around parameter values before converting from URL encoding to strings. This will preserve the whitespace after decoding. We also add single quotes. The single quotes prevent the bash shell from removing the double quotes when we put these commands in a script.

The sed commands are complex due to a quirk with how bash parses single quotes...

s/ \+/ /g converts one or more spaces to a single space.

s/=/='"'"'"/g converts equals (=) to equals, single quote, double quote ( ='" )

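s/ /"'"'"' /g converts each space to double quote, single quote, space ( "' ), which closes the preceding parameter value.

s/"'"'"'// removes the first double quote, single quote pair, which the previous command inserted after the command name.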

Using the command above,

 createPhysicalNetwork zoneid=28444ba3-1405-4872-b23c-015cf5116415 name=Physical%20Network%201 isolationmethods=VLAN

has all parameters enclosed in '" ... "', e.g.

 createPhysicalNetwork zoneid='"28444ba3-1405-4872-b23c-015cf5116415"' name='"Physical%20Network%201"' isolationmethods='"VLAN"'

If you don't need the single quotes, just use the command below to insert your quotes.

 sed -e 's/ \+/ /g; s/=/="/g; s/ /" /g; s/"//' ./encoded_api_calls.txt > delimited_encoded_api_calls.txt
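If the '"'"' sequences are hard to read, one alternative is to keep the single quote in a shell variable and put the sed script in double quotes. This is only a sketch: the helper name quote_params and the variable sq are my own, and like the original it relies on \+ being understood by your sed (GNU sed does).

```shell
# quote_params: hypothetical helper. Produces the same sed script as the
# command above, but with the single quote held in a variable.
quote_params() {
    sq="'"
    sed -e "s/ \+/ /g; s/=/=${sq}\"/g; s/ /\"${sq} /g; s/\"${sq}//"
}

# The input ends with a space on purpose: the closing "' is only ever
# inserted in front of a space, so the last value needs one after it.
printf '%s\n' "createPhysicalNetwork zoneid=28444ba3-1405-4872-b23c-015cf5116415 name=Physical%20Network%201 isolationmethods=VLAN " | quote_params
```

Lines coming out of the previous step already end with a space, because removing the cache buster and session key leaves one behind.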

Finally, remove URL encoding from the parameters:

The parameters for our commands are URL encoded. E.g.

 root@mgmtserver:~/github/cshv3# cat delimited_encoded_api_calls.txt  
 ...  
 addImageStore name="AWS+S3" provider="S3" details%5B0%5D.key="accesskey" details%5B0%5D.value="my_access_key" 
 details%5B1%5D.key="secretkey" details%5B1%5D.value="my_secret_key" details%5B2%5D.key="bucket" details%5B2%5D.value="cshv3eu" details%5B3%5D.key="usehttps"   
 details%5B3%5D.value="true" details%5B4%5D.key="endpoint" details%5B4%5D.value="s3.amazonaws.com"  
 ...

You can decode them with the following (source):

 sed -e 's/+/ /g; s/\%0[dD]//g' delimited_encoded_api_calls.txt | awk '/%/{while(match($0,/\%[0-9a-fA-F][0-9a-fA-F]/)){$0=substr($0,1,RSTART-1)sprintf("%c",0+("0x"substr($0,RSTART+1,2)))substr($0,RSTART+3);}}{print}' > decoded_api_calls.txt

This restores whitespace and punctuation. E.g.

 root@mgmtserver:~/github/cshv3# cat decoded_api_calls.txt  
 ...  
 addImageStore name="AWS S3" provider="S3" details[0].key="accesskey" details[0].value="my_access_key" 
 details[1].key="secretkey" details[1].value="my_secret_key" details[2].key="bucket" details[2].value="cshv3eu" details[3].key="usehttps"   
 details[3].value="true" details[4].key="endpoint" details[4].value="s3.amazonaws.com"  
 ...
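The one-liner above depends on awk turning a string such as "0x5B" into a number, which I would not assume of every awk. As an alternative, here is a sketch that uses a lookup table instead. The helper name urldecode is hypothetical, and I only expect it to be reliable for ASCII escapes; bytes above 0x7F may not round-trip in every awk or locale.

```shell
# urldecode: hypothetical helper. Decodes '+' and %XX escapes read from stdin.
urldecode() {
    awk '
    BEGIN {
        # Lookup table from two upper-case hex digits to the character.
        for (i = 1; i < 256; i++) chr[sprintf("%02X", i)] = sprintf("%c", i)
    }
    {
        gsub(/\+/, " ")          # "+" encodes a space
        gsub(/%0[dD]/, "")       # drop carriage returns, as the original does
        out = ""
        # Consume the line left to right so decoded text is never rescanned.
        while (match($0, /%[0-9a-fA-F][0-9a-fA-F]/)) {
            out = out substr($0, 1, RSTART - 1) chr[toupper(substr($0, RSTART + 1, 2))]
            $0 = substr($0, RSTART + 3)
        }
        print out $0
    }'
}

printf '%s\n' 'addImageStore name="AWS+S3" provider="S3" details%5B0%5D.key="accesskey"' | urldecode
```

It would be used in place of the sed and awk pipeline, e.g. urldecode < delimited_encoded_api_calls.txt > decoded_api_calls.txt.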

Set up CloudMonkey:

Install CloudMonkey

Be careful not to use an out-of-date, community-maintained package. The target version of CloudMonkey is listed at install time. E.g.

 root@mgmtserver:~/github/cshv3# apt-get install python-pip  
 Reading package lists... Done  
 Building dependency tree  
 ...  
 root@mgmtserver:~/github/cshv3# pip install cloudmonkey  
 Downloading/unpacking cloudmonkey  
  Downloading cloudmonkey-4.1.0-1.tar.gz (60Kb): 60Kb downloaded  
  Running setup.py egg_info for package cloudmonkey  
 root@mgmtserver:~/github/cshv3# which cloudmonkey  
 /usr/local/bin/cloudmonkey

If you are a developer, use the instructions on the CloudMonkey wiki to build the latest version. E.g.

 root@mgmtserver:~/github/cshv3# cd tools/cli  
 root@mgmtserver:~/github/cshv3/tools/cli# mvn clean install -P developer  
 [INFO] Scanning for projects...  
 [INFO]  
 [INFO] ------------------------------------------------------------------------  
 [INFO] Building Apache CloudStack cloudmonkey cli 4.2.0-SNAPSHOT  
 [INFO] ------------------------------------------------------------------------  
 [INFO]  
 ...  
 [INFO] --- maven-install-plugin:2.3.1:install (default-install) @ cloud-cli ---  
 [INFO] Installing /root/github/cshv3/tools/cli/pom.xml to /root/.m2/repository/org/apache/cloudstack/cloud-cli/4.2.0-SNAPSHOT/cloud-cli-4.2.0-SNAPSHOT.pom  
 [INFO] ------------------------------------------------------------------------  
 [INFO] BUILD SUCCESS  
 [INFO] ------------------------------------------------------------------------  
 [INFO] Total time: 5.190s  
 [INFO] Finished at: Mon Jul 22 22:33:01 BST 2013  
 [INFO] Final Memory: 16M/154M  
 [INFO] ------------------------------------------------------------------------  
 root@mgmtserver:~/github/cshv3/tools/cli# python setup.py build  
 running build  
 ...  
 writing manifest file 'cloudmonkey.egg-info/SOURCES.txt'  
 root@mgmtserver:~/github/cshv3/tools/cli# python setup.py install  
 running install  
 ...  
 Finished processing dependencies for cloudmonkey==4.2.0-0  
 root@mgmtserver:~/github/cshv3/tools/cli# which cloudmonkey  
 /usr/local/bin/cloudmonkey

Configure CloudMonkey

As a minimum, CloudMonkey needs the URL for the management server and API keys to authenticate requests to the server. API keys are different from your password / username. How to obtain API keys is described at 9:07 in this YouTube CloudMonkey overview by DIYCloudComputing.

Also, set CloudMonkey to use JSON output. The alternative is difficult to parse.

Finally, use sync to tell CloudMonkey to discover the latest API.

These values can be set at the command line. E.g.

 cloudmonkey set apikey WsiG7tva38gJpl082mBRQEnAic9g_BW15fK5aB4W3ak9GBoBeg0iOz9iGAIJ7eSnHecS1ONffEygi2xTkP4QOw   
 cloudmonkey set secretkey _Ov8DMed8WMWMscWaWX6cCHzF7kWCQU2SVwbQJo4ujL2-ocLdvkC5Mwe0XlrSDZ12ha52ieAtYOJj6viA1SFhQ   
 cloudmonkey set display json   
 cloudmonkey sync

Now CloudMonkey can make API calls. E.g.

 root@mgmtserver:~/github/cshv3# cloudmonkey list users  
 {  
  "count": 1,  
  "user": [  
   {  
    "account": "admin",  
    "accountid": "12a8380c-f2e3-11e2-b495-00155db1030e",  
    "accounttype": 1,  
    "apikey": "WsiG7tva38gJpl082mBRQEnAic9g_BW15fK5aB4W3ak9GBoBeg0iOz9iGAIJ7eSnHecS1ONffEygi2xTkP4QOw",  
    "created": "2013-07-22T17:26:25+0100",  
    "domain": "ROOT",  
    "domainid": "12a7d75c-f2e3-11e2-b495-00155db1030e",  
    "email": "admin@mailprovider.com",  
    "firstname": "Admin",  
    "id": "12a8686b-f2e3-11e2-b495-00155db1030e",  
    "iscallerchilddomain": false,  
    "isdefault": true,  
    "lastname": "User",  
    "secretkey": "_Ov8DMed8WMWMscWaWX6cCHzF7kWCQU2SVwbQJo4ujL2-ocLdvkC5Mwe0XlrSDZ12ha52ieAtYOJj6viA1SFhQ",  
    "state": "enabled",  
    "username": "admin"  
   }  
  ]  
 }

Recreate GUI Commands in CloudMonkey

The parsed log file contains a list of API calls. Pick out the ones you want to use. I placed them in a file called myscript.

To make CloudMonkey API calls from the command line, simply prefix the API call with cloudmonkey api. To save time, you can prepend it to every command using sed:

 sed 's/^/cloudmonkey api /' myscript > myscript2

The results of one API call will provide the parameters for the next, so we want to be able to capture the results of our CloudMonkey calls.

Simply enclose your commands in backticks (reverse single quotes) and assign the result to a bash variable. To save time, use sed:

 sed -e 's/^/apiresult=`/; s/$/`/' myscript2 > myscript3
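The two sed passes can also be folded into one. A minimal sketch, with the helper name wrap_calls being my own:

```shell
# wrap_calls: hypothetical helper. Prefixes each API call on stdin with
# "cloudmonkey api" and wraps the whole line in backticks assigned to apiresult.
wrap_calls() {
    sed -e 's/^/cloudmonkey api /; s/^/apiresult=`/; s/$/`/'
}

printf '%s\n' 'listCapabilities' | wrap_calls
```

If you prefer the $( ... ) form of command substitution, which nests more easily than backticks, the last two expressions would become s/^/apiresult=$(/ and s/$/)/.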

CloudMonkey has strict case sensitivity rules that prevent it from using log file input directly. CloudMonkey expects all parameter keys to be in lower case. E.g. the addTrafficType command appears in the log with the parameter trafficType. However, CloudMonkey expects it to be traffictype (all lower case).

Thus, for a file with the API calls below:

 createZone networktype='"Advanced"' securitygroupenabled='"false"' guestcidraddress='"10.1.1.0/24"' name='"HybridZone"' localstorageenabled='"true"' dns1='"4.4.4.4"' internaldns1='"10.70.176.118"' internaldns2='"10.70.160.66"'  
 createPhysicalNetwork zoneid='"28444ba3-1405-4872-b23c-015cf5116415"' name='"Physical Network 1"' isolationmethods='"VLAN"'

we would get results of this form:

 apiresult=`cloudmonkey api createphysicalnetwork zoneid='"28444ba3-1405-4872-b23c-015cf5116415"' name='"Physical Network 1"' isolationmethods='"VLAN"' `  
 apiresult=`cloudmonkey api addtrafficType physicalnetworkid='"8ae03f63-efe9-46ea-9c31-f35164ef3dfc"' traffictype='"Management"' `
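Lower-casing the parameter keys can be scripted too. The sketch below is one way to do it with awk; the helper name lowercase_keys is hypothetical. It only touches simple keys (letters, digits, underscores) that follow a space and end in =, and it would also alter a value that happens to contain text of that shape.

```shell
# lowercase_keys: hypothetical helper. Lower-cases each " key=" on stdin and
# leaves the command name and the parameter values untouched.
lowercase_keys() {
    awk '{
        out = ""
        # Find each " key=" in turn and lower-case only that piece.
        while (match($0, / [A-Za-z][A-Za-z0-9_]*=/)) {
            out = out substr($0, 1, RSTART - 1) tolower(substr($0, RSTART, RLENGTH))
            $0 = substr($0, RSTART + RLENGTH)
        }
        print out $0
    }'
}

printf '%s\n' "addTrafficType physicalnetworkid='\"8ae03f63-efe9-46ea-9c31-f35164ef3dfc\"' trafficType='\"Management\"'" | lowercase_keys
```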

Extract results as required

The variable apiresult includes a lot of information not useful for subsequent calls. E.g.

 root@mgmtserver:~/github/cshv3# apiresult=`cloudmonkey api createZone networktype="Advanced" securitygroupenabled="false" guestcidraddress="10.1.1.0/24" name="HybridZoneA" localstorageenabled="true" dns1="4.4.4.4" internaldns1="10.70.176.118" internaldns2="10.70.160.66"`  
 root@mgmtserver:~/github/cshv3# echo $apiresult  
 { "zone": { "allocationstate": "Disabled", "dhcpprovider": "VirtualRouter", "dns1": "4.4.4.4", "guestcidraddress": "10.1.1.0/24", "id": "2347b5c8-378c-4a7e-9977-818bbba4f7ff", "internaldns1": "10.70.176.118", "internaldns2": "10.70.160.66", "localstorageenabled": true, "name": "HybridZoneA", "networktype": "Advanced", "securitygroupsenabled": false, "zonetoken": "b957e317-d661-30dd-a412-1f76f2736412" } }

Usually, you will have to add code to extract specific parameters from the result. For instance, here we extract the identifier of a newly created zone for a createPhysicalNetwork call:

 root@mgmtserver:~/github/cshv3# zoneid=`echo $apiresult | sed -e 's/^.*"id": //; s/,.*$//'`  
 root@mgmtserver:~/github/cshv3# echo $zoneid  
 "2347b5c8-378c-4a7e-9977-818bbba4f7ff"  
 root@mgmtserver:~/github/cshv3# apiresult=`cloudmonkey api createPhysicalNetwork zoneid=$zoneid name='Physical Network 1' isolationmethods='VLAN'`

In a script, we can use $zoneid as the value of a parameter.
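The extraction can be wrapped in a small function so the same code serves other keys. This is a sketch only: the helper name json_string_field is hypothetical, it handles string values only, and it is a sed pattern match rather than a real JSON parser. The sample result is abbreviated from the createZone output above.

```shell
# json_string_field: hypothetical helper. Prints the value of the string field
# named $1 from JSON text on stdin, without the surrounding double quotes.
# If a line holds several fields with that name, the last one wins.
json_string_field() {
    sed -n 's/.*"'"$1"'": *"\([^"]*\)".*/\1/p'
}

# Abbreviated copy of the createZone result shown earlier.
apiresult='{ "zone": { "allocationstate": "Disabled", "id": "2347b5c8-378c-4a7e-9977-818bbba4f7ff", "name": "HybridZoneA" } }'

zoneid=$(printf '%s\n' "$apiresult" | json_string_field id)
echo "$zoneid"   # prints 2347b5c8-378c-4a7e-9977-818bbba4f7ff
```

Unlike the sed command above, this strips the double quotes, so the calling script decides how to quote the value when it passes it on.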

Difficulties with this Approach:

CloudStack does not log the parameters of POST requests. Commands such as addHost are recorded as received, but their parameters are not. You have to refer to the developer's guide to figure them out. This is down to a lack of explicit support for logging incoming commands in CloudStack.

Final Remarks:

Parsing the GUI commands out of the log file is quite complex. It would be a lot easier if the management server logged API calls in plain text rather than as URL encoded strings, and if commands sent by HTTP POST had their parameters clearly logged.

Parsing JSON encoded text is poorly supported in bash. CloudMonkey's 'filter' option would avoid this issue if it were available with the api command. filter tells CloudMonkey to return only the values of a list of keys. If the filter were available, code to parse the apiresult would not be required.

CloudMonkey cannot be used with a clean deployment, because CloudStack initially has no API keys. This issue could be avoided if username / password could be used to authenticate API calls. Username / password authentication is used for login by the GUI and by tools such as the CloudStack.NET SDK (see the relevant Login method).

Fortunately, developers can disable database encryption and add API keys to the admin user before starting CloudStack. To disable database encryption, set db.cloud.encryption.type=none in your db.properties file. This is done automatically by the Maven project that runs Jetty. E.g.

 root@mgmtserver:~/github/cshv3# grep -R "db\.cloud\.encrypt" * --include=db.properties  
 client/target/generated-webapp/WEB-INF/classes/db.properties:db.cloud.encryption.type=none  
 client/target/generated-webapp/WEB-INF/classes/db.properties:db.cloud.encrypt.secret=

Next, set the desired API keys in the user table. E.g.

 mysql --user=root --password="" cloud -e "update user set secret_key='_Ov8DMed8WMWMscWaWX6cCHzF7kWCQU2SVwbQJo4ujL2-ocLdvkC5Mwe0XlrSDZ12ha52ieAtYOJj6viA1SFhQ',api_key='WsiG7tva38gJpl082mBRQEnAic9g_BW15fK5aB4W3ak9GBoBeg0iOz9iGAIJ7eSnHecS1ONffEygi2xTkP4QOw' where id=2;"

Tuesday, July 16, 2013

Apache CloudStack Java Coding Conventions

Background

CloudStack stipulates coding conventions for Java code submitted to the project.

Problem

Converting Java source to this code style is labour intensive. A tool to identify problems would be very useful. A tool to fix them is even better.

Solution

Use CheckStyle in conjunction with Eclipse.

CheckStyle will analyse Java code according to a style file and report any violations. A style file consistent with Apache CloudStack is available for download from the Apache CloudStack wiki.

Eclipse fixes problems and verifies more complex changes. When the CheckStyle Eclipse plugin is installed, problem code is highlighted and an explanation provided on a line by line basis. After updating Eclipse's automatic formatting to meet Apache CloudStack's coding conventions, labour intensive work such as proper indentation is taken care of automagically.

CheckStyle Plugin Install

The CheckStyle Plugin offers an install guide, but the screen shots are for an older version of Eclipse. Think of it as an overview rather than step-by-step instructions.

Eclipse Auto Format

Ctrl-Shift-F will auto format the current file using the current code formatter. However, the default profile is not consistent with ACS coding conventions. By default, Eclipse treats whitespace slightly differently than the Java conventions.

Import new Java Code Style Preferences:

Java Code Style settings are available in the Apache CloudStack repo. Simply import Java Code Style Preferences from ./tools/eclipse/eclipse.epf. To do so, open File -> Import, and type "Preferences" into the filter text box.

or

Update the Code Style Formatter profile:

1. Under Window -> Preferences -> Java -> Code Style -> Formatter, create a new profile based on "Java Conventions". Call it "ACS Java Conventions", and select it as the Active profile.

2. Edit the "ACS Java Conventions".

2.1 Under the "Indentation" tab, set tab policy to "Spaces only", 4 characters

2.2 Under "White Space", Arrays -> Array Initializers, turn off 'after opening brace' and turn off 'before closing brace'

3. Confirm changes to close the Profile edit window and the Preferences window.

4. Format your document: Ctrl-Shift-F

CheckStyle Plugin Setup

I do not want to be prescriptive, so just keep the following points in mind:

1. CheckStyle is configured in Project Properties. You'll see a 'CheckStyle' menu in a project's 'Properties' dialog. Source

2. The available styles are configured in the Checkstyle tab of the Eclipse Preferences (Window->Preferences). Source

3. A custom style file called acs_codestyle.xml was created. This custom checklist is derived from the "Sun Checks" policy that ships with CheckStyle. To get a feel for the differences, export and diff the built-in Sun Checks and the ACS checks.

Final Remarks

Common Errors

Methods marked @Override require final keyword

Since Java 6, @Override can also be used to verify that an interface method is being implemented. If the method must not be overridden, you have to use the final keyword to signal this.

Unused and unchanged method parameters should be declared final.

Complaints

Magic number warnings make tests less readable

Non-zero values that appear in methods should be replaced by constants. In Java, constants are static final fields NAMED_IN_ALL_CAPS.
For unit tests this makes less sense. Test values are often best hidden in the method corresponding to the test rather than being spread out over the test file.

~~80 char max line length makes tests unreadable~~

~~My tests store CloudStack kernel commands in their JSON serialised format. These are very long strings. Keeping them to 80 characters makes them unreadable.~~

Eclipse auto format sometimes fails to sort out extra long lines of code.

Taking the 'TODO's out removes important metadata

TODOs are the best way to tell other developers where code needs to be double checked due to gaps in my knowledge.

Edits

ACS uses 180 max line length, and not 80 chars. I will have to update the style file.
Eclipse Java Code Style Preferences are available from the Apache CloudStack git repo at ./tools/eclipse/eclipse.epf

Tuesday, July 09, 2013

Installing Hyper-V PV Drivers for Linux Guests

Background

PV, or paravirtualization, improves VM performance by bypassing hardware emulation where the emulator would introduce significant and unnecessary overhead. A good example is a VM's networking stack, which exists twice: once in the VM, and again in the parent operating system running on the hypervisor.

PV drivers allow a guest O/S to take advantage of optimisations offered by PV. Using drivers is an easy way to allow the kernel to take advantage of PV facilities without requiring that the kernel be recompiled with hypervisor PV APIs in mind. E.g. externally, the networking PV driver can be used by an unchanged kernel, and internally it is a thin wrapper over the hypervisor's networking API.

Problem

Turning to Hyper-V, its PV drivers and accompanying utils are referred to as "Linux Integration Services", aka LIS. The latest versions are available for download. The download is an ISO of RPMs for supported Linux distributions.

The difficulty for me is that the ISO contains only RPMs, and they target the older Linux kernels typical of even recent versions of RHEL. LIS is not compatible with CloudStack's internal services, which are implemented by VMs running Debian Wheezy...

# uname -r
3.2.0-4-686-pae

# cat /etc/*-release
Cloudstack Release 4.2.0 Fri Jul 5 04:16:17 UTC 2013
PRETTY_NAME="Debian GNU/Linux 7.0 (wheezy)"
NAME="Debian GNU/Linux"
VERSION_ID="7.0"
VERSION="7.0 (wheezy)"
ID=debian
ANSI_COLOR="1;31"
HOME_URL="http://www.debian.org/"
SUPPORT_URL="http://www.debian.org/support/"
BUG_REPORT_URL="http://bugs.debian.org/"

Solution

Fortunately, Debian Wheezy was compiled with the Hyper-V drivers. You can verify the presence of the kernel modules using modinfo. E.g.

# modinfo -F filename hid-hyperv hv_storvsc hv_netvsc hv_vmbus hv_utils
/lib/modules/3.2.0-4-686-pae/kernel/drivers/hid/hid-hyperv.ko
/lib/modules/3.2.0-4-686-pae/kernel/drivers/scsi/hv_storvsc.ko
/lib/modules/3.2.0-4-686-pae/kernel/drivers/net/hyperv/hv_netvsc.ko
/lib/modules/3.2.0-4-686-pae/kernel/drivers/hv/hv_vmbus.ko
/lib/modules/3.2.0-4-686-pae/kernel/drivers/hv/hv_utils.ko

NB: As of Linux kernel 3.2, hv_timesource is no longer an independent kernel module (source).

However, I still need the user mode daemons that make up LIS.

Hyper-V Key Value Pair (KVP) Component

The KVP component is responsible for transferring data in the form of key value pairs between the host and the guest O/S. It also performs other tasks.

Using rpm2cpio, you can extract the daemon and service controller from the RPM. rpm2cpio returns a .cpio archive which, like a .tar, is a concatenation of files. This guide explains how to get the actual contents. E.g.

# rpm2cpio microsoft-hyper-v-rhel63.3.4-1.20120727.i686.rpm | cpio -idmv
./etc/init.d/hv_kvp_daemon
./etc/modprobe.d/hyperv_pvdrivers.conf
./usr/sbin/hv_kvp_daemon
28 blocks

Final Remarks

The file ./etc/init.d/hv_kvp_daemon provides only an example of how to start the daemon, but it cannot be used directly. The script is not tailored to Debian, because it uses distro-specific methods from /etc/init.d/functions such as daemon. Debian offers equivalent functionality in the start-stop-daemon program (source).
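For illustration, a Debian-flavoured replacement might be sketched along the following lines. This is not a tested init script. The function name kvp_ctl and the SSD override variable are hypothetical, and I have not checked whether hv_kvp_daemon puts itself into the background; if it does not, start-stop-daemon's --background option would be needed.

```shell
# Hypothetical control function for the daemon extracted from the RPM.
DAEMON=/usr/sbin/hv_kvp_daemon
# SSD can be overridden (e.g. SSD=echo) to see the commands without running them.
SSD=${SSD:-start-stop-daemon}

kvp_ctl() {
    case "$1" in
        start)   "$SSD" --start --quiet --exec "$DAEMON" ;;
        stop)    "$SSD" --stop --quiet --exec "$DAEMON" ;;
        restart) kvp_ctl stop; kvp_ctl start ;;
        *)       echo "Usage: kvp_ctl {start|stop|restart}" >&2; return 2 ;;
    esac
}
```

A real script in /etc/init.d would also need the usual LSB header and would call the function with "$1".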

Finally, the licensing for the binary of the daemon built by MSFT is ambiguous. Unlike the drivers, which modinfo lists as GPL, the binary makes no statement on its terms of licensing.