As mentioned, svm createslice does three things:
(1) Obtains a set of tickets for a set of nodes from a broker/agent.
(2) Redeems the tickets for leases by contacting each node in the slice.
(3) Uses the leases to create virtual machines for each node in the slice.
In each of these three steps, errors may occur because nodes are unreachable (e.g., a node has crashed or the network is down). In the current implementation, (1) involves communication with a single agent daemon, so it either succeeds or fails as a whole. For (2) and (3), however, the service manager, svm, must communicate with multiple nodes in a slice, some of which may be unreachable, so these steps can fail on some nodes and succeed on others. An example of a failure in case (2) is shown below. In general, for each node an error is reported with an "Error on www.xxx.yyy.zzz" message and success is reported with a "Success on www.xxx.yyy.zzz" message. (The error message is printed with an extra newline below simply for formatting.)
    # svm createslice oceanstore.xml
    Error on 12.155.161.147:
    (<class socket.error at 0x824196c>, <socket.error instance at 0x835bb4c>, <traceback object at 0x83595ec>)
    Success on 169.229.51.250
    Success on 169.229.51.252
    Success on 169.229.51.251
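The per-node report format lends itself to simple scripting. The following is a minimal sketch, assuming the output consists of lines beginning with "Error on <address>" or "Success on <address>" as in the example above, and that addresses are dotted-quad IPv4. The name parse_svm_output is hypothetical and is not part of svm.

```python
import re

# Matches the per-node status lines shown in the example output.
# Assumption: the address is a dotted-quad IPv4 address.
STATUS_RE = re.compile(r"^(Error|Success) on (\d{1,3}(?:\.\d{1,3}){3})")


def parse_svm_output(text):
    """Return (succeeded, failed) lists of node addresses found in svm output."""
    succeeded, failed = [], []
    for line in text.splitlines():
        match = STATUS_RE.match(line.strip())
        if match is None:
            continue  # e.g. the exception tuple printed after an error
        status, address = match.groups()
        (succeeded if status == "Success" else failed).append(address)
    return succeeded, failed
```

Applied to the transcript above, this would put 12.155.161.147 in the failed list and the three 169.229.51.x nodes in the succeeded list.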
Partial failures for (2) and (3) can be handled using the svm command. Please make sure to read PDN-02-005 before continuing here. If (2) fails, tickets for one or more nodes were not redeemed for leases. If (3) fails, virtual machines were not created from the leases for one or more nodes. For each slice, there is a directory that tracks the transitions from step (1) to (2) to (3) (i.e., from tickets to leases to virtual machines) for each node in the slice. For example, for the oceanstore slice:
    # cd ~/.planetlab/slices/oceanstore
    # ls -l
    total 20
    drwxr-xr-x 2 bnc dusers 4096 Feb 19 15:08 leases
    drwxr-xr-x 2 bnc dusers 4096 Jan 23 15:02 slicekeypair
    drwxr-xr-x 2 bnc dusers 4096 Jan 24 12:41 sshkeys
    drwxr-xr-x 2 bnc dusers 4096 Jan 23 15:05 tickets
    drwxr-xr-x 2 bnc dusers 4096 Jan 23 15:05 vms
The tickets directory contains unused tickets for the slice (i.e., for nodes where (2) failed). The leases directory contains unused leases (i.e., for nodes where (3) failed). The vms directory contains the currently active leases for nodes on which virtual machines have been successfully created. Each directory contains a set of files named by IP address. During an svm createslice, tickets for nodes are first placed in the tickets directory; each ticket is deleted as its lease is obtained and placed in the leases directory; finally, each lease is moved from the leases directory to the vms directory as its VM is created.
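Because each node's progress is recorded by which directory holds its file, the state of a slice can be summarized by listing those directories. The sketch below assumes only the layout described above (tickets, leases, and vms subdirectories containing files named by IP address). The function name slice_state and the labels it returns are hypothetical.

```python
import os


def slice_state(slice_dir):
    """Map each node address to 'ticket', 'lease', or 'vm' for one slice.

    Assumes the layout described above: subdirectories tickets, leases and
    vms, each holding files named by node IP address.
    """
    state = {}
    # Design choice of this sketch: if a node appears in more than one
    # directory, the later stage wins.
    for subdir, label in (("tickets", "ticket"), ("leases", "lease"), ("vms", "vm")):
        path = os.path.join(slice_dir, subdir)
        if not os.path.isdir(path):
            continue
        for name in sorted(os.listdir(path)):
            state[name] = label
    return state
```

A node labeled 'ticket' still needs its ticket redeemed (step 2), a node labeled 'lease' still needs its VM created (step 3), and a node labeled 'vm' is fully set up.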
To handle partial failures for cases (2) and (3), you will need to use svm to issue lower-level commands. In fact, svm createslice is equivalent to performing (a) an svm newtickets operation, (b) an svm newleases operation, and (c) an svm newvms operation, each with the appropriate arguments. As mentioned, (1) always either succeeds or fails as a whole, since it involves contacting a single agent daemon. To handle failures to create leases or to create virtual machines (using previously issued leases), you must use these lower-level commands.
To handle failures in (2) (i.e., redeeming tickets for leases), use the svm command to explicitly retry this operation on a set of nodes. For example, suppose the service manager tried and failed to redeem tickets for leases on nodes 150.135.65.3 and 131.215.45.71 for slice oceanstore. The ticket files for these two nodes would thus remain in ~/.planetlab/slices/oceanstore/tickets. To retry all failed ticket-to-lease conversions, do the following:
    # svm newleases oceanstore ~/.planetlab/slices/oceanstore/tickets/*
To handle failures in (3) (i.e., using leases to create VMs), use the svm command to explicitly retry this operation on a set of nodes. For example, suppose the service manager tried and failed to use leases to create VMs on nodes 150.135.65.3 and 131.215.45.71 for slice oceanstore. The lease files for these two nodes would thus remain in ~/.planetlab/slices/oceanstore/leases. To retry all failed lease-to-VM conversions, do the following:
    # svm newvms oceanstore ~/.planetlab/slices/oceanstore/leases/*
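The two retry commands can also be assembled by a script from whatever files are still pending. This is a minimal sketch, assuming the directory layout and the svm newleases and svm newvms invocations shown above. The function name retry_commands and its slices_root parameter are hypothetical. The function only builds argument lists and does not run svm.

```python
import glob
import os


def retry_commands(slice_name, slices_root="~/.planetlab/slices"):
    """Return argv lists for the svm retries a slice still needs."""
    slice_dir = os.path.join(os.path.expanduser(slices_root), slice_name)
    commands = []
    # Tickets still present mean step (2) failed; leases mean step (3) failed.
    for subdir, subcommand in (("tickets", "newleases"), ("leases", "newvms")):
        files = sorted(glob.glob(os.path.join(slice_dir, subdir, "*")))
        if files:  # skip the step when nothing is pending
            commands.append(["svm", subcommand, slice_name] + files)
    return commands
```

Since newly obtained leases land in the leases directory, the list is best recomputed after a newleases retry has run rather than reused.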
Failures in all other cases (e.g., adding a key, removing a key, or renewing leases in a slice) are handled simply by retrying the command until it succeeds on all nodes.
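Retrying until every node succeeds can be automated with a small loop. The sketch below assumes that success is judged from the output, that is, that a run is clean when no line starts with "Error on". The exit status of svm is not documented here, so it is deliberately not used. The names run_svm and retry_until_clean, and the max_attempts and delay parameters, are hypothetical.

```python
import subprocess
import time


def run_svm(argv):
    """Run a command and return its combined stdout/stderr as text."""
    result = subprocess.run(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return result.stdout


def retry_until_clean(argv, run=run_svm, max_attempts=5, delay=0.0):
    """Rerun argv until its output has no 'Error on' line.

    Returns the number of attempts used, or None if errors persist after
    max_attempts. Assumption: an output without 'Error on' lines means the
    command succeeded on all nodes.
    """
    for attempt in range(1, max_attempts + 1):
        output = run(argv)
        if not any(line.startswith("Error on") for line in output.splitlines()):
            return attempt
        time.sleep(delay)  # give unreachable nodes time to come back
    return None
```

Here argv is whichever svm command originally failed, passed as a list of arguments. The run parameter exists so the loop can be exercised without invoking svm.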