Vagrant, Ansible, Cumulus, and EVPN….Orchestrating a Virtualized Network on your Laptop.

In this post we will talk about how to use Vagrant, Ansible, and Cumulus Linux to build a virtual layer 2 extension across an IP fabric. We will use the standard protocol MP-BGP and its EVPN extension to run the control plane, and VXLAN to provide the virtual data plane.

Each technology deserves a brief introduction before we make things happen.

Vagrant up!

Vagrant is an orchestration tool that uses “providers” to deploy virtual machines to a variety of platforms. Vagrant is probably most famous for its integration with VirtualBox. It’s quite common for cloud and application developers to use Vagrant to build their application stacks locally; Vagrant then makes it easier to deploy them to production cloud environments in an automated fashion. We will use Vagrant to deploy the switches and servers in our topology.

Ansible.

Ansible is another orchestration tool that helps you manage your infrastructure as code. It is agentless and probably has the best toolbox for network automation of the various orchestration tools. In this post we will use Ansible to provision, or configure, our devices.

Cumulus Linux.

Cumulus Networks is a network software company that has made a virtual machine image of its networking software available. This Linux image will run as the switches in our topology. In the real world, you can run this software on a physical switch and operate the topology the same way; just replace Vagrant with your ops team installing and cabling the switches. They publish the Cumulus VX image, along with several automated topologies, so that you can do exactly what we are doing in this post: automate your network with Cumulus.

Last but not least….EVPN.

EVPN stands for Ethernet Virtual Private Network. It’s another address family under the MP-BGP standard. The technology serves as the control plane that distributes MAC address reachability information to your switches across an IP fabric.

Another technology that should be mentioned, although it is not in the title, is VXLAN. We will use VXLAN in the data plane for encapsulating and transporting our Ethernet frames.

Ok, here is the topology we are going to build:

[Topology diagram: two spines, two leaf switches, and two servers; the connections are listed in the .dot file below]

On the bottom are our servers. Each server is connected to a single leaf switch in a mode 2 (balance-xor) port channel. Many of the labs that Cumulus publishes dual-attach the servers to two leaf switches and bond them using their MLAG implementation. That requires LACP, which, unfortunately, I was not able to get working locally. To keep the project moving forward, I worked around it by changing the physical connectivity and configuring all of the bonds as mode 2 (balance-xor).

Cumulus provides some great tools for building these topologies; one of them is their topology converter.

Using their Python script topology_converter.py, I took the following “.dot” file and converted it into a Vagrantfile. Vagrant uses this file to build, and most importantly connect, all of the instances.

graph dc1 {
 "spine1" [function="spine" config="./helper_scripts/extra_switch_config.sh"]
 "spine2" [function="spine" config="./helper_scripts/extra_switch_config.sh"]
 "leaf1" [function="leaf" config="./helper_scripts/extra_switch_config.sh"]
 "leaf2" [function="leaf" config="./helper_scripts/extra_switch_config.sh"]
 "server1" [function="host" config="./helper_scripts/extra_server_config.sh"]
 "server2" [function="host" config="./helper_scripts/extra_server_config.sh"]
   "spine1":"swp3" -- "leaf1":"swp3"
   "spine1":"swp4" -- "leaf2":"swp4"
   "spine2":"swp5" -- "leaf1":"swp5"
   "spine2":"swp6" -- "leaf2":"swp6"
   "leaf1":"swp40" -- "leaf2":"swp40"
   "leaf1":"swp50" -- "leaf2":"swp50"
   "server1":"eth1" -- "leaf1":"swp1"
   "server1":"eth2" -- "leaf1":"swp2"
   "server2":"eth1" -- "leaf2":"swp1"
   "server2":"eth2" -- "leaf2":"swp2"
}

The Vagrantfile that it creates is fairly long and required a number of modifications for what I wanted. Luckily, I’m going to share it with you… so clone this repo and take a look yourself. The repo is needed to complete the build anyway.

The modifications that I made to the file involve commenting out some of the helper scripts and using Vagrant’s Ansible integration to run the playbooks.

~$ git clone https://github.com/tsimson/cumulus_evpn.git
~$
~$ cd cumulus_evpn/
~/cumulus_evpn$
~/cumulus_evpn$

Let’s run a few commands:

~/cumulus_evpn$ vagrant status
Current machine states:

spine1                    not created (virtualbox)
spine2                    not created (virtualbox)
leaf1                     not created (virtualbox)
leaf2                     not created (virtualbox)
server1                   not created (virtualbox)
server2                   not created (virtualbox)

This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run `vagrant status NAME`.

This shows the instances we are about to create…so let’s build and provision them!

~/cumulus_evpn$ vagrant up

... output omitted ...

~/cumulus_evpn$ vagrant status
Current machine states:

spine1                    running (virtualbox)
spine2                    running (virtualbox)
leaf1                     running (virtualbox)
leaf2                     running (virtualbox)
server1                   running (virtualbox)
server2                   running (virtualbox)

This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run `vagrant status NAME`.
~/cumulus_evpn$


Our VMs are up…cool!

If you watched Vagrant do its magic, you will have seen that it also ran the Ansible playbooks as each machine came up. Each time a machine finishes building, the playbooks are run, the dynamic inventory is matched against the new machine, and Ansible deploys the configuration to it. The next machine that is built is provisioned the same way, but Ansible does not re-provision any of the previous instances because it has already completed them.

Let’s jump onto one of our servers and verify it’s working as expected. Here we log into server1 (10.1.1.101) and ping server2 (10.1.1.102).

~/cumulus_evpn$ vagrant ssh server1
Welcome to Ubuntu 16.04 LTS (GNU/Linux 4.4.0-22-generic x86_64)

 * Documentation:  https://help.ubuntu.com/

271 packages can be updated.
159 updates are security updates.

New release '18.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.


Last login: Tue Jan 29 06:24:03 2019 from 10.0.2.2
vagrant@server1:~$
vagrant@server1:~$
vagrant@server1:~$ ping 10.1.1.102
PING 10.1.1.102 (10.1.1.102) 56(84) bytes of data.
64 bytes from 10.1.1.102: icmp_seq=1 ttl=64 time=1.55 ms
64 bytes from 10.1.1.102: icmp_seq=2 ttl=64 time=1.88 ms
^C
--- 10.1.1.102 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 1.555/1.719/1.884/0.169 ms
vagrant@server1:~$

It works! If you’ve gotten this far, congratulations! You have layer-2 connected a couple of hosts across a layer-3 IP fabric…all using EVPN as the control plane and VXLAN as the virtual data plane!

Now it’s time to let our networking geek-birds fly. Let’s view the configurations, verify the control plane and data plane functionality at the command line, and dig even deeper with a review of packet captures from the IP fabric.

Here is the leaf control plane configuration. Notice how the neighbor statements refer to interfaces, not actual neighbor IP addresses. This is because the sessions are unnumbered: the neighbors are established over IPv6 link-local addresses. Using this strategy, you simply specify the interface and the peer address is learned automatically from IPv6 neighbor discovery (router advertisements).

vrf RED
 vni 104001
 exit-vrf
!
vrf BLUE
 vni 104002
 exit-vrf
!
router bgp 65101
 bgp router-id 10.255.255.11
 neighbor swp4 interface remote-as external
 neighbor swp5 interface remote-as external
 !
 address-family ipv4 unicast
  redistribute connected route-map LOOPBACK_ROUTES
 exit-address-family
 !
 address-family l2vpn evpn
  neighbor swp4 activate
  neighbor swp5 activate
  advertise-all-vni
 exit-address-family
!
route-map LOOPBACK_ROUTES permit 10
  match interface lo

Here is the interface configuration:

vagrant@leaf1:~$ cat /etc/network/interfaces
auto lo
iface lo inet loopback
    address 10.255.255.11/32

auto vagrant
iface vagrant inet dhcp

auto eth0
iface eth0 inet dhcp

auto swp1
iface swp1

auto swp2
iface swp2

...

######
### Define VRF and L3VNI
######
auto RED
iface RED
    vrf-table auto

auto SERVER01
iface SERVER01
    bond-slaves swp2 swp3
    bond-mode balance-xor
    bridge-access 10

#####
### Define bridge
#####
auto bridge
iface bridge
    bridge-ports SERVER01 VXLAN10
    bridge-vids 10
    bridge-vlan-aware yes

#####
### Define VXLAN interfaces
#####
auto VXLAN10
iface VXLAN10
    bridge-access 10
    bridge-arp-nd-suppress on
    bridge-learning off
    mstpctl-bpduguard yes
    mstpctl-portbpdufilter yes
    vxlan-id 10010
    vxlan-local-tunnelip 10.255.255.11

auto vlan10
iface vlan10
    address 10.1.1.2/24
    address-virtual 00:00:00:00:00:1a 10.1.1.1/24
    vlan-id 10
    vlan-raw-device bridge
    vrf RED

In the above snippet, we combine vlan10, VXLAN10, and SERVER01 into one bridge domain…named “bridge”. Our layer 3 interface, vlan10, is assigned to the RED VRF.

So there’s the config; let’s verify it at the command line.

~/cumulus_evpn$ vagrant ssh leaf1

Welcome to Cumulus VX (TM)

Cumulus VX (TM) is a community supported virtual appliance designed for
experiencing, testing and prototyping Cumulus Networks' latest technology.
For any questions or technical support, visit our community site at:
http://community.cumulusnetworks.com

The registered trademark Linux (R) is used pursuant to a sublicense from LMI,
the exclusive licensee of Linus Torvalds, owner of the mark on a world-wide
basis.
vagrant@leaf1:~$ sudo net show bgp evpn route
BGP table version is 20, local router ID is 10.255.255.11
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[ESI]:[EthTag]:[IPlen]:[IP]

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 10.255.255.11:3
*> [2]:[0]:[0]:[48]:[44:38:39:00:00:09]
                    10.255.255.11                      32768 i
*> [2]:[0]:[0]:[48]:[44:38:39:00:00:09]:[32]:[10.1.1.101]
                    10.255.255.11                      32768 i
*> [2]:[0]:[0]:[48]:[44:38:39:00:00:09]:[128]:[fe80::4638:39ff:fe00:9]
                    10.255.255.11                      32768 i
*> [2]:[0]:[0]:[48]:[46:38:39:00:00:07]
                    10.255.255.11                      32768 i
*> [2]:[0]:[0]:[48]:[46:38:39:00:00:09]
                    10.255.255.11                      32768 i
*> [3]:[0]:[32]:[10.255.255.11]
                    10.255.255.11                      32768 i
Route Distinguisher: 10.255.255.12:3
*> [2]:[0]:[0]:[48]:[44:38:39:00:00:0d]
                    10.255.255.12                          0 65201 65102 i
*  [2]:[0]:[0]:[48]:[44:38:39:00:00:0d]
                    10.255.255.12                          0 65201 65102 i
*> [2]:[0]:[0]:[48]:[44:38:39:00:00:0d]:[32]:[10.1.1.102]
                    10.255.255.12                          0 65201 65102 i
*  [2]:[0]:[0]:[48]:[44:38:39:00:00:0d]:[32]:[10.1.1.102]
                    10.255.255.12                          0 65201 65102 i
*> [2]:[0]:[0]:[48]:[44:38:39:00:00:0d]:[128]:[fe80::4638:39ff:fe00:d]
                    10.255.255.12                          0 65201 65102 i
*  [2]:[0]:[0]:[48]:[44:38:39:00:00:0d]:[128]:[fe80::4638:39ff:fe00:d]
                    10.255.255.12                          0 65201 65102 i
*  [2]:[0]:[0]:[48]:[46:38:39:00:00:0d]
                    10.255.255.12                          0 65201 65102 i
*> [2]:[0]:[0]:[48]:[46:38:39:00:00:0d]
                    10.255.255.12                          0 65201 65102 i
*  [2]:[0]:[0]:[48]:[46:38:39:00:00:11]
                    10.255.255.12                          0 65201 65102 i
*> [2]:[0]:[0]:[48]:[46:38:39:00:00:11]
                    10.255.255.12                          0 65201 65102 i
*  [3]:[0]:[32]:[10.255.255.12]
                    10.255.255.12                          0 65201 65102 i
*> [3]:[0]:[32]:[10.255.255.12]
                    10.255.255.12                          0 65201 65102 i

Displayed 12 prefixes (18 paths)
vagrant@leaf1:~$

Look closely: the BGP EVPN address family is talking about MAC addresses. It’s associating each MAC with a loopback address as the next hop. VXLAN will use this information to establish the overlay in the data plane. Based on this output, the control plane seems to be working.

Next…we capture the data plane and test fault tolerance. We are using ECMP across our uplinks, so we will shut one path down to make sure all of the traffic lands on the single interface we are capturing.

~/cumulus_evpn$ vagrant destroy spine1 -f
==> spine1: Forcing shutdown of VM...
==> spine1: Destroying VM and associated drives...
tsimson@GMTI-Desktop-Tims-MacBook-Pro:~/cumulus_evpn$ vagrant ssh server1
Welcome to Ubuntu 16.04 LTS (GNU/Linux 4.4.0-22-generic x86_64)

 * Documentation:  https://help.ubuntu.com/

271 packages can be updated.
159 updates are security updates.

New release '18.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.


Last login: Tue Jan 29 08:35:20 2019 from 10.0.2.2
vagrant@server1:~$ ping 10.1.1.1
PING 10.1.1.1 (10.1.1.1) 56(84) bytes of data.
64 bytes from 10.1.1.1: icmp_seq=1 ttl=64 time=0.388 ms
^C
--- 10.1.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.388/0.388/0.388/0.000 ms
vagrant@server1:~$ ping 10.1.1.102
PING 10.1.1.102 (10.1.1.102) 56(84) bytes of data.
64 bytes from 10.1.1.102: icmp_seq=1 ttl=64 time=1.93 ms
64 bytes from 10.1.1.102: icmp_seq=2 ttl=64 time=1.66 ms

We toasted spine1 and the network continues to roll…awesome.

We can now see that we are single-threaded through spine2…and can be sure we are capturing everything through a single interface.

~/cumulus_evpn$ vagrant ssh leaf1

vagrant@leaf1:~$ sudo net show bgp sum

show bgp ipv4 unicast summary
=============================
BGP router identifier 10.255.255.11, local AS number 65101 vrf-id 0
BGP table version 7
RIB entries 3, using 456 bytes of memory
Peers 2, using 39 KiB of memory

Neighbor        V         AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd
spine1(swp4)    4      65201    2354    2363        0    0    0 00:15:48      Connect
spine2(swp5)    4      65201    2648    2657        0    0    0 01:07:25            1

Total number of neighbors 2


show bgp ipv6 unicast summary
=============================
% No BGP neighbors found


show bgp l2vpn evpn summary
===========================
BGP router identifier 10.255.255.11, local AS number 65101 vrf-id 0
BGP table version 0
RIB entries 3, using 456 bytes of memory
Peers 2, using 39 KiB of memory

Neighbor        V         AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd
spine1(swp4)    4      65201    2354    2363        0    0    0 00:15:48      Connect
spine2(swp5)    4      65201    2648    2657        0    0    0 01:07:25            6

Total number of neighbors 2
vagrant@leaf1:~$

Time to take captures.

vagrant@leaf1:~$ sudo tcpdump -npi swp5 -w BB5.pcap && ping 10.1.1.102
tcpdump: listening on swp5, link-type EN10MB (Ethernet), capture size 262144 bytes

...

vagrant@leaf1:~$ scp BB5.pcap tsimson@10.0.2.2:

Here’s our capture. I’ve filtered everything but some ICMP traffic. Check out the VXLAN header, more specifically the VNI. As defined in RFC 7348:

“Each VXLAN segment is identified through a 24-bit segment ID, termed the “VXLAN Network Identifier (VNI)”. This allows up to 16 M VXLAN segments to coexist within the same administrative domain. The VNI identifies the scope of the inner MAC frame originated by the individual VM.”
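
Just to make that concrete, here is a small sketch of my own (not from the original capture walk-through) showing where the 24-bit VNI sits in the 8-byte VXLAN header that follows the outer UDP header (destination port 4789). The sample bytes are hand-built for VNI 10010, the VNI used in this post.

import struct

# RFC 7348 VXLAN header: flags (1 byte, 0x08 = VNI present), 3 reserved bytes,
# a 3-byte VNI, and 1 reserved byte. These sample bytes encode VNI 10010 (0x00271a).
vxlan_header = bytes(bytearray([0x08, 0x00, 0x00, 0x00, 0x00, 0x27, 0x1a, 0x00]))

flags = struct.unpack('!B', vxlan_header[0:1])[0]
vni = struct.unpack('!I', b'\x00' + vxlan_header[4:7])[0]

print('VNI-present flag set: %s' % bool(flags & 0x08))
print('VNI: %d' % vni)  # prints 10010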

Once again we made it to the end of another pretty cool post. The use cases for combining Vagrant, Ansible, and Cumulus Linux are vast. In future posts, I hope to build on this topology by establishing routing between networks, to external networks, and by implementing security within the fabric.

I had an absolute blast building and sharing this environment. I… and hopefully we… learned a ton!

 

Using GnuPG to Handle Your Network Automation Credentials!

One thing I struggled with for a long time is the following:

How do we code our network while securely handling our device credentials? How do we do this in a way that is highly collaborative?

Here’s one issue that I ran into. It is easy to get roped into baking your credentials into a script (completely guilty here). But what happens when it’s time to deliver your code to a colleague, or even an external customer? You will need to refactor your code to deal with the AAA credentials sitting in plaintext in it.

With Python and GnuPG, we can securely deal with device credentials in shareable code.

One of my favorite parts about this strategy is thinking about the extensibility of GnuPG…particularly its ability to send and receive secure messages. This post won’t dive into that much. Instead we’ll stick to the following objectives:

  1. Install GnuPG, the associated python libraries, and generate keys.
  2. Build an encrypted credentials file in yaml or json.
  3. Use python to interface with your keys and securely load your credentials.

Ok… that was highly summarized… let’s get into the details:

Installing gpg via brew…there is more chatter in real life, but this is a blog:

$ xcode-select --install

$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

$ brew doctor
Your system is ready to brew.

$ brew install gpg

Installing the required python libraries.

$ sudo easy_install pip

$ sudo pip install python-gnupg

 

Generating keys…Please read the entire section before starting.

This step generates a public and private key in the .gnupg folder. When you use this in code, you encrypt with the specified user’s public key and decrypt with your own private key.

Run this command and follow the self-explanatory prompts. Be advised that skipping the passphrase is less secure. In this scenario I’m treating my keys like SSH RSA keys and giving them file permissions of 600.

$
$ gpg --gen-key
$
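
If you’d rather stay in Python, python-gnupg can generate the keypair as well. Here’s a minimal sketch, assuming the same .gnupg home used in the interpreter session below; the name and email simply reuse the uid shown later in the list_keys() output.

import gnupg

gpg = gnupg.GPG(gnupghome='/Users/simsondm/.gnupg')

# Omitting the passphrase here carries the same security trade-off called out above.
key_input = gpg.gen_key_input(name_real='Timothy Simson',
                              name_email='simsontj@yahoo.com',
                              key_type='RSA',
                              key_length=2048)
key = gpg.gen_key(key_input)
print(key.fingerprint)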

 

Cool…let’s play with gnupg in the interpreter:

We specify our .gnupg location and begin to interact with our keys:

$ python
Python 2.7.10 (default, Oct  6 2017, 22:29:07)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.31)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import gnupg
>>>
>>> gpg = gnupg.GPG(gnupghome='/Users/simsondm/.gnupg')
>>>
>>> gpg.list_keys()
[{'origin': u'0', 'cap': u'scESC', 'subkeys': [[u'7D48010AC98F2356', u'e', u'734616DF402948BACFB178E67D48010AC98F2356']], 'sigs': [], 'subkey_info': {u'7D48010AC98F2356': {'origin': 'unavailable', 'dummy': u'', 'updated': u'', 'keyid': u'7D48010AC98F2356', 'hash': u'', 'uid': u'', 'expires': u'1610070480', 'curve': u'', 'flag': u'', 'length': u'2048', 'ownertrust': u'', 'sig': u'', 'algo': u'1', 'compliance': u'23', 'date': u'1546998480', 'trust': u'u', 'type': u'sub', 'cap': u'e', 'token': u'', 'issuer': u''}}, 'trust': u'u', 'issuer': u'', 'ownertrust': u'u', 'token': u'', 'sig': u'', 'type': u'pub', 'updated': u'', 'hash': u'', 'expires': u'1610070480', 'flag': u'', 'fingerprint': u'20041D5DF00676FA83278BFCAE3DB80A68069FDB', 'date': u'1546998480', 'dummy': u'', 'keyid': u'AE3DB80A68069FDB', 'uids': [u'Timothy Simson <simsontj@yahoo.com>'], 'compliance': u'23', 'curve': u'', 'length': u'2048', 'algo': u'1'}]
>>>

Let’s encrypt some stuff. We set up a string to encrypt and perform the encryption with the gpg.encrypt() function. We also have ways to make sure the encryption worked and to see the encrypted object.

>>>
>>> unc = 'This is a cool test of encryption'
>>>
>>> enc = gpg.encrypt(unc, 'simsontj@yahoo.com')
>>>
>>> enc.status
'encryption ok'
>>>
>>> print enc
-----BEGIN PGP MESSAGE-----

hQEMA31IAQrJjyNWAQf/WqyJHdwJ5gBePahCwqd/dKaSXFnHYgppdHkf9J7Iygwp
yz9gY0I1tKhmADJp6zsMBGzh0vFdotswk21BzEAkzpXzmFTaK64TF5gFfhEHeoim
i7ZRuPRVzHUm7+ayrpexKyZjEbThmWJmTIQVt1+jAQKAb+I7i9qYqdkmHaUL5FUf
DvhNKwMFK+VltMhLxQs+IEa+IZTp4pbA+pWfPwhc9lwUFQrTtEeWc6Jzx0BaC2xR
kkas5D5oVT2vJidpu3Dsv2Ydt1RWOz9mbP269W6XDfQTHC5xqfeuyVOF7NIx2b76
i0F7imBmQtbIeklDjljwXePP/Lhde1PCX6VpIyydRtJaAWYxG/suh92TzodPqLr5
IcS2MO5P+EzwgTRPg0YAfefe9JR4dh2b7WaiHZSsYXyCkYY5kmV0bwZs8XTVTNBt
3jOKP81Ymc3yjcci3DjxdtO9gmUY/XmLDF6C
=7chC
-----END PGP MESSAGE-----
>>>

Yes this is an object!

>>>
>>> enc
<gnupg.Crypt object at 0x105a75590>
>>>

That means you have to convert it to a string with the str() function before decrypting it…you guessed it, that’s next:

>>>
>>> test_unc = gpg.decrypt(str(enc))
>>>
>>> print test_unc
This is a cool test of encryption
>>>

 

Ok, we have GnuPG working in Python and bash. How do we automate our network credentials?

First we need to encrypt credentials_file.txt from bash. Here is the credentials file:

---
cisco_user: ciscoUsername
cisco_pass: ciscoP4ssword

Here’s how we encrypt it.

$
$ gpg --output credentials_file.gpg --encrypt --recipient blake@cyb.org credentials_file.txt
$
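
If you’d rather not shell out, the same encryption can be done from Python with python-gnupg’s encrypt_file. A quick sketch, reusing the example recipient from the gpg command above (swap in the uid or fingerprint of your own key):

import gnupg

gpg = gnupg.GPG(gnupghome='/Users/tsimson/.gnupg')

with open('credentials_file.txt', 'rb') as f:
    result = gpg.encrypt_file(f, recipients='blake@cyb.org',
                              output='credentials_file.gpg')

print(result.status)  # 'encryption ok' on success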

 

And here is our python…we made it!

There might be a lot to look at below, but look at the “Decrypt/Load credentials” section. We’re automating our network credentials securely! The creds are loaded and used by the connection handler…in code that’s shareable.

This script deploys a new VLAN to a data center Ethernet fabric and ensures the new VLAN is available via 802.1Q tag to a pre-specified VMware cluster.

import os
import gnupg
import yaml
from jinja2 import Environment
import argparse
import sys
import json
import re
from netmiko import ConnectHandler

### Take command line arguments, VLAN_ID, and VLAN_NAME.

parser = argparse.ArgumentParser()
parser.add_argument("VLAN_ID", help="Required. Vlan ID.")
parser.add_argument("VLAN_NAME", help="Required. Name of the vlan.")
args = parser.parse_args()

### Decrypt/Load credentials .yml file using gnupg

creds_list = []
gpg = gnupg.GPG(gnupghome='/Users/tsimson/.gnupg')
creds_stream = open('credentials_file.gpg', 'rb')
creds_decrypt = gpg.decrypt_file(creds_stream)
creds_gen = yaml.load_all(str(creds_decrypt))

for i in creds_gen:
    creds_list = i

### Establish switch inventory

datacenter_nexus_1_2 = [
{"name": "DataCenter_Nexus-1", "ip": "172.38.99.37"},
{"name": "DataCenter_Nexus-2", "ip": "172.38.99.38"}
]

# Establish interface inventory.

cluster_interface_list = [
{"int_id": "Port-channel15", "int_desc": "FABRIC_UPLINK"},
{"int_id": "Eth132/1/1", "int_desc": "DataCenter_ESX602"},
{"int_id": "Eth132/1/8", "int_desc": "DataCenter_ESX611"},
{"int_id": "Eth132/1/9", "int_desc": "DataCenter_ESX607"},
{"int_id": "Eth132/1/11", "int_desc": "DataCenter_ESX604"},
{"int_id": "Eth132/1/15", "int_desc": "DataCenter_ESX605"},
{"int_id": "Eth132/1/16", "int_desc": "DataCenter_ESX610"},
{"int_id": "Eth133/1/1", "int_desc": "DataCenter_ESX601"},
{"int_id": "Eth133/1/8", "int_desc": "DataCenter_ESX609"},
{"int_id": "Eth133/1/9", "int_desc": "DataCenter_ESX608"},
{"int_id": "Eth133/1/10", "int_desc": "DataCenter_ESX612"},
{"int_id": "Eth133/1/2", "int_desc": "DataCenter_ESX614"},
{"int_id": "Eth135/1/10", "int_desc": "DataCenter_ESX602"},
{"int_id": "Eth133/1/3", "int_desc": "DataCenter_ESX603"},
{"int_id": "Eth133/1/16", "int_desc": "DataCenter_ESX606"},
{"int_id": "Eth135/1/3", "int_desc": "DataCenter_ESX607"}
]

### Run tests to ensure environment is ready for automated configurations.

for i in datacenter_nexus_1_2:
    net_connect = ConnectHandler(device_type="cisco_nxos", ip=i['ip'], username=creds_list['cisco_user'], password=creds_list['cisco_pass'])
    existing_vlan = net_connect.send_command('show vlan | in "^' + args.VLAN_ID + ' "')
    print 'show vlan | in "^' + args.VLAN_ID + ' "'
    if existing_vlan:
        print "Vlan already exists...aborting"
        quit()
    else:
        print "No vlan conflict detected...proceeding"

### Establish vlan configuration template

vlan_config_template = """

vlan {{ VLAN_ID }}
name {{ VLAN_NAME }}

"""

### Establish interface configuration template.

interface_config_template = """

interface {{ INTERFACE_ID }}
switchport trunk allowed vlan add {{ VLAN_ID }}

"""

### Function to configure vlan on nexus switch

def configure_doc_nexus_vlan(device_name, ip, username, password, vlan_id, vlan_name):
    print "Connecting to " + device_name + " !!!"
    net_connect = ConnectHandler(device_type="cisco_nxos", ip=ip, username=username, password=password)
    print "Configuring " + device_name + " !!!"
    vlan_config_return = net_connect.send_config_set(Environment().from_string(vlan_config_template).render(VLAN_NAME=vlan_name, VLAN_ID=vlan_id))
    print vlan_config_return

for i in datacenter_nexus_1_2:
    configure_doc_nexus_vlan(i["name"], i["ip"], creds_list['cisco_user'], creds_list['cisco_pass'], args.VLAN_ID, args.VLAN_NAME)

### Function to configure cluster interfaces to carry new vlan.

def config_doc_nexus_interfaces(device_name, ip, username, password, vlan_id):
    print "Connecting to " + device_name + " !!!"
    net_connect = ConnectHandler(device_type="cisco_nxos", ip=ip, username=username, password=password)
    print "Configuring " + device_name + " !!!"
    config = ''
    for i in cluster_interface_list:
        config = config + Environment().from_string(interface_config_template).render(INTERFACE_ID=i['int_id'], VLAN_ID=vlan_id)
    interface_config_return = net_connect.send_config_set(config)
    print interface_config_return

for i in datacenter_nexus_1_2:
    config_doc_nexus_interfaces(i["name"], i["ip"], creds_list['cisco_user'], creds_list['cisco_pass'], args.VLAN_ID)

### Run tests to ensure configuration are sane before saving.

### Envisioning a spanning tree check here.

### Save the configurations.

for i in datacenter_nexus_1_2:
    net_connect = ConnectHandler(device_type="cisco_nxos", ip=i['ip'], username=creds_list['cisco_user'], password=creds_list['cisco_pass'])
    save_configs = net_connect.send_command('copy run start')

Automate Where it Makes Sense…or… It Makes Sense to Automate?


Well, certainly the layman would say to automate where it makes sense…but why not drive your network to a place where it makes sense to automate? Transform your network to one that’s conducive to automation, and the code will flow freely. Like the infamous Dan Bilzerian once said, “It’s all about setup.”

In many cases it’s useful to run a script to change several network devices, but I believe many people stop here when it comes to network scripting and fail to see the benefits of an “automate everything” culture. One-time-use scripts have their place, but driving a code-based architecture…and culture…creates opportunities for automation. This approach can be daunting up front, but the investment will pay off if executed properly.

The way I envision a successful transition to a code-based network is to follow a three-step process.

  1. Standardize and document EVERYTHING. (Well, that’s not very sexy. When do we start coding?)

This doesn’t seem very fun or exciting, but it’s absolutely critical. Picture this: you want to deploy some new SNMP configurations to your 2,000 routers and switches, but many of them are from different vendors and have different AAA configurations. Congratulations, we’ve just hit the first non-starter for automation. Get the picture?

You can consider the potential for automation a direct function of how standardized your network is. The documentation of those standards translates directly into the business rules you write code against.
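
To make that concrete, here is a toy sketch of a business rule turned into code. The standard SNMP lines and the backups/ directory of plain-text config files are assumptions for illustration, not something from a real environment.

import glob

# Hypothetical standard: every device config must contain these lines.
STANDARD_SNMP_LINES = [
    'snmp-server community REDACTED ro',
    'snmp-server location DC1',
]

for config_file in glob.glob('backups/*.cfg'):
    with open(config_file) as f:
        config = f.read()
    missing = [line for line in STANDARD_SNMP_LINES if line not in config]
    if missing:
        print('%s is out of standard, missing: %s' % (config_file, missing))
    else:
        print('%s complies' % config_file)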

  2. Instantiate all configuration data as structured data.

Right now you probably have a configuration management platform that is backing up all of your configurations as text files. This is great, but we need our code (or orchestration tools) to be able to act on this data in meaningful and efficient ways. The goal of this step is to take your configurations and split them into variables and parameterized templates. The example below was built for Ansible, but a Python script can apply the variables to the Jinja2 template just as well (a short rendering sketch follows the variables).

Here’s my parameterized_template.yml:

{% if dhcp_server %}

{% for exclude in dhcp_exclusions %}
ip dhcp excluded-address {{ exclude['exclude_1'] }} {{ exclude['exclude_2'] }}
{% endfor %}

{% for pool in dhcp_pools %}
ip dhcp pool {{ pool['name'] }}
 network {{ pool['network'] }}
 default-router {{ pool['default_router'] }}
 dns-server {{ pool['dns_server'] }}
 option 60 ascii "{{ pool['option_60_ascii'] }}"
 option 43 hex {{ pool['option_43_hex'] }}
 lease 2
{% endfor %}

{% endif %}

Here’s my variables.yml:

dhcp_exclusions:
  - exclude_1: 10.35.63.1
    exclude_2: 10.35.63.24
  - exclude_1: 10.35.63.100
    exclude_2: 10.35.63.255
  - exclude_1: 10.35.64.1
    exclude_2: 10.35.64.24
  - exclude_1: 10.35.64.100
    exclude_2: 10.35.64.255
  - exclude_1: 10.35.65.1
    exclude_2: 10.35.65.24
  - exclude_1: 10.35.65.100
    exclude_2: 10.35.65.255

dhcp_pools:
  - name: native_wap
    network: "10.35.63.0 255.255.255.0"
    domain_name: us.ad.submarine.com
    default_router: 10.35.63.1
    option_60_ascii: "Cisco AP c1140"
    option_43_hex: f108.0ae6.0805
    dns_server: "10.35.203.132 172.20.0.41"

  - name: production_internal_vlan
    network: "10.35.64.0 255.255.255.0"
    domain_name: us.ad.submarine.com
    default_router: 10.35.64.1
    option_60_ascii: "Cisco AP c1140"
    option_43_hex: f108.0ae6.0805
    dns_server: "10.35.203.132 172.20.0.41"

  - name: guest_vlan
    network: "10.35.65.0 255.255.255.0"
    domain_name: us.ad.submarine.com
    default_router: 10.35.65.1
    option_60_ascii: "Cisco AP c1140"
    option_43_hex: f108.0ae6.0805
    dns_server: "208.67.222.222 208.67.220.220"

  3. Manage your structured data like an application, not a network.

This is where I get to throw buzzwords like Agile and DevOps out there. These terms describe methodologies for software development. I won’t go into the details of each one in this post, but the takeaway is that your network is now an application. Each configuration snippet should be treated as a software feature.

For example, we want our application to use ISP2 when ISP1 is down. How should we code this feature and deploy it? How can we unit test this code? How can we roll it back if the deployment goes bad? How can we canary test the deployment to reveal issues early with minimal business impact (fail early, fail often)?

The coming posts will aim to answer all of these questions…so stay tuned!

Today I added a WordPress blog to my site…and it was pretty good.

Today I added a WordPress blog to my site…cool story bro.

I wanted to bolt a blog onto my existing site rather than run a separate WordPress instance. The reason for this was basically to increase my skills with GCP, Ansible, PHP, Python, and MySQL.

If all you are after is to get a WordPress site up and running, then it’s probably much easier to go with the GCP Marketplace WP instance and walk away. I initially tried this and it was almost too easy. You more or less fill out a form and it’s built for you, and all of the automation behind this is available and unit tested.

That being said, my work is not fully absent of a quest for shortcuts. I played with the Google Marketplace instance, a canned WordPress Ansible role, and installing WordPress manually. At this point I don’t remember which one I ended up with, but it didn’t matter once I learned how to redeploy the existing site repeatably. The key here was two Google Storage buckets: one for the WordPress content files, and another for a restorable database dump.

Here is how this site is built and deployed:

The site is deployed across two Google Cloud instances. The first runs nginx and PHP. It serves a static webpage, as well as the WordPress site. The second is a database instance.

The two instances are fully orchestrated using ansible. When you run the playbook, it builds the two instances, and deploys roles to them.

The playbook pulls the static/WordPress content, along with a database dump, from the Google Storage buckets.
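
The role tasks themselves aren’t shown in this post, but as a rough sketch of that restore direction (one possible way to do it in Python, mirroring the sync script further down), it looks something like this:

import os
from google.cloud import storage

storage_client = storage.Client()

# Pull the database dump back down for the restore.
db_bucket = storage_client.get_bucket('ts-wp-db-dump')
db_bucket.blob('ts_tech_db.sql').download_to_filename('/root/ts_tech_db.sql')

# Pull every WordPress content file under the html/ prefix into /var/www/.
files_bucket = storage_client.get_bucket('ts-wp-files')
for blob in files_bucket.list_blobs(prefix='html'):
    local_path = os.path.join('/var/www', blob.name)
    if not os.path.isdir(os.path.dirname(local_path)):
        os.makedirs(os.path.dirname(local_path))
    blob.download_to_filename(local_path)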

To facilitate updates and repeatability, a Python script is used to keep the buckets updated.

Security??…Gosh, I hope so.
Our variables are encrypted using Ansible Vault. What’s cool is that, under the hood, we use a key file to encrypt everything…no passphrase at the command line.

Deploying to Google Cloud using Ansible.

Here is the primary playbook:

- name: Create instance(s)
  hosts: localhost
  connection: local
  gather_facts: yes

  vars:
    service_account_email: XXXXXX-compute@developer.gserviceaccount.com
    credentials_file: /Users/tsimson/project-name.json
    project_id: project-name
    machine_type: g1-small
    image: centos-7
  tasks:

   - name: Launch Web Host
     gce:
         network: ts-tech-vpc-1
         subnetwork: ts-tech-vpc-1
         zone: us-east1-b
         instance_names: ts-web-host-1
         machine_type: "{{ machine_type }}"
         image: "{{ image }}"
         service_account_permissions: storage-full
         service_account_email: "{{ service_account_email }}"
         credentials_file: "{{ credentials_file }}"
         project_id: "{{ project_id }}"
         ip_forward: True
         tags: [ssh, http-server, https-server, subnet]
     register: gce_web_host_1
   - debug: var=gce_web_host_1

   - name: Wait for SSH to come up
     wait_for: host={{ item.public_ip }} port=22 delay=10 timeout=300
     with_items: "{{ gce_web_host_1.instance_data }}"

   - name: Add host to groupname
     add_host: hostname={{ item.name }} ansible_ssh_host={{ item.public_ip }} groupname=ts-web-hosts
     with_items: "{{ gce_web_host_1.instance_data }}"

   - name: Launch DB Host
     gce:
         network: ts-tech-vpc-1
         subnetwork: ts-tech-vpc-1
         zone: us-east1-b
         instance_names: ts-db-host-1
         machine_type: "{{ machine_type }}"
         image: "{{ image }}"
         service_account_email: "{{ service_account_email }}"
         credentials_file: "{{ credentials_file }}"
         project_id: "{{ project_id }}"
         ip_forward: True
         tags: [ssh, subnet]
     register: gce_db_host_1
   - debug: var=gce_db_host_1

   - name: Wait for SSH to come up
     wait_for: host={{ item.public_ip }} port=22 delay=10 timeout=300
     with_items: "{{ gce_db_host_1.instance_data }}"

   - name: Add host to groupname
     add_host: hostname={{ item.name }} ansible_ssh_host={{ item.public_ip }} groupname=ts-db-hosts
     with_items: "{{ gce_db_host_1.instance_data }}"

- name: Manage web-host
  hosts: ts-web-hosts
  connection: ssh
  sudo: True
  roles:
    - role: web_host_role
    - role: wordpress

- name: Manage db-host
  hosts: ts-db-hosts
  connection: ssh
  sudo: True
  roles:
    - role: db_host_role

Here is the directory structure that Ansible uses:

ts_tech_web_public$ tree
.
├── README.md
├── ansible.cfg
├── create_ts_tech_website.retry
├── create_ts_tech_website.yml
├── group_vars
│   └── all
├── host_vars
├── hosts
└── roles
    ├── db_host_role
    │   ├── files
    │   │   └── my.cnf
    │   ├── tasks
    │   │   └── main.yml
    │   └── templates
    ├── web_host_role
    │   ├── files
    │   ├── tasks
    │   │   └── main.yml
    │   └── templates
    │       ├── nginx.conf
    │       └── wp_db_gb_sync.py
    └── wordpress
        ├── README.md
        ├── defaults
        │   └── main.yml
        ├── files
        ├── handlers
        │   └── main.yml
        ├── meta
        │   └── main.yml
        ├── tasks
        │   └── main.yml
        ├── templates
        │   └── wp-config.php
        ├── tests
        │   ├── inventory
        │   └── test.yml
        └── vars
            └── main.yml

20 directories, 20 files

This Python script is run to keep the off-platform copies of the content and database updated. You run it after making live changes to maintain repeatability:

Python script…for the boys: wp_db_gb_sync.py

from google.cloud import storage
import os
import glob
from subprocess import Popen
import requests
import time

storage_client = storage.Client()
ts_wp_files_bucket = storage_client.get_bucket('ts-wp-files')
ts_wp_db_dump_bucket = storage_client.get_bucket('ts-wp-db-dump')
test_path = ''

os.chdir('/root')

# Dump the WordPress database from the DB host; the {{ }} expressions below are
# rendered by Ansible when this template is deployed to the web host.
Popen('mysqldump -h {{ hostvars['localhost']['gce_db_host_1']['instance_data'][0]['private_ip'] }} -u {{ mysql_user }} -p{{ mysql_pass }} --databases ts_tech_db > ts_tech_db.sql', shell=True)

# Ask the GCE metadata server for this instance's external IP.
response = requests.request("GET", "http://metadata/computeMetadata/v1/instance/network-interfaces/0/access-configs/0/external-ip", headers={"Metadata-Flavor": "Google"})

pub_ip = response.content

time.sleep(5)

with open("ts_tech_db.sql", 'r') as f:
    s = f.read()
    x = s.replace(pub_ip, "www.tangosierratech.com")

with open("ts_tech_db.sql", 'w') as f:
    f.write(x)

def copy_local_file_to_gcs(bucket, local_file):
    blob = bucket.blob(local_file)
    blob.upload_from_filename(local_file)

copy_local_file_to_gcs(ts_wp_db_dump_bucket, 'ts_tech_db.sql')

os.chdir('/var/www/')

def copy_local_directory_to_gcs(bucket, local_path):
    """Recursively copy a directory of files to GCS.

    local_path should be a directory and not have a trailing slash.
    """
    assert os.path.isdir(local_path)
    def walk(local_path):
        for path in glob.glob(local_path + '/**'):
            if os.path.isdir(path):
                walk(path)
            else:
                if test_path:
                    remote_path = os.path.join(test_path, path)
                    print remote_path
                    blob = bucket.blob(remote_path)
                    blob.upload_from_filename(path)
                else:
                    remote_path = path
                    print remote_path
                    blob = bucket.blob(remote_path)
                    blob.upload_from_filename(path)

    walk(local_path)

copy_local_directory_to_gcs(ts_wp_files_bucket, 'html')