We would like to acknowledge the Data Center Business Unit and TAC for their help in providing feedback on this document.
Throughout this lab, we have shown several show commands from both the ACI and Kubernetes points of view that are very useful when diagnosing an issue. Now we would like to take you a little deeper into how to apply a troubleshooting methodology, in case you ever need one in your production environment.
We will be using the CLI throughout this section to show the flexibility of ACI, where users can choose either the GUI or the CLI depending on their comfort level.
Since we are running OSPF in the fabric, the first thing we want to check is the OSPF configuration. Once you have this information, you can run your typical NX-OS commands to check the routing tables.
show tenant common vrf k8s_vrf external-l3 ospf detail
Area Id     : 0.0.0.3
Tenant      : common
Vrf         : k8s_vrf
User Config :
 Node ID  Area Properties
 -------  ------------------------------------------------------------
 209      Type: regular, Cost: 1, Control: redistribute,summary
 210      Type: regular, Cost: 1, Control: redistribute,summary

Configuration : Operational
 Node ID  Router ID    Route Map  Area Oper. Props
 -------  -----------  ---------  --------------------------------------------------------------------
 209      10.0.238.15  k8s_out    Type: regular, Cost: 1, Control: redistribute,summary, AreaId: 0.0.0.3
 210      10.0.10.13   k8s_out    Type: regular, Cost: 1, Control: redistribute,summary, AreaId: 0.0.0.3

Interfaces :
Configuration : Operational
 Node ID  L3Out                  Interface  IP Address    Oper. Intf  Oper. State
 -------  ---------------------  ---------  ------------  ----------  -----------
 209      uni/tn-common/out-k8s  eth1/48    10.15.3.1/30  eth1/48     bdr
 210      uni/tn-common/out-k8s  eth1/48    10.13.3.1/30  eth1/48     dr
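For example, once you know the border leaf node IDs and remember that on the leaf the VRF name takes the tenant:vrf form, a couple of standard NX-OS style checks can be run straight from the APIC CLI. This is only a sketch, assuming border leaf 209 and the common:k8s_vrf VRF from this lab; adjust the node ID and VRF name to your environment.

fabric 209 show ip ospf neighbors vrf common:k8s_vrf
fabric 209 show ip route vrf common:k8s_vrf

The first command should list the OSPF adjacency formed over eth1/48, and the second the routes learned through it.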
To be able to troubleshoot, we need to identify the VXLAN encap that the VRF is using and its Policy Tag. These outputs will be used as a reference for later commands.
show tenant common vrf k8s_vrf detail
VRF Information:
Tenant   VRF      VXLAN Encap  Policy Enforced  Policy Tag  Consumed Contracts  Provided Contracts  Description
-------  -------  -----------  ---------------  ----------  ------------------  ------------------  -----------
common   k8s_vrf  2555909      enforced         49153       -                   -

per-Node Information:
Node  Admin State  Oper State  Oper State Reason  Creation Time                  Modification Time
----  -----------  ----------  -----------------  -----------------------------  -----------------------------
204   admin-up     up                             2018-04-23T21:03:42.072+00:00  2018-04-23T21:03:42.072+00:00
207   admin-up     up                             2018-04-23T21:12:55.372+00:00  2018-04-23T21:12:55.372+00:00
208   admin-up     up                             2018-04-23T21:19:34.191+00:00  2018-04-23T21:19:34.191+00:00
209   admin-up     up                             2018-05-04T13:14:23.886+00:00  2018-05-04T13:14:23.886+00:00
210   admin-up     up                             2018-05-03T18:54:45.534+00:00  2018-05-03T18:54:45.534+00:00
In this particular case the VXLAN encap is 2555909 and the Policy Tag is 49153.
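If you prefer to pull these values straight from the object model, the VRF object (class fvCtx) carries both the segment/VXLAN value and the pcTag. A minimal sketch with moquery from the APIC shell, assuming the VRF is named k8s_vrf:

moquery -c fvCtx -f 'fv.Ctx.name=="k8s_vrf"' | egrep 'name|pcTag|scope|seg'

This should return the same 2555909 and 49153 values shown in the table above.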
If you recall, we used the following command extensively throughout this lab guide to get the external IP address of the service.
kubectl get svc -o wide --namespace mylabapp
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
mylabapp LoadBalancer 10.96.81.15 10.0.146.67 80:30563/TCP 1d app=mylabapp
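If you also want to see what sits behind the service, the usual kubectl commands apply. A quick sketch using the same namespace and service name as in this lab:

kubectl describe svc mylabapp --namespace mylabapp
kubectl get endpoints mylabapp --namespace mylabapp

The Endpoints output lists the pod IP addresses and ports that the LoadBalancer service ultimately forwards traffic to.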
Now we need to verify that ACI has installed the correct EXTERNAL-IP, 10.0.146.67.
show tenant common vrf k8s_vrf external-l3 epg k8s_pod09_svc_mylabapp_mylabapp detail
Name            Flags                      Match           Node      Entry           Oper State
--------------  -------------------------  --------------  --------  --------------  ----------
common:         vxlan: 2555909             10.0.146.67/32  node-208  10.0.146.67/32  enabled
k8s_pod09_svc_  vrf: k8s_vrf                               node-209  10.0.146.67/32  enabled
mylabapp_mylab  Target dscp: unspecified                   node-207  10.0.146.67/32  enabled
app             qosclass: unspecified                      node-210  10.0.146.67/32  enabled

Contracts
---------
Provided: k8s_pod09_svc_mylabapp_mylabapp
Consumed:
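Another way to confirm that the external IP has been pushed into the ACI object model is to look for the corresponding /32 subnet under the external EPG. A sketch with moquery, assuming the same EXTERNAL-IP as above:

moquery -c l3extSubnet -f 'l3ext.Subnet.ip=="10.0.146.67/32"' | egrep 'ip |dn'

The dn that comes back should sit under the k8s_pod09_svc_mylabapp_mylabapp external EPG of the L3Out.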
The next step is to verify that the correct Service Graph has been implemented. If you recall from the lab guide, this is one of the many benefits of the ACI integration with Kubernetes: ACI automates the creation of this Service Graph.
We covered this portion in the lab guide, so this may look familiar.
show l4l7-graph tenant common graph k8s_pod09_svc_global
Graph              : common-k8s_pod09_svc_global
Graph Instances    : 1

Consumer EPg       : common-k8s-epg
Provider EPg       : common-k8s_pod09_svc_default_mylabapp
Contract Name      : common-k8s_pod09_svc_default_mylabapp
Config status      : applied

Function Node Name : loadbalancer
Service Redirect   : enabled
Connector  Encap      Bridge-Domain                            Device Interface  Service Redirect Policy
---------  ---------  ---------------------------------------  ----------------  ------------------------------
provider   vlan-3209  common-k8s_pod09_bd_kubernetes-service   interface         k8s_pod09_svc_default_mylabapp
consumer   vlan-3209  common-k8s_pod09_bd_kubernetes-service   interface         k8s_pod09_svc_default_mylabapp
show tenant common contract k8s_pod09_svc_default_mylabapp
Tenant   Contract                        Type    Qos Class    Scope  Subject              Access-group                    Dir   Description
-------  ------------------------------  ------  -----------  -----  -------------------  ------------------------------  ----  -----------
common   k8s_pod09_svc_default_mylabapp  permit  unspecified  vrf    loadbalancedservice  k8s_pod09_svc_default_mylabapp  both
show tenant common access-list k8s_pod09_svc_default_mylabapp
Tenant : common
Access-List : k8s_pod09_svc_default_mylabapp
match tcp dest 80
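Because the contract and the access-list are just objects in the ACI model, they can also be verified with moquery if you are scripting these checks. A sketch, assuming the access-list name maps to a vzFilter of the same name, as it normally does (vzBrCP is the contract class and vzFilter the filter class):

moquery -c vzBrCP -f 'vz.BrCP.name=="k8s_pod09_svc_default_mylabapp"' | egrep 'name|scope'
moquery -c vzFilter -f 'vz.Filter.name=="k8s_pod09_svc_default_mylabapp"' | grep dn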
It is time to understand how the switches in the fabric are configured. From the above output, we know that the border leafs are nodes 209 and 210, so those are the switches we need to check. The first thing we will do is verify how the contracts are programmed.
ACI has a feature that allows you to query any switch in the fabric directly from the APIC, which you can leverage during troubleshooting. In the previous step we looked at the k8s_vrf VRF in the common tenant for the VXLAN number. The following extracts it in one line from the APIC CLI:
show tenant common vrf k8s_vrf detail | grep common | tr -s ' ' | cut -d" " -f 4
The returned number is the VXLAN identifier for the k8s_vrf VRF in the common tenant. Save it; you are going to need it soon. We also need to get the pcTag for the external EPG k8s_pod09_svc_mylabapp_mylabapp. To get the pcTag value you need to query the APIC, and the easiest way to do that is with a tool called "moquery".
moquery -c l3extInstP | grep -B 3 k8s_pod09_svc_mylabapp_mylabapp | grep pcTag
pcTag : [TAG VALUE]
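If you would rather avoid the context-based grep, moquery also accepts a property filter, so the same lookup can be done with a single, more targeted query (a sketch using the same external EPG name):

moquery -c l3extInstP -f 'l3ext.InstP.name=="k8s_pod09_svc_mylabapp_mylabapp"' | grep pcTag

This returns the same pcTag value.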
Now that we have the VXLAN Encap and PcTag, we can query the switches in the fabric to see their contract information. You are going to have to build the next command based on the VXLAN value and the pcTag value.
fabric 210 show zoning-rule | grep [VXLAN VALUE] | grep redir | grep [TAG VALUE]
----------------------------------------------------------------
 Node 210 (L10)
----------------------------------------------------------------
Rule ID  SrcEPG  DstEPG  FilterID  operSt   Scope    Action               Priority
=======  ======  ======  ========  =======  =======  ===================  =============
4117     49153   32807   1713      enabled  2555909  redir(destgrp-5457)  fully_qual(7)
4120     32807   15      1797      enabled  2555909  redir(destgrp-5457)  fully_qual(7)
Here we obtain the filter IDs and the redirect action. In this case the filter IDs are 1713 and 1797, and the redirect destination group is 5457.
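Since node 209 is also a border leaf, it is worth running the same check there; the rules should be programmed equivalently on both leafs, although the rule IDs themselves are locally significant and may differ:

fabric 209 show zoning-rule | grep [VXLAN VALUE] | grep redir | grep [TAG VALUE]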
Now let's verify that the correct filter has been programmed on the switch.
fabric 210 show zoning-filter filter [First Filter ID Value]
----------------------------------------------------------------
Node 210 (L10)
----------------------------------------------------------------
FilterId  Name    EtherT  ArpOpc       Prot  MatchOnlyFrag  Stateful  SFromPort    SToPort      DFromPort    DToPort      Prio   Icmpv4T      Icmpv6T      TcpRules
========  ======  ======  ===========  ====  =============  ========  ===========  ===========  ===========  ===========  =====  ===========  ===========  ========
1713      1713_0  ip      unspecified  tcp   no             no        unspecified  unspecified  http         http         dport  unspecified  unspecified
As you can see, this is the destination HTTP filter that ACI created because the Kubernetes service was configured to expose that port.
Now let's check the second filter, 1797.
Again, you will need to build the command based on the value returned in the CLI output.
fabric 210 show zoning-filter filter [Second Filter ID Value]
----------------------------------------------------------------
Node 210 (L10)
----------------------------------------------------------------
FilterId  Name    EtherT  ArpOpc       Prot  MatchOnlyFrag  Stateful  SFromPort    SToPort      DFromPort    DToPort      Prio   Icmpv4T      Icmpv6T      TcpRules
========  ======  ======  ===========  ====  =============  ========  ===========  ===========  ===========  ===========  =====  ===========  ===========  ========
1797      1797_0  ip      unspecified  tcp   no             no        http         http         unspecified  unspecified  sport  unspecified  unspecified
This one matches on the source port. As you can see, ACI has configured these filters for you.
The next step is to find out where node 210 is going to send the packets after the Service Graph. To do that, we need to check the service redirection policy on node 210, using the redirect destination group (5457) we gathered from the zoning-rule output above.
fabric 210 show service redir info group [redir destgrp Value]
----------------------------------------------------------------
 Node 210 (L10)
----------------------------------------------------------------
============================================================================================
LEGEND
TL: Threshold(Low) | TH: Threshold(High) | HP: HashProfile | HG: HealthGrp
============================================================================================
GrpID  Name          destination                        HG-name       operSt   operStQual   TL  TH  HP         Tracking
=====  ============  =================================  ============  =======  ===========  ==  ==  =========  ========
5457   destgrp-5457  dest-[10.107.2.4]-[vxlan-2555909]  Not attached  enabled  no-oper-grp  0   0   symmetric  no
                     dest-[10.107.2.2]-[vxlan-2555909]  Not attached
As you can see, the service redirection policy points to the two nodes in your Kubernetes cluster.
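Note that the redirect destination group ID is allocated locally on each leaf, so the group number on node 209 may differ from 5457. If you want to check the other border leaf as well, you can list all of its redirect groups and pick out the one whose destinations match (a sketch; the exact output varies by software version):

fabric 209 show service redir info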
The next step is to find the VNID that node 210 uses when sending the packets to these destinations. We look it up with the destination IP and the VRF VNID, which in this particular case is 2555909. Let's take the first node, 10.107.2.4, as an example.
fabric 210 show service redir info destination [IP from previous cmd] vnid [VRF VXLAN VNID]
----------------------------------------------------------------
Node 210 (L10)
----------------------------------------------------------------
============================================================================================
LEGEND
TL: Threshold(Low) | TH: Threshold(High) | HP: HashProfile | HG: HealthGrp
============================================================================================
Name                               bdVnid          vMac               vrf             operSt   operStQual    HG-name
=================================  ==============  =================  ==============  =======  ============  ============
dest-[10.107.2.4]-[vxlan-2555909]  vxlan-15138766  0A:ED:2C:3B:22:EB  common:k8s_vrf  enabled  no-oper-dest  Not attached
Now we have the VNID that the packets will be destined to: vxlan-15138766.
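If you want to confirm which bridge domain that VNID belongs to before moving on, the BD object (class fvBD) carries its segment ID in the seg attribute. A sketch with moquery, assuming the BD name is k8s_pod09_bd_kubernetes-service as the Service Graph output above suggests:

moquery -c fvBD -f 'fv.BD.name=="k8s_pod09_bd_kubernetes-service"' | egrep 'name|seg'

The seg value should match vxlan-15138766.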
We know that the Kubernetes nodes are attached to leaf nodes 207 and 208. The next step is to find the ports they are connected on, leveraging the VNID we located in the output above.
fabric 207 show vlan extended | grep [bdVnid Value]
52 common:k8s_pod09_bd_kubernetes- vxlan-15138766 Eth1/7
fabric 208 show vlan extended | grep [bdVnid Value]
40 common:k8s_pod09_bd_kubernetes- vxlan-15138766 Eth1/7
As you can see, both hosts are connected on port Eth1/7 of their respective leaf.
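If you want to cross-check what the leaf has actually learned on that port, the endpoint table can be filtered by interface. This is only a sketch, and the exact syntax and columns can vary by software version:

fabric 207 show endpoint interface ethernet 1/7

You should see the endpoints of the Kubernetes node connected on Eth1/7. Finally, let's confirm the service endpoint information that the opflex agent publishes on the Kubernetes node itself: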
kubectl describe node pod09-node1 | grep opflex
opflex.cisco.com/pod-network-ranges={"V4":[{"start":"10.207.1.2","end":"10.207.1.129"}]}
opflex.cisco.com/service-endpoint={"mac":"0a:ed:2c:3b:22:eb","ipv4":"10.107.2.4"}
As you can see, 10.107.2.4 matches the output of the service redirection policy. You have successfully traced the path from the border leaf all the way to the node(s).
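To complete the picture, you can repeat the same check for the remaining node; 10.107.2.2 from the redirect group should map to the other node's opflex service-endpoint annotation. A sketch, where the second node name (pod09-node2) is an assumption based on this lab's naming convention:

kubectl get nodes
kubectl describe node pod09-node2 | grep opflex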