Troubleshooting tips

We would like to acknowledge the Data Center Business Unit and TAC for their help in providing feedback on this document.

Throughout this lab, we have used a number of show commands, from both the ACI and Kubernetes points of view, that are very useful when diagnosing an issue. Now we would like to take you a little deeper into a troubleshooting methodology you can leverage if you ever need it in your production environment.

We will be using the CLI during this section in order to show the flexibility of ACI, where users can choose either method depending on their comfort level.

Step 1 - Check Routing Protocol

Since we are running OSPF in the fabric, the first thing we want to check is our OSPF configuration. Once you have this information, you can run your typical NX-OS commands to check the routing tables (an example follows the output below).

    
    show tenant common vrf k8s_vrf external-l3 ospf detail
    
Area Id     : 0.0.0.3
Tenant      : common
Vrf         : k8s_vrf
User Config : 
        
Node ID  Area Properties                                                        
----     ---------------------------------------------------------------------- 
209      Type: regular, Cost: 1, Control: redistribute,summary                  
210      Type: regular, Cost: 1, Control: redistribute,summary                  
        
 Configuration :          Operational
        
Node ID  Router ID        Route Map         Area Oper. Props                    
----     ---------------  ----------------  ----------------------------------- 
209      10.0.238.15      k8s_out           Type: regular, Cost: 1, Control:    
                                            redistribute,summary, AreaId:       
                                            0.0.0.3                             
210      10.0.10.13       k8s_out           Type: regular, Cost: 1, Control:    
                                            redistribute,summary, AreaId:       
                                            0.0.0.3                             
Interfaces : 
        
Configuration :          Operational
        
Node ID  L3Out             Interface     IP Address       Oper. Intf  Oper. State 
----     ----------------  ------------  ---------------  --------    --------    
209      uni/tn-common     eth1/48       10.15.3.1/30     eth1/48     bdr         
        /out-k8s                                                                 
        
210      uni/tn-common     eth1/48       10.13.3.1/30     eth1/48     dr          
        /out-k8s   
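Once you know the areas and the border leaf interfaces, you can run the typical NX-OS routing checks on the border leafs directly from the APIC. A minimal sketch, assuming node 209 and the VRF name as it is rendered on the leaf (common:k8s_vrf):

    fabric 209 show ip ospf neighbors vrf common:k8s_vrf
    fabric 209 show ip route vrf common:k8s_vrf

The first command confirms the OSPF adjacency with the external router is FULL; the second confirms the external routes are present in the VRF routing table.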

Step 2 - Check the VXLAN Encap and the Policy Tag

In order to troubleshoot, we need to identify the VXLAN encap that the VRF is using and the Policy Tag. These outputs will be used as a reference for future commands.

    
    show tenant common vrf k8s_vrf detail
    
VRF Information:
Tenant      VRF         VXLAN Encap  Policy Enforced  Policy Tag  Consumed Contracts    Provided Contracts    Description                              
----------  ----------  ----------   ----------       ----------  --------------------  --------------------  ---------------------------------------- 
common      k8s_vrf     2555909      enforced         49153       -                     -                                                              
    
per-Node Information:
Node        Admin State  Oper State  Oper State Reason  Creation Time                   Modification Time              
----------  ----------   ----------  ---------------    ------------------------------  ------------------------------ 
204         admin-up     up                             2018-04-23T21:03:42.072+00:00   2018-04-23T21:03:42.072+00:00  
207         admin-up     up                             2018-04-23T21:12:55.372+00:00   2018-04-23T21:12:55.372+00:00  
208         admin-up     up                             2018-04-23T21:19:34.191+00:00   2018-04-23T21:19:34.191+00:00  
209         admin-up     up                             2018-05-04T13:14:23.886+00:00   2018-05-04T13:14:23.886+00:00  
210         admin-up     up                             2018-05-03T18:54:45.534+00:00   2018-05-03T18:54:45.534+00:00  

In this particular case the VXLAN encap is 2555909 and the Policy Tag is 49153.
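The same values can also be pulled object by object with moquery. A sketch, assuming the VRF object lives at uni/tn-common/ctx-k8s_vrf and that its scope attribute carries the VXLAN encap:

    moquery -d uni/tn-common/ctx-k8s_vrf | egrep 'name|scope|pcTag'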

Step 3 - Check the External IP Address of the Service

If you recall, we leveraged the following command frequently throughout this lab guide to get the External IP address of the service.


kubectl get svc -o wide --namespace mylabapp

NAME       TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE       SELECTOR
mylabapp   LoadBalancer   10.96.81.15   10.0.146.67   80:30563/TCP   1d        app=mylabapp

Now we need to verify that ACI has installed the correct EXTERNAL-IP, 10.0.146.67.


show tenant common vrf k8s_vrf external-l3 epg k8s_pod09_svc_mylabapp_mylabapp detail

 Name            Flags                      Match               Node        Entry               Oper State 
 --------------  -------------------------  ------------------  ----------  ------------------  ---------- 
 common:         vxlan: 2555909             10.0.146.67/32      node-208     10.0.146.67/32      enabled    
 k8s_pod09_svc_  vrf: k8s_vrf                                   node-209    10.0.146.67/32      enabled    
 mylabapp_mylab  Target dscp: unspecified                       node-207    10.0.146.67/32      enabled    
 app             qosclass: unspecified                          node-210    10.0.146.67/32      enabled    
                 Contracts                                                                                 
                 ---------                                                                                 
                 Provided: k8s_pod09_svc_m                                                                 
                 ylabapp_mylabapp                                                                          
                 Consumed:   
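If the EXTERNAL-IP never shows up in ACI, it is worth confirming on the Kubernetes side that the service actually has endpoints behind it. A quick sketch using standard kubectl commands:

    kubectl get endpoints mylabapp --namespace mylabapp
    kubectl describe svc mylabapp --namespace mylabapp

An empty ENDPOINTS column usually points to a selector mismatch rather than a fabric problem.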
 

Step 4 - Check the Service Graph

The next step is to verify that the correct Service Graph has been implemented. If you recall from the lab guide, this is one of the many benefits of integrating ACI with Kubernetes, because ACI automates the creation of this Service Graph.

We covered this portion in the lab guide, so this may look familiar.


show l4l7-graph tenant common graph k8s_pod09_svc_global

Graph           : common-k8s_pod09_svc_global
Graph Instances : 1                          

Consumer EPg  : common-k8s-epg                       
Provider EPg  : common-k8s_pod09_svc_default_mylabapp
Contract Name : common-k8s_pod09_svc_default_mylabapp
Config status : applied                              

Function Node Name : loadbalancer
Service Redirect   : enabled
Connector   Encap       Bridge-Domain  Device Interface      Service Redirect Policy        
----------  ----------  ----------     --------------------  ------------------------------ 
provider    vlan-3209   common-k8s_po  interface             k8s_pod09_svc_default_mylabapp 
                        d09_bd_kubern                                                       
                        etes-service                                                        
consumer    vlan-3209   common-k8s_po  interface             k8s_pod09_svc_default_mylabapp 
                        d09_bd_kubern                                                       
                        etes-service                                                          

show tenant common contract k8s_pod09_svc_default_mylabapp

    Tenant      Contract    Type        Qos Class     Scope       Subject     Access-group  Dir   Description 
    ----------  ----------  ----------  ------------  ----------  ----------  ----------    ----  ----------  
    common      k8s_pod09_  permit      unspecified   vrf         loadbalanc  k8s_pod09_sv  both              
                svc_defaul                                        edservice   c_default_my                    
                t_mylabapp                                                    labapp           

show tenant common access-list k8s_pod09_svc_default_mylabapp  
Tenant      : common
Access-List : k8s_pod09_svc_default_mylabapp
              match tcp dest 80
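
Note that the access-list matches TCP destination port 80, which lines up with the service definition in Kubernetes. You can cross-check the port straight from the cluster; a sketch using jsonpath:

    kubectl get svc mylabapp --namespace mylabapp -o jsonpath='{.spec.ports[*].port}'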

Step 5 - Check Contract in the Switches

It is time to understand how the switches in the fabric are configured. From the above output, we know that the Border Leafs are 209 and 210, so we need to check their information. The first thing we will do is check how the Contracts are configured.

ACI has a feature that allows you to query any device in the fabric from the APIC, and you can leverage it during your troubleshooting. In the previous step we looked at the common VRF for the VXLAN number. The following extracts it in one line from the APIC CLI:


show tenant common vrf k8s_vrf detail | grep common |  tr -s ' ' | cut -d" " -f 4

The returned number is the VXLAN identifier for the VRF in tenant common. Save it; you are going to need it soon. We also need to get the PcTag for the L3 EPG k8s_pod09_svc_mylabapp_mylabapp. In order to get the value of the PcTag you need to query the APIC, and the easiest way to do that is with a tool called "moquery".


moquery -c l3extInstP | grep -B 3 k8s_pod09_svc_mylabapp_mylabapp | grep pcTag

   
pcTag        : [TAG VALUE]
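
Since moquery runs from the APIC bash shell, you can also capture both values into shell variables before building the next command. A hedged sketch, assuming the scope attribute of the VRF object is the VXLAN id and the usual moquery "attribute : value" output format:

    # grab the VRF VXLAN id and the L3 EPG pcTag into variables
    vxlan=$(moquery -d uni/tn-common/ctx-k8s_vrf | awk '/^scope/ {print $3}')
    pctag=$(moquery -c l3extInstP | grep -B 3 k8s_pod09_svc_mylabapp_mylabapp | awk '/pcTag/ {print $3}')
    echo "vxlan=$vxlan pctag=$pctag"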

Now that we have the VXLAN encap and PcTag, we can query the switches in the fabric to see their contract information. You will have to build the next command based on the VXLAN value and the PcTag value.

fabric 210 show zoning-rule | grep [VXLAN VALUE] | grep redir | grep [TAG VALUE]
----------------------------------------------------------------
 Node 210 (L10)
----------------------------------------------------------------
Rule ID         SrcEPG          DstEPG          FilterID        operSt          Scope           Action                              Priority       
=======         ======          ======          ========        ======          =====           ======                              ========  
4117            49153           32807           1713            enabled         2555909         redir(destgrp-5457)                 fully_qual(7)  
4120            32807           15              1797            enabled         2555909         redir(destgrp-5457)                 fully_qual(7)  

Here we get the FilterIDs and the redirection destination group. In this case the FilterIDs are 1713 and 1797, and the redirect destination group is 5457.
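
Since node 209 is also a border leaf, repeat the same check there. Using the values from this lab (VXLAN 2555909, Policy Tag 49153), the built-out command looks like this:

    fabric 209 show zoning-rule | grep 2555909 | grep redir | grep 49153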

Now we verify that the correct filter has been configured in the switch.

fabric 210 show zoning-filter filter [First Filter ID Value]
----------------------------------------------------------------
 Node 210 (L10)
----------------------------------------------------------------
FilterId  Name          EtherT      ArpOpc      Prot        MatchOnlyFrag Stateful SFromPort   SToPort     DFromPort   DToPort     Prio        Icmpv4T     Icmpv6T     TcpRules   
========  ===========   ======      =========   =======     ======        =======  =======     ====        ====        ====        =========   =======     ========    ========   
1713      1713_0        ip          unspecified tcp         no            no       unspecified unspecified http        http        dport       unspecified unspecified  

As you can see, this is the destination HTTP filter that ACI created because Kubernetes has been configured to use that service.

Now let's check the second filter, 1797.

Again, you will have to build the command based on the value returned in the CLI output.

fabric 210 show zoning-filter filter [Second Filter ID Value]
----------------------------------------------------------------
 Node 210 (L10)
----------------------------------------------------------------
FilterId  Name          EtherT      ArpOpc      Prot        MatchOnlyFrag Stateful SFromPort   SToPort     DFromPort   DToPort     Prio        Icmpv4T     Icmpv6T     TcpRules   
========  ===========   ======      =========   =======     ======        =======  =======     ====        ====        ====        =========   =======     ========    ========   
1797      1797_0        ip          unspecified tcp         no            no       http        http        unspecified unspecified sport       unspecified unspecified            

This one matches the source port. As you can see, ACI has configured these filters for you.

Step 6 - Check the Service Redirection in the Switches

The next step is to find out where Node 210 will send the packets after the Service Graph. Therefore, we need to check the Service Redirection Policy on Node 210. We will use the redirect destination group we gathered from the above (filter) command, which returned 5457.

fabric 210 show service redir info group [redir destgrp Value]
----------------------------------------------------------------
    Node 210 (L10)
----------------------------------------------------------------
============================================================================================
LEGEND
TL: Threshold(Low)   |     TH: Threshold(High)   |   HP: HashProfile     |     HG: HealthGrp
============================================================================================
GrpID Name            destination                              HG-name         operSt     operStQual      TL   TH    HP         Tracking
===== ====            ===========                              ==============  =======    ============    ===  ====  ========   ========
5457  destgrp-5457    dest-[10.107.2.4]-[vxlan-2555909]        Not attached    enabled    no-oper-grp     0    0     symmetric  no      
                      dest-[10.107.2.2]-[vxlan-2555909]        Not attached   

As you can see, the Service Redirection Policy is pointing to the two nodes in your Kubernetes cluster. The next step is to find the VNID to which Node 210 will send the packets; in this particular case we query with the VRF VNID, 2555909.

Let's take the first node, 10.107.2.4, as an example.

fabric 210 show service redir info destination [IP from previous cmd] vnid [VRF VXLAN VNID]
----------------------------------------------------------------
    Node 210 (L10)
----------------------------------------------------------------
============================================================================================
LEGEND
TL: Threshold(Low)   |     TH: Threshold(High)   |   HP: HashProfile     |     HG: HealthGrp
============================================================================================
Name                                     bdVnid          vMac                 vrf             operSt     operStQual      HG-name        
====                                     ======          ====                 ====            =====      =========       =======        
dest-[10.107.2.4]-[vxlan-2555909]        vxlan-15138766  0A:ED:2C:3B:22:EB    common:k8s_vrf  enabled    no-oper-dest    Not attached   

Now we have the VNID the packets are going to be destined to: vxlan-15138766.
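
The same destinations can be read back from the APIC object model as a sanity check. A sketch, assuming the PBR destinations are exposed through the vnsRedirectDest class with ip and mac attributes:

    moquery -c vnsRedirectDest | egrep '^(ip|mac) '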

Step 7 - Check the Local Kubernetes Nodes Connection

We understand that the Kubernetes nodes are connected behind Node 207 and Node 208. The next step is to find the connection ports, leveraging the bdVnid we located in the above output.

fabric 207 show vlan extended | grep [bdVnid Value]
52   common:k8s_pod09_bd_kubernetes-  vxlan-15138766   Eth1/7  
fabric 208 show vlan extended | grep [bdVnid Value]
40   common:k8s_pod09_bd_kubernetes-  vxlan-15138766   Eth1/7  

As you can see, both hosts are connected on port 1/7.
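
You can confirm those ports are up directly from the APIC with the same fabric passthrough used earlier:

    fabric 207 show interface ethernet 1/7
    fabric 208 show interface ethernet 1/7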

Step 8 - Check the Service Endpoint on the Node(s)

    
    kubectl describe node pod09-node1 | grep opflex

opflex.cisco.com/pod-network-ranges={"V4":[{"start":"10.207.1.2","end":"10.207.1.129"}]}
opflex.cisco.com/service-endpoint={"mac":"0a:ed:2c:3b:22:eb","ipv4":"10.107.2.4"}
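
To repeat this check across every node in the cluster at once, you can loop with kubectl; a small sketch:

    for n in $(kubectl get nodes -o name); do
      echo "== $n =="
      kubectl describe $n | grep opflex.cisco.com/service-endpoint
    done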

As you can see, 10.107.2.4 matches the output of the Service Redirection Policy. You have successfully traced the path from the Border Leaf all the way to the Node(s).