Border Gateway Protocol (BGP) - Part 1/n
An Overview of BGP
Let’s face it - Border Gateway Protocol is just incredibly unique, especially when we compare it to other routing protocols. The very first thing that makes BGP so unique is what it does for us. It is our only Exterior Gateway Protocol (EGP) in major use today. We know we have our Interior Gateway Protocols (IGPs), such as OSPF, which run inside of an autonomous system. But BGP is an EGP, which means that it is (usually) going to take prefixes that are inside an autonomous system and send those to other autonomous systems. Figure 1 shows a sample BGP topology.
Figure 1: A Sample BGP Topology
This is why BGP is the protocol that makes the Internet function. Internet Service Providers (ISPs) can use BGP in order to exchange prefix information with other Internet Service Providers. The unique characteristics of BGP don't stop there, though. One of the things that is unique about the protocol is that it forms point-to-point peerings with other BGP speakers, and you must create these peerings manually.
With Border Gateway Protocol there is no such thing as forming a neighborship with a whole bunch of devices on the same segment automatically. For each of the devices BGP needs to peer with, it does so by using a single peer-to-peer relationship, what we prefer to call a BGP peering.
Another very unique property involves the fact that BGP is an application layer protocol. Admittedly, most network engineers would wager that it is a network layer protocol – and they would lose that bet!
As an application layer component, BGP does something brilliant. It leverages Transmission Control Protocol (TCP) for its operations. If we examine EIGRP as an example, the creators had to take great pains to build reliability into the protocol itself. For example, an EIGRP speaker will multicast transmissions, and if that's not working out, it will fall back to unicast transmissions, as a way to try and ensure reliability.
With Border Gateway Protocol, the designers decided not to engineer all those types of reliability controls into the protocol. They just rely on the wonders that are the reliable communications of TCP. Specifically, BGP uses TCP port 179.
When we think of our routing protocols, we know that there is going to be some value that serves as a metric to measure distance. For instance, in the case of OSPF, we know the metric is cost, and by default cost is derived from interface bandwidth (the higher the bandwidth, the lower the cost).
BGP does not work that way at all. BGP uses attributes instead of just a single metric, and one of the prime attributes of BGP is called the AS Path attribute. This is a list of all of the autonomous systems (AS) that a prefix has had to transit on its journey to, let's say, your autonomous system.
The AS path is effectively a record of that journey. The AS path is so critical to the function of BGP that the protocol is often referred to as a Path Vector routing protocol. Notice this is not a Distance Vector protocol, but a Path Vector. The AS path is not only used to gauge what would be the best path to a destination (all else being equal, the shorter AS path is preferred), it is also used as a loop prevention mechanism.
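To make the "shorter AS path" idea concrete, here is a minimal Python sketch. It looks at AS path length only, which is a simplification on my part, since BGP weighs other attributes as well. The function name and the private AS numbers are made up for illustration.

```python
def shortest_as_path(candidates):
    """Return the candidate AS path that lists the fewest autonomous systems."""
    return min(candidates, key=len)


# Two hypothetical AS paths learned for the same prefix.
paths = [
    [65002, 65010, 65020],
    [65003, 65020],
]
best = shortest_as_path(paths)
print(best)  # [65003, 65020]
```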
When a BGP speaker sees its own AS number in the AS path of a prefix it receives, it gets very concerned that there may be a loop in the communications, and by default it discards that update.
Something else that makes BGP incredibly unique is the fact that when we form peerings inside an autonomous system, these are called Internal BGP peerings, and the rules that are followed are Internal BGP (IBGP) rules. (Note: Some literature writes IBGP as iBGP.)
When we are forming a peering between autonomous systems, this is called External BGP (EBGP). (Note: Some literature writes EBGP as eBGP.) Remember, the reason BGP distinguishes between an IBGP peering and an EBGP peering is that operational characteristics are going to need to change based on how the peering is done. For example, we stated there is an AS path, which is recording the autonomous systems that are transited. Clearly, over an EBGP peering, when a prefix is sent from one AS to another AS, the sending AS must add its own AS number to the front of the path. But with IBGP, the prefix remains within an AS, so BGP does not insert the AS value in the update. You can refer back to Figure 1 in order to see these different peering types in action.
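The AS path rules described so far can be sketched in a few lines of Python. This is a toy model under my own assumptions, not how any router implements it: the function names are hypothetical, and the AS path is just a list with the most recently added AS first.

```python
def advertise(as_path, local_as, peer_as):
    """Return the AS path to place in an update sent to a peer."""
    if peer_as == local_as:
        # IBGP: the prefix stays inside the AS, so the path is left unchanged.
        return list(as_path)
    # EBGP: the sending AS adds its own AS number to the front of the path.
    return [local_as] + list(as_path)


def passes_loop_check(as_path, local_as):
    """Loop prevention: a path that already contains our own AS is suspect."""
    return local_as not in as_path


print(advertise([65020], local_as=65001, peer_as=65001))  # [65020]
print(advertise([65020], local_as=65001, peer_as=65002))  # [65001, 65020]
print(passes_loop_check([65002, 65001, 65020], local_as=65001))  # False
```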
So, the rules change when we are talking IBGP versus EBGP, in order to keep things consistent and error free. And the unique properties of BGP don't just end here. We're going to see, again and again, as we examine BGP in this chapter, case after case where the protocol operates in a very different manner from what we would expect with our IGP brethren.
BGP Message Types, Formats, and Neighbor States
BGP Message Types
Many people describe Border Gateway Protocol as an extremely complex protocol, but I don't think that really tells the story. You see, setting BGP policies and controlling prefix propagation inside of BGP can be pretty complex. But the protocol itself, while unique, is basically simple in its operation.
In this section, we are going to examine the BGP message types. Figure 2 shows the various message types of BGP.
Figure 2: The BGP Message Types
Remember the first step. When two BGP speakers want to form a peering, they are going to rely on Transmission Control Protocol (TCP). And, of course, we know there is going to be a three-way handshake with TCP in order to start that reliable communication session.
What happens next is these devices are going to exchange Open messages. The Open message contains very important information, the primary ingredient of which is the autonomous system number of the peer. This is going to determine whether it is an IBGP peering or an EBGP peering.
Once the Open messages are exchanged, the BGP speakers then start exchanging Keepalive messages. This, of course, is a simple little mechanism to make sure the other device is alive, happy, and healthy, so that the peering can remain up. When the BGP speakers have updates to share, it is a simple Update message that is transmitted.
If anything goes wrong at any point, the BGP speakers can use a simple Notification message in order to tear down the peering as a result of some error that may be happening with BGP.
One of the very interesting BGP message types is the Route Refresh message type. This was not in the original BGP standard but is a valuable later enhancement, and most of our major networking vendors support Route Refresh behavior. Route Refreshes allow the neighbors to update, let's say, BGP route information or even update things after a pretty major policy reconfiguration without tearing down the peering or affecting the peering in any big negative way.
Figure 3 shows these message types in action thanks to a Wireshark capture of the message exchanges by BGP in our sample topology from Figure 1.
Figure 3: The Message Types of BGP in Use Between BGP Devices
BGP Message Formats
In this section, let’s learn even more about the operational characteristics of Border Gateway Protocol by examining the message types of BGP in even more detail.
Each message type is going to have a BGP header. Figure 4 shows this header. You will note that the BGP header has a large Marker field. You would think this is hugely important. It is 16 octets in size. As it turns out, this field is just going to be filled with all ones.
Figure 4: The BGP Header
This is because the use of this Marker field has been deprecated in the standards. The original idea for this field was that it could be used to detect events like a loss of synchronization between two peers, and it was also thought that this would be the area in which authentication information could be held.
Why is this field even kept around in BGP? This would be in the very rare case where there needed to be backward support with some really old BGP device that expects to see this Marker information.
The important fields here in the header are going to be Length (which is the length of the overall message) and the Type fields. The Type field indicates what type of BGP message we're dealing with.
If you see a 1 in this field, for example, you're dealing with a BGP Open message. A value of 2 indicates an Update message. A 3 indicates a Notification. A value of 4 would be a Keepalive. A 5 indicates the optional Route Refresh behavior.
What follows the header information, of course, is data, with one important exception being a Keepalive message. By definition there is no data required in the Keepalive message.
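If you like to see these things as bytes, here is a small Python sketch that decodes the header just described. The 16-octet Marker and the type values come from the text above. The 2-octet Length and 1-octet Type widths are from my reading of RFC 4271, and the function and constant names are made up for illustration.

```python
import struct

MESSAGE_TYPES = {
    1: "Open",
    2: "Update",
    3: "Notification",
    4: "Keepalive",
    5: "Route Refresh",
}
HEADER_LEN = 19  # 16-octet Marker + 2-octet Length + 1-octet Type


def parse_header(data):
    """Decode a BGP header and return (length, message type name)."""
    if len(data) < HEADER_LEN:
        raise ValueError("a BGP header needs at least 19 octets")
    marker, length, msg_type = struct.unpack("!16sHB", data[:HEADER_LEN])
    if marker != b"\xff" * 16:
        raise ValueError("Marker field is not all ones")
    return length, MESSAGE_TYPES.get(msg_type, "Unknown")


# A Keepalive is just the header: no data follows it.
keepalive = b"\xff" * 16 + struct.pack("!HB", 19, 4)
print(parse_header(keepalive))  # (19, 'Keepalive')
```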
Now I'm sure you recall, when your system wants to form a BGP peering with another device, it's going to send an Open message. Figure 5 shows the format of these messages.
Figure 5: The Open Message in BGP
When we look at the Open message format, we note that there is a version number. We expect (and demand!) future versions of BGP, and that's how BGP indicates the version of BGP that you're using.
Your system is also going to send its AS number in the Open message. This is critical for that IBGP versus EBGP type behavior. There is a Hold Time value, and what's interesting about the Hold Time is when the router that you want to peer with receives this, it will look at that Hold Time, look at its own configured Hold Time, and then use the smaller of the two values. The Hold Time needs to be either zero or at least three seconds. Note that you can negate a Hold Time with a zero value.
Then there's your BGP Identifier. This is your BGP Router ID, and this is a critical value that's going to distinguish your system uniquely in the BGP peerings.
Finally, we have optional parameters that can be set with the Open message. There's an Optional Parameter Length and then the parameters themselves to give us added flexibility with the protocol.
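Here is a rough Python sketch of the Open message fields we just walked through, plus the "use the smaller Hold Time" rule. The field widths (1, 2, 2, 4, and 1 octets) are from my reading of RFC 4271 rather than from this post. The function names and sample values are hypothetical, and the optional parameters are left undecoded.

```python
import ipaddress
import struct


def parse_open(body):
    """Decode the fixed part of an Open message (the bytes after the header)."""
    version, my_as, hold_time, bgp_id, opt_len = struct.unpack("!BHH4sB", body[:10])
    return {
        "version": version,
        "my_as": my_as,
        "hold_time": hold_time,
        "bgp_id": str(ipaddress.IPv4Address(bgp_id)),
        "opt_params": body[10:10 + opt_len],  # raw bytes, not decoded here
    }


def negotiate_hold_time(local, remote):
    """Use the smaller of the two Hold Times; each must be 0 or at least 3."""
    for value in (local, remote):
        if value != 0 and value < 3:
            raise ValueError("Hold Time must be 0 or at least 3 seconds")
    return min(local, remote)


# Hypothetical Open: version 4, AS 65001, Hold Time 180, Router ID 10.0.0.1.
sample = struct.pack("!BHH4sB", 4, 65001, 180, bytes([10, 0, 0, 1]), 0)
print(parse_open(sample)["bgp_id"])  # 10.0.0.1
print(negotiate_hold_time(180, 90))  # 90
```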
Another really important message that we have is the BGP Update message. Figure 6 shows this message structure.
Figure 6: The BGP Update Message
The BGP Update message contains an indicator for Withdrawn Routes Length. This makes the Update message the vehicle for routes to be withdrawn from the BGP table of a neighbor. Note the list of Withdrawn Routes is then inserted in the Update message.
The Update message then contains fields that are used to share network prefix information with neighbors and include the very important attribute information that is associated with the prefixes. Remember, these attributes permit you to make policy decisions on how BGP will actually route information in the network.
A well-known attribute that we already mentioned is AS path. You recall that this is the list of autonomous systems that the prefix has transited on its way throughout the BGP infrastructure. AS Path would be an example of an attribute that must be in the Update message when it is used to send prefixes. There can be many attributes that we are utilizing, and that is the reason for the Total Path Attribute Length field in the Update message.
The network prefix information itself is located in the NLRI field. This stands for Network Layer Reachability Information. This might be an excellent time for you to revisit Figure 3 in this chapter as you can see these fields in an actual packet as well as their contents.
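To tie the Update fields together, here is a simplified Python sketch that splits an Update body into its three parts. The 2-octet length fields and the "prefix length in bits, then just enough octets" encoding are from my reading of RFC 4271. The sketch assumes IPv4 prefixes, leaves the path attributes as raw bytes, and uses function names I made up.

```python
import ipaddress
import struct


def parse_prefixes(data):
    """Decode a run of (length-in-bits, prefix octets) entries as IPv4 prefixes."""
    prefixes = []
    i = 0
    while i < len(data):
        bits = data[i]
        nbytes = (bits + 7) // 8  # only the significant octets are sent
        raw = data[i + 1:i + 1 + nbytes].ljust(4, b"\x00")
        prefixes.append(f"{ipaddress.IPv4Address(raw)}/{bits}")
        i += 1 + nbytes
    return prefixes


def parse_update(body):
    """Split an Update body into withdrawn routes, path attributes, and NLRI."""
    (withdrawn_len,) = struct.unpack("!H", body[:2])
    withdrawn = body[2:2 + withdrawn_len]
    offset = 2 + withdrawn_len
    (attr_len,) = struct.unpack("!H", body[offset:offset + 2])
    attrs = body[offset + 2:offset + 2 + attr_len]
    nlri = body[offset + 2 + attr_len:]
    return {
        "withdrawn": parse_prefixes(withdrawn),
        "path_attributes": attrs,  # raw bytes, not decoded in this sketch
        "nlri": parse_prefixes(nlri),
    }


# A hypothetical Update that only withdraws 10.1.0.0/16.
withdraw_only = struct.pack("!H", 3) + bytes([16, 10, 1]) + struct.pack("!H", 0)
print(parse_update(withdraw_only)["withdrawn"])  # ['10.1.0.0/16']
```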
Why did the creators of BGP not simply name the NLRI field as the IPv4 Prefixes field? This is because the creators of BGP did a genius thing. They built the protocol to carry NLRI so that it would be flexible as networks change and new information might need to be transported. BGP is built to start immediately running things like IPv6 for us. It can also readily carry VPN IPv4 prefixes inside of something like an MPLS VPN.
Figure 7 shows the fields of the Notification message.
Figure 7: The BGP Notification Message
It is not hard for us to remember the job of this BGP message when we examine the fields inside it. The very first field is the Error Code. Then there's an Error Subcode. These fields give us a general type of error and then even more information. For example, if in the Error Code we have a value of 3 and then in the Error Subcode we have a value of 3, this indicates that there is an Update message error. And that specifically, there is a missing well-known attribute value. This would be a massive error for us in the BGP infrastructure, and the BGP peering would be torn down by the devices.
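The Error Code and Error Subcode pairing lends itself to a simple lookup. In this Python sketch, the 3/3 example comes from the text above. The other names are from my reading of RFC 4271 and the tables are deliberately incomplete. The function name is hypothetical.

```python
ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}

# Only a few UPDATE subcodes are listed; this is not the full table.
UPDATE_SUBCODES = {
    1: "Malformed Attribute List",
    2: "Unrecognized Well-known Attribute",
    3: "Missing Well-known Attribute",
}


def describe_notification(body):
    """Turn the first two octets of a Notification body into readable text."""
    if len(body) < 2:
        raise ValueError("a Notification needs an Error Code and an Error Subcode")
    code, subcode = body[0], body[1]
    code_name = ERROR_CODES.get(code, f"Unknown error code {code}")
    if code == 3:
        sub_name = UPDATE_SUBCODES.get(subcode, f"subcode {subcode}")
    else:
        sub_name = f"subcode {subcode}"
    return f"{code_name} / {sub_name}"


print(describe_notification(bytes([3, 3])))
# UPDATE Message Error / Missing Well-known Attribute
```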
BGP Neighbor States
Just as we can learn a ton about the operation of BGP by examining the BGP messages and their formats, we can also learn a lot about BGP by examining the various states that a BGP peering transitions through. In fact, these can be critical when troubleshooting. When you really analyze the BGP protocol, you're not going to be surprised to learn that there are a lot of mechanisms built in to ensure stability. We want as stable a protocol here as possible.
A lot of IGPs are engineered to converge as quickly as possible. The minute there's a change inside your organization’s network, we want sub-second convergence so that the other devices know about that change. BGP is engineered differently. Timers are of a much longer duration than we would be used to with our IGPs, because we want stability at the sacrifice of convergence speed. After all, BGP is dealing with the public Internet routing tables in Service Provider deployments. These routing tables are absolutely massive. Instability in this environment could literally be catastrophic for the public Internet.
When you look at the BGP neighbor states, you get a sense for this. The relatively large number of BGP neighbor states shown in Figure 8 conveys the careful efforts around the stability of the routing protocol.
Figure 8: The BGP Neighbor States
Notice there is an Idle state where the device is not initiating any of the other states, and then there's an Established state where it is fully established with its peer. What is somewhat surprising is that there are all these “in-between” states of Connect, Active, OpenSent, and OpenConfirm.
The Connect state is where the BGP device is waiting for the TCP connection with the neighbor to be completed. In the Active state, it's actually trying to initiate a TCP connection with its neighbor. In the OpenSent state, as you might guess, it has sent its Open message, and it's waiting to hear back from its neighbor with its Open message. In the OpenConfirm state, the BGP speaker is actually waiting for the Keepalive based on the successful exchange of the Open messages. Hopefully, the BGP device gets a Keepalive. If there's some type of error, it would receive a Notification message.
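One way to keep these states straight is to model them as a tiny state machine. The Python sketch below is a heavily simplified version that mostly covers the happy path. The event names are labels I invented, and the real state machine in the standard handles many more events and timers. I am also assuming that a Notification sends the peering back to Idle.

```python
# Simplified transitions between the neighbor states; event names are hypothetical.
TRANSITIONS = {
    ("Idle", "start"): "Connect",
    ("Connect", "tcp_established"): "OpenSent",  # the Open is sent once TCP is up
    ("Connect", "tcp_failed"): "Active",
    ("Active", "tcp_established"): "OpenSent",
    ("OpenSent", "open_received"): "OpenConfirm",
    ("OpenConfirm", "keepalive_received"): "Established",
}


def next_state(state, event):
    """Return the state that follows `state` when `event` occurs."""
    if event == "notification_received":
        return "Idle"  # assumption: an error tears the peering down
    # Events this sketch does not model leave the state unchanged.
    return TRANSITIONS.get((state, event), state)


def walk(events, state="Idle"):
    """Apply events in order and return every state visited."""
    history = [state]
    for event in events:
        state = next_state(state, event)
        history.append(state)
    return history


print(walk(["start", "tcp_established", "open_received", "keepalive_received"]))
# ['Idle', 'Connect', 'OpenSent', 'OpenConfirm', 'Established']
```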
The great news for your support engineers is they can confirm what's going on with the BGP state at the Cisco CLI. Example 1 shows the use of the show ip bgp summary command in order to examine the neighbor state.
Example 1: Examining the BGP Neighbor State
Notice from Example 1, this BGP peering is in the Idle state. I produced this router output by configuring the TPA1 router for a BGP peering with ATL. I have not completed any BGP configuration on ATL yet, thus the Idle state for the peering. TPA1 tried to form the neighborship, and then essentially gave up and entered this state. When the peering is healthy (ATL has been configured), the Established state is communicated by showing the number of prefixes received in the show ip bgp summary output. In Example 1, this would mean the value of Idle is replaced with 1 (if ATL is sharing exactly one prefix with TPA1).
NOTE: When troubleshooting BGP peerings and considering neighbor states, we’re troubleshooting the Application layer protocol that is BGP, but we might also have to troubleshoot TCP. As you recall, BGP relies on TCP at the Transport layer, using port 179.
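As a quick illustration of the TCP side of that troubleshooting, here is a minimal Python sketch that checks whether a TCP connection to port 179 can be opened from wherever the script runs. The function name is hypothetical. It says nothing about whether a BGP peering would actually form, and a device may well refuse connections from addresses it has no peering configured for.

```python
import socket


def tcp_port_open(host, port=179, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, and timed out all land here.
        return False


# Example (the address is a documentation placeholder, not a real peer):
# tcp_port_open("192.0.2.1")
```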
Well, that will wrap it up for this first BGP blog post. Stay tuned for the next one, where we'll delve into Path Attributes (PAs) and how BGP makes path selection decisions.