Before we look at what breaks, I should probably make sure that you know what it is that I’m talking about here. If you already know all about traditional NAT and address overloading, skip to the NAT444 section. If you are familiar with that as well, feel free to skip right to the list of what it breaks. In any case; enjoy! =)
NAT, PAT and Address Overloading
When people talk about Carrier Grade NAT (CGN) or Large Scale NAT (LSN) they are talking primarily about NAT444. The NAT part of those acronyms stands for Network Address Translation and NAT is already very common in IPv4 networks, particularly on LAN/WAN gateway devices. The basic idea is to use non-globally-unique (“private”) addresses on the LAN (Local Area Network) and only use globally-unique (“public”) addresses on the WAN (Wide Area Network) / Internet facing interfaces. When deployed in this way, NAT is usually combined with the less-often-spoken-of PAT (Port Address Translation) to allow address overloading on the WAN side.
Whoa – we just went into geek speak, I know. It’s really not too scary though; overloading just refers to a many-to-one relationship of inside (local/private) addresses to outside (global/public) addresses. This is sometimes called NAPT (Network Address Port Translation) or NAT overloading and it allows you to place a very large number of hosts (all with “private” addresses) behind a NAT device with a much smaller number of globally unique (public) addresses. This is done by using port numbers as unique identifiers. Let’s walk through a quick example to see how this works.
On the far left of this diagram we have two hosts in our local network (a laptop and a tablet), the Internet is on the right, and there’s a NAT overload (NAPT) router in the middle connecting them. You can see that both of our local network hosts have locally unique (“private”) inside addresses (192.168.1.23 and .42) and the router has a single globally unique (“public”) outside address (203.0.113.57).
When the laptop wants to connect to another host out on the Internet, it sends its traffic to the router (its default gateway). The router takes that traffic, swaps its own outside (global/public) address in place of the laptop’s inside (local/private) address, and sends the traffic on to the Internet. Now all Internet devices see this traffic as if it came from the router itself, so they send return traffic directly to the router. When the router receives that return traffic, it swaps the laptop’s inside address back in place of its own outside address and sends the traffic on to the laptop. Simple enough.
Things get more interesting when both the laptop and the tablet need to talk to the Internet at the same time though. Now the router needs to know what return traffic goes to the tablet and which traffic is supposed to go to the laptop. This is where those port numbers I mentioned earlier come into play. Each two-way connection between network hosts is called a session (or sometimes a flow) and to keep these sessions from getting mixed up, the router uses a unique port number to identify each one.
Here you can see this in action. In this diagram, both the laptop and the tablet are sending traffic to the Internet (initiating one session each). When the router does the address translation (the swap), it not only changes the address but also adds a unique port number when sending the traffic out to the Internet (:2001 and :2002 in the diagram). In this way, when the router receives return traffic that is addressed to 203.0.113.57:2001 it knows that this should be swapped for 192.168.1.42 (which is sent to the laptop) and that 203.0.113.57:2002 translates to 192.168.1.23 and goes to the tablet. Easy peasy. It’s kind of like having a post office box. All the mail comes to the same physical street address (the outside IP address) but each letter is addressed to a unique box number (the port number), so everybody gets the right mail.
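To make the swap concrete, here is a minimal Python sketch of the translation table such a router might keep. It is a toy model, not how any particular router implements NAPT: the class name `NaptRouter`, its method names, and the inside source port 51000 are all made up for illustration. The addresses and the outside ports 2001 and 2002 follow the example above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


@dataclass
class NaptRouter:
    """Toy NAPT router (hypothetical): one outside address, one outside port per session."""
    outside_ip: str
    next_port: int = 2001  # first outside port, chosen to match the example
    out_map: Dict[Endpoint, int] = field(default_factory=dict)  # inside endpoint -> outside port
    in_map: Dict[int, Endpoint] = field(default_factory=dict)   # outside port -> inside endpoint

    def translate_outbound(self, inside: Endpoint) -> Endpoint:
        """Rewrite the source of an outbound packet, creating a mapping if needed."""
        if inside not in self.out_map:
            self.out_map[inside] = self.next_port
            self.in_map[self.next_port] = inside
            self.next_port += 1
        return (self.outside_ip, self.out_map[inside])

    def translate_inbound(self, outside_port: int) -> Optional[Endpoint]:
        """Rewrite the destination of return traffic; None means no mapping exists."""
        return self.in_map.get(outside_port)


router = NaptRouter(outside_ip="203.0.113.57")
# The inside source port (51000) is an assumption; the example does not give one.
laptop = router.translate_outbound(("192.168.1.42", 51000))
tablet = router.translate_outbound(("192.168.1.23", 51000))
print(laptop)  # ('203.0.113.57', 2001)
print(tablet)  # ('203.0.113.57', 2002)
print(router.translate_inbound(2001))  # ('192.168.1.42', 51000)
print(router.translate_inbound(2002))  # ('192.168.1.23', 51000)
```

Note that both hosts can use the same inside port without a clash, because the router hands each session its own outside port.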
Of course, this example is very simple. In real life most networks have more than two hosts and almost all hosts open up many more than one session. Since each IP address has about 65,000 ports available, that is the absolute maximum number of sessions that one address can handle. This is plenty for most small networks, and larger ones can typically add a few outside addresses to what is called a NAT pool, so this scales pretty well at the LAN level. There are problems with this approach however.
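A quick back-of-the-envelope calculation shows how that ceiling turns into a host count. The port field is 16 bits wide and port 0 is reserved, which leaves 65,535 ports per address. The figure of 100 simultaneous sessions per host is purely an assumption for illustration, and real devices also reserve port ranges and track TCP and UDP separately, so treat the results as rough upper bounds.

```python
PORTS_PER_ADDRESS = 65_535  # 16-bit port field, port 0 excluded


def max_hosts(outside_addresses: int, sessions_per_host: int) -> int:
    """Rough upper bound on hosts a NAT pool can serve at one session per port."""
    return (outside_addresses * PORTS_PER_ADDRESS) // sessions_per_host


# 100 sessions per host is an assumed figure, not a measurement.
print(max_hosts(1, 100))  # 655
print(max_hosts(4, 100))  # 2621
```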
Problems with NAT
The primary issue with NAT (we are still talking about NAPT/NAT-overload, but just saying NAT is easier and it’s what most folks mean when they say NAT anyway) is that it breaks the end to end principle of networking. WTF does that mean? Well, basically, the end to end principle states that network communication should happen between the devices at the very edges of the network and that all devices in the middle should just pass that traffic along. This is a primary design goal of the Internet and for good reason. Without delving into all the gory philosophical details, we can over-simplify again and say that, in practical terms, the end to end principle allows network hosts to talk to each other unobstructed.
Aha! Now we start to see the problem with NAT: It introduces an obstruction into the communication path. In order for NAT to work, the device doing the NATing has to mangle each and every packet of data being sent and received. It must dig into those packets’ headers and change the source or destination address (and port number). The primary effect of this is that network hosts behind an overloaded NAT must initiate all communication. That is; inbound communication attempts are impossible because there is no way for a host outside of the local network to know or understand the inside (local/private) address of that host.
If we look at our laptop in the examples above, it has an RFC 1918 address of 192.168.1.42. Because this is a “private” address from a shared pool, it can’t be advertised on the public Internet. This means that other hosts on the Internet have no way to know the laptop’s address (and even if they did, there would be no route to get to it). See, we need that outside address and port combination set up on the router before any traffic can get in to the laptop, so that the router knows where the traffic is supposed to go. This is fine for web browsing and many other client side applications but it is a major problem for server and peer to peer applications (VoIP, gaming, webcams, VPNs, bittorrent, video streaming, chat, etc) where communication needs to be initiated from the outside in. Major meaning they don’t work.
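If you want to check whether an address falls in one of the RFC 1918 “private” blocks, a short sketch with Python’s standard `ipaddress` module does the job. The helper name `is_rfc1918` is made up here. The check is written out against the three RFC 1918 prefixes on purpose, because the module’s built-in `is_private` flag also covers other reserved ranges (including the documentation range used for the router’s outside address in these examples).

```python
import ipaddress

# The three address blocks set aside by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address belongs to one of the RFC 1918 blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


print(is_rfc1918("192.168.1.42"))  # True: the laptop's inside address
print(is_rfc1918("203.0.113.57"))  # False: the router's outside address
```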
UPnP and NAT Traversal
Because there are simply not enough IPv4 addresses to provide a globally unique (public) address to all of the devices already connected to the Internet, network operators have been forced to deploy NAT in the LAN. As such, tools have been developed to help combat the problems it causes.
One notable hack is enabled through UPnP (Universal Plug and Play) and is called Internet Gateway Device (IGD) Protocol. UPnP-IGD is a protocol that gives the hosts behind a NAT some control over the NAT device, such as discovering the router’s outside IP address and setting up static port mappings. This allows many of the applications that would otherwise be broken by the NAT to work in spite of it. NAT Port Mapping Protocol (NAT-PMP) offers similar functionality.
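The idea behind these protocols can be sketched as a NAT that lets an inside host ask for a static mapping ahead of time. This is only a simulation of the concept: the class `PortMappingNat` and its methods are hypothetical names, not the actual UPnP-IGD or NAT-PMP messages, and port 8080 is an arbitrary choice.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class PortMappingNat:
    """Toy NAT (hypothetical) that accepts static port mappings from inside hosts."""

    def __init__(self, outside_ip: str) -> None:
        self.outside_ip = outside_ip
        self.static_map: Dict[int, Endpoint] = {}

    def get_outside_ip(self) -> str:
        """Let an inside host discover the router's outside address."""
        return self.outside_ip

    def add_port_mapping(self, outside_port: int, inside: Endpoint) -> bool:
        """Reserve an outside port for an inside endpoint; refuse if already taken."""
        if outside_port in self.static_map:
            return False
        self.static_map[outside_port] = inside
        return True

    def route_inbound(self, outside_port: int) -> Optional[Endpoint]:
        """Return the inside endpoint for unsolicited inbound traffic, or None."""
        return self.static_map.get(outside_port)


nat = PortMappingNat("203.0.113.57")
print(nat.route_inbound(8080))  # None: no mapping yet, inbound traffic goes nowhere
nat.add_port_mapping(8080, ("192.168.1.42", 8080))
print(nat.route_inbound(8080))  # ('192.168.1.42', 8080)
```

Once the mapping exists, an outside host can open a connection to 203.0.113.57:8080 and reach the laptop, which is exactly what server and peer to peer applications need.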
There are several other NAT traversal techniques and protocols that have been developed to get around the problems that NAT causes. These various tools are together a looming testament to two things:
- Human ingenuity and our ability to overcome obstacles.
- The brokenness that NAT introduces into inter-networking.
While it pains me to do so, I feel compelled to address this largest myth of NAT: Network Address Translation is NOT a security technique!
Some folks will tell you that because hosts outside of a NAT’d network can not initiate communication with hosts inside of the NAT’d network that NAT is providing security. This is a lot like saying that since a key snapped off in the door lock will make it harder to get into my car, that it provides added security. While it may be true that jamming a door lock will make it harder to get in the car, I would have a hard time recommending that as an anti-theft method. There is a difference between broken access and access security.
Stateful packet inspection is what you want for network access security, not NAT brokenness.
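To illustrate the difference, here is a minimal sketch of stateful filtering with no address translation at all. The inside host keeps its own globally unique address, and the filter simply remembers which flows were opened from the inside. The class name `StatefulFilter` and the addresses (taken from the documentation ranges) are assumptions for illustration; real firewalls also track protocol state, timeouts, and much more.

```python
from typing import Set, Tuple

# (inside IP, inside port, remote IP, remote port)
Flow = Tuple[str, int, str, int]


class StatefulFilter:
    """Toy stateful filter (hypothetical): no translation, only flow tracking."""

    def __init__(self) -> None:
        self.established: Set[Flow] = set()

    def outbound(self, src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> None:
        """Record a flow that an inside host has opened."""
        self.established.add((src_ip, src_port, dst_ip, dst_port))

    def allow_inbound(self, src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> bool:
        """Allow inbound traffic only if it is the reverse of a recorded flow."""
        return (dst_ip, dst_port, src_ip, src_port) in self.established


fw = StatefulFilter()
fw.outbound("198.51.100.10", 51000, "192.0.2.80", 443)

print(fw.allow_inbound("192.0.2.80", 443, "198.51.100.10", 51000))  # True: return traffic
print(fw.allow_inbound("192.0.2.99", 4444, "198.51.100.10", 22))    # False: unsolicited
```

The access control comes from the state table, not from rewriting addresses, so a policy rule can open a port for a server without any translation getting in the way.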
The NAT444 Model
So far we have discussed traditional NAT (mainly NAPT/NAT overload) which is starting to be called NAT44 because it translates one IPv4 address for another (4 to 4). Now let’s explore what I (still) call NAT444, which was also called Carrier Grade NAT (CGN) at one point and is currently called Large Scale NAT (LSN) by all the cool geeks. I like NAT444 because it explains what is really going to occur in most places when LSN is implemented: a double NAT across three IPv4 address realms (IPv4 to IPv4 to IPv4). Sounds like a nightmare already, doesn’t it? We just doubled the NAT, which means doubling the interference with network traffic and further impeding the end to end principle. Let’s see what that looks like.
This diagram illustrates the NAT444 model as envisioned in the IETF NAT444 draft and as it will most likely be deployed (unless we can figure out a way around it altogether). In this diagram you should notice two things right off the bat:
- Large Scale NAT for IPv4 will be deployed dual-stacked with global (“public”) IPv6.
- Large Scale NAT adds a second layer of NAT and thus a second area of “private”/inside addressing.
As you can guess from point two, NAT444 exacerbates all the problems that traditional NAT44 introduced. What may not be obvious at first is that LSN/NAT444 aggravates those issues in new ways as well.
In addition to adding a second layer of NAT that creates major problems with law enforcement and abuse logging as well as geolocation and others (all because many distinct customers are behind one provider address), we have to deal with the fact that the second layer of NAT is not going to participate in UPnP, NAT-PMP or other LAN-based NAT traversal protocols. No ISP (Internet Service Provider) in their right mind is going to open up their own routers (or other network devices) to customer control – which is exactly what these protocols require. They are simply not secure and the risk of one customer being able to impact other customers’ service is too great. So where does that leave us?
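A small simulation shows why a mapping on the customer’s own router stops helping once a second layer is in the path. Everything here is a toy model: the class `Nat`, the function `deliver_inbound`, and the port numbers are made up, and the 100.64.0.5 address between the two layers is an assumption (it comes from the 100.64.0.0/10 shared address space that RFC 6598 sets aside for this purpose).

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class Nat:
    """Toy NAPT device (hypothetical) with dynamic and static mappings."""

    def __init__(self, outside_ip: str, first_port: int) -> None:
        self.outside_ip = outside_ip
        self.next_port = first_port
        self.outbound: Dict[Endpoint, int] = {}
        self.inbound: Dict[int, Endpoint] = {}

    def translate_outbound(self, inside: Endpoint) -> Endpoint:
        """Create (or reuse) a dynamic mapping for an outbound session."""
        if inside not in self.outbound:
            self.outbound[inside] = self.next_port
            self.inbound[self.next_port] = inside
            self.next_port += 1
        return (self.outside_ip, self.outbound[inside])

    def add_static_mapping(self, outside_port: int, inside: Endpoint) -> None:
        """What a UPnP-style request would install on this one device."""
        self.inbound[outside_port] = inside

    def translate_inbound(self, outside_port: int) -> Optional[Endpoint]:
        return self.inbound.get(outside_port)


def deliver_inbound(lsn: Nat, cpe: Nat, public_port: int) -> Optional[Endpoint]:
    """Follow an inbound packet through the provider NAT, then the customer NAT."""
    at_cpe = lsn.translate_inbound(public_port)
    if at_cpe is None:
        return None  # dropped at the provider's NAT
    return cpe.translate_inbound(at_cpe[1])


cpe = Nat(outside_ip="100.64.0.5", first_port=2001)     # customer router (address assumed)
lsn = Nat(outside_ip="203.0.113.57", first_port=40001)  # provider's Large Scale NAT

laptop = ("192.168.1.42", 51000)
public = lsn.translate_outbound(cpe.translate_outbound(laptop))
print(public)                                # ('203.0.113.57', 40001)
print(deliver_inbound(lsn, cpe, public[1]))  # ('192.168.1.42', 51000)

# The customer's router accepts a static mapping, but the provider's NAT knows nothing about it.
cpe.add_static_mapping(8080, ("192.168.1.42", 8080))
print(deliver_inbound(lsn, cpe, 8080))       # None
```

Sessions opened from the inside still work because both layers build their mappings on the way out. The inbound request on port 8080 never gets past the provider’s device.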
What NAT444 Breaks
We are left with a number of applications (and application types) that currently break when Large Scale NAT is introduced. To avoid the doom and gloom feeling that is sure to follow a list of just the broken stuff, let’s start with a list of what isn’t broken by NAT444/LSN:
- Web browsing
- FTP download
  - Small files
- BitTorrent and Limewire
  - Leeching (download)
- Skype video and voice calls
- Instant messaging
- Facebook and Twitter chat
Not too shabby really, all things considered. That is quite a bit of functionality for being behind a fairly large kludge. If that were the end of the story I wouldn’t have written this article though. So, without further ado, here is the list you’ve been waiting for; what NAT444 breaks:
- FTP download
  - Large files
- BitTorrent and Limewire
  - Seeding (upload)
- On-line gaming
- Video streaming
- Remote viewing
- VPN & Encryption
- Limited ALG/SIP support
- All custom applications with the IP embedded
  - Lack of ALGs
Wow, is it just me or is that list a bit longer? There’s that doom and gloom feeling creeping up.
For our purposes here, “breaks” means that the service was degraded or completely failed. The data behind this list primarily comes from Assessing the Impact of NAT444 on Network Applications, an IETF draft which documents testing that was done by CableLabs, Time Warner Cable, and Rogers Communications on “many popular Internet services using a variety of test scenarios, network topologies, and vendor equipment.” If this kind of thing interests you at all, I highly recommend checking out the full draft; it’s a quick and informative read. I also have a bit of experience dealing with NAT444 myself, but that’s a story for another day.
Port Control Protocol
I would be remiss if I didn’t at least mention Port Control Protocol (PCP) in this discussion. The basic goal of PCP is to create a new, more advanced, technique to control port forwarding on NAT devices so that the brokenness fixed with protocols like UPnP-IGD and NAT-PMP in the LAN can be solved in a NAT444 (or other LSN) environment. I have not, as of yet, dug very deep into the work being done but I do see some challenges:
- New protocols require new equipment (or at least new code).
- Will providers sign on to allow customer application control of their network devices?
- Time. Is there enough? We will hit RIR exhaustion in at least 3 out of 5 regions before the proposed standards are published.
You knew I was going to say it eventually: The only true solution is to deploy IPv6. This is why the NAT444 model includes global IPv6; the only way to get around the brokenness introduced by NAT is to eliminate it. Luckily we have the means to do so: Internet Protocol version 6 (IPv6). So, dual-stack today, with NAT444 if you must, and then do everything you can to get everyone you do business with to do the same.
Thanks for this. I wasn’t aware that ISPs might deploy NAT. Yuk!
(Seems to me it’s wasted effort that could be better employed on IPv6)
Couldn’t agree more – I for one am going to do everything I can to help ISPs avoid needing NAT, or at the very least, needing it for very long.
Great article, thanks.
Can you clarify one point?
You mention VPN, specifically SSL, not working. I read
draft-donley-nat444-impacts-01 and they don’t mention SSL. Can you say where you found the tests that demonstrated the failure?
IPv6 is BAD : why ? because it’s not privacy compliant by design, just by option …
krominet: Well, IPv4 has no privacy built in either. The little privacy you gain as a result of all the headaches of NAT is not there by design, whereas IPv6 (at least any implementation users are likely to get exposure to) has privacy extensions turned on by default. So please tell me how this is worse than IPv4?
Well, this will be a moot point soon anyway, as even AFRINIC (the last RIR with any addresses left for general distribution) will probably run out in May 2018.
My ISP here in New Zealand is using CGNAT. Would that explain why my modem/router WAN address is different to the address displayed by a website like ipchiken.com?
Also, would CGNAT stop PPTP VPNs from working?
First, yes, CGNAT would certainly explain the difference in address.
Second, unless the ISP is running a PPTP ALG, the CGN could certainly interrupt your PPTP VPN. There are CGN solutions on the market that do not disrupt PPTP VPNs, it’s really dependent on what gear your ISP is using and how they have it configured.
Interesting that you should have that issue. I’ve just changed ISP here in New Zealand to BigPipe (who use CG-NAT) and my PPTP connection to the office has stopped working – it drops after about 10 seconds. Interestingly, my old ISP (Flip) also use CG-NAT and I had no issues there.
Seems to back up Chris’ response that it depends on the ISP’s gear.
How did you get on with this?
If you’re going to list things that you allege don’t work, you should state why, or show evidence.
Mobile networks have used CGNAT for a long time with no noticeable issues.
I’m most curious as to why you think Netflix won’t work; that doesn’t seem to reflect reality.
The evidence (details on the testing performed) is in the IETF document I link to in the post: https://tools.ietf.org/html/rfc7021
It’s also worth noting that I wrote this piece about 8 years ago.
Thanks for the comment.
NetFlix will fail because the MPAA is beating on NetFlix to enforce regions. Many people use VPNs to change regions. As a result, when NetFlix sees a lot of customers coming from the same public IP address, they block that IP and display an error message stating that you seem to be using a proxy.
Regarding VPNs and SSL, how is all of this being broken by NAT444? I understand that a lot of devices cannot perform a site to site VPN without knowing the remote address, either via a static address or dynamic DNS, so this would break. However, with NAT-T, which all client based IPSec VPNs support, a client based IPSec VPN should work fine. Likewise, any devices that can accept a site to site VPN without knowing the remote side’s IP, such as Cisco DMVPN, will be able to bring up a site to site VPN, also using NAT-T. But how does NAT444 break SSL? Most web sites are now running HTTPS using SSL, so that would mean web browsing wouldn’t work either. SSL based VPNs should likewise work for the same reason. Also, unless I missed something, the IETF draft on NAT444 doesn’t mention anything about breaking VPNs, IPSec or SSL at all.
What are your thoughts about using the shared address space for routing gear within the ISP’s network? I have built many networks where public addresses are used for internal ISP routing equipment just because no one wants to see private addresses on a traceroute. Using shared addresses for gear that is just routing and doesn’t need to be accessed from outside the ISP would seem to alleviate wasting public addresses in this case.
Chris, Thanks for the writeup. I’m sorry some people think they have the right to attack you because they don’t understand it.
Thanks for this breakdown, Chris! I was doing some research on CGNAT and it brought me here. Hope to see you at another Tech Field Day event!
Thanks for the great content. If it was written in 2011, I think a lot of things have changed; maybe an update is needed if the content is out of date.
Thanks for an insightful article on the matter. Whilst it is now 11 years old, it is still relevant when trying to understand the cause and plan the best way to get a Plex server to work remotely with an ISP using CGN.
Glad you found it useful! We’ll put it in the “oldie but goody” category =)