We’re back! It’s been a while since my last update to this series, so I’d again highly suggest visiting parts 1 and 2 before going through this post; it’ll make it a lot more worthwhile!
After setting up our virtual networks and VMs, with one router VM having Internet access, we extended this by setting up a DHCP server on our router, using DHCP to dynamically assign IP addresses to the VMs in each network, in separate subnets. We also statically assigned IP addresses to the router’s network interfaces, giving each one an address in the same subnet as the other interfaces on the LAN it connects to.
The key takeaway from last time was that although we had made some progress with intra-network communication within each VLAN, we still could not do any inter-networking, and by extension, could not connect to the Internet (except from the router, of course).
So why can’t we communicate between VLANs or with the Internet? The answer is that the Ethernet protocol only gives us the foundation for communicating between network interfaces that share a physical link. To take the next step of inter-networking, we’ll need to use IP.
Let’s talk a little ARP
Before we dive into the IP layer, we should revisit how we were able to communicate among the devices on the same virtual switch, without a router or any other type of device needed. This will help us understand how the IP layer builds on the Ethernet layer.
As we previously discussed, all network interfaces are assigned a MAC address. MAC addresses are a physical addressing scheme, burned into network hardware. IP addresses, by contrast, are a logical addressing scheme, implemented by software that builds on top of the Ethernet layer. Packets have an IP header with information used to implement the improvements and new functionality exposed by IP, but they still rely on the MAC addresses of network interfaces and the Ethernet layer to actually reach the destination. For this reason, IP depends on a key companion protocol known as ARP, or the Address Resolution Protocol. This protocol defines a message format that devices use to ask for the MAC address that corresponds to a given IP address. From the answers, devices build up an ARP cache, which is a mapping from IP addresses to MAC addresses, like the ARP cache below on one of the blue VMs.
It’s this cache that lets devices decide what to do with a packet. If there is an ARP cache entry for the destination IP address, the device sends the packet straight to the corresponding MAC address. Otherwise, it broadcasts an ARP request for that IP, and the device in its broadcast domain that owns the IP responds with its MAC address. Below is an example of some ARP requests and responses.
We see the who-has messages, where a request is broadcast asking which device has 192.168.0.5. The is-at message sent back in response tells the sender the MAC address that this IP corresponds to.
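To make the cache-then-broadcast behavior concrete, here is a minimal Python sketch of it. This is a toy model, not how a real network stack is written: the `Host`, `Segment`, and `resolve` names, the 192.168.0.4 address, and both MAC addresses are made up for illustration. Only 192.168.0.5 comes from the capture above.

```python
from typing import Dict, List, Optional


class Host:
    """A toy host with one interface and an ARP cache (IP -> MAC)."""

    def __init__(self, ip: str, mac: str) -> None:
        self.ip = ip
        self.mac = mac
        self.arp_cache: Dict[str, str] = {}

    def answer_arp(self, target_ip: str) -> Optional[str]:
        """Reply 'is-at <our MAC>' only if the request is for our IP."""
        return self.mac if target_ip == self.ip else None


class Segment:
    """A toy broadcast domain: every attached host sees every broadcast."""

    def __init__(self) -> None:
        self.hosts: List[Host] = []
        self.broadcasts = 0  # how many who-has requests were sent

    def attach(self, host: Host) -> None:
        self.hosts.append(host)

    def who_has(self, target_ip: str, asker: Host) -> Optional[str]:
        """Broadcast a who-has request and return the is-at answer, if any."""
        self.broadcasts += 1
        for host in self.hosts:
            if host is asker:
                continue
            mac = host.answer_arp(target_ip)
            if mac is not None:
                return mac
        return None


def resolve(host: Host, segment: Segment, target_ip: str) -> Optional[str]:
    """Use the ARP cache if possible, otherwise broadcast and cache the answer."""
    if target_ip in host.arp_cache:
        return host.arp_cache[target_ip]
    mac = segment.who_has(target_ip, host)
    if mac is not None:
        host.arp_cache[target_ip] = mac
    return mac


lan = Segment()
a = Host("192.168.0.4", "52:54:00:00:00:04")  # hypothetical sender
b = Host("192.168.0.5", "52:54:00:00:00:05")  # hypothetical MAC for 192.168.0.5
lan.attach(a)
lan.attach(b)

print(resolve(a, lan, "192.168.0.5"))  # first lookup: broadcasts a who-has
print(resolve(a, lan, "192.168.0.5"))  # second lookup: answered from the cache
print(lan.broadcasts)                  # only one broadcast was needed
```

Real ARP caches also expire their entries after a while, which this sketch leaves out.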
It’s important to reiterate that ARP is not a one-time process. IP addresses constantly change, whether you switch from cellular data to a Wi-Fi network, drive across town while using the same device, or simply have your DHCP lease expire and get issued a new address. For this reason, devices keep re-issuing ARP requests to keep their caches up to date.
Let’s do some routing
We’ve already assigned IP addresses to each of the VMs’ network interfaces, via the DHCP server we set up on our router. The next responsibility we will need to enable on our router is packet forwarding, better known as routing.
Routing is the key breakthrough of the IP layer, as it enables packets to go from a source to destination over a set of intermediate interfaces, connected to one another by a physical link, and thus allows for sources and destinations to communicate without being physically connected to one another. It is what will let us communicate between our VLANs, and with the Internet.
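As a rough illustration of the decision a router makes for each packet, here is a small Python sketch. It is a simplification under stated assumptions: the interface names (`blue0`, `red0`, `green0`, `uplink0`) are hypothetical, and so is the mapping of the blue and red VLANs to 192.168.0.0/24 and 192.168.1.0/24. A real kernel does a longest-prefix match over a full routing table.

```python
import ipaddress
from typing import Optional

# Hypothetical table of directly connected subnets on the router.
# Interface names and the blue/red subnet assignments are assumptions.
CONNECTED = {
    "blue0": ipaddress.ip_network("192.168.0.0/24"),
    "red0": ipaddress.ip_network("192.168.1.0/24"),
    "green0": ipaddress.ip_network("192.168.2.0/24"),
}
DEFAULT_INTERFACE = "uplink0"  # hypothetical Internet-facing interface


def pick_interface(dst: str, forwarding_enabled: bool) -> Optional[str]:
    """Return the interface a transit packet should leave through.

    With forwarding disabled, a host does not pass along packets that are
    not addressed to it, so we return None.
    """
    if not forwarding_enabled:
        return None
    addr = ipaddress.ip_address(dst)
    for name, network in CONNECTED.items():
        if addr in network:
            return name  # destination is on a directly connected LAN
    return DEFAULT_INTERFACE  # everything else goes toward the Internet


print(pick_interface("192.168.1.7", forwarding_enabled=True))
print(pick_interface("203.0.113.10", forwarding_enabled=True))
print(pick_interface("192.168.1.7", forwarding_enabled=False))
```

The last call is the situation we were stuck in last time: the router could see the packet, but with forwarding off it had no reason to pass it along.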
On most devices, routing (also called IP forwarding) is disabled by default. Devices with routing enabled are the links between the countless local area networks that make up the Internet.
Enabling routing on Linux is a simple configuration change, like below on our router…
…and just like that, we have inter-networking!
On the right we have one of our blue VMs, pinging the IP address of a red VM. Unlike last time though, we now see on the left when using tcpdump on the destination red VM that we not only are getting the requests from our blue VM, but we are also sending back the reply. It’s worth pointing out that each ping request/response not only shows the source and destination IP addresses, but also the source and destination MAC addresses.
But not really… When we try pinging some Internet address outside of our three VLANs, like the Cloudflare DNS server 1.1.1.1, it doesn’t seem to go through.
So what do we do now?
Out the back door with iptables and NAT
If you remember from last time, IPv4 addresses consist of 32 bits, meaning we have a total of ~4 billion IP addresses available. From the beginning, the various groups who helped drive forward the Internet realized we may eventually exhaust this address space if we simply assigned a new IP address to every device, and they were right! These days, new devices are being created in droves every day, and giving each one its own IP address just wouldn’t work. To combat this, they leaned on the power of IP address subnetting and the ability to manipulate the headers of packets, combining the two into a technique of IP address conservation known as NAT, or network address translation.
To conserve IP addresses, only edge devices, the devices that connect different LANs to the Internet (for example, your home router), are assigned a public IP address. These edge devices also have a private IP address, which belongs to the subnet of the LAN they create. Then these edge routers, or some other device in the LAN, assign IP addresses in the private subnet to all the devices in the LAN.
At this point, every device in the LAN has a private IP address, but this cannot be used to communicate with the Internet. Only the public IP address of the edge device can send requests to the Internet.
This is where we can use what is referred to as IP masquerading, to let the router proxy requests to the Internet for any of the devices in the LAN.
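For example, consider our router below, which has a private IP address on each of the blue/green/red VLANs (the three IP addresses of the form 192.168.*.1) and a public IP address of 10.135.66.56. (Strictly speaking, 10.135.66.56 also falls in a private range, 10.0.0.0/8, but it is the router’s Internet-facing address in our setup, so it plays the public role here.)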
The router can receive packets from our VLANs, rewrite the IP header by replacing the source IP address with its own public IP address, and send them on to the Internet. When a response comes back, it replaces the response packet’s destination IP address with the private IP address of the original sender. In this way, our private devices masquerade as our router and access the Internet without a public IP address.
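The rewrite-and-restore bookkeeping described above can be sketched in a few lines of Python. This is a deliberately simplified model, not how netfilter actually tracks connections: the `Packet` and `MasqueradingRouter` names, the port numbers, and the 203.0.113.10 destination (a documentation-only address) are all assumptions for illustration. The key idea is a translation table keyed by the port the router picks for each private sender.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class MasqueradingRouter:
    """Toy source NAT: one public IP shared by many private senders."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self.next_port = first_port  # hypothetical starting port
        self.port_to_private: Dict[int, Tuple[str, int]] = {}
        self.private_to_port: Dict[Tuple[str, int], int] = {}

    def outbound(self, pkt: Packet) -> Packet:
        """Replace the private source with the router's public IP and a port."""
        key = (pkt.src_ip, pkt.src_port)
        if key not in self.private_to_port:
            self.private_to_port[key] = self.next_port
            self.port_to_private[self.next_port] = key
            self.next_port += 1
        return Packet(self.public_ip, self.private_to_port[key],
                      pkt.dst_ip, pkt.dst_port)

    def inbound(self, pkt: Packet) -> Packet:
        """Restore the original private destination for a reply packet."""
        private_ip, private_port = self.port_to_private[pkt.dst_port]
        return Packet(pkt.src_ip, pkt.src_port, private_ip, private_port)


router = MasqueradingRouter("10.135.66.56")
request = Packet("192.168.0.5", 5000, "203.0.113.10", 53)

sent = router.outbound(request)  # what the Internet sees
reply = Packet("203.0.113.10", 53, sent.src_ip, sent.src_port)
delivered = router.inbound(reply)  # what the private VM receives

print(sent)
print(delivered)
```

The remote server only ever sees 10.135.66.56, while the reply still finds its way back to 192.168.0.5.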
The manipulation and filtering of packets is part of the networking layer implemented by the OS, but there are tools that let end users, from user space, modify the actions taken by the networking layer. For Linux, the kernel framework that implements this is known as netfilter, and the command exposed to users is known as iptables.
I could write an entire post about the various packet filtering rules that can be constructed with iptables, but it suffices to say it allows adding rules that apply at various stages of a packet’s path through the network stack, such as when it is first received, or when it is determined the packet needs to be forwarded. It also allows for manipulating a table of NAT rules that will be executed at these different stages, which is what we’ll focus on for now.
To enable IP masquerading, we need to modify our router’s NAT table, like below:
This rule simply establishes the following: when packets from the subnet 192.168.0.0/16 are about to leave through our router’s network interface eth1 and are in the postrouting stage (meaning they reached the router, but are outgoing traffic meant for some other destination), we apply IP masquerading to them. Because 192.168.0.0/16 covers all three of our VLAN subnets, any packet that one of our 3 VLANs sends to the router and that is then forwarded to the Internet will have the router’s public IP address as its source.
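If the /16 in that rule looks like magic, a quick check with Python’s standard `ipaddress` module shows why one rule is enough for all three VLANs. The three /24 subnets listed here are an assumption based on the 192.168.*.1 router addresses; the `is_masqueraded` helper is just an illustrative name.

```python
import ipaddress

# Source range matched by the masquerading rule.
MASQUERADE_SOURCE = ipaddress.ip_network("192.168.0.0/16")

# Assumed VLAN subnets, one /24 per VLAN.
vlan_subnets = [ipaddress.ip_network(f"192.168.{n}.0/24") for n in (0, 1, 2)]


def is_masqueraded(src: str) -> bool:
    """True if a packet from this source address matches the rule's range."""
    return ipaddress.ip_address(src) in MASQUERADE_SOURCE


for subnet in vlan_subnets:
    # Each /24 sits entirely inside the /16, so the rule covers it.
    print(subnet, subnet.subnet_of(MASQUERADE_SOURCE))

print(is_masqueraded("192.168.2.17"))  # a VM in one of the VLANs
print(is_masqueraded("10.135.66.56"))  # the router's own public address
```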
Now, just like that, we have the Internet! (This time for real)
What else can we do with iptables?
Like I said, the power of iptables is endless and lets you truly decide the laws of the land in your LAN.
I’ll give one example of something else we could do with iptables, and let you consult a reference to learn more, but consider the command below:
This adds a rule to our FORWARD chain, which is applied to packets that arrive on one interface of the device and are routed out through another. In this case, we are saying that any packets from the subnet 192.168.2.0/24 are to be dropped, meaning they will not be forwarded and are instead silently discarded.
If you’re paying close attention, you’ll notice this is the subnet of our green VLAN, and as you might’ve guessed, the rule results in the green VLAN losing access to the other VLANs and the Internet.
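For intuition, a chain of rules like this can be thought of as a list that is checked top to bottom, where the first match wins. Here is a small Python sketch of that idea. It is not how netfilter is implemented: the `verdict` function is hypothetical, it only matches on source address, and the `ACCEPT` fallback assumes the chain’s policy was left at its default.

```python
import ipaddress
from typing import List, Tuple

# A rule here is just (source network, target), e.g. (192.168.2.0/24, "DROP").
Rule = Tuple[ipaddress.IPv4Network, str]


def verdict(src: str, rules: List[Rule], policy: str = "ACCEPT") -> str:
    """Return the target of the first matching rule, else the chain policy."""
    addr = ipaddress.ip_address(src)
    for network, target in rules:
        if addr in network:
            return target
    return policy  # assumed default policy when nothing matches


# The single rule described above: drop forwarded traffic from the green VLAN.
forward_rules: List[Rule] = [(ipaddress.ip_network("192.168.2.0/24"), "DROP")]

print(verdict("192.168.2.9", forward_rules))  # a green VM
print(verdict("192.168.0.5", forward_rules))  # a VM in another VLAN
```

Because the first match wins, the order in which rules are added to a chain matters just as much as the rules themselves.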
At this point, I’ll need a promise from you all that you won’t be messing with your own home routers’ rules and blocking your roommates from the Internet…
So now we have access to the Internet! We’ve also learned a bit more about ARP and how it lets IP build on top of L2 networking, as well as how we can manipulate the way our devices’ network stacks handle packets. This is all good to open the door, but the Internet as we know it is made for humans, not machines, and these IP addresses, while cool, are not the most intuitive way to access the sites and services we love and rely on.
At some point, it was realized we would need a phonebook-like service to make the Internet usable by people, and that was how the domain name system came to be. We use DNS every day: when we try to hit a site like twitter.com, DNS translates that site name to an IP address like 18.104.22.168 or wherever this site lives on the Internet.
DNS is already being leveraged to resolve host lookups on our router, but for the rest of our VMs, we’ll talk about configuring a DNS server of our own next time.