Hi - I'm posting this to document the issues we had and what fixed them, in case others run into the same problems. Hopefully they can fix them in less than the couple of weeks it took us to figure out.
- We are a charter school with 4 classes (100 students) using Chromebooks for online learning. This is the first year the students have had their own Chromebooks. Previously we had a school network for teachers and administrative staff to use their laptops and connect their phones while at school, but that was it.
- We have a robust LAN / WLAN network (2 Cisco 3560 switches & 6 Cisco 1130 a/b/g dual-radio APs) ... not the newest gear, but enterprise class and plenty of capacity for our needs.
- We were using the Comcast router (DPC3939B) for NAT and firewall.
Issues we experienced:
1. No support for static routes in the DPC3939B. With the addition of the Chromebooks we created some internal subnets and used one of the Cisco switches as a router for those networks. The DPC3939B admin interface has a page for configuring static routes, but try as we might they would not take: every attempt returned an error message telling us to try again. To run our multiple internal networks anyway, we enabled proxy ARP on the Cisco switch / router and set the netmask on the Comcast gateway wide enough that it thought every host inside the network was directly attached to it. This worked for a little while until ...
2. The DPC3939B seemed to stop passing traffic. We think it ran out of space in its ARP table or something similar: with the proxy ARP setup it was seeing > 100 machines at the same time. To get around the static routing issue we added a dedicated router / firewall (the Cisco switch couldn't do NAT), a Ubiquiti EdgeRouter Pro-8. We set it up to provide firewall and NAT between the 'internal network' (Cisco switches and APs) and the DPC3939B. The outside interface of the Ubiquiti used an address from our static IP range, so in theory the DPC3939B was not doing any NAT / firewall for it, and our entire site now showed up as just 1 IP to the DPC3939B. Things got a lot better and everything seemed to be working until ...
3. At the busiest times, when everyone was online and we were running more than 8000 connections (as measured in the Ubiquiti firewall), access to the internet would grind to a halt. The issue seemed to be related to the number of connections, not the amount of traffic. With just one host on the network we could run speed tests up to 170 Mbps (20 Mbps more than our service). With a large number of connections, however, we would see high packet loss and low throughput (maybe 30 Mbps). Pings to the external interface of the DPC3939B would have up to 50% packet loss. We knew it must be something in the DPC3939B, even though in theory it was only passing our traffic through and should not have been tracking the connections in any way. So next we tried ...
4. Setting the DPC3939B to bridge mode. We figured this should turn off any firewall in the DPC3939B entirely. It did not change the behaviour: whenever the number of connections in our Ubiquiti firewall reached around 8k, the internet connection would die. Also, when we looked at the connections in the firewall, 75% of them were in the UNREPLIED state, meaning that the internal host was trying to connect out and had not received any response from the server it was trying to reach.
5. After many calls to Comcast, and specifically getting them on the phone while the issue was happening so that they could see the packet loss themselves, they sent a tech out to replace the modem. The key was getting on the phone with them while the issue was happening; otherwise the connection would work fine and test out fine. The first tech came out to replace the modem but couldn't, because Comcast's provisioning system was down while he was there; he left for the day and never came back. After 2 more days we got another tech to come out. As soon as he looked at the modem he said 'Oh yeah - the DPC3939B has an internal connection limit of around 8k connections. If you have more than that it craps out.' He offered to install the 'other' Business Class device (Netgear). That fixed the problem: we can now easily carry all 100+ devices and have not seen the issues recur.
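The proxy ARP workaround from item 1 comes down to simple subnet arithmetic: if the gateway's LAN prefix is wide enough to contain every internal subnet, the gateway treats those hosts as on-link and ARPs for them directly, and the Cisco switch answers those ARP requests via proxy ARP, so no static route is needed. Here is a minimal Python sketch of that check. The addresses are hypothetical (we haven't posted our real addressing), and the function name is just for illustration.

```python
import ipaddress

def gateway_treats_as_on_link(subnet: str, gateway_if: str) -> bool:
    """Return True if every address in `subnet` lies inside the network
    configured on the gateway's LAN interface. In that case the gateway
    ARPs for those hosts directly instead of needing a static route, and a
    proxy-ARP-enabled router (here, the Cisco switch) can answer for them."""
    net = ipaddress.ip_network(subnet)
    gw = ipaddress.ip_interface(gateway_if)
    return net.subnet_of(gw.network)

# Hypothetical addressing for illustration only.
internal_subnets = ["10.10.1.0/24", "10.10.2.0/24"]  # routed by the Cisco switch

for gw in ("10.10.0.1/24", "10.10.0.1/16"):  # normal vs. widened gateway netmask
    on_link = [s for s in internal_subnets if gateway_treats_as_on_link(s, gw)]
    print(gw, "->", on_link)
# 10.10.0.1/24 -> []
# 10.10.0.1/16 -> ['10.10.1.0/24', '10.10.2.0/24']
```

With the /24 mask neither internal subnet is on-link, so the gateway would need static routes (which the DPC3939B would not accept). With the widened /16 mask both subnets are on-link. The trade-off is that the gateway now holds a separate ARP entry for every active host in those subnets, which fits what we saw in item 2.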
The short story: the DPC3939B has an internal connection limit of around 8k. Once you have more than that many connections (TCP, UDP, whatever, i.e. any combination of source & destination IP addresses & ports), it craps out. The Netgear modem does not seem to have this limitation. Caveat: we have our own device doing NAT & firewall, so you might not be able to run 100+ devices directly behind the Netgear ...
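The tech's explanation amounts to a fixed-size table of flows, where each flow is identified by protocol plus source and destination addresses and ports. The toy Python model below is a sketch under stated assumptions. The 8000-entry capacity comes from the ~8k figure above. The drop-new-flows-when-full policy and the lack of entry timeouts are simplifications. We don't know how the DPC3939B actually works inside. The model also shows why putting the EdgeRouter's NAT in front did not help: after NAT every flow shares one source IP, but each flow still gets its own source port, so each is still a separate entry.

```python
from typing import NamedTuple

class Flow(NamedTuple):
    """One connection, keyed by protocol plus source/destination
    IP addresses and ports."""
    proto: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

class BoundedFlowTable:
    """Toy model of a device that tracks every flow in a fixed-size table.

    Assumptions (not taken from the DPC3939B itself): packets that would
    start a new flow are dropped once the table is full, and entries never
    time out.
    """

    def __init__(self, capacity: int = 8000):
        self.capacity = capacity
        self.flows = set()   # currently tracked Flow tuples
        self.dropped = 0     # packets refused because the table was full

    def packet(self, flow: Flow) -> bool:
        """Return True if the packet would be forwarded, False if dropped."""
        if flow in self.flows:
            return True  # already tracked: forward
        if len(self.flows) >= self.capacity:
            self.dropped += 1  # table full: a new flow cannot be tracked
            return False
        self.flows.add(flow)
        return True

# Hypothetical addresses: 203.0.113.2 stands in for the EdgeRouter's outside IP.
table = BoundedFlowTable(capacity=8000)
for i in range(9000):
    # After NAT every flow shares one source IP but gets its own source port.
    table.packet(Flow("tcp", "203.0.113.2", 10000 + i, "198.51.100.7", 443))
print(len(table.flows), table.dropped)  # 8000 1000
```

In this model, failure hits at a hard threshold instead of building up gradually. The first 8000 flows work, and every new one after that is refused. That is loosely consistent with outbound connections dying around 8k and piling up as UNREPLIED on our side, but it is only an illustration.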
The only limitation is that the Netgear only goes up to 150 Mbps; that is why they gave us the DPC3939B in the first place, since it goes up to 250 Mbps. The tech said that if you need more than 150 Mbps and will have a large number of connections (usually associated with more than 50 or so hosts), then you have to go to Comcast fiber.
Hopefully this will help anyone else who runs into the same issue in the future.
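If you want to check whether you are hitting the same limit, watch the two numbers we used: total tracked connections and the share still in UNREPLIED. The Python sketch below summarizes a Linux-style conntrack listing. It assumes your own router exposes one, for example `/proc/net/nf_conntrack` or `conntrack -L` output on a Linux-based box. The function name is hypothetical, and the sample lines are made up for illustration.

```python
from collections import Counter

KNOWN_PROTOCOLS = {"tcp", "udp", "icmp", "icmpv6"}

def summarize_conntrack(lines):
    """Summarize a Linux conntrack listing, one connection per line.

    Returns (total, unreplied, unreplied_share, per_protocol_counts).
    An entry is flagged [UNREPLIED] when no reply traffic has been seen yet.
    """
    total = 0
    unreplied = 0
    per_proto = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        total += 1
        if "[UNREPLIED]" in fields:
            unreplied += 1
        # Both /proc/net/nf_conntrack ("ipv4 2 tcp 6 ...") and conntrack -L
        # ("tcp 6 ...") name the protocol before the src=/dst= fields.
        proto = next((tok for tok in fields if tok in KNOWN_PROTOCOLS), "other")
        per_proto[proto] += 1
    share = unreplied / total if total else 0.0
    return total, unreplied, share, per_proto

# Made-up sample entries using documentation address ranges.
sample = [
    "ipv4     2 tcp      6 117 SYN_SENT src=10.10.1.23 dst=198.51.100.7 sport=51514 dport=443 [UNREPLIED] src=198.51.100.7 dst=203.0.113.2 sport=443 dport=51514 mark=0 use=2",
    "ipv4     2 udp      17 29 src=10.10.1.40 dst=192.0.2.53 sport=40100 dport=53 [UNREPLIED] src=192.0.2.53 dst=203.0.113.2 sport=53 dport=40100 mark=0 use=2",
    "ipv4     2 tcp      6 431999 ESTABLISHED src=10.10.2.5 dst=198.51.100.9 sport=50022 dport=443 src=198.51.100.9 dst=203.0.113.2 sport=443 dport=50022 [ASSURED] mark=0 use=2",
    "ipv4     2 tcp      6 86 SYN_SENT src=10.10.1.31 dst=198.51.100.7 sport=51620 dport=443 [UNREPLIED] src=198.51.100.7 dst=203.0.113.2 sport=443 dport=51620 mark=0 use=2",
]
total, unreplied, share, per_proto = summarize_conntrack(sample)
print(f"{total} tracked, {unreplied} unreplied ({share:.0%})")  # 4 tracked, 3 unreplied (75%)
print(dict(per_proto))  # {'tcp': 3, 'udp': 1}
```

On a Linux host where `/proc/net/nf_conntrack` exists, passing the open file to `summarize_conntrack` gives the live numbers. Watch for a total climbing toward ~8000 together with a rising UNREPLIED share, which is the pattern described above.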