I have a Comcast business class account with 13 static IP addresses and an IPv6 netblock. I run several servers here, including a DNS server that gets a few hundred queries a second. Total traffic is generally about 5Mbit/s, so not super heavy.
However, every DNS query comes from a different port, so presumably it gets tracked as a "new connection" by the firewall code inside the DPC3941B. That adds up to a few hundred "new connections" a second, quickly filling the connection tracking table inside the Cisco router.
Why do I think this is happening? Three symptoms:
- Established TCP connections and UDP streams continue to go fast, even as new connections fail to connect. For example, video conferencing continues, even as sending an email gets stuck.
- Redirecting some of the DNS traffic elsewhere reduces the severity of the problem a little bit, though it still happens.
- Ping times to 10.1.10.1 vary wildly. Presumably the DPC3941B is "doing something" before responding to the ping request. Maybe looking through the connection table for an available slot?
64 bytes from 10.1.10.1: icmp_seq=753 ttl=64 time=1.46 ms
64 bytes from 10.1.10.1: icmp_seq=754 ttl=64 time=2.07 ms
64 bytes from 10.1.10.1: icmp_seq=755 ttl=64 time=15.0 ms
64 bytes from 10.1.10.1: icmp_seq=756 ttl=64 time=1.76 ms
64 bytes from 10.1.10.1: icmp_seq=757 ttl=64 time=1.83 ms
64 bytes from 10.1.10.1: icmp_seq=758 ttl=64 time=3.31 ms
64 bytes from 10.1.10.1: icmp_seq=759 ttl=64 time=1.30 ms
64 bytes from 10.1.10.1: icmp_seq=760 ttl=64 time=1.47 ms
64 bytes from 10.1.10.1: icmp_seq=761 ttl=64 time=240 ms
64 bytes from 10.1.10.1: icmp_seq=762 ttl=64 time=2.71 ms
64 bytes from 10.1.10.1: icmp_seq=763 ttl=64 time=121 ms
64 bytes from 10.1.10.1: icmp_seq=764 ttl=64 time=3.14 ms
64 bytes from 10.1.10.1: icmp_seq=765 ttl=64 time=133 ms
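To put numbers on that variation, here is a minimal Python sketch that parses the ping output above and flags the outliers. The 10x-median spike threshold and the names `parse_ping` and `summarize` are assumptions chosen for illustration; nothing here comes from the router itself.

```python
import re
import statistics

# Matches the icmp_seq and time fields of a Linux-style ping reply line.
PING_LINE = re.compile(r"icmp_seq=(\d+)\s+ttl=\d+\s+time=([\d.]+)\s*ms")

SAMPLE = """\
64 bytes from 10.1.10.1: icmp_seq=753 ttl=64 time=1.46 ms
64 bytes from 10.1.10.1: icmp_seq=754 ttl=64 time=2.07 ms
64 bytes from 10.1.10.1: icmp_seq=755 ttl=64 time=15.0 ms
64 bytes from 10.1.10.1: icmp_seq=756 ttl=64 time=1.76 ms
64 bytes from 10.1.10.1: icmp_seq=757 ttl=64 time=1.83 ms
64 bytes from 10.1.10.1: icmp_seq=758 ttl=64 time=3.31 ms
64 bytes from 10.1.10.1: icmp_seq=759 ttl=64 time=1.30 ms
64 bytes from 10.1.10.1: icmp_seq=760 ttl=64 time=1.47 ms
64 bytes from 10.1.10.1: icmp_seq=761 ttl=64 time=240 ms
64 bytes from 10.1.10.1: icmp_seq=762 ttl=64 time=2.71 ms
64 bytes from 10.1.10.1: icmp_seq=763 ttl=64 time=121 ms
64 bytes from 10.1.10.1: icmp_seq=764 ttl=64 time=3.14 ms
64 bytes from 10.1.10.1: icmp_seq=765 ttl=64 time=133 ms
"""


def parse_ping(text):
    """Return a list of (sequence number, round-trip time in ms) tuples."""
    return [(int(seq), float(ms)) for seq, ms in PING_LINE.findall(text)]


def summarize(samples, spike_factor=10.0):
    """Summarize round-trip times; a spike is any reply slower than
    spike_factor times the median (an arbitrary, assumed threshold)."""
    times = [ms for _, ms in samples]
    median = statistics.median(times)
    spikes = [(seq, ms) for seq, ms in samples if ms > spike_factor * median]
    return {
        "count": len(times),
        "median_ms": median,
        "max_ms": max(times),
        "spikes": spikes,
    }


summary = summarize(parse_ping(SAMPLE))
print(summary)
```

On this sample the typical reply takes a few milliseconds, while replies 761, 763, and 765 take over 100 ms each.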
How could this be fixed in the firmware?
If the size of the connection table is the issue: this device has 1GB of RAM, and the size of the connection table could be increased to something large enough that even 1000+ "new connections" a second, multiplied by the timeout value for UDP "connections" still results in there being easily re-usable free slots in the connection table.
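As a rough back-of-the-envelope check of that sizing argument: the number of live entries in steady state is roughly the arrival rate multiplied by the entry lifetime. The sketch below assumes UDP timeouts of 30 s and 180 s, a 2x headroom factor, and 320 bytes per entry; all of these are illustrative guesses, not values known for the DPC3941B.

```python
def required_entries(new_flows_per_second, timeout_seconds, headroom=2.0):
    """Steady-state table size: arrival rate x entry lifetime, times headroom.

    The headroom factor is an assumption to leave free slots available.
    """
    return int(new_flows_per_second * timeout_seconds * headroom)


def table_bytes(entries, bytes_per_entry=320):
    """Memory needed for the table; 320 bytes per entry is an assumed figure."""
    return entries * bytes_per_entry


RATES = (300, 1000)   # new "connections" per second
TIMEOUTS = (30, 180)  # assumed UDP entry timeouts, in seconds

for rate in RATES:
    for timeout in TIMEOUTS:
        entries = required_entries(rate, timeout)
        mib = table_bytes(entries) / 2**20
        print(f"{rate:>5} flows/s x {timeout:>3} s -> {entries:>7} entries, ~{mib:.0f} MiB")
```

Even the largest case here (1000 flows/s, 180 s timeout, 2x headroom) comes to 360,000 entries, or roughly 110 MiB under the assumed entry size, which fits easily in 1GB of RAM.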
If looking up the connection table entries is the issue: increasing the size of the hash table (source & destination address & port number) used to look up the connection table entries can reduce the number of items traversed from each hash bucket.
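The effect of bucket count on lookup cost can be illustrated with a small simulation. This is only a sketch: the flow key layout, the CRC32 hash, the 20,000-flow load, and the server address 192.0.2.53 (a documentation address) are all assumptions, and the actual firmware may hash quite differently.

```python
import ipaddress
import random
import struct
import zlib
from collections import Counter

# Hypothetical DNS server address (documentation range), destination port 53.
SERVER_IP = int(ipaddress.IPv4Address("192.0.2.53"))


def random_flow(rng):
    """Hypothetical flow key: (src IPv4, src port, dst IPv4, dst port)."""
    return (rng.getrandbits(32), rng.randrange(1024, 65536), SERVER_IP, 53)


def bucket_of(flow, n_buckets):
    """Map a flow key to a bucket; CRC32 stands in for the real hash function."""
    key = struct.pack("!IHIH", *flow)
    return zlib.crc32(key) % n_buckets


def chain_stats(flows, n_buckets):
    """Return (mean chain length seen by a lookup of an existing flow,
    longest chain) when the flows are hashed into n_buckets buckets."""
    sizes = Counter(bucket_of(f, n_buckets) for f in flows)
    # Each of the s flows in a chain of length s sees a chain of length s.
    mean_seen = sum(s * s for s in sizes.values()) / len(flows)
    return mean_seen, max(sizes.values())


rng = random.Random(1)
flows = [random_flow(rng) for _ in range(20000)]

for n_buckets in (1024, 65536):
    mean_seen, longest = chain_stats(flows, n_buckets)
    print(f"{n_buckets:>6} buckets: mean chain {mean_seen:.1f}, longest {longest}")
```

With far fewer buckets than entries, every lookup walks a long chain; with more buckets than entries, most chains hold a single entry.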
People have been doing this right for over 20 years, and there is no reason why the DPC3941B should be doing it wrong.
I have had the same problem before with an SMCDG3, which showed packet loss and crashes, instead. The Netgear router was even worse, becoming totally unusable within minutes.
Comcast, will you be fixing at least one of your routers to be able to handle a few hundred (incoming) DNS packets a second?
Some additional observations:
- On the Cisco router, both IPv4 and IPv6 firewalling are disabled in the configuration (Disable Gateway Smart Packet Detection and Disable entire firewall are checked).
- However, looking at Troubleshooting > Logs > Firewall Logs (today), I see that the router is dropping over 1000 IPv6 connections an hour: "FW.IPv6 FORWARD drop, 19083 Attempts." This is despite the firewall being DISABLED in the configuration!
- Limiting DNS lookups to IPv4 only seems to slow down that rate a little bit.
- The router still crashes several times a day.
- I have seen ping times as high as 893ms to 10.1.10.1, as well as packet loss to the router. It must be using a lot of CPU to do something else, before getting around to routing my packets...
After a whole day with the firewall supposedly disabled, the router says 37113 IPv6 connections were dropped. Given how flaky the networking is, I believe it.
"FW.IPv6 FORWARD drop, 37113 Attempts." | 2016/11/24 23:44:39 | Firewall Blocked
I've had nothing but issues since my Netgear was replaced with one of these about 10 days ago. Latency is terrible across the device, and I have lots of data that shows it. Additionally, I can't get any of my static IPv6 to work, since I run my /29 of static addresses in "passthrough mode" with my own CPE (which all worked before the switch). I want my Netgear back, but I am willing to give up my statics and run this piece of junk in true bridge mode if I absolutely have to. A graphic of latency through the device is attached. I have bidirectional data and data points at additional locations as well that more or less prove it's the Cisco.
Guess where the new modem went in? My opinion so far is that these are just poor devices. FWIW, I see the exact same behavior in the firewall logs with the IPv6 firewall supposedly disabled.
This. I have been trying to track down the cause of our sudden latency, especially since the modem will let a small number of connections (fewer than 10,000 TCP sockets) push and pull data through the modem at close to full speed. When we open more than around 15k states, the modem becomes unresponsive. The firewall seems to be able to handle the traffic just fine, but it hardly receives any responses from the modem during these times.
I have this happening at both locations that had an SMC replaced with the DPC3941B; however, business support will not replace it, since it "works fine" whenever they check the modem.
The next steps for me are placing this modem into bridge mode and simulating over 20k socket connections using httperf or netcat, since "support" won't act without empirical evidence that the problem is not inside my network. I did get one of these replaced with the Netgear 3000, but it appears to have the same issue. I may also just buy another modem to bridge.
I hope to report back here with meaningful information after this is ironed out. We have been limited to only a small number of connections for 3 full weeks now, and my two weekly TIER 2 requests have never been fulfilled or followed up on, other than an automated email response: "Thanks for calling about CR#########".
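For anyone who wants to run a similar test without httperf or netcat, here is a minimal Python sketch that opens many TCP connections and holds them open. The function names are made up for illustration, and the demo only talks to a throwaway listener on 127.0.0.1; pointing it at a real host and port beyond the modem is left to the reader. Holding 20k sockets in one process also requires raising the open-file limit (for example with `ulimit -n`) first.

```python
import socket


def open_connections(host, port, count, timeout=3.0):
    """Try to open `count` TCP connections and keep them open.

    Returns (list of connected sockets, number of failed attempts).
    """
    held, failures = [], 0
    for _ in range(count):
        try:
            held.append(socket.create_connection((host, port), timeout=timeout))
        except OSError:
            failures += 1
    return held, failures


def close_all(sockets):
    """Close every socket that was being held open."""
    for s in sockets:
        s.close()


def demo_local(count=20):
    """Self-test against a local listener; returns (opened, failed)."""
    server = socket.create_server(("127.0.0.1", 0), backlog=64)
    try:
        port = server.getsockname()[1]
        held, failures = open_connections("127.0.0.1", port, count)
        opened = len(held)
        close_all(held)
        return opened, failures
    finally:
        server.close()


opened, failed = demo_local(20)
print(f"opened={opened} failed={failed}")
```

Watching how the failure count and ping times to the modem change as `count` grows would give the kind of evidence support is asking for.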
So the Comcast DOCSIS 3 modems can NOT reliably act as routers for Business Class traffic. The solution (that we found without any support) was twofold:
The Cisco DPC3941B has issues on both fronts. First off, its inherent latency is related to the Puma6 chip.
Here is a latency testing tool (http://www.dslreports.com/tools/puma6) and relevant discussion on the matter; you can also just search for puma6 on DSLreports.com. Second, it cannot be placed into a true bridge mode (as seen with no difference in latency below), and it will still be accessible at its internal IP even in bridge mode.
Here is the ping graph to Google from inside the network behind the Cisco DPC3941B, responding at 50-60ms at best and typically in the hundreds of milliseconds when under load (or not at all). This modem was placed into bridge mode a day before the one in the last graph, for comparison. Latency dropped only slightly for max times (thousands of milliseconds).
Here are the pings to Google using the NetGear CG3000DCR. Can you guess when the modem was put into bridge mode? Also, the amount of traffic going through that modem increased after we bridged it and removed the latency.
Pings are now flat, between 10ms and 20ms, even under a heavy outbound network load. These graphs are from the same time period with the two different modems on 2 similar networks.