EDIT: I thought this only impacted IPv6, but I have reason to believe it also impacts IPv4 when using NAT.
Short summary - Comcast, I would love to get in touch with someone technical about this who might be able to fix it (or just have it magically fixed - that's fine too). Or, alternatively I would like to get a firmware that works, or a different modem that lets me have both static IPv4 addresses and working IPv6 (I don't need any firewalling, nor want it, nor do I need any Wifi on it). A nice plus, but I'll live without it if needed, would be a modem with working DHCPv6-PD. Certainly, if I've screwed something up, I'm glad to admit it, but need to be pointed the right way. That said, I don't think I screwed up - I think Cisco did. But feel free to enlighten me.
I tried to turn off all firewalling, but I still see in Troubleshooting -> Logs, "Firewall Logs", entries for FW.IPv6 FORWARD drop and FW.IPv6 INPUT drop - I believe these indicate that there is an IPv6 firewall running on this box, and I believe that to be my issue. I'd love to get it turned off.
Duplication is simple: Take a DPC3941B modem, establish an outbound SSH session over IPv6 to something "on the internet". Run some command that produces periodic output to ensure you keep the connection alive ("top" is what I usually use), if you want to be sure it's not an idle timeout. The session will die, likely within a few minutes, but certainly within an hour.
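For anyone who wants to time how long a session survives, the reproduction above could be scripted roughly like this. It is only a sketch under some assumptions: OpenSSH on the client, a Linux server whose top supports batch mode (-b), and a host name you supply yourself. The helper names build_ssh_command and time_session are made up for illustration.

```python
import subprocess
import time


def build_ssh_command(host, interval=5):
    """Build an OpenSSH command that forces IPv6 and keeps the session chatty.

    `host` is a placeholder for your own test server.
    """
    return [
        "ssh",
        "-6",                              # force IPv6
        "-o", "ServerAliveInterval=15",    # SSH-level keepalives
        "-o", "ServerAliveCountMax=3",
        "-o", "TCPKeepAlive=yes",          # TCP-level keepalives
        host,
        "top", "-b", "-d", str(interval),  # periodic output, so never idle
    ]


def time_session(host):
    """Run the session until ssh exits; return how many seconds it lasted."""
    start = time.monotonic()
    subprocess.run(build_ssh_command(host), stdout=subprocess.DEVNULL)
    return time.monotonic() - start
```

Calling time_session with your own server's name blocks until the session dies and returns its lifetime in seconds, which makes it easy to compare runs through different modems or ISPs.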
I've been investigating two annoying problems with my new Comcast service. First, outbound IPv6 SSH sessions seem to lock up and eventually die randomly (but almost always within 15 minutes), even with plenty of traffic going back and forth, SSH keepalives, TCP keepalives, etc, going. It's not an idle timeout. Second, I'll find Facebook and Google sometimes hang for my browser - I suspect the same problem: outbound TCP sessions (like a keep-alive session from Chrome or IE or Safari) over IPv6 die randomly, but the browser never gets a FIN or RST, so it assumes the connection is still up and has to time out.
The first thing I did was to use some tcpdump captures of what comes in and out of my computer directly cabled to the BWG, and also a remote SSH host. What I see is normal TCP traffic back and forth just fine, until at some point the remote end's outbound packets never make it through the cable modem to my laptop (I see them sent on the SSH server side, but I don't see them in the capture on the client).
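The comparison of the two captures can be sketched as below. This assumes the (sequence number, length) pairs have already been extracted from each capture, with absolute sequence numbers (tcpdump -S) so both sides are comparable. The function name and the numbers are made up for illustration and are not taken from the actual captures.

```python
def missing_segments(sent, received):
    """Return segments seen leaving the sender but never seen at the receiver.

    Each segment is a (seq, length) tuple from the capture on that side.
    Retransmissions appear as duplicates, so each lost segment is listed once.
    """
    seen = set(received)
    lost = []
    for segment in sent:
        if segment not in seen and segment not in lost:
            lost.append(segment)
    return lost


# Illustrative values only: the server sent three segments (the last one
# twice), but the client capture shows only the first two.
server_side = [(1000, 100), (1100, 100), (1200, 100), (1200, 100)]
client_side = [(1000, 100), (1100, 100)]
print(missing_segments(server_side, client_side))
```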
I'll note that I am a network engineer for a large ISP, with lots of wire-level protocol debugging and development experience (I've written routing engines for BGP from scratch, for instance).
Things I've ruled out:
1) This hang doesn't happen when using the same destination SSH server but using a Centurylink DSL account with a 6RD tunnel - the connections persist indefinitely just fine. So I don't believe it's the server.
2) To rule out that the server's network might be doing something weird to Comcast, I put a SSH server on my Centurylink DSL and established a connection to it from a laptop behind the Comcast router. That connection hung (and eventually died) frequently and quickly (within 15 minutes), just like traffic to the original SSH server I tested with. I also do not see any problems with neighbor discovery, router advertisements, etc (indeed, the problem even happens with statically configured IPv6 addresses - static in the sense of being hard coded into my laptop, as well as default MacOS X settings for IPv6).
3) INBOUND connections work fine. I can SSH from Centurylink DSL to a server behind the Comcast BWG just fine, with no apparent timeouts (I am using keepalives and such). EDIT: I just didn't wait long enough on those tests, it does also time out.
4) IPv4 works fine, without these timeouts. The modem is staying trained, and my uncorrectable errors are very, very low (5 over the course of several days).
5) I've duplicated this while bypassing any equipment other than a cable between my laptop and the cable modem - it's not any of my routers/switches/etc.
I can provide tcpdumps and duplicate this on demand if someone from Comcast's engineering or architecture groups is interested in more information. I'm also glad to assist in any way I can with this debugging. Anything you need from me, I'm glad to provide.
I believe this to be a problem with some sort of firewall code on the modem - perhaps something I either can't find in the interface or which only Comcast can turn off. I would prefer no firewalling on this modem, so if that's possible, I'm willing to do that. I'm also willing to swap the modem for a different leased model, or to run beta network code on the modem - I just want my IPv6. I do see firewall logs that show some blocked stuff on IPv6 (FW.IPv6 FORWARD drop, 3169 Attempts. and FW.IPv6 INPUT drop, 15649 Attempts.).
My modem settings:
- Leased Comcast DPC3941B
- Wi-Fi and MoCA turned off.
- DHCP on LAN enabled, both IPv4 and IPv6
- IPv4 firewall settings are "Disable Firewall for True Static IP Subnet Only", "Custom Security" with "Disable Entire Firewall" and nothing else checked.
- IPv6 firewall settings are "Custom Security" and "Disable entire firewall" being only option checked.
- Managed Sites, Managed Services, and Managed Devices are all disabled.
- Advanced "Port Forwarding" is disabled
- Advanced "Port Triggering" is disabled
- Advanced "Port Management" / True Static IP Port Management is disabled ("Disable all rules and allow all inbound traffic through" is checked and "Open all ports but block exceptions below" in pull down).
- Advanced "DMZ" is disabled
- Advanced "NAT" is disabled ("Disable All" is checked).
- Advanced "Static Routing" is empty
- Advanced "Dynamic DNS" is disabled
- Advanced "Device Discovery" has UPnP "Enable", and Zero Config disabled.
- System logs show nothing but my logins for today. Event logs are empty for today. Firewall logs show "FW.IPv6 FORWARD drop, 3169 Attempts" and "FW.IPv6 INPUT drop, 15648 Attempts". There is no other data in the log - nothing about generic IP or IPv4.
- I can provide serial number, MAC, etc, on request.
Here's what a RIPE ATLAS network monitoring probe finds on Comcast IPv6 behind the DPC3941B - you'll note the frequent drops (because it's using IPv6) in its control channel to RIPE:
Connection History (Showing only the last 25)
Recently completed uptime periods may be posted about 60 minutes late.
|Internet Address||Controller||Connected (UTC)||Connected for||Disconnected (UTC)||Disconnected for|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-23 16:13:29||0h 29m||Still Connected|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-23 15:13:09||0h 51m||2015-11-23 16:05:05||0h 8m|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-23 15:07:06||0h 2m||2015-11-23 15:10:01||0h 3m|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-23 14:55:02||0h 9m||2015-11-23 15:04:09||0h 2m|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-23 11:00:14||3h 51m||2015-11-23 14:51:25||0h 3m|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-23 08:08:37||2h 45m||2015-11-23 10:54:30||0h 5m|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-23 04:25:53||3h 40m||2015-11-23 08:06:33||0h 2m|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-23 01:07:11||3h 14m||2015-11-23 04:21:29||0h 4m|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-23 00:00:49||1h 2m||2015-11-23 01:03:26||0h 3m|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-22 23:51:45||0h 4m||2015-11-22 23:55:54||0h 4m|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-22 23:24:39||0h 22m||2015-11-22 23:47:26||0h 4m|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-22 21:54:13||1h 26m||2015-11-22 23:21:07||0h 3m|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-22 20:20:47||1h 30m||2015-11-22 21:50:54||0h 3m|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-22 19:29:36||0h 48m||2015-11-22 20:17:38||0h 3m|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-22 17:38:06||1h 47m||2015-11-22 19:25:16||0h 4m|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-22 17:29:02||0h 5m||2015-11-22 17:34:24||0h 3m|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-22 17:07:56||0h 18m||2015-11-22 17:26:08||0h 2m|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-22 17:01:53||0h 4m||2015-11-22 17:06:23||0h 1m|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-22 14:25:17||2h 32m||2015-11-22 16:57:30||0h 4m|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-22 12:51:54||1h 28m||2015-11-22 14:20:00||0h 5m|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-22 06:50:49||5h 58m||2015-11-22 12:49:16||0h 2m|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-22 06:29:44||0h 15m||2015-11-22 06:45:06||0h 5m|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-22 05:59:36||0h 26m||2015-11-22 06:25:54||0h 3m|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-22 04:47:14||1h 7m||2015-11-22 05:54:38||0h 4m|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-22 04:17:07||0h 25m||2015-11-22 04:42:21||0h 4m|
|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-22 03:53:01||0h 18m||2015-11-22 04:11:56||0h 5m|
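To put a number on how short these uptime periods are, rows in the format above can be summarized with a few lines of Python. This is a minimal sketch: it assumes the pipe-delimited layout shown in this post, uses only five of the rows as sample input, and the function names are hypothetical.

```python
from statistics import median


def parse_duration(text):
    """Convert a duration such as '3h 51m' to minutes."""
    hours, minutes = text.split()
    return int(hours.rstrip("h")) * 60 + int(minutes.rstrip("m"))


def connected_minutes(rows):
    """Return the 'Connected for' value, in minutes, of each completed period."""
    result = []
    for row in rows:
        fields = row.strip().strip("|").split("||")
        if len(fields) == 6:  # a still-connected row has only 5 fields
            result.append(parse_duration(fields[3]))
    return result


# The five most recent rows from the table above.
rows = [
    "|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-23 16:13:29||0h 29m||Still Connected|",
    "|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-23 15:13:09||0h 51m||2015-11-23 16:05:05||0h 8m|",
    "|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-23 15:07:06||0h 2m||2015-11-23 15:10:01||0h 3m|",
    "|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-23 14:55:02||0h 9m||2015-11-23 15:04:09||0h 2m|",
    "|2601:285:202:xxxx:c66e:1fff:fe5b:f23a||ctr-ams16, NL||2015-11-23 11:00:14||3h 51m||2015-11-23 14:51:25||0h 3m|",
]
minutes = connected_minutes(rows)
print(len(minutes), "completed periods, median", median(minutes), "minutes")
```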
It should be noted that this isn't an issue with IPv4 - the modem works well (great?) for IPv4 - fast, reliable, keeps connections alive appropriately, etc. I also see my signal levels are fine and that the modem experiences very few T3/T4 timeouts (maybe once a week at most - and I have reason to believe that was associated with some Comcast outside plant maintenance down the road from me - they appear to be rerouting cables in response to some construction activity).
Having analyzed tcpdumps in detail, the problem is due to having a reasonable amount of data outstanding on the wire, but losing one of the early packets. That packet gets resent, which apparently resets some sort of TCP sequence number tracking in the modem. When the client replies, it sends an ACK not just for that retransmitted packet, but for that retransmitted packet *and* everything it had already received from the segments after it. Because this ACK number is larger than the retransmitted segment's SEQ + size (as is proper), the modem drops it, presumably because it's an ACK for data that it thinks hasn't been sent (but it has been).
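Why the client's ACK jumps so far ahead once the missing packet arrives can be shown with a small model of a TCP receiver's cumulative ACK. This is a sketch with made-up sequence numbers (five 100-byte segments starting at 1000), not values from the captures, and the function name is hypothetical.

```python
def cumulative_ack(initial_seq, received):
    """Return the ACK number a receiver would send.

    `received` is a list of (seq, length) segments that have arrived, in any
    order. The ACK number is the first byte not yet received in order.
    """
    ack = initial_seq
    for seq, length in sorted(set(received)):
        if seq > ack:
            break  # hole: data after it cannot be cumulatively ACKed yet
        ack = max(ack, seq + length)
    return ack


later_segments = [(1100, 100), (1200, 100), (1300, 100), (1400, 100)]
# First segment (1000..1099) lost: the ACK stays at the start of the hole.
print(cumulative_ack(1000, later_segments))
# After the retransmission arrives, the ACK covers everything at once.
print(cumulative_ack(1000, later_segments + [(1000, 100)]))
```

The second ACK is well past the end of the retransmitted segment, which is correct TCP behaviour and is exactly the ACK that appears to go missing.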
The Comcast DPC3941B is clearly broken and non-functional for IPv6. My advice to users: Turn off IPv6 on the DPC3941B until Comcast can fix the modem.
One final post - this isn't just an IPv6 issue. It also impacts IPv4, if using NAT (I generally don't use NAT, but I did some tests to see if this was the case).
So, Comcast, it's likely giving a lot of your users a bad experience.
Basically, here's what I see in my TCP dumps between a Comcast-served IP and a non-Comcast served IP (I have dumps from both sides) - this happens either direction (client behind Comcast or server behind Comcast):
The Netgear CG3000DCR modem is the only one that forum members who have extensive IPv6 experience with Comcast have found will work properly. However, this modem cannot be run at the highest speeds (100 Mbit down) as its CPU is not sufficiently powerful for that kind of bandwidth. The DPC3941B's CPU _is_ powerful enough for this but its firmware is broken with IPv6. The Cisco is a usable alternative if you don't need DHCPv6-PD, I believe.
Unfortunately, IPv6 on the DPC3941B appears to be broken even for direct assignment - even with machines plugged straight into the router, any IPv6 TCP session will get killed by the router randomly. I.e., if you ssh through the router to an IPv6 address, the router will randomly kill the session. The symptom in browsing the web is that your background "keep alive" connections get killed, so suddenly it seems like your browser hangs while browsing because it's waiting for Google (or whoever else) to respond to a request, but the router is blocking the outbound traffic. The same problem occurs on IPv4 too, if using NAT (static IPs do work just fine). If you test it, establish a connection through the router using either IPv6 or non-static IPv4 and make sure that the connection is talkative (i.e., to avoid problems with timeouts at any other middle boxes, do something that sends output every few seconds or so). The connection will die after a bit.
It shouldn't do that. The DPC3941B is broken on both IPv6 and IPv4.
I wish Comcast would simply provide a way to turn off the $#@! IPv6 firewall rules (I'm guessing they are doing an "allow all" rule when you say you don't want firewalling in the GUI, but the problem is they are still apparently doing state inspection - I want to turn that stuff off so it works). The static IPv4 addresses, with the firewall turned off, work fine. I just want that feature parity on IPv6.
So, if you have a DPC3941B, as far as I can tell, it is broken. Get it put in bridge mode and use a working modem if you aren't using static IPs and you'll probably be fine. If you are using static IPs, apparently you are just SOL, there is no working solution with this modem. And as others point out, the Netgear isn't an option if you have a high speed connection.
If someone from Comcast is reading this, I'm glad to help document this in any way that could get things fixed eventually.
This really does not make a lot of sense. For starters, IF you have dynamic IPv4 addresses you shouldn't even be renting a modem at all from Comcast. So dragging in dynamic vs static IPv4 is just confusing the issue mightily.
Secondly, a connection oriented protocol like SSH or Telnet MUST send keepalives if it is established through a NAT because otherwise there is absolutely no way for the NAT device to tell if a client application that opened a connection and has never sent a TCP close, and is not sending or receiving anything, has either crashed or is still alive and waiting.
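For reference, turning on TCP keepalives from an application looks roughly like this in Python. It is a sketch only: the idle, interval and count values are arbitrary examples, the tuning constants are platform-specific (they exist on Linux), so each one is guarded, and enable_keepalive is a hypothetical helper name.

```python
import socket


def enable_keepalive(sock, idle=60, interval=15, count=4):
    """Turn on TCP keepalives, with tuning where the platform exposes it."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These constants are not available on every platform, so set only
    # the ones this Python build provides.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        if hasattr(socket, name):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
print("keepalive enabled:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
sock.close()
```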
So the situation is this - IF you have a static IPv4 address or subnet on a Comcast cable line, and you are using the Cisco BWG, then that public IP address SHOULD BE on YOUR address translating router BEHIND the Cisco BWG and the BWG should be in "routed" mode with the firewall turned off.
In that mode there is NO connection table of any kind in the BWG and thus there should be no issue with interference - because the TCP connection between sender and receiver is simply not aware of the existence of the BWG in any form other than one more router hop.
In IPv6, there is ALSO no connection table at all on the /64 that is on the LAN interface of the BWG. Once more, the sender on the LAN and the recipient on the Internet only will see the BWG as one more router hop.
I have no doubt that NAT in the Cisco BWG is badly broken. However, there is no scenario where you are forced to use the NAT in the Cisco BWG other than you have no static IP addresses and are running dynamic ones - in which case you should have no expectation of maintaining a connection for a long period of time.
I am running the Netgear on the 10 Mb up by 50 Mb down tier with static IPv4 with no speed issues.
I understand keepalives and such - they are enabled properly - please read my previous messages for what I see happening, which can only be explained by the BWG eating packets. I also disagree that NAT shouldn't allow long-lived connections (it can and does, so long as the IP address doesn't change), but I don't think whether it does or doesn't particularly matters in this case - the issue is that there is some code tracking sequence numbers that is buggy. And "long lived" is sometimes as short as 30 seconds before this bug is triggered - other times it takes an hour or two. Turning off the sequence number tracking would be a fantastic solution. The reason to bring up IPv4 and NAT is that, conceivably, this is the most common deployment scenario for these BWGs - I'm hoping Comcast will recognize that they've shipped out a device that when it is in that mode is buggy and not functioning well. I suspect the IPv4 fix needed for those users is very similar to the IPv6 fix needed for anyone using IPv6 on one of these not in pure bridging mode.
When using a static IPv4 address, everything is fine for IPv4 (as both you and I would expect). The modem is still likely running some firewall rules (even when the firewall is "turned off" in the user interface) for the Comcast side of the modem (I believe things like blocking outbound Windows sharing and such is done this way), but they are configured in a way that isn't doing connection tracking for the static IPs. Which is great - I want the same thing on IPv6.
When using a 10.x.x.x address assigned by the modem (which can be done while also using static IPs on other machines), the 10.x.x.x hosts connected directly to the BWG experience this NAT issue (I actually don't think it's a NAT issue as much as a connection tracking issue).
When using IPv6, you have no choice but to use the modem-assigned IPv6 addresses in the /64. Even when the firewall is turned off, it appears to be interrupting IPv6 connections (and, yes, I'm using keepalives, both TCP and SSH keepalives, and also was transferring data for good measure so I know the connection wasn't idle). The modem's connection tracking does appear to be enabled for IPv6, and it sometimes seems to get confused about the TCP sequence numbers that should be allowed through. I'd love to be able to turn off the firewall on IPv6 for these (the user interface lets me check some boxes, which I described above, but it obviously isn't turning off all the connection state tracking the way I want it to).
I see denied packets on IPv6 in the firewall logs on the BWG (unfortunately they just give counts, not any information about *what* is being dropped). This is despite the firewall being turned off. From traceroutes, it's the same symptom that happens if letting the BWG NAT IPv4 (that it happens to IPv6 also is why I don't think it's a NAT issue, but rather a connection state tracking issue).
If you have a 3941B, can you maintain a long lived SSH session (with keepalives and whatever else is needed) over IPv6? You do need data to flow back and forth (so that sequence numbers increment) to trigger this bug - the easy thing to do is run "top" on the host you are SSHing to through the BWG, and leave the client connected for a couple hours - you'll see it disconnect. My suspicion is that you cannot. IPv6 simply isn't working properly through this box, even with the firewall supposedly disabled. If you *can* maintain a long lived SSH session over IPv6, I'd love to find out what I'm doing differently.
I am also interested if you know of a way to turn off whatever is tracking the sequence numbers of IPv6 sessions.
Again, what I'm seeing on IPv6 with the firewall "disabled" on the BWG is described below. The server and client can be on either side of the BWG - I've tested it both ways. I'm not upset or concerned that a segment got dropped, since of course that's how TCP does flow control - I'm concerned how the BWG handles this situation sometimes. #5 is what I want fixed - if the box is not firewalling, it should not eat those ACKs. I am confident it is in fact eating those ACKs based on packet captures from both ends.
In the modem's log interface, I can see the following:
FW.IPv6 FORWARD drop, 3169 Attempts. and FW.IPv6 INPUT drop, 15649 Attempts
My firewall settings, from first post here:
- IPv6 firewall settings are "Custom Security" and "Disable entire firewall" being only option checked.
Those logs, despite the above settings, make me think there is IPv6 firewalling going on (those names look awfully similar to default Linux iptables chains).
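Since the log only exposes counters, one way to get something out of it is to snapshot the counters before and after a hang and see which chain moved. The sketch below assumes the line format is exactly what is quoted in this thread ("FW.IPv6 FORWARD drop, 3169 Attempts"). Matching IPv4 lines the same way is a guess, since none have been seen, and the function names are hypothetical.

```python
import re

# Format assumed from the two log lines quoted in this thread.
LOG_PATTERN = re.compile(r"FW\.(IPv[46]) (\w+) drop, (\d+) Attempts")


def parse_drop_counters(text):
    """Return {(family, chain): count} for every counter found in `text`."""
    return {(family, chain): int(count)
            for family, chain, count in LOG_PATTERN.findall(text)}


def increase(before, after):
    """Return how much each counter grew between two snapshots."""
    return {key: after[key] - before.get(key, 0) for key in after}


# The two sets of counts quoted in this thread.
earlier = parse_drop_counters(
    "FW.IPv6 FORWARD drop, 3169 Attempts and FW.IPv6 INPUT drop, 15648 Attempts")
later = parse_drop_counters(
    "FW.IPv6 FORWARD drop, 3169 Attempts. and FW.IPv6 INPUT drop, 15649 Attempts")
print(increase(earlier, later))
```

If the FORWARD counter climbs each time a session stalls, that would support the idea that forwarded (not locally addressed) packets are being filtered.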
Here's what I'm seeing as far as the issue:
1) Server sends a bunch of data to the client. Let's say these are segments A,B,C,D, and E.
2) Client receives most of it, but is missing a segment. So only B, C, D, and E are received. Client ACKs for segment before A and adds a SACK option saying "I got B, C, D, and E".
3) Server receives ACK and resends A. In at least some cases, I believe this resets the connection tracking code on the cable modem, so it believes the currently valid SEQ numbers are the SEQs corresponding to the start and end of SEQ A.
4) Client receives A. Since it also has B,C,D, and E from previous transmissions in this session, it sends an ACK for all the data received through E.
5) The server never receives the client's ACK. I believe that's because it is too far in the "future" for the liking of the firewall code on the modem.
6) Because the ACK is blocked, the server will retransmit A, thinking it was lost again. The client will ACK E again, which will be dropped again by the Comcast modem. This repeats until one or the other end gives up and closes the connection.
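The hypothesis in steps 1-6 can be written down as a toy model. To be clear, this is a guess at the behaviour, not the modem's actual code: it assumes the tracker remembers only the end of the last server segment it saw instead of the highest end seen so far. The class name, the buggy flag, and the sequence numbers are all made up for illustration.

```python
class Tracker:
    """Toy connection tracker; `buggy=True` models the suspected behaviour."""

    def __init__(self, buggy):
        self.buggy = buggy
        self.highest_end = 0  # end of server data the tracker believes was sent

    def server_segment(self, seq, length):
        end = seq + length
        if self.buggy:
            # Suspected bug: a retransmission overwrites the high-water mark.
            self.highest_end = end
        else:
            self.highest_end = max(self.highest_end, end)

    def allows_ack(self, ack):
        # ACKs for data the tracker thinks was never sent are dropped.
        return ack <= self.highest_end


def replay(tracker):
    """Replay steps 1-4: segments A..E, a resend of A, then the client's ACK."""
    for seq in (1000, 1100, 1200, 1300, 1400):  # A..E, 100 bytes each
        tracker.server_segment(seq, 100)
    tracker.server_segment(1000, 100)  # step 3: server resends A
    return tracker.allows_ack(1500)    # step 4: client ACKs through E


print("sane tracker passes the ACK:", replay(Tracker(buggy=False)))
print("suspected behaviour passes the ACK:", replay(Tracker(buggy=True)))
```

With the suspected behaviour the ACK for everything through E is rejected after every resend of A, which reproduces the retransmit loop in step 6.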
"...The modem's connection tracking does appear to be enabled for IPv6, and it sometimes seems to get confused about the TCP sequence numbers that should be allowed through. I'd love to be able to turn off the firewall on IPv6 for these (the user interface lets me check some boxes, which I described above, but it obviously isn't turning off all the connection state tracking the way I want it to)...."
This is one possible interpretation of what is happening. Another is that they are still running NAT of some kind and doing a 1-to-1 translation on IPv6 rather than actually routing IPv6. That is less likely of course but without access to the code there is no way to know what is broken.
"...I see denied packets on IPv6 in the firewall logs on the BWG (unfortunately they just give counts, not any information about *what* is being dropped). This is despite the firewall being turned off...."
THAT is most definitely a bug. If the user interface allows the user to turn OFF the firewall - and even after so doing, the logs the user can access are still showing that the firewall is blocking packets - then the firewall isn't being turned OFF no matter what the interface says.
I would suggest that this is the most likely avenue to pursue with a trouble ticket with Cisco. Frankly I couldn't care less if the firewall is working properly in the BWG - but if they allow us to turn it off - it should be turned off!!!
I know this was about a year ago, but did you ever find a solution? I am having the same issues with our network as well. I have a huge suspicion that it is with the DPC3941B as we have had issues with IPv6 settings. I've read that other people with the same problem have just switched out the modems to a Netgear or the SMC option.