onespeedfast

New Member


Sunday, March 15th, 2015 2:00 PM

Serious packet loss and sluggish network when running a simple web crawler

I have a very basic libcurl web crawler that I use to identify trends & averages across the internet for things related to my business. 

 

Whenever the crawler is running at anything above 20 connections per second (ideally I'd run 200!), computers on the network cannot visit websites, check email, or do anything else.

 

I have tried everything to reach a tier 2 tech, but I keep ending up at a generic customer support center that only does the most basic troubleshooting. This time I ran the script while the rep "ran a diagnostic", and he found packet loss, but he had no idea how we could configure the modem to deal with it.

 

The modem is a Cisco DPC3939B. I use 5 GHz Wi-Fi and 2.4 GHz Wi-Fi, and I have one computer running Linux that does the heavy lifting, connected directly to the modem.

 

More info about my setup and the things I've tested:

 

Running libcurl with c-ares

Have tuned many DNS settings, including round-robin across about 20 top DNS servers

Have tested connecting directly to the websites' IPs (no change)

Have tested up to 300 connections per second doing only name-to-IP resolution; this works great

The problem really seems to be in downloading about 400 kb per second from many different websites

 

 

Advocate


10 years ago

Hello onespeedfast and welcome,

 

Let's see what you are dealing with here. It would help if you could provide your current Comcast Internet Tier Bandwidth (CITB).

 

You mentioned 400 KB per connection, which equals about 3.2 Mbps, and you are looking for 20 connections, so you would need a minimum of 64 Mbps down and ?? Mbps up. For this to work for your specified business requirements, you would need a CITB of at least 75 Mbps down and 15 Mbps up.
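The arithmetic above can be checked with a short Python sketch. It assumes decimal units (1 KB = 1,000 bytes, 1 Mbps = 1,000,000 bits per second) and takes the 400 KB-per-connection reading at face value; the function names are made up for illustration.

```python
def kbytes_to_mbps(kbytes_per_sec: float) -> float:
    """Convert kilobytes per second to megabits per second (decimal units)."""
    return kbytes_per_sec * 8 / 1000


def required_downstream_mbps(kbytes_per_conn: float, connections: int) -> float:
    """Total downstream bandwidth if every connection pulls kbytes_per_conn KB/s."""
    return kbytes_to_mbps(kbytes_per_conn * connections)


print(kbytes_to_mbps(400))                # 3.2
print(required_downstream_mbps(400, 20))  # 64.0
```

With binary units (1 KB = 1,024 bytes) the per-connection figure comes out slightly higher, about 3.28 Mbps.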

 

Look forward to hearing from you. 

New Member


10 years ago

Hi,

 

Here is some clarification:

 

The 400 kbps is the total across all the connections; it's a very small footprint.

I have a 75/15 connection now, and the bandwidth, checked with speedtest while my script is running, doesn't even change. It's a great connection and I'm very pleased.

 

Any time the script is running, the entire network slows down and ping times to Google go from the 30s to the 400s. I understand I'm putting quite a bit of stress on the network, but I don't understand why it struggles when I have plenty of bandwidth. Oddly, it can take 2-3 minutes to get a slot among the available connections and have speedtest.net load, but once it does, it works great.

 

I have kind of reached the end of what support can do so far, at least in terms of calling and having someone check things out here.

 

Thanks so much for asking for clarification!

New Member


10 years ago

Yes, the network looks like this:

 

One Linux server directly wired to the modem

One Windows 7 desktop wired to the modem

2 Windows laptops using Wi-Fi

2 smartphones using Wi-Fi

 

All workstations and phones experience sluggish "normal" web use when the server is running a script. The network is literally unusable, even though there is plenty of bandwidth.

 

I saw info about PingPlotter in another thread, so I'll try to document some of what is going on, if that would be helpful.

 

Thanks!

Problem solver


10 years ago

To clarify, these connections are being initiated from a computer directly wired into the modem? If so, I'd recommend plugging another computer into the modem and seeing whether the same issue occurs.

Problem solver


10 years ago

If everything becomes sluggish when the script runs, then you know the script is the trigger, possibly even the root cause. Are you able to slow the script down a bit and see if it makes a difference?
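One way to slow the script down is to pace new connections to a fixed rate. The sketch below is in Python rather than the original Perl, and `RateLimiter` is a hypothetical helper, not part of libcurl or WWW::Curl. It behaves like a token bucket with a burst size of one: at most `rate` calls to `wait()` return per second. The clock and sleep functions are injectable so the pacing can be tested without real delays.

```python
import time
from typing import Callable


class RateLimiter:
    """Hypothetical pacer: lets at most `rate` callers through per second."""

    def __init__(self, rate: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if rate <= 0:
            raise ValueError("rate must be positive")
        self.interval = 1.0 / rate      # seconds between connection starts
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = clock()     # the first call may proceed immediately

    def wait(self) -> None:
        """Block until the next connection is allowed to start."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed     # schedule from the slot, not the wake-up time
        self.next_allowed = now + self.interval
```

Calling `limiter.wait()` before each fetch in the crawl loop, and stepping `rate` up from 20, would show at which rate the rest of the network starts to suffer.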

Gold Problem solver


10 years ago

Regarding "traffic shaping", the same kind of speed limiting is done on residential as on business. Comcast doesn't throttle or purposely degrade connections beyond that; it's actually in their NBCUniversal merger deal requirements.

 

Your issues may be related to those another forum user posted about recently: he/she experienced higher-than-normal latency directly at the DPC. It's here: http://forums.businesshelp.comcast.com/t5/Equipment-Modems-Gateways/Why-is-there-5-10ms-latency-to-the-LAN-port-of-the-Cisco/td-p/23127/jump-to/first-unread-message

 

I will say that I have a couple of Linux servers behind a DPC3939B, and I have not noticed any latency issues. The servers are used as an e-mail server and, separately, an offsite VPN file backup/media streaming server. I regularly stream a few gigabytes a day from it down to my home Comcast connection, and although I haven't been watching latency closely, it works fine. It is a 50/10 connection.

 

If you post the script, I can run it and see whether I hit the same issues you are seeing.

New Member


10 years ago

Yes, it happens anywhere from 20 to 200 connections per second; I've tried them all. The script works great on a dedicated server with a 50 Mbit connection, with no packet loss or instability. This feels exactly like the issues someone might experience when using torrents.

 

I am sure the script is the trigger. My concern is that there is plenty of bandwidth, but not plenty of "available connections". I know traffic shaping is very real on household cable, but is it used on business?
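A rough way to relate connections per second to "available connections" is Little's law: the average number of connections open at once equals the rate at which new ones start multiplied by how long each one stays open. The lifetimes below are assumptions, not measurements (for example, a curl timeout). A gateway doing NAT may also keep a connection-tracking entry for some time after a connection closes, which lengthens the effective lifetime; the DPC3939B's actual table size is not known from this thread.

```python
def concurrent_connections(new_per_second: float, avg_lifetime_s: float) -> float:
    """Little's law: average number in flight = arrival rate x average time held."""
    return new_per_second * avg_lifetime_s


# Assumed rates from this thread and illustrative (hypothetical) lifetimes.
for rate in (20, 200):
    for lifetime in (2.0, 10.0, 30.0):
        open_now = concurrent_connections(rate, lifetime)
        print(f"{rate:>4} conn/s x {lifetime:>4.0f} s = {open_now:>6.0f} open at once")
```

Because the count scales with lifetime as much as with rate, a long timeout on unresponsive sites can multiply the number of simultaneous connections even when the bandwidth used stays tiny.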

 

I have a colleague on some sort of local small-business cable in the Northeast; he says he was able to run my script without any of these issues on a simple i5 Mac mini. Apparently it does not affect their office use whatsoever.

 

And lastly, I tried pinging my static IP while the script was running: it was all over the place, from 3 ms to 100 ms. Pinging 10.1.10.1 gives exactly the same result, but pinging the server doing the work is a solid, steady 1 ms.

New Member


10 years ago

Thanks for the info and the offer. I actually read their entire FCC report a couple of days ago to try to understand it.

 

I'm checking out the link you sent, and I'll put together a script. The simplest script I've been using in testing is just a for loop through 10,000 URLs at varying rates.

 

 

New Member


10 years ago

OK, here is a simplified version of the Perl script: http://pastebin.com/ek0f8NgV

It needs the modules Parallel::ForkManager and WWW::Curl::Easy.

And then a randomized list of 10,000 domains: http://pastebin.com/hq4UFB3J
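For readers who don't use Perl, here is a rough Python analogue of the Parallel::ForkManager plus WWW::Curl::Easy pattern: a fixed pool of workers, each fetching one URL with a timeout. It uses threads and the standard-library `urllib.request` instead of forked processes and libcurl, so it illustrates the pattern rather than porting the pastebin script. `max_workers` caps how many downloads are in flight at once; it does not cap connections per second. The `fetch` parameter is a hypothetical hook so the pool can be exercised without network access.

```python
import concurrent.futures
import urllib.request
from typing import Callable, Dict, Iterable, Optional

Fetch = Callable[[str, float], Optional[int]]


def fetch_status(url: str, timeout: float) -> Optional[int]:
    """Download one URL; return its HTTP status, or None on any error or timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # drain the body so the download actually happens
            return resp.status
    except Exception:
        return None


def crawl(urls: Iterable[str], max_workers: int = 20, timeout: float = 10.0,
          fetch: Fetch = fetch_status) -> Dict[str, Optional[int]]:
    """Fetch every URL using at most max_workers concurrent workers."""
    results: Dict[str, Optional[int]] = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url, timeout): url for url in urls}
        for fut in concurrent.futures.as_completed(futures):
            results[futures[fut]] = fut.result()
    return results
```

The pastebin list contains bare domains, so each entry would need a scheme prefix such as `http://` before being passed to `crawl`.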

 

Some of the URLs will time out; the timeout is a curl setting that is necessary in order to do any real work. I can adjust timeouts and threads and visit fewer domains, but that would defeat the purpose and the timeframe I need to stick to.

 

In a perfect world I could open 200 connections per second and wait as long as needed for them to time out; in a worst-case scenario, I'll run the timeout really low and increase it after the first visit to the entire database.
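The fallback plan (a short timeout on the first pass, then a longer one only for the sites that failed) can be sketched as a two-pass loop. The `fetch(url, timeout)` callable and the 3 s / 15 s defaults are hypothetical placeholders; `fetch` is assumed to return an HTTP status code, or `None` on timeout or error. The loop runs sequentially for clarity; in practice each pass would go through a worker pool.

```python
from typing import Callable, Dict, Iterable, List, Optional

Fetch = Callable[[str, float], Optional[int]]


def two_pass_crawl(urls: Iterable[str], fetch: Fetch,
                   first_timeout: float = 3.0,
                   retry_timeout: float = 15.0) -> Dict[str, Optional[int]]:
    """Visit everything with a short timeout, then retry only the failures."""
    results: Dict[str, Optional[int]] = {}
    failed: List[str] = []
    for url in urls:                      # pass 1: short timeout for everyone
        status = fetch(url, first_timeout)
        results[url] = status
        if status is None:
            failed.append(url)
    for url in failed:                    # pass 2: longer timeout, failures only
        results[url] = fetch(url, retry_timeout)
    return results
```

Keeping the first timeout short limits how long each slow site holds a connection open during the bulk pass, while the second pass still gives slow sites a chance to answer.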

 

 

New Member


10 years ago

Here are some screenshots of monitors on the network:

 

http://imgur.com/lNIFvOw,1Xbi5sx,iXUlXIK,qWtZ6Co