PostPosted: Fri Feb 20, 2009 8:17 pm 
Joined: Tue Jan 18, 2005 2:07 am
Posts: 1532
Location: California
I have two machines on the network, both with R5.5 installed. Here are two scenarios:

1. NFS-mount the remote file system and copy a 450MB file from the local machine to the NFS server. Throughput is 59Mbit/s. When I run top, between 50% and 80% of the client's two cores goes to "wa" (I/O wait).

2. CIFS-mount the remote file system and copy a 450MB file from the local machine to the Samba server. Throughput is 95Mbit/s. When I run top, virtually no client time goes to "wa".

It's surprising that CIFS/Samba delivers better performance than NFS, but the massive amount of CPU going to I/O wait under NFS is the fatal part: if I have a recording going at the same time on the client machine, I get buffer overruns in the backend. Has anyone else seen this, and any ideas how to correct it? Thanks. Specs for both machines are below.

Marc

Client: dual-core E8400, 3.0GHz, 2GB RAM, gigabit network interface, KM R5.5.

Server: P4 2.8GHz, 2GB RAM, gigabit network interface, KM R5.5.
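
For anyone who wants to reproduce the numbers, the measurement amounts to something like this (the server name, export and file below are examples, not my actual setup):
Code:
# NFS scenario: mount the export, copy a large file, watch "wa" in top
mount -t nfs server:/video /mnt/video
time cp ~/testfile-450MB /mnt/video/
# in a second terminal:
top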

_________________
Marc

The views expressed are my own and do not necessarily reflect the views of my employer.


PostPosted: Sat Feb 21, 2009 3:37 am 
Joined: Wed Dec 10, 2003 8:31 pm
Posts: 1996
Location: /dev/null
Can't answer your specific question about NFS vs. Samba, but I too have observed Samba giving superior transfer speeds on my old hardware (Athlon XP-based systems). As a side note, you might be able to boost Samba transfer speeds further; see this post for a suggested mod to /etc/samba/smb.conf that worked for me.
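
In case that link ever dies: the usual form of that kind of mod is a socket options line in the [global] section -- take the exact buffer sizes below as a starting point rather than the values from my post:
Code:
[global]
# common Samba throughput tweak; tune the sizes for your hardware
socket options = TCP_NODELAY SO_RCVBUF=65536 SO_SNDBUF=65536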

Are you behind a switch, and if so, are it and your NICs able to use jumbo frames? If the answer is yes, you may be able to accelerate your LAN transfers further by enabling jumbo frames.
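
Enabling them is just a matter of raising the MTU on every device in the path (eth0 is an example; the switch and both NICs must all support it):
Code:
# set a 9000-byte MTU on the interface
ifconfig eth0 mtu 9000
# equivalent with iproute2:
ip link set dev eth0 mtu 9000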

P.S. Your CPU usage seems excessive to me given your hardware specs. Are both of those NICs on-board, or running in a PCI slot?

_________________
Retired KM user (R4 - R6.04); friend to LH users.


PostPosted: Sat Feb 21, 2009 11:55 am 
Joined: Tue Jan 18, 2005 2:07 am
Posts: 1532
Location: California
graysky, thanks for responding. Your post triggered a thought, and I've found out what's going on with the CPU utilization -- it turns out this is "normal". I tried a large file copy from one local disk to another and noticed that I/O wait times spike in that scenario too. It turns out that high CPU I/O wait is normal during file I/O and that this high utilization will not prevent another process from using the CPU, so I've been chasing a ghost. I also found this thread, which explains a bit more.

I ran an experiment and verified that with two CPU-intensive processes running concurrently with either the local copy or the NFS copy, I/O wait times drop to almost 0 as the other processes' CPU utilization goes to 100%.
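
The experiment was along these lines (file names are examples):
Code:
# a large local copy in the background drives "wa" up
cp /tmp/bigfile.mpg /video/ &
# watch "wa" versus "us" once per second
vmstat 1
# now add two CPU hogs: "wa" falls toward 0 as "us" climbs to 100%
sh -c 'while :; do :; done' &
sh -c 'while :; do :; done' &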

I've tried jumbo frames in the past and they did not provide a significant improvement. I seem to recall an article claiming that jumbo frames were mostly useful on older, slower equipment and less of an advantage on new, fast equipment, so perhaps that's why they didn't help me. Both NICs are on the motherboard.

That still leaves the questions of why NFS is slower than Samba and why the backend suffers buffer overruns -- and now I'm facing another problem:

When my network interface is under intense load, I suddenly lose network connectivity and find the error messages shown below in kern.log. In this case the intense I/O was happening over Samba, and the failure is occurring on the server machine. In this scenario the hardware is as follows:

Server: Intel Core 2 Duo E8400, MSI P6NG Neo-Digital motherboard with integrated Realtek 8201CL NIC, 2GB RAM, KnoppMyth R5.5.

Client: Windows Vista on a Dell desktop, Intel Core 2 Quad Q6600, 4GB RAM, integrated NIC.

I've seen this happen periodically with the server, and with different clients. It only happens under intense load, and I am wondering if it is a driver issue.

Any thoughts on this one?

Marc

Quote:
Feb 20 19:27:50 mythhd kernel: NETDEV WATCHDOG: eth0: transmit timed out
Feb 20 19:27:50 mythhd kernel: eth0: Got tx_timeout. irq: 00000032
Feb 20 19:27:50 mythhd kernel: eth0: Ring at 2f814000
Feb 20 19:27:50 mythhd kernel: eth0: Dumping tx registers
Feb 20 19:27:50 mythhd kernel: 0: 00000032 000000ff 00000003 024803ca 00000000 00000000 00000000 00000000
Feb 20 19:27:50 mythhd kernel: 20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

_________________
Marc

The views expressed are my own and do not necessarily reflect the views of my employer.


PostPosted: Sat Feb 21, 2009 1:12 pm 
Joined: Thu Mar 25, 2004 11:00 am
Posts: 9551
Location: Arlington, MA
marc.aronson wrote:
I seem to recall finding an article that claimed that jumbo frames were mostly useful on older, slower equipment and not as much of an advantage on new, fast equipment, so perhaps that's why it didn't help me.

Well, it's actually more complicated than that. What you're really dealing with is overhead and latencies (turnaround time, etc.) versus the volume of data. Generally, the faster the raw bit rate on a communications channel of any kind, the bigger the chunks you want to send to maximize throughput over it.

There are analogies in other parts of CS. For example, if you have an 8-core box running a Java application with multiple threads (roughly speaking, 1.5x to 2.5x the number of cores is usually optimal) and you have single-threaded garbage collection, the throughput of the system drops by about ((Ncores-1)/Ncores) * (GCTime/TotalTime). This is almost exactly the same mechanism observed with high-speed comms when the transmit side of that HUGE pipe is waiting for a tiny little ACK message from the other side (in case it's not clear, the GC time maps to the time spent waiting for an ACK so that you can "close the books" on a transmitted packet). The throughput graph looks like this, where all the white space on the chart represents wasted bandwidth:
Code:
# #  # # # #  # # # #  # # # #  # # # #  # # # #  # # # #  # # # #  #
# #  # # # #  # # # #  # # # #  # # # #  # # # #  # # # #  # # # #  #
# #  # # # #  # # # #  # # # #  # # # #  # # # #  # # # #  # # # #  #
# #  # # # #  # # # #  # # # #  # # # #  # # # #  # # # #  # # # #  #
# #  # # # #  # # # #  # # # #  # # # #  # # # #  # # # #  # # # #  #
# #  # # # #  # # # #  # # # #  # # # #  # # # #  # # # #  # # # #  #
# ## # # # #### # # #  # # # #  # # # #  # # # #  # # # #  ### # #  #
######################################################################

(Vertical axis is data/work volume, horizontal axis is time)

We're also not counting other limiting factors here, like disk and system bus speeds or simplex versus duplex transmission; the main effect they have is to lower the vertical peaks on the graph and make the gaps on the horizontal axis longer or shorter. Real-world examples would look slightly more chaotic, with broader and more irregular peaks and gaps, but I can only do so much with ASCII graphics, and the basic shape remains the same. ;-)

I'll have to check with my math and operations research friends to be sure, but I think that the fundamental result here comes from queueing theory.
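
To put rough numbers on both halves of the analogy (all figures illustrative):
Code:
8 cores, single-threaded GC taking 10% of wall time:
    loss ~ ((8-1)/8) * 0.10 = 0.0875, i.e. ~8.75% of capacity idle

Same shape for comms: a 1 Gbit/s link with 0.2 ms of ACK turnaround
needs ~ 1e9 bit/s * 0.0002 s = 200,000 bits (~25 KB) in flight to
stay full -- about 17 standard 1500-byte frames outstanding.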


PostPosted: Tue Feb 24, 2009 6:07 am 
Joined: Tue Jan 18, 2005 2:07 am
Posts: 1532
Location: California
I'm still trying to chase down the "NETDEV WATCHDOG: eth0: transmit timed out" problem as I just got nailed again. My MSI P6NG Neo-Digital motherboard has a Realtek 8201CL controller, yet when I do an lspci I see:
Quote:
00:0f.0 Ethernet controller: nVidia Corporation MCP73 Ethernet (rev a2)
Subsystem: Micro-Star International Co., Ltd. Unknown device 7505
Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 16
Memory at fe977000 (32-bit, non-prefetchable) [size=4K]
I/O ports at c880 [size=8]
Memory at fe97e800 (32-bit, non-prefetchable) [size=256]
Memory at fe97e400 (32-bit, non-prefetchable) [size=16]
Capabilities: [44] Power Management version 2
Capabilities: [50] Message Signalled Interrupts: Mask+ 64bit+ Queue=0/3 Enable-


Does it make sense that the ethernet controller is identified as being from Nvidia, given that it's a Realtek controller?

_________________
Marc

The views expressed are my own and do not necessarily reflect the views of my employer.


PostPosted: Tue Feb 24, 2009 11:42 am 
Joined: Tue Jan 18, 2005 2:07 am
Posts: 1532
Location: California
OK, I'm starting to understand a bit more, but I need to ask a question: how can I determine which driver was loaded for my integrated ethernet device? The presence of the following messages leads me to believe it's the "forcedeth" driver from nvidia:
Quote:
Feb 24 07:13:31 mythhd kernel: forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.60.
Feb 24 07:13:31 mythhd kernel: forcedeth: using HIGHDMA
Feb 24 07:13:31 mythhd kernel: eth0: forcedeth.c: subsystem: 01462:7505 bound to 0000:00:0f.0


Having said this, when I do an "lsmod" I don't see forcedeth listed -- below is what I see. I have a suspicion that this is the age-old instability issue with the nvidia forcedeth driver, where the work-around involves reloading the driver. Thanks for any help you can provide!

Marc

Quote:
Module Size Used by
nvidia 7100068 36
autofs4 22148 1
nfsd 219380 13
exportfs 8448 1 nfsd
lirc_pvr150 19512 3
lirc_dev 16132 1 lirc_pvr150
fintek71882 9988 0
hwmon 6404 1 fintek71882
ipv6 257956 33
af_packet 24584 0
agpgart 30808 1 nvidia
fuse 42900 0
raw1394 27388 2
dv1394 20572 0
pcmcia 36524 0
yenta_socket 26764 0
rsrc_nonstatic 14720 1 yenta_socket
pcmcia_core 36504 3 pcmcia,yenta_socket,rsrc_nonstatic
video 19472 0
output 6912 1 video
sbs 19848 0
fan 7684 0
dock 11668 0
container 7552 0
joydev 13248 0
battery 13832 0
ac 7940 0
aufs 126084 0
ftdi_sio 36360 0
usbhid 42496 0
ff_memless 8840 1 usbhid
usb_storage 79680 0
usbserial 34024 1 ftdi_sio
uhci_hcd 26000 0
nvram 11144 0
lgdt330x 12164 1
mt352 9988 0
dvb_pll 15652 2
stv0299 13576 0
nxt200x 16900 1
saa7134_dvb 18444 1
wm8775 9644 0
cx25840 29772 0
videobuf_dvb 8580 1 saa7134_dvb
tda1004x 19076 1 saa7134_dvb
ivtv 139200 3 lirc_pvr150
saa7115 19404 0
msp3400 33612 0
snd_hda_intel 347544 0
tuner 37712 0
tea5767 9860 1 tuner
tda8290 16132 1 tuner
tda18271 15620 1 tda8290
tda827x 13700 1 tda8290
tuner_xc2028 22800 1 tuner
tda9887 13188 1 tuner
tuner_simple 12424 1 tuner
mt20xx 15624 1 tuner
tea5761 8324 1 tuner
snd_pcm_oss 40608 0
snd_mixer_oss 18304 1 snd_pcm_oss
b2c2_flexcop_pci 11288 1
b2c2_flexcop 28428 1 b2c2_flexcop_pci
i2c_algo_bit 9604 1 ivtv
saa7134 125008 1 saa7134_dvb
snd_pcm 70916 2 snd_hda_intel,snd_pcm_oss
cx2341x 15236 1 ivtv
compat_ioctl32 5120 1 saa7134
videobuf_dma_sg 14724 3 saa7134_dvb,videobuf_dvb,saa7134
videobuf_core 18564 3 videobuf_dvb,saa7134,videobuf_dma_sg
ir_kbd_i2c 11664 1 saa7134
dvb_core 74656 4 lgdt330x,stv0299,videobuf_dvb,b2c2_flexcop
snd_timer 23300 1 snd_pcm
snd_page_alloc 11912 2 snd_hda_intel,snd_pcm
snd_hwdep 11012 1 snd_hda_intel
videodev 30336 2 ivtv,saa7134
v4l2_common 19712 9 wm8775,cx25840,ivtv,saa7115,msp3400,tuner,saa7134,cx2341x,videodev
v4l1_compat 17668 2 ivtv,videodev
ir_common 34180 2 saa7134,ir_kbd_i2c
firmware_class 11392 9 lirc_pvr150,pcmcia,nxt200x,saa7134_dvb,cx25840,tda1004x,ivtv,tuner_xc2028,b2c2_flexcop
snd 52644 6 snd_hda_intel,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_timer,snd_hwdep
tveeprom 18320 2 ivtv,saa7134
thermal 16540 0
ohci_hcd 23556 0
ehci_hcd 34572 0
i2c_core 23680 30 nvidia,lirc_pvr150,lgdt330x,mt352,dvb_pll,stv0299,nxt200x,saa7134_dvb,wm8775,cx25840,tda1004x,ivtv,saa7115,msp3400,tuner,tea5767,\
tda8290,tda18271,tda827x,tuner_xc2028,tda9887,tuner_simple,mt20xx,tea5761,b2c2_flexcop,i2c_algo_bit,saa7134,ir_kbd_i2c,v4l2_common,tveeprom
pcspkr 6528 0
serio_raw 9348 0
button 10128 0
processor 32296 1 thermal
soundcore 10080 1 snd
rtc_cmos 11168 0
rtc_core 18568 1 rtc_cmos
rtc_lib 6656 1 rtc_core
evdev 12928 0
tsdev 11456 0
usbcore 125448 8 ftdi_sio,usbhid,usb_storage,usbserial,uhci_hcd,ohci_hcd,ehci_hcd
sbp2 23048 0
ohci1394 32432 2 dv1394
ieee1394 83896 4 raw1394,dv1394,sbp2,ohci1394
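
One check that should settle it -- forcedeth can be compiled into the kernel rather than loaded as a module, which would explain why lsmod doesn't list it (I haven't confirmed that's the case here):
Code:
# show which driver is bound to eth0; works for built-in drivers too
readlink /sys/class/net/eth0/device/driver
# the boot messages also name the driver
dmesg | grep -i eth0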

_________________
Marc

The views expressed are my own and do not necessarily reflect the views of my employer.


PostPosted: Tue Feb 24, 2009 12:33 pm 
Joined: Tue Nov 14, 2006 2:55 pm
Posts: 245
Location: South Jersey
Marc,

Are you sure you're at full duplex? Have you looked at the netstat -ni output for errors? Have you tried using a Cat 6 cable and/or removing the hubs/switches?
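
For example (eth0 below is a placeholder for your interface):
Code:
# negotiated speed and duplex
ethtool eth0 | grep -iE 'speed|duplex'
# per-interface RX/TX error counters
netstat -ni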

_________________
R6.04, dual core 3ghz, 3 gig memory, Zotac 8400 passive heat sink dvi/hdmi out video, 500 gig sata, dual tuner hdhomerun, streamzap remote

Abby


PostPosted: Wed Feb 25, 2009 2:29 am 
Joined: Tue Jan 18, 2005 2:07 am
Posts: 1532
Location: California
abigailsweetashoney, good suggestion, but it's not the problem. I started to have problems with network throughput dropping way down, so I put a spare gigabit PCI card into the system, and I am now seeing transfer rates faster than I've ever seen before -- 400Mbit/s on a Samba transfer from the myth box to a Windows Vista box. At this point I'm very suspicious that it's either a hardware issue with the on-board NIC or a driver issue.

It looks like nvidia provides the integrated NIC driver -- if I update to the latest nvidia driver, will that also update me to the latest nvidia NIC driver?

Marc

_________________
Marc

The views expressed are my own and do not necessarily reflect the views of my employer.


PostPosted: Tue Sep 01, 2009 8:56 am 
Joined: Tue Jan 18, 2005 2:07 am
Posts: 1532
Location: California
marc.aronson wrote:
I'm still trying to chase down the "NETDEV WATCHDOG: eth0: transmit timed out" problem as I just got nailed again. [snip -- full post, including the lspci output, quoted above]


After 6 months of chasing this problem on an on-again / off-again basis, I've finally nailed it down to two discrete issues:

1. A bad coupler between two cables was causing the problem to occur with increasing frequency -- as often as every few minutes under load. It was located in a vulnerable spot, and I suspect it got "bongo'ed" during a cleaning of that room. I replaced it with a new coupler, and the problem frequency dropped to once every 2-3 hours under intense load.

2. I then reduced the "max xmit" and "buffer size" Samba parameters in /etc/samba/smb.conf from 65535 to 8192. I have now been running for 5 hours under load without a problem.

I'm not sure I understand why step #2 made a difference, but all's well that ends well...
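
For reference, the change amounts to these lines in smb.conf (mine are in the [global] section):
Code:
[global]
# reduced from 65535; the larger buffers seemed to choke the NIC under load
max xmit = 8192
buffer size = 8192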

_________________
Marc

The views expressed are my own and do not necessarily reflect the views of my employer.


PostPosted: Tue Sep 01, 2009 2:13 pm 
Joined: Sun Sep 25, 2005 3:50 pm
Posts: 1013
Location: Los Angeles
marc.aronson wrote:
How can I determine which driver was loaded for my integrated ethernet device?


Ok, I know I'm late to the party and it may not matter to you anymore, but if you want to know what ethernet driver is being used for a particular interface:
Code:
[mythtv@mythbox-mbe ~]$ sudo ethtool -i eth0
driver: e1000e
version: 0.3.3.3-k6
firmware-version: 0.15-4
bus-info: 0000:0d:00.0


FWIW, I'm having issues on my main workstation when I transfer large files (25GB-ish) to my backend machine over NFS. The driver for the onboard NIC in my workstation is forcedeth. I was seeing some RX errors on the workstation and noticed that auto-negotiation was selecting 100Mbit/s full duplex when it should be choosing 1000Mbit/s full duplex. I swapped out the NIC yesterday (installed a Realtek card that uses the r8169 driver), but it did not help matters. Speed and duplex with the Realtek NIC were correctly selected (1000/full), and iperf showed stellar throughput (900+ Mbit/s), but real-world file transfers (via cp, mv and rsync) were stalling and taking forever.

Given the info I've stumbled upon in this thread, I may give Samba a try. In the meantime, I found that pulling the large files over NFS (ssh into the server, mount the workstation over NFS and initiate the transfer from the server side) was much more stable and faster than pushing them (workstation to NFS-mounted server), but transfer speeds were nowhere near the 900+ Mbit/s I saw with iperf -- more like a steady 200 Mbit/s.
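
If anyone else wants to poke at this, the usual first knob for NFS bulk throughput is the rsize/wsize mount options -- I haven't verified that they help with the stalls, and the values below are only a starting point:
Code:
# larger NFS block sizes sometimes help bulk transfers
mount -t nfs -o rsize=32768,wsize=32768,hard,intr server:/export /mnt/nfs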

_________________
Mike
My Hardware Profile


PostPosted: Tue Sep 01, 2009 4:27 pm 
Joined: Sun Sep 25, 2005 3:50 pm
Posts: 1013
Location: Los Angeles
mihanson wrote:
Given the info I've stumbled upon in this thread, I may give SAMBA a try. In the meantime, I found that pulling the large files over NFS (ssh into server, mount workstation over NFS and initiate the transfer from the server side) was much more stable and faster than pushing them (workstation to NFS mounted server) but transfer speeds were no where near the 900+ Mbit/s I saw with iperf. More like a steady 200 Mbit/s.

Had a chance to get Samba installed and running. Results are slightly better than my "pull" scenario above: with Samba I can push the large file from workstation to server at about 23-27MB/s, or roughly 175-200Mbit/s. Way better than pushing over NFS, but about the same as pulling.

_________________
Mike
My Hardware Profile


PostPosted: Sat Oct 03, 2009 11:15 am 
Joined: Tue Jan 18, 2005 2:07 am
Posts: 1532
Location: California
marc.aronson wrote:
After 6 months of chasing this problem on an on-again / off-again basis I've finally nailed it down to two discrete issues... [snip -- full post quoted above: the bad coupler, plus reducing "max xmit" and "buffer size" from 65535 to 8192]


And the saga continues. While I did achieve stability when doing Samba-based copies from a Windows box to my mythtv box, I subsequently found the problem occurred frequently when watching recordings stored on the mythtv box and played on my Networked Media Tank (NMT). The NMT uses NFS to mount the mythtv file system.

Many experiments later, I tried adding the "noapic" option to the LILO boot options. Things were stable enough that I have tried putting "max xmit" and "buffer size" back to 65535. I am still running tests, but so far things look good. Of course, I've felt that way before and then had the problem come back.

Does anyone understand why the noapic option might resolve the problem I am seeing?
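
For anyone who wants to try the same experiment: the change is one line in /etc/lilo.conf (the stanza below is generic -- match the image/label to your own), followed by re-running lilo:
Code:
image=/boot/vmlinuz
    label=linux
    append="noapic"
# after editing, reinstall the boot map:
#   lilo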

_________________
Marc

The views expressed are my own and do not necessarily reflect the views of my employer.


PostPosted: Mon Oct 05, 2009 12:47 am 
Joined: Tue Jan 18, 2005 2:07 am
Posts: 1532
Location: California
I ran two 5-hour stress tests and the network remained stable throughout. I restored all buffers to the larger sizes for these tests. During the tests I had the following 3 concurrent jobs running:

1. Samba-based copies between the myth box and a Windows Vista box at gigabit speeds.

2. NFS-based playback from the myth box to a Linux-based NMT (Networked Media Tank) box. The NMT has a 100Mbit/s NIC.

3. FTP-based copies from the myth box to a Linux-based NAS (brand: Airlink101). The Airlink NAS has a 100Mbit/s NIC.

So it looks like the "noapic" option was the key.
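
If anyone wants to see what the option actually changed, compare the NIC's line in /proc/interrupts before and after booting with noapic -- the IRQ should move off the IO-APIC (eth0 is an assumption for the interface name):
Code:
grep -i eth0 /proc/interrupts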

_________________
Marc

The views expressed are my own and do not necessarily reflect the views of my employer.

