LinHES Forums
http://forum.linhes.org/

R5.5 backend suddenly starts to crash (segfault) SOLVED!
http://forum.linhes.org/viewtopic.php?f=6&t=20195

Author:  cahlfors [ Sat Aug 01, 2009 6:50 am ]
Post subject:  R5.5 backend suddenly starts to crash (segfault) SOLVED!

Hi all,
After a year's faithful service, my R5.5 installation started acting up tonight at around midnight. :shock: The only thing I was doing out of the ordinary was a DVD archive creation for the first time on this hardware, which went successfully but didn't complete until 6am this morning (it's normal to take very long) - long after mythbackend had crashed.

Here is output from the syslog:
Quote:
Aug 1 14:26:41 Fido mysqld_safe[14487]: started
Aug 1 14:26:41 Fido mysqld[14490]: 090801 14:26:41 InnoDB: Started; log sequence number 0 102512
Aug 1 14:26:41 Fido mysqld[14490]: 090801 14:26:41 [Note] /usr/sbin/mysqld: ready for connections.
Aug 1 14:26:41 Fido mysqld[14490]: Version: '5.0.32-Debian_7etch1' socket: '/var/run/mysqld/mysqld.sock' port: 3306 Debian etch distribution
Aug 1 14:26:42 Fido /etc/mysql/debian-start[14527]: Upgrading MySQL tables if necessary.
Aug 1 14:26:43 Fido /etc/mysql/debian-start[14544]: Checking for crashed MySQL tables.
Aug 1 14:26:44 Fido kernel: eth0: freeing mc frame.
Aug 1 14:27:01 Fido /USR/SBIN/CRON[14581]: (mythtv) CMD (/usr/local/bin/babysit_backend.sh >>/var/log/mythtv/babysit_backend.log 2>&1)
Aug 1 14:27:34 Fido kernel: smb_proc_readdir_long: error=-2, breaking
Aug 1 14:28:01 Fido /USR/SBIN/CRON[14631]: (mythtv) CMD (/usr/local/bin/babysit_backend.sh >>/var/log/mythtv/babysit_backend.log 2>&1)
Aug 1 14:28:59 Fido kernel: mythbackend[14568]: segfault at aecb7f90 eip b59ec26c esp aecb7f94 error 6

The mythbackend.log doesn't say much (?):
Quote:
2009-08-01 14:26:43.373 Using runtime prefix = /usr
2009-08-01 14:26:43.380 Empty LocalHostName.
2009-08-01 14:26:43.381 Using localhost value of Fido
2009-08-01 14:26:43.409 New DB connection, total: 1
2009-08-01 14:26:43.420 Connected to database 'mythconverg' at host: Fido
2009-08-01 14:26:43.426 Closing DB connection named 'DBManager0'
2009-08-01 14:26:43.429 Connected to database 'mythconverg' at host: Fido
2009-08-01 14:26:43.435 New DB connection, total: 2
2009-08-01 14:26:43.437 Connected to database 'mythconverg' at host: Fido
2009-08-01 14:26:43.448 Current Schema Version: 1214
Starting up as the master server.
2009-08-01 14:26:43.482 New DB connection, total: 3
2009-08-01 14:26:43.485 Connected to database 'mythconverg' at host: Fido
2009-08-01 14:26:44.330 TVRec(16) Error: Problem finding starting channel, setting to default of '3'.
2009-08-01 14:26:44.333 ChannelBase(16) Error: InitializeInputs():
Could not get inputs for the capturecard.
Perhaps you have forgotten to bind video
sources to your card's inputs?
2009-08-01 14:26:44.339 New DB scheduler connection
2009-08-01 14:26:44.341 Connected to database 'mythconverg' at host: Fido
2009-08-01 14:26:45.563 Main::Registering HttpStatus Extension
2009-08-01 14:26:45.565 mythbackend version: 0.21.20080304-1 www.mythtv.org
2009-08-01 14:26:45.566 Enabled verbose msgs: important general
2009-08-01 14:26:45.571 AutoExpire: CalcParams(): Max required Free Space: 15.0 GB w/freq: 15 min
2009-08-01 14:26:47.388 Reschedule requested for id -1.
2009-08-01 14:26:53.104 Scheduled 324 items in 5.7 = 0.36 match + 5.32 place
2009-08-01 14:26:53.115 AUTO-Startup assumed
2009-08-01 14:26:57.952 UPnpMedia: BuildMediaMap VIDEO scan starting in :/mnt:
2009-08-01 14:28:04.350 AutoExpire: CalcParams(): Max required Free Space: 15.0 GB w/freq: 15 min

(There is a usb capture device attached that I never managed to get working. There is an error pertaining to this - TVRec16.)

The backend only lives for about two minutes before crashing. I've tried the mysql repair procedure described in the "repairing broken MySQL tables" thread. It found no errors. I also tried the optimize_db.sh. No reported errors that I can see.

What steps should I take to diagnose and correct?

Thanks for any help,
/Chris
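
To see how often the backend is going down, the kernel's segfault lines can be pulled out of the syslog. This is only a rough sketch: the function name `backend_segfaults` is made up for illustration, and the field layout is assumed from the excerpt quoted above (month, day, time, host, `kernel:`, then `mythbackend[PID]:`).

```shell
# Print the timestamp and PID of each mythbackend segfault found in
# syslog-style input on stdin. Assumed line layout (from the excerpt above):
#   Mon DD HH:MM:SS host kernel: mythbackend[PID]: segfault at ...
backend_segfaults() {
    awk '$5 == "kernel:" && $6 ~ /^mythbackend\[[0-9]+\]:$/ && $7 == "segfault" {
        pid = $6
        gsub(/[^0-9]/, "", pid)      # keep only the digits of the PID
        print $1, $2, $3, "pid=" pid
    }'
}

# Usage (the log path is an assumption; adjust to where your syslog lives):
#   backend_segfaults < /var/log/syslog
```

Comparing these timestamps with the startup lines in mythbackend.log would show whether the backend always dies about two minutes after starting.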

Author:  tjc [ Sat Aug 01, 2009 9:21 am ]
Post subject: 

Check all the usual things you can run out of (disk space, inodes, RAM, CPU bandwidth, ...) using top and the rrd log pages. Also pay attention to the temperatures. It is that time of year in the northern hemisphere, and I've been finding myself making a bunch of minor adjustments to keep the production box from overheating.

If none of those are problems, my next guess would be a hardware problem. Open the box up and vacuum or dust it out (carefully; you don't want to suck components off the boards), as excess dust can cause thermal problems and even shorts. Also make sure that everything is still seated properly. Next run memcheck for at least several hours, and check your disk to make sure it's not failing (possibly via the SMART logs, although they're not definitive).
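
The disk space and inode checks mentioned above can be sketched with `df`. The helper name `over_threshold` and the 90% limit are arbitrary choices for illustration, and the parsing assumes POSIX-style `df -P` output with mount points that contain no spaces.

```shell
# Print "mountpoint usage%" for every filesystem at or above a usage limit.
# Reads `df -P`-style output on stdin; column 5 is the usage percentage and
# column 6 the mount point. The first line (the header) is skipped.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)            # strip the trailing percent sign
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Usage:
#   df -P  | over_threshold 90     # disk space
#   df -Pi | over_threshold 90     # inodes (GNU df)
```

RAM and CPU are easier to eyeball in top, as suggested above.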

Author:  cahlfors [ Sat Aug 01, 2009 1:15 pm ]
Post subject: 

How very odd! I see no signs of hardware issues, blew the dust out of the system, powered off and disconnected - no improvement.

The backend was very busy when started - I wonder what it was doing? Thinking it might have something to do with autoexpire (large file delete), I manually deleted the first file in the autoexpiry list from the menu. It went quickly and after that the backend stood for a little longer.

But it looks more like it has to do with Samba. To run archive, I had set up a Samba mount. That's when the trouble started. I unmounted it again an hour ago and the backend is still standing.

This is the line from /etc/fstab, which is the exact line I'm using successfully on my frontends:
Code:
//NAPOLEON/data /mnt/NAPOLEON smbfs lfs,username=user,password=secret,uid=mythtv,gid=mythtv 0 0


I'm still puzzled, but the situation is better since the backend is still running. :?

/Chris
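
To confirm whether any smbfs mounts are still active before restarting the backend, the kernel's mount table can be filtered by filesystem type. This is a minimal sketch; `mounts_of_type` is a hypothetical helper name, and it assumes the usual /proc/mounts layout (device, mount point, type, options, dump, pass).

```shell
# List the mount points of a given filesystem type.
# Reads /proc/mounts-style lines on stdin:
#   device mountpoint fstype options dump pass
mounts_of_type() {
    awk -v fstype="$1" '$3 == fstype { print $2 }'
}

# Usage:
#   mounts_of_type smbfs < /proc/mounts
#   umount /mnt/NAPOLEON      # as root, for each mount point listed
```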

Author:  tjc [ Sat Aug 01, 2009 1:44 pm ]
Post subject: 

Well that is the first rule of debugging. It's also the 2nd and 3rd rule.

Rule #1) What changed? What's different?
Rule #2) No really, what changed or what's different?
Rule #3) ...

Debugging "war" story time...

I was talking to one of our hardware engineers earlier in the week, who had just spent several *very* long days debugging a network access problem at a customer site. There was a major problem with about 1/4 to 1/3 of the access points. The hardware engineers had asked over and over what was different about those APs, was there any pattern or commonality, and repeatedly been told that there wasn't anything different or common to the failing APs... Until, after _days_ of this and just hours before we flew two people out to the site, the customer finally revealed that all the ones that were working were plugged into one switch, and all the ones that were failing were plugged into another...

Duh-oh!

Author:  cahlfors [ Sun Aug 02, 2009 7:46 am ]
Post subject: 

Googling on the issue turns up no answers, but there are comments that suggest that it might have to do with the fact that the Samba project has been "abandoned" and "replaced" with CIFS. I knew that, but thought that it was just a name change in the same project - the developers being the same and all.

So, my Samba mount line in /etc/fstab that I have been using since I set KnoppMyth up for the first time:
Code:
//NAPOLEON/data /mnt/NAPOLEON smbfs lfs,username=user,password=secret,uid=mythtv,gid=mythtv 0 0

is in fact using abandoned Samba code.

I'm now trying this instead:
Code:
//NAPOLEON/data /mnt/NAPOLEON cifs username=user,password=secret,uid=mythtv,gid=mythtv 0 0

Note that "smbfs" is now replaced with "cifs", and "lfs" (large file support, i.e. >2GiB) is no longer needed (or so I assume - there are no such switches in mount.cifs).

Let's see if this works better!

I still don't have any clue as to why an old Samba mount would break the backend, though.

/Chris
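
For anyone with several such entries, the same edit (swap "smbfs" for "cifs" and drop "lfs") can be sketched as a filter over fstab-style lines. The function name `smbfs_to_cifs` is made up here, and falling back to "defaults" when "lfs" was the only option is an assumption. Converted lines come out with single spaces between fields; other lines pass through untouched.

```shell
# Rewrite smbfs entries in fstab-style input as cifs entries and drop the
# "lfs" option. Comment lines and entries of other types are left as-is.
smbfs_to_cifs() {
    awk '!/^[[:space:]]*#/ && $3 == "smbfs" {
        $3 = "cifs"
        n = split($4, opts, ",")
        out = ""
        for (i = 1; i <= n; i++)
            if (opts[i] != "lfs")
                out = out (out == "" ? "" : ",") opts[i]
        # Assumption: use "defaults" if no options remain
        $4 = (out == "" ? "defaults" : out)
    }
    { print }'
}

# Usage: review the result before replacing anything
#   smbfs_to_cifs < /etc/fstab > /tmp/fstab.new
#   diff /etc/fstab /tmp/fstab.new
```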

Author:  cahlfors [ Mon Aug 03, 2009 3:04 am ]
Post subject: 

Yep, a day later and it's still standing! :)

CIFS instead of Samba seems to be the way to go.

/Chris

Author:  manicmike [ Mon Aug 03, 2009 4:39 pm ]
Post subject: 

cahlfors wrote:
CIFS instead of Samba seems to be the way to go.
/Chris


Hi Chris,

You seem to be confusing samba with smbfs.

SMB means "Server Message Block" (http://en.wikipedia.org/wiki/Server_Message_Block) and is a Microsoft thing. smbfs and cifs allow the Linux kernel to read Windows network file systems, but Samba is a software suite (http://us6.samba.org/samba/).

Samba is not smbfs.

Mike

All times are UTC - 6 hours