LinHES Forums • View topic - Windows-based North America Scraper

All times are UTC - 6 hours

Windows-based North America Scraper

whitepines

Post subject: Windows-based North America Scraper

Posted: Mon Sep 03, 2007 7:22 pm

Joined: Tue Apr 04, 2006 3:47 pm

Posts: 43

I have created a Windows-based XML TV scraper that uses TVNow as its source.

Enter your zip code, click Initialize, and select your data source. Then select a location for the output XML file and click Scrape.

I have not yet checked the MythTV import, as I still have some data from DataDirect. As soon as my DataDirect data is gone, I will switch over to this source.

You can download an executable and the source in a zip file here:
http://www.pearsoncomputing.net/public_ ... craper.zip

Let me know how it works for you!

Tim

EDIT: I would like to mention a few things that I forgot in the first post:
1. PLEASE, PLEASE do NOT abuse this tool!!! There is a reason that I only allow it to scrape out to 7 days; we do not want to force them to take anti-scraping measures!!!
2. This is currently alpha-level software--I only put it out here as an alternative to a paid guide service, since no one else has posted anything yet.
3. If you find a bug, see if you can track it down in the VB source and fix it. If you don't know VB, then please let me know about it and I will do my best to fix it.
4. As this is a screen scraper, any data gathered may or may not be accurate, and the data source may break this tool at any time.

Thanks!

EDIT2: It looks like someone beat me to this idea, but they are using a different data source (Yahoo TV) which does not support HDTV broadcasts in my area. The data source that I am using (TVNow) supports HDTV broadcast listings in my area.

Last edited by whitepines on Tue Sep 04, 2007 6:28 pm, edited 1 time in total.

whitepines

Post subject:

Posted: Tue Sep 04, 2007 12:57 pm

Joined: Tue Apr 04, 2006 3:47 pm

Posts: 43

I have fixed a couple of critical bugs and have updated the file above. If you already downloaded this prior to 9/4/07, please download the updated version.

Also note that DST is in effect right now, so you will need to add one hour to your current GMT offset; e.g., -0600 (Central Standard Time) becomes -0500.
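The adjustment described above is plain offset arithmetic. Here is a minimal sketch, assuming the timezone field takes a +HHMM or -HHMM string; the function name add_dst_hour is made up for illustration and is not part of the scraper:

```python
def add_dst_hour(offset: str) -> str:
    """Shift a +HHMM / -HHMM offset one hour forward for daylight saving time."""
    sign = -1 if offset[0] == "-" else 1
    digits = offset.lstrip("+-")
    # Convert to signed minutes, then add the one-hour DST shift.
    total_minutes = sign * (int(digits[:2]) * 60 + int(digits[2:])) + 60
    out_sign = "-" if total_minutes < 0 else "+"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"{out_sign}{hours:02d}{minutes:02d}"


if __name__ == "__main__":
    # Central time: standard offset in, daylight offset out.
    print(add_dst_hour("-0600"))
```

This only covers the period when DST is in effect; the standard offset should be used unchanged for the rest of the year.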

Thanks!

Tim

alga

Post subject: Scrapper

Posted: Tue Sep 04, 2007 1:04 pm

Joined: Mon Feb 20, 2006 10:03 am

Posts: 1

Whitepines,

Thanks for posting the scraper.

Unfortunately, it does not seem to work with my Canadian postal code (I am located in Montreal).

whitepines

Post subject:

Posted: Tue Sep 04, 2007 2:49 pm

Joined: Tue Apr 04, 2006 3:47 pm

Posts: 43

Alga,

I just checked the data source (www.tvnow.com) and it looks like they only have data for the United States.

Sorry about that!

Tim

whitepines

Post subject:

Posted: Tue Sep 04, 2007 7:42 pm

Joined: Tue Apr 04, 2006 3:47 pm

Posts: 43

I went ahead and tested MythTV import, and it worked beautifully. Unless you are heavily reliant upon the category (drama, talk show, etc.) feature, you shouldn't even know the data source has changed.

You may ask why I didn't wait for the DataDirect data that I have to expire? Well, it would seem that as a parting gift to those who didn't disable the DataDirect service, they pushed a bunch of incorrect guide data to my machine (wrong movie names, etc.)! I was not happy... :evil:

Tim

EDIT: OK, here's some info on how to integrate this into your Myth box with minimum disruption to its usage:
(Note that I will be assuming fluent usage of Windows and Linux here)
1.) On the Windows machine, set a scheduled task to run the grabber. Make sure that you have initialized the scraper and selected a data source at least once (also make sure to do this if you clear your cookies!), and also check the Autorun and Exit checkbox.
2.) On that same machine, create another task that runs 1 hour later to move the XML file to your MythTV box via SCP or any other method.
3.) On your MythTV box, set a scheduled task to run this command. The source ID can be found in the mythconverg MySQL database:

Code:

mythfilldatabase --file <<source id>> 1 <<XML file location>>

4.) Edit the file /usr/share/mythtv/mythweb/modules/tv/canned_searches.conf.php (it may be in a different location; run locate canned_searches.conf.php to find it). Replace

Code:

t('Movies')
        => 'category_type="movie"',

with

Code:

t('Movies')
        => 'category="Movies"',

(This is just to fix the Canned:movies search; I've gotten rather used to it!)
6.) Make sure that your XML channel identifiers consist of the channel number like MythTV does it; i.e. for SDTV: 13, 25, 16, etc. For HDTV: 13_1, 15_2, 17_1, etc. This information can be edited in the Settings tab of MythWeb.
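Two of the steps above lend themselves to a quick check before scheduling anything: the import command from step 3 and the channel identifier convention in the last step. The following is a rough sketch, not part of the scraper. It assumes the output file follows the usual XMLTV layout of channel elements carrying an id attribute. The function names, the sample data, the source ID 1, and the path /tmp/listings.xml are all placeholders:

```python
import re
import xml.etree.ElementTree as ET

# Convention described above: "13" for SDTV, "13_1" for HDTV.
CHANNEL_ID = re.compile(r"\d+(_\d+)?")


def nonconforming_channel_ids(xmltv_text: str) -> list[str]:
    """Return channel ids that do not look like 13 or 13_1."""
    root = ET.fromstring(xmltv_text)
    ids = [channel.get("id", "") for channel in root.iter("channel")]
    return [cid for cid in ids if not CHANNEL_ID.fullmatch(cid)]


def import_command(source_id: int, xml_path: str) -> list[str]:
    """Build the step 3 command line, keeping its literal "1" argument."""
    return ["mythfilldatabase", "--file", str(source_id), "1", xml_path]


# Made-up sample data for illustration only.
SAMPLE = """<tv>
  <channel id="13"><display-name>13</display-name></channel>
  <channel id="13_1"><display-name>13-1</display-name></channel>
  <channel id="wxyz.example"><display-name>WXYZ</display-name></channel>
</tv>"""

if __name__ == "__main__":
    print(nonconforming_channel_ids(SAMPLE))
    print(import_command(1, "/tmp/listings.xml"))
```

Any ids the first function reports would need to be renamed in the Settings tab of MythWeb, as described above, before the import matches them to channels.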

If you have questions, don't hesitate to contact me!

Last edited by whitepines on Tue Sep 04, 2007 8:53 pm, edited 1 time in total.

whitepines

Post subject:

Posted: Tue Sep 04, 2007 8:52 pm

Joined: Tue Apr 04, 2006 3:47 pm

Posts: 43

OK, updates are done. Autorun is now an option.

Grab the updated version (v1.2) from the same link above.

whitepines

Post subject:

Posted: Wed Sep 05, 2007 10:30 am

Joined: Tue Apr 04, 2006 3:47 pm

Posts: 43

I would like to throw an idea out here:

The problem with screen scrapers, historically, has been that you have tens of thousands of people all trying to download the same data over and over.

What if I were to create a free, community-based TV listings site that would allow users to manually enter data or upload scrape results? This way, we could have multiple sources for the data, but MythTV would only have to pull an XML file off of the project website. It would also reduce scraping loads considerably, as only one person per zip code would have to scrape and upload to the site. The XML files are small (< 500 KB in most cases for a few days of data), so it would not create much network load to transfer them.

If you have any thoughts on the matter, please share them. I will look into a way to do this using MySQL and PHP, and I can host the system within certain bandwidth limits.
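To make the idea concrete, here is a small in-memory sketch of the "one upload per zip code" repository. It is only a stand-in for the MySQL/PHP system mentioned above. The class and method names are invented for illustration, and the zip code in the usage lines is just an example:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Upload:
    zip_code: str
    xmltv: str
    uploaded_at: datetime


class ListingsRepository:
    """Keeps only the newest uploaded XML file for each zip code."""

    def __init__(self) -> None:
        self._latest = {}  # maps zip code -> Upload

    def upload(self, zip_code: str, xmltv: str, uploaded_at: datetime) -> bool:
        """Store the upload unless a newer or equally recent one exists."""
        current = self._latest.get(zip_code)
        if current is not None and current.uploaded_at >= uploaded_at:
            return False
        self._latest[zip_code] = Upload(zip_code, xmltv, uploaded_at)
        return True

    def fetch(self, zip_code: str) -> Optional[str]:
        """Return the stored XML for a zip code, or None if nobody uploaded."""
        entry = self._latest.get(zip_code)
        return entry.xmltv if entry is not None else None


if __name__ == "__main__":
    repo = ListingsRepository()
    repo.upload("60601", "<tv/>", datetime(2007, 9, 5, 10, 30))
    print(repo.fetch("60601"))
```

With a structure like this, every MythTV box in a zip code would call fetch, and only the one uploader would ever touch the listings source.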

I am especially interested in the legality of this type of thing in North America. I think it should be legal, as I have not seen any anti-scrape notices on the source I use. Since I would simply be a repository for data that users have uploaded, I believe I could legally host it as long as I remove data if formally asked to by the data owner.

Thanks!

Tim

Yeraze

Post subject:

Posted: Wed Sep 05, 2007 11:34 am

Joined: Thu May 11, 2006 7:42 pm

Posts: 34

I think it's a great idea, but legally I think you're gonna get ambushed. Check this:

Quote:

Proprietary Rights of TV Guide.
Except for User Submissions, TV Guide owns or licenses all rights in the tvguide.com website and all content and services offered on or in connection with TV Guide or tvguide.com. tvguide.com contains copyrighted material, trademarks, and other proprietary information including text, software, photos, video, graphics, music and sound. TV Guide owns the copyright in the selection, coordination, arrangement and enhancement of such content, as well as in all content original to it. Each third party content provider owns the copyright in content original to it. You agree that you will not circumvent, disable or otherwise interfere with security related features of the website or features that prevent or restrict use or copying of any content on the website. Except for that information which is in the public domain or for which you have been given written permission by TV Guide or the copyright owner, you may not copy, modify, publish, transmit, distribute, perform, display, participate in the transfer or sale, create derivative works, or in any way exploit the content of tvguide.com or any portion thereof.

(TVGuide is the source for the data used on tv-now). That last clause is probably what they'll use to nail you in the US. I doubt you'd be able to successfully argue that the data's in the Public Domain.

Also, while this would help some, it would really only help in highly populated areas. In Los Angeles, it would probably reduce the bandwidth. But here in Mississippi, I'd probably still be the only person pulling listings for Jackson.

whitepines

Post subject:

Posted: Wed Sep 05, 2007 12:46 pm

Joined: Tue Apr 04, 2006 3:47 pm

Posts: 43

OK, thanks for the info. I was hoping I could actually do this.
