Hello all,
my question is: can I apply the code below to just part of the board,
in order to get a "copy" of the board with category 17 and category 3?
See here:
http://www.nukeforums.com/forums/viewforum.php?f=17
http://www.nukeforums.com/forums/viewforum.php?f=3
I look forward to hearing from you.
Let's talk about Perl, since this is a Perl forum:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;
use HTML::LinkExtor;
use HTML::TokeParser;
use URI::URL;
use Data::Dumper; # for show and troubleshooting

my $url = "http://www.nukeforums.com/forums/viewforum.php?f=17";
# LWP::RobotUA requires an agent name and a contact address; both values here are placeholders
my $ua = LWP::RobotUA->new('forum-copy/0.1', 'you@example.com');
my $lp = HTML::LinkExtor->new(\&wanted_links);
my @links;

get_threads($url);

foreach my $page (@links) { # this loops over each link collected from the index
    my $r = $ua->get($page);
    if ($r->is_success) {
        my $stream = HTML::TokeParser->new(\$r->content)
            or die "Parse error in $page: $!";
        # just printing what was collected
        print Dumper get_thread($stream);
        # would instead have database insert statement at this point
    } else {
        warn $r->status_line;
    }
}

sub get_thread {
    my $p = shift;
    my ($title, $name, @thread);
    while (my $tag = $p->get_tag('a','span')) {
        if (exists $tag->[1]{'class'}) {
            if ($tag->[0] eq 'span') {
                if ($tag->[1]{'class'} eq 'name') {
                    $name = $p->get_trimmed_text('/span');
                } elsif ($tag->[1]{'class'} eq 'postbody') {
                    my $post = $p->get_trimmed_text('/span');
                    push @thread, {'name'=>$name, 'post'=>$post};
                }
            } else {
                if ($tag->[1]{'class'} eq 'maintitle') {
                    $title = $p->get_trimmed_text('/a');
                }
            }
        }
    }
    return {'title'=>$title, 'thread'=>\@thread};
}

sub get_threads {
    my $page = shift;
    # fetch the page passed in, feeding each chunk to the link extractor
    my $r = $ua->request(HTTP::Request->new(GET => $page), sub {$lp->parse($_[0])});
    # Expand URLs to absolute ones; assigning to $_ also rewrites the
    # global @links in place, which the loop above relies on
    my $base = $r->base;
    return [map { $_ = url($_, $base)->abs; } @links];
}

sub wanted_links {
    my($tag, %attr) = @_;
    return unless exists $attr{'href'};
    return if $attr{'href'} !~ /^viewtopic\.php\?t=/;
    push @links, $attr{'href'}; # only the href, not every attribute value
}
If you have the necessary modules installed and run it from the command line, you'll see output such as the following:
$VAR1 = {
  'thread' => [
    {
      'post' => 'Hello, I\'m pretty new to PHPNuke. I\'ve got my site up and running great! I\'m now starting to make modifications, add modules etc. I\'m using the most recent RavenPHP76. I want to display the 5 most recent forum posts at the top of the forum page. I\'m not sure if this functionality is built in, if so, how to activate. Or if there is a module or block made to do this. I looked at Raven\'s Collapsing Forum block but wasn\'t crazy about the format, and I don\'t want it to be collapsable. Thanks! mopho',
      'name' => 'mopho'
    },
    {
      'post' => 'hi there',
      'name' => 'sail'
    },
    {
      'post' => 'thanks for asking this; :not very sure if i got you right; Do you want to have a feed of the last forumthreads? guess the easiest way is to go to raven and ask how he did it. hth sail.',
      'name' => 'sail'
    },
    {
      'post' => 'Thanks. i found what I was looking for. It wasn\'t so easy to find! It\'s called glance_mod. mopho',
      'name' => 'mopho'
    },
    {
      'post' => 'hi there thx',
      'name' => 'sail'
    },
    {
      'post' => 'it sound interesting - i will have also a look i google after it - and try to find out more regards sailor',
      'name' => 'sail'
    }
  ],
  'title' => 'Recent Forum Posts Module'
};
This is really preliminary. It just grabs the basic text from the threads and doesn't handle quoted text correctly yet. I don't think that would be hard to fix. There are many parsing approaches that can be taken in Perl; I just don't have more time tonight.
You obviously also have to set up a database to capture information you want to store.
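As a rough idea of what the database step could look like, here is a minimal sketch that stores one thread in the shape get_thread returns. It assumes DBI and DBD::SQLite are installed and uses an in-memory database; the table name, the column names and the store_thread helper are all made up for illustration and are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is available; ":memory:" keeps the demo self-contained.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });

# Hypothetical schema: one row per post, ordered by position within its thread.
$dbh->do(q{
    CREATE TABLE posts (
        id       INTEGER PRIMARY KEY,
        title    TEXT NOT NULL,
        position INTEGER NOT NULL,
        name     TEXT,
        post     TEXT
    )
});

# Insert every post of one thread inside a single transaction.
# Returns the number of posts stored.
sub store_thread {
    my ($dbh, $thread) = @_;
    my $sth = $dbh->prepare(
        'INSERT INTO posts (title, position, name, post) VALUES (?, ?, ?, ?)');
    my $position = 0;
    $dbh->begin_work;
    for my $entry (@{ $thread->{thread} }) {
        $sth->execute($thread->{title}, $position++,
                      $entry->{name}, $entry->{post});
    }
    $dbh->commit;
    return $position;
}

# Sample data in the same structure that Data::Dumper printed above.
my $sample = {
    title  => 'Recent Forum Posts Module',
    thread => [
        { name => 'sail', post => 'hi there' },
        { name => 'sail', post => 'hi there thx' },
    ],
};

my $stored = store_thread($dbh, $sample);
my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM posts');
print "stored $stored posts, table now has $count rows\n";
```

In the scraper, the call to store_thread would take the place of the print Dumper line. The placeholders in the prepared statement keep quotes in post text from breaking the SQL.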
Additionally, I only looped over the first index page. I didn't set up a loop to grab each of the index pages, but I consider that trivial.
Continue with Perl, or use some other language. There will not be a ready-made product that takes exactly what you want from the web. You will have to make a little effort no matter what method you use.
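The loop over the index pages could be sketched roughly like this. It rests on two assumptions that are not confirmed anywhere in this thread: that the board pages its forum index with a "start" offset in the URL, as phpBB 2 does, and that it shows 50 topics per page. The index_page_urls helper is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the index-page URLs for one forum.
# Assumptions: "start" offset paging (phpBB 2 style), 50 topics per page.
sub index_page_urls {
    my ($forum_id, $page_count, $per_page) = @_;
    $per_page = 50 unless defined $per_page;
    my $base = 'http://www.nukeforums.com/forums/viewforum.php';
    return map { sprintf '%s?f=%d&start=%d', $base, $forum_id, $_ * $per_page }
           0 .. $page_count - 1;
}

# First three index pages of forum 17
my @urls = index_page_urls(17, 3);
print "$_\n" for @urls;
```

Each of these URLs would then be handed to get_threads in turn. The number of pages has to come from somewhere else, for example from the pagination links on the first index page.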
This is super; it is obviously a great idea. Now my question is:
can I apply the code to just part of the board, in order to get
a "copy" of the board with category 17 and category 3?
http://www.nukeforums.com/forums/viewforum.php?f=17
http://www.nukeforums.com/forums/viewforum.php?f=3
Can this be done with the code written above?
Well, I am very happy.
The demonstration is very impressive and makes me think that Perl is very, very powerful.
I will try to harvest these categories of the forum (only these two categories are of
interest to me, nothing more):
http://www.nukeforums.com/forums/viewforum.php?f=3
http://www.nukeforums.com/forums/viewforum.php?f=17
I want to discuss a small change here. The minimal change consists of changing
my $url = "http://www.nukeforums.com/forums/viewforum.php?f=17";
# agent name and contact address are placeholders; LWP::RobotUA requires both
my $ua = LWP::RobotUA->new('forum-copy/0.1', 'you@example.com');
my $lp = HTML::LinkExtor->new(\&wanted_links);
my @links;
get_threads($url);
foreach my $page (@links) {
    ...
}
to
# agent name and contact address are placeholders; LWP::RobotUA requires both
my $ua = LWP::RobotUA->new('forum-copy/0.1', 'you@example.com');
my $lp = HTML::LinkExtor->new(\&wanted_links);
my @links;
foreach my $forum_id (17, 3) {
    my $url = "http://www.nukeforums.com/forums/viewforum.php?f=$forum_id";
    @links = (); # yuck!
    my $links = get_threads($url);
    foreach my $page (@$links) {
        ...
    }
}
As this shows, the trouble lies in the global variable @links:
we're forced to provide and reset a variable that should be local to get_threads. Here's the fix:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;
use HTML::LinkExtor;
use HTML::TokeParser;
use URI::URL;
use Data::Dumper; # for show and troubleshooting

# agent name and contact address are placeholders; LWP::RobotUA requires both
my $ua = LWP::RobotUA->new('forum-copy/0.1', 'you@example.com');

foreach my $forum_id (17, 3) {
    my $url = "http://www.nukeforums.com/forums/viewforum.php?f=$forum_id";
    my $links = get_threads($url);
    foreach my $page (@$links) {
        ...
    }
}

sub get_thread {
    ...
}

sub get_threads {
    my $page = shift;
    my @links; # now private to each call
    my $lp = HTML::LinkExtor->new(sub {
        my($tag, %attr) = @_;
        return unless exists $attr{'href'};
        return if $attr{'href'} !~ /^viewtopic\.php\?t=/;
        push @links, $attr{'href'}; # only the href, not every attribute value
    });
    # use the argument; $url is not visible inside this sub
    my $request = HTTP::Request->new(GET => $page);
    my $response = $ua->request($request, sub {$lp->parse($_[0])});
    # Expand URLs to absolute ones
    my $base = $response->base;
    return [ map { url($_, $base)->abs } @links ];
}
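To check the link filter without touching the network, something like the following could be used. It is only a sketch: it assumes HTML::LinkExtor and URI are installed, the sample HTML is invented, and topic_links is a hypothetical helper that applies the same viewtopic.php filter to a string instead of a live response.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;
use URI;

# Collect the topic links from a chunk of HTML and make them absolute.
# @links lives inside the sub; the callback closes over it.
sub topic_links {
    my ($html, $base) = @_;
    my @links;
    my $parser = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        return unless $tag eq 'a' && defined $attr{href};
        return unless $attr{href} =~ /^viewtopic\.php\?t=/;
        push @links, URI->new_abs($attr{href}, $base)->as_string;
    });
    $parser->parse($html);
    $parser->eof;
    return \@links;
}

# Invented sample markup: two topic links and one link that must be skipped.
my $html = <<'HTML';
<a href="viewtopic.php?t=101" class="topictitle">First topic</a>
<a href="index.php">Forum index</a>
<a href="viewtopic.php?t=102">Second topic</a>
HTML

my $base  = 'http://www.nukeforums.com/forums/viewforum.php?f=17';
my $links = topic_links($html, $base);
print "$_\n" for @$links;
```

Because @links is declared inside the sub, two calls for two different forums cannot mix their results, which is the point of moving it out of file scope.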
Discussion:
With these changes I am able to run the code against both full categories:
http://www.nukeforums.com/forums/viewforum.php?f=3
http://www.nukeforums.com/forums/viewforum.php?f=17
Question: am I able to get the results for the above-mentioned forum categories, and can I get the forum threads that are stored in those two forums? I look forward to hearing from you, and from all the other readers here.
Regards
:D