Hi guys, I'll try to keep it as short as possible.
Basically, I love the webcomic xkcd. I've recently been browsing around for a downloader and found one written in Perl that works great. It even grabs the alt-text (the text that pops up when you hover your mouse over the comic for a second or so), which is a feature I was desperately looking for.
I thought it was all too good to be true: the program did everything I wanted, and did it well. It would download each comic to its own folder and save the alt-text in a text file in the same folder.
It was only after looking at the output that I found the problems:
1) (Main) When a comic uses an apostrophe, whether in the title or in the alt-text, the script does not write an apostrophe. Instead of saving a file called, say, "It's Hot Today.png", it saves it under that name with the apostrophe replaced by this:
& # 3 9 ;
NOTE: I had to put a space between each character, otherwise the forum would convert it into an apostrophe.
The same thing happens in the alt-text files if the alt-text contains an apostrophe.
Is it maybe an issue with the operating system (Ubuntu 10.10, 32-Bit)?
Does anyone know what I can change either in my system, or in the script, to rectify this problem?
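For reference, the sequence above (without the spaces) is the HTML numeric character reference for an apostrophe, so my guess is that the script is writing the page's raw markup without decoding it. The behaviour I'm hoping for is roughly what this sketch does. Note that decode_basic_entities is a made-up helper name, not something in the script, and it only handles numeric references plus five common named ones:

```perl
use strict;
use warnings;

# Small table of named entities; anything not listed is left untouched.
my %named = (amp => '&', lt => '<', gt => '>', quot => '"', apos => "'");

# Hypothetical helper (not part of the original script): decodes
# decimal/hex character references and the named entities above
# in a single pass, so nothing gets decoded twice.
sub decode_basic_entities {
    my ($text) = @_;
    $text =~ s{&(#[xX][0-9a-fA-F]+|#\d+|\w+);}{
        my $ref = $1;
        $ref =~ /^#[xX](.+)/  ? chr(hex($1))
      : $ref =~ /^#(\d+)/     ? chr($1)
      : exists $named{$ref}   ? $named{$ref}
      :                         "&$ref;"
    }ge;
    return $text;
}

print decode_basic_entities("It&#39;s Hot Today"), "\n";
```

If that is the right idea, I assume it would be applied to $title and $alt right after they are captured in getComicData. I also gather that the HTML::Entities module from CPAN has a decode_entities function that covers the full set of named entities, which may be a better option than a hand-rolled helper.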
The second problem:
2) About 90% of the comics on www.xkcd.com are in .png format, but about 10% are not. The problem is that the program always saves the file as "x.png", even if the actual file is a .jpg.
Being the absolute beginner I am, I have no idea how to change this from "save all images with the .png extension" to "save each image in its original format".
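To illustrate what I'm after: the image URL the script captures in $currentUrl already ends in the real extension, so I imagine the file name could be built from that instead of a hard-coded ".png". Here is a rough sketch of the idea. extension_from_url is a hypothetical helper, and the URL and title are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns the
# lower-cased extension at the end of a URL, ignoring any query string,
# and falls back to "png" when no extension is found.
sub extension_from_url {
    my ($url) = @_;
    return $url =~ /\.(\w+)(?:\?.*)?$/ ? lc($1) : 'png';
}

my $url      = "http://imgs.xkcd.com/comics/example.jpg";  # made-up URL
my $title    = "Example";                                  # made-up title
my $filename = "$title." . extension_from_url($url);

print "$filename\n";
```

In the script itself, I assume this would replace the "$title.png" in the open call for the image file.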
If anyone could provide any help at all, or even just give their opinion on potential solutions, it would be greatly appreciated.
The full source code of the application is listed below.
Again, I am just starting out in the world of Perl and have next to no experience with it or its syntax.
Thank you so much in advance for your help,
Liam.
#!/usr/bin/perl
use LWP::Simple;
# use Smart::Comments;
## Objectives ##
# Download all comics from xkcd.com
# Ability to download new comics
# Download ALT text
# Saved in: ~/Desktop
# Set Specifics
$sitePrefix = "http://xkcd.com/";
## Path to main XKCD directory ##
$path = "$ENV{HOME}/Desktop";
mkdir "$path/XKCD", 0755 or print "XKCD Directory Exists\n";
chomp($path = "$path/XKCD");
$d = get($sitePrefix);
if ($d =~ /http:\/\/xkcd.com\/(\d+)\//) {
    $current = $1;
}
# Obtains all individual comic data
sub getComicData {
    my $siteData = get("$sitePrefix$current/");
    my @data = split /\n/, $siteData;
    foreach (@data) {
        if (/http:\/\/xkcd.com\/(\d+)\//) {
            $current = $1;
        }
        if (/src="(http:\/\/imgs.xkcd.com\/comics\/.+\.\w{3})"/) {
            $currentUrl = $1;
            if (/alt="(.+?)"/) {
                $title = $1;
                $title = "House of Pancakes" if $current == 472; # Color title on comic 472 with weird syntax
            }
            if (/title="(.+?)"/) { #title commonly know as 'alt' text
                $alt = $1;
            }
        }
    }
}
chdir "$path" or die "Cannot change directory: $!";
&getComicData();
while ( get("$sitePrefix$current/")){ ### Writing Files $current: $title
    print "Writing Files $current: $title\n";
    # Create directories for individual comics
    mkdir "$current $title", 0755 or die "Previously Downloaded";
    chdir "$path/$current $title" or die "Cannot change directory: $!";
    # Save image file
    $image = get($currentUrl);
    open my $IMAGE, '>>', "$title.png"
        or die "Cannot create file!";
    print $IMAGE $image;
    close $IMAGE;
    # Save alt text
    open my $TXT, '>>', "$title ALT.txt"
        or die "Cannot create file!";
    print $TXT $alt;
    close $TXT;
    chdir "$path" or die "Cannot change directory: $!";
    $current--;
    # Check for non existent 404 comic
    $current-- if $current == 404;
    &getComicData();
}
# End Gracefully
print "Download Complete\n"