I need to retrieve an HTML file from an external site (http://example.com/page.html) and parse it. Parsing it is fine, but I can't find a way to retrieve the file given these constraints:

  • Cannot use cURL
  • Must work on PHP 4.3.9+
  • Must retrieve the file as a HTTP request
  • Must return file as a string

Can anyone suggest something that meets these requirements?

Edit: Running

echo implode('', file('http://www.example.com/'));

returns this:

Warning: file(): php_network_getaddresses: getaddrinfo failed: Name or service not known in /www/test.php on line 4

Warning: file(http://www.example.com/): failed to open stream: No such file or directory in /www/test.php on line 4

Warning: implode(): Bad arguments. in /www/test.php on line 4

With file_get_contents() instead, I now get this:

Warning: file_get_contents(): php_network_getaddresses: getaddrinfo failed: Name or service not known in /www/test.php on line 4

Warning: file_get_contents(http://www.example.com/): failed to open stream: No such file or directory in /www/test.php on line 4

you pointed to a file that does not exist, check the filename

No, no, that is exactly what I have in my code: "http://www.example.com"

Here is the complete script:

<?php
error_reporting(E_ALL);
ini_set('display_errors', '1');
ini_set('user_agent', $_SERVER['HTTP_USER_AGENT']); // Didn't fix the problem
echo file_get_contents('http://www.example.com/');
?>
echo file_get_contents('http://www.example.com/');

Will that work? Although it may resolve in your browser to the index.htm page or whatever, I don't think that php can guess the filename. I may be wrong.

Use the full path, e.g. 'http://www.example.com/index.html' or whatever it is.

You are reading a file.
The documentation asks for a file, and the name file_get_contents() itself suggests a file.
http://www.example.com is not a file, and it is not even the URI from the original post.
A file is http://www.example.com/FILENAME.FILE_EXTENSION

PHP, according to the manual, should correctly resolve and retrieve http://www.example.com with file_get_contents(). In an attempt to solve my problem, I tried to follow the example to the letter.

Also, from the OP, http://example.com/page.html IS a file URI. Regardless, 99.99% of web servers, including example.com, automatically serve up the index if no file is specified. The other 0.01% return SOME kind of file.

All this happens internally in the server, meaning PHP can't tell the difference. As far as it's concerned, it requests a file, and one is returned.

So http://www.example.com/ does indeed point to a file.
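
To make that concrete, here is a rough sketch of the raw request text an HTTP client ends up sending. The helper name build_get_request() is made up for this illustration, and I'm assuming a plain HTTP/1.0 GET with no extra headers. The point is that a URL with no filename still produces a request for the path "/", and the server decides what to send back.

```php
<?php
// Hypothetical helper (not a PHP built-in): builds the raw HTTP/1.0
// request text for a URL so the requested path can be inspected.
function build_get_request($url)
{
    $parts = parse_url($url);
    $host  = $parts['host'];
    // With no path in the URL, the request line still needs one, so "/" is used.
    $path  = isset($parts['path']) ? $parts['path'] : '/';
    if (isset($parts['query'])) {
        $path .= '?' . $parts['query'];
    }
    return "GET " . $path . " HTTP/1.0\r\n"
         . "Host: " . $host . "\r\n"
         . "Connection: close\r\n\r\n";
}

// Both URLs produce a valid request; only the path on the first line differs.
echo build_get_request('http://www.example.com/');
echo build_get_request('http://example.com/page.html');
```

A string like this could be written to a socket opened with fsockopen(), which would also avoid cURL, but I have not tested that on PHP 4.3.9.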

The OP is on PHP 4.3.9.
The manual covers PHP 5.3.9.

A URL can be used as a filename with this function if the fopen wrappers have been enabled.

did you read a different documentation than I did
there is no fopen() in your code, therefore URLs cannot be used
a file is still http://www.example.com/FILENAME.FILE_EXTENSION


ya messed up
fix it

Read more carefully. Wrappers! not the fopen() itself! If you looked, the link points to http://www.php.net/manual/en/filesystem.configuration.php#ini.allow-url-fopen which is an INI file setting. It has NOTHING to do with fopen().

My INI has those turned on. So I CAN use URLs
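
For anyone who wants to confirm the setting from inside a script rather than by reading php.ini, something like this should do it. The function name url_fopen_enabled() is my own invention, and the list of "on" values is an assumption about how the setting might be written; ini_get() itself is a standard PHP function.

```php
<?php
// ini_get() returns the setting as a string ("1", "On", "", "0", ...),
// so normalise it before testing. url_fopen_enabled() is a made-up name.
function url_fopen_enabled()
{
    $value = strtolower(trim((string) ini_get('allow_url_fopen')));
    return in_array($value, array('1', 'on', 'yes', 'true'));
}

echo url_fopen_enabled()
    ? "allow_url_fopen is on\n"
    : "allow_url_fopen is off\n";
```

If this reports "on" and URLs still fail with a getaddrinfo warning, the wrapper setting is probably not the cause.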

I'll even quote you the example in the PHP manual.

Example #1

<?php
$homepage = file_get_contents('http://www.example.com/');
echo $homepage;
?>

Look familiar?

its not on your site

your way, does not work
the right way, works

no further discussion required

ya messed up
fix it

Oh that is very helpful.

NO, the right way does NOT work. If you TRIED to solve my problem, you would have come up with this: http://www.codingforums.com/archive/index.php/t-140180.html

Seriously, you were extremely unhelpful after your first post, even condescending. That is NOT the attitude this site endorses.

And, when I prove you wrong, you reply with

"its not on your site

your way, does not work
the right way, works"

Which is just like a lot of answers I receive from some 3rd world tech support hotlines, AKA worthless, uninformed, confusing, unintelligible, and almost entirely unhelpful.

Plus, all the
"ya messed up
fix it"
doesn't do a whole lot for me. No, YOU, my friend, messed up. YOU fix it.

You guys should get a room. :)
Anyway, did you try the full path name?

I got this from http://nadeausoftware.com/articles/2007/07/php_tip_how_get_web_page_using_fopen_wrappers :

"The fopen wrappers are a standard feature from PHP 4.0.4 onwards. The wrappers extend the functionality of the file functions, such as fopen(), file(), and file_get_contents(), enabling them to access remote files on a web or FTP server."

Seems to cover file_get_contents as AB says.
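
As a sketch of what that looks like with plain fopen(), here is one way to read a URL into a string without cURL. fetch_as_string() is a hypothetical name, the 8192-byte chunk size is arbitrary, and it assumes allow_url_fopen is enabled. I have only used calls that exist in PHP 4, but I have not run it on 4.3.9 itself.

```php
<?php
// fetch_as_string() is a hypothetical helper name, not a PHP built-in.
// It relies on the fopen wrappers (allow_url_fopen) for http:// URLs.
function fetch_as_string($url)
{
    // @ hides the warning; failure is reported through the return value.
    $handle = @fopen($url, 'rb');
    if (!$handle) {
        return false; // DNS failure, wrapper disabled, connection refused, ...
    }
    $body = '';
    while (!feof($handle)) {
        $chunk = fread($handle, 8192);
        // Stop on a read error or an empty read (for example a timeout).
        if ($chunk === false || $chunk === '') {
            break;
        }
        $body .= $chunk;
    }
    fclose($handle);
    return $body;
}

$html = fetch_as_string('http://www.example.com/');
if ($html === false) {
    echo "Request failed\n";
} else {
    echo strlen($html) . " bytes received\n";
}
```

On a server where the hostname cannot be resolved, this returns false just as file_get_contents() fails, since both go through the same wrapper.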

I did, but see my bug report link in the last post. Turns out it's a bug with the server, so I have to contact Dotster. I downloaded XAMPP and tested my code with that (I copied my php.ini from my webserver to XAMPP so the config was the same) and it worked every which way I tried it. http://example.com, http://example.com/page.html, etc. all worked.
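
For anyone else who hits the same "getaddrinfo failed" warning: that message means the hostname lookup failed before any HTTP request was sent, so it can help to test name resolution on its own. A minimal check, assuming a made-up helper name host_resolves() and a real hostname rather than an IP address as input:

```php
<?php
// host_resolves() is a hypothetical helper name.
// Pass a hostname, not an IP literal: gethostbyname() returns an IP
// unchanged, which this check would misreport as a failed lookup.
function host_resolves($host)
{
    $ip = gethostbyname($host);
    // gethostbyname() hands back the hostname unchanged when the lookup fails.
    return $ip !== $host;
}

if (host_resolves('www.example.com')) {
    echo "DNS lookup works on this server\n";
} else {
    echo "DNS lookup failed; this looks like a server problem, not a PHP code problem\n";
}
```

If the lookup fails on the host but the same script works locally, that points at the server's resolver configuration.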

Also, for some reason, AB said the fopen wrappers only apply to fopen(). This is not the case. Regardless, the problem is solved.

My way, works.
Almostbob messed up.
Fix it.

:D
