I would like to maximize the PR on my forum display and thread display pages, without sacrificing PR to less important pages (for instance, the newthread.php page).

Now suppose there are 5 links on page A. Normally, page A's PR will be spread out among the 5 pages it links to. But suppose I added a robots.txt file which blocked indexing of two of those links. Would page A's PR now be spread less thin among the remaining 3 pages? Or would it be spread just as thin, with 2 of the pages entitled to a share of PR that they just wouldn't use?
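To make the question concrete, here's a rough sketch of the two scenarios using the classic PageRank share formula. This is illustrative only: the damping factor 0.85 and the starting PR value are assumptions, and nobody outside Google knows how blocked links are actually treated.

# Classic PageRank sketch: each outbound link on page A passes
# d * PR(A) / C(A), where C(A) is the number of links counted.

d = 0.85       # conventional damping factor (assumption)
pr_a = 4.0     # hypothetical PageRank of page A (assumption)

# Scenario 1: blocked links drop out of the count entirely.
share_if_dropped = d * pr_a / 3
print(f"Share per link if only 3 links count: {share_if_dropped:.3f}")

# Scenario 2: blocked links still count, but their share goes unused.
share_if_counted = d * pr_a / 5
print(f"Share per link if all 5 links count:  {share_if_counted:.3f}")

The whole question is which scenario Google actually follows.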

I hope this sorta made a bit of sense.

I sent you a PM RE: this topic. :)

And I started this thread RE: your PM :)

Ok I am a knucklehead. I guess this was a chicken-and-egg thing. ;-)

The robots.txt file I'm currently using is in my root directory (not my forum root) and looks like this:

User-agent: googlebot
Disallow: /techtalkforums/announcement.php
Disallow: /techtalkforums/faq.php
Disallow: /techtalkforums/forumdisplay.php
Disallow: /techtalkforums/login.php
Disallow: /techtalkforums/member.php
Disallow: /techtalkforums/newreply.php
Disallow: /techtalkforums/newthread.php
Disallow: /techtalkforums/online.php
Disallow: /techtalkforums/printthread.php
Disallow: /techtalkforums/search.php
Disallow: /techtalkforums/showthread.php

I am disallowing access to showthread.php and forumdisplay.php because I would rather Google only spider the .html mod_rewrite versions of the forums and threads, so it doesn't pick up duplicate content. Was this done correctly? Am I excluding the correct things?
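If you want to sanity-check the file, Python's standard library can parse it and tell you whether a given URL would be blocked. A quick sketch (the domain is just an example):

from urllib import robotparser

# Point the parser at the live robots.txt (example domain assumed).
rp = robotparser.RobotFileParser()
rp.set_url("http://www.daniweb.com/robots.txt")
rp.read()

# Blocked for Googlebot by the Disallow rules above:
print(rp.can_fetch("googlebot", "/techtalkforums/showthread.php?t=10"))  # False
# Not blocked, so the rewritten .html version stays crawlable:
print(rp.can_fetch("googlebot", "/techtalkforums/thread10.html"))        # True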

Without going too far in, it looks decent to me. I'm not sure, however, whether robots.txt blocks the weakening of link pop from all the links. The fewer links on a page, the more potent each link is; a page with tons of links spreads the pop thin. That would be a good question to ask SEO-Guy.

It would be very nice if the robots.txt would block the weakening spread of PR. However, even if it doesn't do this, it would still be valuable because it would eliminate spidering duplicate content (i.e. showthread.php?t=10 and thread10.html)

*nods* for sure.

I don't see how this will help at all.

Google frowns upon multiple pages with the same content. If two different URLs serve the exact same content, Google considers it an attempt to spam its search engine. This forum uses Apache's mod_rewrite to rewrite URLs to have a .html extension for search engine purposes, so the webpage showthread.php?t=100 is the exact same thing as thread100.html. If Google's spiders see this duplicate content, they will think that daniweb.com is trying to inflate its page count in Google by having multiple URLs with the same content. However, by using robots.txt to block Google from spidering the showthread.php pages, Google only spiders the pages ending in .html and therefore doesn't penalize us for duplicate content.

May I ask how you changed it to thread6988.html instead of showthread.php?

It's done using a technique called URL rewriting.

On this server, the page thread6988.html does not physically exist. Instead, the web server watches incoming URL requests and looks for the word "thread" in the request; if it finds it, it grabs the number that follows and passes it along to showthread.php. Easy enough.
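In Apache terms, the rule in an .htaccess file would look something like this (a sketch of the general idea; the exact rule used here may differ):

# Rewrite thread6988.html -> showthread.php?t=6988 (mod_rewrite must be enabled)
RewriteEngine On
RewriteRule ^thread([0-9]+)\.html$ showthread.php?t=$1 [L]

Requests for thread6988.html then get served by showthread.php?t=6988 behind the scenes, while the address bar keeps the .html URL.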

Hope this helps.

Hi csgal,
can you post the new robots.txt here, please?

Thank you.

Would it make sense to use the robots nofollow meta tag in your particular case?

Can you send me the robots.txt to Matzefn1@web.de?
Thank you very much.

Matzefn1

Can I have the robots.txt, please.

Post #5 shows the robots.txt file that I used to use. I no longer use a robots.txt file.

Why do you no longer use a robots.txt? Google frowns upon multiple pages with the same content...!?

My robots.txt file: http://www.schachfeld.de/robots.txt

We had a problem where pages covered by a no-crawl rule at the root directory were still being crawled (they were PDFs that had valuable IP in them).

We discovered that the bots were getting in through links on other pages of ours (the PDFs are "samples" of products that we use as marketing tools), so we put "no follow" codes -- <meta name="robots" content="index,nofollow" /> -- on those pages.

This lets the spider index the page but not follow the links on the page.
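For what it's worth, another way to keep files like PDFs out of the index (since you can't put a meta tag inside a PDF) is to send the directive as an HTTP header instead. A sketch for Apache, assuming mod_headers is enabled:

# Send a noindex directive with every PDF response (requires mod_headers)
<FilesMatch "\.pdf$">
    Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>

That way the directive travels with the file itself rather than depending on the pages that link to it.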

But if someone links to the non-.html thread URL from a page that you don't control, do you think it will bypass your HTML rewrite?

Sometimes, even after adding the noindex tag to the pages, it can take some weeks before search engines pick up on exactly what you mean.

Aren't robots.txt files bad for tracking purposes, like a nofollow? I guess I'm new to this kind of discussion... lol.
