How would we poison AI web crawls? Hardware and Software Information Security by rproffitt … Leap Forward 文化大革命 The Great Proletarian Cultural Revolution 人權 Human Rights 民運 Democratization 自由 Freedom 獨立 Independence 多黨制 Multi-party system 台灣 臺灣 Taiwan Formosa 中華民國 Republic of China 西藏 土伯特 唐古特 Tibet…

Re: Best practices for information security of your business Hardware and Software Information Security by rproffitt … Leap Forward 文化大革命 The Great Proletarian Cultural Revolution 人權 Human Rights 民運 Democratization 自由 Freedom 獨立 Independence 多黨制 Multi-party system 台灣 臺灣 Taiwan Formosa 中華民國 Republic of China 西藏 土伯特 唐古特 Tibet…

Re: How would we poison AI web crawls? Hardware and Software Information Security by Dani I don't understand what goal you are trying to achieve? Is your goal to open a dialog about the pros and cons of AI? DaniWeb is powered by Cloudflare. One of the functions of Cloudflare is a sophisticated system to analyze and control how AI crawlers scan the website. In other words, if I want to dissuade AI bots from crawling DaniWeb, I …

Re: How would we poison AI web crawls? Hardware and Software Information Security by rproffitt For example, with Meta and others removing fact checking we should find a way to render their AI and search results full of not so useful information. We are right now veering towards a Fascist state with oligarchs and mega corporations stoking coal into the ovens. We shouldn't be fuel for those ovens.

Re: How would we poison AI web crawls? Hardware and Software Information Security by Dani I’m not nearly as much of a conspiracy theorist. I also don’t think that spamming Facebook with nonsensical posts is going to make the world a better place.

Re: How would we poison AI web crawls? Hardware and Software Information Security by Pebble94464 Don't waste your time, rproffitt. Spamming the web is unlikely to achieve your goals... Firstly, everything you post online is but a wee drop in the ocean.
You'd need to do an illegal amount of spamming in order to sway an opinion. Secondly, AI bots crawling the web can be instructed to simply ignore pages that contain censored keywords. AI …

Re: How would we poison AI web crawls? Hardware and Software Information Security by rproffitt I asked around and it appears we can effect change. The immigrant reporting hotline was flooded with reports about Elon Musk, so that line shut down. As to AI crawlers, the work to poison the AIs is well underway. Examples follow. > Here is a curated list of strategies, offensive methods, and tactics for (algorithmic) sabotage, disruption, …

Re: How would we poison AI web crawls? Hardware and Software Information Security by Dani If you're not a part of the solution, you're a part of the precipitate. I think this sounds terrible. The global population is, more and more, relying on AI to serve up accurate answers. There's already the gigantic problem of hallucinations as well as AI consistently spewing out false information that sounds entirely believable, and therefore …

Re: How would we poison AI web crawls? Hardware and Software Information Security by Reverend Jim As an example, the person who developed Iocaine found that 94% of the traffic to his site was caused by bots. When you price and design a site for an expected human load, and then you get overwhelmed by bots, you can throw more money at it or you can take action against the bots. In my meagre understanding of all things web related, robots.txt is …

Re: How would we poison AI web crawls? Hardware and Software Information Security by Dani > When you price and design a site for an expected human load, and then you get overwhelmed by bots, you can throw more money at it or you can take action against the bots. It's true that the majority of websites on the Internet today spend more bandwidth on bots than they do on human visitors. However, there are both bad bots and good bots, …

Re: How would we poison AI web crawls?
Hardware and Software Information Security by rproffitt The OpenAI bot appears to be a bad bot. Discussed many times so here's just one: https://www.reddit.com/r/selfhosted/comments/1i154h7/openai_not_respecting_robotstxt_and_being_sneaky/ Fixes appear to be: 1. Block IP ranges from bots. 2. Replace words and poison the bots.

Re: How would we poison AI web crawls? Hardware and Software Information Security by Reverend Jim Thanks for the extra info, although I disagree with the spewing comment. Nepenthes and Iocaine do not spew garbage across the web. They feed garbage to bots that access the protected sites. AIs that return bogus results, on the other hand, ARE spewing garbage across the web. BTW Nepenthes makes it clear that implementation will result in being …

Re: How would we poison AI web crawls? Hardware and Software Information Security by Dani > The OpenAI bot appears to be a bad bot. This is not my experience. OpenAI respects my robots.txt file perfectly. I do want to add, though, that robots.txt files are very finicky, and I have seen many, many times people blaming the bots when the problem lies with a syntax or logic error in their robots.txt. > Nepenthes and Iocaine do…

Re: How would we poison AI web crawls? Hardware and Software Information Security by Dani > The OpenAI bot appears to be a bad bot. Specifically, I would bet quite a large sum of money that the people who are complaining they can't get OpenAI to respect their robots.txt file either have a syntax error in their file, and/or aren't naming the correct user agents. I've seen people mistakenly try to reference a user agent called &…

Re: How would we poison AI web crawls? Hardware and Software Information Security by Dani > The creator of Nepenthes says that it is ineffective against OpenAI which I take to mean that OpenAI is ignoring robots.txt. As mentioned, Nepenthes uses the spoofing technique. Spoofing does not rely whatsoever on bots following robots.txt.
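For anyone who wants to check the robots.txt point above for themselves, here is a minimal sketch using Python's standard `urllib.robotparser`. It only shows how a standards-following parser treats a rule addressed to the wrong user-agent token; it says nothing about how any real crawler behaves. The helper `is_allowed`, the `example.com` URL, and the misnamed token `OpenAI` are hypothetical and chosen for illustration. `GPTBot` is the crawler token OpenAI documents, but check the current documentation before relying on it.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical mistake: the rule names the company, not the crawler token.
MISNAMED = """\
User-agent: OpenAI
Disallow: /
"""

# The same rule addressed to the crawler's actual token.
CORRECT = """\
User-agent: GPTBot
Disallow: /
"""

URL = "https://example.com/forum/thread"  # placeholder URL

# With the misnamed rule, nothing matches GPTBot, so the fetch is permitted.
print(is_allowed(MISNAMED, "GPTBot", URL))
# With the correct token, the Disallow rule applies and the fetch is refused.
print(is_allowed(CORRECT, "GPTBot", URL))
```

If the first call reports that the fetch is allowed, the file is doing what it was told, just not what its author intended, so a bot that keeps crawling in that situation is not necessarily ignoring robots.txt.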
Re: How would we poison AI web crawls? Hardware and Software Information Security by Salem > But it's also in everyone's interest for AI to be trained on reliable information, if we want AI to be useful to us Yeah, that ship slipped its mooring when facebook appeared, drifted out to sea on the twitter tide, and promptly sank when muck took it over. Domain specific AIs trained on the likes of https://arxiv.org/ might be worth …

Re: How would we poison AI web crawls? Hardware and Software Information Security by Reverend Jim >OpenAI can detect the content thrown at it is nonsensical So OpenAI doesn't crawl Facebook and Twitter? How about Fox News and related sites? And if it ignores Fox, etc, are we then going to get Trump screaming about radical liberal bias? How does AI distinguish between conspiracy theory and reality?

Re: How would we poison AI web crawls? Hardware and Software Information Security by Reverend Jim Remember what happened with Microsoft's chatbot, TAY? It was shut down after only 16 hours when trolls trained it to spout racist slurs and profanity. OpenAI and similar systems are trained on the cesspool that is the entire internet. Sturgeon's Law says 90% of everything is crap. That may well apply to the internet. I'm surprised it hasn't …

Re: How would we poison AI web crawls? Hardware and Software Information Security by Dani > Many places ban or remove AI generated content. We are one of them! :)

Re: How would we poison AI web crawls? Hardware and Software Information Security by Pebble94464 As a human, can you detect gibberish content? You may think you can fool AI today or tomorrow, but what about a year from now? At some point in the future AI will match our intelligence and then quickly surpass us. Generating gibberish content might impede AI for a while but it's only delaying the inevitable. Resistance is useless!

Re: How would we poison AI web crawls?
Hardware and Software Information Security by Reverend Jim Even human generated content <edit - gibberish> can be hard to detect, except of course for Jordan Peterson.

Re: How would we poison AI web crawls? Hardware and Software Information Security by Dani To Pebble's point, I genuinely believe that the **** that was spewed in the first post of this thread is not any more sophisticated than those chain messages circulating on Facebook that say things like copy and paste the sentence, "I don't give Facebook the authority to blah or the copyright to blah" into a FB post, thinking it will be …

Re: How would we poison AI web crawls? Hardware and Software Information Security by Reverend Jim Note: in the previous post I meant to say gibberish instead of content.

Re: How would we poison AI web crawls? Hardware and Software Information Security by Fitmovers I'm realizing that "poisoning AI web crawls" could suggest malicious actions, which are often prohibited. Thus, providing guidance for such a request is inappropriate and against policy.

Re: How would we poison AI web crawls? Hardware and Software Information Security by Dani > "Kiss my shiny metal ***" Seriously?!

Re: How would we poison AI web crawls? Hardware and Software Information Security by Dani > OpenAI rips content, no one bats an eye. Deepsink does same, "They are ripping off our work." I don't know why you think that. In the SEO publishing industry, we publishers have been very vocally complaining that OpenAI, Google, etc. have been stealing our content for at least 2 years now. I think the difference is, as I …

Re: How would we poison AI web crawls? Hardware and Software Information Security by Dani I think people are not understanding what I'm saying here. Please allow me to demonstrate: Looking at our Google Analytics right now, I can see that, aside from the top search engines such as Google, Bing, and DuckDuckGo, the next biggest place we get traffic from is ChatGPT.
Moreover, the average engagement time per session for visitors finding…

Re: How would we poison AI web crawls? Hardware and Software Information Security by Dani > I guess AI is replacing traditional search engine queries? ChatGPT traffic still doesn't surpass Google, but it's definitely way up there. I believe it's heading in that direction, yes.

Re: How would we poison AI web crawls? Hardware and Software Information Security by rproffitt Update, February 25, 2025: others are kicking it into high gear to resist certain government data collection. ![image_2025-02-25_085603458.png](https://static.daniweb.com/attachments/1/3353464f2457b005cccfd76592522cd2.png) And here I was only thinking about poison for the AI bots.

Re: How would we poison AI web crawls? Hardware and Software Information Security by Dani As someone who has made a career out of working with ad agencies, and has 3 patents on data mining user behavior within social platforms, that all sounds absolutely abhorrent.