John's Blog: Controlling Search Engines
September 14, 2017
A good question came in this morning...
I just noticed that search engines were able to find our ftp site. I know there are ways to block them, but can you describe the particulars for controlling crawlers on a Rumpus server?
In general, crawler access on any Web server is controlled by creating a "robots.txt" file, which provides instructions to search engines. Adding the file is easy, but before you do, there are three important things to note:
- Unless you have anonymous access enabled and "Always Prompt For Login" disabled, search engines won't be able to get past the login page. With or without the "robots.txt" file, they won't be able to see or index any actual content on the server.
- Well-behaved robots, like reputable search engines, will pay attention to the "robots.txt" file, but it's really only a suggestion for bots. There is no way to enforce its use by site crawlers.
- Search engines most likely won't come around to your server very often, so don't expect an immediate change in search engine results. The "robots.txt" file will only affect search results the next time the search engine scans your site.
With all that said, creating a file to tell robots to ignore the site is pretty simple. Search for "robots.txt file" for complete details, but here is a file that will generally tell crawlers to ignore all content on the site:
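As a rough sketch of what a "block everything" file conventionally looks like, the two-line form below is the standard one: "User-agent: *" addresses every crawler and "Disallow: /" covers every path on the site. The snippet also checks the rules with Python's built-in urllib.robotparser module. The helper name is_allowed and the sample paths are made up for illustration and are not part of Rumpus.

```python
from urllib.robotparser import RobotFileParser

# Conventional "ignore the whole site" robots.txt (assumed contents):
#   User-agent: *   -> the rules apply to every crawler
#   Disallow: /     -> every path under the site root is off limits
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def is_allowed(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Return True if a well-behaved crawler `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical paths, used only to exercise the rules.
    for path in ("/", "/files/report.pdf"):
        print(path, is_allowed(ROBOTS_TXT, path, "Googlebot"))
```

Saving the two lines held in ROBOTS_TXT as a plain text file named "robots.txt" gives you the file itself. Keep in mind that this only checks how a compliant crawler would read the rules, since the file remains a suggestion.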
In Rumpus, open the Web Settings window and click "Open WFM Templates Folder". Extract and then drop that file into that folder and you should be all set.
The file I've provided is as simple as they come, but you can get more sophisticated if you like. For example, if you want the "anonymous" areas of your site to be scanned and searchable, that's entirely possible. It does depend on your content, though, so you'll need to customize the file accordingly. As always, send e-mail to support@maxum.com if you need additional help.
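To give a feel for a more selective file, here is a hedged sketch that lets crawlers into one area and blocks everything else. The "/public/" path is purely hypothetical, so substitute whatever URL your anonymous content actually lives under. The Allow line is placed before the Disallow line because some parsers, including Python's urllib.robotparser used here, apply the first matching rule. The helper name is_allowed is also just an illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: anonymous content lives under /public/ (an assumption,
# not a Rumpus default). Everything else stays hidden from crawlers.
PARTIAL_ROBOTS_TXT = (
    "User-agent: *\n"
    "Allow: /public/\n"
    "Disallow: /\n"
)


def is_allowed(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Return True if a well-behaved crawler `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Sample paths are invented for the demonstration.
    for path in ("/public/readme.txt", "/private/payroll.xls"):
        print(path, is_allowed(PARTIAL_ROBOTS_TXT, path))
```

Checking a few representative paths this way before deploying the file is a cheap way to catch a rule that blocks or exposes more than you meant.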

