Denial of service (DoS) attacks work by swamping your Web server with a great number of simultaneous requests, slowing down the server or preventing access altogether. DoS attacks are difficult to prevent in general, and usually the most effective way to address them is at the network or operating system level. One example would be to block specific addresses from making requests to the server; although you can block addresses at the Web server level, it is more efficient to block them at the network firewall/router or with the operating system network filters.
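As an illustration of blocking at the Web server level, the following fragment is a minimal sketch that assumes Apache 2.4 with mod_authz_core and mod_authz_host loaded. The addresses come from ranges reserved for documentation and are placeholders, not real offenders. Older 2.0 and 2.2 servers express the same idea with the Order, Allow, and Deny directives instead.

```apache
# Sketch only: refuse requests from specific addresses (Apache 2.4 syntax).
# 192.0.2.15 and 198.51.100.0/24 are placeholder documentation addresses.
<Location "/">
    <RequireAll>
        # Admit everyone by default...
        Require all granted
        # ...except the listed address and network.
        Require not ip 192.0.2.15
        Require not ip 198.51.100.0/24
    </RequireAll>
</Location>
```

The server still has to accept each connection and parse the request before it can refuse it, which is why filtering at the firewall, router, or operating system is the more efficient choice.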
Other kinds of abuse include posting extremely large requests or opening a great number of simultaneous connections. You can limit the size of requests and shorten timeouts to minimize the effect of such attacks. The default request timeout is 300 seconds in Apache 2.0 and 2.2 (60 seconds in Apache 2.4), and you can change it with the TimeOut directive. A number of directives enable you to control the size of the request body and headers: LimitRequestBody, LimitRequestFields, LimitRequestFieldSize, LimitRequestLine, and LimitXMLRequestBody.
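The directives named above might be combined in the main server configuration as follows. This is only a sketch: the values are illustrative assumptions, not recommendations, and should be tuned to the largest legitimate requests your site expects.

```apache
# Sketch only: illustrative limits, chosen for the example.

# Seconds to wait on network I/O before giving up on a request.
TimeOut 60

# Maximum request body size in bytes (here 1 MB); 0 means unlimited.
LimitRequestBody 1048576

# Maximum number of request header fields.
LimitRequestFields 50

# Maximum size in bytes of each request header field.
LimitRequestFieldSize 4094

# Maximum size in bytes of the request line (method, URI, and protocol).
LimitRequestLine 4094

# Maximum size in bytes of an XML-based request body, such as a WebDAV request.
LimitXMLRequestBody 1000000
```

Limits that are too tight cause legitimate requests to be rejected, for example file uploads that exceed LimitRequestBody, so it is worth testing any change against normal traffic.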
Robots, Web spiders, and Web crawlers are names for a category of programs that access pages in your Web site, recursively following your site's links. Web search engines use these programs to scan the Internet for Web servers, download their content, and index it. Individual users also use these types of programs to download an entire Web site, or a portion of one, for later offline browsing. Normally, these programs are well behaved, but sometimes they can be very aggressive and swamp your Web site with too many simultaneous connections or become caught in loops of links.
Well-behaved spiders will request a special file, called robots.txt, that contains instructions about how to access your Web site and which parts of the Web site are off limits to them. The syntax for the file can be found at http://www.robotstxt.org/. By placing a properly formatted robots.txt file in your Web server document root, you can control the activity of spiders that honor it. Spiders that ignore the file are not bound by it; for those, you can stop the requests at the router or operating system level.
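A minimal robots.txt might look like the following sketch. The directory names and the spider name BadBot are hypothetical and serve only to show the format.

```text
# Sketch only: paths and the BadBot name are hypothetical.

# Rules for every spider: stay out of these two directories.
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/

# Rules for one particular spider: stay out of the entire site.
User-agent: BadBot
Disallow: /
```

Each record begins with one or more User-agent lines, followed by the Disallow lines that apply to those spiders. A blank line separates one record from the next.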