The following sections describe the Verity Spider networking options.
Specifies the value for the agent name field that is part of the HTTP request. Since web servers can be configured to return different versions of the same page depending on the requesting agent, you can use the -agentname option to impersonate a browser client.
Use double quotation marks if the name contains a space. Use the -cmdfile option if the agent name you want to use contains forbidden characters, such as slashes or backslashes.
Syntax: -connections num_connections
Specifies the maximum number of simultaneous socket connections to make to websites for indexing. Each connection implies a separate thread.
Note: The Verity Spider dynamic flow control makes the best use of all available connections when indexing websites. If you are indexing multiple sites, you might want to increase this number. Increasing the number of connections does not always help, however, because throughput also depends on factors such as your network connection and the capabilities of the remote hosts.
Syntax: -delay num_milliseconds
Specifies the minimum time between HTTP requests, in milliseconds. The default value is 0 milliseconds (no delay).
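As a rough illustration of what a minimum time between requests means, the following Python sketch computes how long a client would still have to wait before sending its next request. The function name `remaining_wait_ms` and its parameters are hypothetical and are not part of Verity Spider; this only models the behavior described above and is not how Verity Spider implements it.

```python
def remaining_wait_ms(last_request_ms, now_ms, delay_ms=0):
    """Milliseconds still to wait before the next request may be sent.

    Hypothetical helper: models a minimum spacing of delay_ms between
    consecutive requests. delay_ms=0 mirrors the documented default (no delay).
    """
    if delay_ms < 0:
        raise ValueError("delay_ms must be non-negative")
    elapsed = now_ms - last_request_ms
    # Never return a negative wait: once the delay has passed, send at once.
    return max(0, delay_ms - elapsed)
```

With a delay of 500 milliseconds, a request sent 200 milliseconds after the previous one would be held for a further 300 milliseconds under this model.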
Specifies an HTTP header to add to the spidering request; for example:
-header "Referer: http://www.verity.com/"
Verity Spider sends some predefined headers, such as Accept and User-Agent, by default. Special headers are sometimes necessary to correctly index a site.
For example, earlier versions of Verity Spider did not support the Host header, which is needed for Virtual Host indexing. Also, a Proxy-authentication header was needed to pass a username and password to a proxy server.
In Verity Spider V3.7, the Host header is supported by default, and the -proxyauth option is available for proxy server authentication. Therefore, the -header option is maintained only for backwards compatibility and possible future enhancements.
Note: Misuse of this option causes spider failure. If this happens, rerun the indexing task with modified -header values.
Syntax: -hostcache num_hostnames
Specifies the number of host names to cache to avoid repeated DNS lookups. If you do not specify this option, the host cache has no fixed size and continues to grow.
Disables round-robin indexing of websites with network flow control.
By default, Verity Spider uses round-robin indexing of websites to avoid overwhelming a web server and to improve indexing performance. Verity Spider connects to each web server in a round-robin manner, using up to the number of connections set by the -connections option. This means that one URL is fetched from each web server, in turn.
Note: Using the -noflowctrl option can result in a significant drop in performance.
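The round-robin order described above (one URL from each web server, in turn) can be sketched in Python as follows. This is a simplified illustration only: the function `round_robin_order` and its input format are hypothetical, and the sketch ignores the connection limit and the concurrency that Verity Spider's flow control would involve.

```python
from collections import deque


def round_robin_order(urls_by_host):
    """Return URLs interleaved so that one URL is taken from each host in turn.

    urls_by_host maps a host name to the list of URLs pending for that host
    (a hypothetical input format chosen for this sketch).
    """
    # One queue per host, skipping hosts that have nothing pending.
    queues = deque(
        (host, deque(urls)) for host, urls in urls_by_host.items() if urls
    )
    order = []
    while queues:
        host, pending = queues.popleft()
        order.append(pending.popleft())
        if pending:
            # The host still has work, so it rejoins the end of the rotation.
            queues.append((host, pending))
    return order
```

For three hosts with three, one, and two pending URLs, the sketch visits each host once per pass, so no single server receives a burst of consecutive requests.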
Syntax: -noproxy name_1 [name_n] ...
Used in conjunction with the -proxy option, the -noproxy option specifies that Verity Spider directly access the hosts whose names match those specified. By default, when you specify the -proxy option, Verity Spider first tries to access every host with the proxy information. To improve performance, use the -noproxy option for the hosts you know can be accessed without a proxy host. For the name variable, you can use the asterisk (*) wildcard for text strings; for example:
'*.verity.com'
You cannot use the question mark (?) wildcard. Specifying the -regexp option does not enable regular expressions for these names.
On Windows, include double quotation marks around the argument to protect the asterisk special character (*). On UNIX, use single quotation marks. This is only required when you run the indexing job from a command line. Quotation marks are not necessary within a command file (the -cmdfile option).
Note: You must have valid Verity Spider licensing capability to use this option.
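To make the asterisk-only wildcard rule for -noproxy names concrete, here is a minimal Python sketch of that style of matching. The functions `noproxy_match` and `needs_proxy` are hypothetical names introduced for illustration, and the case-sensitive comparison is an assumption of this sketch; Verity Spider's actual matching code is not shown in this document.

```python
import re


def noproxy_match(pattern, hostname):
    """Return True if hostname matches pattern, where only '*' is a wildcard.

    Assumption for this sketch: the comparison is case-sensitive.
    """
    # Escape each literal piece so characters such as '?' and '.' match
    # only themselves, then let each '*' stand for any run of characters.
    pieces = [re.escape(piece) for piece in pattern.split("*")]
    return re.fullmatch(".*".join(pieces), hostname) is not None


def needs_proxy(hostname, noproxy_patterns):
    """Return True if no pattern matches, so the host goes through the proxy."""
    return not any(noproxy_match(p, hostname) for p in noproxy_patterns)
```

Under this model, `needs_proxy("www.verity.com", ["*.verity.com"])` is false, so that host would be accessed directly, while a host outside the pattern would still be reached through the proxy.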
Specifies the host and port for the proxy server.
Note: You must have valid Verity Spider licensing capability to use this option.
See also -proxyauth for proxy servers that require authentication, and -noproxy for hosts that you know are accessible without having to go through a proxy server.
Syntax: -proxyauth login:password
Specifies login information for proxy server connections that require authorization to get outside the firewall. Use this option in conjunction with the -proxy option.
Note: You must have valid Verity Spider licensing capability to use this option. Information Server V3.7 does not support retrieving documents for viewing through secure proxy servers. Do not use the -proxyauth option for indexing documents that are viewed through Information Server V3.7.
Specifies the number of times that Verity Spider should attempt to access a URL. Use the -retry option when it is likely that an unstable network connection will give false rejections.
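The idea of attempting a URL a fixed number of times can be sketched in Python as shown here. The helper `fetch_with_retry` and the `fetch` callable are hypothetical; the sketch assumes that a failed attempt raises `OSError` (or a subclass such as `ConnectionError`) and does not reflect Verity Spider's internal retry logic.

```python
def fetch_with_retry(fetch, url, max_attempts):
    """Call fetch(url) up to max_attempts times and return the first success.

    fetch is any callable supplied by the caller (hypothetical for this
    sketch). If every attempt fails, the last error is raised again.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return fetch(url)
        except OSError as error:
            # Treat network-level errors as possibly transient and try again.
            last_error = error
    raise last_error
```

A URL that fails twice because of a transient network error and then succeeds would be indexed with three attempts under this model, instead of being rejected after the first failure.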
Specifies the time period, in seconds, that Verity Spider should wait before timing out on a network connection and on accessing data. The data access value is automatically twice the value you specify for the network connection timeout.
The default value for the network connection timeout is 30 seconds; the default value for the data access timeout is therefore 60 seconds.
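The relationship between the two timeout values can be written out as a short Python sketch. The names `DEFAULT_CONNECT_TIMEOUT_S` and `timeouts` are hypothetical and exist only to show the arithmetic described above.

```python
# Documented default for the network connection timeout, in seconds.
DEFAULT_CONNECT_TIMEOUT_S = 30


def timeouts(connect_timeout_s=DEFAULT_CONNECT_TIMEOUT_S):
    """Return (connection timeout, data access timeout) in seconds.

    The data access timeout is always twice the connection timeout.
    """
    return connect_timeout_s, 2 * connect_timeout_s
```

Calling `timeouts()` gives the default pair of 30 and 60 seconds, and `timeouts(45)` gives 45 and 90 seconds.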