AltaVista's Search Engine indexes an enterprise's shared data resources

Web pages, word processing documents, spreadsheets, PDF files, databases--a modern enterprise has information up the wazoo. That's good news, because if you're looking for information, chances are that it's on one of your servers. The bad news is that it's really hard to find the documents that contain it.

That's exactly the problem Internet search engines have been designed to address: They index accessible data files residing on multiple public Web servers, storing the results in a database. Users can query that database with full-text keywords to find just the information they're looking for.

More bad news: Although that's wonderful for using the public Internet, those big commercial search engines won't help you locate private enterprise data.

AltaVista Co.'s AltaVista Search Engine 3.0 (AVSE) is a version of the public search-engine software designed to be installed within an organization. AVSE indexes not only data on intranet and even Internet Web pages, but also data files and databases on file servers.

We evaluated AVSE on our LAN and were impressed with the depth of both its indexing and query functions. It's hard to imagine a company that wouldn't benefit from setting up its own in-house AltaVista server.

Anatomy Of A Search Engine

We evaluated the Windows NT 4 version of the AltaVista Search Engine. The company also offers AVSE for Red Hat Linux, Solaris and Tru64 Unix, but according to AltaVista, the Windows NT version is the most popular with midsize businesses. As of mid-June, there is no version for Windows 2000 Server.

AVSE consists of two separate parts integrated into a single application. One is an indexing engine, which periodically scours explicitly defined data sources, reading new files and indexing their content into a database. The other is an HTTP/HTML server, which presents a browser-based user interface for accepting user queries.
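
The split between an indexing engine and a query front end can be illustrated with a toy inverted index. The following is a minimal Python sketch of the general technique, not AVSE's implementation; the function names and the sample documents are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search_all(index, *terms):
    """Return sorted names of documents that contain every term (Boolean AND)."""
    if not terms:
        return []
    # index.get() avoids creating empty entries for unknown words.
    matches = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*matches))


# Hypothetical documents standing in for files on a server.
documents = {
    "budget.xls": "Quarterly budget forecast for the sales team",
    "memo.doc": "Sales memo about the new budget",
    "readme.txt": "Server setup notes",
}
index = build_index(documents)
print(search_all(index, "budget", "sales"))  # ['budget.xls', 'memo.doc']
```

The index is built once, ahead of time, which is why a query only needs a few set lookups rather than a scan of every file.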
The Web-based user interface may be customized using tools provided with AVSE; its services may also be incorporated into other corporate Web sites using a so-called Search Developer's Kit (SDK) that may be downloaded from the company's Web site. For this evaluation, we started with AltaVista's supplied Web site and customized it, but did not use the SDK.

The Windows NT version installs as a single service; the company suggests that an entire server be dedicated to running AVSE. Suggested hardware requirements are quite low, essentially those for Windows NT 4 Server itself: a 300-MHz processor, 256 MB of RAM and 5 GB of disk space for the index. The documentation suggested at least 1 GB of virtual memory. We gave the server a beefier platform: a Compaq ProLiant DL360 with dual 800-MHz Pentium III processors, 512 MB of RAM and 9 GB of drive space. The server was running Windows NT 4 Server with Service Pack 6A, but did not have the Option Pack with Internet Information Server installed.

Pick A Source, Any Source

Once AVSE is installed, the first task is to configure the indexing services. Those are created and changed via a Web-based interface, which AVSE calls its management page. Like the query interface, it is accessed through a browser on a specified TCP/IP port; the default is port 9000, specified in a configuration file. The documentation advises against altering this port assignment once AVSE has been run for the first time.

AltaVista gets kudos for developing an exceptionally intuitive and efficient management facility. The management page is based on a Java applet, which we had no difficulty using on Internet Explorer 5 on Windows 2000 Professional or Netscape Navigator 4.08 on a Windows 98 workstation. (The applet would not load properly on Internet Explorer 4.5 on an iMac running MacOS 8.6 and crashed the browser.) From the management interface, you can configure one or more indices, each of which is maintained separately.
We chose to build only a single comprehensive index for our evaluation, but it's easy to imagine situations where targeted indices would make sense.

Within each index, there may be one or more "collectors," separate tasks that gather information from different types of information sources. There are three types of collectors: for Web pages, shared disk files and relational databases accessed via Java Database Connectivity (JDBC). Each collector may access multiple data sources; for example, the collector we created for Web pages was told to search both our intranet server, located on our LAN, and our public Web server, hosted at an ISP.

Once those URLs were configured, we just pushed the "start" button, and the collector went off to do our bidding. It took only a few minutes to index the Web server located on our Fast Ethernet LAN, and about two hours to index the public Web server, which it accessed via cable modem (400 Kbps down, 100 Kbps up).

Unfortunately, the default is to perform the index operation only a single time. Most administrators will want to index automatically, and a "collector schedule" tab lets you define the hours when indexing should be performed. We like that AVSE lets you limit a collector's impact on a Web server's performance by specifying the number of TCP sessions it will initiate with that server.

One major difference, by the way, between AVSE's Web collectors and a Web server's own indexing engine is that AVSE has to "crawl" through a Web site. By default it will only pick up pages and data files that are accessible through the site's home page, though it is possible to manually provide additional entry points. As a result, AVSE will not index pages that are on the Web site but aren't linked in. By comparison, a Web server's own indexer can find anything that's physically on its hard disk.
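
The difference between crawling and indexing from disk comes down to reachability in the site's link graph. The sketch below is a hedged Python illustration of that idea, not AVSE's crawler: the site layout, page names and the `reachable_pages` helper are all hypothetical, and a real collector would fetch pages over HTTP rather than read a dictionary.

```python
from collections import deque


def reachable_pages(links, entry_points):
    """Breadth-first walk of the link graph starting from the entry points."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: each page maps to the pages and files it links to.
site_links = {
    "/index.html": ["/products.html", "/about.html"],
    "/products.html": ["/specs.pdf"],
    "/about.html": [],
    "/orphan.html": ["/hidden.doc"],  # on disk, but nothing links to it
}

# Everything "physically on the disk" in this toy model.
all_pages = set(site_links) | {t for targets in site_links.values() for t in targets}

crawled = reachable_pages(site_links, ["/index.html"])
print(sorted(all_pages - crawled))  # ['/hidden.doc', '/orphan.html']
```

Adding `/orphan.html` as a second entry point, the equivalent of manually supplying an additional starting URL, makes the crawl cover every page in this example.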
Access Control

We were equally impressed with AVSE's ability to search network hard disks and index their content, although we wish that it were easier to set up. AVSE's purely Web-based documentation is poorly organized, and lacks clear, coherent examples and explanations. This was particularly true when configuring the collector to access file servers.

The search engine service itself must be given full administrative privileges on each server that you wish to index, including the ability to take ownership of documents. This is implemented either by setting up trust relationships between all of the servers and their NT domains or by creating a common administrative name and password on all of the servers. Only a single user name and password may be associated with the AVSE service, specified as the "log in as" parameters of the Windows NT service. To us, giving the search engine such broad administrative privileges could create an unnecessary security risk; creating a read-only account on the file servers would seem to make more sense. The AVSE documentation, at least for the product version we examined, did not address the issue of indexing files on non-Windows NT servers.

Once we worked out (through trial and error) the issues involved in giving AVSE access to our file servers, the service took two hours and 37 minutes to index about 9 GB of data; it scanned 15,397 files and selected 3,863 files for indexing. Those included Word and Excel documents, dBase-format database files, PDF files, PowerPoint files and more.

Pick And Choose

Now we were ready to search. Not surprisingly, the Web-based user interface looks and feels like AltaVista's public Web search engine at www.altavista.com, only without the advertising. All the same tricks and techniques worked, and response time was too short to measure, even on complex searches with multiple Boolean terms.
Click on the link, and the results either download to the workstation's hard disk or open up in the browser. There's not much we can say about that, other than it works! The query agent is straight HTML and had no compatibility problems on any of our client platforms.

One feature we enjoyed playing with was AVSE's multiple-language support. Not only can it identify documents in different languages, but it can present parts of the query user interface in dozens of other languages, including those with non-Roman character sets. Field names are translated, but the "Search" button's text is not, and nor are AltaVista's handy search tips, like "Use the at sign to find grammatical variations: see@ matches see, sees, seeing and saw." For some languages, like French and Greek, the end-user help documentation is in the native language; for others, like Hebrew and Swedish, it is in English.

Perfectly functional though the search screen is, most businesses would likely wish to customize its appearance. The most comprehensive way is to integrate AVSE's query engine into another Web site using its Search Developer's Kit. A simpler way is to modify a text file that specifies items like the search-window name, corporate logo, background colors and other attributes. The AVSE documentation was also misleading in describing how to edit the file and where to save it after it was edited, resulting in more trial-and-error experimentation before our changes would stick.

What You're Looking For

Despite those few weaknesses, the underlying technology is sound; we can say unequivocally that AVSE is an excellent product. Its value, however, is harder to assess. AltaVista quoted us a starting price of $1,495 for an index for 3,000 documents. Note that our test, indexing a single file server, yielded more than that number; that single file server plus three Web sites came to 4,180 documents.
When we added more servers as well as other sites that we manage, the total swiftly rose to more than 7,000 documents. When pressed about pricing, the company simply wouldn't talk, other than to say that commercial applications (where the search engine is made visible to the public) cost more than intranet applications. Beyond that, the spokeswoman said, "Our pricing schedule is really just a guide for the sales team, but we have the freedom to negotiate with customers depending on their needs." So be prepared to dicker.

Alan Zeichick is principal analyst with Camden Associates, and is a contributing editor to InternetWeek. He can be reached at zeichick@camdenassociates.com.
Copyright © 2010 CMP Media LLC