BIOBIT No 21

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
BIO-NAUT NEWSLETTER 25-9-91
<< EDITED BY HEIKKI LEHVASLAIHO & ROBERT HARPER >>
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

(Note from ED: Harper is still off in a huff, refusing to write any more BioBits, so Heikki Lehvaslaiho has stepped into his shoes for this edition of BioBit. I have only made minor cosmetic changes to the text so that it conforms to the basic impudent style of former BioBits... other than that, it was Heikki who set the keyboard on fire to get this edition out to the uncouth biological masses. RGDS ED.)

We have Biosci newsgroups, Listserv, anonymous FTP, ARCHIE, and the GenBank and EMBL online services. Shouldn't we be happy with all these different ways of accessing information? Well, there is yet another way of obtaining information related to the biosciences, and that is by using a programme called WAIS. WAIS is an acronym for Wide Area Information Servers, and it is a set of programmes for accessing information contained in special databases called "sources". According to Thinking Machines, who made the software, WAIS should be pronounced "ways"... I suppose you should think of the programme as the "ways and means" of retrieving information. Personally I would prefer the Finnish pronunciation and call WAIS "wise", since it is rather intelligent.

WAIS... WHAT IS IT?

WAIS (Wide Area Information Servers) is a set of products supplied by different vendors to help end-users find and retrieve information that is stored as "sources" on computer networks.
Thinking Machines, Apple Computer, and Dow Jones initially implemented such a system for use by business executives. It is now available to anyone on the net. WAIS provides a graphical user interface (GUI): WAIStation for networked Macs, and xwais running under X Windows on Suns. Transparency is the key - you don't need to know where the source is, who made it, or even what machine it is running on.

HOW DOES IT WORK... THE METAPHOR

The metaphor behind this is the client/library scenario: a client approaches a librarian with some keywords and a reference or two, and asks for more information along these lines. The librarian scuttles off and after some time returns with a big printout of things that might interest the client. The client looks over the listing and then asks for photocopies of certain articles. The client then finds more interesting references and asks the librarian to retrieve them, and so the process goes on until the client is satisfied with all the information that the librarian has provided.

HOW DOES IT WORK... THE NUTS AND BOLTS.

The WAIS system is composed of three separate parts: clients, servers, and the protocol which connects them. The client is the user interface, the server does the indexing and retrieval of documents, and the protocol is used to transmit the queries and responses back and forth between the client and the server.

The client and server are isolated from each other through the protocol. Any client which is capable of translating a user's request into the standard protocol can be used in the system. Likewise, any server capable of answering a request encoded in the protocol can be used. In order to promote the development of both clients and servers, the protocol specification is public. This means that a client running on a Mac can talk to a server running on a Sun or a Connection Machine.

On the client side, questions are formulated as plain English language questions.
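
To make the three-part split concrete, here is a toy sketch in Python. It is only an illustration under loud assumptions: the JSON messages, the function names (encode_query, serve, decode_reply), the word-count scoring and the tiny SOURCE dictionary are all invented for this example and have nothing to do with the real WAIS wire format or with any Thinking Machines code. The point is simply that the client and the server share nothing except the encoded messages.

```python
import json

# Toy "source": document id -> text. Hypothetical data, for illustration only.
SOURCE = {
    "doc1": "Birds of Ecuador and their habitats",
    "doc2": "Sequence analysis software for the Mac",
    "doc3": "Field trip report: Ecuador, birds and insects",
}


def encode_query(source_name, question):
    """Client side: turn a plain-English question into a protocol message."""
    # JSON is a stand-in here, NOT the real WAIS protocol encoding.
    return json.dumps({"source": source_name, "words": question.lower().split()})


def serve(message, sources):
    """Server side: decode the request, score documents, encode the reply."""
    request = json.loads(message)
    documents = sources[request["source"]]
    hits = []
    for doc_id, text in documents.items():
        words = text.lower().replace(",", " ").replace(":", " ").split()
        # Invented scoring: how often the query words occur in the document.
        score = sum(words.count(w) for w in request["words"])
        if score:
            hits.append({"id": doc_id, "score": score})
    # Best score first; ties broken by document id.
    hits.sort(key=lambda h: (-h["score"], h["id"]))
    return json.dumps({"hits": hits})


def decode_reply(message):
    """Client side: turn the protocol reply into a list of (id, score) pairs."""
    return [(h["id"], h["score"]) for h in json.loads(message)["hits"]]


# One round trip: question -> message -> server -> message -> result list.
reply = serve(encode_query("toy", "birds Ecuador"), {"toy": SOURCE})
results = decode_reply(reply)
print(results)
```

Because the two sides only exchange strings, either one could be swapped for a different implementation without the other noticing, which is the property a public protocol is meant to give.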
The client then translates the query into the WAIS protocol and transmits it over the network to a server where sources are kept. ("Source" is the WAIS term for a database, and is not to be confused with source code.) The server receives the transmission, translates the received packet into its own query language, and searches the specified source for documents which satisfy the query posed by the client. A list of relevant documents is then encoded in the protocol and transmitted back to the client. The client decodes the response and displays the results. The user selects an interesting document, which can then be retrieved from the server. It is simpler than it sounds.

THE KEY CONCEPTS OF WAIS:

* ease of use: No cryptic commands needed, just plain English.
* remote accessibility: Network connections are managed without user interaction.
* public protocol: Anyone can publish a source (a database in WAIS parlance).
* separate client and server: Features supported by a server/source can be ignored or utilized by the client depending on its capabilities. Several sources can be searched simultaneously.
* dynamic folder: Searches can be saved on local disk for future use and updated later on request or automatically.
* relevance feedback: The user can select a document or part of it and rerun the search asking for "more-like-this-one".
* directory of servers: Servers keep information about services. A search result can be a pointer to another source.
* multimedia: Multiple file formats are possible. Currently plain text and Macintosh PICTs are supported.

THIS SOUNDS FINE. WHAT'S THE CATCH?

Everything really sounds terrific. Numerous end-users get easy access to vast databases using a GUI that they are already familiar with. Data no longer needs to be held at some all-encompassing central resource, but instead can be distributed between different nodes.

To us the catch seems to be that this is after all a concept promoted by commercial interests.
In their current forms the implementations are free and in the public domain, but for how long? To fully utilize the potential of WAIS, for example in multimedia, a considerable amount of work must be invested (and has already been invested). These developments will not come free. There is no free lunch... The citation below illustrates the point:

Currently the search engines that are available for different machines vary in one important feature. The UNIX and Mac search engines that are in the public domain simply index a file and search for keywords. The Connection Machine Text Retrieval software is a commercial package that is quite sophisticated. In particular it provides Relevance Feedback.

By way of a simplistic example of the value of Relevance Feedback, consider searching a text database for the keyword HIV. You would miss those articles that referred to AIDS but not HIV. With relevance feedback the user has the option of examining the articles found by the 'HIV' search, marking some of these as being "What I'm interested in" and then asking for more articles "like the ones I've marked". The software examines the full text of the marked articles and extracts common terms including, one might expect, the term 'AIDS'. This new set of terms is used to rescreen the database and any new matches are presented to the user. As biology abounds with nomenclature issues like this I feel that this approach has a lot of value for our community.

WHAT DO I NEED TO START USING WAIS

Currently, free client programs for the Mac and X Windows, and a GNU Emacs interface, are available from quake.think.com by anonymous FTP. Several other vendors are developing interfaces: NFS, NeXT, Pandora, Sphere Telecommunications.

The main FTP site for software and documentation files is the FTP server at Thinking Machines Corporation. You can transfer them to your local computer by FTPing to "quake.think.com". Login as "anonymous" with your username as the password, or simply "guest".
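
For those who would rather script the session than type it, the anonymous login just described, together with a change to the pub/wais directory and a file listing, might look roughly like this using Python's standard ftplib module. This is a sketch only: the helper names list_wais_files and fetch_listing are ours, and it assumes that the server is reachable and accepts "guest" as the anonymous password.

```python
from ftplib import FTP


def list_wais_files(ftp, directory="pub/wais"):
    """Change to the WAIS directory and return the names found there, sorted."""
    ftp.cwd(directory)
    return sorted(ftp.nlst())


def fetch_listing(host="quake.think.com", password="guest"):
    """Log in anonymously and list pub/wais.

    Assumption: the host is up and allows anonymous FTP. Nothing in this
    sketch checks that, so expect an exception if it is not.
    """
    with FTP(host) as ftp:
        ftp.login(user="anonymous", passwd=password)
        return list_wais_files(ftp)
```

Calling fetch_listing() should return the file names as a Python list. list_wais_files is kept separate so that it can be tried against any object offering cwd() and nlst(), without touching the network.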
Move to the wais directory with the command "cd pub/wais". The "ls" command shows the list of available files:

ftp> ls -l
-rw-rw-rw- 1 999       50 Aug 12 23:37 .places.wmd
-rw-rw-rw- 1 999    75517 Sep  4 20:18 TOIS-WAIS.sit.hqx
drwxrwxrwx 2 14       512 Sep  6 19:29 UNC
-rw-rw-rw- 1 14   1070714 Mar 23 01:24 WAIStation-0-61.sit.hqx
-rw-rw-rw- 1 1637 1094536 Mar 28 00:37 WAIStation-0-62-Sources.sit.hqx
-rw-rw-rw- 1 1637  635225 May 16 03:01 WAIStation-0-62.sit.hqx
-rw-rw-rw- 1 1637  635857 Jun 21 21:50 WAIStation-Canned-Demo.sit.hqx
-rw-rw-rw- 1 999      623 Aug  6 01:44 WAIStation-README
drwxrwxrwx 2 14      1024 Sep 16 23:03 doc
-r--r--r-- 1 14    463981 Jun 13 20:44 wais-8-b1.tar.Z
-rw-rw-r-- 1 14    487029 Jul 29 06:31 wais-8-b2.tar.Z
drwxrwxrwx 2 14      1536 Sep 17 02:37 wais-discussion

ftp> cd doc
ftp> ls -l
-rw-rw-rw- 1 999       50 Sep  9 18:24 .places.wmd
-rw-rw-rw- 1 14      1124 Jul 23 00:57 Makefile
-rw-rw-rw- 1 14      1220 Jul 23 00:57 README
-rw-rw-rw- 1 14     59571 Sep 16 23:03 cm-servers.ps
-rw-r--r-- 1 997  8438192 Sep  6 21:28 core
-rw-rw-rw- 1 14     10588 Jul 23 00:57 doc-ids.txt
-rw-rw-rw- 1 14      1417 Jul 23 00:57 helpful-scripts.txt
-rw-rw-rw- 1 14      3234 Aug  2 22:21 overview.txt
-rw-rw-rw- 1 14     37911 Jul 23 00:57 protspec.txt
-rw-rw-rw- 1 14      4453 Jul 23 00:57 question.txt
-rw-rw-rw- 1 14      8317 Jul 23 00:57 source.txt
-rw-rw-rw- 1 14     63756 Jul 23 00:57 wais-concepts.txt
-rw-rw-rw- 1 14     21140 Jul 23 00:57 wais-corp.txt
-rw-rw-rw- 1 14      4700 Jul 23 00:57 wais.el.txt
-rw-rw-rw- 1 14      2918 Jul 23 00:57 waisindex.txt
-rw-rw-rw- 1 14     31163 Jul 23 00:57 waisprot.txt
-rw-rw-rw- 1 14      4849 Jul 23 00:57 waisq.txt
-rw-rw-rw- 1 14      1918 Jul 23 00:57 waissearch.txt
-rw-rw-rw- 1 14      3201 Jul 23 00:57 waisserver.txt
-rw-rw-rw- 1 14     18344 Jul 23 00:57 waistation_users_guide.txt
-rw-rw-rw- 1 14      2847 Jul 23 00:57 xwais.txt
-rw-rw-rw- 1 14      3767 Jul 23 00:57 xwaisq.txt
-rw-rw-rw- 1 14    126644 Aug  9 18:56 z3950-spec.txt

WAIS AND THE MAC.

A Mac should be running MacTCP over LocalTalk or Ethernet.
The WAIS protocol - and WAIStation - support X.25 communications and modems. A scripting language related to the White Knight language comes built in with WAIStation. The file WAIStation-Canned-Demo.sit.hqx is also a very useful "movie" which covers most of the basics of using WAIStation on the Mac.

WAIS AND THE SUN.

If you are a UNIX user you can take your pick of user interfaces. If you are lucky, you are able to run X Window applications, in which case you use 'xwais'. The crudest possible interface is offered by 'waissearch'. If you give the following command:

    waissearch -h genbank.bio.net -d biosci -p 210 Ecuador

you will be given a list of messages posted to the Bionet newsgroups over the past 3 years that contain the word Ecuador. The waissearch programme is not particularly "intuitive" and is not what your typical end-user would want. However, in some cases it might come in useful. For example, by giving the command

    waissearch -h nic.funet.fi -d INFO -p 210 bionic

you will receive a list of the WAIS "bionic" sources on nic.funet.fi. If we ever put pictures up on NIC then I guess those sources will be called PICNIC :-)

WAIS AND DOS.

There isn't yet a publicly distributed version of a WAIS client for DOS, but Jim Fullton has been working on porting WAIS to DOS. What comes out is supposed to be something that will work with TCP/IP (but not with Windows, for some reason).

WAIS AND VAX/VMS.

>From: Jim Fullton
>Subject: WAIS for VMS
>
>We now have a functional VMS server operating. Feel free to announce
>it. Next step is to put a VTX search engine in. Think about how
>*that* would impact the universities!

HOW TO USE WAIS

There isn't much sense in trying to demonstrate WAIS in a text-based forum. To see it in all its GUI glory (wif ol ze vlashing lites) you really need a Mac. By far the easiest way to learn WAIS is to run the demonstration available from quake.think.com by anonymous FTP.
Get the file /pub/wais/WAIStation-Canned-Demo.sit.hqx, decode the BinHex encoding and unstuff it. A short, self-running Macintosh application with animation and digitized voice gives a good introduction to WAIS concepts and the interface. This demo is equally useful for those using xwais, since the interfaces are almost identical.

Another way to sniff the air is to telnet to hub.nnsc.nsf.net and login as 'wais'. No password is needed, and you are presented with a menu of choices to get started.

PREPARING YOUR OWN SOURCES

It is not a difficult task to prepare your own sources. What you need is big chunks of text and the disk space to store them. Databases like LIMB or ENZYME can easily be adapted so that they can be processed by the programme waisindex to produce a source which can be queried by a client. LIMB and ENZYME I processed with the "para" option, and ENZCLASS could be processed with the "one_line" option. As you can see, there are loads of different possibilities for indexing. Sources like Biobit and Biodocs, which are just a conglomeration of different sized files, were indexed with the default value "text". By giving the command waisindex by itself you get a listing of all the different options.

sun4 /home/csc/harper 39> waisindex
Usage: waisindex [-d index_filename]
        [-a]          /* adding to an existing index, otherwise it erases the index */
        [-r]          /* recursively index subdirectories */
        [-mem mbytes] /* number of megabytes to run this in */
        [-register]   /* registers the database with the directory of servers.
                         This should be done with care. */
        [-export]     /* uses short dbname and port 210 */
        [-v]          /* print the version of the software */
        [-t]          /* format of the file.
                         if none then each file is a document */
          | text          /* simple text files, this is the default */
          | groliers      /* groliers encyclopedia format */
          | mail          /* unix mail and netnews format */
          | rmail         /* gnu rmail */
          | mail_or_rmail /* mail or rmail or both */
          | mail_digest   /* standard internet mail digest format */
          | netnews       /* netnews format */
          | catalog       /* Thinking Machines library catalog */
          | bio           /* biology abstract format */
          | cmapp         /* CM applications from Hypercard */
          | pict          /* pict files, only indexes the filename */
          | gif           /* gif files, only indexes the filename */
          | jargon        /* the jargon file (the hackers dictionary) */
          | irg           /* internet resource guide */
          | dash          /* entries separated by a row of dashes */
          | one_line      /* each line is a document */
          | para          /* paragraphs separated by blank lines */
        ] filename filename ...

WHAT WAIS SERVICES EXIST?

Below is a list of services offered from several sites. The "directory of servers" facility is operated by Thinking Machines so that new servers can be easily registered and users can find out about new sources available on the network. Note that the Connection Machine WAIS server is available only between 9AM and 9PM EST. Most servers have an INFO.src which lists the available sources. Note that from some sites the connection to remote servers is formed more reliably when IP numbers, rather than names, are used. So if you do not get into NIC with nic.funet.fi then try 128.214.6.100 instead.

BIOLOGY:

nic.funet.fi (128.214.6.100) (The mother of all bio-sources)

    Archive-info.src:
    biodocs.src:       A collection of networking documents.
    compalgo.src:      Bibliography of Computational algorithms in molecular biology and genetics
    limb.src:          The LIMB database.
    emblsoft.src:      Software held by EMBL on NETSERV
    seqanalr.src:      Amos Bairoch's sequence analysis references
    toc.src:           TOC from Biosci lists from 89-91
    enzyme.src:        ENZYME database
    biobit.src:        BioBit newsletter back issues
    gbsoftware.src:    Software that uses the GenBank database

genbank.bio.net (134.172.3.160) (The Grandmother of all biosources)
not available on Mondays from 0300 to 0600 US/Pacific time

    biosci.src:                BioSci newsgroups
    biology-journal-contents:  Periodical references to journals in the area of molecular biology. Examples of such journals are NAR, EMBO, and CABIOS.

quake.think.com (192.31.181.1)
available only between 9AM and 9PM EST

    NIH-Guide.src:          National Institutes of Health Guide to Grants and Programs for most of 1991.
    Molecular-biology.src:  The annotation of the GenBank(R) DNA sequence database (Bacterial Division) - release 64.0.

OTHER SOURCES (5):

* Dow Jones is putting a server on their own DowVision network. This server contains the Wall Street Journal, Barron's, and 450 magazines. You have to pay to use this server.
* Thinking Machines operates a Connection Machine on the Internet for free use. The sources that it supports are some patents, a collection of molecular biology abstracts, a cookbook, the CIA World Factbook, and the most famous, jargon.src, which formed the basis for the book "The Hacker's Dictionary" (How many bogometers just triggered?)
* MIT supports a poetry server with a great deal of classical and modern poetry. Cosmic is serving descriptions of government software packages. The Library of Congress has plans to make their catalog available on the protocol.
* Weather maps and forecasts are made available by Thinking Machines as a repackaging of existing information. With some luck you can view satellite GIFs over the Internet.

HOW CAN I FIND OUT MORE ABOUT WAIS?

Thinking Machines run a mailing list that has weekly postings on progress and new releases; to subscribe send an email note to wais-discussion-request@think.com.
Follow the relevant newsgroups on Usenet. WAIS and biological issues have been discussed on bionet.software, and there is also a brand new group, alt.wais, which is attracting quite a lot of attention on Usenet.

REFERENCES

In compiling this newsletter we have relied heavily on recent discussion in BioNet and on documents available from Thinking Machines Corp., mostly written by Brewster Kahle.

Richard Marlon Stein: Browsing through terabytes: Wide-area information servers open a new frontier in personal and corporate information services. BYTE May 1991: 157-164.

Brewster Kahle: Wide Area Information Server Concepts. Draft89-4
anonymous FTP /pub/wais/doc/wais-concepts.txt@quake.think.com

Brewster Kahle & Art Medlar: An Information System for Corporate Users: Wide Area Information Servers. TMC Tech Report TMC199 Version 3.
anonymous FTP /pub/wais/doc/wais-corp.txt@quake.think.com

Brewster Kahle & 5 others: An executive information system for unstructured files: Wide Area Information Servers.
anonymous FTP /pub/wais/doc/

Brewster Kahle: Overview of Wide Area Information Servers. April 1991
anonymous FTP /pub/wais/doc/overview.txt@quake.think.com

Comments in Bionet.general by Robert Jones (Thinking Machines Corporation) jones@think.com

Rob "dogwash" Harper

******************** CLIP from the JARGON.SRC ******************

[From a quip in the "urgency" field of a very optional software change request, about 1982. It was something like, "Urgency: Wash your dog first."] n. A project of minimal priority, undertaken as an escape from more serious work. Also, to engage in such a project. Many games and much gets written this way... also scientific articles.

*********************** END ************************