The Internet for Journalists

Notes for a series of introductory talks and hands-on explorations.

In order to understand the internet and other on-line services, you have to use them. These notes are intended to be read as a World-Wide Web document, so that you can directly access the information services they mention. I have deliberately not given the "addresses" of those services in printable form: I hate re-typing computer stuff as much as you soon will.


0.0) Introductions:

Have you:

Used computers for anything other than word-processing?
Sent an email message?
Used internet services to find someone's email address?
Checked your email this morning?
Found something you set out to find on the Web?
Updated your own Web page this morning?

Do not follow this link

If you checked one or fewer, try the basic introduction.

If you checked at least the first two, this course is for you. If you checked the last one, you're probably competition.



0.1) Why should you be concerned about the internet?

At some point in your career you are going to be freelance. Believe me.

When you are a freelance, you will need to specialise. In many specialist areas, the internet is the native form of communication of the people you are writing about. If you refuse to enter bars, it is extremely difficult to be a crime reporter; if you refuse to enter the internet, it will be extremely difficult to cover science - including the social sciences and criminology - technology, and an expanding list of other areas.

At many points in your career you will not be able to rely on the services of librarians, newspaper morgues and so on.

When the Internet search engine AltaVista was launched in November 1995 it had found about 20 million "pages" of information. A "page" may be some student's conceit, or it may be a book-length briefing which provides you with much of the information and most of the contacts for a story. (All will be clearer when you've found one for yourself.)

In mid-November 1996 it had found 68,173,788 pages. In mid-March 1999 it reported 95,717,259 - though it had reported higher numbers six months previously. Some people estimate that there are up to three times as many publicly accessible pages as that.

Media owners are getting all excited about "interactive multimedia" without, it is clear, fully understanding either term. It would seem to be to your advantage to understand. The key is that the internet is like a place, and you don't understand unless you've been there.



1) Outline of the internet

The internet is not Netscape, and it is not the World-Wide Web. Netscape is a program you can run on your computer to show nice graphic images from the World-Wide Web to camera crews desperate for something pictorial with a pulse.

The internet is a means of transport for information.

It is equivalent to Railtrack; it provides a common standard for "tracks" over which different services may run. Not all kinds of service reach all places. Understanding that there are different services, and which one to use when, gives you a major advantage when it comes to tracking down information.

1.1) Services running over the internet include:

  • Electronic mail -- sending a message to one or more named individuals
  • Mailing lists -- automated distribution of messages and responses to a set list of subscribers
  • "Usenet" discussion fora or "newsgroups" -- making messages and responses public to the entire net
  • Telnet -- a way of connecting directly to another computer, opening a window on your screen which is effectively a "wormhole" to the other machine -- for example a library catalogue.
  • File Transfer Protocol (FTP) -- a means of wandering around other people's computers and retrieving files which they have "published". When you "visit" an FTP site using a Web browser it looks and works like a very, very boring Web page. Used mostly, these days, for putting files onto Web sites.
  • "Gopher" -- a catalogue of available files, and "Veronica", the Very Easy Rodent-Oriented Networked Information Cataloguer and Archiver, which indexes the one-line descriptions in the gopher catalogues. Rare by 1999. When accessed through Netscape these look a lot like rather boring pages on... the Web.
  • The notorious World-Wide Web -- hypertext, potentially linking everything from the complete works of Shakespeare to jokey takes on the Bacon Society and the Journal of Psychosomatic Metallurgy into a single hyper-document.
  • Human-maintained directories of the content of the Web.
  • Machine-made indexes of the content of the Web.

These last two are fundamentally different from each other. Human-maintained catalogues have the advantage that their contents are filtered and (usually) presented in an easy-to-understand form. Machine-generated indexes have the advantage that their contents are not filtered - computers are, so far, not capable of introducing political bias. Using them effectively is a learned skill and the main reason you're reading this, no?

1.2) It's broken?

You often get error messages when using internet services. Remember, though the internet transport machinery was designed to survive a nuclear attack, it was also designed to move messages between 20 five-star generals in a few minutes, not between 20 million geeks who drum their fingers on the table when they've had to wait 15 milliseconds.

These error messages are usually over-definite, and should be prefixed by "Errr... I'm only a computer, but I'm guessing..." In almost all cases, it's worth trying whatever you just did again, immediately. Notice that most messages which come from the "remote" computer are more reliable than those from your computer.

Above all, relax. The way to learn is to play. If you're not making mistakes, you're not doing it right. Now, make a more interesting mistake...

The commonest error messages are:

  • "The server does not have a DNS entry"
    • From: yours
    • Means: tried looking up the page you asked for in the "phone book": failed
    • Action: try the page once or twice more; if it still doesn't appear, check the URL for typos.
  • "Socket is not connected"
    • From: yours
    • Means: the page you asked for is on a machine which seems to be turned off
    • Action: try the page once more, then try in 2 hours.
  • "Connection reset by peer"
    • From: yours
    • Means: the machine which was delivering the page is overloaded or has suffered a glitch
    • Action: try the page once more, then try in 2 hours.
  • "404 Not Found"
    • From: theirs
    • Means: the page no longer exists or has moved. Really.
    • Action: check for typos. Find the index page for the site, to see whether the page is still there but at another URL. If not, do another AltaVista search.
  • "Access denied"
    • From: theirs
    • Means: piss off
    • Action: pay them the money? Site owner gone away?
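
The standing advice for errors reported by your own computer (try again at once, and only then give up for a while) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: fetch is a hypothetical function you supply to retrieve a page, and treating every OSError as "possibly temporary" is an illustrative choice, not a rule.

```python
import time


def fetch_with_retries(fetch, url, attempts=3, pause_seconds=1.0):
    """Call fetch(url), retrying a few times on errors that may be temporary.

    `fetch` is a hypothetical caller-supplied function. It is assumed to raise
    OSError (which includes ConnectionResetError) when the network misbehaves.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError as error:
            last_error = error
            # Pause between tries, but not after the final failure.
            if attempt < attempts - 1:
                time.sleep(pause_seconds)
    # Every attempt failed: pass the last error on to the caller.
    raise last_error
```

If the last attempt fails too, the error is passed on unchanged: that is the moment to go and do something else for two hours.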


2) How do you, as a journalist, use the internet?

As in the Real World™, the trick is to know what tool or service to use. Do you call a contact, go to a meeting or press conference, or hit the library? Do you send email, listen in on an on-line debate, or search the Web?

2.1) Effective use of email, and locating people

The most useful internet service for many journalists is good old-fashioned email. Often, the best way to find someone's email address is a phone call. There are a number of services which aim to locate people -- for example Yahoo! People Search. These generally work, however, only for people who have their own Web pages or have taken part in Usenet discussions.

If you know what organisation a person works for, you may be able to find an on-line directory of email addresses in that organisation. Often, the best way to locate such a directory is to make an informed guess at its "URL". Experience is the best guide to doing this, but this article and this summary sidebar may help. It is also a great time-saver to be able to understand "URL"s before you visit them.

Oh, "URL"? It's a "Uniform Resource Locator". Formally, a name which identifies a particular computer file uniquely. So it's the "address" of something you can fetch over the internet. This document is
http://www.poptel.org.uk/nuj/mike/lecture.htm
Dismantle this URL thus:

  • http://: it's Hyper-Text Transfer Protocol, which is to say the Web;
    other possibilities include
    • gopher:// a gopher menu or document
    • news: a "newsgroup"
    • mailto: someone's email "mailbox"
    • ftp:// retrieve a document with File Transfer Protocol
    • telnet:// a direct connection to another computer
  • www.poptel.org.uk: the unique name of the computer it lives on;
  • /nuj/mike/: where it is on that computer (what directory or folder it's in);
  • lecture: the name of the file;
  • .htm: it's Hyper-Text Markup Language, which is to say a Web page (it ought to be .html but I'm lazy about converting the file-names from my Windows machine);
    other possibilities for this "filename extension" include:
    • .jpg or .jpeg or .gif a direct reference to a picture
    • .txt plain text
    • .doc almost certainly a Microsoft Word document
    • .zip a compressed file to be unpacked on your DOS/Windows computer
    • .hqx or .sit a compressed file to be unpacked on your Macintosh computer
    • .pdf an Adobe Acrobat™ file - you need to download a reader program.
    • .ps or .eps a PostScript™ file; the free GhostScript/GhostView reader combination will display the content, if not the entire corporate design vision thang. (Latest version does .pdfs too.)
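
The same dismantling can be done mechanically. Here is a minimal sketch using Python's standard urllib.parse module; the function name dismantle and the labels in the result are assumptions made for illustration.

```python
import posixpath
from urllib.parse import urlsplit


def dismantle(url):
    """Split a URL into the pieces described in the list above."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    name, extension = posixpath.splitext(filename)
    if not directory.endswith("/"):
        directory += "/"
    return {
        "service": parts.scheme,    # http, ftp, gopher, ...
        "computer": parts.hostname, # the unique name of the machine
        "directory": directory,     # where the file is on that machine
        "file": name,               # the name of the file
        "extension": extension,     # what kind of file it is
    }


if __name__ == "__main__":
    pieces = dismantle("http://www.poptel.org.uk/nuj/mike/lecture.htm")
    for label, value in pieces.items():
        print(label, "=", value)
```

Run on this document's own URL, it reports the service as http, the computer as www.poptel.org.uk, the directory as /nuj/mike/, the file as lecture and the extension as .htm.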

When contacting people who have a presence on the Web, it's a good idea first to take a look at the Web to see whether they've already answered your question. The Web was, after all, invented by academics tired of answering colleagues' questions in email. Then you call them or email them for the soundbite, knowing what to ask for.


2.2) Search strategies

When I start up my internet access program, it shows me a customised version of the AltaVista Advanced Search page. That sums up my attitude to using the Web: don't surf, search.

The reason I've set it up this way is that when I use an "easy, friendly" search engine, I typically get presented with 4000 pages to trawl through and sometimes 100,000. When I compose a neat "hard" search, I get maybe 10 to 30.

A neat search query is one which will find almost all the relevant documents and almost no other documents.

The canonical example is a search for stuff about computing and development issues (as in human and economic development).

"Development" is also a nerd-word; so an "easy" search for

computer development
will turn up all the documents containing either word, with the ones containing both words first; but thousands of them will be about computer hardware and programs.

A "hard" search for

(comput* NEAR develop*) AND NOT (program or hard*)
will be much more concise. There's a short guide to constructing hard searches in this sidebar.
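
To see why the "hard" search is so much tighter, here is a minimal Python sketch that applies both kinds of test to the text of a document. It is an illustration of the logic, not AltaVista's code: the function names are made up, and treating NEAR as "within 10 words" is an assumption about how the engine defines it.

```python
import re


def words(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def positions(tokens, pattern):
    """Indexes of the words matching pattern; a trailing * means 'starts with'."""
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        return [i for i, token in enumerate(tokens) if token.startswith(prefix)]
    return [i for i, token in enumerate(tokens) if token == pattern]


def near(tokens, first, second, window=10):
    """True if the two patterns match within `window` words of each other."""
    return any(
        abs(i - j) <= window
        for i in positions(tokens, first)
        for j in positions(tokens, second)
    )


def easy_search(text):
    """The 'easy' search: either word will do."""
    tokens = words(text)
    return bool(positions(tokens, "computer") or positions(tokens, "development"))


def hard_search(text):
    """(comput* NEAR develop*) AND NOT (program OR hard*)"""
    tokens = words(text)
    wanted = near(tokens, "comput*", "develop*")
    unwanted = positions(tokens, "program") or positions(tokens, "hard*")
    return wanted and not unwanted
```

A page headed "Computers and rural development" passes both tests; one about "computer program development on fast hardware" passes only the easy one. Notice that hard* would also throw out a page which mentions "hardship": every exclusion costs you something.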

As the Web expands, other search engines are trying different ways to achieve the narrowly-focussed results of a good AltaVista Advanced Search. HotBot's SuperSearch has a brave attempt at a fill-in-the-blanks approach. If you can do effective Advanced Searches, you can work out how to use any of these. I just think of them as long-winded ways of typing Advanced Searches, with fewer options.

I don't have time to write the whole book on composing these searches (rather, offer me an advance) -- so (for now) you'll have to come see me wave my arms about. It would take a whole book to write down because it's about ways of thinking, not facts.

In case it's not obvious, using any of the search engines is quite different from using a subject catalogue. You are trying to guess words which will appear in the text you seek. These are not necessarily similar to the words which a librarian would use to describe these documents.

You can select documents very efficiently in a field you know well by choosing everyday or technical terms for the same thing - consider:
"genetic engineering" vs recombinant
and
law vs statute.

The Google search engine (which was still in "beta test" in August 1999) is interesting. It ranks pages according to the number of pages which link to them. This makes it very good indeed at finding home pages for organisations and individuals - or at least those which have had Web presences for a while. It did well, too, on Genetically Modified Organism* but averagely for press freedom.
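
Ranking by the number of pages which link to a page can be sketched very simply. The Python below implements only the description given here, a raw count of inbound links; the real engine's method is more elaborate, and the function name and the shape of the links data are assumptions made for illustration.

```python
from collections import Counter


def rank_by_inbound_links(links):
    """Order pages by how many other pages link to them, most-linked first.

    `links` maps each page to the pages it links to. A page linking to
    itself, or to the same page twice, is counted only once (or not at all).
    """
    inbound = Counter()
    for source, targets in links.items():
        inbound[source] += 0  # make sure pages nobody links to still appear
        for target in set(targets):
            if target != source:
                inbound[target] += 1
    # Most inbound links first; ties are broken alphabetically.
    return sorted(inbound, key=lambda page: (-inbound[page], page))
```

This also shows why long-established home pages do well: it takes time for other people to get round to linking to you.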

Another new search engine, FAST, claims to index more pages than any other - 200M at launch in August 1999 with the goal of indexing 1000M pages by the end of the year. It is indeed impressively fast, but doesn't (yet) offer highly selective searching, so be prepared to be overwhelmed.

You can get a much more library-like experience from the subject directories. The hard part here is much like decoding a newspaper or agency report: understanding the bias of interest of the directory you're looking at. This is particularly important when deciding whether to attribute meaning to the absence of a result.

2.3) Evaluating what you find

If you stay calm and don't get carried away with the newness of the technology, you'll find this is just the same as evaluating material you find in the Real World™.

Stuff you pick up in a mailing list has the same value as stuff you pick up in (a conversation in) a bar. (It makes some difference whether it's the bar of King's College Cambridge or the Axe, Hackney Road...) Stuff you pick up in Usenet news is like... a bar in a bad neighbourhood, usually. It's not necessarily useless -- but it definitely needs some work to see whether it checks out.

When it comes to evaluating Web pages, learning to deconstruct URLs can save a lot of time. There is an enormous difference between
www.mit.edu/preprints/chomsky9701.html
and
www.coi.gov.uk/coi/depts/GHE/GHE.html
and
users.aol.com/members/KewlDoodz.html ... but as a journalist you don't take anything you read as Gospel and you check everything out... don't you?
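
A first glance at the computer's name can be automated, though the judgement cannot. This minimal Python sketch pulls the host name out of a URL and compares it with a tiny hand-made table of hints; the table, its wording and the function name are all assumptions made for illustration (the .ac.uk entry in particular is not one of the examples above), and a hint is a reason to look harder, not a verdict.

```python
from urllib.parse import urlsplit

# Hypothetical, hand-made hints: ending of the host name -> likely publisher.
HINTS = [
    (".gov.uk", "UK government body"),
    (".ac.uk", "UK university or college"),
    (".edu", "US university or college"),
    ("aol.com", "personal page hosted by an access provider"),
]


def publisher_hint(url):
    """Return a rough guess at who is behind a URL, from its host name alone."""
    if "://" not in url:
        url = "http://" + url  # URLs are often quoted without the http:// part
    host = urlsplit(url).hostname or ""
    for ending, hint in HINTS:
        if host.endswith(ending):
            return hint
    return "unknown: go and look"
```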

Can you rely on email to check out what you find? Guess what? The answer is also exactly parallel to the answer in the Real World™.

If you're writing a piece about a technical development and you're sure there's no personal, political or philosophical conflict involved, then yes, email will do (from academics, it may even get a better quote than a phone call).

If, on the other hand, you need to talk to lawyers representing a company alleged to be selling arms to the Interahamwe -- then you need to go visit said lawyers so you can look them in the eyes and see them lie, don't you?

2.4) Sifting things to evaluate

The biggest problem can be that you feel swamped with too much information -- even when skilful searching presents you with only 20 to 200 leads to check out.

Speed-reading is virtually essential, and so is the rather different skill of skimming.

The goal of all speed-reading programmes is to train yourself to take in an entire line of information at a time. One method which works is to spend a few weeks reading with a bookmark, ruler or bus-ticket held above the line you are reading, moving it down as you read at your present speed. When you've got over the embarrassment of doing this on the Tube, push. You may even feel the effect as a tug on your brain. If you have to re-read a line, slow down before you speed up again.

You may find it possible to skim an entire screen-full of leads by putting your eyes slightly out of focus and waiting for key phrases to leap out. You could practise with the plumbers' section of the Yellow Pages, finding those in your postal district without reading any of the entries.

The next key is to be decisive. When searching data on the internet, you're never going to find all the interesting stuff that's there. Half a second to decide that a lead is not worth following is more than enough. The ones you do follow are more than likely to lead you back to anything crucial which you mistakenly rejected, anyway.

Once you've decided to follow a link, the information can take an annoyingly long time to arrive, especially between 12:00 and 06:00 when substantial numbers of Americans are awake.

The solution to this is to follow several leads at once. Place your mouse pointer over the link at the end of this paragraph. On a Windows machine, hold down the right-hand mouse button; on a Mac, hold the button down and wait for the menu to appear. Move the pointer to the option New Window with this Link and let go. Depending on the capacity of your computer, you may be able to have from four to sixteen Web pages open at once; you can skim one while the others are arriving. A truly pointless example.

This page opens "external" links in a new window automatically. So, if this page fills your entire screen, clicking on an external link may cause nothing to appear to happen. It is, but invisibly behind all this. Select that other window from the Window or Communicator menu in Netscape, or flip over to the other instance of Explorer...

Warning: if you open too many simultaneous sessions in Netscape, it will fall over and die. Every machine I've used in the past five years manages four at once.
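
The same trick, several leads in flight at once but never too many, looks like this when a program does the fetching. A minimal Python sketch: fetch is a hypothetical function you supply to retrieve one page, and the limit of four at once simply echoes the warning above.

```python
from concurrent.futures import ThreadPoolExecutor


def follow_leads(fetch, urls, at_once=4):
    """Fetch several pages concurrently, at most `at_once` at a time.

    `fetch` is a hypothetical caller-supplied function taking one URL.
    Returns a dictionary mapping each URL to its page, or to the exception
    raised while fetching it, so one dead link does not spoil the rest.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=at_once) as pool:
        futures = {url: pool.submit(fetch, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as error:
                results[url] = error
    return results
```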



3) Setting up a regular flow of information

This section is brief: in the course we deal with it by hunting down real examples for a lucky participant.

3.1) Newsgroups

Newsgroups are open discussion fora to which almost everyone on the internet can contribute.

In general, using a newsgroup as a source is like walking into the bar in Star Wars and bleating "does anyone know...?" In a few fields, mostly internet-related, newsgroups are a useful source. It's easier for you to visit them than for me to describe them.

There are over 50,000 newsgroups, but it's easy to find newsgroups by key-word.

Before leaping in and asking a question in a newsgroup, you must:

  1. Read the guidelines on newsgroups available from ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/ and, if (as usual) RTFM is busy, from ftp://sunsite.doc.ic.ac.uk/usenet/news.announce.newusers/.
  2. Read the newsgroup in question for at least a week
  3. Check the above sites to see whether the newsgroup has a FAQ (Frequently Asked Questions) list. Many of these are book-length.

FAQs are, at the minimum, lists of the questions which newsgroup contributors got fed up with answering. Many are useful reference works, accurately reflecting at least the consensus of belief among, say, bus-spotters.

3.2) Mailing lists

Mailing lists are a means of exchanging email messages between people who have an interest in common. They're like 18th-century corresponding societies, but quicker.

Mailing lists often have a much higher quality of discussion than newsgroups, probably because joining them requires a little more thought and effort. They also almost all have off-months.

Mailing lists have two email addresses: one to which you send messages which will be redistributed to all the human subscribers to the list; and one to which you send housekeeping messages intended for the robot which manages the list, like a request to suspend the flow of messages while you're travelling.

For example, the cni-copyright list message address is

cni-copyright@cni.org

but the "listserver" address for subscribing, unsubscribing and so on is

listproc@cni.org

Methods of subscribing to and unsubscribing from mailing lists vary. If you have the "listserver" address but no instructions, send email to it containing the one-line message

help

-- this will probably get you instructions delivered to your mailbox within a few hours. Always keep these instructions, and any further instructions which you receive when you join.

One of the quicker ways to annoy subscribers to a mailing list is to send a message to all the humans which should have gone to the robot, for example one asking how to get off the list.
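
For the avoidance of doubt about which address gets what, here is a minimal Python sketch which builds (but does not send) a housekeeping message for the robot, using the standard email library. The function name and the sender address in the usage note are assumptions made for illustration.

```python
from email.message import EmailMessage


def housekeeping_message(sender, listserver, command="help"):
    """Build a one-line message for the list robot, not for the humans."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = listserver    # the "listserver" address, never the list itself
    message.set_content(command)  # the body is just the command, e.g. help
    return message
```

Calling housekeeping_message("you@example.org", "listproc@cni.org") gives a message addressed to the robot with the single word help as its body. The thing to check before sending anything is the To line: if it names the list's message address, stop.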



4) Resources and paid-for services

So far, we've looked mostly at the Web as a resource to trawl. There are many services which aim to provide one-stop shopping for information in selected fields.

4.1) Information about paper publications

You can access hundreds of university library catalogues to find a nearby copy of a book. Some sites index articles in journals, though coverage of these is (as far as I know) nowhere complete.

There are lists on the Web of all UK university libraries and of major European libraries -- see resources for this section.

Some of these still (Jan 2000) give you direct telnet access to the library's catalogue. This means that a window opens on your screen showing you what a user of an old-fashioned green-on-black computer terminal in the library sees. Do not expect clicking the mouse in this window to do any good at all. The old-fashioned computer terminal doesn't know that mice exist.

Reading the screens carefully will always give you an idea of what to do next -- on different library systems, you might start a catalogue search by author name by hitting 2; by hitting 2 and then <enter>; or even by typing AUTHOR and then hitting <enter>.

The exception to universal helpfulness is that many systems don't make it clear how to get out. If the opening screen or menu tells you, make a note (on paper!). If you are desperate to leave, closing the connection by shutting down the telnet session does no harm. (On a Windows computer, do this by selecting the Exit option under the File menu of the telnet window.)

The UK academic NISS (National Information Services and Systems) service provides an alphabetic subject list.

For US publications, you can search the Library of Congress. You may find useful information on the Internet Public Library, but I haven't used it enough to give it marks out of 1000, and some sections are sponsored by interested parties.

4.2) Clippings, wires and news services

There are a number of paid-for information services on the Web. Most, unsurprisingly, provide a more comprehensive range of news about technology than anything else, and the news values of most are those of the City pages.

Some of the best of these are not part of the internet at all, but the commercial Outernet.

Among these the premier service for UK news is FT Profile. Among hundreds of databases, Profile offers full-text searching of all the UK broadsheets since Spring 1986, the past year's PA wire, and 88 US newspapers since 1980. Blag access wherever you can. It has been alleged that some people do newspaper subbing shifts more for the Profile access than for the direct pay. It's hard to discover what a personal subscription costs -- Profile is still in "we'll send round a sales rep" corporate-market mode -- but it's in the hundreds of pounds a year plus 60p per minute, or thousands a year flat-rate.

Compuserve offers some Profile access to its subscribers, with a prettier interface and a charging structure which I've never fathomed.

Other key Outernet services are Reuters Business Briefing and Reuters Business Alert. I can't even find a Web page publicising these: telephone Reuters. Dedicated software is available which will telephone Reuters' modems and find and display stories nicely.

Increasingly, these services are likely to move to internet and World-Wide Web distribution. (Reuters is currently offering separate Web-based services.) They will probably ask, like the existing Web news services, for your credit card number so that they can charge you a monthly subscription fee, plus charges of US$1 upwards each for premium stories.

Many of these services - particularly FT Profile and Northern Light - are in conflict with various journalists' unions over copyright. They are, or are alleged to be, selling freelances' articles without either permission or payment. Try looking up your own name...

Beyond the paid-for services listed opposite, there are industry-specific services in (for example) multimedia and movies. They're often expensive.

In my humble opinion, giving your credit card number in an internet email is less risky than giving it to book ballet tickets by phone. But most services offer you the option of telephoning or waiting for details to arrive by post if you prefer. Be it on your own head.

There are also advertiser-supported free news services.



5) Issues for (freelance) journalists

An increasing number of publications are putting articles - our work - onto the Web. Many already sell our work through databases such as FT Profile for 5 to 10 pounds a go per reader. Beware clauses in so-called commissioning letters which try to have you sign away all rights to your work, so they can pay you once and sell often.

If you're on staff, in the UK and the US the presumption is that your employer owns all rights in your work. Reflect that on the mainland of Europe staff journalists retain rights in their work.

Answer: in the UK, join the National Union of Journalists. You will find Web contacts for journalists' organisations in other countries on the NUJ pages; and information about the copyright fight nearby.

The Authors' Licensing and Collecting Society in the UK has set up its ByLine project as a prototype of a system for journalists to sell our own work, on our own terms, on the internet. Registration is free to members of the National Union of Journalists.