Automation vs human effort in designing an index
"Indexing is a highly complex intellectual process involving the use of language in a specific and somewhat artificial way, and that it is also to a considerable extent a matter of intuition, the workings of which cannot be reduced to fixed rules. It is 'knowing what but not knowing how'." (Hans Wellisch)
In this day and age of 'full-text search' and computers that can perform tasks at the click of a button, do we need the human indexer to compile and order a list of words?
What does 'full-text search' actually do? It is a tool designed to find exact matches within a set of pages, and it is precisely the exact matching that causes the problem. Consider the word 'ball'. Depending on the software and the algorithms it uses, a full-text search could turn up a piece of sporting equipment, a formal gathering for entertainment and dancing, a party decoration (a balloon) and any number of other possibilities. And if you look up each mention the search returns, how much in-depth discussion of the subject will you actually find? Can a full-text search recognise passing mentions and disregard them?
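The short Python sketch below shows how mechanical exact matching is. The pages and their text are invented for illustration, and the two functions are hypothetical stand-ins for two common matching styles, not the behaviour of any particular search product. A substring match also finds 'balloon'; a whole-word match drops it. Neither can tell a football from a dance, or a passing mention from a real discussion.

```python
import re

# Hypothetical pages (page number -> text), invented for illustration only.
PAGES = {
    1: "The striker kicked the ball into the net.",
    2: "Cinderella wore glass slippers to the ball.",
    3: "Tie a balloon to each chair before the party.",
    4: "The report mentions a ball only in passing.",
}

def substring_search(term, pages):
    """Return pages whose text contains `term` anywhere (case-insensitive)."""
    return [p for p, text in pages.items() if term.lower() in text.lower()]

def whole_word_search(term, pages):
    """Return pages where `term` appears as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [p for p, text in pages.items() if pattern.search(text)]

print(substring_search("ball", PAGES))   # [1, 2, 3, 4]
print(whole_word_search("ball", PAGES))  # [1, 2, 4]
```

Both results still mix the sporting, social and passing senses of 'ball'; sorting them out is left entirely to the reader.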
Our language is rich and complex, and we use many different terms to mean the same or similar things. Take 'cinema', for example. The same subject might be discussed using the words 'films', 'motion pictures' or 'movies', and a passage might be about acting, directing or the cinema audience. All these concepts need to be gathered, linked and drawn together, and alternative words and phrases found and cross-referenced, because a reader may not think to look under 'cinema' at all, or may miss a passage that is about cinema but never uses the word.
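As a rough illustration, the sketch below models a tiny hand-built index fragment in Python. Everything in it is hypothetical: the headings, page numbers, `INDEX`, `SEE` and `look_up` were invented for this example. The lookup itself is trivial; the value lies in the decisions an indexer has encoded, such as choosing 'cinema' as the preferred heading, listing pages that discuss cinema without using the word, and pointing 'films', 'motion pictures' and 'movies' to it.

```python
# A hypothetical, hand-built index fragment. The indexer chose 'cinema' as the
# preferred heading and recorded pages about the subject, including pages that
# never use the word itself.
INDEX = {
    "cinema": {"pages": [12, 15, 40], "see_also": ["acting", "directing"]},
    "acting": {"pages": [18, 22], "see_also": []},
    "directing": {"pages": [25], "see_also": []},
}

# 'See' references send readers from alternative terms to the preferred heading.
SEE = {"films": "cinema", "motion pictures": "cinema", "movies": "cinema"}

def look_up(term):
    """Follow a 'see' reference if there is one, then return the entry."""
    heading = SEE.get(term.lower(), term.lower())
    entry = INDEX.get(heading)
    if entry is None:
        return None
    return heading, entry["pages"], entry["see_also"]

print(look_up("movies"))  # ('cinema', [12, 15, 40], ['acting', 'directing'])
```

A reader who thinks of 'movies' is led to the same pages as one who thinks of 'cinema', and is pointed on to the related 'see also' headings, which exact matching alone would never do.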
Yes, Word and other programs offer an index facility that will produce a list of words with locators (page or paragraph numbers), ordered alphabetically, in double quick time. But how useful is that list to the reader? Try an experiment. Take a Word document you have, preferably one you did not write but can edit. Create an index for a list of concepts you wish to find and see how useful it is. Is the layout helpful? Has it found all the information you want? Does each entry lead to a real discussion of the subject, or just to a passing reference or even a red herring?
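To make the point concrete, here is a naive automatic 'index' in Python: every word mapped to the pages it appears on, sorted alphabetically. This is an assumption-laden sketch of the kind of output described above, not a reproduction of how Word's index feature works internally; the pages and the `concordance` function are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical pages, invented for illustration.
PAGES = {
    1: "Films and the cinema audience.",
    2: "Directing films on a small budget.",
    3: "We met at the cinema, then went home.",
}

def concordance(pages):
    """Map every word to the sorted list of pages it appears on, A to Z."""
    locators = defaultdict(set)
    for page, text in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            locators[word].add(page)
    return {word: sorted(locators[word]) for word in sorted(locators)}

for word, pages in concordance(PAGES).items():
    print(word, ", ".join(map(str, pages)))
```

The resulting list is alphabetical and complete, yet 'cinema' gives pages 1 and 3 with no hint that page 3 is a passing mention, 'films' and 'cinema' are never linked, and words such as 'the' and 'at' clutter the list.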
Indexes require human intellect to be useful, and 'full-text search' is no substitute for this. For more detailed arguments, see the article 'Human or computer produced indexes' by James Lamb.
"An index is a consciously designed method of finding mentions of subjects, persons, or ideas. A Google search return is merely a software-derived order based on rankings of key words, embedded words, and other data designed to return the 'mostest.' Google is an incredible tool but it is comparing apples to a bushel of acorns to think a Google search is anything but an easily manipulated return of key phrases. Indexers, like bibliographers, are unsung heroes of enlightenment and progress. Overworked and poorly paid they delight in obsessively creating something that even a child can use and profit by." (unknown)