Monday, April 19, 2010

search technologies

Each of us has been faced with the problem of searching for information more than once. Irregardless of the data source we are using (Internet, file system on our hard drive, data base or a global information system of a big company) the problems can be multiple and include the physical volume of the data base searched, the information being unstructured, different file types and also the complexity of accurately wording the search query. We have already reached the stage when the amount of data on one single PC is comparable to the amount of text data stored in a proper library. And as to the unstructured data flows, in future they are only going to increase, and at a very rapid tempo. If for an average user this might be just a minor misfortune, for a big company absence of control over information can mean significant problems. So the necessity to create search systems and technologies simplifying and accelerating access to the necessary information, originated long ago. Such systems are numerous and moreover not every one of them is based on a unique technology. And the task of choosing the right one depends directly on the specific tasks to be solved in the future. While the demand for the perfect data searching and processing tools is steadily growing let's consider the state of affairs with the supply side.

Not going deeply into the various peculiarities of the technology, all the searching programs and systems can be divided into three groups. These are: global Internet systems, turnkey business solutions (corporate data searching and processing technologies) and simple phrasal or file search on a local computer. Different directions presumably mean different solutions.

Local search

Everything is clear about search on a local PC. It's not remarkable for any particular functionality features accept for the choice of file type (media, text etc.) and the search destination. Just enter the name of the searched file (or part of text, for example in the Word format) and that's it. The speed and result depend fully on the text entered into the query line. There is zero intellectuality in this: simply looking through the available files to define their relevance. This is in its sense explicable: what's the use of creating a sophisticated system for such uncomplicated needs.

Global search technologies

Matters stand totally different with the search systems operating in the global network. One can't rely simply on looking through the available data. Huge volume (Yandex for instance can boast the indexing capacity of more than 11 terabyte of data) of the global chaos of unstructured information will make the simple search not only ineffective but also long and labor-consuming. That's why lately the focus has shifted towards optimizing and improving quality characteristics of search. But the scheme is still very simple (except for the secret innovations of every separate system) - the phrasal search through the indexed data base with proper consideration for morphology and synonyms. Undoubtedly, such an approach works but doesn't solve the problem completely. Reading dozens of various articles dedicated to improving search with the help of Google or Yandex, one can drive at the conclusion that without knowing the hidden opportunities of these systems finding a relevant document by the query is a matter of more than a minute, and sometimes more than an hour. The problem is that such a realization of search is very dependent on the query word or phrase, entered by the user. The more indistinct the query the worse is the search. This has become an axiom, or dogma, whichever you prefer.

No comments:

Post a Comment