Researchers from Huazhong University of Science and Technology in China have started work on a project based on a distributed information retrieval system that promises to address future search engine scalability issues that are believed to be inevitable as the Internet continues to expand.
As the number of web pages grows rapidly, search engine coverage will become poorer and update intervals much longer. If the current search engine architecture remains in use, finding precise and comprehensive information will eventually become impossible. The problem will become more serious once IPv6 is widely deployed in communication networks. With the information explosion, the problem of "too much information means no information" may become a disaster. Solving it requires an efficient information management system for the Internet. Described below is a new system, the Domain Resource Integration System (DRIS), as applied to a digital library project.
Origins of DRIS
Libraries are acquiring a growing number of digital resources. Our university library alone already has about one hundred kinds of resources, such as IEEE, ACM, and many Chinese digital journals. We can also get information from public resources such as web search engines. Finding what you need among hundreds of pages of Google results is already inconvenient, but in the library that is just the beginning. You may have to search dozens of different information resources one by one, and you need to be very familiar with the query rules of every database. Only then may you get the comprehensive and precise information you are looking for. It is a really difficult task; the Internet information retrieval problem is even more serious in a digital library. To solve this problem, we need to complete two missions. First, we should build a system that can integrate all the resources, because current information retrieval sources on the Internet are quite fragmented and lack an efficient form of connection among them. Second, once this resource integration system is finished, a search mechanism must be built on top of it to provide a unified search service for users.
Basic Ideas of DRIS
To complete the two missions mentioned above, we divide the Internet search engine completely into two parts: an Internet information retrieval infrastructure and a personal search system. This is the main difference between current search engines and DRIS. DRIS serves as the public information retrieval infrastructure, which will integrate all kinds of resources on the Internet. The personal search system uses DRIS as its data source and organizes, ranks, and filters search results according to your personal information, so it may retrieve more precise information. The basic idea of DRIS, then, is that search should be the international function of the Internet and everyone should have his or her own personal search engine.
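The split between shared infrastructure and personal search system can be illustrated with a minimal Python sketch. It is not the DRIS implementation: the class names `SharedInfrastructure` and `PersonalSearchEngine`, the `interests` and `blocked_sources` fields, and the keyword matching are all assumptions made for illustration. The only point it shows is that retrieval lives in one shared component, while filtering and ranking live in a per-user component that uses it as a data source.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Result:
    title: str
    source: str
    topics: frozenset


class SharedInfrastructure:
    """Hypothetical stand-in for the public retrieval infrastructure."""

    def __init__(self, records):
        self._records = list(records)

    def search(self, keyword):
        # Plain substring match; a real system would use a proper index.
        kw = keyword.lower()
        return [r for r in self._records if kw in r.title.lower()]


@dataclass
class PersonalSearchEngine:
    """Hypothetical per-user layer that filters and ranks shared results."""

    infrastructure: SharedInfrastructure
    interests: set = field(default_factory=set)
    blocked_sources: set = field(default_factory=set)

    def search(self, keyword):
        hits = self.infrastructure.search(keyword)
        # Filter: drop sources this user does not want to see.
        kept = [r for r in hits if r.source not in self.blocked_sources]
        # Rank: results sharing more topics with the user's interests first.
        return sorted(
            kept,
            key=lambda r: len(r.topics & self.interests),
            reverse=True,
        )
```

Two users with different `interests` would query the same `SharedInfrastructure` instance and still see differently ordered results, which is the separation the paragraph above describes.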
The Architecture of DRIS
The final aim of DRIS is to integrate all the resources available on the Internet. There are now billions of web pages, millions of specialized databases, and many other kinds of information resources on the Internet. Gathering all of these resources into one system and building a mirror database of the whole Internet may be an impossible undertaking. Even with enough storage and processing power, the update intervals and coverage of the data would be difficult to guarantee. A centralized architecture is therefore not appropriate for building such an Internet information retrieval system, yet most current commercial search engines use one. Distributed management, on the other hand, can be much more effective in a large-scale system, which leads us to adopt a hierarchical distributed architecture to manage all the information on the Internet. The key issue then becomes how to divide the Internet correctly. A successful hierarchical distributed system is already in use: the Domain Name System (DNS). The basic architecture of DRIS is the same as that of DNS.
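As a rough illustration of what a DNS-like hierarchy could mean for retrieval, the following Python sketch organizes indexes as a tree of domain nodes. It is a sketch under stated assumptions, not the DRIS design: the `DomainNode` class, its `locate` and `search` methods, and the idea that each node indexes only its own documents and delegates the rest to its children are all hypothetical. The domain names in the usage note are illustrative only.

```python
class DomainNode:
    """Hypothetical node responsible for one domain, as in the DNS tree."""

    def __init__(self, name):
        self.name = name        # full domain name; the root is ""
        self.children = {}      # label -> DomainNode
        self.documents = []     # documents indexed locally at this node

    def child(self, label):
        # Create the child on first use, mirroring delegation to a subdomain.
        if label not in self.children:
            full = label if not self.name else f"{label}.{self.name}"
            self.children[label] = DomainNode(full)
        return self.children[label]

    def locate(self, domain):
        # Walk from this node down to the node for `domain`,
        # reading labels right to left as DNS does.
        if not domain:
            return self
        node = self
        for label in reversed(domain.split(".")):
            node = node.child(label)
        return node

    def search(self, keyword):
        # Answer from the local index, then ask each child in turn.
        kw = keyword.lower()
        hits = [(self.name, d) for d in self.documents if kw in d.lower()]
        for label in sorted(self.children):
            hits.extend(self.children[label].search(keyword))
        return hits
```

With a root created as `root = DomainNode("")`, documents can be registered through `root.locate("hust.edu.cn").documents.append(...)`. Calling `root.locate("edu.cn").search("retrieval")` then touches only the subtree under `edu.cn`, so no single node needs to hold an index of the whole tree.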
By Wang Liang, Ph.D