Search in DHT-based P2P
Complex Queries in DHT-based Peer-to-Peer Networks
Matthew Harren, etc. Ion Stoica Berkeley
Search is different from Querying. Search is a limited form of querying. Search is finding files whose names contain a given string, but Query allows combinations and correlations among things found. e.g. search music by "Bach", but query by "bach's chorales". This paper is about Query.
This paper shows Poor Query language is a serious limitations in P2P network. It thinks P2P's Poor scaling limition has been largely solved by DHT: DHT is extremely scalable, i.e. lookup can be resolved in log n ( or n exp(a) for small a).
For search, DHT has no problem. Hash indexes can be used directly for textual similarity searched. e.g. For example, a file with ID I and filename "Beethovens 9th" could be split into twelve trigrams: Bee, eet, eth, tho, hov, ove, ven, ens, ns%, s%9, %9t, 9th (where '%' represents the space character). For each such n-gram gi, the pair (gi; I) is inserted into the hash index, keyed by gi. One can build an index over n-grams for various values of n; it is typical to use a mixture of bigrams and trigrams. Then strict substring search or regual expression search can be done based on it.
The search can be expressed by SQL:
SELECT H.fileID, H.fileName
FROM hashtable H
WHERE H.text IN (<list-of-n-grams-in-search>)
GROUP BY H.fileID
HAVING COUNT(*) >= <#-of-n-grams-in-search>
AND H.fileName LIKE <substring expression>
So it thinks, the support of tranditional relational DB operation is necessary in P2P infrastrucutre. In P2P, the file/service attributes are distributed, the peers need cooperate to finish a query request.
So it introduce a three-tier software node implementation architecture for P2P Query. It also argue to implement an hierarchical table-related name space on DHT flat ID space, and use multicast routing to facilitate the query interaction between peers.
The planed supported DB operations are: selection, projection, join, grouping and
aggregation, and sorting. It also mentioned the characteristic of online query.
[Feeling]
It is more related to research about query of DB system. Knowledge is lack for me to get more insight for it. Anyway, the method for Hash search has told me search in DHT is no problem. That's great!
[Future Reading]
Haas, P. J., and Hellerstein, J. M. Online Query Processing: A Tutorial. In Proc. ACM-
SIGMOD International Conference on Management of Data (Santa Barbara, May 2001). Notes posted online at http://control.cs.berkeley.edu.Hellerstein, J. M., Haas, P. J., and Wang, H. J. Online Aggregation. In Proc. ACM SIGMOD International Conference on Management of Data (1997).
Witten, I. H., Moffat, A., and Bell, T. C. Man-aging Gigabytes: Compressing and Indexing Documents and Images, second ed. Morgan Kaufmann, 1999.
标签: 未分类

0 Comments:
发表评论
<< Home