Fastgrep

INTRODUCTION
“fastgrep” is a multithreaded version of “grep”, the string pattern matching tool found on Linux/UNIX. It is based on Tgrep (threaded recursive grep), developed by Ron Winacott. Tgrep supports all of the options of the normal “grep” except -w (word search), and it adds some options of its own. The chief difference between tgrep and grep is that tgrep can search for the target pattern in subdirectories recursively, using multiple threads when it encounters a directory. Its main disadvantage is that it is not well suited to large files: it still searches each individual file in a single thread.

As data sets grow, files keep getting larger, and searching them and locating matches quickly becomes more and more important. “fastgrep” improves on tgrep in this respect and performs better than tgrep on large files.

Features

The following are the key features of fastgrep:

fastgrep uses three kinds of threads: search threads, cascade threads, and sub-search threads.

  > search thread: performs pattern matching in a regular file. fastgrep creates one search thread for each regular file it encounters.
  > cascade thread: manages search threads. fastgrep creates one cascade thread for each directory it encounters, and that thread searches the subdirectories recursively.
  > sub-search thread: created by a search thread. A search thread first checks the size of its file. If the file is larger than a threshold (e.g. 4 MB), the search thread divides the file into blocks and creates one sub-search thread to search each block.
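The search/sub-search split can be illustrated with a small C sketch. It rests on several assumptions that the description does not state. The whole file is already in memory. Matching is a plain fixed-string test rather than the regular expressions tgrep supports. Block boundaries are pushed forward to the next newline so that no line is split between two sub-search threads. The names `search_buffer`, `count_matching_lines`, `sub_search`, `struct block_job`, `SPLIT_THRESHOLD` and `MAX_BLOCKS` are hypothetical and are not taken from the fastgrep sources.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold mirroring the "e.g. 4 MB" example in the text. */
#define SPLIT_THRESHOLD ((size_t)4 * 1024 * 1024)
#define MAX_BLOCKS 16

/* Work description for one sub-search thread. */
struct block_job {
    const char *start;   /* first byte of this block */
    size_t len;          /* block length; ends on a line boundary */
    const char *pattern; /* fixed string to look for (assumption: no regex) */
    size_t matches;      /* output: number of matching lines in the block */
};

/* Count lines in [start, start + len) that contain pattern as a substring. */
static size_t count_matching_lines(const char *start, size_t len, const char *pattern)
{
    size_t plen = strlen(pattern), count = 0;
    const char *end = start + len;
    const char *line = start;
    while (line < end) {
        const char *nl = memchr(line, '\n', (size_t)(end - line));
        const char *line_end = nl ? nl : end;
        size_t llen = (size_t)(line_end - line);
        for (size_t i = 0; i + plen <= llen; i++) {
            if (memcmp(line + i, pattern, plen) == 0) { count++; break; }
        }
        line = nl ? nl + 1 : end;
    }
    return count;
}

/* Body of a sub-search thread: each thread writes only to its own job. */
static void *sub_search(void *arg)
{
    struct block_job *job = arg;
    job->matches = count_matching_lines(job->start, job->len, job->pattern);
    return NULL;
}

/* Search thread logic: small buffers are scanned directly; larger ones are
 * cut into at most nblocks line-aligned blocks, one sub-search thread each.
 * Returns the number of matching lines, or -1 if a thread cannot start. */
long search_buffer(const char *buf, size_t len, const char *pattern,
                   size_t threshold, int nblocks)
{
    if (len <= threshold || nblocks < 2)
        return (long)count_matching_lines(buf, len, pattern);
    if (nblocks > MAX_BLOCKS)
        nblocks = MAX_BLOCKS;

    struct block_job jobs[MAX_BLOCKS];
    pthread_t tids[MAX_BLOCKS];
    size_t target = len / (size_t)nblocks, pos = 0;
    int n = 0;

    while (pos < len && n < nblocks) {
        size_t end = (n == nblocks - 1) ? len : pos + target;
        if (end >= len) {
            end = len;
        } else {
            /* Extend the block to just past the next newline. */
            const char *nl = memchr(buf + end, '\n', len - end);
            end = nl ? (size_t)(nl - buf) + 1 : len;
        }
        jobs[n] = (struct block_job){ buf + pos, end - pos, pattern, 0 };
        if (pthread_create(&tids[n], NULL, sub_search, &jobs[n]) != 0) {
            for (int i = 0; i < n; i++)  /* clean up threads already started */
                pthread_join(tids[i], NULL);
            return -1;
        }
        pos = end;
        n++;
    }

    long total = 0;
    for (int i = 0; i < n; i++) {
        pthread_join(tids[i], NULL);
        total += (long)jobs[i].matches;
    }
    return total;
}
```

A caller holding a file's contents in `buf` might call `search_buffer(buf, len, "needle", SPLIT_THRESHOLD, 4)`. Each sub-search thread writes only to its own `block_job`, so no lock is needed before the joins. Aligning blocks to line ends means every line is examined by exactly one thread, so no match is counted twice or lost at a boundary.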

Dataflow of fastgrep

The main thread puts the target file or directory into the work queue and then monitors the queue. When it detects a directory, it creates a cascade thread and puts the directory into the work queue; when it detects a regular file, it creates a search thread.
Each search thread first checks the size of its file. If the file is larger than the threshold (e.g. 4 MB), the search thread divides the file into blocks and creates sub-search threads to search each block; otherwise it searches the file itself.
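The work-queue dataflow above can be sketched in C as a mutex- and condition-variable-protected FIFO of paths, together with a `stat()`-based check that decides whether a path would go to a cascade thread (directory) or a search thread (regular file). This is a hypothetical illustration only: `wq_init`, `wq_push`, `wq_pop`, `classify` and `enum work_kind` are invented names, and thread creation and queue shutdown are left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Which kind of thread a queued path should be handed to (hypothetical). */
enum work_kind { WORK_DIR, WORK_FILE, WORK_OTHER };

struct work_item {
    char *path;
    struct work_item *next;
};

/* FIFO of paths shared between the main thread and cascade threads. */
struct work_queue {
    struct work_item *head, *tail;
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
};

void wq_init(struct work_queue *q)
{
    q->head = q->tail = NULL;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
}

/* Append a private copy of path. Returns 0 on success, -1 if out of memory. */
int wq_push(struct work_queue *q, const char *path)
{
    struct work_item *it = malloc(sizeof *it);
    if (!it) return -1;
    size_t n = strlen(path) + 1;
    it->path = malloc(n);
    if (!it->path) { free(it); return -1; }
    memcpy(it->path, path, n);
    it->next = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = it; else q->head = it;
    q->tail = it;
    pthread_cond_signal(&q->nonempty);   /* wake a waiting consumer */
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Block until a path is available and remove it; the caller frees it. */
char *wq_pop(struct work_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->head == NULL)              /* loop guards against spurious wakeups */
        pthread_cond_wait(&q->nonempty, &q->lock);
    struct work_item *it = q->head;
    q->head = it->next;
    if (!q->head) q->tail = NULL;
    pthread_mutex_unlock(&q->lock);

    char *path = it->path;
    free(it);
    return path;
}

/* Directories would get a cascade thread, regular files a search thread. */
enum work_kind classify(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0) return WORK_OTHER;
    if (S_ISDIR(st.st_mode)) return WORK_DIR;
    if (S_ISREG(st.st_mode)) return WORK_FILE;
    return WORK_OTHER;
}
```

In this sketch the main thread would loop on `wq_pop`, call `classify`, and start a cascade or search thread accordingly. A real tool also needs a way to stop, such as a counter of outstanding work items or a sentinel entry. That part is omitted here because the description does not say how fastgrep detects that the search is finished.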

Created by admin on 2009/10/26 12:07
Last modified by admin on 2009/10/26 12:07