OpenSolaris
Collectives
Discussions
Documentation
Download
Source Browser
Free CD
Log-in
|
en
Project opengrok
:
Documentation
>
Internals
Top Menu
Show
:
Comments
Attachments
History
Information
Print
:
Print
Print preview
Export as PDF
Export as RTF
Export as HTML
Export as XAR
Wiki code for
Internals
Hide Line numbers
1: == OpenGrok Internals == 2: 3: Chandan B.N, Feb 2006 4: 5: OpenGrok contains these main packages: 6: 7: * org.opensolaris.opengrok.analysis: Responsible for analyzing programs, source files, archives like ZIP, tar, documents like man pages, xml and html files, images etc., 8: * org.opensolaris.opengrok.index: Builds or updates the Lucene index, recursively going down the directory tree. 9: * org.opensolaris.opengrok.search: Has utilities that provide interface for search results, matched context etc. 10: * org.opensolaris.opengrok.history: Abstraction for source code version control. 11: * org.opensolaris.opengrok.web: Utility routines used by the webapp. 12: 13: While the source is the ultimate reference guide for how it works, here is a brief illustration of certain mechanisms. 14: 15: === Analysis === 16: 17: Think of the package as a separate department having engineers (called Analyzers) who are subject matter experts in each type of program or file type. CAnalyzer knows something about C programming language. The ELF Analyzer knows how to read a ELF symbol and string table from an ELF (executable) file. 18: 19: Central to analysis, is the Analyzer Guru. He knows all the Analyzers by name. Most analyzers sit in static offices. Hiring a new Analyzer for each analysis work was a bit expensive. 20: 21: [[image:opengrok-analysis.png]] 22: 23: When the indexer thinks he needs to analyze a file to be indexed, he calls Analyzer Guru. Analyzer Guru knows exactly who to send the file to. He creates a blank [[Lucene Document>>http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Document.html]] and a [[FileInputStream>>http://java.sun.com/j2se/1.5.0/docs/api/java/io/FileInputStream.html]] and gives it to the appropriate Analyzer. He knows the Analyzers by name, because they would initially tell him the file extensions and magic numbers of file types they are experts in. 24: 25: On getting a document, an Analyzer sends it to his boss (super) to get it filled with more Lucene fields. A boss sends it to his boss, and so on. Higher up the management chain, they do less (specialized) work. For example while CAnalyzer knows a great deal about C keywords and comments and knows how to generate hyper-text cross-reference a C file, his boss Plain Analyzer can only read plain text files. He has poor punctuation skills and ignores most of the punctuation marks that he thinks are unnecessary. He knows what websites and email addresses are, so when he is asked to cross reference a file, he just hyper-links the URLs he can recognize. 26: 27: Plain Analyzer also does an important job. He out sources finding definitions in a program file to Exuberant Ctags. He sends out the file to exuberant ctags, which tells him exactly what symbols are considered definitions. Plain Analyzer adds that information to the Lucene Document. 28: 29: His boss, File Analyzer is the boss of the department. Every one reports to him. All he does is to stamp the Lucene document with file date and name and sends it back. 30: 31: This package quite modular. It can be extended to analyze any program type. To add an Analyzer for a new type of programming language or file type, just extend or copy one of the suitable Analyzers and introduce his name to Analyzer Guru. 32: 33: To get the version control history Analyzer Guru, calls his old friend the good old History Guru. This guy is much older than Analyzer Guru, and his silky long beard almost touches the ground. He has several assistants called History Readers. They just read the version control log history for a given file and directory. Currently there are assistants who can read Subversion, CVS and SCCS logs. To be able to support a new type of source code version control, just hire a new assistant reader who can read the logs. 34: 35: Analyzer Guru on getting his Document returned back with different information like definitions, full-text, symbols, history, sends it back to the Indexer. 36: 37: === Index Update === 38: 39: The index is the inverted index of all files in the source trees. For every unique word in the source tree it contains a list of files where the word can be found. While tools like cscope and ctags also build indexes, they can’t incrementally update it. They just rebuild it from scratch. OpenGrok uses the modern indexing methods and can incrementally update its index. (i.e update only the changed/added/deleted files since last index build) 40: 41: To know what files changed or got created or deleted we keep a sorted list of files in the index tagged with the files last modified date (converted in to a alphanumeric string such that dateString(date1) < dateString(date2) if date1 < date2. We traverse the file tree depth first at each stage sorting the child nodes, by which we get a sorted list of file paths. 42: 43: It boils down to finding the diffs between two sorted lists (list of files on disk Vs list of files in the index). Left hand side is the file tree whose index on the right side needs to be updated. To begin with assume we had a correctly indexed tree Yesterday. 44: 45: [[image:opengrok-index1.png]] 46: 47: Let us say Today we removed a makefile and added a new file foo.c and modified frotz.c. During first pass we see that makefile-yesterday and frotz.c-today on our index-list did not match tree-traversal list. So we delete those lines (i.e documents) from the index. 48: 49: [[image:opengrok-index2.png]] 50: 51: During second pass we see that foo.c-today and frotzc-today are new and we add the documents to the index. 52: 53: [[image:opengrok-index3.png]] 54: 55: The Lucene inverted index can be either opened to add more documents or delete existing documents at a time. To update a document you must delete it first, close the index and add it again. To optimize for faster updates, we first delete in the index all the changed or deleted files in the source tree. In the second pass we add all the documents not in the index. 56: 57: {{html}} 58: <script type="text/javascript"> 59: var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www."); 60: document.write(unescape("%3Cscript src=’" + gaJsHost + "google-analytics.com/ga.js’ type=’text/javascript’%3E%3C/script%3E")); 61: </script> 62: <script type="text/javascript"> 63: try { 64: var pageTracker = _gat._getTracker("UA-3665334-1"); 65: pageTracker._trackPageview(); 66: } catch(err) {}</script> 67: {{/html}}
Search
Project opengrok Pages
Discussions
Issues
Tentative Roadmap
Files / Downloads
Documentation
Supported Revision Control Systems
How to build OpenGrok from source
How to install OpenGrok
Internals
Story of OpenGrok
Collectives
Community Group
Academic and Research
Accessibility
Advocacy
Appliances
Approachability
Architecture Process and Tools
BrandZ
Chinese Users
Community Advisory Board
Databases
Desktop
Device Drivers
Distribution
Documentation
DTrace
Emerging Platforms
Fault Management
Games on OpenSolaris
HA Clusters
HPC Developer
Installation and Packaging
Internationalization and Localization
Laptop
Logical Domains
Modular Debugger (MDB)
Networking
NFS
Observability
OpenSolaris Governing Board (OGB)
OpenSolaris Printing
OS/Net (ON)
Performance
Power Management
PowerPC
Security
Service Management Facility (smf(5))
Software Porters
Solaris Volume Manager
Storage
Systems Administration Community Group
Testing
Tools Home
Unix File Systems (UFS)
Website Community
X Window System
Xen
ZFS
Zones
Project
ADSL Modem Enhancement
ARC Process Definition
ARM Platform Port
Automatic Data Migration
BIND Update
Bluetooth Stack & Drivers
Brocade FC HBA - Initiator
Brocade FC HBA - Target
Brussels - unified network link configuration
Caiman, Solaris Install Revisited
Celeste
Český portál
Chime Visualization Tool for DTrace
CIFS client for Solaris
CIFS Server
Clearview: Network Interface Coherence
Cluster Agent: Informix Dynamic Server
Cluster Agent: OpenSolaris Container
Cluster Agent: OpenSolaris xVM
Cluster Agent: Oracle E-Business Suite
Cluster agent: PostgreSQL
Cluster Agent: Samba
Cluster Agent: Tomcat
CMT
Coarse Data Flow Parallelism
Colorado: Open HA Cluster on OpenSolaris
Command Assistant
Common Array Manager
Companion - /opt/sfw: Free and Open Source software
COMSTAR: Common Multiprotocol SCSI Target
Content
Contest
CPU Observability
Credentials Process Groups
Crossbow: Network Virtualization and Resource Control
Crypto KMS Agent Toolkit
Cryptographic Framework
Data Migration Manager
Data Tethers
Deutsches Portal
Device Detection Tool
Device Driver Utility
Device Manager
Device Mapper
Direct Rendering Infrastructure & 3D drivers
DTrace Guide
Duckwater: Simplified name services management
Easy Tools
Emancipation
Emulex Fibre Channel Device Driver
Emulex Advanced Ethernet Device Driver
Enable/Enhance Solaris support for Intel Platform
Enhance the support of USB webcams
Enhanced SMF Profiles
Enhancements for AMD-based Platforms
Erlang DTrace Integration
Ethernet bridge module for Solaris
Evaluate Conary
Events Registry
Ext3 file system support
F/OSS Package Base
Facilitation
Fibre Channel over Ethernet
Fine Grained Access Policy (FGAP)
Fingerprint Authentication
Flexible Mandatory Access Control
Forensic Tools
Fully Open X Project
Fuse on Solaris
gcore
Generic Machine Check Architecture Improvements
Google SOC
HA-JBoss
HA-MySQL
Hadoop Live CD
Hitachi
HoneyComb Fixed Content Storage
HPC Stack
Image Packaging System
Improved Performance MIB
Indiana
Innovation Awards
Input Method
Intel Graphics
Interrupt Resource Management
IP Datapath Refactoring
IP over Infiniband
IPsec Tunnel Reform
iSCSI Extensions for Remote DMA (iSER)
iSNS Server
JeOS - Just enough Operating System
JKstat - a java binding for libkstat
Journaled File System (JFS)
K Desktop Environment
Kerberos
Kernel Sockets
Kernel SSL Enhancements
Key Management Framework
Korn Shell 93 integration/migration project
Labeled IPsec
LatencyTOP
Layer 2 Filtering
LDoms Manager
Lending
libMicro - portable microbenchmarks
Link Layer Discovery
Live Media: Technologies for distributions running from CD and other media
Locale Data
lofi compression and cryptography support
lx64 brand
Media Management System
Mega_sas
Mexico
MilaX minimal Live Distribution
MIPS Platform Port
Mozilla DTrace
MRSL.NONsharedDevice
Multi-lingual Glossary
Multi-pathing software (MPxIO)
Multiple disk sector size support
Multiple DOI
Muskoka: An open repository for OpenSolaris technical content
Navigator
Nemo: A Framework for High-Performance Networking
Network Auto-Magic
Network Data Management Protocol
Network MIBs
Network Storage
Network Time Protocol (NTP)
Nevada Globalization
New Design of 4over6 Mechanism Based on OpenSolaris
NFS RDMA transport update and performance analysis
NFS Server in non-Global Zones
NFS version 4.1 pNFS
NFSv4 namespace extensions
Nightingale: Port Songbird to OpenSolaris
NPort ID Virtualization (NPIV)
NUMA
Object Storage Device (OSD) support for Solaris
OHACGE Script Based Plug-in
ON/Nevada (ONNV) Project
Open Development Infrastructure
Open HA Cluster Utilities
Open Sound System
OpenGrok
OpenPegasus CIM Server
OpenRTI
OpenSolaris Busybox
OpenSolaris Desktop
OpenSolaris Hispano
OpenSolaris Security Audit
OpenSolaris support for the QEMU processor emulator: host and guest
PEF: Packet Event Framework
Performance Wrappers
Pkgfactory
Polski Portal
Portail Francophone
Portal Brasil
Portals
Power Management Usability Interfaces
Presto: Automatic Printing Configuration
Printable Many Page Solaris Manuals
Promise SuperTrak RAID HBA Driver
QLogic Converged Network Adapter GLDv3 NIC Driver
Quagga Routing Protocol Suite Integration
RAID Configuration Utility
RBridge (IETF TRILL) support
RDMA Offload Framework
Reno: Login Process Enhancements for Interop
Resource Management
s10brand
SAM/QFS
SCM Migration Project
SCSI RDMA Protocol
SDcard Drivers
Sensor Abstraction Layer
Session Initiation Protocol
SFW
Shell: bourne shell, korn shell, C shell, etc.
Sierra: Intel WiFi Chipsets Support
Simple Panels
SM-HBA Based SAS HBA Management
SMF Documentation
Solaris iSCSI Target
Solaris PowerPC Port
SourceJuicer
Sparks: name service switch/nscd enhancements
Squashfs
Star integration/migration project
Starfish
Starter Kit
Storage Power Management
Sun Security Toolkit
Sun StorageTek Availability Suite
Support for OpenFabrics User Verbs / API on OpenSolaris OS
Support gcc4/GCCfss in Solaris
Suspend/Resume
SVR4 Packaging
Systemz
Tamarack: Removable Media Enhancements in Solaris
Tesla: OpenSolaris Enhanced Power Management
Test Development
Tickless Kernel Architecture
TIPC
Trademarks
Trusted networking interface policy database for Trusted Extensions
Trusted Platform Module support
Use Case
Validated Execution Project
Virtual Console
Virtual Network Machines
Visual Panels
Visualization for HPC
Volo
VRRP: Virtual Router Redundancy Protocol Implementation
VSCAN service
Web Stack
Website
Winchester: Schema mapping and ID mapping for AD Interoperability
Wireless USB Support
Wireless Wide Area Network
X Consolidation
x86 Generic FMA Topology Enumerator
Xen Gate
Xfce: A lightweight desktop environment
ZFS Boot and Install
ZFS on disk encryption support
Zone Manager
Zone Statistics
Русский портал
البوابة العربية
भारतीय पोर्टल
中国门户
日本ポータル
한국 포탈
User Group
Adelaide
Argentina
Arizona
Atlanta
Baltimore-Washington
Bangalore
Bangkok
Bangladesh
Beijing
Bélem
Berlin
Bhimavaram
Bloomington
Campus Ambassadors
Capital Region
Cardiff
Charlotte
Chengdu
Chennai
Chihuahua
Chile
Cleveland
Colombia
Columbus
Connecticut
Cracow
Czech
Dallas/Ft. Worth
Danish
Delaware
Edinburgh
Egypt
Finland
Florida
Front Range
FuZhou
Great Lakes
Greece
Hangzhou
Hawaii
HeFei
Houston
Hyderabad
Indonesia
Irish
Israel
Italian
Jinan
Kabul
Kansas City
Latvia
London
Madurai
Manchester
Mato Grosso
Melbourne
Minas Gerais
Minnesota
Montreal
Moscow
Mumbai
Munich
NEA
Netherlands
New England
New York City
New Zealand
NIT Hamirpur
Noroeste
Oklahoma City
Osnabrück
Peru
Philadelphia
Piaski
Pittsburgh
Porto Alegre
Puget Sound
Pune
Queensland
Research Triangle Park
Romania
Russia
San Antonio
San Diego
San Francisco
São Paulo
Scottish
Serbia
Shanghai
Shenzhen
Silicon Valley
Singapore
Slovak
South African
Southern Connecticut
St. Louis
Sweden
Switzerland
Sydney
Szczecin
Taiwan
Tecum
Thames Valley
Tokyo
Toronto
Trondheim
Tulsa
Turkey
Ukraine
University of Melbourne
Vale do Paraíba
Vancouver
Venezuela
Welsh - Cymru
Wisconsin
Xi'an
Subsites
Code Reviews
Code Repositories
Package Search
Bugster
Bugzilla
Test Machines
Planet
Mailing Lists
Elections & Polls
ARC Case Logs
Source Juicer
Package Factory
User Authentication