
Advanced Search

Master gnaw's advanced search capabilities for complex queries and large codebases.

Natural Language Search

Chat Mode

Use natural language to describe your search:

gnaw --chat "Find all error messages in log files"

```bash Terminal
gnaw --chat "Find all lines containing the word ERROR in any .log file in the logs directory"
```
Searching for error messages in log files...
Found 3 matches in 2 files:
- logs/app.log:2 matches
- logs/error.log:1 match

Dry Run

See what command would be generated:

gnaw --chat "Show me lines with WARNING in server.log" --dry-run
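
The dry run pairs naturally with a confirmation step. The following is a minimal sketch, not part of gnaw itself: `chat_confirm` is a hypothetical helper name, and it assumes `--dry-run` prints the generated command and exits with status 0.

```bash
#!/bin/bash
# chat_confirm (hypothetical helper): preview the command gnaw would
# generate for a natural-language query, then run it only on confirmation.
chat_confirm() {
    local query=$1
    local answer

    # Assumption: --dry-run prints the generated command and exits 0.
    gnaw --chat "$query" --dry-run || return 1

    # Prompt on stderr so stdout carries only gnaw's output.
    printf 'Run this search? [y/N] ' >&2
    read -r answer

    case $answer in
        [yY]*) gnaw --chat "$query" ;;
        *)     echo "Skipped." ;;
    esac
}

# Usage:
#   chat_confirm "Show me lines with WARNING in server.log"
```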

AI-Powered Semantic Search

Agent Mode Setup

First, build an index of your codebase:

gnaw agent index build
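
In scripts it can help to build the index only when it is missing. The sketch below uses `gnaw agent index status`, which this guide lists under troubleshooting. It assumes that command exits with a non-zero status when no usable index exists; that behavior is not confirmed here, and `ensure_index` is a hypothetical helper name.

```bash
#!/bin/bash
# ensure_index (hypothetical helper): build the agent index only if the
# status check fails.
# Assumption: `gnaw agent index status` exits non-zero when no index exists.
ensure_index() {
    if gnaw agent index status >/dev/null 2>&1; then
        echo "index ready"
    else
        echo "building index..."
        gnaw agent index build
    fi
}

# Usage:
#   ensure_index && gnaw agent ask "authentication functions"
```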

Semantic Queries

Search for code by meaning, not just text:

# Find authentication-related code
gnaw agent ask "authentication functions"

# Find error handling patterns
gnaw agent ask "error handling" --type rs

# Search in specific directory
gnaw agent ask "database queries" --dir src/db

```bash Terminal
gnaw agent ask "authentication functions"
```
Query: authentication functions

→ src/auth.rs (score: 0.95)
  Lines 15-25
  • Function definition matches query
  • Contains authentication keywords
  • High semantic similarity
  pub fn authenticate_user(token: &str) -> Result<User, AuthError> {
      // Authentication logic here
  }

Advanced Agent Options

# JSON output for programmatic use
gnaw agent ask "API endpoints" --json

# Filter by file type
gnaw agent ask "error handling" --type rs,go,js

# Limit results
gnaw agent ask "database queries" --limit 5

# Search in specific directory
gnaw agent ask "authentication" --dir src/auth

Complex Pattern Matching

Extended Regular Expressions

# Match multiple patterns
gnaw -E "ERROR|WARN|FATAL" test.log

# Character classes
gnaw -E "[0-9]{4}-[0-9]{2}-[0-9]{2}" dates.txt

# Quantifiers
gnaw -E "a{2,4}" text.txt

Advanced Regex Patterns

# Email addresses
gnaw -E "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" emails.txt

# IP addresses
gnaw -E "([0-9]{1,3}\.){3}[0-9]{1,3}" network.log

# URLs
gnaw -E "https?://[^\s]+" urls.txt
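
The IP address pattern above also accepts out-of-range octets such as `999.1.1.1`. A stricter pattern can be assembled from a per-octet alternation. This is a sketch: `octet`, `ipv4_re`, and `is_ipv4` are illustrative names, `grep -E` is used only as a stand-in to sanity-check the pattern, and it assumes gnaw's `-E` accepts the same alternation and bounded-repetition syntax as the examples above.

```bash
#!/bin/bash
# One octet in the range 0-255, without leading zeros.
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'

# Four octets separated by literal dots.
ipv4_re="(${octet}\.){3}${octet}"

# is_ipv4 (illustrative helper): succeed only if the whole argument is a
# valid dotted-quad address. grep -x anchors the match to the full line.
is_ipv4() {
    printf '%s\n' "$1" | grep -Eqx "$ipv4_re"
}

# Usage with gnaw (assumes compatible ERE syntax):
#   gnaw -E "$ipv4_re" network.log
```

When searching inside longer lines, an unanchored pattern can still match part of an invalid address (for example `99.1.1.1` inside `999.1.1.1`), so anchor the pattern or add word boundaries if gnaw's regex engine supports them.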

Multi-line Patterns

# Show each function definition with the 5 lines that follow it
gnaw -A 5 "def " *.py

# Show each ERROR with 2 lines of context before and after
gnaw -B 2 -A 2 "ERROR" logs/

Performance Optimization

Streaming Large Files

# Stream results for large files
gnaw --stream "pattern" large_file.log

# Keep only the first 100 lines of output
gnaw "pattern" large_file.log | head -100

Parallel Processing

gnaw automatically uses all available CPU cores. To confirm this and see where time is spent:

# Check CPU usage
htop

# Monitor performance
RUST_LOG=debug gnaw "pattern" large_file.log

Memory Management

# Bound the output for very large files: keep only the first 1000 lines
gnaw "pattern" huge_file.log | head -1000

Advanced Output Options

Custom Formatting

# Show filename and line number
gnaw -Hn "pattern" file.txt

# Show only matching parts
gnaw -o "pattern" file.txt

# Invert match
gnaw -v "pattern" file.txt

JSON with Custom Fields

gnaw --json "pattern" file.txt | jq '.results[] | {file: .file_name, line: .line_number, content: .line}'
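
The same JSON output can be summarized per file. This is a minimal sketch that assumes the `.results[]` and `.file_name` fields used in the jq filter above and that jq is installed. `count_by_file` is a hypothetical helper name.

```bash
#!/bin/bash
# count_by_file (hypothetical helper): read gnaw's JSON from stdin and print
# one "<file><TAB><match count>" line per file.
# Assumption: matches live in .results[] and carry a .file_name field.
count_by_file() {
    jq -r '.results
           | group_by(.file_name)
           | .[]
           | "\(.[0].file_name)\t\(length)"'
}

# Usage (assumes --json also works when searching a directory):
#   gnaw --json "ERROR" logs/ | count_by_file
```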

Integration Patterns

Shell Scripts

#!/bin/bash
# Advanced log analysis
ERROR_COUNT=$(gnaw --raw -c "ERROR" /var/log/app.log)
WARN_COUNT=$(gnaw --raw -c "WARN" /var/log/app.log)

if [ "$ERROR_COUNT" -gt 10 ]; then
    echo "High error count: $ERROR_COUNT"
    gnaw -C 2 "ERROR" /var/log/app.log | mail -s "High Error Count" admin@example.com
fi

CI/CD Integration

#!/bin/bash
# Check for security issues
if gnaw -l "password.*=" src/; then
    echo "Potential security issue: hardcoded passwords"
    exit 1
fi

# Check for TODO comments
TODO_COUNT=$(gnaw --raw -c "TODO" src/)
if [ "$TODO_COUNT" -gt 5 ]; then
    echo "Too many TODO comments: $TODO_COUNT"
    exit 1
fi
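
The `-gt` comparison needs a single integer. If `gnaw -c` on a directory prints one `path:count` line per file, as `grep -c` does (an assumption, not confirmed for gnaw), the counts have to be summed first. A minimal sketch with a hypothetical `sum_counts` helper:

```bash
#!/bin/bash
# sum_counts (hypothetical helper): add up the last colon-separated field of
# every input line. Handles both "path:count" lines and bare counts, and
# prints 0 for empty input.
sum_counts() {
    awk -F: '{ total += $NF } END { print total + 0 }'
}

# Usage:
#   TODO_COUNT=$(gnaw --raw -c "TODO" src/ | sum_counts)
```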

Monitoring Scripts

#!/bin/bash
# Real-time error monitoring
tail -f /var/log/app.log | gnaw --stream "ERROR" | while IFS= read -r line; do
    echo "$(date): $line" >> /var/log/errors.log
    # Build the JSON payload with jq so quotes and backslashes in the log line are escaped
    payload=$(jq -n --arg text "Error detected: $line" '{text: $text}')
    # Send alert
    curl -X POST -H "Content-Type: application/json" \
         -d "$payload" \
         https://hooks.slack.com/services/YOUR/WEBHOOK/URL
done

Advanced Configuration

Custom Configuration

# gnaw.toml
[regex.complex]
size_limit_mb = 100
dfa_size_limit_mb = 100
nest_limit = 200

[tuning]
tiny = "64 KiB"
small = "8.6 MiB"
medium = "256 MiB"
large = "2 GiB"

Environment Variables

# Set log level
RUST_LOG=debug gnaw "pattern" file.txt

# Set thread count
RAYON_NUM_THREADS=4 gnaw "pattern" file.txt
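
On a shared machine it may be preferable to derive the thread count from the hardware instead of hard-coding it. The sketch below is one way to do that: `half_cores` is a hypothetical helper, and halving the core count is an arbitrary illustrative policy, not a gnaw recommendation.

```bash
#!/bin/bash
# half_cores (hypothetical helper): print half the number of online CPUs,
# never less than 1. Falls back to 1 if the CPU count cannot be read.
half_cores() {
    local cores
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || cores=1
    if [ "$cores" -gt 1 ] 2>/dev/null; then
        echo $(( cores / 2 ))
    else
        echo 1
    fi
}

# Usage: leave headroom for other jobs.
#   RAYON_NUM_THREADS=$(half_cores) gnaw "pattern" file.txt
```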

Troubleshooting Advanced Searches

- Ensure index is built: `gnaw agent index status`
- Rebuild index if needed: `gnaw agent index build`
- Check available memory
- Simplify patterns when possible
- Use fixed string search for exact matches
- Consider breaking complex patterns into simpler ones
- Reduce `io_chunk_size_bytes` in configuration
- Use streaming output for large files
- Process files in smaller chunks

Best Practices

  1. Use Semantic Search for Code Understanding: Agent mode is best for understanding large codebases
  2. Combine Traditional and AI Search: Use traditional search for simple patterns, AI search for complex queries
  3. Optimize for Your Use Case: Configure gnaw based on your typical file sizes and patterns
  4. Monitor Performance: Use debug logging (`RUST_LOG=debug`) and system tools such as `htop` to identify bottlenecks

For best results, combine traditional pattern matching with AI-powered semantic search based on your specific needs.