Optimized Page Fetching
The OpenConnector SynchronizationService now includes an optimized page fetching implementation that speeds up synchronization of paginated data sources by replacing recursive page fetching with an iterative loop.
Overview
The original pagination implementation used recursive function calls, which became a performance bottleneck. Each page fetch added a new frame to the call stack, leading to inefficient memory usage and slower processing.
The optimized fetching implementation addresses this by:
- Iterative processing: Simple loop-based approach eliminates recursive overhead
- Efficient pagination detection: Pagination patterns are detected without extra API calls
- Reduced function call overhead: Single method handles all page fetching
- Performance monitoring: Detailed timing metrics for optimization analysis
Performance Impact
Before Optimization (Recursive)
- 21 objects across 3 pages: ~2,600ms fetch time
- Recursive function calls: Each page required a new function call stack
- Memory overhead: Inefficient memory usage with deep call stacks
After Optimization (Iterative)
- Expected improvement: 30-50% reduction in fetch time
- Simple loop processing: Single function handles all pages iteratively
- Reduced overhead: Eliminates recursive function call overhead
Technical Implementation
Core Methods
fetchAllPages()
Main entry point that uses optimized iterative processing instead of recursive calls.
    private function fetchAllPages(
        Source $source,
        string $endpoint,
        array $config,
        Synchronization $synchronization,
        int $currentPage,
        bool $isTest = false,
        ?bool $usesNextEndpoint = null,
        ?bool $usesPagination = true
    ): array
fetchAllPagesOptimized()
Implements the optimized iterative fetching strategy:
- Uses a simple for loop instead of recursive calls
- Fetches pages one by one with minimal overhead
- Efficiently determines next page information
- Combines results without function call overhead
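The loop-based strategy described above can be sketched as follows. This is a minimal illustration, not the actual body of fetchAllPagesOptimized(): the function name, the `$fetchPage` callable, and the `hasNext` key are assumptions made for the example, and the 50-page cap mirrors the safety limit mentioned elsewhere in this document.

```php
<?php

/**
 * Fetch every page with a plain loop instead of recursion.
 *
 * $fetchPage is a hypothetical callable standing in for the real HTTP call.
 * It takes a page number and returns ['results' => array, 'hasNext' => bool].
 */
function fetchAllPagesIteratively(callable $fetchPage, int $startPage = 1, int $maxPages = 50): array
{
    $objects = [];

    // One loop iteration per page; the cap guards against endless pagination.
    for ($page = $startPage; $page < $startPage + $maxPages; $page++) {
        $response = $fetchPage($page);
        $results  = $response['results'] ?? [];

        // Append this page's objects to the combined result.
        array_push($objects, ...$results);

        // Stop when the page is empty or the source reports no further page.
        if ($results === [] || ($response['hasNext'] ?? false) === false) {
            break;
        }
    }

    return $objects;
}

// Usage with an in-memory stand-in for a three-page source of 21 objects.
$pages = array_chunk(range(1, 21), 7);
$fakeFetch = static function (int $page) use ($pages): array {
    return [
        'results' => $pages[$page - 1] ?? [],
        'hasNext' => $page < count($pages),
    ];
};

$all = fetchAllPagesIteratively($fakeFetch);
echo count($all), PHP_EOL; // prints 21
```

Because the loop keeps a single accumulator and a single stack frame, the call depth stays constant no matter how many pages the source returns.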
getNextPageInfo()
Determines next page URL and configuration based on pagination pattern analysis.
fetchSinglePage()
Handles individual page fetching with proper error handling and response parsing.
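As an illustration of the response parsing and error handling that a single-page fetch needs, the sketch below decodes a raw JSON body and reports a parse failure instead of aborting. The function name `parsePageBody` and the returned `error` key are assumptions for this example; the real fetchSinglePage() may behave differently.

```php
<?php

/**
 * Decode one page's raw JSON body into a list of results.
 *
 * Hypothetical helper: returns ['results' => array, 'error' => ?string].
 */
function parsePageBody(string $rawBody): array
{
    try {
        // JSON_THROW_ON_ERROR turns malformed JSON into a JsonException.
        $decoded = json_decode($rawBody, true, 512, JSON_THROW_ON_ERROR);
    } catch (JsonException $e) {
        return ['results' => [], 'error' => $e->getMessage()];
    }

    // A valid page is expected to be an object holding a 'results' list.
    if (!is_array($decoded) || !is_array($decoded['results'] ?? null)) {
        return ['results' => [], 'error' => 'Unexpected response shape'];
    }

    return ['results' => $decoded['results'], 'error' => null];
}

$ok  = parsePageBody('{"results": [{"id": 1}, {"id": 2}]}');
$bad = parsePageBody('{not json');
```

Returning an error value rather than throwing is only one possible design; it lets the caller decide whether a bad page should stop the synchronization.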
Pagination Support
The system supports multiple pagination patterns:
Next Endpoint URLs
    {
        "next": "https://api.example.com/data?page=2",
        "results": [...]
    }
Page Number Parameters
    {
        "page": 1,
        "total_pages": 5,
        "results": [...]
    }
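The two response shapes above suggest how the next page can be located without an extra request. The following is a hedged sketch of that decision, not the actual getNextPageInfo() implementation; the function name `resolveNextPage` and the returned `endpoint`/`query` keys are hypothetical.

```php
<?php

/**
 * Work out where the next page lives from an already-decoded response body.
 * Returns null when there is no further page.
 */
function resolveNextPage(array $body, string $endpoint): ?array
{
    // Pattern 1: the response carries a ready-made URL for the next page.
    if (isset($body['next']) && is_string($body['next']) && $body['next'] !== '') {
        return ['endpoint' => $body['next'], 'query' => []];
    }

    // Pattern 2: the response reports the current page and the total page count.
    if (isset($body['page'], $body['total_pages'])) {
        $current = (int) $body['page'];
        if ($current < (int) $body['total_pages']) {
            return ['endpoint' => $endpoint, 'query' => ['page' => $current + 1]];
        }
    }

    // Neither pattern points to another page.
    return null;
}

$base     = 'https://api.example.com/data';
$byUrl    = resolveNextPage(['next' => 'https://api.example.com/data?page=2', 'results' => []], $base);
$byNumber = resolveNextPage(['page' => 1, 'total_pages' => 5, 'results' => []], $base);
$lastPage = resolveNextPage(['page' => 5, 'total_pages' => 5, 'results' => []], $base);
```

Checking the `next` URL first means a source that supplies both fields is followed by its own link, which is usually the safer choice.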
Error Handling and Fallbacks
Automatic Fallback
When parallel fetching fails, the system automatically falls back to sequential processing:
    try {
        // Arguments omitted for brevity.
        return $this->fetchAllPagesParallel(...);
    } catch (\Exception $e) {
        // Log the failure, then retry the same fetch sequentially.
        error_log('Parallel page fetching failed, falling back to sequential: ' . $e->getMessage());
        return $this->fetchAllPagesSequential(...);
    }
Timeout Protection
- 30-second timeout for parallel operations
- 50-page safety limit to prevent infinite loops
- Rate limit detection and handling
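Rate limit detection is not specified further in this document. One common approach, shown here purely as an assumption, is to treat HTTP status 429 as the rate-limit signal and honor a numeric Retry-After header. The function name `rateLimitDelay` and the one-second default are hypothetical.

```php
<?php

/**
 * Decide how long to wait before retrying a rate-limited request.
 * Returns null when the response is not rate limited.
 */
function rateLimitDelay(int $statusCode, array $headers, int $defaultDelay = 1): ?int
{
    // HTTP 429 (Too Many Requests) is the standard rate-limit status.
    if ($statusCode !== 429) {
        return null;
    }

    // Header names are case-insensitive, so normalize before the lookup.
    $normalized = array_change_key_case($headers, CASE_LOWER);
    $retryAfter = $normalized['retry-after'] ?? null;

    // Retry-After may also be an HTTP date; this sketch handles only seconds.
    if (is_string($retryAfter) && preg_match('/^\d+$/', $retryAfter) === 1) {
        return (int) $retryAfter;
    }

    return $defaultDelay;
}

$notLimited = rateLimitDelay(200, []);
$withHeader = rateLimitDelay(429, ['Retry-After' => '5']);
$noHeader   = rateLimitDelay(429, []);
```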
Configuration
Automatic Detection
The system automatically detects:
- Pagination method (next URLs vs page numbers)
- Total page count
- API response patterns
Safety Limits
- Maximum 50 pages per synchronization
- 30-second timeout for parallel operations
- Last-page detection assumes a minimum of 10 objects per page
Performance Monitoring
Timing Metrics
The system now includes detailed timing information:
    {
        "timing": {
            "stages": {
                "fetch_objects": {
                    "duration_ms": 800,
                    "description": "Fetching objects from external source (with parallel page fetching)",
                    "objects_fetched": 21,
                    "rate_limited": false,
                    "fetch_method": "parallel_optimized"
                }
            }
        }
    }
Performance Analysis
- Objects per second: Calculated throughput metric
- Efficiency ratio: Comparison of processing vs fetch time
- Slowest stage identification: Automatic bottleneck detection
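The three metrics above are not defined precisely here, so the sketch below shows one plausible way to derive them from per-stage timings. The formulas, the function name `analyzeTiming`, and the `process_objects` stage are assumptions; only the `fetch_objects` fields come from the timing example.

```php
<?php

/**
 * Derive simple performance figures from per-stage timings.
 *
 * Assumed definitions:
 *  - objects_per_second: objects fetched divided by fetch time in seconds
 *  - efficiency_ratio:   time spent in all other stages divided by fetch time
 *  - slowest_stage:      name of the stage with the largest duration_ms
 */
function analyzeTiming(array $stages): array
{
    $fetch   = $stages['fetch_objects'] ?? [];
    $fetchMs = (float) ($fetch['duration_ms'] ?? 0);
    $fetched = (int) ($fetch['objects_fetched'] ?? 0);

    // Total time across all stages, used for the efficiency ratio.
    $totalMs      = (float) array_sum(array_column($stages, 'duration_ms'));
    $processingMs = $totalMs - $fetchMs;

    // The stage with the largest duration is the bottleneck candidate.
    $slowest = null;
    foreach ($stages as $name => $stage) {
        if ($slowest === null || $stage['duration_ms'] > $stages[$slowest]['duration_ms']) {
            $slowest = $name;
        }
    }

    return [
        'objects_per_second' => $fetchMs > 0 ? $fetched / ($fetchMs / 1000) : 0.0,
        'efficiency_ratio'   => $fetchMs > 0 ? $processingMs / $fetchMs : 0.0,
        'slowest_stage'      => $slowest,
    ];
}

$report = analyzeTiming([
    'fetch_objects'   => ['duration_ms' => 800, 'objects_fetched' => 21],
    'process_objects' => ['duration_ms' => 200],
]);
```

With these sample numbers the fetch stage handles 26.25 objects per second and is reported as the slowest stage.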
Best Practices
API Considerations
- Rate limiting: Ensure your API can handle concurrent requests
- Connection limits: Consider server connection pool limits
- Caching: Implement appropriate caching strategies
Monitoring
- Watch timing metrics: Monitor 'fetch_objects' duration improvements
- Check error logs: Look for fallback activations
- Analyze patterns: Identify optimal pagination sizes
Troubleshooting
- High timeout rates: May indicate API rate limiting
- Frequent fallbacks: Could suggest network issues
- Inconsistent performance: Check for variable page sizes
Expected Results
Performance Improvements
For the example case (21 objects, 3 pages):
- Before: 2,600ms fetch time
- Expected after: 1,800-2,000ms fetch time
- Improvement: roughly 23-31% reduction (600-800ms saved)
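The reduction implied by the before and after timings can be checked with a short calculation. The helper name `percentReduction` is introduced only for this example.

```php
<?php

/**
 * Percentage reduction from a baseline duration to an improved duration.
 */
function percentReduction(float $beforeMs, float $afterMs): float
{
    return ($beforeMs - $afterMs) / $beforeMs * 100;
}

// Upper and lower ends of the expected fetch time for the example case.
printf("%.1f%%\n", percentReduction(2600, 2000)); // prints 23.1%
printf("%.1f%%\n", percentReduction(2600, 1800)); // prints 30.8%
```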
Scalability Benefits
- Linear improvement: Time savings grow with the number of pages
- Network optimization: Better bandwidth utilization
- Resource efficiency: Reduced total synchronization time
Compatibility
Supported APIs
- JSON APIs with standard pagination
- REST APIs with next/previous links
- APIs with page number parameters
Fallback Support
- XML APIs: Automatic fallback to sequential
- Custom pagination: Graceful degradation
- Rate-limited APIs: Intelligent retry handling
Migration Notes
Automatic Activation
- No configuration required: Parallel fetching is enabled by default
- Transparent operation: Existing synchronizations work unchanged
- Backward compatibility: Full support for existing configurations
Monitoring Migration
- New timing fields: Additional metrics in synchronization logs
- Performance baselines: Establish new performance expectations
- Error monitoring: Watch for new error patterns during transition