I just finished tweaking Sphinx (and a bunch of other stuff) for a client, and Sphinx was behaving terribly during full indexing! So badly that the first run brought the site down for a good 15 minutes, followed by another half hour of crawling speeds. This client’s db is bigger than average, but definitely not gigantic either. I wrote up this little guide so others can benefit as well.
The pull query takes out a giant lock on several tables. And once the pull is complete, the indexer becomes a huge resource hog, maxing out drive utilization while writing out the indexing results.
The solution to making Sphinx run well came in four parts.
1. Removing table locks
InnoDB to the rescue! Make sure you are running InnoDB on all referenced tables, not just the key ones. Because MyISAM uses table-level locks, pulling the data can hold a lock that blocks all other actions on your forum.
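If you’re not sure which tables are still on MyISAM, something like this will find and convert them. The schema and table names here are illustrative, not IPB’s actual ones; check against your own database first:

```sql
-- List any tables still on MyISAM (adjust 'forum_db' to your schema name)
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'forum_db' AND ENGINE = 'MyISAM';

-- Convert a table to InnoDB; this rewrites the whole table, so run it off-peak
ALTER TABLE posts ENGINE = InnoDB;
```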
2. Using Ranged Queries
Rather than pulling everything at once and hammering your RAM, disk, locks, etc., you can slice the query down into smaller chunks.
This is done by adding range directives:

sql_query_range = SELECT MIN(pid), MAX(pid) FROM posts
sql_range_step = 1000
And adding limits through a WHERE clause in the main query:
WHERE p.pid >= $start AND p.pid <= $end
The same can be done for tags and archive posts. Though your delta query is likely small enough that it will complete in 1 step.
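Put together, a ranged source section looks roughly like this. The column list in sql_query is a placeholder, not IPB’s actual schema:

```
source posts_source
{
    sql_query_range = SELECT MIN(pid), MAX(pid) FROM posts
    sql_range_step  = 1000
    sql_query       = \
        SELECT p.pid, p.topic_id, p.post \
        FROM posts p \
        WHERE p.pid >= $start AND p.pid <= $end
}
```

Sphinx fills in $start and $end itself, stepping through the min/max range in 1000-id windows.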
Setting the range step to 1000 means each query covers a window of at most 1000 ids at a time. Between steps, other queries (unrelated to indexing) can run, reducing your queue load. If you look at the WHERE clause, you’ll notice it uses fixed id bounds rather than a LIMIT clause, so even if a given range contains zero rows, it still counts as one step.
The amount of time spent between these stepped queries is set through
sql_ranged_throttle = 200 #in milliseconds
It’s worth noting that ranged queries should use primary keys as their bounds, not other key types, since databases tend to store rows clustered by primary key. Fetching 1000 consecutive rows is a lot cheaper than fetching 1000 rows spread across the table.
You can see the full example here: http://pastebin.com/Sm3Y8qEg
Please do not copy/paste that into your configuration and expect it to work. It’s just a sample, stripped of db prefixes and connection info.
3. Limiting Indexer
The indexer itself also writes a ton of data. The output can be as big as, or even a lot bigger than, the entire database it’s indexing. Even on an SSD, writing too much data too fast can effectively stall your system without maxing out your IOPS: an SSD that can do 100k+ IOPS may still only be able to write around 100MB/s. So, we limit the indexer’s throughput.
max_iops = 50
max_iosize = 1048576
By capping a single I/O operation at 1MB and allowing at most 50 I/O operations per second, we effectively create a 50MB/s write limit. That leaves the disk available to serve other needs at the same time.
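In sphinx.conf these go under the indexer section; a minimal sketch:

```
indexer
{
    # cap each I/O operation at 1MB
    max_iosize = 1048576
    # allow at most 50 I/O operations per second
    max_iops   = 50
    # net effect: 50 ops/s x 1MB/op = roughly a 50MB/s write ceiling
}
```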
4. Wildcard support.
Or rather, the lack thereof.
#infix_fields = post, title
#min_infix_len = 3
#enable_star = 1
In the IPB ACP Sphinx settings, there is an option to support wildcards. While great to have, the performance cost is extreme. If the database is big enough, wildcards need to be disabled.
It’s also worth noting that the enable_star option is now considered deprecated. If you are using a newer Sphinx version (2.2.x or higher), it may not exist as an option at all; star syntax is simply on all the time.
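If you do want wildcards on a newer Sphinx (2.2.x+), they’re driven by the keywords dictionary rather than enable_star; a sketch, assuming an index section named posts_index:

```
index posts_index
{
    # with dict = keywords, star syntax is always available;
    # enable_star no longer exists in 2.2.x+
    dict          = keywords
    min_infix_len = 3
}
```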
A small side note for others with a really large number of forums.
You’ll need to increase max_filter_values, as each of your forums gets added to the query. This limit is just a guard against runaway requests, so even if set too high, it won’t negatively affect you unless a bad request actually comes through.
max_filter_values = 10000
This goes inside the searchd config.
Hope that helps!