Graham (shadesogrey) wrote in mysql,
Graham
shadesogrey
mysql

Efficient full-text searchs on large sets of data

A database application I've written uses a table with around 600,000 rows. Each row has a text field 500-5000 characters long. I periodically need to find all the rows containing a particular phone number, name, or address, ie.'123-4567', 'john smith', '1950 Main St N'

I'm doing this using
SELECT * FROM `tb_archive` WHERE `text` LIKE '%john smith%' ORDER BY `date` DESC

The problem is that it is too slow. Most searches take 30-60 seconds. If multiple searches are done the server response begins to slow to a crawl for other users.

I've looked at mysql built in full-text indexing - but I'm not sure if it can work since I only need exact matches, don't care about relavence, and often search for numbers and short words.

Any suggestions as to how I can do this more efficiently?
  • Post a new comment

    Error

    default userpic
  • 3 comments
I'm no MySQL optimization expert, but it almost sounds like you need a complete restructuring. Breaking out the phone, name, address, etc to their own fields, and searching for that specific one.

At that point, you can 'SELECT `date`, `name` FROM `tb_archive` WHERE `name` LIKE '%john smith%' ORDER BY `date` DESC"

Excluding the giant `text` field in the results would also speed up returns at that point.

It would take a lot of work, but it'd be worth it in the long run, IMHO.
It's slow because there's no index that can be used. Fulltext indexing can be used to get exact matches only. You can change the default of 4 characters being the minimum size of a "word" if you want.

It won't help with numbers, but it will make most of your searches faster.

Without having seen what you're trying to do, I'd say that what you REALLY want to do is have more structured data. Why are you putting people's demographic information all mushed together in a text field instead of having separate fields for name, address and phone number? You can still do efficient searches by doing a little bit of intelligent parsing and using UNION queries....

You can make the fields varchar in those cases, instead of using a text field. See http://forums.mysql.com/read.php?24,105964,105964 and an article I wrote, http://www.pythian.com/news/7129/text-vs-varchar/ for more info on the difference.

As an example, if the search consists of a number it's either a phone number or the numerical part of an address (house number). You don't need to search the "name" field for that, so you can search the fields for phone number and address only. If you are looking for a string it's either a name or part of the address, in which case you can use fulltext search.

If you insist on keeping your structure, look into using a good fulltext search such as Sphinx. It's free, designed for use with MySQL and super fast.

http://sphinxsearch.com/
5-minute lightning talk on Sphinx: http://www.youtube.com/watch?v=H1qkxPzdMM8
25-minute presentation on Sphinx: http://www.youtube.com/watch?v=KY_EKcuboaI