Php – How would you go about making a search algorithm for a CRM

algorithmsMySQLPHPsearch

I know this question is kind of broad, but all I really need is some best-practice code structure or a link to a good tutorial.

I am working on a CRM that runs on php and mysql. Currently, our search query looks something like this:

SELECT * FROM contacts WHERE name LIKE '%ric flair%'

Now, that's really dumbed down, but that's what it amounts to. In fact, we are using prepared statements, firing off several 30+ line queries, and joining lots of tables to verify ownership and all the rest.

But if the user types rick flair or ric m flair, the search query will not find him. Now, I could split the string into three search terms, %ric%, %m%, %flair%, and work from there, but then I'm going to get every guy named Ric (and Rick for that matter), and then Ric Flair could be anywhere among the search results.

I feel like I'm doing this like an amateur, and when I look up info about search algorithms, all I can find is people wanting to be Google.

Any pro advice on this would be appreciated.

To Clarify

This is for a global search function that looks through contacts, phones, emails, tasks, sales opportunities, conversations, and notes, looking for anything that has or is connected to your search string (in this case, Ric Flair).

The main goal here is to have a "smarter" search, such that if you type rick flair or ric m. flair, it will still have a good chance of returning the contact ric flair at the top, even though you might have people named Rick in you contact list. Maybe that's because the last name matched, and that's more important? Or am I overthinking this?

Best Answer

You may want to have a look at Lucene. It's written in Java but you can use it from PHP through either its Zend port or Solr, although I think Solr might not be a solution since you probably want something embedded within your CMS.

The idea is that you can build an index from all the searchable data and then search that index. It has the advantage that it can be much faster than database queries. You can also implement a scoring system, so that some records show up higher in the results list if they are more relevant to the user. One possible downside is that you have to update the index each time the data is updated.

If the Lucene approach doesn't fit your scenario, you could go with a mix of PHP and database code. A view or a stored procedure could act as a surrogate index, aggregating the searchable data. You'd need to split strings by whitespaces and this could make it rather slow, but will get you there. The PHP code will be responsible for building the WHERE clause (you can have a look at how Solr implements query strategies for suggestions as source of inspiration). Assuming you have Id, Name and Score as index fields and two contact records with Id=1, FirstName="Ric", LastName="Flair" and Id=2, FirstName="Ric", LastName="Flare" the index records would be looking something like this:

+--------------+-------+
| Id | Name    | Score |
+----+---------+-------+
| 1  | Ric     |   1   |
+----+---------+-------+
| 1  | Flair   |   1   |
+----+---------+-------+
| 2  | Ric     |   1   |
+----+---------+-------+
| 2  | Flare   |   1   |
+----+---------+-------+

An example could look like this (SQL Server):

DECLARE @index TABLE (
    Id INT NOT NULL,
    Name NVARCHAR(50) NOT NULL,
    Score INT NOT NULL
)

INSERT INTO @index (Id, Name, Score) VALUES (1, 'Ric', 1)
INSERT INTO @index (Id, Name, Score) VALUES (1, 'Flair', 1)
INSERT INTO @index (Id, Name, Score) VALUES (2, 'Ric', 1)
INSERT INTO @index (Id, Name, Score) VALUES (2, 'Flare', 1)

SELECT
    Id,
    SUM(Score) AS Score
FROM
    @index
WHERE
    Name = 'Ric'
    OR Name = 'Flair'
GROUP BY
    Id
ORDER BY
    Score DESC

Ric Flair would have a higher score since it matches on both values, thus popping first in the search results.

The index could also contain title and summary fields to be used as values to be displayed in the search results page. Or you could join the results with a view that preselects those values.

You can toy around with the conditions in the WHERE clause or with the Score field (you could give different scores to different type of records or properties) to get a more refined search experience.

Related Topic