Oracle Text is a feature of the Oracle database that allows developers to index large portions of data and provide users a search-engine like view into that data. It comes with a powerful set of functionality that allows the experienced or knowledgeable user to craft very specific queries. But what about everybody else? Most users want the search functionality to be basic and/or follow what they have become accustom to with the various online-search engines.
Today I will show you how to provide a sub-set of Oracle text features (specifically using the context index and contains clause) that mimic the major features of most online search tools. This will allow you to use Oracle Text in your application without forcing users into a new set of search expressions.
Before we get started with the search expressions, we need to build and populate a test table. This test table uses a clob, but you could also index a blob column or search against a multi-column index like the one I detail here.
Today I will show you how to provide a sub-set of Oracle text features (specifically using the context index and contains clause) that mimic the major features of most online search tools. This will allow you to use Oracle Text in your application without forcing users into a new set of search expressions.
Before we get started with the search expressions, we need to build and populate a test table. This test table uses a clob, but you could also index a blob column or search against a multi-column index like the one I detail here.
create table simple_search_ex
(
ident number,
content clob, );
insert into simple_search_ex (ident, content) values (1, 'This is the first row, it is red');
insert into simple_search_ex (ident, content) values (2, 'This is the second row, it is blue');
insert into simple_search_ex (ident, content) values (3, 'This is the third row, it is red');
insert into simple_search_ex (ident, content) values (4, 'This is the fourth row, it is green');
content clob, );
insert into simple_search_ex (ident, content) values (1, 'This is the first row, it is red');
insert into simple_search_ex (ident, content) values (2, 'This is the second row, it is blue');
insert into simple_search_ex (ident, content) values (3, 'This is the third row, it is red');
insert into simple_search_ex (ident, content) values (4, 'This is the fourth row, it is green');
Now that we have the table built, we need to add the Oracle text index. I'm going to use the sync on commit parameter so that updating the column will update the index
create index simple_search_ex_idx on simple_search_ex(content)
indextype is ctxsys.context PARAMETERS ('SYNC ( ON COMMIT )');
indextype is ctxsys.context PARAMETERS ('SYNC ( ON COMMIT )');
Once we have the test data setup, we can get down to business. Oracle Text has a ton of complex search options and operators that you can exploit (see the documentation for the full list). To keep it simple for my users, I'm going to stick with the basics. My search will allow the user of "AND", "OR", "NOT", "()" and expressions in double quotes or surrounded by braces "{}" will be evaluated as a single word. Finally spaces will translate to "AND" in my algorithm.
Limiting myself to the simple options means we can transform the user's input into the appropriate expression for the CONTAINS clause with a few regexp_replace calls. We simply want to transform the user's English search expression into one that Oracle Text will allow for. I've commented the code below, the only really tricky part is ensuring that we do not replace characters inside of quotes. That requires a few extra replacements to work properly.
Before we can create the function, we must create a pl/sql table to hold the tokenized input:
create or replace type vc_lst as table of varchar2(4000);
Now we can create the function:
create or replace function translate_search_expression(p_input varchar2) return varchar2 is
v_procedure_c constant varchar2(30) := 'translate_search_expression';
v_location_i integer; -- This variable tracks our location within this procedure.
v_input varchar2(31000);
v_word_lst vc_lst := vc_lst();
v_return varchar2(31000);
l_n number;
begin
-- Set Tracing Information.
v_location_i := 500;
dbms_application_info.set_client_info('Session="'||v('APP_SESSION')||'"');
dbms_output.put_line(v_procedure_c);
v_location_i := 1000;
v_return := p_input;
v_location_i := 1500;
--Replace "some text" with {some text}
v_return := regexp_replace(v_return, '"(.*?)"', '{\1}',1,0,'i');
v_return := regexp_replace(v_return, '"', '', 1,0,'i');
v_location_i := 1600;
--Replace ( and ) with ' ( ' and ' ) ' if not wrapped in spaces
v_return := regexp_replace(v_return, '([^\s])\(([^\s])', '\1 ( \2',1,0);
v_return := regexp_replace(v_return, '([^\s])\)([^\s])', '\1 ) \2',1,0);
v_location_i := 2000;
--Split the search expression on spaces
loop
l_n := instr( v_return, ' ' );
exit when (nvl(l_n,0) = 0);
v_word_lst.extend;
v_word_lst( v_word_lst.count ) := ltrim(rtrim(substr(v_return,1,l_n-1)));
v_return := substr( v_return, l_n+1 );
end loop;
--Replace non-supported oracle text whitespace or operators
if (v_word_lst.count > 0) then
for i in v_word_lst.first..v_word_lst.last loop
--Underscore is seen as a wildcard usually, wrap it so it is literal
if (instr(v_word_lst(i), '_') > 0) then
v_word_lst(i) := '{'||v_word_lst(i)||'}';
--the dash (-) is normally minus (lowers score if second word exists)
elsif (instr(v_word_lst(i), '-') > 0) then
v_word_lst(i) := '{'||v_word_lst(i)||'}';
--ensure the comma is literal, not accumulate
elsif (instr(v_word_lst(i), ',') > 0) then
v_word_lst(i) := '{'||v_word_lst(i)||'}';
--ensure equal sign is literal not equivilance
elsif (instr(v_word_lst(i), '=') > 0) then
v_word_lst(i) := '{'||v_word_lst(i)||'}';
--ensure semi-colon is literal, not near
elsif (instr(v_word_lst(i), ';') > 0) then
v_word_lst(i) := '{'||v_word_lst(i)||'}';
--ensure explination point is literal not soundex
elsif (instr(v_word_lst(i), '!') > 0) then
v_word_lst(i) := '{'||v_word_lst(i)||'}';
--ensure dolar sign is literal not stem
elsif (instr(v_word_lst(i), '$') > 0) then
v_word_lst(i) := '{'||v_word_lst(i)||'}';
end if;
end loop;
end if;
v_location_i := 3000;
--Put string back together with spaces
v_return := '';
if (v_word_lst is not null and v_word_lst.count > 0) then
for i in v_word_lst.first..v_word_lst.last loop
v_return := v_return || ' ' || v_word_lst(i);
end loop;
--trim off excess delimiters at beginning and end
v_return := ltrim(rtrim(v_return, ' '), ' ');
end if;
v_location_i := 3500;
--Replace keyword " AND " with & unless it exists inside of {}
v_return := regexp_replace(v_return, '(\{.*?\})|\s+AND\s+|&', '\1&',1,0, 'i');
--Now replace "(&" with " (" and "&)" with ") " and remove any leading or trailing spaces from the string
v_return := regexp_replace(v_return, '\s*\((&|\s)*', ' (',1,0);
v_return := regexp_replace(v_return, '(&|\s)*\)\s*', ') ',1,0);
v_return := ltrim(rtrim(v_return, ' '));
--dbms_output.put_line('after & replacement: '||v_return);
--Replace keyword " OR " with | unless it is inside {}
v_return := regexp_replace(v_return, '(\{.*?\})|\s+OR\s+|\|', '\1|',1,0, 'i');
--dbms_output.put_line('after | replacement: '||v_return);
--Replace keyword " NOT " with ~ unless it is inside {}
v_return := regexp_replace(v_return, '(\{.*?\})|\s+NOT\s+|~', '\1~',1,0, 'i');
--dbms_output.put_line('after ~ replacement: '||v_return);
--Now replace any remaining spaces with &, unless they are inside {}
v_return := regexp_replace(v_return, '(\{.*?\})|\s+', '\1&',1,0, 'i');
--dbms_output.put_line('after " " replacement: '||v_return);
--Here we must cleanup the condition where we'll get }~|& due to above regex
v_return := regexp_replace(v_return, '}[&~\|]+&', '}&',1,0,'i');
--dbms_output.put_line('after && replacement: '||v_return);
--Here we must cleanup the condition where we'll get }~| due to above regex
v_return := regexp_replace(v_return, '}[~\|]+\|', '}|',1,0,'i');
--In some cases we'll see &|& or &~&, so fix those
v_return := regexp_replace(v_return, '&\|&', '|',1,0,'i');
v_return := regexp_replace(v_return, '&~&', '~',1,0,'i');
--In some cases we'll see &&( or )&&, so fix those
v_return := regexp_replace(v_return, '&&\(', '&(',1,0,'i');
v_return := regexp_replace(v_return, '\)&&', ')&',1,0,'i');
--In some cases we'll see |&( or )&|, so fix those
v_return := regexp_replace(v_return, '\|&\|\(', '|(',1,0,'i');
v_return := regexp_replace(v_return, '\)&\|', ')|',1,0,'i');
--In some cases we'll see ~&( or )&~, so fix those
v_return := regexp_replace(v_return, '~&\(', '~(',1,0,'i');
v_return := regexp_replace(v_return, '\)&~', ')~',1,0,'i');
--Finally ensure we don't have any operatiors at the beginning or end and return the expression
v_return := regexp_replace(v_return, '^[~&\|]+', '',1,1,'i');
v_return := regexp_replace(v_return, '[~&\|]+$', '',1,1,'i');
return v_return;
EXCEPTION
when OTHERS then
return p_input;
END translate_search_expression;
/
/
The following examples demonstrate how this code will translate your users' search expressions into the actual expression.
SQL> select translate_search_expression('hello world') results from dual;
RESULTS
--------------------------------------------------------------------------------
hello&world
SQL> select translate_search_expression('"hello world"') results from dual; RESULTS -------------------------------------------------------------------------------- {hello world}
SQL> select translate_search_expression('this~(is history) OR demo') results from dual; RESULTS -------------------------------------------------------------------------------- this~(is&history)|demo
SQL> select translate_search_expression('"hello world"') results from dual; RESULTS -------------------------------------------------------------------------------- {hello world}
SQL> select translate_search_expression('this~(is history) OR demo') results from dual; RESULTS -------------------------------------------------------------------------------- this~(is&history)|demo
To use this in your query, simply use the results of the function as the search expression portion of the CONTAINS clause like so:
SQL> select ident, content
from simple_search_ex
where contains(content, translate_search_expression('blue or green'), 1);
IDENT CONTENT --------------- ---------------------------------------------- 2 This is the second row, it is blue 4 This is the fourth row, it is green
IDENT CONTENT --------------- ---------------------------------------------- 2 This is the second row, it is blue 4 This is the fourth row, it is green
Now you should be able to fully enable Oracle Text searches in your pl/sql or Apex application while allowing your users to search as they are used-to, while still providing the power of AND/OR/NOT expressions. I hope you have found this useful.
No comments:
Post a Comment
Please keep comments and discourse respectful and on topic. We all need to do our part to ensure courteous discourse on the web.