Regex – How to extract value between two strings with regexp_replace in Oracle

oracleplsqlregex

I want to replace the value found within REGEXP_REPLACE in Oracle. I used the regex101.com tool to debug my regular expression, and it's highlights the result as well, but if I put this expression into my select, it does nothing… it's not replacing the string I want. So, my question is, why is this pattern WRONG in Oracle PL/SQL?

select REGEXP_REPLACE('SOME XML DATAS', '/(?<=</FIRST_TAG>)(.*)(?=</LAST_TAG>)/s', '<REPLACE_TAG xsi:nil="1"/>') from dual

As you can see I search between two closing tags.

Sample XML:

<?xml version='1.0' encoding='UTF-8'?>
<LAST_TAG xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:bla-bla-bla">
 <DEFAULT_LANGUAGE>en</DEFAULT_LANGUAGE>
 <DEBUG>0</DEBUG>
 <DEBUG_LEVEL>MEDIUM</DEBUG_LEVEL>
 <DEBUG_FILE>bla-bla.log</DEBUG_FILE>
 <FIRST_TAG>
  <LOGICAL_PRINTER>
   <ID>PRINTER1</ID>
   <PHYSICAL_PRINTER>Dummy_Printer</PHYSICAL_PRINTER>
  </LOGICAL_PRINTER>
  <LOGICAL_PRINTER>
   <ID>PRINT01</ID>
   <PHYSICAL_PRINTER>PRINT01</PHYSICAL_PRINTER>
  </LOGICAL_PRINTER>
  <LOGICAL_PRINTER>
   <ID>PRINT012</ID>
   <PHYSICAL_PRINTER>PRINT02</PHYSICAL_PRINTER>
  </LOGICAL_PRINTER>
  <LOGICAL_PRINTER>
   <ID>PRINT015</ID>
   <PHYSICAL_PRINTER>PRINT05</PHYSICAL_PRINTER>
  </LOGICAL_PRINTER>
  <LOGICAL_PRINTER>
   <ID>PRINT019</ID>
   <PHYSICAL_PRINTER>PRINT019</PHYSICAL_PRINTER>
  </LOGICAL_PRINTER>
  <LOGICAL_PRINTER>
   <ID>PRINT010</ID>
   <PHYSICAL_PRINTER>PRINT010</PHYSICAL_PRINTER>
  </LOGICAL_PRINTER>
  <LOGICAL_PRINTER>
   <ID>PRINT_ID01</ID>
   <PHYSICAL_PRINTER>\\111.111.111.111\PRINT011</PHYSICAL_PRINTER>
  </LOGICAL_PRINTER>
  <LOGICAL_PRINTER>
   <ID>PRINT_ID03</ID>
   <PHYSICAL_PRINTER>\\111.111.111.111\PRINT013</PHYSICAL_PRINTER>
  </LOGICAL_PRINTER>
  <LOGICAL_PRINTER>
   <ID>PRINT_ID04</ID>
   <PHYSICAL_PRINTER>\\111.111.111.111\PRINT014</PHYSICAL_PRINTER>
  </LOGICAL_PRINTER>
  <LOGICAL_PRINTER>
   <ID>PRINT_ID05</ID>
   <PHYSICAL_PRINTER>\\111.111.111.111\PRINT015</PHYSICAL_PRINTER>
  </LOGICAL_PRINTER>
  <LOGICAL_PRINTER>
   <ID>BUDCOLOR</ID>
   <PHYSICAL_PRINTER>\\url\printer</PHYSICAL_PRINTER>
  </LOGICAL_PRINTER>
  <LOGICAL_PRINTER>
   <ID>NAMENAME</ID>
   <PHYSICAL_PRINTER>\\url\printer</PHYSICAL_PRINTER>
  </LOGICAL_PRINTER>
 </FIRST_TAG>
 <LOCALE_LIST>
  <LOCALE>
   <DISPLAY_NAME>English (United States)</DISPLAY_NAME>
   <COUNTRY>US</COUNTRY>
   <LANGUAGE>en</LANGUAGE>
   <VARIANT xsi:nil="1"/>
  </LOCALE>
 </LOCALE_LIST>
</LAST_TAG>

Thank you very much!

Best Answer

The relevant section from the fine manual: Using Regular Expressions in Database Applications.

The idea is to match text around the text-to-be-replaced and use backreferences to match subexpressions in a replace string.

with data(id, str) as (
  select 1, 'foo is bar' from dual union all
  select 2, 'foo was bar' from dual union all
  select 3, 'foo might be or might not be bar' from dual
)
select 
 id
,str
,regexp_replace(str, '^(foo).+(bar)$', '\1***\2') as str2
from data
;

Another example with OP example data. Note the use of match parameter n:

'n' allows the period (.), which is the match-any-character character, to match the newline character. If you omit this parameter, then the period does not match the newline character.

The parameter is required because the string contains newline charaters. The following example will correctly eliminate LOCALE_LIST-element. Warning: please note manipulating stringified XML with a regular expression is not a recommended practice.

with data(id, str) as (
  select 1, q'[<?xml version='1.0' encoding='UTF-8'?>
<LAST_TAG xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:bla-bla-bla">
 <DEFAULT_LANGUAGE>en</DEFAULT_LANGUAGE>
 <FIRST_TAG>
  <LOGICAL_PRINTER>
   <ID>PRINTER1</ID>
   <PHYSICAL_PRINTER>Dummy_Printer</PHYSICAL_PRINTER>
  </LOGICAL_PRINTER>
 </FIRST_TAG>
 <LOCALE_LIST>
  <LOCALE>
   <DISPLAY_NAME>English (United States)</DISPLAY_NAME>
   <VARIANT xsi:nil="1"/>
  </LOCALE>
 </LOCALE_LIST>
</LAST_TAG>]' from dual
)
select 
 id
,regexp_replace(str, '(.+</FIRST_TAG>).+(</LAST_TAG>)', '\1***\2', 1, 1, 'n') as str2
from data
;