Regex – How to find an element by matching exact text of the element in Capybara

capybara, regex, ruby

I have the following two elements in HTML:

<a href="/berlin" >Berlin</a>
<a href="/berlin" >Berlin Germany </a>

I am trying to find the element using the following Capybara method:

find("a", :text => "berlin")

This matches both elements because both contain the text berlin.

Is there a way to match the exact text in Capybara?

Best Answer

Use a regexp instead of a string for the value of the :text key:

find("a", :text => /\ABerlin\z/)

See the 'Options Hash' section of the Capybara::Node::Finders#all documentation.
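The effect of the \A and \z anchors can be sketched in plain Ruby, without a browser session. This is only an illustration: the link_texts array is a hard-coded stand-in for the text Capybara would read from the two links, and the name is made up for this example.

```ruby
# Hypothetical stand-ins for the visible text of the two <a> elements.
link_texts = ["Berlin", "Berlin Germany"]

# A plain string filter behaves like a substring test, so both texts qualify.
substring_hits = link_texts.select { |text| text.include?("Berlin") }

# \A and \z anchor the pattern to the start and end of the whole string,
# so only the text that is exactly "Berlin" qualifies.
exact_hits = link_texts.select { |text| text.match?(/\ABerlin\z/) }

puts substring_hits.inspect
puts exact_hits.inspect
```

The anchored pattern rejects "Berlin Germany" because \z requires the string to end right after "Berlin".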

PS: text matches are case-sensitive. Your example code actually raises an error:

find("a", :text => "berlin")
# => Capybara::ElementNotFound:
#    Unable to find css "a" with text "berlin"
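If a case-insensitive exact match is wanted, the i flag can be added to the anchored regexp. The sketch below checks this in plain Ruby against hard-coded strings (the link_texts array is a hypothetical stand-in for the link text); the same regexp could presumably be passed as the :text value to find.

```ruby
# Hypothetical stand-ins for the visible text of the two <a> elements.
link_texts = ["Berlin", "Berlin Germany"]

# Without the i flag the lowercase pattern matches nothing.
case_sensitive_hits = link_texts.select { |text| text.match?(/\Aberlin\z/) }

# With the i flag the anchored pattern ignores case but still requires
# the whole string to match.
case_insensitive_hits = link_texts.select { |text| text.match?(/\Aberlin\z/i) }

puts case_sensitive_hits.inspect
puts case_insensitive_hits.inspect
```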

Related Solutions

Javascript – How to access the matched groups in a JavaScript regular expression

Edit: 2019-09-10

As you can see, the way to iterate over multiple matches was not very intuitive. This led to the proposal of the String.prototype.matchAll method. This new method is expected to ship in the ECMAScript 2020 specification. It gives us a clean API and solves multiple problems. It has started to land in major browsers and JS engines, such as Chrome 73+ / Node 12+ and Firefox 67+.

The method returns an iterator and is used as follows:

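const string = "something format_abc";
const regexp = /(?:^|\s)format_(.*?)(?:\s|$)/g;
// matchAll returns an iterator that yields one match array per match
const matches = string.matchAll(regexp);

for (const match of matches) {
  console.log(match);        // the match array, including capture groups
  console.log(match.index);  // position in the string where the match starts
}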

Because it returns an iterator, the method is lazy, which is useful when handling particularly large numbers of capturing groups or very large strings. If you need an Array, the result can easily be converted with the spread syntax or the Array.from method:

function getFirstGroup(regexp, str) {
  const array = [...str.matchAll(regexp)];
  return array.map(m => m[1]);
}

// or:
function getFirstGroup(regexp, str) {
  return Array.from(str.matchAll(regexp), m => m[1]);
}

In the meantime, while this proposal gains wider support, you can use the official shim package.

Also, the internal workings of the method are simple. An equivalent implementation using a generator function would be as follows:

function* matchAll(str, regexp) {
  const flags = regexp.global ? regexp.flags : regexp.flags + "g";
  const re = new RegExp(regexp, flags);
  let match;
  while (match = re.exec(str)) {
    yield match;
  }
}

A copy of the original regexp is created to avoid side effects from the mutation of the lastIndex property while going through the multiple matches.

Also, we need to ensure the regexp has the global flag to avoid an infinite loop.

I'm also happy to see that even this StackOverflow question was referenced in the discussions of the proposal.

Python – Using BeautifulSoup to find a HTML tag that contains certain text

from bs4 import BeautifulSoup  # BeautifulSoup 4 package
import re

html_text = """
<h2>this is cool #12345678901</h2>
<h2>this is nothing</h2>
<h1>foo #126666678901</h1>
<h2>this is interesting #126666678901</h2>
<h2>this is blah #124445678901</h2>
"""

soup = BeautifulSoup(html_text, "html.parser")

# Calling the soup is shorthand for find_all(); string= filters text nodes by regex
for elem in soup(string=re.compile(r' #\S{11}')):
    # elem is the matching text node; its parent is the enclosing tag
    print(elem.parent)

Prints:

<h2>this is cool #12345678901</h2>
<h1>foo #126666678901</h1>
<h2>this is interesting #126666678901</h2>
<h2>this is blah #124445678901</h2>
