This is my first question here, so it would be great to find an answer. I am new to Nokogiri.
Here is my problem. I have something like this in the HTML head of a target site (here, a TechCrunch post):
<meta content="During my time at TechCrunch I've seen thousands of startups and written about hundreds of them. I sure as hell don't know all ..." name="description"/>
I would now like to have a script to run through the meta tags, locate the one with the name attribute "description" and get what is in the content attribute.
I have tried something like this:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = "http://www.techcrunch.com/2009/10/11/the-underutilized-power-of-the-video-demo-to-explain-what-the-hell-you-actually-do/"
doc = Nokogiri::HTML(open(url))
posts = doc.xpath("//meta")
posts.each do |link|
  a = link.attributes['name']
  b = link.attributes['content']
end
After that I could select the tag whose name attribute equals "description", but this code returns nil for a and b.
I played around with posts = doc.xpath("//meta"), posts = doc.xpath("//meta/*"), etc., but still got nil.
Best Answer
The problem is not with the XPath: the document does not seem to parse completely. You can check that with puts doc; the output does not contain the full input. It seems to be a problem with parsing comments (I suspect either invalid HTML or a bug in libxml2).
In your case I would use a regular expression as a workaround. Given that <meta> tags are simple enough, that might work, e.g. /<meta name="([^"]*)" content="([^"]*)"/ (swap the two attributes if the page puts content before name, as it does in your example).
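To illustrate the regular-expression workaround, here is a minimal sketch in plain Ruby. It assumes the raw HTML is already available as a string and that attribute values are double-quoted. The helper name meta_content and the second "keywords" tag are made up for the example; the first tag is the one from the question. Rather than hard-coding the attribute order, it collects each tag's attributes into a hash first, so it does not matter whether name or content comes first.

```ruby
# Sample markup: the first <meta> tag is the one from the question; the
# "keywords" tag is a made-up second tag, added only to show the selection.
html = <<~HTML
  <head>
  <meta content="During my time at TechCrunch I've seen thousands of startups and written about hundreds of them. I sure as hell don't know all ..." name="description"/>
  <meta name="keywords" content="video, demo"/>
  </head>
HTML

# Hypothetical helper (not part of Nokogiri): returns the content attribute
# of the first <meta> tag whose name attribute equals `name`, or nil.
def meta_content(html, name)
  html.scan(/<meta\b[^>]*>/i).each do |tag|
    # Collect the double-quoted attributes of this tag into a hash,
    # so the order of name and content does not matter.
    attrs = tag.scan(/([\w-]+)="([^"]*)"/).to_h
    return attrs["content"] if attrs["name"] == name
  end
  nil
end

description = meta_content(html, "description")
puts description
```

This is only a workaround for simple tags: it will miss single-quoted or unquoted attribute values and any value that contains a ">" character, so going back to the XPath approach is preferable once the document parses correctly.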