Extracting URLs from an Emacs buffer

elisp

How can I write an Emacs Lisp function to find all hrefs in an HTML file and extract all of the links?

Input:

<html>
 <a href="http://www.stackoverflow.com" _target="_blank">StackOverFlow</a>
 <h1>Emacs Lisp</h1>
 <a href="http://news.ycombinator.com" _target="_blank">Hacker News</a>
</html>

Output:

http://www.stackoverflow.com|StackOverFlow
http://news.ycombinator.com|Hacker News

I've seen the re-search-forward function mentioned several times during my search. Here's what I think I need to do, based on what I've read so far.

(defun extra-urls (file)
 ...
 (setq buffer (...
 (while
        (re-search-forward "http://" nil t)
        (when (match-string 0)
...
))

Best Answer

If there is at most one link per line and you don't mind some very ugly regular expression hacking, run the following code on your buffer:

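(defun getlinks ()
  ;; 1. Rewrite every line containing a link as LINK:url|description.
  (goto-char (point-min))
  (replace-regexp "^.*<a href=\"\\([^\"]+\\)\"[^>]+>\\([^<]+\\)</a>.*$" "LINK:\\1|\\2")
  ;; 2. Blank out every non-empty line that does not start with LINK.
  ;;    Newline is excluded from each set because a negated set in an
  ;;    Emacs regexp would otherwise match it.
  (goto-char (point-min))
  (replace-regexp "^\\([^L\n]\\|\\(L[^I\n]\\)\\|\\(LI[^N\n]\\)\\|\\(LIN[^K\n]\\)\\).*$" "")
  ;; 3. Collapse runs of newlines into a single newline.
  (goto-char (point-min))
  (replace-regexp "
+" "
")
  ;; 4. Strip the LINK: prefix.
  (goto-char (point-min))
  (replace-regexp "^LINK:\\(.*\\)$" "\\1"))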

It replaces all links with LINK:url|description, deletes all lines containing anything else, deletes empty lines, and finally removes the "LINK:".

Detailed HOWTO: (1) correct the bug in your example HTML file by replacing <href with <a href, (2) copy the above function into the Emacs *scratch* buffer, (3) hit C-x C-e after the final ")" to load the function, (4) load your example HTML file, (5) execute the function with M-: (getlinks).

Note that the linebreaks in the third replace-regexp are important. Don't indent those two lines.
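
For comparison, here is a minimal sketch of the re-search-forward approach the question asks about. It is not part of the answer above: the name my-extract-links is made up for illustration, and the regexp assumes that href is the first attribute of each <a> tag and that the link text contains no nested tags. Unlike getlinks, it allows several links per line and leaves the buffer unmodified.

```elisp
;; Sketch only: `my-extract-links' is a hypothetical name, not from the answer.
(defun my-extract-links ()
  "Return a list of URL|TEXT strings, one per link in the current buffer."
  (let ((links '()))
    (save-excursion
      (goto-char (point-min))
      ;; Group 1 is the href value, group 2 is the link text.
      ;; Assumes href is the first attribute of the <a> tag.
      (while (re-search-forward
              "<a[ \t\n]+href=\"\\([^\"]+\\)\"[^>]*>\\([^<]+\\)</a>" nil t)
        (push (concat (match-string-no-properties 1)
                      "|"
                      (match-string-no-properties 2))
              links)))
    ;; `push' builds the list in reverse, so restore buffer order.
    (nreverse links)))
```

With the example HTML file as the current buffer, M-: (my-extract-links) should return a list of two strings matching the two output lines in the question. Joining them with (mapconcat #'identity (my-extract-links) "\n") gives a single newline-separated string.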

Related Solutions

(repeat-last-command) in Emacs

Repeat functionality is provided by the repeat.el Emacs Lisp package, which is included with standard Emacs distributions. From repeat.el's documentation:

This package defines a command that repeats the preceding command, whatever that was, including its arguments, whatever they were. This command is connected to the key C-x z. To repeat the previous command once, type C-x z. To repeat it a second time immediately after, type just z. By typing z again and again, you can repeat the command over and over.

To see additional information about the repeat command, type C-h F repeat RET from within Emacs.

How to emulate Vim’s * search in GNU Emacs

Based on your feedback to my first answer, how about this:

(defun my-isearch-word-at-point ()
  (interactive)
  (call-interactively 'isearch-forward-regexp))

(defun my-isearch-yank-word-hook ()
  (when (equal this-command 'my-isearch-word-at-point)
    (let ((string (concat "\\<"
                          (buffer-substring-no-properties
                           (progn (skip-syntax-backward "w_") (point))
                           (progn (skip-syntax-forward "w_") (point)))
                          "\\>")))
      (if (and isearch-case-fold-search
               (eq 'not-yanks search-upper-case))
          (setq string (downcase string)))
      (setq isearch-string string
            isearch-message
            (concat isearch-message
                    (mapconcat 'isearch-text-char-description
                               string ""))
            isearch-yank-flag t)
      (isearch-search-and-update))))

(add-hook 'isearch-mode-hook 'my-isearch-yank-word-hook)
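
The hook only acts when the search was started through my-isearch-word-at-point, so that command has to be invoked by name (M-x) or through a key. A minimal sketch of a binding follows. The key C-* is an arbitrary choice for illustration, not something the answer specifies, and it may not be available in terminal Emacs.

```elisp
;; Assumption: C-* is only an example key; substitute any key you prefer.
(global-set-key (kbd "C-*") #'my-isearch-word-at-point)
```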
