Try tail:
tail -n +2 "$FILE"
-n x
: Just print the last x
lines. tail -n 5
would give you the last 5 lines of the input. The +
sign kind of inverts the argument and make tail
print anything but the first x-1
lines. tail -n +1
would print the whole file, tail -n +2
everything but the first line, etc.
GNU tail
is much faster than sed
. tail
is also available on BSD and the -n +2
flag is consistent across both tools. Check the FreeBSD or OS X man pages for more.
The BSD version can be much slower than sed
, though. I wonder how they managed that; tail
should just read a file line by line while sed
does pretty complex operations involving interpreting a script, applying regular expressions and the like.
Note: You may be tempted to use
# THIS WILL GIVE YOU AN EMPTY FILE!
tail -n +2 "$FILE" > "$FILE"
but this will give you an empty file. The reason is that the redirection (>
) happens before tail
is invoked by the shell:
- Shell truncates file
$FILE
- Shell creates a new process for
tail
- Shell redirects stdout of the
tail
process to $FILE
tail
reads from the now empty $FILE
If you want to remove the first line inside the file, you should use:
tail -n +2 "$FILE" > "$FILE.tmp" && mv "$FILE.tmp" "$FILE"
The &&
will make sure that the file doesn't get overwritten when there is a problem.
You can access capturing groups like this:
var myString = "something format_abc";
var myRegexp = /(?:^|\s)format_(.*?)(?:\s|$)/g;
var myRegexp = new RegExp("(?:^|\s)format_(.*?)(?:\s|$)", "g");
var match = myRegexp.exec(myString);
console.log(match[1]); // abc
And if there are multiple matches you can iterate over them:
var myString = "something format_abc";
var myRegexp = new RegExp("(?:^|\s)format_(.*?)(?:\s|$)", "g");
match = myRegexp.exec(myString);
while (match != null) {
// matched text: match[0]
// match start: match.index
// capturing group n: match[n]
console.log(match[0])
match = myRegexp.exec(myString);
}
Edit: 2019-09-10
As you can see the way to iterate over multiple matches was not very intuitive. This lead to the proposal of the String.prototype.matchAll
method. This new method is expected to ship in the ECMAScript 2020 specification. It gives us a clean API and solves multiple problems. It has been started to land on major browsers and JS engines as Chrome 73+ / Node 12+ and Firefox 67+.
The method returns an iterator and is used as follows:
const string = "something format_abc";
const regexp = /(?:^|\s)format_(.*?)(?:\s|$)/g;
const matches = string.matchAll(regexp);
for (const match of matches) {
console.log(match);
console.log(match.index)
}
As it returns an iterator, we can say it's lazy, this is useful when handling particularly large numbers of capturing groups, or very large strings. But if you need, the result can be easily transformed into an Array by using the spread syntax or the Array.from
method:
function getFirstGroup(regexp, str) {
const array = [...str.matchAll(regexp)];
return array.map(m => m[1]);
}
// or:
function getFirstGroup(regexp, str) {
return Array.from(str.matchAll(regexp), m => m[1]);
}
In the meantime, while this proposal gets more wide support, you can use the official shim package.
Also, the internal workings of the method are simple. An equivalent implementation using a generator function would be as follows:
function* matchAll(str, regexp) {
const flags = regexp.global ? regexp.flags : regexp.flags + "g";
const re = new RegExp(regexp, flags);
let match;
while (match = re.exec(str)) {
yield match;
}
}
A copy of the original regexp is created; this is to avoid side-effects due to the mutation of the lastIndex
property when going through the multple matches.
Also, we need to ensure the regexp has the global flag to avoid an infinite loop.
I'm also happy to see that even this StackOverflow question was referenced in the discussions of the proposal.
Best Answer
The key to getting this to work is to tell
sed
to exclude what you don't want to be output as well as specifying what you do want.This says:
-n
)p
)In general, in
sed
you capture groups using parentheses and output what you capture using a back reference:will output "bar". If you use
-r
(-E
for OS X) for extended regex, you don't need to escape the parentheses:There can be up to 9 capture groups and their back references. The back references are numbered in the order the groups appear, but they can be used in any order and can be repeated:
outputs "a bar a".
If you have GNU
grep
(it may also work in BSD, including OS X):or variations such as:
The
-P
option enables Perl Compatible Regular Expressions. Seeman 3 pcrepattern
orman 3 pcresyntax
.