Google-sheets – Complete reference documentation for Google Spreadsheets formulae

formulasgoogle sheets

As a programmer, I find myself somewhat frustrated when trying to use Google Spreadsheets, because there seems to be no definitive description of the syntax that I can use for formulae. For instance, I've read that I can use F3:F to refer to the F column from row 3 downwards, but I can't find where it mentions this in the Google reference documentation, and it doesn't always seem to work. I'm probably doing it wrong, but that's the point of this question.

I do not want to trawl through five different tutorials to try to guess the rules! Is there a concise description of the grammar and semantics available somewhere? If not, perhaps someone could summarize it here.

Best Answer

This is some text excerpted from here. I've posted it here as a straw man. If anyone that knows how this differs from the formulas in Google Spreadsheets could make a comment, then perhaps we can end up with a good answer.

Formulas

Formulas allow you to perform calculations within table cells. Every formula MAY begin with a namespace prefix specifying the syntax and semantics used within the formula. When there is no prefix, the default formula syntax and semantics as described below are used. This is followed by an equal sign and then the formula itself. Implementations MAY accept multiple formula syntaxes, and they MAY accept various extensions to the default formula syntax. However, all implementations that accept formulas MUST accept the default formula syntax and semantics as described here. They MAY also accept extensions to it.

A document implementing the strict schema MAY NOT use a formula namespace prefix (since there is no guarantee that other receiving systems would be able to process it), and MAY NOT use any extensions to the semantics and syntax described below.: In the default formula syntax, after the initial equal sign a formula must be an expression. An expression may be a number, a constant string, a named range, a pair of expressions connected by a binary operator, an expression prefixed by a unary operator, a logical operator, a function call, a cell address, or an expression surrounded by parentheses. A function call and logical operator may have zero or more semicolon-separated parameters, and each parameter MUST be an expression. The syntax for each of these is as follows:

  • Numbers. Numbers are written and read in this format using the "C" locale (using the "." decimal separator and no thousands separator), using setlocale(LC_NUMERIC, "C") or equivalent. Numbers can end in %, which divides that number by 100. The “%” does not change the meaning of other operators, so 2+10% is 2.1 (not 2.2). Note that leading - and + signs are allowed as unary operators, described below. Writers MUST write numbers that match the pattern (note that it must begin with a digit):
[0-9]+(.[0-9]+)?([eE][+-]?[0-9]+)?%?
Readers MUST be able to read these numbers, as well as accept numbers that begin with a leading “.”, so they must be able to read numbers in the form:
((.[0-9]+)|([0-9]+(.[0-9]+)?([eE][+-]?[0-9]+)?))%?Constant Strings.. Constant strings are surrounded by double-quotes; to embed a double-quote, the double-quote character is used twice. Strings are stored in UTF-8 format. Note that since all content is stored as XML, all double-quotes in the formula are actually stored as " in the XML. Constant strings match the pattern: \"([^"]|\"\")*\"

  • Named ranges/Fields. Named ranges/fields refer to a separated defined value or set of values (in a spreadsheet, typically referring to a cell address or set of addresses). Names are not case-sensitive, so “a” and “A” refer to the same range. Implementations must accept at least named ranges that match the following pattern: [A-Za-z][A-Za-z0-9_]*

  • Operators. Ordinary infix and prefix operators are accepted. These have the following associativity and precedence (from lowest to highest priority):

Associativity Operator(s)        Comments
left          <,=,>,<=,>=,<>     Less than, equal to, greater than,
                                     less than or equal to, greater than or equal to,
                                     not equal to.
left          +,-,&                  Add, subtract, string concatenation. Note that
                                     unary (prefix) + and – has a different priority.
left          *,/                    Multiply, divide. Division does not truncate, so
                                     1 / 2 is equal to 0.5.
right         ^                      Power (2^3 is 8).  Readers SHOULD also accept “**”.
none          +,-                    Prefix unary operators, e.g., -5 or -[.A1].
                                     Note that these have a difference precedence than
                                     add and subtract.

Precedence can be overridden by using parentheses, so “=2+3*4” computes 14 while “=(2+3)*4” computes 20. Note that +, -, *, /, ^ convert any string or binary values they use into numbers before computing; note that & (string concatenation) converts any values into strings before concatenating them.Logical operators. Logical operators have the same syntax as function calls; their names are case-insensitive, parameters are separated by semicolons, and their name MUST be followed by parentheses. The logical operators are:


Operator           Parameter count           Comment
TRUE()             0                         This is a boolean constant, though its syntax makes it appear like a function
FALSE()            0                         This is a boolean constant
NOT(expression)    1                         If expression is TRUE() returns FALSE(), else returns TRUE()
AND(e1; e2 [; e]*) 2 or more                 If all expressions are TRUE() returns TRUE(), else returns FALSE()
OR(e1; e2 [; e]*)  2 or more                 If all expressions are FALSE() returns FALSE(), else returns TRUE()
IF(condition; true_exp; false_exp)
                   3                         Evaluates condition. If it's true, return true_exp, else return false_exp

Implementations of AND(), OR(), and IF() must short-circuit, that is, they must evaluate left-to-right in turn, and only evaluate the expressions they must evaluate to compute the result. An implementation may choose to evaluate more, but only when the expressions have no side-effects. Implementations of AND() and OR() SHOULD accept an arbitrary number of parameters, but MUST accept at least 30 in each use. The operations NOT(), AND(), and OR(), as well as the condition in IF(), are intended for boolean values; if expressions of other types are used, an implementation SHOULD NOT consider 0 as false and any other numeric value as true, and SHOULD NOT consider a zero-length string as false and any other string value as true. If an error value is computed for an expression, then that first error is the result of the logical operation.


Function calls. A function call has a function name matching the pattern [A-za-z][A-Za-z0-9_]* followed by an opening parenthesis, zero or more parameters, and a closing parenthesis. Parameters are separated by a semicolon (not a comma), though readers MAY optionally accept function calls using commas as separators as well. Function names are case-insensitive, so “sum” and “SUM” are the same function. If there are parameters, each must be an expression and none can be empty, so X(;) is not a legal function call while RAND() is perfectly legal. If a parameter is optional, it MAY be omitted, but if omitted its separator MUST be omitted as well (function specifications should state which parameters are optional, and what omitting them means). Typical implementations will have many built-in functions, and most implementations also support one or more ways to create user-defined functions. Common functions include:

  • SUM(list) - sums up all the numbers in the range(s) of list.
  • COUNT(list) - counts the number of numbers in the range(s) of list
  • AVERAGE(list) - Computes the average, equal to SUM(list) / COUNT(list)
  • MIN(list) - minimum numeric value of list
  • MAX(list) - maximum value of list
  • ROUND(n, count) – round n to count digits (if count omitted, count=0)
  • INT(n) – round n down to nearest integer.
  • ISEVEN(n) – returns TRUE() if n is even, else returns FALSE().
  • SUBSTITUTE(text; searchtext; newtext; occurrence) – substitutes newtext for searchtext in text, occurrence number of times (if occurrence omitted, all times).

Addresses of cells that contain numbers. The addresses can be relative or absolute. A relative address consists of a column letter and a row number. Prefixing the row letter or the column number with a $ makes the row or column absolute.

Whitespace (space, tab, newline, and carriage return) is ignored in the default formulas syntax, except in the contents of string constants and as a separator for multiple cell range addresses in a cell range address list.

User interfaces of implementations MAY choose to accept and display formulas differently from how they are exchanged in this data format. For example, they MAY accept and display numbers using the format of the current locale, they MAY always use a particular locale for numeric formats, they MAY use commas instead of semicolons for parameter separators, and they MAY accept and display cell addresses without requiring the use of square brackets. But implementation user interfaces SHOULD accept the default format as input as well where possible, e.g., implementations SHOULD accept numbers that meet the “C” locale requirements (as well as the current locale's), and SHOULD accept bracketed cell addresses. Also, implementation user interfaces SHOULD correct likely mistakes, possibly with a dialogue. For example, if an implementation normally displays parameter separators as a semicolon, an implementation SHOULD attempt to detect if a user is using commas as function parameter separators instead of semicolons, and correct it.

The following is an example of a simple formula:

=sum(A1:A5)

This formula calculates the sum of the values of all cells in the range “.A1:.A5”. The function is “sum”. The parameters are marked by a “(“ at the start and a “)” at the end. If a function contains more than one parameter, the parameters are separated by a “;”. The following is a variation of the formula shown above:

=sum(A1;A2;A3;A4;A5)

The result of this formula is the same. The components that you use in the formula depend on the application that you are using.