Remove NA values from a vector

maxminmissing-datanar

I have a huge vector which has a couple of NA values, and I'm trying to find the max value in that vector (the vector is all numbers), but I can't do this because of the NA values.

How can I remove the NA values so that I can compute the max?

Best Answer

Trying ?max, you'll see that it actually has a na.rm = argument, set by default to FALSE. (That's the common default for many other R functions, including sum(), mean(), etc.)

Setting na.rm=TRUE does just what you're asking for:

d <- c(1, 100, NA, 10)
max(d, na.rm=TRUE)

If you do want to remove all of the NAs, use this idiom instead:

d <- d[!is.na(d)]

A final note: Other functions (e.g. table(), lm(), and sort()) have NA-related arguments that use different names (and offer different options). So if NA's cause you problems in a function call, it's worth checking for a built-in solution among the function's arguments. I've found there's usually one already there.

Related Solutions

Sql – Max function in SQL Server that takes two values like Math.Max in .NET

If you're using SQL Server 2008 (or above), then this is the better solution:

SELECT o.OrderId,
       (SELECT MAX(Price)
        FROM (VALUES (o.NegotiatedPrice),(o.SuggestedPrice)) AS AllPrices(Price))
FROM Order o

All credit and votes should go to Sven's answer to a related question, "SQL MAX of multiple columns?"
I say it's the "best answer" because:

It doesn't require complicating your code with UNION's, PIVOT's, UNPIVOT's, UDF's, and crazy-long CASE statments.
It isn't plagued with the problem of handling nulls, it handles them just fine.
It's easy to swap out the "MAX" with "MIN", "AVG", or "SUM". You can use any aggregate function to find the aggregate over many different columns.
You're not limited to the names I used (i.e. "AllPrices" and "Price"). You can pick your own names to make it easier to read and understand for the next guy.
You can find multiple aggregates using SQL Server 2008's derived_tables like so:
SELECT MAX(a), MAX(b) FROM (VALUES (1, 2), (3, 4), (5, 6), (7, 8), (9, 10) ) AS MyTable(a, b)

R – How to join (merge) data frames (inner, outer, left, right)

By using the merge function and its optional parameters:

Inner join: merge(df1, df2) will work for these examples because R automatically joins the frames by common variable names, but you would most likely want to specify merge(df1, df2, by = "CustomerId") to make sure that you were matching on only the fields you desired. You can also use the by.x and by.y parameters if the matching variables have different names in the different data frames.

Outer join: merge(x = df1, y = df2, by = "CustomerId", all = TRUE)

Left outer: merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)

Right outer: merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)

Cross join: merge(x = df1, y = df2, by = NULL)

~~Just as with the inner join, you would probably want to explicitly pass "CustomerId" to R as the matching variable.~~ I think it's almost always best to explicitly state the identifiers on which you want to merge; it's safer if the input data.frames change unexpectedly and easier to read later on.

You can merge on multiple columns by giving by a vector, e.g., by = c("CustomerId", "OrderId").

If the column names to merge on are not the same, you can specify, e.g., by.x = "CustomerId_in_df1", by.y = "CustomerId_in_df2" where CustomerId_in_df1 is the name of the column in the first data frame and CustomerId_in_df2 is the name of the column in the second data frame. (These can also be vectors if you need to merge on multiple columns.)

Best Answer

Related Solutions

Sql – Max function in SQL Server that takes two values like Math.Max in .NET

R – How to join (merge) data frames (inner, outer, left, right)

Related Topic