>>> ["foo", "bar", "baz"].index("bar")
1
Reference: Data Structures > More on Lists
Note that while this is perhaps the cleanest way to answer the question as asked, index is a rather weak component of the list API, and I can't remember the last time I used it in anger. It's been pointed out to me in the comments that because this answer is heavily referenced, it should be made more complete. Some caveats about list.index follow. It is probably worth initially taking a look at the documentation for it:
list.index(x[, start[, end]])
Return zero-based index in the list of the first item whose value is equal to x. Raises a ValueError
if there is no such item.
The optional arguments start and end are interpreted as in the slice notation and are used to limit the search to a particular subsequence of the list. The returned index is computed relative to the beginning of the full sequence rather than the start argument.
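A small illustration (the list contents here are made up) of how the start argument limits the search while the returned index still refers to the full list:

```python
colors = ["red", "green", "blue", "green"]

# Start searching at position 2, so the "green" at index 1 is skipped.
# The result is an index into the full list, not into the searched slice.
print(colors.index("green", 2))  # 3
```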
Linear time-complexity in list length

An index call checks every element of the list in order until it finds a match. If your list is long and you don't know roughly where in the list the item occurs, this search can become a bottleneck. In that case, you should consider a different data structure. Note that if you know roughly where to find the match, you can give index a hint. For instance, in this snippet, l.index(999_999, 999_990, 1_000_000) is roughly four orders of magnitude faster than straight l.index(999_999), because the former only has to search 10 entries, while the latter searches a million:
>>> import timeit
>>> timeit.timeit('l.index(999_999)', setup='l = list(range(0, 1_000_000))', number=1000)
9.356267921015387
>>> timeit.timeit('l.index(999_999, 999_990, 1_000_000)', setup='l = list(range(0, 1_000_000))', number=1000)
0.0004404920036904514
Only returns the index of the first match to its argument

A call to index searches through the list in order until it finds a match, and stops there. If you expect to need the indices of more matches, you should use a list comprehension or generator expression:
>>> [1, 1].index(1)
0
>>> [i for i, e in enumerate([1, 2, 1]) if e == 1]
[0, 2]
>>> g = (i for i, e in enumerate([1, 2, 1]) if e == 1)
>>> next(g)
0
>>> next(g)
2
Most places where I once would have used index, I now use a list comprehension or generator expression because they're more generalizable. So if you're considering reaching for index, take a look at these excellent Python features.
Throws if element not present in list

A call to index raises a ValueError if the item's not present:
>>> [1, 1].index(2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: 2 is not in list
If the item might not be present in the list, you should either

- check for it first with item in my_list (a clean, readable approach), or
- wrap the index call in a try/except block that catches ValueError (probably faster, at least when the list to search is long and the item is usually present).
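As a sketch, here are both approaches wrapped in small helper functions (the helper names are hypothetical):

```python
def index_or_none(items, target):
    """LBYL style: check membership first, then call index."""
    if target in items:      # one scan to check membership...
        return items.index(target)  # ...and a second scan to locate it
    return None

def index_or_none_eafp(items, target):
    """EAFP style: just call index and catch the ValueError."""
    try:
        return items.index(target)
    except ValueError:
        return None

print(index_or_none(["foo", "bar", "baz"], "bar"))       # 1
print(index_or_none_eafp(["foo", "bar", "baz"], "qux"))  # None
```

Note that the EAFP version scans the list only once, which is why it tends to win when the item is usually present.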
Best Answer
The other answers show you how to make a list of data.frames when you already have a bunch of data.frames, e.g., d1, d2, .... Having sequentially named data frames is a problem, and putting them in a list is a good fix, but the best practice is to avoid having a bunch of data.frames not in a list in the first place.

The other answers give plenty of detail on how to assign data frames to list elements, access them, etc. We'll cover that a little here too, but the main point is: don't wait until you have a bunch of data.frames to add them to a list. Start with the list.

The rest of this answer will cover some common cases where you might be tempted to create sequential variables, and show you how to go straight to lists. If you're new to lists in R, you might also want to read What's the difference between [[ and [ in accessing elements of a list?.

Lists from the start
Don't ever create d1, d2, d3, ..., dn in the first place. Create a list d with n elements, e.g., d <- vector("list", n).

Reading multiple files into a list of data frames
This is done pretty easily when reading in files. Maybe you've got files data1.csv, data2.csv, ... in a directory. Your goal is a list of data.frames called mydata. The first thing you need is a vector with all the file names. You can construct this with paste (e.g., my_files <- paste0("data", 1:5, ".csv")), but it's probably easier to use list.files to grab all the appropriate files: my_files <- list.files(pattern = "\\.csv$"). You can use regular expressions to match the files; read more about regular expressions in other questions if you need help there. This way you can grab all CSV files even if they don't follow a nice naming scheme, or you can use a fancier regex pattern if you need to pick certain CSV files out from a bunch of them.

At this point, most R beginners will use a for loop, and there's nothing wrong with that; it works just fine. A more R-like way to do it is with lapply, which is a shortcut for the loop. Of course, substitute another data import function for read.csv as appropriate: readr::read_csv or data.table::fread will be faster, or you may need a different function for a different file type. Either way, it's handy to name the list elements to match the files.
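Both patterns might look something like this (a sketch, assuming the my_files vector built above; the file names are hypothetical):

```r
my_files <- list.files(pattern = "\\.csv$")

# The for-loop version: pre-allocate the list, then fill it in.
mydata <- vector("list", length(my_files))
for (i in seq_along(my_files)) {
  mydata[[i]] <- read.csv(my_files[i])
}

# The lapply version: the same result in one line.
mydata <- lapply(my_files, read.csv)

# Either way, name the elements after the files (minus the extension).
names(mydata) <- gsub("\\.csv$", "", my_files)
```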
Splitting a data frame into a list of data frames
This is super easy: the base function split() does it for you. You can split by a column (or columns) of the data, or by anything else you want; e.g., split(mtcars, mtcars$cyl) gives a list of three data frames, one per number of cylinders. This is also a nice way to break a data frame into pieces for cross-validation: maybe you want to split mtcars into training, test, and validation pieces.

Simulating a list of data frames
Maybe you're simulating data, something like this:
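A minimal sketch (the group sizes, means, and replication count are all made-up parameters):

```r
# One simulated data set: two groups with different means.
sim_one <- function() {
  data.frame(group = rep(c("a", "b"), each = 10),
             value = c(rnorm(10, mean = 0), rnorm(10, mean = 1)))
}
d <- sim_one()

# As discussed just below, replicate() repeats the simulation and
# collects the results in a list; simplify = FALSE keeps it a list.
simlist <- replicate(100, sim_one(), simplify = FALSE)
```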
But who does only one simulation? You want to do this 100 times, 1000 times, more! But you don't want 10,000 data frames in your workspace. Use replicate (with simplify = FALSE) and put them in a list. In this case especially, you should also consider whether you really need separate data frames, or whether a single data frame with a "group" column would work just as well; using data.table or dplyr, it's quite easy to do things "by group" to a data frame.

I didn't put my data in a list :( I will next time, but what can I do now?
If they're an odd assortment (which is unusual), you can simply assign them:
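For example, with two built-in data frames standing in for your odd assortment:

```r
# Collect existing, unrelated data frames into one named list by hand.
mylist <- list(cars = mtcars, flowers = iris)
mylist[["cars"]]  # access an element by name
```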
If you have data frames named in a pattern, e.g., df1, df2, df3, and you want them in a list, you can get them if you can write a regular expression to match the names, something like my_list <- mget(ls(pattern = "^df[0-9]+$")). Generally, mget is used to get multiple objects and return them in a named list; its counterpart get is used to get a single object and return it (not in a list).

Combining a list of data frames into a single data frame
A common task is combining a list of data frames into one big data frame. If you want to stack them on top of each other, you would use rbind for a pair of them, but for a list of data frames here are three good choices: do.call(rbind, my_list), dplyr::bind_rows(my_list), or data.table::rbindlist(my_list). (Similarly, use cbind or dplyr::bind_cols for columns.)

To merge (join) a list of data frames, you can see these answers. Often the idea is to use Reduce with merge (or some other joining function) to get them together.

Why put the data in a list?
Put similar data in lists because you want to do similar things to each data frame, and functions like lapply, sapply, and do.call, the purrr package, and the old plyr l*ply functions make it easy to do that. Examples of people easily doing things with lists are all over SO.

Even if you use a lowly for loop, it's much easier to loop over the elements of a list than it is to construct variable names with paste and access the objects with get. It's easier to debug, too.

Think of scalability. If you really only need three variables, it's fine to use
d1, d2, d3. But then if it turns out you really need 6, that's a lot more typing. And next time, when you need 10 or 20, you find yourself copying and pasting lines of code, maybe using find/replace to change d14 to d15, and you're thinking this isn't how programming should be. If you use a list, the difference between 3 cases, 30 cases, and 300 cases is at most one line of code, with no change at all if your number of cases is automatically detected by, e.g., how many .csv files are in your directory.

You can name the elements of a list, in case you want to use something other than numeric indices to access your data frames (and you can use both; this isn't an XOR choice).

Overall, using lists will lead you to write cleaner, easier-to-read code, which will result in fewer bugs and less confusion.