Getting some sort of modification date in a cross-platform way is easy - just call os.path.getmtime(path)
and you'll get the Unix timestamp of when the file at path
was last modified.
Getting file creation dates, on the other hand, is fiddly and platform-dependent, differing even between the three big OSes:
- On Windows, a file's
ctime
(documented at https://msdn.microsoft.com/en-us/library/14h5k7ff.aspx) stores its creation date. You can access this in Python through os.path.getctime()
or the .st_ctime
attribute of the result of a call to os.stat()
. This won't work on Unix, where the ctime
is the last time that the file's attributes or content were changed.
- On Mac, as well as some other Unix-based OSes, you can use the
.st_birthtime
attribute of the result of a call to os.stat()
.
On Linux, this is currently impossible, at least without writing a C extension for Python. Although some file systems commonly used with Linux do store creation dates (for example, ext4
stores them in st_crtime
) , the Linux kernel offers no way of accessing them; in particular, the structs it returns from stat()
calls in C, as of the latest kernel version, don't contain any creation date fields. You can also see that the identifier st_crtime
doesn't currently feature anywhere in the Python source. At least if you're on ext4
, the data is attached to the inodes in the file system, but there's no convenient way of accessing it.
The next-best thing on Linux is to access the file's mtime
, through either os.path.getmtime()
or the .st_mtime
attribute of an os.stat()
result. This will give you the last time the file's content was modified, which may be adequate for some use cases.
Putting this all together, cross-platform code should look something like this...
import os
import platform
def creation_date(path_to_file):
"""
Try to get the date that a file was created, falling back to when it was
last modified if that isn't possible.
See http://stackoverflow.com/a/39501288/1709587 for explanation.
"""
if platform.system() == 'Windows':
return os.path.getctime(path_to_file)
else:
stat = os.stat(path_to_file)
try:
return stat.st_birthtime
except AttributeError:
# We're probably on Linux. No easy way to get creation dates here,
# so we'll settle for when its content was last modified.
return stat.st_mtime
Read all text from a file
Java 11 added the readString() method to read small files as a String
, preserving line terminators:
String content = Files.readString(path, StandardCharsets.US_ASCII);
For versions between Java 7 and 11, here's a compact, robust idiom, wrapped up in a utility method:
static String readFile(String path, Charset encoding)
throws IOException
{
byte[] encoded = Files.readAllBytes(Paths.get(path));
return new String(encoded, encoding);
}
Read lines of text from a file
Java 7 added a convenience method to read a file as lines of text, represented as a List<String>
. This approach is "lossy" because the line separators are stripped from the end of each line.
List<String> lines = Files.readAllLines(Paths.get(path), encoding);
Java 8 added the Files.lines()
method to produce a Stream<String>
. Again, this method is lossy because line separators are stripped. If an IOException
is encountered while reading the file, it is wrapped in an UncheckedIOException
, since Stream
doesn't accept lambdas that throw checked exceptions.
try (Stream<String> lines = Files.lines(path, encoding)) {
lines.forEach(System.out::println);
}
This Stream
does need a close()
call; this is poorly documented on the API, and I suspect many people don't even notice Stream
has a close()
method. Be sure to use an ARM-block as shown.
If you are working with a source other than a file, you can use the lines()
method in BufferedReader
instead.
Memory utilization
The first method, that preserves line breaks, can temporarily require memory several times the size of the file, because for a short time the raw file contents (a byte array), and the decoded characters (each of which is 16 bits even if encoded as 8 bits in the file) reside in memory at once. It is safest to apply to files that you know to be small relative to the available memory.
The second method, reading lines, is usually more memory efficient, because the input byte buffer for decoding doesn't need to contain the entire file. However, it's still not suitable for files that are very large relative to available memory.
For reading large files, you need a different design for your program, one that reads a chunk of text from a stream, processes it, and then moves on to the next, reusing the same fixed-sized memory block. Here, "large" depends on the computer specs. Nowadays, this threshold might be many gigabytes of RAM. The third method, using a Stream<String>
is one way to do this, if your input "records" happen to be individual lines. (Using the readLine()
method of BufferedReader
is the procedural equivalent to this approach.)
Character encoding
One thing that is missing from the sample in the original post is the character encoding. There are some special cases where the platform default is what you want, but they are rare, and you should be able justify your choice.
The StandardCharsets
class defines some constants for the encodings required of all Java runtimes:
String content = readFile("test.txt", StandardCharsets.UTF_8);
The platform default is available from the Charset
class itself:
String content = readFile("test.txt", Charset.defaultCharset());
Note: This answer largely replaces my Java 6 version. The utility of Java 7 safely simplifies the code, and the old answer, which used a mapped byte buffer, prevented the file that was read from being deleted until the mapped buffer was garbage collected. You can view the old version via the "edited" link on this answer.
Best Answer
It appears there is no logically straight forward method to do such a thing.
But there is a work around (with a substantial amount of work for such basic functionaliy) that can be found at codeproject.
If anyone finds a tidy way of doing it, let me know, I'd be intrigued.