Honestly 400ms isn't too shabby for a Magento CE installation although I don't know what kind of page that is (Category / product / cart etc).
If you want a quick way to speed up the shop look into using a Full Page Cache extension like Brim FPC.
It's easy to install and configure, and it works more effectively than Magento's default cache.
Another thing you can do is switch from file-based caching to memory-based caching. Since RAM is faster to read and write than disk, it will save you some time.
Check out this article for that.
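For Magento 1 CE, that switch is made in app/etc/local.xml. A minimal sketch, assuming a memcached instance on localhost (the host, port and values here are illustrative, so adjust for your setup):

```xml
<!-- app/etc/local.xml: point Magento's cache at memcached instead of var/cache files -->
<config>
    <global>
        <cache>
            <backend>memcached</backend>
            <memcached>
                <servers>
                    <server>
                        <host><![CDATA[127.0.0.1]]></host>
                        <port><![CDATA[11211]]></port>
                        <persistent><![CDATA[1]]></persistent>
                    </server>
                </servers>
            </memcached>
        </cache>
    </global>
</config>
```

Clear var/cache after making the change so Magento starts writing to the new backend.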
And last but not least: the database. In a lot of cases this will prove to be a bottleneck. Check your slow query log, or use a tool like mytop to see how busy the MySQL server is.
Moving it from localhost to a separate server that can be fine-tuned specifically for MySQL will also give you better performance.
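Checking the slow query log first requires enabling it. A minimal my.cnf sketch - the file path and thresholds are illustrative starting points, not recommendations:

```ini
# my.cnf: log statements slower than 2 seconds (tune long_query_time to taste)
[mysqld]
slow_query_log                = 1
slow_query_log_file           = /var/log/mysql/mysql-slow.log
long_query_time               = 2
log_queries_not_using_indexes = 1
```

Restart MySQL (or set the equivalent variables at runtime) and let it run under real traffic before drawing conclusions from the log.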
There are two difficult things in computer science:
- Naming things
- Cache invalidation.
Hole punching falls into category #2 :)
General
The best approach is to start at the lower points of the stack and optimize up to the frontend of Magento.
Database and Filesystem
These should always be the first areas to focus on. Because: I/O.
MyTop is a handy Perl script that mimics the Linux 'top' command and gives you insight into the state of your MySQL instance(s).
Htop is a more robust top; its strace feature can help determine the ins/outs of a process to find potential bottlenecks.
Iotop is another tool to consider for monitoring I/O.
Other handy utility scripts like mysqltuner.pl and the MySQL Tuning Primer can offer insight into your MySQL runtime variables and suggest changes. Keep in mind these are meant as guides: the best approach is always to evaluate requirements and tune based on data you have actually gathered. Blindly applying their suggestions can at times do more damage than good, and running them before MySQL has accumulated at least 24 hours of runtime statistics may produce bad advice.
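For flavor, the kind of arithmetic these tuning scripts perform can be sketched. Below is one common check - the InnoDB buffer pool hit rate derived from SHOW GLOBAL STATUS counters. The counter values are invented for illustration, not from a real server:

```python
# Sketch of one check a tool like mysqltuner performs: the InnoDB buffer
# pool hit rate. A low rate suggests the buffer pool is too small and
# reads are spilling to disk. Counter values below are made up.
status = {
    "Innodb_buffer_pool_read_requests": 9_850_000,  # logical reads (from memory)
    "Innodb_buffer_pool_reads": 150_000,            # reads that had to hit disk
}

def buffer_pool_hit_rate(status):
    requests = status["Innodb_buffer_pool_read_requests"]
    disk_reads = status["Innodb_buffer_pool_reads"]
    if requests == 0:
        return 0.0
    return 100.0 * (1 - disk_reads / requests)

rate = buffer_pool_hit_rate(status)
print(f"InnoDB buffer pool hit rate: {rate:.2f}%")
```

The point of the 24-hour caveat above is exactly this: these counters start at zero on restart, so ratios computed from a freshly started server are meaningless.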
Keep in mind Percona Server, MariaDB and standard MySQL should all work with the above. I favor Percona as a MySQL fork: Magento is so heavy on InnoDB, and Percona's XtraDB offers many tools and enhancements to that engine.
Apache or Nginx
I'm still using Apache, as it has served many others (myself included) well. I have used and configured Nginx as well; it does offer some advantages over Apache, one being a smaller memory footprint, but there is a learning curve. That said, a slimmed-down Apache running PHP-FPM will have a similar memory footprint.
Case in point:
Since this article was about performance, I should point out that one
of the easiest ways to help apache get out of its own way is to not
use .htaccess files. Put what you'd put there in your Directory
stanzas, set AllowOverride to "None" and you end up not asking apache
to traverse the whole document path to figure out if it needs to pay
attention to .htaccess or not. This is a basic, simple tuning hint
that many people seem to miss.
To help facilitate this check out:
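A minimal sketch of the Directory stanza approach the quote describes - the document root path is illustrative, and the rules themselves come from your existing .htaccess:

```apache
# vhost config: move .htaccess contents here and stop per-request lookups
<Directory /var/www/magento>
    # With "None", Apache no longer walks the directory tree looking for
    # .htaccess files on every request.
    AllowOverride None

    # Rules formerly in .htaccess (rewrites, expires headers, etc.) go here.
</Directory>
```

Remember this requires a config reload to pick up rule changes, which is exactly the trade: .htaccess is convenient on shared hosting, vhost config is faster when you control the server.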
Utilizing a CDN will obviously help take load off either server, but it has an added frontend benefit: browsers apply their connection limit per host, so most end users' browsers can fetch from both servers in parallel. It also frees Apache from jumping through checks just to serve up a simple static image. Besides a CDN, lighttpd is an option if you want to run a separate web server just for static content.
PHP
PHP-FPM and APC. Use them, and strip out any PHP modules Magento doesn't need.
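An APC starting point might look like the sketch below - the sizes are illustrative guesses for a Magento 1 codebase, not tested recommendations:

```ini
; php.ini: enable the APC opcode cache
extension = apc.so
apc.enabled = 1
apc.shm_size = 256M        ; large enough to hold the compiled Magento codebase
apc.num_files_hint = 10000 ; Magento ships thousands of PHP files
apc.stat = 0               ; skip file mtime checks in production
```

Note that with apc.stat = 0 you must clear the cache (restart PHP-FPM) on every deploy, since changed files will no longer be re-read automatically.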
Magento codebase
AOE_TemplateHints is great for determining whether your blocks are caching properly:
AOE_Profiler is good for profiling; be sure to enable its DB-layer profiling (in a local/dev environment, obviously). Used in conjunction with the mytop tool mentioned previously, it makes finding badly behaving SQL an easier task.
3rd Party modules & Custom code
Magento's own best-practice guide to optimization is a very good read, and worth keeping in mind when reviewing 3rd party modules before using them (there are lots of badly behaved ones, IMO).
Magniffer, a tool from Magento ECG, will help easily identify badly behaving code based on the PDF provided above. It is symfony/php-parser based, but installable via Composer.
Varnish
I'm an advocate of Varnish (its author was a FreeBSD kernel dev), and it offers some crazy sub-second load times. However, if your templates differ even slightly from out-of-the-box, you will spend time configuring Varnish/Magento to hole-punch the content you need. Most setups I've seen simply AJAX'ify the needed items so they stay uncached by Varnish.
There are a number of Magento modules to help facilitate this hole punching and caching:
Ultimately this should be the last leg of your optimization journey, and MAY require some customization to get things right.
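For flavor, one common hole-punching technique is ESI: Varnish parses the cached page for esi:include tags and fetches those blocks separately on each request. A rough VCL sketch (Varnish 3 syntax; the URL patterns and TTL are invented for illustration):

```vcl
sub vcl_fetch {
    # Let Varnish process <esi:include> tags on cacheable catalog/CMS pages.
    if (req.url ~ "^/(catalog|cms)") {
        set beresp.do_esi = true;
        set beresp.ttl = 1h;
    }
}

sub vcl_recv {
    # Never cache the punched-out dynamic blocks themselves (cart, welcome message).
    if (req.url ~ "^/punch/") {
        return (pass);
    }
}
```

The Magento modules mentioned above generate both the esi:include markup and a working VCL for you, which is usually far less painful than hand-rolling it.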
Magento CE FPC
So far the best CE FPC I have found is Lesti::FPC.
It is a very well put together (all observer-based), open-source and free FPC for Community.
At the end of the day use your own testing and judgement.
Some further reading:
Best Answer
It's probably not fair to compare performance to massive ecommerce sites like Amazon, but the techniques they use can nevertheless still be applied to your Magento store.
The first important thing to note is that these sites are not actually loading all of their content as quickly as it seems. They flush part of the page to the browser very quickly, before the page is entirely built, then flush more pieces of the page as the server generates them. You might have heard of this approach as BigPipe, an approach 'invented' by Facebook.
Normally what happens is that the entire HTML for a page is built by the server and delivered to the browser, at which point the browser starts to render and display the page. It's only at this point that assets can load - images, JS, CSS and so on - as the URLs for these assets are of course contained in that HTML.
So the browser will only start rendering page contents once the server has sent it some HTML to actually display. In a complex system like a Magento store, building the HTML for a page takes the server quite some time, as many thousands of lines of code must be executed along with many database requests. How long exactly depends on hosting and server spec, but it's never going to be 'superfast'.
But what if we could just build part of the page quickly, send this to the browser first so the page begins to render at that point, then send the rest of the parts of the page once they have been built? Well, that's exactly what BigPipe does, and what sites like Amazon and Facebook use to have their pages start rendering very quickly in the browser.
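The mechanics can be sketched in a few lines. This is a toy WSGI app, not how Amazon or Facebook actually implement it - the render functions and the sleep standing in for slow block building are invented:

```python
# Minimal BigPipe-style sketch: yield the page shell immediately, then yield
# each slow "pagelet" as it finishes. A WSGI server flushes each yielded
# chunk to the browser, so rendering starts after the first chunk (low TTFB).
import time

def render_shell():
    return b"<html><body><div id='header'>Shop</div>"

def render_product_grid():
    time.sleep(0.05)  # stands in for slow template rendering / DB queries
    return b"<div id='products'>...</div>"

def render_footer():
    return b"</body></html>"

def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html")])

    def body():
        yield render_shell()         # browser gets this after a few ms
        yield render_product_grid()  # arrives ~50ms later, page already rendering
        yield render_footer()

    return body()
```

Note there is no Content-Length header: the connection stays open and the browser consumes chunks as they arrive, which is exactly the waiting/receiving profile described below.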
Have a look at the 'waiting' and 'receiving' times for this request (screenshot from firebug, the stats are for the first request at the top, ignore the rest):
The 'waiting' time is the time it takes the server to generate HTML and send it to the browser so the browser can start rendering the page. In this case the 'waiting' time is very short, just 49ms, so the time to first byte (TTFB) is very quick. The 'receiving' time is the time it takes the browser to receive all of the data for that request from the server. Oddly, you will notice that it's very much longer than the waiting time, at 1.29s. So even though the server starts sending data after just 49ms, it takes a further 1.29s to send all of the HTML for the page. This profile is a telltale sign of a BigPipe approach, which flushes part of a page down the connection to the browser to achieve a good TTFB, then keeps the connection open until the rest of the page has been built and flushed down the same connection.
Compare that request to this one from the Magento demo site:
You can immediately see that the proportions of the 'waiting' and 'receiving' timings are reversed. Here the waiting time is much longer, compared to a very short 'receiving' time. This indicates that the server is generating the entire page's worth of HTML before delivering it to the browser. The TTFB is therefore much slower, and so the page will begin to render in the browser after a much longer period of time - 299ms vs just 49ms for Amazon. The receiving time is very short at just 2ms, because at this point all the server has to do is deliver the HTML as text; the page has already been completely built. Interestingly, you may notice that it takes around 1.3s to build the entire page's HTML on Amazon, but only around 300ms on the Magento demo - so Magento is actually quicker; it's BigPipe that makes it appear the reverse.
So 'superfast' websites may in fact not be performing as fast as you think, with the speed partly just being 'perceived' - though as far as the user browsing the site is concerned, the end result is still a 'fast website'.
You can of course also go down the full page caching route with Magento, although I'm not sure that any solutions except ours use BigPipe. Full page caching is a good way to achieve a good TTFB, and there are plenty of both paid and free extensions out there; feel free to check out ours, Evolved Caching, which we really feel is a great option.
In simple terms, full page caching stores the entire page's worth of HTML for a request and then serves that cached HTML the next time a request for that page comes in. Most solutions pull the cached full page HTML, populate it with dynamic content (i.e. mini cart, header links etc.) and then deliver the complete HTML to the browser. This gives a decent TTFB. Evolved Caching instead pulls the cached full page HTML, delivers it to the browser immediately, and then populates the page with dynamic content via either BigPipe or AJAX. This gives an excellent TTFB, as you are delivering full page HTML to the browser entirely outside the Magento framework (which is slow to initialise).
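The two strategies can be sketched side by side. This is a toy illustration of the logic, not any extension's actual implementation - every name here is invented:

```python
# Toy sketch of two full-page-cache strategies.
# Strategy A (most FPCs): fill the dynamic hole server-side, then send.
# Strategy B (AJAX/BigPipe variant): send cached HTML untouched; the browser
# fetches the dynamic block afterwards, so TTFB is just a cache lookup.
page_cache = {}

def build_page(url):
    # Stands in for a full (slow) Magento render; the marker is the "hole".
    return f"<html><body>page for {url} <!--MINICART--></body></html>"

def get_page_server_side(url, minicart_html):
    if url not in page_cache:
        page_cache[url] = build_page(url)
    # Hole is punched before the response leaves the server.
    return page_cache[url].replace("<!--MINICART-->", minicart_html)

def get_page_ajax(url):
    if url not in page_cache:
        page_cache[url] = build_page(url)
    # Hole becomes a placeholder; a follow-up request fills it client-side.
    return page_cache[url].replace(
        "<!--MINICART-->",
        "<div id='minicart' data-src='/ajax/minicart'></div>",
    )
```

Strategy A still has to boot enough of the framework to render the mini cart before responding; Strategy B answers from the cache immediately, which is why it achieves the better TTFB at the cost of an extra request.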
So BigPipe and full page caching are both approaches you may want to look at (although BigPipe without using Evolved Caching would likely need to be a custom build).
Anyway, all of the above deals only with the first request to the server, which generates the HTML for the page; all further requests are for assets on the page like images, JS and CSS, or for external services. Each of these requests takes a finite time to complete, so the more requests the page makes, the longer it takes for the page to finish loading entirely. You should try to minimise the number of assets on your pages as much as is realistic, and a CDN will help by spreading requests across multiple domains (meaning the browser can request more assets at once, as its connection limit is per domain).
Having said all this, over and above everything you need to make sure the hosting you have your site running on is sufficiently highly specced to run your Magento store well. What spec this needs to be exactly and what you need to spend on it is very specific to each particular store and the traffic that store sees, but essentially the larger the load the webserver has to deal with the higher the spec the hosting needs to be.
When the store is of sufficient size you can also look at other options like a master/slave database setup, database sharding, splitting admin to another server and so on - Magento is pretty scalable. Remember that the hosting is the foundation for the entire store - if it's not right then no amount of BigPipe, caching or any other implementation will give you a store which performs well.
Hopefully some points for consideration. Let me know if you want any more details on anything (although someone else would be more qualified than me to answer hosting related queries).