I think I've come across a few cases where, after I reformatted a segment of code, it performed the way I wanted, unlike another, poorly formatted segment. Does code formatting affect performance, or is that a myth?
Does code formatting affect performance?
Related Solutions
First off, your team needs to pick a formatting convention and stick with it. Come to an agreement and have everyone follow it, so you don't have people fighting over what things should look like. This should not be something you decide on your own.
As for your actual question: formatting code is not a bad thing. What is bad is making major formatting changes in the same commit as code changes. Once your team reaches consensus on how things should be formatted, make one pass through the code, format everything, and check that in by itself. The commit message will make it clear that the changes are whitespace-only and not functional. Then, when you need to make functional changes, they live in a separate commit where they can be clearly seen.
API/CSV
Ask those websites if they provide an API, or, if you don't need up-to-date information or the information you need doesn't change frequently, whether they can sell you the data itself, or give it to you for free (for example as a CSV file). Some small websites may have fancier ways to expose their data, like a CSV file for the older information and an RSS feed for recent changes.
Those websites would probably be happy to help you, since providing an API would reduce the CPU and bandwidth your scraping would otherwise cost them.
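If a site does hand you a CSV export, consuming it is a few lines in any language. Here is a minimal Python sketch; the column names (`id`, `name`, `price`) are made up for illustration, and a real export would come from a file or HTTP response rather than an inline string:

```python
import csv
import io

# Hypothetical CSV export a site might offer instead of scraped HTML pages.
raw = """id,name,price
1,Widget,9.99
2,Gadget,24.50
"""

# DictReader maps each row to the header names, so code stays readable.
rows = list(csv.DictReader(io.StringIO(raw)))
for row in rows:
    print(row["name"], float(row["price"]))
```

Compared to scraping, this is trivially fast and cannot break when the site redesigns its HTML.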
Profile
Screen scraping is really ugly when it comes to performance and scaling. You may be limited by:
- your machine's performance, since parsing (sometimes of invalid HTML) takes time,
- your network speed,
- their network policies, i.e. how fast you can access the pages of their website given the restrictions they set, like DoS protection and per-second request limits for screen scrapers and search-engine crawlers,
- their machine performance: if they spend 500 ms generating every page, there is nothing you can do to reduce that delay.
If, despite your requests to them, those websites cannot provide any convenient way to access their data, but they give you a written consent to screen scrape their website, then profile your code to determine the bottleneck. It may be the internet speed. It may be your database queries. It may be anything.
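To find that bottleneck, a profiler such as Python's built-in cProfile will show where the time actually goes. A minimal sketch; `parse_pages` is a made-up stand-in for whatever your scraper does per page:

```python
import cProfile
import io
import pstats
import re

def parse_pages(pages):
    # Stand-in for your scraper's per-page work.
    return [re.findall(r'<a href="([^"]+)"', p) for p in pages]

pages = ['<a href="/item/%d">x</a>' % i for i in range(1000)]

profiler = cProfile.Profile()
profiler.enable()
parse_pages(pages)
profiler.disable()

# Print the five functions with the largest cumulative time.
stats = pstats.Stats(profiler, stream=io.StringIO())
stats.sort_stats("cumulative").print_stats(5)
```

Reading that output tells you whether to attack parsing, I/O, or the database before touching any code.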
For example, you may discover that you spend too much time finding the relevant information in the received HTML with regular expressions. In that case, stop doing it wrong: use a parser instead of regular expressions, then see how that improves performance.
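As a sketch of the parser approach, Python's standard-library `html.parser` can extract links without any regular expressions (the HTML snippet here is invented for illustration):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href attributes instead of regexing raw HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

extractor = LinkExtractor()
extractor.feed('<p>See <a href="/one">one</a> and <A HREF="/two">two</A>.</p>')
print(extractor.links)
```

Unlike a regex, the parser copes with attribute order, case, and whitespace variations without any changes to your code.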
You may also find that the bottleneck is the time the remote server spends generating every page. In this case, there is nothing you can do: you may have the fastest server, the fastest connection and the most optimized code, and the performance will be the same.
Do things in parallel
Remember to use parallel computing wisely and to always profile what you're doing, instead of prematurely optimizing in the hope that you're smarter than the profiler.
Especially when it comes to using the network, you may be very surprised. For example, you may believe that making more requests in parallel will be faster, but as Steve Gibson explains in episode 345 of Security Now, this is not always the case.
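If you do parallelize, measure rather than assume. A minimal Python sketch with `concurrent.futures`; the `fetch` function simulates a request with a sleep so the example runs offline, and the URLs are made up:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Stand-in for a real HTTP request (e.g. urllib.request.urlopen);
    # the sleep simulates network latency so the sketch runs offline.
    time.sleep(0.05)
    return f"body of {url}"

urls = [f"https://example.com/page/{i}" for i in range(8)]

# Try different worker counts and time them: more workers is not
# automatically faster once the remote server or your link saturates.
with ThreadPoolExecutor(max_workers=4) as pool:
    bodies = list(pool.map(fetch, urls))

print(len(bodies))
```

`pool.map` preserves input order, so the results line up with `urls` even though the requests finish out of order.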
Legal aspects
Also note that screen scraping is explicitly forbidden by the terms of use of many websites (IMDB, for example). And if the terms of use say nothing on the subject, that doesn't mean you may screen scrape those websites.
The fact that the information is publicly available on the internet doesn't give you the right to copy and reuse it this way, either.
Why, you may ask? For two reasons:
Most websites rely on advertising and marketing. When people use one of those websites directly, they consume some of the site's CPU and network bandwidth, but in exchange they may click on an ad or buy something the site sells. When you screen scrape, your bot consumes their CPU and bandwidth but will never click an ad or buy anything.
Displaying the information you scraped on your own website can have even worse effects. Example: in France, there are two major websites selling hardware. The first is easy and fast to use, has a nice visual design and better SEO, and in general is very well done. The second is crap, but its prices are lower. If you scrape both and show your users the raw results (prices with links), they will obviously click the lower price every time, which means the well-designed website has fewer chances to sell its products.
People made an effort to collect, process and display some data. Sometimes they paid to get it. Why would they enjoy watching you pull that data, conveniently and for free?
Best Answer
In a compiled language, superfluous whitespace, comments and other elements without syntactic meaning do not survive the tokenization step of compilation, so they make no difference at all to the resulting binary (at least for the executable parts; some compilers may embed the original source code in the generated binary for debugging purposes, but those parts of the binary aren't executed).
In purely interpreted languages, the interpreter has to parse all the whitespace, so formatting can actually slow the interpreter down a tiny bit. But more advanced* interpreters usually compile the code in memory to an optimized representation ("bytecode") before executing it. Whitespace is usually stripped in that step too, so it shouldn't matter for runtime either.
*in this case "more advanced" means "anything you would use in a 2014 production environment".
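This is easy to verify in CPython: compile two differently formatted versions of the same (arbitrary) function and compare their bytecode. A minimal sketch:

```python
# Same function, once with noisy spacing and a comment, once tidy.
ugly = 'def f(x):\n    return (  x   +   1  )  # noisy spacing\n'
tidy = 'def f(x):\n    return x + 1\n'

ns_ugly, ns_tidy = {}, {}
exec(compile(ugly, "<ugly>", "exec"), ns_ugly)
exec(compile(tidy, "<tidy>", "exec"), ns_tidy)

# Whitespace and comments are gone after tokenization: the compiled
# bytecode of both functions is byte-for-byte identical.
print(ns_ugly["f"].__code__.co_code == ns_tidy["f"].__code__.co_code)
```

Only out-of-band metadata (such as the line-number table used for tracebacks) can differ; the instructions that actually execute are the same.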