The circuit is okay (not ideal for quality, but it will work), but there's one small issue if you want to feed the output to your Arduino. As shown, the output is biased at 0V, so it will swing below ground, and your Arduino's analog input will only accept positive voltages.
The output with the above circuit will be something like this:

If your supply is 5V, you need to bias the output to 2.5V to get the maximum swing from your input signal.
Adding a voltage divider after the capacitor will do this:
The voltage divider is made from R2 and R4, and it biases (read "holds") the TO_ADC node at 2.5V so the ADC pin sees the full swing of the signal. Without it, the ADC would only see the positive half of the signal, because there is no negative power supply.
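To make the effect of the bias concrete, here is a rough Python sketch of what the ADC would report for a sine wave with and without the 2.5V offset. It assumes a 10-bit converter referenced to the 5V supply and models the pin as simply clamping to 0V..5V, which is a simplification of what a real input does. The names `adc_code` and `sample_sine` and the 1V test amplitude are made up for illustration.

```python
import math

V_SUPPLY = 5.0   # supply voltage from the text, also assumed to be the ADC reference
ADC_MAX = 1023   # assumption: 10-bit ADC

def adc_code(volts):
    """Clamp the voltage to the 0V..V_SUPPLY range and quantise it."""
    clamped = min(max(volts, 0.0), V_SUPPLY)
    return round(clamped / V_SUPPLY * ADC_MAX)

def sample_sine(amplitude, bias, points=8):
    """Return ADC codes for one cycle of a sine riding on the given bias."""
    return [
        adc_code(bias + amplitude * math.sin(2 * math.pi * n / points))
        for n in range(points)
    ]

unbiased = sample_sine(1.0, 0.0)  # centred on 0V: negative half is lost
biased = sample_sine(1.0, 2.5)    # centred on 2.5V: whole cycle is visible

print("unbiased:", unbiased)
print("biased:  ", biased)
```

In the unbiased list the second half of the cycle collapses to zero, while the biased list stays well inside the converter's range.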
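The formula for a voltage divider is:

Vout = Vin * (Rbottom / (Rtop + Rbottom))

where Rtop is the resistor connected to the supply and Rbottom is the resistor connected to ground.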
So for the voltage divider formed from R2 and R4, with the 5V supply we get:
5V * (R4 / (R2 + R4)), which equals:
5V * (100kΩ / (100kΩ + 100kΩ)) = 5V * 0.5 = 2.5V at the middle (Vout in the above example diagram, which is the TO_ADC node in our circuit).
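If you want to try other resistor values, the same arithmetic can be sketched in a few lines of Python. The function name `divider_out` and its parameter names are just illustrative; the values are the ones from the circuit above.

```python
def divider_out(v_in, r_top, r_bottom):
    """Unloaded output voltage of a two-resistor divider."""
    return v_in * r_bottom / (r_top + r_bottom)

# R2 (top) and R4 (bottom) are both 100k, supply is 5V
bias = divider_out(5.0, 100e3, 100e3)
print(f"TO_ADC bias: {bias:.2f} V")  # prints "TO_ADC bias: 2.50 V"
```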
Then the output will be more like this (depending on your ADC's input impedance, it may not work well, though; this is the part simulated by Radc and Cadc, which I'll check shortly):

There are other options too; I will try to post an improved circuit shortly.
Okay, here's an option which controls the transistor gain properly (using the emitter resistor with an AC bypass) and outputs a lower-impedance signal that swings around ~2.5V. V+ is 5V. The capacitors do not have to be as large as 10uF; you can still use 100nF for your input capacitor if you wish:

Radc and Cadc
Radc and Cadc are not components you need to add (so you can ignore them if/when you build the circuit); they represent the characteristics of your microcontroller's analogue input pin. Some microcontroller ADCs have quite low input impedances, which can load your signal and attenuate it (so you end up with a lower reading than you expected).
So when we simulate, it's good to add this simulated loading to make sure the signal will not be affected too badly.
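As a rough illustration of that loading for the simple R2/R4 bias divider, the sketch below treats the ADC input resistance as sitting in parallel with R4 and recomputes the DC bias. It ignores Cadc entirely (DC only), and the Radc values are invented purely to show the trend, not taken from any datasheet. The function names are hypothetical.

```python
def parallel(r_a, r_b):
    """Equivalent resistance of two resistors in parallel."""
    return r_a * r_b / (r_a + r_b)

def loaded_bias(v_supply, r_top, r_bottom, r_adc):
    """DC bias of a divider whose bottom resistor is shunted by the ADC input."""
    r_eff = parallel(r_bottom, r_adc)
    return v_supply * r_eff / (r_top + r_eff)

# R2 = R4 = 100k from the circuit; the Radc values are assumptions
for r_adc in (100e3, 1e6, 100e6):
    v = loaded_bias(5.0, 100e3, 100e3, r_adc)
    print(f"Radc = {r_adc:>11.0f} ohm -> bias = {v:.3f} V")
```

The lower the assumed input resistance, the further the bias is pulled below 2.5V, which is the effect the simulated load is there to reveal.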
Simulation (note simulated ADC loading also):
We can see this handles a 20mV input pretty well. If we input 20mV to the original circuit (even without any loading), we get some distortion due to the uneven gain (note the flattened edges on the negative swing):

There are still better options and variations (the above may need its values tweaking a little). A simple opamp circuit would be one, but whether you want to bother depends on how concerned you are about the sound quality. If you're happy with a bit of distortion, then the first circuit with a suitable method of biasing will be fine.
Your problem may be that the LM358 is not designed to drive a speaker as a load. You will need to add an output stage with a low output impedance of ~4 to 32 ohms or less, depending on your speaker's impedance. Here is an example of a typical configuration using BJTs as the output stage:

Note how the feedback loop has been modified to incorporate the output stage. A trimpot is used in this schematic to adjust the volume.
My second hypothesis, in the event that you have a high-impedance set of headphones (around 600 ohms), is that your gain is too high and you are railing out. If that's the case, then you can either reduce your gain or replace your gain resistors with a trimpot and try adjusting the volume until it sounds better. If the "rett rett" sound you described is a loud sound, then I would recommend skipping my first suggestion and trying this one first.
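A quick way to sanity-check the "railing out" idea is to compare the amplified peak against the output limits. The Python sketch below does that; every number in it (input peak, bias point, output limits, gains) is a made-up example rather than a value from your circuit, and the function names are hypothetical.

```python
def clips(v_in_peak, gain, v_bias, v_low, v_high):
    """True if the amplified signal would exceed either output limit."""
    v_out_peak = gain * v_in_peak
    return (v_bias + v_out_peak > v_high) or (v_bias - v_out_peak < v_low)

def max_clean_gain(v_in_peak, v_bias, v_low, v_high):
    """Largest gain that keeps the output inside both limits."""
    headroom = min(v_high - v_bias, v_bias - v_low)
    return headroom / v_in_peak

# Assumed example: 0.1V peak input, output biased at 2.5V,
# output able to swing between 0V and 3.5V
print(clips(0.1, 20, 2.5, 0.0, 3.5))          # gain of 20 -> True (clips)
print(clips(0.1, 5, 2.5, 0.0, 3.5))           # gain of 5 -> False
print(max_clean_gain(0.1, 2.5, 0.0, 3.5))
```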
Best Answer
The critical part is the opamp. The LM358 is not an RRIO (Rail-to-Rail I/O) opamp, which here means that the output voltage will be a few volts shy of the supply voltage, so don't expect it to go higher than about 3V. On the lower side there's no problem, the datasheet specifies an output low level of 5mV typical.
So if you keep the amplification to a decent level (controlled by R5), this should work. You can also set the virtual ground to 1.5V (halfway through the output range) instead of 2.5V, by choosing R3=22k. This will give you an output voltage swing of about 3Vpp, instead of 1Vpp.
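The swing figures follow from how far the virtual ground sits from the nearer output limit. A small Python sketch of that reasoning, using the roughly 3V upper limit and 5mV lower limit mentioned above (the function name is just for illustration):

```python
def max_swing_pp(v_ground, v_low=0.005, v_high=3.0):
    """Largest symmetric peak-to-peak swing around the virtual ground."""
    headroom = min(v_high - v_ground, v_ground - v_low)
    return 2 * headroom

for v_ground in (2.5, 1.5):
    print(f"virtual ground {v_ground} V -> {max_swing_pp(v_ground):.2f} Vpp")
```

With the virtual ground at 2.5V only 0.5V of headroom is left below the upper limit, whereas 1.5V leaves nearly equal headroom in both directions.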
As an alternative to the LM358 you might use a rail-to-rail output opamp, like the LMV321. You can then leave R3=10k. (Rail-to-rail input isn't required since the input stays around the virtual ground.)