Android – How to affect Delphi XEx code generation for Android/ARM targets

Tags: android, android-ndk, arm, delphi, llvm

Update 2017-05-17. I no longer work for the company where this question originated, and I no longer have access to Delphi XEx. While I was there, the problem was solved by migrating to a mixed FPC+GCC (Pascal+C) build, with NEON intrinsics for some routines where it made a difference. (FPC+GCC is also highly recommended because it lets you use standard tooling, particularly Valgrind.) If someone can demonstrate, with credible examples, how they are actually able to produce optimized ARM code from Delphi XEx, I'm happy to accept that answer.


Embarcadero's Delphi compilers use an LLVM backend to produce native ARM code for Android devices. I have large amounts of Pascal code that I need to compile into Android applications and I would like to know how to make Delphi generate more efficient code. Right now, I'm not even talking about advanced features like automatic SIMD optimizations, just about producing reasonable code. Surely there must be a way to pass parameters to the LLVM side, or somehow affect the result? Usually, any compiler will have many options to affect code compilation and optimization, but Delphi's ARM targets seem to be just "optimization on/off" and that's it.

LLVM is supposed to be capable of producing reasonably tight and sensible code, but it seems that Delphi is using its facilities in a weird way. Delphi leans on the stack very heavily, and it generally uses only the processor's registers r0-r3, treating them as temporary variables. Perhaps craziest of all, it seems to load normal 32 bit integers as four 1-byte load operations. How can I make Delphi produce better ARM code, without the byte-by-byte hassle it creates on Android?

At first I thought the byte-by-byte loading was for swapping byte order from big-endian, but that is not the case: it really is just loading a 32 bit number with four single-byte loads. It might be doing this to read the full 32 bits without performing an unaligned word-sized memory load (whether it should avoid that at all is another question, and would hint at the whole thing being a compiler bug).
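For what it's worth, a pointer parameter genuinely can refer to an address that is not 4-byte aligned, which is presumably the case such defensive code would be guarding against. A contrived sketch (the procedure and variable names are made up for illustration):

procedure UnalignedPointerExample;
var
  Buffer : array[0..7] of Byte;
  P : PInteger;
  Value : Integer;
begin
  P := PInteger(@Buffer[1]);  // in general not a 4-byte aligned address
  Value := P^;                // the compiler cannot assume this access is aligned
end;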

Let's look at this simple function:

function ReadInteger(APInteger : PInteger) : Integer;
begin
  Result := APInteger^;
end;

Even with optimizations switched on, Delphi XE7 with update pack 1 (and XE6 as well) produces the following ARM assembly code for that function:

Disassembly of section .text._ZN16Uarmcodetestform11ReadIntegerEPi:

00000000 <_ZN16Uarmcodetestform11ReadIntegerEPi>:
   0:   b580        push    {r7, lr}
   2:   466f        mov r7, sp
   4:   b083        sub sp, #12
   6:   9002        str r0, [sp, #8]
   8:   78c1        ldrb    r1, [r0, #3]
   a:   7882        ldrb    r2, [r0, #2]
   c:   ea42 2101   orr.w   r1, r2, r1, lsl #8
  10:   7842        ldrb    r2, [r0, #1]
  12:   7803        ldrb    r3, [r0, #0]
  14:   ea43 2202   orr.w   r2, r3, r2, lsl #8
  18:   ea42 4101   orr.w   r1, r2, r1, lsl #16
  1c:   9101        str r1, [sp, #4]
  1e:   9000        str r0, [sp, #0]
  20:   4608        mov r0, r1
  22:   b003        add sp, #12
  24:   bd80        pop {r7, pc}

Just count the number of instructions and memory accesses Delphi needs for that, not to mention constructing a 32 bit integer from four single-byte loads… If I change the function a little and use a var parameter instead of a pointer, the result is slightly less convoluted.
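For completeness, the var parameter version is along these lines (reconstructed to match the mangled name ReadIntegerVar in the disassembly below):

function ReadIntegerVar(var AInteger : Integer) : Integer;
begin
  Result := AInteger;
end;

With the same settings, Delphi produces: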

Disassembly of section .text._ZN16Uarmcodetestform14ReadIntegerVarERi:

00000000 <_ZN16Uarmcodetestform14ReadIntegerVarERi>:
   0:   b580        push    {r7, lr}
   2:   466f        mov r7, sp
   4:   b083        sub sp, #12
   6:   9002        str r0, [sp, #8]
   8:   6801        ldr r1, [r0, #0]
   a:   9101        str r1, [sp, #4]
   c:   9000        str r0, [sp, #0]
   e:   4608        mov r0, r1
  10:   b003        add sp, #12
  12:   bd80        pop {r7, pc}

I won't include the disassembly here, but for iOS, Delphi produces identical code for the pointer and var parameter versions, and they are almost but not exactly the same as the Android var parameter version.
Edit: to clarify, the byte-by-byte loading happens only on Android, and only on Android do the pointer and var parameter versions differ from each other. On iOS both versions generate exactly the same code.

For comparison, here's what FPC 2.7.1 (SVN trunk version from March 2014) thinks of the function with optimization level -O2. The pointer and var parameter versions are exactly the same.

Disassembly of section .text.n_p$armcodetest_$$_readinteger$pinteger$$longint:

00000000 <P$ARMCODETEST_$$_READINTEGER$PINTEGER$$LONGINT>:

   0:   6800        ldr r0, [r0, #0]
   2:   46f7        mov pc, lr

I also tested an equivalent C function with the C compiler that comes with the Android NDK.

int ReadInteger(int *APInteger)
{
    return *APInteger;
}

And this compiles into essentially the same thing FPC made:

Disassembly of section .text._Z11ReadIntegerPi:

00000000 <_Z11ReadIntegerPi>:
   0:   6800        ldr r0, [r0, #0]
   2:   4770        bx  lr

Best Answer

We are investigating the issue. In short, it depends on the potential mis-alignment (to 32 boundary) of the Integer referenced by a pointer. Need a little more time to have all of the answers... and a plan to address this.

Marco Cantù, moderator on Delphi Developers

Also see Why are the Delphi zlib and zip libraries so slow under 64 bit?, since the Win64 libraries ship in builds made without optimizations.


In the QP report RSP-9922, Bad ARM code produced by the compiler, $O directive ignored?, Marco added the following explanation:

There are multiple issues here:

  • As indicated, optimization settings apply only to entire unit files and not to individual functions. Simply put, turning optimization on and off in the same file will have no effect.
  • Furthermore, simply having "Debug information" enabled turns off optimization. Thus, when one is debugging, explicitly turning on optimizations will have no effect. Consequently, the CPU view in the IDE will not be able to display a disassembled view of optimized code.
  • Third, loading non-aligned 64bit data is not safe and does result in errors, hence the four separate one-byte operations that are needed in the given scenarios.
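To illustrate the first point, here is a minimal sketch (unit and routine names invented for the example) of the kind of per-routine toggling that, according to the report, has no effect on the ARM targets; only the setting that applies to the unit as a whole matters:

unit OptExample;

interface

function ReadA(APValue : PInteger) : Integer;
function ReadB(APValue : PInteger) : Integer;

implementation

{$O-}
function ReadA(APValue : PInteger) : Integer;
begin
  Result := APValue^;
end;

{$O+}
function ReadB(APValue : PInteger) : Integer;
begin
  Result := APValue^;
end;

// Per the explanation above, ReadA and ReadB do not end up with different
// optimization settings on the ARM targets; the unit-wide setting wins.

end.

The practical consequence is that optimization has to be controlled per unit (or project-wide), and the result inspected outside the IDE, for example by disassembling the output as done above.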