Code Painters The Art of Coding

13 Feb 2011

Reverse engineering Android applications – part I

Have you ever wondered how difficult or easy it is to reverse engineer an Android application? The task is surprisingly easy with all the tools available these days. Let's try to reverse engineer a sample open-source application available for download from Android Market: Barcode Scanner. For all the experiments I've used my HTC Desire.

Obtaining the APK file

The first step is to obtain our victim's APK file. Let's try to do it manually first. All that's needed is the name of the application package (com.google.zxing.client.android in our case), which serves as a unique application ID (I've simply extracted it from the Android Market URL).

An Android phone stores the APKs of installed applications under /data/app/, so the following command can be used to grab the file (note the -1 in the name; it can be -2 as well, so try both):

czajnik@czajnik:~/reveng$ adb pull /data/app/com.google.zxing.client.android-1.apk
1061 KB/s (508210 bytes in 0.467s)

Another way is to use ASTRO File Manager's application backup option. Open the file manager, go to the application's menu, select Tools, then Application Manager/Backup, select the applications you want to retrieve and press Backup. That's it: now you can grab the APK files from your SD card's /backup folder. It can't get any easier.

Contents

An APK file is simply a zip file, so extracting the contents is as easy as:

czajnik@czajnik:~/reveng$ unzip com.google.zxing.client.android-1.apk
Archive:  com.google.zxing.client.android-1.apk
  inflating: META-INF/MANIFEST.MF    
  inflating: META-INF/ZXING.SF       
  inflating: META-INF/ZXING.RSA      
  inflating: assets/html/about1d.html  
  inflating: assets/html/about2d.html  
  inflating: assets/html/index.html  
  inflating: assets/html/scanning.html  
  inflating: assets/html/sharing.html  
  inflating: assets/html/style.css   
  inflating: assets/html/whatsnew.html  
 extracting: assets/images/big-1d.png  
 extracting: assets/images/big-qr.png  
 extracting: assets/images/contact-results-screen.jpg  
 extracting: assets/images/demo-no.png  
 extracting: assets/images/demo-yes.png  
 extracting: assets/images/scan-example.png  
 extracting: assets/images/scan-from-phone.png  
 extracting: assets/images/search-book-contents.jpg  
 extracting: res/drawable/launcher_icon.png  
 extracting: res/drawable/share_via_barcode.png  
 extracting: res/drawable/shopper_icon.png  
  inflating: res/layout/bookmark_picker_list_item.xml  
  inflating: res/layout/capture.xml  
  inflating: res/layout/encode.xml   
  inflating: res/layout/help.xml     
  inflating: res/layout/main.xml     
  inflating: res/layout/network.xml  
  inflating: res/layout/search_book_contents.xml  
  inflating: res/layout/search_book_contents_header.xml  
  inflating: res/layout/search_book_contents_list_item.xml  
  inflating: res/layout/share.xml    
 extracting: res/raw/beep.ogg        
  inflating: res/xml/preferences.xml  
  inflating: AndroidManifest.xml     
 extracting: resources.arsc          
 extracting: res/drawable-hdpi/icon.png  
 extracting: res/drawable-hdpi/launcher_icon.png  
 extracting: res/drawable-hdpi/shopper_icon.png  
  inflating: res/layout-ldpi/capture.xml  
  inflating: res/layout-land/encode.xml  
  inflating: res/layout-land/share.xml  
  inflating: classes.dex
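Since an APK is a plain zip archive, the same listing can also be produced programmatically. Below is a minimal sketch using only the standard java.util.zip classes; the class name ApkLister and the default file name are just assumptions for illustration, not part of any Android tool.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration only.
class ApkLister {

    // Reads a zip stream and returns the entry names in archive order.
    static List<String> listEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // The default path is an assumption; pass your own APK as the first argument.
        String apk = args.length > 0 ? args[0] : "com.google.zxing.client.android-1.apk";
        try (InputStream in = Files.newInputStream(Paths.get(apk))) {
            for (String name : listEntries(in)) {
                System.out.println(name);
            }
        }
    }
}
```

Run against the APK pulled earlier, this should print the same entry names that unzip reported.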

Let's take a closer look at the contents of our APK. The important parts are:

  • assets/ - this directory contains application assets, stored without any modifications
  • res/ - directory with application resources (layouts, drawables, etc.)
  • AndroidManifest.xml - the application's manifest, well known to every Android developer
  • resources.arsc - compiled resources file (see below)
  • classes.dex - the most important file for us, it contains the application's code

Note that XML files (the manifest, layouts, other XML resources) are stored in a binary format. The files can be decoded using AXMLPrinter2.jar:

czajnik@czajnik:~/reveng$ for i in $(find . -type f -name '*.xml') ; do 
    java -jar ~/AXMLPrinter2.jar "$i" > "$i.txt" && mv "$i.txt" "$i" ; done
czajnik@czajnik:~/reveng$ more res/layout/main.xml 
<?xml version="1.0" encoding="utf-8"?>
<LinearLayout
	xmlns:android="http://schemas.android.com/apk/res/android"
	android:orientation="1"
	android:layout_width="-1"
	android:layout_height="-1"
	>
	<TextView
		android:layout_width="-1"
		android:layout_height="-2"
		android:text="Hello World, CaptureActivity"
		>
	</TextView>
</LinearLayout>
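The decoded XML shows raw integers where the original source had symbolic values. For the attributes in this sample the mapping is well known: a layout dimension of -1 is match_parent (fill_parent in older code), -2 is wrap_content, and a LinearLayout orientation of 0 or 1 is horizontal or vertical. A minimal sketch of translating them back follows; the LayoutValues class is a made-up helper and covers only these two attribute kinds.

```java
// Hypothetical helper for illustration; it handles only the attributes
// seen in the sample layout above.
class LayoutValues {

    // Maps the raw layout_width / layout_height value to its symbolic name.
    static String dimension(String raw) {
        switch (raw) {
            case "-1": return "match_parent"; // spelled fill_parent in older sources
            case "-2": return "wrap_content";
            default:   return raw;            // explicit sizes are left untouched
        }
    }

    // Maps the raw LinearLayout orientation value to its symbolic name.
    static String orientation(String raw) {
        switch (raw) {
            case "0": return "horizontal";
            case "1": return "vertical";
            default:  return raw;
        }
    }

    public static void main(String[] args) {
        System.out.println("android:orientation=" + orientation("1"));
        System.out.println("android:layout_width=" + dimension("-1"));
        System.out.println("android:layout_height=" + dimension("-2"));
    }
}
```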

The code

Now is the time to dig into the most interesting file, classes.dex, containing all the application code compiled and converted into the Dalvik Executable format. The simplest thing to do is to dump the file using dexdump (part of the Android SDK):

czajnik@czajnik:~/reveng$ dexdump -d classes.dex > classes.dasm

The output is a Dalvik assembly listing, good enough for simpler tasks. Analyzing it obviously requires some knowledge of the Dalvik opcodes. If working at the assembly level is enough, the smali and dedexer tools are also worth trying.
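Before feeding the file to any of these tools, it can be handy to check that it really is a DEX file. The sketch below relies on the standard DEX header layout: an 8-byte magic of the form "dex\n", three version digits and a zero byte, followed by a checksum and a SHA-1 signature, with the declared file size stored as a little-endian 32-bit value at offset 32. The DexHeader class itself is a made-up helper for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration only.
class DexHeader {

    // Returns the format version digits (e.g. "035"), or null if the
    // data does not start with the DEX magic.
    static String version(byte[] data) {
        if (data.length < 8) {
            return null;
        }
        boolean magicOk = data[0] == 'd' && data[1] == 'e' && data[2] == 'x'
                && data[3] == '\n' && data[7] == 0;
        return magicOk ? new String(data, 4, 3, StandardCharsets.US_ASCII) : null;
    }

    // Reads the file_size header field (unsigned 32-bit, little-endian, offset 32).
    // The caller must pass at least 36 bytes.
    static long declaredSize(byte[] data) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(32) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) throws IOException {
        // The default path is an assumption; pass your own file as the first argument.
        byte[] data = Files.readAllBytes(Paths.get(args.length > 0 ? args[0] : "classes.dex"));
        String version = version(data);
        if (version == null || data.length < 36) {
            System.out.println("not a DEX file");
        } else {
            System.out.println("DEX version " + version + ", declared size "
                    + declaredSize(data) + ", actual size " + data.length);
        }
    }
}
```

For an intact file the declared and actual sizes should match.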

If assembly-level analysis is too hard, there's a great tool called dex2jar, which converts classes.dex back into a jar file with regular Java classes inside. The jar file can then be decompiled using any good Java decompiler (JD being my favorite). Usage is pretty straightforward:

czajnik@czajnik:~/reveng$ ../dex2jar-0.0.7.8-SNAPSHOT/dex2jar.sh classes.dex 
version:0.0.7.8-SNAPSHOT
4 [main] INFO pxb.android.dex2jar.v3.Main - dex2jar classes.dex -> classes.dex.dex2jar.jar
Done.

Now classes.dex.dex2jar.jar can be decompiled using a regular Java decompiler.

Lost in translation

Note that the decompiler output isn't perfect: some methods fail to decompile, or the source code requires manual corrections in order to be compilable again. I believe it's mostly caused by dex2jar imperfections, as JD usually does a decent job.

Another source of difficulty is code obfuscation, which is very common in Android applications. Needless to say, obfuscation exists precisely to make reverse engineering harder (it usually also introduces some optimizations, yet protection from reverse engineering is probably more important). The best way to deal with obfuscated Java code (once it is compilable) is to use a good IDE with refactoring capabilities to successively rename classes and methods until the code is easier to follow. Some information, like the original class and variable names, is gone forever, of course.

Last but not least, symbolic constant names are also lost after decompiling, like in this sample:

        Intent localIntent1 = new Intent("android.intent.action.SEND", localUri2);
        Intent localIntent2 = localIntent1.addFlags(524288);
        String str = this.activity.getResources().getString(2131230758);

In the case of Android code, losing the names of resource IDs is especially painful, because it takes extra work to match the code with the application resources again. This task will be described in the second part of my mini tutorial.
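As a small head start, the raw numbers become more readable in hexadecimal. An Android resource ID packs a package ID into the top byte, a type ID into the next byte and an entry index into the low 16 bits. The sketch below splits the ID from the sample accordingly; ResourceIds is a made-up helper class. What a given type ID means (string, drawable, layout and so on) differs between applications, so it still has to be looked up in resources.arsc or in the decompiled R class.

```java
// Hypothetical helper for illustration only.
class ResourceIds {

    // Top byte: package ID (0x7f for the application's own resources).
    static int packageId(int id) {
        return (id >>> 24) & 0xFF;
    }

    // Second byte: resource type ID; its meaning is application specific.
    static int typeId(int id) {
        return (id >>> 16) & 0xFF;
    }

    // Low 16 bits: index of the entry within its type.
    static int entryIndex(int id) {
        return id & 0xFFFF;
    }

    // Formats a value as 8 hex digits, the way such constants usually appear in source.
    static String hex(int value) {
        return String.format("0x%08x", value);
    }

    public static void main(String[] args) {
        int id = 2131230758; // the getString() argument from the sample above
        System.out.println(hex(id) + " -> package " + packageId(id)
                + ", type " + typeId(id) + ", entry " + entryIndex(id));
        System.out.println(hex(524288)); // the addFlags() argument from the sample above
    }
}
```

The same trick helps with flags: 524288 is 0x00080000, which matches the value of Intent.FLAG_ACTIVITY_CLEAR_WHEN_TASK_RESET in the Android SDK.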

Comments (6) Trackbacks (0)
  1. Hello, thanks for your work! It helped me a lot, but can you please explain how to modify the Java source code?

    • I have to apologize that there’s no second part of the tutorial yet; I’ve been way too busy recently to post on my blog. The answer to your question depends on what exactly you want to modify, and how complex the code itself is. One way is to fully recover the original source code and patch it like any other Java application. Most of the time that’s too time-consuming, however. For quick hacks the best way is to patch the code at the assembly level, but this requires an understanding of Dalvik and some hacking experience in general.

  2. Hi,

    Can you give me some pointers how to match the code with the application resources again, after decompiling the code.

    Tnx

    • Never mind, found it. They are in generated R.class …
      eg …

      public static final int albums_delete = 2131165200;

      Anyway tnx for this tutorial.

      • I’m happy it was useful for you – actually it is the most popular page of my tiny blog 🙂 I see people trying to get the 2nd part by changing the URL manually, but all they get is 404, unfortunately – I’m way too busy developing software to continue the tutorial.

  3. I am a developer of an Android barcode generator, and your blog helped me a lot. As we know, Android apps are written in Java. In Java, no matter what you do, it is impossible to protect compiled code from decompilation or reverse engineering, as "How to lock compiled Java classes to prevent decompilation?" suggests.

