Thursday, November 02, 2006

RE: The Compile Process

Creating a program involves writing source code and 'compiling' it into an executable program. That is a programmer's job in a nutshell, but a lot more goes into the process than that statement suggests. To be good at software reverse engineering, you must understand how programs are built in order to disassemble them properly.
For most programs, programmers use an integrated development environment (IDE) to write code in a language of their choice. My favorite IDEs include Dev-C++ (for C++ development; it uses the gcc compiler) and Eclipse/JBuilder/Sun Studio/NetBeans (for Java development). IDEs bundle or invoke the tools that convert high-level source code into machine-native instructions or, in Java's case, bytecode.
To build a program, two things happen: compiling and linking. Compiling checks the source for syntax errors and translates it into object files. Linking combines the object files into a single executable file. In more detail, the process is much more involved: source code -> preprocessing -> parsing -> translation -> assembly -> linking -> loading. The process is mostly the same whether you compile on Windows or Unix, though the tools differ: gcc vs. cl.exe (compiler, parser, translator), as vs. masm.exe (assemblers), and ld/collect2 vs. link.exe (linkers). With the Microsoft tools, linking against a .dll may additionally require a .lib or .def file.
Programs written in bytecode-based languages such as Java or Visual Basic are easier to disassemble because the source is first compiled into bytecode (in Java's case) rather than directly into native machine code. For Java, the runtime environment (JRE) takes care of loading classes, verifying their bytecode, and JIT-compiling it before executing the resulting native code.
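As a first taste of what compiled Java looks like from the outside, here is a minimal sketch that reads the fixed header at the start of a .class file: the 4-byte magic number 0xCAFEBABE, followed by the minor version, major version, and constant pool count, each stored as an unsigned 2-byte big-endian value. The class name ClassFileHeader and its parse method are my own illustrative names, not part of any JDK tool, and the sketch stops at the header rather than decoding the constant pool or any bytecode.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of the JDK.
public class ClassFileHeader {
    // Every valid .class file starts with this 4-byte magic number.
    private static final int MAGIC = 0xCAFEBABE;

    public final int minorVersion;
    public final int majorVersion;
    public final int constantPoolCount;

    private ClassFileHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    // Parses only the first 10 bytes of a class file.
    public static ClassFileHeader parse(byte[] classBytes) {
        if (classBytes.length < 10) {
            throw new IllegalArgumentException("too short to be a class file");
        }
        // ByteBuffer is big-endian by default, which matches the class file format.
        ByteBuffer buf = ByteBuffer.wrap(classBytes);
        if (buf.getInt() != MAGIC) {
            throw new IllegalArgumentException("missing 0xCAFEBABE magic number");
        }
        // Mask each short to treat it as an unsigned 16-bit value.
        int minor = buf.getShort() & 0xFFFF;
        int major = buf.getShort() & 0xFFFF;
        int poolCount = buf.getShort() & 0xFFFF;
        return new ClassFileHeader(minor, major, poolCount);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClassFileHeader <file.class>");
            return;
        }
        ClassFileHeader header = parse(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("class file version: " + header.majorVersion + "." + header.minorVersion);
        System.out.println("constant pool count: " + header.constantPoolCount);
    }
}
```

After compiling with javac, the program can be pointed at its own output, for example `java ClassFileHeader ClassFileHeader.class`. The version it reports depends on the compiler that produced the file.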
My focus will initially be on reverse-engineering Java programs since I am already a Java programmer. After that I'd like to work on larger programs, with iTunes at the top of my list.