Commit | Line | Data |
---|---|---|
7d488e4d | 1 | <?xml version="1.0" encoding="UTF-8"?> |
2 | <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" | |
3 | "file:///opt/local/share/xml/docbook/4.5/docbookx.dtd"> | |
4 | <article> | |
5 | <articleinfo> | |
6 | <title>Udis86 Manual</title> | |
7 | <author><firstname>Vivek</firstname><surname>Thampi</surname></author> | |
8 | <authorinitials>vmt</authorinitials> | |
9 | <pubdate>2009</pubdate> | |
10 | <titleabbrev>Udis86 Manual</titleabbrev> | |
11 | <revhistory> | |
12 | <revision> | |
13 | <revnumber>1.0</revnumber> | |
14 | <date>02 May 2009</date> | |
15 | </revision> | |
16 | </revhistory> | |
17 | </articleinfo> | |
18 | ||
19 | ||
20 | <section><title>Introduction</title> | |
21 | Udis86 is a disassembler engine that interprets and decodes a | |
22 | stream of binary machine code bytes as opcodes defined in the | |
23 | x86 and x86-64 class of Instruction Set Architectures. The core | |
24 | component of this project is libudis86 which provides a clean | |
25 | and simple interface to disassemble binary code, and to inspect | |
26 | the disassembly in varying degrees of detail. The library is | |
27 | designed to aid software projects that entail analysis and | |
28 | manipulation of all flavors of x86 binary code. | |
29 | </section> | |
30 | ||
31 | <section><title>Getting Started</title> | |
32 | ||
33 | <section><title>Building and Installing udis86</title> | |
34 | udis86 is developed for Unix-like environments. As with most | |
35 | software that ships a configure script, the basic steps for | |
36 | building and installing it are as follows. | |
37 | <screen> | |
38 | <prompt>$</prompt> <userinput>./configure</userinput> | |
39 | <prompt>$</prompt> <userinput>make</userinput> | |
40 | <prompt>$</prompt> <userinput>make install</userinput> | |
41 | </screen> | |
42 | Depending on your choice of install location, you may need | |
43 | root privileges to run make install. The install scripts | |
44 | copy the necessary header and library files to appropriate | |
45 | locations on the system. | |
46 | </section> | |
47 | ||
48 | <section><title>Interfacing with libudis86: A Quick Example</title> | |
49 | The following code is an example of a program that interfaces | |
50 | with libudis86 and uses the API to generate assembly language | |
51 | output for 64-bit code read from standard input. | |
52 | <example><title>libudis86 Usage Example</title> | |
53 | <programlisting>#include <stdio.h> | |
54 | #include <udis86.h> | |
55 | ||
56 | int main() | |
57 | { | |
58 | ud_t ud_obj; | |
59 | ||
60 | ud_init(&ud_obj); | |
61 | ud_set_input_file(&ud_obj, stdin); | |
62 | ud_set_mode(&ud_obj, 64); | |
63 | ud_set_syntax(&ud_obj, UD_SYN_INTEL); | |
64 | ||
65 | while (ud_disassemble(&ud_obj)) { | |
66 | printf("\t%s\n", ud_insn_asm(&ud_obj)); | |
67 | } | |
68 | ||
69 | return 0; | |
70 | } </programlisting></example> | |
71 | To compile the program (using gcc): | |
72 | <screen> <prompt>$</prompt> <userinput>gcc example.c -o example -ludis86</userinput> </screen> | |
73 | This example should give you an idea of how this library can be used. The following | |
74 | sections describe, in detail, the complete API of libudis86. | |
75 | </section> | |
76 | </section> | |
77 | ||
78 | <section><title>libudis86 Programming Interface</title> | |
79 | <section><title>ud_t: udis86 object</title> | |
80 | libudis86 is reentrant, and to maintain that property it does not use static data. | |
81 | All data related to the disassembly are stored in a single object, called the udis86 | |
82 | object <literal>ud_t</literal> (<literal>struct ud</literal>). So, to use libudis86 | |
83 | you must create an instance of this object, | |
84 | <programlisting> ud_t ud_obj; </programlisting> | |
85 | and initialize it, | |
86 | <programlisting> ud_init(&ud_obj); </programlisting> | |
87 | You can create multiple such objects and use them with the library, each one maintaining | |
88 | its own disassembly state. | |
89 | </section> | |
90 | ||
91 | <section><title>Examining Instructions</title> | |
92 | ||
93 | <para>libudis86 exposes decoded instructions in an intermediate form meant | |
94 | to be useful for programs that want to examine them. This intermediate form | |
95 | is available as values of certain fields of the <literal>ud_t</literal> | |
96 | udis86 object used to disassemble the instruction, as described below.</para> | |
97 | ||
98 | <section><title>Instruction Pointer</title> | |
99 | <para>The program counter (eip/rip) value at which the instruction was | |
100 | decoded is available in <literal>ud_obj.pc</literal>.</para> | |
101 | </section> | |
102 | ||
103 | <section><title>Instruction Prefixes</title> | |
104 | <para>Prefix bytes that affect the disassembly of the instruction | |
105 | are available in the following fields, each of which corresponds | |
106 | to a particular type or class of prefix. | |
107 | <itemizedlist id="hhllo"> | |
108 | <listitem><literal>ud_obj.pfx_rex</literal> - 64-bit mode REX prefix</listitem> | |
109 | <listitem><literal>ud_obj.pfx_seg</literal> - Segment register prefix</listitem> | |
110 | <listitem><literal>ud_obj.pfx_opr</literal> - Operand-size prefix (66h)</listitem> | |
111 | <listitem><literal>ud_obj.pfx_adr</literal> - Address-size prefix (67h)</listitem> | |
112 | <listitem><literal>ud_obj.pfx_lock</literal> - Lock prefix</listitem> | |
113 | <listitem><literal>ud_obj.pfx_rep</literal> - Rep prefix</listitem> | |
114 | <listitem><literal>ud_obj.pfx_repe</literal> - Repe prefix</listitem> | |
115 | <listitem><literal>ud_obj.pfx_repne</literal> - Repne prefix</listitem> | |
116 | </itemizedlist> | |
117 | These fields default to <literal>UD_NONE</literal> if the respective | |
118 | prefixes were not found. | |
119 | </para> | |
120 | </section> | |
121 | ||
122 | <section><title>Instruction Mnemonic</title> | |
123 | <para>The instruction mnemonic in the form of an enumerated constant | |
124 | (<literal>enum ud_mnemonic_code</literal>) is available in | |
125 | <literal>ud_obj.mnemonic</literal>. As a convention all mnemonic | |
126 | constants are composed by prefixing standard instruction mnemonics | |
127 | with <literal>UD_I</literal>. For example, | |
128 | <literal>UD_Imov</literal>, | |
129 | <literal>UD_Ixor</literal>, | |
130 | <literal>UD_Ijmp</literal>, etc. | |
131 | </para> | |
132 | </section> | |
133 | ||
134 | <section><title>Instruction Operands</title> | |
135 | <para> | |
136 | The intermediate form of the instruction operands is available as | |
137 | an array of objects of type <literal>struct ud_operand</literal>. | |
138 | Given a udis86 object <literal>ud_obj</literal>, the | |
139 | <literal>n</literal>th operand is available in | |
140 | <literal>ud_obj.operand[n]</literal>. | |
141 | </para> | |
142 | <para> | |
143 | <literal>struct ud_operand</literal> has the following fields, | |
144 | <itemizedlist> | |
145 | <listitem><literal>type</literal></listitem> | |
146 | <listitem><literal>size</literal></listitem> | |
147 | <listitem><literal>base</literal></listitem> | |
148 | <listitem><literal>index</literal></listitem> | |
149 | <listitem><literal>scale</literal></listitem> | |
150 | <listitem><literal>offset</literal></listitem> | |
151 | <listitem><literal>lval</literal></listitem> | |
152 | </itemizedlist> | |
153 | </para> | |
154 | ||
155 | <para> | |
156 | The <literal>type</literal> and <literal>size</literal> fields | |
157 | determine the type and size of the operand, respectively. The | |
158 | possible types of operands are, | |
159 | </para> | |
160 | ||
161 | <itemizedlist> | |
162 | ||
163 | <listitem><literal>UD_NONE</literal> | |
164 | <para> | |
165 | No operand. | |
166 | </para> | |
167 | </listitem> | |
168 | ||
169 | <listitem><literal>UD_OP_MEM</literal> | |
170 | <para> | |
171 | Memory operand. The intermediate form normalizes all memory | |
172 | address equations to the scale-index-base form. The address | |
173 | equation is available in | |
174 | <literal>base</literal>, | |
175 | <literal>index</literal>, and | |
176 | <literal>scale</literal>. | |
177 | If the <literal>offset</literal> field has a non-zero value | |
178 | (one of 8, 16, 32, and 64), <literal>lval</literal> will | |
179 | contain the memory offset. Note that <literal>base</literal> | |
180 | and <literal>index</literal> fields contain the base and | |
181 | index register of the address equation, in the form of an | |
182 | enumerated constant <literal>enum ud_type</literal>. | |
183 | <literal>scale</literal> contains an integer value that | |
184 | the index register must be scaled by. | |
185 | </para> | |
186 | </listitem> | |
187 | ||
188 | <listitem><literal>UD_OP_PTR</literal> | |
189 | <para>A Segment:Offset pointer operand. | |
190 | <literal>size</literal> can have two values: 32 (for 16:16 seg:off) | |
191 | and 48 (for 16:32 seg:off). The value is available in | |
192 | <literal>lval</literal> (<literal>lval.ptr.seg</literal> and <literal>lval.ptr.off</literal>). | |
193 | </para> | |
194 | </listitem> | |
195 | ||
196 | <listitem><literal>UD_OP_IMM</literal> | |
197 | <para> | |
198 | Immediate operand. Value available in <literal>lval</literal>. | |
199 | </para> | |
200 | </listitem> | |
201 | ||
202 | <listitem><literal>UD_OP_JIMM</literal> | |
203 | <para> | |
204 | Immediate operand to branch instruction (relative offsets). | |
205 | Value available in <literal>lval</literal>. | |
206 | </para> | |
207 | </listitem> | |
208 | ||
209 | <listitem><literal>UD_OP_CONST</literal> | |
210 | <para> | |
211 | Implicit constant operand. | |
212 | Value available in <literal>lval</literal>. | |
213 | </para> | |
214 | </listitem> | |
215 | ||
216 | <listitem> | |
217 | <literal>UD_OP_REG</literal> | |
218 | <para> | |
219 | Operand is a register. The specific register is contained in | |
220 | <literal>base</literal> in the form of an enumerated constant, | |
221 | <literal>enum ud_type</literal>. | |
222 | </para> | |
223 | </listitem> | |
224 | </itemizedlist> | |
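The scale-index-base form used for <literal>UD_OP_MEM</literal> operands can be illustrated with a small standalone sketch. It does not link against libudis86: <literal>struct mem_operand_values</literal> and <literal>effective_address</literal> are hypothetical names introduced here, and the struct holds register values that the caller has already looked up rather than <literal>enum ud_type</literal> identifiers. Treating a scale of 0 as "no scaling" is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the memory-operand fields described above.
 * It is NOT libudis86's struct ud_operand: register identifiers are
 * replaced by register values the caller has already resolved. */
struct mem_operand_values {
    uint64_t base_value;   /* value of the base register, 0 if none   */
    uint64_t index_value;  /* value of the index register, 0 if none  */
    uint8_t  scale;        /* multiplier applied to the index register */
    int64_t  displacement; /* sign-extended memory offset, 0 if none  */
};

/* Computes base + index * scale + displacement with 64-bit wraparound.
 * Assumption of this sketch: a scale of 0 means the index is unscaled. */
static uint64_t effective_address(const struct mem_operand_values *op)
{
    assert(op != NULL);
    uint64_t scale = op->scale ? op->scale : 1;
    return op->base_value + op->index_value * scale
         + (uint64_t)op->displacement;
}
```

For example, a base value of 0x1000, an index value of 4, a scale of 8, and a displacement of -16 give 0x1000 + 32 - 16 = 0x1010.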
225 | ||
226 | <para><literal>lval</literal> is a union that aggregates integer | |
227 | fields of different sizes. Which field holds the value depends | |
228 | on the type and size of the operand. | |
229 | <itemizedlist> | |
230 | <listitem><literal>lval.sbyte</literal> - Signed Byte</listitem> | |
231 | <listitem><literal>lval.ubyte</literal> - Unsigned Byte</listitem> | |
232 | <listitem><literal>lval.sword</literal> - Signed Word</listitem> | |
233 | <listitem><literal>lval.uword</literal> - Unsigned Word</listitem> | |
234 | <listitem><literal>lval.sdword</literal> - Signed Double Word</listitem> | |
235 | <listitem><literal>lval.udword</literal> - Unsigned Double Word</listitem> | |
236 | <listitem><literal>lval.sqword</literal> - Signed Quad Word</listitem> | |
237 | <listitem><literal>lval.uqword</literal> - Unsigned Quad Word</listitem> | |
238 | <listitem><literal>lval.ptr.seg</literal> - Pointer Segment in Segment:Offset</listitem> | |
239 | <listitem><literal>lval.ptr.off</literal> - Pointer Offset in Segment:Offset </listitem> | |
240 | </itemizedlist> | |
241 | </para> | |
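Picking the right union member by width can be sketched as follows. This is a standalone illustration, not libudis86 code: <literal>union operand_value</literal> and <literal>read_signed</literal> are hypothetical names that only mirror the integer field names listed above, and the width argument stands for the operand's <literal>size</literal> (or <literal>offset</literal>, for memory operands).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the integer members listed above; the real
 * union is defined by libudis86 and also carries the ptr fields. */
union operand_value {
    int8_t   sbyte;
    uint8_t  ubyte;
    int16_t  sword;
    uint16_t uword;
    int32_t  sdword;
    uint32_t udword;
    int64_t  sqword;
    uint64_t uqword;
};

/* Reads the member matching width_bits and sign-extends it to 64 bits.
 * Returns 1 on success, 0 if the width is not 8, 16, 32, or 64. */
static int read_signed(const union operand_value *v, unsigned width_bits,
                       int64_t *out)
{
    assert(v != NULL && out != NULL);
    switch (width_bits) {
    case 8:  *out = v->sbyte;  return 1;
    case 16: *out = v->sword;  return 1;
    case 32: *out = v->sdword; return 1;
    case 64: *out = v->sqword; return 1;
    default: return 0;
    }
}
```

Reading through the signed member of the matching width is what turns a byte such as 0xF0 into -16 rather than 240.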
242 | ||
243 | <para>The following enumerated constants (<literal>enum ud_type</literal>) | |
244 | are possible values for <literal>base</literal> and <literal>index</literal>. | |
245 | Note that a value of <literal>UD_NONE</literal> simply means that the | |
246 | field is not valid for the current instruction. | |
247 | </para> | |
248 | ||
249 | <programlisting> | |
250 | UD_NONE, | |
251 | ||
252 | /* 8 bit GPRs */ | |
253 | UD_R_AL, UD_R_CL, UD_R_DL, UD_R_BL, | |
254 | UD_R_AH, UD_R_CH, UD_R_DH, UD_R_BH, | |
255 | UD_R_SPL, UD_R_BPL, UD_R_SIL, UD_R_DIL, | |
256 | UD_R_R8B, UD_R_R9B, UD_R_R10B, UD_R_R11B, | |
257 | UD_R_R12B, UD_R_R13B, UD_R_R14B, UD_R_R15B, | |
258 | ||
259 | /* 16 bit GPRs */ | |
260 | UD_R_AX, UD_R_CX, UD_R_DX, UD_R_BX, | |
261 | UD_R_SP, UD_R_BP, UD_R_SI, UD_R_DI, | |
262 | UD_R_R8W, UD_R_R9W, UD_R_R10W, UD_R_R11W, | |
263 | UD_R_R12W, UD_R_R13W, UD_R_R14W, UD_R_R15W, | |
264 | ||
265 | /* 32 bit GPRs */ | |
266 | UD_R_EAX, UD_R_ECX, UD_R_EDX, UD_R_EBX, | |
267 | UD_R_ESP, UD_R_EBP, UD_R_ESI, UD_R_EDI, | |
268 | UD_R_R8D, UD_R_R9D, UD_R_R10D, UD_R_R11D, | |
269 | UD_R_R12D, UD_R_R13D, UD_R_R14D, UD_R_R15D, | |
270 | ||
271 | /* 64 bit GPRs */ | |
272 | UD_R_RAX, UD_R_RCX, UD_R_RDX, UD_R_RBX, | |
273 | UD_R_RSP, UD_R_RBP, UD_R_RSI, UD_R_RDI, | |
274 | UD_R_R8, UD_R_R9, UD_R_R10, UD_R_R11, | |
275 | UD_R_R12, UD_R_R13, UD_R_R14, UD_R_R15, | |
276 | ||
277 | /* segment registers */ | |
278 | UD_R_ES, UD_R_CS, UD_R_SS, UD_R_DS, | |
279 | UD_R_FS, UD_R_GS, | |
280 | ||
281 | /* control registers */ | |
282 | UD_R_CR0, UD_R_CR1, UD_R_CR2, UD_R_CR3, | |
283 | UD_R_CR4, UD_R_CR5, UD_R_CR6, UD_R_CR7, | |
284 | UD_R_CR8, UD_R_CR9, UD_R_CR10, UD_R_CR11, | |
285 | UD_R_CR12, UD_R_CR13, UD_R_CR14, UD_R_CR15, | |
286 | ||
287 | /* debug registers */ | |
288 | UD_R_DR0, UD_R_DR1, UD_R_DR2, UD_R_DR3, | |
289 | UD_R_DR4, UD_R_DR5, UD_R_DR6, UD_R_DR7, | |
290 | UD_R_DR8, UD_R_DR9, UD_R_DR10, UD_R_DR11, | |
291 | UD_R_DR12, UD_R_DR13, UD_R_DR14, UD_R_DR15, | |
292 | ||
293 | /* mmx registers */ | |
294 | UD_R_MM0, UD_R_MM1, UD_R_MM2, UD_R_MM3, | |
295 | UD_R_MM4, UD_R_MM5, UD_R_MM6, UD_R_MM7, | |
296 | ||
297 | /* x87 registers */ | |
298 | UD_R_ST0, UD_R_ST1, UD_R_ST2, UD_R_ST3, | |
299 | UD_R_ST4, UD_R_ST5, UD_R_ST6, UD_R_ST7, | |
300 | ||
301 | /* extended multimedia registers */ | |
302 | UD_R_XMM0, UD_R_XMM1, UD_R_XMM2, UD_R_XMM3, | |
303 | UD_R_XMM4, UD_R_XMM5, UD_R_XMM6, UD_R_XMM7, | |
304 | UD_R_XMM8, UD_R_XMM9, UD_R_XMM10, UD_R_XMM11, | |
305 | UD_R_XMM12, UD_R_XMM13, UD_R_XMM14, UD_R_XMM15, | |
306 | ||
307 | /* eip/rip */ | |
308 | UD_R_RIP </programlisting> | |
309 | ||
310 | </section> | |
311 | </section> | |
312 | ||
313 | <section><title>Function Reference</title> | |
314 | <itemizedlist> | |
315 | ||
316 | <listitem> | |
317 | <literal>void ud_init (ud_t* ud_obj)</literal> | |
318 | <para> | |
319 | <literal>ud_t</literal> object initializer. This function must be called on a | |
320 | udis86 object before it can be used anywhere else. | |
321 | </para> | |
322 | </listitem> | |
323 | ||
324 | <listitem> | |
325 | <literal>void ud_set_input_hook(ud_t* ud_obj, int (*hook)())</literal> | |
326 | <para> This function sets the input source for the library. To retrieve each byte in | |
327 | the stream, libudis86 calls back the function pointed to by <literal>hook</literal>. | |
328 | The hook function, defined by the user code, must return a single byte of code | |
329 | each time it is called. To signal end-of-input, it must return the constant, | |
330 | <literal>UD_EOI</literal>.</para> | |
331 | </listitem> | |
332 | ||
333 | <listitem> | |
334 | <literal>void ud_set_user_opaque_data(ud_t* ud_obj, void* opaque);</literal> | |
335 | <para>Associates a pointer with the udis86 object to be retrieved and used in user | |
336 | functions, such as the input hook callback function.</para> | |
337 | </listitem> | |
338 | ||
339 | <listitem> | |
340 | <literal>void* ud_get_user_opaque_data(ud_t* ud_obj);</literal> | |
341 | <para>This function returns the pointer, if any, that was associated with the udis86 | |
342 | object using the <literal>ud_set_user_opaque_data</literal> function.</para> | |
343 | </listitem> | |
344 | ||
345 | <listitem> | |
346 | <literal>void ud_set_input_buffer(ud_t* ud_obj, unsigned char* buffer, size_t size);</literal> | |
347 | <para>Sets the input source for the library to a buffer of fixed size.</para> | |
348 | </listitem> | |
349 | ||
350 | <listitem> | |
351 | <literal>void ud_set_input_file(ud_t* ud_obj, FILE* filep);</literal> | |
352 | <para>This function sets the input source for the library to a file pointed to by the | |
353 | passed FILE pointer. Note that the library does not perform any checks, assuming | |
354 | the file pointer to be properly initialized.</para> | |
355 | </listitem> | |
356 | ||
357 | <listitem> | |
358 | <literal>void ud_set_mode(ud_t* ud_obj, uint8_t mode_bits);</literal> | |
359 | <para>Sets the mode of disassembly. Possible values are 16, 32, and 64. By default, the | |
360 | library works in 32-bit mode.</para> | |
361 | </listitem> | |
362 | ||
363 | <listitem> | |
364 | <literal>void ud_set_pc(ud_t*, uint64_t pc);</literal> | |
365 | <para>Sets the program counter (EIP/RIP). This changes the offset of the assembly output | |
366 | generated, with direct effect on branch instructions.</para> | |
367 | </listitem> | |
368 | ||
369 | <listitem> | |
370 | <literal>void ud_set_syntax(ud_t*, void (*translator)(ud_t*));</literal> | |
371 | <para>libudis86 disassembles one instruction at a time into an intermediate form that | |
372 | lets you inspect the instruction and its various aspects individually. But to generate the | |
373 | assembly language output, this intermediate form must be translated. This function sets | |
374 | the translator. There are two inbuilt translators,</para> | |
375 | ||
376 | <itemizedlist> | |
377 | <listitem><literal>UD_SYN_INTEL</literal> - for INTEL (NASM-like) syntax.</listitem> | |
378 | <listitem><literal>UD_SYN_ATT</literal> - for AT&T (GAS-like) syntax.</listitem> | |
379 | </itemizedlist> | |
380 | ||
381 | <para>If you do not want libudis86 to translate, you can pass | |
382 | <literal>NULL</literal> to the function, with no more translations | |
383 | thereafter. This is particularly useful for cases when you only | |
384 | want to identify chunks of code and then create the assembly output | |
385 | if needed.</para> | |
386 | <para>If you want to create your own translator, you must pass a pointer to a function | |
387 | that accepts a pointer to ud_t. This function will be called by libudis86 after each | |
388 | instruction is decoded.</para> | |
389 | </listitem> | |
390 | ||
391 | <listitem> | |
392 | <literal>void ud_set_vendor(ud_t*, unsigned vendor);</literal> | |
393 | <para>Selects the vendor whose instruction set is used for decoding. This is only useful for | |
394 | choosing between the VMX and SVM instruction sets, where Intel and AMD have diverged | |
395 | significantly. At a later stage, support for a more granular selection of instruction | |
396 | sets may be added. | |
397 | ||
398 | <itemizedlist> | |
399 | <listitem><literal>UD_VENDOR_INTEL</literal> - for INTEL instruction set.</listitem> | |
400 | <listitem><literal>UD_VENDOR_AMD</literal> - for AMD instruction set.</listitem> | |
401 | </itemizedlist> | |
402 | </para> | |
403 | ||
404 | </listitem> | |
405 | ||
406 | <listitem> | |
407 | <literal>unsigned int ud_disassemble(ud_t*);</literal> | |
408 | <para>Disassembles the next instruction in the input stream. Returns the number of | |
409 | bytes disassembled; a return value of 0 indicates end of input. To restart disassembly | |
410 | after the end of input, you must call one of the input setting functions with the new | |
411 | input source.</para> | |
412 | </listitem> | |
413 | ||
414 | <listitem> | |
415 | <literal>unsigned int ud_insn_len(ud_t* u);</literal> | |
416 | <para>Returns the length, in bytes, of the last disassembled instruction.</para> | |
417 | </listitem> | |
418 | ||
419 | <listitem> | |
420 | <literal>uint64_t ud_insn_off(ud_t*);</literal> | |
421 | <para>Returns the starting offset of the disassembled instruction relative to the | |
422 | program counter value specified initially.</para> | |
423 | </listitem> | |
424 | ||
425 | <listitem> | |
426 | <literal>char* ud_insn_hex(ud_t*);</literal> | |
427 | <para>Returns a pointer to the character string holding the hexadecimal | |
428 | representation of the disassembled bytes.</para> | |
429 | </listitem> | |
430 | ||
431 | <listitem> | |
432 | <literal>uint8_t* ud_insn_ptr(ud_t* u);</literal> | |
433 | <para>Returns a pointer to the buffer holding the instruction bytes. | |
434 | Use <literal>ud_insn_len()</literal> to determine the length of this | |
435 | buffer.</para> | |
436 | </listitem> | |
437 | ||
438 | <listitem> | |
439 | <literal>char* ud_insn_asm(ud_t* u);</literal> | |
440 | <para>If a syntax has been set, returns a pointer to the character | |
441 | string holding the assembly language representation of the disassembled | |
442 | instruction.</para> | |
443 | </listitem> | |
444 | ||
445 | <listitem> | |
446 | <literal>void ud_input_skip(ud_t*, size_t n);</literal> | |
447 | <para>Skips <literal>n</literal> bytes in the input stream.</para> | |
448 | </listitem> | |
449 | ||
450 | </itemizedlist> | |
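To show why the program counter matters for branch instructions, here is a standalone sketch of how a relative branch target is conventionally computed on x86 from the values that <literal>ud_insn_off</literal>, <literal>ud_insn_len</literal>, and a <literal>UD_OP_JIMM</literal> operand provide. <literal>branch_target</literal> is a hypothetical helper and is not part of libudis86. Truncating the result to the disassembly mode width is a simplifying assumption that ignores operand-size prefixes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a libudis86 function.
 * insn_off:  starting offset of the instruction (as from ud_insn_off)
 * insn_len:  instruction length in bytes (as from ud_insn_len)
 * rel:       sign-extended relative offset of the branch operand
 * mode_bits: 16, 32, or 64, as passed to ud_set_mode
 * The relative offset is applied to the address of the NEXT instruction.
 * Assumption of this sketch: the target wraps at the mode width. */
static uint64_t branch_target(uint64_t insn_off, unsigned insn_len,
                              int64_t rel, unsigned mode_bits)
{
    assert(mode_bits == 16 || mode_bits == 32 || mode_bits == 64);
    uint64_t target = insn_off + insn_len + (uint64_t)rel;
    if (mode_bits < 64)
        target &= (UINT64_C(1) << mode_bits) - 1;
    return target;
}
```

A two-byte jump at offset 0x1000 with a relative offset of -2 therefore targets 0x1000, that is, itself.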
451 | </section> | |
452 | ||
453 | ||
454 | </section> | |
455 | ||
456 | </article> |