<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "file:///opt/local/share/xml/docbook/4.5/docbookx.dtd">
<article>
<articleinfo>
  <title>Udis86 Manual</title>
  <author><firstname>Vivek</firstname><surname>Thampi</surname></author>
  <authorinitials>vmt</authorinitials>
  <pubdate>2009</pubdate>
  <titleabbrev>Udis86 Manual</titleabbrev>
  <revhistory>
    <revision>
      <revnumber>1.0</revnumber>
      <date>02 May 2009</date>
    </revision>
  </revhistory>
</articleinfo>


<section><title>Introduction</title>
Udis86 is a disassembler engine that interprets and decodes a
stream of binary machine code bytes as opcodes defined in the
x86 and x86-64 class of Instruction Set Architectures. The core
component of this project is libudis86, which provides a clean
and simple interface to disassemble binary code and to inspect
the disassembly in varying degrees of detail. The library is
designed to aid software projects that entail analysis and
manipulation of all flavors of x86 binary code.
</section>

<section><title>Getting Started</title>

<section><title>Building and Installing udis86</title>
udis86 is developed for Unix-like environments and, like most
software for such systems, is built and installed with the
following basic steps.
<screen>
<prompt>$</prompt> <userinput>./configure</userinput>
<prompt>$</prompt> <userinput>make</userinput>
<prompt>$</prompt> <userinput>make install</userinput>
</screen>
Depending on your choice of install location, you may need
root privileges to run make install. The install scripts
copy the necessary header and library files to appropriate
locations on the system.
</section>

<section><title>Interfacing with libudis86: A Quick Example</title>
 The following code is an example of a program that interfaces
 with libudis86 and uses the API to generate assembly language
 output for 64-bit code read from standard input.
 <example><title>libudis86 Usage Example</title>
<programlisting>#include &lt;stdio.h&gt;
#include &lt;udis86.h&gt;

int main(void)
{
    ud_t ud_obj;

    /* Set up the udis86 object: read from stdin, decode 64-bit code,
       and emit Intel-syntax assembly. */
    ud_init(&amp;ud_obj);
    ud_set_input_file(&amp;ud_obj, stdin);
    ud_set_mode(&amp;ud_obj, 64);
    ud_set_syntax(&amp;ud_obj, UD_SYN_INTEL);

    /* ud_disassemble() returns 0 at end of input. */
    while (ud_disassemble(&amp;ud_obj)) {
        printf("\t%s\n", ud_insn_asm(&amp;ud_obj));
    }

    return 0;
}</programlisting></example>
 To compile the program (using gcc):
 <screen> <prompt>$</prompt> <userinput>gcc example.c -o example -ludis86</userinput> </screen>
 This example should give you an idea of how the library can be used. The following
 sections describe the complete API of libudis86 in detail.
</section>
</section>

<section><title>libudis86 Programming Interface</title>
  <section><title>ud_t: udis86 object</title>
  libudis86 is reentrant, and to maintain that property it does not use static data.
  All data related to the disassembly are stored in a single object, called the udis86
  object <literal>ud_t</literal> (<literal>struct ud</literal>). So, to use libudis86
  you must create an instance of this object,
  <programlisting> ud_t ud_obj; </programlisting>
  and initialize it,
  <programlisting> ud_init(&amp;ud_obj); </programlisting>
  You can create multiple such objects and use them with the library, each one maintaining
  its own disassembly state.
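  <para>As a minimal sketch of this independence, the helper below decodes one
  buffer with two udis86 objects, one in 32-bit mode and one in 64-bit mode,
  stepping them alternately. The name <literal>count_in_both_modes</literal> is
  hypothetical and not part of the library; the library calls are the ones
  listed in the function reference of this manual.</para>
  <programlisting>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;
#include &lt;udis86.h&gt;

/* Hypothetical helper: decode `code` with two independent udis86 objects,
 * one per mode, and store the instruction counts in *n32 and *n64. */
void count_in_both_modes(uint8_t *code, size_t len,
                         unsigned int *n32, unsigned int *n64)
{
    ud_t ud32, ud64;
    int more32 = 1, more64 = 1;

    ud_init(&amp;ud32);
    ud_set_input_buffer(&amp;ud32, code, len);
    ud_set_mode(&amp;ud32, 32);

    ud_init(&amp;ud64);
    ud_set_input_buffer(&amp;ud64, code, len);
    ud_set_mode(&amp;ud64, 64);

    *n32 = *n64 = 0;

    /* Step the two objects alternately; neither disturbs the other. */
    while (more32 || more64) {
        if (more32) {
            more32 = ud_disassemble(&amp;ud32) != 0;
            *n32 += (unsigned int)more32;
        }
        if (more64) {
            more64 = ud_disassemble(&amp;ud64) != 0;
            *n64 += (unsigned int)more64;
        }
    }
}</programlisting>
  <para>For the bytes <literal>48 89 d8</literal>, one would expect two
  instructions in 32-bit mode (<literal>dec eax</literal> followed by
  <literal>mov eax, ebx</literal>) and one in 64-bit mode
  (<literal>mov rax, rbx</literal>), because 48h acts as a REX prefix only in
  64-bit mode.</para>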
  </section>

  <section><title>Examining Instructions</title>

  <para>libudis86 exposes decoded instructions in an intermediate form meant
  to be useful for programs that want to examine them. This intermediate form
  is available as values of certain fields of the <literal>ud_t</literal>
  udis86 object used to disassemble the instruction, as described below.</para>

  <section><title>Instruction Pointer</title>
  <para>The program counter (eip/rip) value at which the instruction was
  decoded is available in <literal>ud_obj.pc</literal>.</para>
  </section>

  <section><title>Instruction Prefixes</title>
  <para>Prefix bytes that affect the disassembly of the instruction
  are available in the following fields, each of which corresponds
  to a particular type or class of prefix.
  <itemizedlist id="hhllo">
  <listitem><literal>ud_obj.pfx_rex</literal> - 64-bit mode REX prefix</listitem>
  <listitem><literal>ud_obj.pfx_seg</literal> - Segment register prefix</listitem>
  <listitem><literal>ud_obj.pfx_opr</literal> - Operand-size prefix (66h)</listitem>
  <listitem><literal>ud_obj.pfx_adr</literal> - Address-size prefix (67h)</listitem>
  <listitem><literal>ud_obj.pfx_lock</literal> - Lock prefix</listitem>
  <listitem><literal>ud_obj.pfx_rep</literal> - Rep prefix</listitem>
  <listitem><literal>ud_obj.pfx_repe</literal> - Repe prefix</listitem>
  <listitem><literal>ud_obj.pfx_repne</literal> - Repne prefix</listitem>
  </itemizedlist>
  These fields default to <literal>UD_NONE</literal> if the respective
  prefixes were not found.
  </para>
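  <para>A prefix field can be tested against <literal>UD_NONE</literal> once an
  instruction has been decoded. The following is only a sketch: the helper name
  <literal>first_insn_is_locked</literal> is made up for illustration, and the
  code assumes that the fields of <literal>ud_t</literal> are read directly, as
  described above.</para>
  <programlisting>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;
#include &lt;udis86.h&gt;

/* Hypothetical helper: returns 1 if the first instruction in `code`
 * carries a LOCK prefix, 0 if it does not, and -1 if nothing was decoded. */
int first_insn_is_locked(uint8_t *code, size_t len, uint8_t mode)
{
    ud_t ud_obj;

    ud_init(&amp;ud_obj);
    ud_set_input_buffer(&amp;ud_obj, code, len);
    ud_set_mode(&amp;ud_obj, mode);

    if (ud_disassemble(&amp;ud_obj) == 0)
        return -1;

    /* Prefix fields hold UD_NONE when the prefix was absent. */
    return ud_obj.pfx_lock != UD_NONE;
}</programlisting>
  <para>With the bytes <literal>f0 01 03</literal> (a locked
  <literal>add</literal> to memory) in 32-bit mode the helper should report 1,
  and 0 for the same instruction without the leading <literal>f0</literal>.</para>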
  </section>

  <section><title>Instruction Mnemonic</title>
  <para>The instruction mnemonic, in the form of an enumerated constant
  (<literal>enum ud_mnemonic_code</literal>), is available in
  <literal>ud_obj.mnemonic</literal>. By convention, all mnemonic
  constants are formed by prefixing the standard instruction mnemonic
  with <literal>UD_I</literal>, for example
  <literal>UD_Imov</literal>,
  <literal>UD_Ixor</literal>,
  <literal>UD_Ijmp</literal>, etc.
  </para>
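  <para>Because the mnemonic is a plain enumerated value, instructions can be
  classified without producing any assembly text. The sketch below counts how
  often a given mnemonic occurs in a buffer; <literal>count_mnemonic</literal>
  is a hypothetical name, and no translator is set since only the intermediate
  form is inspected.</para>
  <programlisting>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;
#include &lt;udis86.h&gt;

/* Hypothetical helper: count the instructions in `code` whose mnemonic
 * equals `wanted` (for example UD_Ixor). */
unsigned int count_mnemonic(uint8_t *code, size_t len, uint8_t mode,
                            enum ud_mnemonic_code wanted)
{
    ud_t ud_obj;
    unsigned int hits = 0;

    ud_init(&amp;ud_obj);
    ud_set_input_buffer(&amp;ud_obj, code, len);
    ud_set_mode(&amp;ud_obj, mode);

    /* ud_disassemble() returns 0 once the input is exhausted. */
    while (ud_disassemble(&amp;ud_obj)) {
        if (ud_obj.mnemonic == wanted)
            hits++;
    }
    return hits;
}</programlisting>
  <para>For example, the 32-bit sequence <literal>31 c0 31 db c3</literal>
  (two <literal>xor</literal> instructions followed by a <literal>ret</literal>)
  should yield 2 for <literal>UD_Ixor</literal> and 0 for
  <literal>UD_Imov</literal>.</para>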
  </section>

  <section><title>Instruction Operands</title>
  <para>
  The intermediate form of the instruction operands is available as
  an array of objects of type <literal>struct ud_operand</literal>.
  Given a udis86 object <literal>ud_obj</literal>, the
  <literal>n</literal>th operand is available in
  <literal>ud_obj.operand[n]</literal>.
  </para>
  <para>
  <literal>struct ud_operand</literal> has the following fields:
  <itemizedlist>
  <listitem><literal>type</literal></listitem>
  <listitem><literal>size</literal></listitem>
  <listitem><literal>base</literal></listitem>
  <listitem><literal>index</literal></listitem>
  <listitem><literal>scale</literal></listitem>
  <listitem><literal>offset</literal></listitem>
  <listitem><literal>lval</literal></listitem>
  </itemizedlist>
  </para>

  <para>
  The <literal>type</literal> and <literal>size</literal> fields
  give the type and size of the operand, respectively. The
  possible operand types are:
  </para>

  <itemizedlist>

  <listitem><literal>UD_NONE</literal>
  <para>
  No operand.
  </para>
  </listitem>

  <listitem><literal>UD_OP_MEM</literal>
  <para>
  Memory operand. The intermediate form normalizes all memory
  address equations to the scale-index-base form. The address
  equation is available in
  <literal>base</literal>,
  <literal>index</literal>, and
  <literal>scale</literal>.
  If the <literal>offset</literal> field has a non-zero value
  (one of 8, 16, 32, and 64), <literal>lval</literal> will
  contain the memory offset. Note that the <literal>base</literal>
  and <literal>index</literal> fields contain the base and
  index registers of the address equation, in the form of an
  enumerated constant of type <literal>enum ud_type</literal>.
  <literal>scale</literal> contains the integer value that
  the index register must be scaled by.
  </para>
  </listitem>

  <listitem><literal>UD_OP_PTR</literal>
  <para>A Segment:Offset pointer operand.
  <literal>size</literal> can have two values: 32 (for 16:16 seg:off)
  and 48 (for 16:32 seg:off). The value is available in
  <literal>lval</literal> (<literal>lval.ptr.seg</literal> and <literal>lval.ptr.off</literal>).
  </para>
  </listitem>

  <listitem><literal>UD_OP_IMM</literal>
  <para>
  Immediate operand. Value available in <literal>lval</literal>.
  </para>
  </listitem>

  <listitem><literal>UD_OP_JIMM</literal>
  <para>
  Immediate operand to branch instruction (relative offsets).
  Value available in <literal>lval</literal>.
  </para>
  </listitem>

  <listitem><literal>UD_OP_CONST</literal>
  <para>
  Implicit constant operand.
  Value available in <literal>lval</literal>.
  </para>
  </listitem>

  <listitem>
  <literal>UD_OP_REG</literal>
  <para>
  Operand is a register. The specific register is contained in
  <literal>base</literal> in the form of an enumerated constant,
  <literal>enum ud_type</literal>.
  </para>
  </listitem>
  </itemizedlist>
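  <para>For a <literal>UD_OP_MEM</literal> operand, the scale-index-base form
  reads as base + index * scale + offset. The following plain C sketch
  evaluates that equation for register values the caller already knows; the
  library only reports which registers are involved, not their contents. The
  function name <literal>effective_address</literal> is hypothetical, and
  treating a <literal>scale</literal> of 0 as "index not scaled" is an
  assumption that should be checked against the libudis86 version in use.</para>
  <programlisting>#include &lt;stdint.h&gt;

/* Hypothetical helper: evaluate base + index * scale + disp.
 * base_val and index_val are the run-time contents of the registers named
 * by the operand's base and index fields (pass 0 for UD_NONE); disp is the
 * memory offset taken from lval, already sign-extended by the caller. */
uint64_t effective_address(uint64_t base_val, uint64_t index_val,
                           uint8_t scale, int64_t disp)
{
    /* Assumption: a scale of 0 means the index is used unscaled. */
    uint64_t factor = scale ? scale : 1;

    /* Unsigned arithmetic wraps around, as address arithmetic does. */
    return base_val + index_val * factor + (uint64_t)disp;
}</programlisting>
  <para>With a base of 0x1000, an index of 4, a scale of 8 and an offset of
  -16, the result is 0x1000 + 32 - 16 = 0x1010.</para>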

  <para>The <literal>lval</literal> field is a union that
  aggregates integer fields of different sizes, which store values
  depending on the type of operand.
  <itemizedlist>
  <listitem><literal>lval.sbyte</literal> - Signed Byte</listitem>
  <listitem><literal>lval.ubyte</literal> - Unsigned Byte</listitem>
  <listitem><literal>lval.sword</literal> - Signed Word</listitem>
  <listitem><literal>lval.uword</literal> - Unsigned Word</listitem>
  <listitem><literal>lval.sdword</literal> - Signed Double Word</listitem>
  <listitem><literal>lval.udword</literal> - Unsigned Double Word</listitem>
  <listitem><literal>lval.sqword</literal> - Signed Quad Word</listitem>
  <listitem><literal>lval.uqword</literal> - Unsigned Quad Word</listitem>
  <listitem><literal>lval.ptr.seg</literal> - Pointer Segment in Segment:Offset</listitem>
  <listitem><literal>lval.ptr.off</literal> - Pointer Offset in Segment:Offset</listitem>
  </itemizedlist>
  </para>
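  <para>Which member of <literal>lval</literal> is meaningful depends on the
  operand's type and size. As a hedged illustration, the helper below (a
  hypothetical name, not a library function) reads an immediate operand as a
  signed value, assuming that <literal>size</literal> holds the operand width
  in bits and that a signed interpretation is wanted.</para>
  <programlisting>#include &lt;stdint.h&gt;
#include &lt;udis86.h&gt;

/* Hypothetical helper: sign-extend an immediate operand to 64 bits by
 * picking the lval member that matches the operand size. Returns 0 on
 * success, or -1 if the operand is not an immediate of a known size. */
int imm_as_int64(const struct ud_operand *op, int64_t *out)
{
    if (op-&gt;type != UD_OP_IMM)
        return -1;

    switch (op-&gt;size) {
    case 8:  *out = op-&gt;lval.sbyte;  break;
    case 16: *out = op-&gt;lval.sword;  break;
    case 32: *out = op-&gt;lval.sdword; break;
    case 64: *out = op-&gt;lval.sqword; break;
    default: return -1;
    }
    return 0;
}</programlisting>
  <para>A caller would typically pass <literal>&amp;ud_obj.operand[n]</literal>
  after a successful call to <literal>ud_disassemble</literal>. The unsigned
  members can be used in the same way when a zero-extended value is
  needed.</para>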

  <para>The following enumerated constants (<literal>enum ud_type</literal>)
  are possible values for <literal>base</literal> and <literal>index</literal>.
  Note that a value of <literal>UD_NONE</literal> simply means that the
  field is not valid for the current instruction.
  </para>

  <programlisting>
  UD_NONE,

  /* 8 bit GPRs */
  UD_R_AL, UD_R_CL, UD_R_DL, UD_R_BL,
  UD_R_AH, UD_R_CH, UD_R_DH, UD_R_BH,
  UD_R_SPL, UD_R_BPL, UD_R_SIL, UD_R_DIL,
  UD_R_R8B, UD_R_R9B, UD_R_R10B, UD_R_R11B,
  UD_R_R12B, UD_R_R13B, UD_R_R14B, UD_R_R15B,

  /* 16 bit GPRs */
  UD_R_AX, UD_R_CX, UD_R_DX, UD_R_BX,
  UD_R_SP, UD_R_BP, UD_R_SI, UD_R_DI,
  UD_R_R8W, UD_R_R9W, UD_R_R10W, UD_R_R11W,
  UD_R_R12W, UD_R_R13W, UD_R_R14W, UD_R_R15W,

  /* 32 bit GPRs */
  UD_R_EAX, UD_R_ECX, UD_R_EDX, UD_R_EBX,
  UD_R_ESP, UD_R_EBP, UD_R_ESI, UD_R_EDI,
  UD_R_R8D, UD_R_R9D, UD_R_R10D, UD_R_R11D,
  UD_R_R12D, UD_R_R13D, UD_R_R14D, UD_R_R15D,

  /* 64 bit GPRs */
  UD_R_RAX, UD_R_RCX, UD_R_RDX, UD_R_RBX,
  UD_R_RSP, UD_R_RBP, UD_R_RSI, UD_R_RDI,
  UD_R_R8, UD_R_R9, UD_R_R10, UD_R_R11,
  UD_R_R12, UD_R_R13, UD_R_R14, UD_R_R15,

  /* segment registers */
  UD_R_ES, UD_R_CS, UD_R_SS, UD_R_DS,
  UD_R_FS, UD_R_GS,

  /* control registers */
  UD_R_CR0, UD_R_CR1, UD_R_CR2, UD_R_CR3,
  UD_R_CR4, UD_R_CR5, UD_R_CR6, UD_R_CR7,
  UD_R_CR8, UD_R_CR9, UD_R_CR10, UD_R_CR11,
  UD_R_CR12, UD_R_CR13, UD_R_CR14, UD_R_CR15,

  /* debug registers */
  UD_R_DR0, UD_R_DR1, UD_R_DR2, UD_R_DR3,
  UD_R_DR4, UD_R_DR5, UD_R_DR6, UD_R_DR7,
  UD_R_DR8, UD_R_DR9, UD_R_DR10, UD_R_DR11,
  UD_R_DR12, UD_R_DR13, UD_R_DR14, UD_R_DR15,

  /* mmx registers */
  UD_R_MM0, UD_R_MM1, UD_R_MM2, UD_R_MM3,
  UD_R_MM4, UD_R_MM5, UD_R_MM6, UD_R_MM7,

  /* x87 registers */
  UD_R_ST0, UD_R_ST1, UD_R_ST2, UD_R_ST3,
  UD_R_ST4, UD_R_ST5, UD_R_ST6, UD_R_ST7,

  /* extended multimedia registers */
  UD_R_XMM0, UD_R_XMM1, UD_R_XMM2, UD_R_XMM3,
  UD_R_XMM4, UD_R_XMM5, UD_R_XMM6, UD_R_XMM7,
  UD_R_XMM8, UD_R_XMM9, UD_R_XMM10, UD_R_XMM11,
  UD_R_XMM12, UD_R_XMM13, UD_R_XMM14, UD_R_XMM15,

  /* eip/rip */
  UD_R_RIP </programlisting>
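  <para>These register constants can be compared directly against the
  <literal>base</literal> field of a register operand. As an illustrative
  sketch, the hypothetical helper below recognizes the common idiom of clearing
  a register by XOR-ing it with itself, combining the mnemonic and operand
  fields described above.</para>
  <programlisting>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;
#include &lt;udis86.h&gt;

/* Hypothetical helper: returns 1 if the first instruction in `code` is an
 * xor of a register with itself (for example "xor eax, eax"), else 0. */
int is_zeroing_xor(uint8_t *code, size_t len, uint8_t mode)
{
    ud_t ud_obj;
    const struct ud_operand *dst, *src;

    ud_init(&amp;ud_obj);
    ud_set_input_buffer(&amp;ud_obj, code, len);
    ud_set_mode(&amp;ud_obj, mode);

    if (ud_disassemble(&amp;ud_obj) == 0)
        return 0;

    dst = &amp;ud_obj.operand[0];
    src = &amp;ud_obj.operand[1];

    /* For UD_OP_REG operands the register is reported in `base`. */
    return ud_obj.mnemonic == UD_Ixor
        &amp;&amp; dst-&gt;type == UD_OP_REG
        &amp;&amp; src-&gt;type == UD_OP_REG
        &amp;&amp; dst-&gt;base == src-&gt;base;
}</programlisting>
  <para>In 32-bit mode, <literal>31 c0</literal> (<literal>xor eax, eax</literal>)
  should be recognized, while <literal>31 c8</literal>
  (<literal>xor eax, ecx</literal>) should not.</para>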

  </section>
  </section>

  <section><title>Function Reference</title>
  <itemizedlist>

  <listitem>
  <literal>void ud_init (ud_t* ud_obj)</literal>
  <para>
  <literal>ud_t</literal> object initializer. This function must be called on a
  udis86 object before it can be used anywhere else.
  </para>
  </listitem>

  <listitem>
  <literal>void ud_set_input_hook(ud_t* ud_obj, int (*hook)())</literal>
  <para>Sets the input source for the library to a callback. To retrieve each byte in
  the stream, libudis86 calls the function pointed to by <literal>hook</literal>.
  The hook function, defined by the user code, must return a single byte of code
  each time it is called. To signal end-of-input, it must return the constant
  <literal>UD_EOI</literal>.</para>
  </listitem>

  <listitem>
  <literal>void ud_set_user_opaque_data(ud_t* ud_obj, void* opaque);</literal>
  <para>Associates a pointer with the udis86 object, to be retrieved and used in user
  functions such as the input hook callback.</para>
  </listitem>

  <listitem>
  <literal>void* ud_get_user_opaque_data(ud_t* ud_obj);</literal>
  <para>Returns the pointer, if any, that was associated with the udis86 object using
  the <literal>ud_set_user_opaque_data</literal> function.</para>
  </listitem>

  <listitem>
  <literal>void ud_set_input_buffer(ud_t* ud_obj, unsigned char* buffer, size_t size);</literal>
  <para>Sets the input source for the library to a buffer of fixed size.</para>
  </listitem>

  <listitem>
  <literal>void ud_set_input_file(ud_t* ud_obj, FILE* filep);</literal>
  <para>Sets the input source for the library to the file referred to by the
  passed FILE pointer. Note that the library does not perform any checks and assumes
  that the file pointer is properly initialized.</para>
  </listitem>

  <listitem>
  <literal>void ud_set_mode(ud_t* ud_obj, uint8_t mode_bits);</literal>
  <para>Sets the mode of disassembly. Possible values are 16, 32, and 64. By default, the
  library works in 32-bit mode.</para>
  </listitem>

  <listitem>
  <literal>void ud_set_pc(ud_t*, uint64_t pc);</literal>
  <para>Sets the program counter (EIP/RIP). This changes the offset of the assembly output
  generated, with direct effect on branch instructions.</para>
  </listitem>

  <listitem>
  <literal>void ud_set_syntax(ud_t*, void (*translator)(ud_t*));</literal>
  <para>libudis86 disassembles one instruction at a time into an intermediate form that
  lets you inspect the instruction and its various aspects individually. To generate
  assembly language output, this intermediate form must be translated. This function sets
  the translator. There are two built-in translators:</para>

  <itemizedlist>
  <listitem><literal>UD_SYN_INTEL</literal> - for Intel (NASM-like) syntax.</listitem>
  <listitem><literal>UD_SYN_ATT</literal> - for AT&amp;T (GAS-like) syntax.</listitem>
  </itemizedlist>

  <para>If you do not want libudis86 to translate, you can pass
  <literal>NULL</literal> to the function; no translation is performed
  thereafter. This is particularly useful when you only
  want to identify chunks of code and create the assembly output
  later, if needed.</para>
  <para>If you want to create your own translator, you must pass a pointer to a function
  that accepts a pointer to <literal>ud_t</literal>. This function will be called by
  libudis86 after each instruction is decoded.</para>
  </listitem>

  <listitem>
  <literal>void ud_set_vendor(ud_t*, unsigned vendor);</literal>
  <para>Sets the vendor whose instruction set is to be decoded. This is only useful for
  selecting between the VMX and SVM instruction sets, the point at which Intel and AMD
  have diverged significantly. At a later stage, support for a more granular selection
  of instruction sets may be added.

  <itemizedlist>
  <listitem><literal>UD_VENDOR_INTEL</literal> - for the Intel instruction set.</listitem>
  <listitem><literal>UD_VENDOR_AMD</literal> - for the AMD instruction set.</listitem>
  </itemizedlist>
  </para>

  </listitem>

  <listitem>
  <literal>unsigned int ud_disassemble(ud_t*);</literal>
  <para>Disassembles the next instruction in the input stream. Returns the number of
  bytes disassembled. A return value of 0 indicates end of input. Note that to restart
  disassembly after the end of input, you must call one of the input-setting functions
  with the new input source.</para>
  </listitem>

  <listitem>
  <literal>unsigned int ud_insn_len(ud_t* u);</literal>
  <para>Returns the number of bytes disassembled.</para>
  </listitem>

  <listitem>
  <literal>uint64_t ud_insn_off(ud_t*);</literal>
  <para>Returns the starting offset of the disassembled instruction relative to the
  program counter value specified initially.</para>
  </listitem>

  <listitem>
  <literal>char* ud_insn_hex(ud_t*);</literal>
  <para>Returns a pointer to the character string holding the hexadecimal
  representation of the disassembled bytes.</para>
  </listitem>

  <listitem>
  <literal>uint8_t* ud_insn_ptr(ud_t* u);</literal>
  <para>Returns a pointer to the buffer holding the instruction bytes.
  Use <literal>ud_insn_len()</literal> to determine the length of this
  buffer.</para>
  </listitem>

  <listitem>
  <literal>char* ud_insn_asm(ud_t* u);</literal>
  <para>If the syntax is specified, returns a pointer to the character
  string holding the assembly language representation of the disassembled
  instruction.</para>
  </listitem>

  <listitem>
  <literal>void ud_input_skip(ud_t*, size_t n);</literal>
  <para>Skips <literal>n</literal> bytes in the input stream.</para>
  </listitem>

  </itemizedlist>
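  <para>Several of the functions above are typically used together. The
  following sketch feeds the library through an input hook that reads from a
  user-defined cursor, passed along as opaque data, and uses
  <literal>ud_set_pc</literal> and <literal>ud_insn_off</literal> to track
  instruction addresses. <literal>struct byte_cursor</literal>,
  <literal>cursor_hook</literal> and <literal>last_insn_offset</literal> are
  hypothetical names introduced for this example, and the hook is assumed to
  receive the udis86 object as its argument.</para>
  <programlisting>#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;
#include &lt;udis86.h&gt;

/* Hypothetical cursor over a byte buffer, handed to the hook as opaque data. */
struct byte_cursor {
    const uint8_t *data;
    size_t len;
    size_t pos;
};

/* Input hook: returns the next byte, or UD_EOI when the data is exhausted.
 * Assumption: the library passes the udis86 object to the hook. */
static int cursor_hook(ud_t *u)
{
    struct byte_cursor *c = ud_get_user_opaque_data(u);

    return c-&gt;pos &lt; c-&gt;len ? c-&gt;data[c-&gt;pos++] : UD_EOI;
}

/* Disassemble everything the cursor yields as 32-bit code located at
 * address `pc`, and return the offset of the last instruction decoded
 * (or `pc` itself if no instruction was decoded). */
uint64_t last_insn_offset(struct byte_cursor *c, uint64_t pc)
{
    ud_t ud_obj;
    uint64_t last = pc;

    ud_init(&amp;ud_obj);
    ud_set_input_hook(&amp;ud_obj, cursor_hook);
    ud_set_user_opaque_data(&amp;ud_obj, c);
    ud_set_mode(&amp;ud_obj, 32);
    ud_set_pc(&amp;ud_obj, pc);

    while (ud_disassemble(&amp;ud_obj))
        last = ud_insn_off(&amp;ud_obj);

    return last;
}</programlisting>
  <para>For a cursor over the two one-byte instructions
  <literal>90 c3</literal> and a starting address of 0x1000, the second
  instruction should be reported at offset 0x1001. Keeping the read position in
  the opaque data rather than in a global variable preserves the reentrancy
  that the library itself provides.</para>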
  </section>


</section>

</article>