libdisasm

x86 Disassembler Library




what is it?

The libdisasm library provides basic disassembly of Intel x86 instructions from a binary stream. The intent is to provide an easy-to-use disassembler that can be called from any application. Disassembly can be produced in AT&T or Intel syntax, or in an intermediate format that includes detailed instruction and operand type information.

This disassembler is derived from libi386.so in the bastard project; as such it is x86 specific and will not be expanded to include other CPU architectures. Releases for libdisasm are generated automatically alongside releases of the bastard; it is not a standalone project, though it is a standalone library.

The recent spate of objdump output analyzers has proven that many of the people [not necessarily programmers] interested in writing disassemblers have little knowledge of, or interest in, C programming; as a result, these "disassemblers" have been written in Perl. In order to address this audience, a HOWTO has been provided which demonstrates how to use the libdisasm opcode tables to implement a true disassembler using Perl:

Perl Disassembler HOWTO

It is hoped that the state of the art of disassemblers in Linux/UNIX will improve beyond mere objdump backends in the near future.


how does it work?

The basic usage of the library is:

  1. initialize the library, using x86_init()
  2. disassemble stuff, using x86_disasm()
  3. un-initialize the library, using x86_cleanup()

These routines have the following prototypes:

	int x86_init( enum x86_options options, DISASM_REPORTER reporter, void *arg);

	unsigned int x86_disasm( unsigned char *buf, unsigned int buf_len,
	                         unsigned long buf_rva, unsigned int offset,
	                         x86_insn_t * insn );
	int x86_cleanup(void);
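
The reporter and arg parameters of x86_init() suggest the usual C callback-with-context idiom: the library calls back into the application and hands over the arg pointer it was given at init time. The sketch below illustrates only that idiom and does not link against libdisasm. demo_reporter_t, demo_init(), demo_report(), count_errors() and the DEMO_REPORT_* codes are hypothetical stand-ins; the real DISASM_REPORTER signature and report codes should be taken from libdis.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical report codes; the real ones are defined in libdis.h. */
enum demo_report_code { DEMO_REPORT_BOUNDS, DEMO_REPORT_INVALID_INSN };

/* Hypothetical callback type, modelled on the (reporter, arg) pair. */
typedef void (*demo_reporter_t)(enum demo_report_code code, void *arg);

/* "Library" side: the callback and opaque pointer stored at init time. */
static demo_reporter_t g_reporter = NULL;
static void *g_reporter_arg = NULL;

/* Stand-in for an init routine; the return value 1 is an arbitrary choice. */
int demo_init(demo_reporter_t reporter, void *arg)
{
    g_reporter = reporter;
    g_reporter_arg = arg;
    return 1;
}

/* Called by the "library" when it hits a problem; a NULL reporter is ignored. */
void demo_report(enum demo_report_code code)
{
    if (g_reporter != NULL)
        g_reporter(code, g_reporter_arg);
}

/* "Application" side: per-caller state reached through the arg pointer. */
typedef struct {
    unsigned int bounds;
    unsigned int invalid;
} demo_error_counts_t;

void count_errors(enum demo_report_code code, void *arg)
{
    demo_error_counts_t *counts = arg;
    assert(counts != NULL);
    if (code == DEMO_REPORT_INVALID_INSN)
        counts->invalid++;
    else
        counts->bounds++;
}
```

Passing the counters through arg keeps the callback free of application globals, which is presumably why x86_init() accepts the extra pointer.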

Instructions are disassembled to an intermediate format:

typedef struct { 
        enum x86_op_type        type;           /* operand type */
        enum x86_op_datatype    datatype;       /* operand size */
        enum x86_op_access      access;         /* operand access [RWX] */
        enum x86_op_flags       flags;          /* misc flags */
        union {
                /* sizeof will have to work on these union members! */
                /* immediate values */
                char            sbyte;
                short           sword;
                long            sdword;
                qword           sqword;
                unsigned char   byte;
                unsigned short  word;
                unsigned long   dword;
                qword           qword;
                float           sreal;
                double          dreal;
                /* misc large/non-native types */
                unsigned char   extreal[10];
                unsigned char   bcd[10];
                qword           dqword[2];
                unsigned char   simd[16];
                unsigned char   fpuenv[28];
                /* absolute address */
                void            * address;
                /* offset from segment */
                unsigned long   offset;
                /* ID of CPU register */
                x86_reg_t       reg;
                /* offsets from current insn */
                char            relative_near;
                long            relative_far;
                /* effective address [expression] */
                x86_ea_t        expression;
        } data;
} x86_op_t;

typedef struct {
        /* information about the instruction */
        unsigned long addr;             /* load address */
        unsigned long offset;           /* offset into file/buffer */
        enum x86_insn_group group;      /* meta-type, e.g. INS_EXEC */
        enum x86_insn_type type;        /* type, e.g. INS_BRANCH */
        enum x86_insn_note note;        /* note, e.g. RING0 */
        unsigned char bytes[MAX_INSN_SIZE];
        unsigned char size;             /* size of insn in bytes */
        /* 16/32-bit mode settings */
        unsigned char addr_size;        /* default address size : 2 or 4 */
        unsigned char op_size;          /* default operand size : 2 or 4 */
        /* CPU/instruction set */
        enum x86_insn_cpu cpu;
        enum x86_insn_isa isa;
        /* flags */
        enum x86_flag_status flags_set; /* flags set by insn */
        enum x86_flag_status flags_tested; /* flags tested by insn */
        /* stack */
        unsigned char stack_mod;        /* 0 or 1 : is the stack modified? */
        long stack_mod_val;             /* val stack is modified by if known */
        
        /* the instruction proper */    
        enum x86_insn_prefix prefix;    /* prefixes ORed together */
        char prefix_string[MAX_PREFIX_STR]; /* prefixes [might be truncated] */
        char mnemonic[MAX_MNEM_STR];    
        x86_oplist_t *operands;         /* list of explicit/implicit operands */
        size_t operand_count;           /* total number of operands */
        size_t explicit_count;          /* number of explicit operands */
} x86_insn_t;
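
Because the operand payload is a union, the type field has to be checked before any member of data is read. The sketch below shows that pattern on a cut-down operand struct. demo_op_t, the DEMO_OP_* tags and demo_format_operand() are inventions for illustration, not libdisasm names, and the output format is arbitrary; the real enum values are defined in libdis.h.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical type tags standing in for enum x86_op_type. */
enum demo_op_type { DEMO_OP_IMMEDIATE, DEMO_OP_OFFSET, DEMO_OP_RELATIVE_NEAR };

/* Cut-down operand: a tag plus a union, mirroring the shape of x86_op_t. */
typedef struct {
    enum demo_op_type type;
    union {
        unsigned long dword;         /* immediate value */
        unsigned long offset;        /* offset from segment */
        signed char   relative_near; /* offset from current insn */
    } data;
} demo_op_t;

/* Writes a text form of op into buf; returns snprintf's result,
 * or -1 if the tag is not one this sketch knows about. */
int demo_format_operand(const demo_op_t *op, char *buf, size_t len)
{
    assert(op != NULL && buf != NULL);
    switch (op->type) {
    case DEMO_OP_IMMEDIATE:
        return snprintf(buf, len, "0x%lX", op->data.dword);
    case DEMO_OP_OFFSET:
        return snprintf(buf, len, "[0x%lX]", op->data.offset);
    case DEMO_OP_RELATIVE_NEAR:
        return snprintf(buf, len, "%+d", op->data.relative_near);
    }
    return -1;
}
```

Only the member that matches the tag is ever read, so no operand is interpreted through the wrong union member.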

The x86_format_insn() routine can be used to generate a string representation:

	int x86_format_insn(x86_insn_t *insn, char *buf, int len, enum x86_asm_format); 

...so that a simple disassembler can be implemented in C with the following code:

   #include <stdio.h>
   #include <libdis.h>

   /* BUF_SIZE and LINE_SIZE are left for the caller to define */
   unsigned char buf[BUF_SIZE]; /* buffer of bytes to disassemble */
   char line[LINE_SIZE];    /* buffer of line to print */
   int pos = 0;             /* current position in buffer */
   int size;                /* size of instruction */
   x86_insn_t insn;         /* instruction */

   x86_init(opt_none, NULL, NULL);

   while ( pos < BUF_SIZE ) {
      /* disassemble the instruction at offset pos */
      size = x86_disasm(buf, BUF_SIZE, 0, pos, &insn);
      if ( size ) {
         /* print instruction */
         x86_format_insn(&insn, line, LINE_SIZE, intel_syntax);
         printf("%s\n", line);
         pos += size;
      } else {
         printf("Invalid instruction\n");
         pos++;
      }
   }

   x86_cleanup();
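
The control flow of that loop (advance by the decoded size, or skip a single byte when decoding fails) can be compiled and tested without the library by swapping in a toy decoder. In the sketch below, demo_disasm() is a hypothetical stand-in for x86_disasm() that recognizes only nop (0x90), ret (0xC3), int3 (0xCC) and the five-byte "mov eax, imm32" (0xB8). It reports everything else as invalid, which is not how real x86 decoding behaves. demo_walk() and demo_walk_stats_t are likewise illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for x86_disasm(): returns the instruction size in bytes,
 * or 0 if the bytes at offset are not one of the few opcodes it knows. */
unsigned int demo_disasm(const unsigned char *buf, unsigned int buf_len,
                         unsigned int offset)
{
    assert(buf != NULL);
    if (offset >= buf_len)
        return 0;
    switch (buf[offset]) {
    case 0x90: /* nop */
    case 0xC3: /* ret */
    case 0xCC: /* int3 */
        return 1;
    case 0xB8: /* mov eax, imm32: opcode plus 4 immediate bytes */
        return (buf_len - offset >= 5) ? 5 : 0;
    default:
        return 0;
    }
}

typedef struct {
    unsigned int insns;         /* instructions decoded */
    unsigned int invalid_bytes; /* bytes skipped after a failed decode */
} demo_walk_stats_t;

/* Same shape as the loop above: step by size on success, by 1 on failure. */
demo_walk_stats_t demo_walk(const unsigned char *buf, unsigned int buf_len)
{
    demo_walk_stats_t stats = {0, 0};
    unsigned int pos = 0;

    while (pos < buf_len) {
        unsigned int size = demo_disasm(buf, buf_len, pos);
        if (size) {
            stats.insns++;
            pos += size;
        } else {
            stats.invalid_bytes++;
            pos++;
        }
    }
    return stats;
}
```

Skipping one byte after a failure is the simplest way to resynchronize; it guarantees the loop terminates, at the cost of possibly decoding from the middle of an instruction.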


why not use libopcodes?

Get out.

No, really. Leave.


where are the files?

The latest release can always be found here:

libdisasm

what about support?

Support can be obtained through the bastard SourceForge project help system:

  • Submit a bug
  • Complain on the forum
  • Mail the coder


who's behind it?

  • mammon_, mere coder
  • ReZiDeNt, Militant Dairy Activist (MIA)
  • The Grugq, Chief Makefile Architect
  • MO_K, MIA Libi386 Enthusiast (MIA)
  • a_p, Invisible Tester
  • fbj, Visible Tester (MIA)
  • drb, n0ps 'r us
  • Kees Cook, masochist