BinNavi uses an SQL database to store all disassembly data. The BinNavi database format is open and tries to be a standardized way to store disassembled code in SQL databases. This part of the manual describes all necessary tables and gives a few example queries that can be used to read data from the database.
The distinction between exporting data from IDA and converting the exported data into a BinNavi module is reflected in the database too. Each BinNavi database has tables that contain the exported data (the so called raw data) and tables that contain the converted data (the so called BinNavi data). For most users only the layout and content of the raw data tables is important because this is the simpler database format, it is less likely to change in the future, and you can always rely on BinNavi itself to turn raw data into BinNavi data. For this reason only the raw data database format is described in this manual.
For each disassembled module, the exporter creates 14 new tables in the database: ex_?_address_comments, ex_?_address_references, ex_?_basic_blocks, ex_?_callgraph, ex_?_control_flow_graphs, ex_?_data, ex_?_expression_substitutions, ex_?_expression_trees, ex_?_expression_nodes, ex_?_expression_tree_nodes, ex_?_functions, ex_?_instructions, ex_?_operands, and ex_?_sections. The question mark in those table names is a placeholder for a unique ID (like 1, 2, 123, ...) that identifies the module in the database. Furthermore there is one table called modules in the database.
The modules table contains a list of all raw modules stored in the database.
modules | ||
Column Name | Column Type | Column Description |
id | serial NOT NULL | Unique identifier of the module. This is the same identifier used for the question mark placeholder in the table names of the module. |
name | text NOT NULL | Name of the module. In most cases this equals the name of the original input file. |
architecture | character varying(32) NOT NULL | Architecture string that identifies the target architecture of the original input file. Right now the strings x86-32, PPC-32, and ARM-32 are officially supported. |
base_address | bigint | The address of the original input file in memory if no relocation operation is applied to the file when it is loaded. |
exporter | character varying(256) NOT NULL | String that identifies the exporter that exported the module. |
version | integer NOT NULL | Version string that identifies the version of the disassembly table schema. This value must always be 3. |
md5 | character(32) NOT NULL | MD5 hash of the original input file of the module. |
sha1 | character(40) NOT NULL | SHA1 hash of the original input file of the module. |
comment | text | Arbitrary comment that can be used to describe the module. |
import_time | timestamp | Timestamp that specifies when the exporter exported the module to the database. |
Additional information:
The table ex_?_functions contains information about all functions of a module.
ex_?_functions | ||
Column Name | Column Type | Column Description |
address | bigint NOT NULL | Start address of the function. |
name | text NOT NULL | Name of the function. |
has_real_name | boolean NOT NULL | Flag that indicates whether the function name was generated by the disassembler (false) or whether it is a known function name, for example from an imported library (true). |
type | integer NOT NULL DEFAULT 0 | Identifies the type of the function. Valid values are 0 (Normal Function), 1 (Library Function), 2 (Imported Function), 3 (Thunk Function), 4 (Thunk Adjustor Function), and 5 (Unknown Function). |
module_name | text | In case of imported functions this field contains the name of the dynamic library the function is imported from. In case of functions that are not imported this field is null. |
Additional information:
The table ex_?_basic_blocks contains information about the basic blocks of the module. Please note that a basic block is not uniquely identified through its address because different functions can share the same basic block.
ex_?_basic_blocks | ||
Column Name | Column Type | Column Description |
id | integer NOT NULL | Identifier of the basic block. |
parent_function | bigint NOT NULL | Function the basic block belongs to. |
address | bigint NOT NULL | Start address of the basic block. |
Additional information:
The table ex_?_control_flow_graphs contains information about the branches between the different blocks inside each function.
ex_?_control_flow_graphs | ||
Column Name | Column Type | Column Description |
id | serial NOT NULL | Identifier of the branch. |
parent_function | bigint NOT NULL | Function the branch belongs to. |
source | integer NOT NULL | Source block of the branch. Note that the basic block identified by this value must be in the same function as the branch itself. |
destination | integer NOT NULL | Destination block of the branch. Note that the basic block identified by this value must be in the same function as the branch itself. |
type | integer NOT NULL DEFAULT 0 | Describes the type of the branch. Valid values are 0 (conditional branch executed), 1 (conditional branch not executed), 2 (unconditional branch), and 3 (switch branch). |
Additional information:
The table ex_?_instructions contains all disassembled instructions of the module and the basic blocks they belong to.
ex_?_instructions | ||
Column Name | Column Type | Column Description |
address | bigint NOT NULL | Address of the instruction. |
mnemonic | character varying(32) NOT NULL | Mnemonic of the instruction. |
data | bytea NOT NULL | Raw data bytes of the instruction. |
Additional information:
The table ex_?_operands describes all operands of an instruction.
ex_?_operands | ||
Column Name | Column Type | Column Description |
address | bigint NOT NULL | Identifies the instruction an operand belongs to. |
expression_tree_id | integer NOT NULL | Identifies the expression tree of the operand. |
position | integer NOT NULL | Gives the position of the operand. The value 0 means that the operand is the first operand of the instruction and so on. |
Additional information:
The table ex_?_expression_trees contains the unique identifiers of the expression trees that can appear as operands.
ex_?_expression_trees | ||
Column Name | Column Type | Column Description |
id | serial NOT NULL | Identifies an expression tree. |
The table ex_?_expression_nodes contains the expression tree nodes used to build the operand trees for each operand.
ex_?_expression_nodes | ||
Column Name | Column Type | Column Description |
id | serial NOT NULL | Identifies an expression. |
type | integer NOT NULL DEFAULT 0 | Describes the type of the expression. Valid values are 2 (Integer Literal), 3 (Float Literal), 4 (Operator), 5 (Register), 6 (Size Prefix), and 7 (Memory Dereference). |
symbol | haracter varying(256) | Unless the expr_type value is
an integer, this column contains the value of the operand. Valid values for operators are all kinds of infix operators like '+', '-', and '*', as well as prefix operators like x86 selectors ('ss:', 'cs:', ...). Valid values for size prefixes are b1 (byte), b2 (word), b4(dword), and b8 (qword). The only valid value of memory dereferences is '['. |
immediate | bigint | If the expr_type value is an integer literal, this column contains the integer value of the literal. |
position | integer | Specifies the position of the expression in the operand. This is useful to sort members of the expression tree (for example in binary expressions like addition). |
parent_id | integer | Parent expression of the expression or null if the expression is a root expression of the expression tree. |
Additional information:
The table ex_?_expression_tree_nodes specifies what expression tree nodes belong to what expression tree.
ex_?_expression_tree_nodes | ||
Column Name | Column Type | Column Description |
expression_tree_id | integer NOT NULL | Identifies an expression tree. |
expression_node_id | integer NOT NULL | Identifies an expression tree node. |
Additional information:
The table ex_?_expression_substitutions contains alternative strings for operand expressions. This is useful to display constants or variable names instead of raw expressions.
ex_?_expression_substitutions | ||
Column Name | Column Type | Column Description |
id | serial NOT NULL | Identifies the replacement operation. |
address | bigint NOT NULL | Instruction that contains the expression to be replaced. |
position | integer NOT NULL | Position of the operand that contains the expression to be replaced. |
expression_node_id | integer NOT NULL | Expression to be replaced. |
replacement | text NOT NULL | Alternative text for the expression. |
Additional information:
The table ex_?_address_references provides information about offsets of the module referenced by instruction operands. Instruction operands can either reference other instructions or arbitrary offsets of the module.
ex_?_address_references | ||
Column Name | Column Type | Column Description |
address | bigint NOT NULL | Identifies the instruction the operand belongs to. |
position | integer | Identifies the operand that references another address. |
expression_node_id | integer | Identifies the exact expression that references another address. |
destination | bigint NOT NULL | The memory address referenced by the expression. |
type | integer NOT NULL DEFAULT 0 | Describes the reference. Valid values are 0 (conditional branch executed), 1 (conditional branch not executed), 2 (unconditional branch), 3 (switch branch), 4 (direct call), 5 (indirect call), 6 (indirect virtual call), 7 (data), and 8 (data string). |
Additional information:
The table ex_?_address_comments contains comments associated with addresses of the module.
ex_?_address_comments | ||
Column Name | Column Type | Column Description |
address | bigint NOT NULL | Address associated with the comment. |
comment | text NOT NULL | The comment text. |
Additional information:
The table ex_?_callgraph describes how the individual functions of a module call each other. Each row of the table describes a single function call.
ex_?_callgraph | ||
Column Name | Column Type | Column Description |
serial NOT NULL | unsigned int not null | Identifies the function call. |
source | bigint NOT NULL | Start address of the function that contains the function call. |
source_basic_block_id | integer NOT NULL | Basic block that contains the function call. |
source_address | bigint NOT NULL | Function call instruction. |
destination | bigint NOT NULL | Start address of the called function. |
Additional information:
This section contains some sample queries for working with the database scheme. Some expectations about the disassembled module and how it is stored in the database lead to pitfalls that should be avoided.
mysql> select id, name from modules;
+----+-----------------+ | id | name | +----+-----------------+ | 1 | NOTEPAD.EXE | | 2 | calc.exe | | 3 | kernel32.dll | +----+-----------------+
mysql> select hex(address), name from ex_2_functions; +--------------+------------------+ | hex(address) | name | +--------------+------------------+ | 1001000 | RegOpenKeyExA | | 1001004 | RegQueryValueExA | | 1001008 | RegCloseKey | | 1001010 | SetBkColor | | 1001014 | SetTextColor | | 1001028 | GetProcAddress | | 1001030 | GlobalAlloc | | 1001034 | GlobalFree | | 1001038 | GlobalReAlloc | | 1001040 | Sleep | ...
mysql> select count(distinct(address)) as bbcount from ex_2_basic_blocks; +---------+ | bbcount | +---------+ | 2141 | +---------+
Note that the distinct is necessary because one basic block can belong to more than one function. These blocks should not be counted more than once.
It is possible to load all instructions of a function including all of their operands in just one SQL query.
select hex(ex_2_instructions.address), mnemonic, symbol, immediate, ex_2_operand_tuples.position, ex_2_expression_tree.id, ex_2_expression_tree.parent_id, ex_2_expression_tree.position from ex_2_basic_blocks, ex_2_instructions, ex_2_operand_tuples, ex_2_operand_expressions, ex_2_expression_tree where ex_2_basic_blocks.parent_function = 0x1011569 and ex_2_basic_blocks.id = ex_2_instructions.basic_block_id and ex_2_instructions.address = ex_2_operand_tuples.address and ex_2_operand_expressions.operand_Id = ex_2_operand_tuples.operand_id and ex_2_expression_tree.id = ex_2_operand_expressions.expr_id order by ex_2_instructions.address, sequence, ex_2_operand_tuples.position, parent_id +---------+----------+--------+-----------+----------+------+-----------+----------+ | address | mnemonic | symbol | immediate | position | id | parent_id | position | +---------+----------+--------+-----------+----------+------+-----------+----------+ | 1011569 | push | b4 | NULL | 0 | 1 | NULL | 0 | | 1011569 | push | esi | NULL | 0 | 12 | 1 | 0 | | 101156A | push | b4 | NULL | 0 | 1 | NULL | 0 | | 101156A | push | ss: | NULL | 0 | 3 | 1 | 0 | | 101156A | push | [ | NULL | 0 | 4 | 3 | 0 | | 101156A | push | + | NULL | 0 | 5 | 4 | 0 | | 101156A | push | esp | NULL | 0 | 59 | 5 | 0 | | 101156A | push | NULL | 12 | 0 | 53 | 5 | 1 | | 101156E | mov | b4 | NULL | 0 | 1 | NULL | 0 | | 101156E | mov | esi | NULL | 0 | 12 | 1 | 0 | | 101156E | mov | b4 | NULL | 1 | 1 | NULL | 0 | ...