I am playing around with "patching" an old DOS 16-bit real-mode .exe, trying to change which functions the machine code calls.

The disassembled code is:

push bp 
mov bp, sp 
xor ax, ax 
push ax 
mov ax, 1 
push ax 
mov ax, 64h 
push ax 
; locationX
call f_drawDialogBox; 9a 06 00 70 00 
nop ; locationY 
nop 
nop ; locationT1 
pop bp 
; locationT2 
retf

All I'm doing is inserting a NOP at locationX and deleting a NOP at locationY, which just moves the call 1 byte down (the function stays the same size). This completely breaks the program: it crashes.

No problems arise when I move a NOP from locationT1 to locationT2.

I was told that this is because I'm upsetting the stack, that relocations are "to blame", and that I should read about DOS relocations. I've searched the web for quite a while but couldn't find anything specific to DOS (there's too much Windows material floating around).

Can anybody give me an example of how to get something like this to work, or point me to a tutorial or a good read? My goal is to modify the machine code by changing one of the functions and making it CALL other functions.

Thanks,
-p

9A 06 00 70 00 == call far absolute 0070:0006h (706h)
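To make the decoding concrete, here is a small Python sketch (the helper names are hypothetical). It assumes the five bytes were read from the .exe on disk, so the segment word 0070h is still the unrelocated value, relative to the start of the load module, and has not yet been adjusted by the DOS loader.

```python
import struct


def decode_call_far(code: bytes) -> tuple[int, int]:
    """Decode a direct far call (opcode 9Ah, ptr16:16) into (segment, offset)."""
    if len(code) < 5 or code[0] != 0x9A:
        raise ValueError("expected opcode 9Ah followed by a 4-byte pointer")
    # The pointer is stored offset first, then segment, each little-endian.
    offset, segment = struct.unpack_from("<HH", code, 1)
    return segment, offset


def linear(segment: int, offset: int) -> int:
    """Real-mode linear address: segment * 16 + offset."""
    return segment * 16 + offset


seg, off = decode_call_far(bytes.fromhex("9A06007000"))
print(f"call far {seg:04X}:{off:04X}h (linear {linear(seg, off):X}h)")
# call far 0070:0006h (linear 706h)
```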

Typically, assemblers generate relative calls instead of absolute ones unless the source directly gives an absolute address, which is my guess as to what is going on. If that is the case, you should be using labels instead. If it isn't, there could be a problem with appending to .obj outputs: the immediate absolute address in that instruction doesn't get updated, and you may have to rebuild from scratch. With the information I have here, that is my only guess. You may want to check my theory by disassembling with a NOP at locationX and seeing whether the far call's address changes.

My bad: all direct far calls use absolute addresses. I still think that is the source of the conflict, though, and the reason why shifting the code by one byte crashes the program.
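Here is a rough Python simulation of why the one-byte shift crashes, assuming a simplified loader that adds a load segment to the 16-bit word at each offset in a relocation list. The byte encoding of the routine, the load segment 1000h, and treating this one routine as the whole load module are assumptions for illustration; real MZ relocation entries are segment:offset pairs rather than flat offsets. The last step shows the repair: every relocation entry lying between the inserted NOP and the deleted NOP must move by one byte as well. It also suggests why moving a NOP from locationT1 to locationT2 was harmless: no relocated word lies between those two points.

```python
import struct

# Hypothetical encoding of the routine (one valid encoding per instruction).
PROLOGUE = bytes.fromhex("558BEC33C050B8010050B8640050")  # push bp ... mov ax, 64h / push ax
CALL_FAR = bytes.fromhex("9A06007000")                     # call far 0070:0006 (unrelocated)
TAIL = bytes.fromhex("9090905DCB")                         # nop, nop, nop, pop bp, retf

LOAD_SEG = 0x1000  # assumed segment where DOS placed the image


def load(image: bytes, relocs: list[int], load_seg: int) -> bytearray:
    """Simplified loader: add load_seg to the 16-bit word at each relocation offset."""
    mem = bytearray(image)
    for off in relocs:
        (word,) = struct.unpack_from("<H", mem, off)
        struct.pack_into("<H", mem, off, (word + load_seg) & 0xFFFF)
    return mem


def call_target(mem: bytearray, call_at: int) -> str:
    """Format the segment:offset operand of the far call starting at call_at."""
    offset, segment = struct.unpack_from("<HH", mem, call_at + 1)
    return f"{segment:04X}:{offset:04X}"


insert_at = len(PROLOGUE)              # locationX: where the NOP is inserted
delete_at = insert_at + len(CALL_FAR)  # locationY: the NOP that gets deleted
relocs = [insert_at + 3]               # the only fixup: the call's segment word

original = PROLOGUE + CALL_FAR + TAIL
patched = PROLOGUE + b"\x90" + CALL_FAR + TAIL[1:]  # same length, call shifted by 1

print(call_target(load(original, relocs, LOAD_SEG), insert_at))     # 1070:0006
# Stale table: the loader still patches the old offset, which is now the wrong word.
print(call_target(load(patched, relocs, LOAD_SEG), insert_at + 1))  # 0080:0006
# Fix: move every entry between the inserted and the deleted byte by +1.
fixed = [r + 1 if insert_at <= r < delete_at else r for r in relocs]
print(call_target(load(patched, fixed, LOAD_SEG), insert_at + 1))   # 1070:0006
```

With the stale table the call jumps to 0080:0006, deep in low memory, which is consistent with the crash described above.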

Hey, thanks for the reply. I think I understand the issue now: it's definitely relocation. Here's some good info on the subject (stolen off a newsgroup; this was a post by Jack Klein):

The brief overview is this:

The part of the exe format that contains the image of the code is put together with the assumption that the executable will be loaded into memory at 0000:0000 (segment 0).

Let's assume for a moment that the object files for your program contain two code segments of 4K bytes each and one data segment of 4K bytes, and that the two code segments come first in the image. The very first line of the first code segment is a call to a subroutine that starts on the very first line of the second code segment; that is, file1.asm contains this:

extern func2:far
start:
    call func2

...and file2.asm contains this:

public func2
func2 proc far
    mov ax, my_data_segment
    mov ds, ax

Now if the executable were actually loaded into memory at 0000:0000, the code for file1 would start at 0000:0000 and end at 0000:0FFF, the code for file2 (and the address of func2) would start at 0100:0000 and end at 0100:0FFF, and the data segment would start at 0200:0000.

So the code image part of the executable file uses those segment values:

call func2                 9A00000001   ; last 16-bit word (bytes 00 01) holds segment 0100
mov ax, my_data_segment    B80002       ; 16-bit immediate (bytes 00 02) holds segment 0200

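Those byte strings follow from the instruction encodings: a direct far call is opcode 9Ah followed by the offset word and then the segment word, and `mov ax, imm16` is B8h followed by the immediate, all little-endian. A small Python sketch under those assumptions (the helper names are hypothetical) that reproduces the two unrelocated encodings, deriving the segment numbers from the 4K-per-segment layout:

```python
import struct

PARAGRAPH = 16  # bytes per real-mode segment step


def call_far(segment: int, offset: int) -> bytes:
    # 9Ah, then the offset word, then the segment word (both little-endian).
    return b"\x9A" + struct.pack("<HH", offset, segment)


def mov_ax_imm16(value: int) -> bytes:
    # B8h is "mov ax, imm16"; the 16-bit immediate follows, little-endian.
    return b"\xB8" + struct.pack("<H", value)


# Layout from the example: file1 code (4K), file2 code (4K), data (4K),
# with the image assumed to start at segment 0000.
FUNC2_SEG = 0x1000 // PARAGRAPH  # starts 4K into the image -> 0100h
DATA_SEG = 0x2000 // PARAGRAPH   # starts 8K into the image -> 0200h

print(call_far(FUNC2_SEG, 0x0000).hex().upper())  # 9A00000001
print(mov_ax_imm16(DATA_SEG).hex().upper())       # B80002
```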
Now we know that the code will not really be loaded at segment 0000. It might be loaded, for example, at segment 4000. That means that the call to func2 needs to become:

9A00000041   (call func2 at 4100:0000)

...and the load of the data segment value needs to become:

B80042       (mov ax, 4200)

In fact, every 16-bit value in the program image that represents a numerical segment can be adjusted for where the program is loaded by adding the load segment number to the value in the image.

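As a rough illustration of that adjustment, this Python sketch takes the two instructions exactly as the linker wrote them and adds an assumed load segment of 4000h to each segment word. The list of word offsets (3 and 6) plays the role of the relocation table here; it was worked out by hand for this two-instruction example, and `relocate` is a hypothetical name.

```python
import struct


def relocate(image: bytearray, word_offsets: list[int], load_seg: int) -> None:
    """Add load_seg to each 16-bit segment word named in word_offsets (in place)."""
    for off in word_offsets:
        (seg,) = struct.unpack_from("<H", image, off)
        struct.pack_into("<H", image, off, (seg + load_seg) & 0xFFFF)


# "call func2" followed by "mov ax, my_data_segment", as emitted for segment 0000.
image = bytearray.fromhex("9A00000001" "B80002")
# Segment words: offset 3 (last word of the call) and offset 6 (the mov's immediate).
relocate(image, [3, 6], 0x4000)
print(image.hex().upper())  # 9A00000041B80042
```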
So the relocation table in the header at the front of the exe file contains the location in the executable image (as a segment:offset pair relative to the start of the image) of every single 16-bit word that represents the numerical value of a segment. After loading the code image part of the exe file into memory (starting at some segment), the loader, which is DOS's EXEC function (INT 21h, function 4Bh) rather than command.com itself, reads the relocation table entries. For every entry it just adds the actual load segment value to the relative segment value already in the word. So all of the segment fix-ups are done before the program starts running.

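For the actual on-disk format, here is a Python sketch that reads the MZ header's relocation table and applies the fix-ups the way the text describes. Field offsets follow the standard MZ header layout (bytes on last page at 02h, page count at 04h, number of relocations at 06h, header size in paragraphs at 08h, relocation table offset at 18h; each entry is an offset word followed by a segment word). The function names and the toy executable are made up for illustration, and `load_seg` stands in for whatever segment DOS actually chooses.

```python
import struct


def read_mz_relocations(exe: bytes) -> list[int]:
    """Byte offsets (within the load module) of every segment word to fix up."""
    if exe[:2] not in (b"MZ", b"ZM"):
        raise ValueError("not an MZ executable")
    (count,) = struct.unpack_from("<H", exe, 0x06)     # e_crlc: number of entries
    (table_at,) = struct.unpack_from("<H", exe, 0x18)  # e_lfarlc: table file offset
    positions = []
    for i in range(count):
        off, seg = struct.unpack_from("<HH", exe, table_at + 4 * i)
        positions.append(seg * 16 + off)  # segment:offset relative to the image start
    return positions


def load_module(exe: bytes) -> bytearray:
    """Slice out the code/data image: after the header, up to the size in e_cp/e_cblp."""
    last_page, pages, _count, header_paras = struct.unpack_from("<4H", exe, 0x02)
    end = pages * 512 - ((512 - last_page) if last_page else 0)
    return bytearray(exe[header_paras * 16:end])


def apply_fixups(exe: bytes, load_seg: int) -> bytearray:
    """Return the image as it would look in memory once loaded at load_seg."""
    image = load_module(exe)
    for pos in read_mz_relocations(exe):
        (word,) = struct.unpack_from("<H", image, pos)
        struct.pack_into("<H", image, pos, (word + load_seg) & 0xFFFF)
    return image


# Toy MZ file: 32-byte header (2 paragraphs) with one relocation entry 0000:0003,
# which names the segment word of the "call far 0100:0000" at image offset 0.
code = bytes.fromhex("9A00000001CB")  # call far 0100:0000 ; retf
header = bytearray(32)
header[0:2] = b"MZ"
total = len(header) + len(code)
struct.pack_into("<HH", header, 0x02, total % 512, (total + 511) // 512)  # e_cblp, e_cp
struct.pack_into("<H", header, 0x06, 1)      # e_crlc
struct.pack_into("<H", header, 0x08, 2)      # e_cparhdr
struct.pack_into("<H", header, 0x18, 0x1C)   # e_lfarlc
struct.pack_into("<HH", header, 0x1C, 3, 0)  # entry: offset 0003, segment 0000
toy = bytes(header) + code

print(read_mz_relocations(toy))                 # [3]
print(apply_fixups(toy, 0x4000).hex().upper())  # 9A00000041CB
```

When patching a real file, the same table is what has to be edited if a relocated word moves: bump the matching entry's offset so the loader still lands on the segment word.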
As for how high-level languages work, compilers for DOS and/or 16-bit Windows provide different memory models. These are the common names, although some compilers use slight variations:

small (code and data each limited to 64K, all calls are near, all pointers are near)
medium (multiple code segments of up to 64K each, so total code can be much larger than 64K; all calls are far, all data pointers are near)
compact (single code segment limited to 64K, multiple data segments of up to 64K each, so total data can be more than 64K; all calls are near, data pointers are far)
large (multiple code segments and multiple data segments, so both code and data can be larger than 64K; calls are far and data pointers are far)

The compiler then comes with multiple versions of the library, at least one for each memory model. Often the IDE setting or command-line switch that selects the memory model also tells the compiler which specific version of the library to ask the linker for.

Generally the compiler breaks code and data into segments by source file; that is, each source file creates one code segment for the function(s) it defines and one data segment for whatever data it defines. Often there are extended keywords to override this and customize memory usage in more detail.

Of course, on more modern processors (the 386 and up) and operating systems (Windows, Linux, etc.), segments are gone from the application program level. Programs are written to run in a flat 32-bit (or, today, 64-bit) address space, and the operating system uses the processor's memory-management hardware to map each program's address space to specific physical memory addresses without needing to fix up segment values.


-p

Apart from engineering a system crash, what is the point of trying to patch machine code without access to the original source code?

Don't laugh, but I'm reverse engineering an old DOS game and patching some of its code to understand how it works, since the game is online-only :) I actually got it to work.

Best regards,
-pc
