Dumping a process means taking a snapshot of the executable's state at a particular point in time. This is usually done after the packer has completely decrypted the original code and has reached the OEP (original entry point). The snapshot is saved to disk after the import table, redirected API calls, etc. have been fixed. Because the dump contains the recovered original code, it can be reverse engineered. Here we look at some of the anti-dumping techniques, common in packers, that prevent the reverser from capturing the decrypted executable.

Anti-dumping techniques

Nanomites

Nanomites are a powerful anti-dumping technique. The technique replaces certain branch instructions with int 3h. The removed jump instructions are recorded in a table that is encrypted many times over. Each table entry holds data such as the address, the jump type (JMP, JZ, JLE, etc.), the offset, and whether the entry is a Nanomite or a regular debug exception.

The problem is amplified by the presence of “false Nanomites”. The generated table contains a mixture of two kinds of entries:

- genuine int 3h breakpoints that will actually be executed;
- entries pointing at 0xCC bytes that are merely part of other instructions and, under normal conditions, are never executed.

Consider this instruction:

[plain]
A3 CC884000    MOV DWORD PTR DS:[4088CC],EAX
[/plain]

Its encoding contains the byte 0xCC. The packer generates a table entry with an address pointing to this byte. However, the byte is part of the instruction's operand and will never be executed as an int 3h under normal conditions.

When the injected breakpoints fire, they can be caught in any of the following ways:

By self-debugging the process. The parent process spawns a child process as its debuggee (for example, by passing DEBUG_PROCESS to CreateProcess). It then waits for debug events using WaitForDebugEvent.

By hooking KiUserExceptionDispatcher.

By using SEH (Structured Exception Handling). This is risky, since the top-level handler can be replaced later on in the code.

By registering a Vectored Exception Handler (VEH) with RtlAddVectoredExceptionHandler. This is also risky, since the application can later register its own vectored handler ahead of ours and intercept the exception first.
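Returning to the false-Nanomite problem described above, consider a tool that tries to find Nanomites by scanning code for 0xCC. The portable C sketch below runs such a scan over the MOV DWORD PTR DS:[4088CC],EAX encoding from the text, followed by one genuine int 3h. The scan reports both bytes. Only instruction-boundary information, which a byte scanner does not have, can tell them apart. The function names are hypothetical, and the instruction lengths are hand-entered assumptions rather than disassembler output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical code fragment: A3 CC 88 40 00 is
   MOV DWORD PTR DS:[4088CC],EAX, followed by one genuine int 3h (0xCC). */
static const uint8_t code[] = { 0xA3, 0xCC, 0x88, 0x40, 0x00, 0xCC };

/* Hand-decoded lengths of the instructions in code[], in order. */
static const size_t insn_len[] = { 5, 1 };

/* Naive Nanomite finder (hypothetical name): store the offset of every 0xCC
   byte in out[] (at most max_out entries) and return how many were stored. */
size_t find_cc_bytes(const uint8_t *buf, size_t n, size_t *out, size_t max_out)
{
    assert(buf != NULL && out != NULL);
    size_t found = 0;
    for (size_t i = 0; i < n && found < max_out; i++) {
        if (buf[i] == 0xCC)
            out[found++] = i;
    }
    return found;
}

/* Return 1 if off is the first byte of an instruction, 0 otherwise.
   This needs the instruction lengths, which a byte scanner does not have. */
int is_instruction_start(size_t off, const size_t *lens, size_t count)
{
    size_t pos = 0;
    for (size_t k = 0; k < count; k++) {
        if (pos == off)
            return 1;
        pos += lens[k];
    }
    return 0;
}
```

Both 0xCC bytes look identical to the scanner. The one at offset 1 is an operand byte (a false Nanomite), and replacing it would corrupt the MOV instruction.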

Here’s a fairly simple implementation of Nanomites taken from Nanomites.w32 by Deroko, a sample polymorphic virus. It lacks the complexity of Armadillo’s Nanomites, but it is good enough to explain the concept. The code consists of three parts:

The declaration of the Nanomite as a macro.

The use of the Nanomite in the virus code.

The handler that takes care of the nano jumps.

[plain]
nanojmp macro jmp_t, __xxx
        local nano
nano:   int 3h
        db jmp_t
        dd offset __xxx - offset nano
endm
[/plain]

The above snippet is the first part. The macro expands to an instruction sequence with three elements:

- an interrupt (int 3h);
- a jump type (JMP, JNZ, etc.) given by jmp_t;
- a relative displacement from the int 3h to the jump target.

Unlike Armadillo, the jump details are stored right next to the exception-causing instruction (int 3h), and they are not encrypted. A separate encrypted Nanomite jump table and false Nanomites could be added on top of the code Deroko has written.

[plain]
xor  edx, edx
lea  eax, [ebp+sehhandle]
push eax
push dword ptr FS:[edx]
mov  dword ptr FS:[edx], esp
call IsDebugPresent
test eax, eax
nanojmp jmp_jnz, getdelta
[/plain]

The above code shows how the nanojmp macro is used. sehhandle is the structured exception handler (SEH) that will catch the breakpoint exception. The first five instructions install it as the topmost handler by pushing a new exception registration record and pointing FS:[0] at it. The code then checks whether a debugger is attached to the process.

The last line invokes nanojmp. The first argument is the jump type (jump if not zero). The second argument is the label getdelta, which is actually the starting point of the virus.

The procedure below is the exception handler, which fires whenever execution hits a Nanomite:

[plain]
sehhandle proc C pException:dword, pFrame:dword, pContext:dword, param:dword
        mov edx, pException
        mov eax, 1
        cmp [edx.ER_ExceptionCode], EXCEPTION_BREAKPOINT
        jne __exit_handle
[/plain]

Above, the handler checks whether the exception is a breakpoint exception, the only kind we need to be concerned with. EAX is preset to 1 (ExceptionContinueSearch), so any other exception is passed on to the next handler in the chain.
[plain]
        mov edi, pContext
        xor esi, esi
        mov [edi.CONTEXT_Dr0], esi
        mov [edi.CONTEXT_Dr1], esi
        mov [edi.CONTEXT_Dr2], esi
        mov [edi.CONTEXT_Dr3], esi
[/plain]

Here the context record address is copied to EDI and used to clear any hardware breakpoints (Dr0 to Dr3) set by a debugger. This is another anti-debug measure.

[plain]
        mov ebx, [edi.CONTEXT_EFlags]
        mov esi, [edi.CONTEXT_Eip]
        inc esi
[/plain]

Here the EFlags register and the instruction pointer are copied to EBX and ESI respectively. The instruction pointer points to the exception-causing int 3h. ESI is then incremented to point to the jump type stored for the Nanomite that fired the exception. As we saw in the macro, this is the byte following the int 3h instruction.

[plain]
        xor eax, eax
        lodsb                   ; al = stored jump type
        cmp eax, jmp_jz
        jne __skip0
        lodsd                   ; eax = relative displacement
        test ebx, 40h           ; ZF
        jnz __follow
        jmp __notfollow
__skip0:
        cmp eax, jmp_jnz
        jne __skip1
        lodsd
        test ebx, 40h
        jz __follow
        jmp __notfollow
__skip1:
        cmp eax, jmp_jmp
        jne __skip2
        lodsd
        jmp __follow
__skip2:
        cmp eax, jmp_jc
        jne __skip3
        lodsd
        test ebx, 1h            ; CF
        jnz __follow
        jmp __notfollow
__skip3:
        cmp eax, jmp_jnc
        jne __skip4
        lodsd
        test ebx, 1h
        jz __follow
        jmp __notfollow
__skip4:
__notfollow:
        add [edi.CONTEXT_Eip], 6h
        xor eax, eax
        jmp __exit_handle
__follow:
        add [edi.CONTEXT_Eip], eax
        xor eax, eax
        jmp __exit_handle
__exit_handle:
        ret
sehhandle endp
[/plain]

In the rest of the code, the stored jump type is loaded and compared, switch-style, against each supported jump (JZ, JNZ, JMP, JC, JNC). After identifying the jump, the handler tests the saved EFlags to decide whether the jump should be taken. Bit 6 (40h) is the zero flag and bit 0 (1h) is the carry flag. If the jump is taken, the handler adds the relative displacement stored after the jump type to the address of the exception-causing instruction, which it gets from the context record.
If the jump is not taken, the handler instead adds the length of the Nanomite record, 6 bytes, to that address. The record consists of one byte for the int 3h instruction, one byte for the jump type, and four bytes for the relative displacement. Either way, EAX is set to 0 (ExceptionContinueExecution), so execution resumes at the updated EIP. This code captures the essence of Nanomites, and much can be built on top of it, such as:

Separating the table from the int 3h instruction and storing it in a different section.

Adding false Nanomite entries to the table to prevent automatic parsing and replacement of int 3h in the code by scanning for 0xCC bytes.

Encrypting the table for storing Nanomite entries, obfuscating the decryption routine, and adding anti-debug checks to it.
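The handler’s decision logic can be modeled outside Windows. The portable C sketch below mirrors the record layout produced by nanojmp: an int 3h, a type byte, and a 32-bit displacement measured from the int 3h. It also mirrors the ZF/CF tests in sehhandle, but returns the new Eip instead of writing it into a CONTEXT. The numeric jump-type codes and the function names nano_encode and nano_resolve are hypothetical. This is a model for experimenting with the logic, not Deroko’s code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Jump-type codes. The values Deroko assigns to jmp_jz, jmp_jnz, ... are not
   shown in the article; these numbers are hypothetical. */
enum { JMP_JZ = 0, JMP_JNZ = 1, JMP_JMP = 2, JMP_JC = 3, JMP_JNC = 4 };

#define EFLAGS_CF 0x01u   /* bit 0, tested with "test ebx, 1h"  */
#define EFLAGS_ZF 0x40u   /* bit 6, tested with "test ebx, 40h" */
#define NANO_LEN  6u      /* int 3h + type byte + 32-bit displacement */

/* Write one record as the nanojmp macro lays it out:
   0xCC, type, target - (address of the 0xCC), displacement in host order. */
void nano_encode(uint8_t *at, uint8_t type, int32_t disp)
{
    at[0] = 0xCC;
    at[1] = type;
    memcpy(at + 2, &disp, sizeof disp);
}

/* Model of sehhandle: given the code image, the offset of the int 3h that
   fired (Context.Eip) and the saved EFlags, return the new Eip. */
uint32_t nano_resolve(const uint8_t *image, uint32_t eip, uint32_t eflags)
{
    assert(image[eip] == 0xCC);
    uint8_t type = image[eip + 1];
    int32_t disp;
    memcpy(&disp, image + eip + 2, sizeof disp);

    int taken;
    switch (type) {
    case JMP_JZ:  taken = (eflags & EFLAGS_ZF) != 0; break;
    case JMP_JNZ: taken = (eflags & EFLAGS_ZF) == 0; break;
    case JMP_JMP: taken = 1; break;
    case JMP_JC:  taken = (eflags & EFLAGS_CF) != 0; break;
    case JMP_JNC: taken = (eflags & EFLAGS_CF) == 0; break;
    default:      taken = 0; break;  /* unknown types fall into __notfollow */
    }
    /* __follow adds the displacement; __notfollow skips the 6-byte record */
    return taken ? eip + (uint32_t)disp : eip + NANO_LEN;
}
```

For example, a JZ record at offset 10 with displacement 30 resolves to 40 when ZF is set and to 16 when it is clear.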

Stolen bytes (code splicing)

The point at which the program starts execution is called the original entry point (OEP). Packers generally unpack the whole program in memory and then jump to its OEP. Code splicing (stolen bytes) works by copying a few bytes from the OEP and moving them to separate memory for execution, which hides the actual OEP. Strictly speaking, it should be called stolen instructions rather than stolen bytes.

The packer generally moves the first three or four instructions at the OEP into routines that are highly obfuscated and riddled with anti-debug checks. These routines are most probably within the packer’s own code section. After executing these few instructions, the packer jumps to the original decrypted code that follows the stolen instructions. The location that held the stolen instructions is filled with zeroes or junk code.

The catch with code splicing is that packers can’t steal instructions from just anywhere. The packer has to know the size of each instruction before it can move it to another location. It will execute the instruction there, and a half-copied instruction will cause an exception. Packers can therefore steal only those instructions that are sure to be present at the OEP, such as the compiler-generated entry code below:

[plain]
PUSH EBP
MOV EBP, ESP
PUSH -1
PUSH Shellcod.004050A0
PUSH Shellcod.00401D5C
MOV EAX, DWORD PTR FS:[0]
PUSH EAX
MOV DWORD PTR FS:[0],ESP
[/plain]

The instruction PUSH Shellcod.004050A0 and the PUSH following it each have a fixed size of 5 bytes, so they can also be moved. It doesn’t matter what values they push.

To move instructions, we need to know their sizes and starting addresses. Knowing an instruction’s starting address requires knowing the size of the instruction preceding it, which is why taking instructions from the OEP is the easier choice. Now we are going to see a very simple method to implement the code splicing technique.
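The size constraint can be made concrete. The portable C sketch below records the encoded length of each instruction in the entry code above and computes how many whole instructions fit within a byte budget. With a 10-byte budget it selects PUSH EBP, MOV EBP,ESP, PUSH -1 and the first PUSH imm32, which matches the 10 bytes stolen in the example that follows. The function name stealable_bytes is hypothetical. The lengths were decoded by hand assuming the usual 32-bit encodings, not produced by a disassembler.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-decoded lengths, in bytes, of the entry-point instructions listed
   above (32-bit x86 encodings; these are assumptions, not tool output). */
static const size_t prologue_len[] = {
    1,  /* PUSH EBP                   55                */
    2,  /* MOV EBP, ESP               8B EC             */
    2,  /* PUSH -1                    6A FF             */
    5,  /* PUSH 004050A0              68 imm32          */
    5,  /* PUSH 00401D5C              68 imm32          */
    6,  /* MOV EAX, DWORD PTR FS:[0]  64 A1 00000000    */
    1,  /* PUSH EAX                   50                */
    7,  /* MOV DWORD PTR FS:[0], ESP  64 89 25 00000000 */
};

/* Hypothetical helper: return how many bytes can be stolen without splitting
   an instruction, up to limit; *count receives the number of whole instructions. */
size_t stealable_bytes(const size_t *lens, size_t n, size_t limit, size_t *count)
{
    assert(count != NULL);
    size_t total = 0, k = 0;
    while (k < n && total + lens[k] <= limit) {
        total += lens[k];
        k++;
    }
    *count = k;
    return total;
}
```

With a 14-byte budget the result is still 10 bytes. The next instruction is 5 bytes long, and copying only part of it would leave a broken instruction at the new location.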
Here a new section will be created, into which the stolen instructions will be injected. Most of the code snippets are taken from “Inject your code to a Portable Executable” by Ashkbiz Danehkar; some were added by the author to suit this example.

[plain]
PIMAGE_DOS_HEADER image_dos_header;
PCHAR pDosStub;
DWORD dwDosStubSize, dwDosStubOffset;
PIMAGE_NT_HEADERS image_nt_headers;
PIMAGE_SECTION_HEADER image_section_header[MAX_SECTION_NUM];
PCHAR image_section[MAX_SECTION_NUM];
[/plain]

Before the code below runs, the file is opened and read into a buffer (pMem), and image_dos_header and image_nt_headers are extracted from it. From image_nt_headers we can recover the number of sections in the executable:

[plain]
SectionNum = image_nt_headers->FileHeader.NumberOfSections;
[/plain]

Here the header of every section in the executable is copied into image_section_header. dwRO_first_section holds the file offset of the section table.

[plain]
for (i = 0; i < SectionNum; i++)
{
    // allocate storage for the i-th header, then copy it out of the file buffer
    image_section_header[i] = (PIMAGE_SECTION_HEADER)GlobalAlloc(GMEM_FIXED | GMEM_ZEROINIT, sizeof(IMAGE_SECTION_HEADER));
    CopyMemory(image_section_header[i],
               pMem + dwRO_first_section + i * sizeof(IMAGE_SECTION_HEADER),
               sizeof(IMAGE_SECTION_HEADER));
}
[/plain]

The following loop recovers the data of each section, using the section headers extracted above:

[plain]
for (i = 0; i < SectionNum; i++)
{
    image_section[i] = (PCHAR)GlobalAlloc(GMEM_FIXED | GMEM_ZEROINIT,
        PEAlign(image_section_header[i]->SizeOfRawData,
                image_nt_headers->OptionalHeader.FileAlignment));
    CopyMemory(image_section[i],
               pMem + image_section_header[i]->PointerToRawData,
               image_section_header[i]->SizeOfRawData);
}
[/plain]

The code below adds a new section to the executable. dwSize is the size of the new section, and szName is its name. PEAlign rounds a value up to the next multiple of an alignment.

[plain]
DWORD PEAlign(DWORD dwTarNum, DWORD dwAlignTo)
{
    // round dwTarNum up to the next multiple of dwAlignTo
    return ((dwTarNum + dwAlignTo - 1) / dwAlignTo) * dwAlignTo;
}
. . .
DWORD roffset, rsize, voffset, vsize;
int i = image_nt_headers->FileHeader.NumberOfSections;   // index of the new section header
rsize   = PEAlign(dwSize, image_nt_headers->OptionalHeader.FileAlignment);
vsize   = PEAlign(rsize, image_nt_headers->OptionalHeader.SectionAlignment);
// place the new section right after the last existing one, in the file and in memory
roffset = PEAlign(image_section_header[i-1]->PointerToRawData + image_section_header[i-1]->SizeOfRawData,
                  image_nt_headers->OptionalHeader.FileAlignment);
voffset = PEAlign(image_section_header[i-1]->VirtualAddress + image_section_header[i-1]->Misc.VirtualSize,
                  image_nt_headers->OptionalHeader.SectionAlignment);
image_section_header[i] = (PIMAGE_SECTION_HEADER)GlobalAlloc(GMEM_FIXED, sizeof(IMAGE_SECTION_HEADER));
memset(image_section_header[i], 0, sizeof(IMAGE_SECTION_HEADER));
image_section_header[i]->PointerToRawData = roffset;
image_section_header[i]->VirtualAddress   = voffset;
image_section_header[i]->SizeOfRawData    = rsize;
image_section_header[i]->Misc.VirtualSize = vsize;
// code | execute | read | write, so the stolen instructions can run with DEP enabled
image_section_header[i]->Characteristics  = 0xE0000020;
memcpy(image_section_header[i]->Name, szName, strlen(szName));   // szName must fit in 8 bytes
image_section[i] = (PCHAR)GlobalAlloc(GMEM_FIXED | GMEM_ZEROINIT, rsize);
image_nt_headers->FileHeader.NumberOfSections++;
image_nt_headers->OptionalHeader.SizeOfImage = voffset + vsize;   // the image now ends after the new section
[/plain]

The new entry point can be set using the following code:

[plain]
DWORD OEP_RVA = image_nt_headers->OptionalHeader.AddressOfEntryPoint;
OEP_RVA += 10;   // OEP_RVA now points to the first instruction after the stolen ones
// point the entry point at the new section
image_nt_headers->OptionalHeader.AddressOfEntryPoint = image_section_header[i]->VirtualAddress;
DWORD newOEP = (DWORD)image_section[i];   // buffer that will receive the stolen bytes (32-bit build)
[/plain]

Now we have to find the old OEP, which is stored as an RVA. From it, we need to locate the OEP within the copied section data and extract the first ten bytes there. We start by iterating through the section table. For each section header, image_section_header[j]->VirtualAddress holds the starting RVA of the section, and image_section_header[j]->Misc.VirtualSize holds its size.
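The next paragraph spells out a containment check for locating the OEP. That check, together with the PEAlign rounding used when placing the new section, can be sketched in portable C. section_info is a hypothetical stand-in for IMAGE_SECTION_HEADER that carries only the fields used here, so the code builds without <windows.h>. rva_to_section and pe_align are illustrative names, not part of the original tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for IMAGE_SECTION_HEADER with only the fields used. */
typedef struct {
    uint32_t VirtualAddress;
    uint32_t VirtualSize;      /* Misc.VirtualSize in the real header */
    uint32_t PointerToRawData;
    uint32_t SizeOfRawData;
} section_info;

/* Same rounding as PEAlign: round value up to a multiple of align. */
uint32_t pe_align(uint32_t value, uint32_t align)
{
    assert(align != 0);
    return ((value + align - 1) / align) * align;
}

/* Find the section whose [VirtualAddress, VirtualAddress + VirtualSize)
   range contains rva. Return its index and store rva's offset inside the
   section in *offset, or return -1 if no section contains it. */
int rva_to_section(const section_info *s, int count, uint32_t rva, uint32_t *offset)
{
    assert(offset != NULL);
    for (int j = 0; j < count; j++) {
        if (rva >= s[j].VirtualAddress &&
            rva < s[j].VirtualAddress + s[j].VirtualSize) {
            *offset = rva - s[j].VirtualAddress;
            return j;
        }
    }
    return -1;
}
```

Adding the returned offset to the buffer holding that section’s data (image_section[j] in the tool) gives the location of the bytes to steal.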
A section is guaranteed to be loaded contiguously in memory, whether it is memory-mapped or loaded by the operating system. We check that our RVA is greater than or equal to the section’s VirtualAddress and less than VirtualAddress + VirtualSize. If both conditions hold, our RVA lies inside this section. Subtracting the section’s VirtualAddress from our RVA then gives the offset within the section where our instruction is stored. That is, the desired location is:

[plain]
// j is the index of the section that contains the original OEP
oldOEP = (DWORD)image_section[j] + ((OEP_RVA - 10) - image_section_header[j]->VirtualAddress);

__asm {
    mov eax, oldOEP                  // eax points at the original OEP in the section buffer
    mov ecx, newOEP                  // ecx points where the stolen bytes are being stored
    mov ebx, [eax]                   // steal bytes 0-3 from the OEP
    mov dword ptr [eax], 0           // replace the stolen bytes with zeroes
    mov [ecx], ebx
    add eax, 4
    add ecx, 4
    mov ebx, [eax]                   // steal bytes 4-7
    mov dword ptr [eax], 0
    mov [ecx], ebx
    add eax, 4
    add ecx, 4
    mov bx, word ptr [eax]           // steal bytes 8-9
    mov word ptr [eax], 0
    mov word ptr [ecx], bx
    add eax, 2
    add ecx, 2
    // trampoline code, written as little-endian dwords (bytes in memory order in comments)
    mov dword ptr [ecx], 0x835817EB  // EB 17 58 83
    add ecx, 4
    mov dword ptr [ecx], 0x088B09E8  // E8 09 8B 08
    add ecx, 4
    mov dword ptr [ecx], 0x0030A164  // 64 A1 30 00
    add ecx, 4
    mov dword ptr [ecx], 0x588B0000  // 00 00 8B 58
    add ecx, 4
    mov dword ptr [ecx], 0xFFD90308  // 08 03 D9 FF
    add ecx, 4
    mov byte ptr [ecx], 0xE3         // E3
    add ecx, 1
    mov ebx, OEP_RVA                 // OEP_RVA goes where the four 0xCC bytes are marked
    mov [ecx], ebx
    add ecx, 4
    mov dword ptr [ecx], 0xFFFFE4E8  // E8 E4 FF FF
    add ecx, 4
    mov byte ptr [ecx], 0xFF         // FF
}
[/plain]

The code injected into the newly created section has the following format:

[plain]
.
.
STOLEN INSTRUCTIONS (10 bytes)
.
.
    jmp  tick1
tick2:
    pop  eax             // eax = return address pushed by "call tick2"
    sub  eax, 9          // back over the call (5 bytes) and the 4 placeholder bytes
    mov  ecx, [eax]      // ecx contains OEP_RVA
    mov  eax, fs:[0x30]  // eax contains the address of the PEB
    mov  ebx, [eax + 8]  // ebx contains the image base address
    add  ebx, ecx        // address of the instruction after the stolen instructions
    jmp  ebx             // control transferred to that instruction
    __emit 0xcc          // four placeholder bytes, overwritten with OEP_RVA
    __emit 0xcc
    __emit 0xcc
    __emit 0xcc
tick1:
    call tick2
[/plain]

The injected code consists of the stolen instructions followed by a sequence that works as follows:

1. It uses the call/pop pair to find its own address and loads the previously stored OEP_RVA into ECX. OEP_RVA is the RVA of the instruction after the last stolen instruction.
2. It fetches the address of the PEB (Process Environment Block) and reads ImageBaseAddress from it, which gives the base address at which the executable is loaded.
3. Adding the base address to OEP_RVA (the original OEP + 10, held in ECX) gives the absolute address in EBX to which control is transferred.

At this point we have created a new section and placed in it the stolen instructions plus the extra code that lets execution continue without any exception. All that remains is to save the changes to a file. We write the DOS header and stub, the PE file header and optional header, all the section headers, and the section data we have changed. The technique above can be improved further in the following ways:

Using a lot of junk code between stolen instructions and the extra code that we have injected. This will make reversing the application a lot more difficult.

Using many anti-debug checks in the stolen code to hinder analysis under a debugger.
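Going back to the trampoline used in the splicing example: it is injected as raw bytes, so its layout is easy to get wrong. The portable C sketch below lists the trampoline’s machine code byte by byte, with offsets counted from the first byte after the stolen instructions, and patches OEP_RVA into the placeholder. It confirms the offsets the listing relies on:

- the short jump lands on call tick2 at offset 25;
- the call’s return address is offset 30;
- 30 - 9 = 21 is where the four 0xCC placeholder bytes start.

The name build_trampoline is hypothetical, and the encodings were assembled by hand for 32-bit x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hand-assembled machine code of the trampoline (32-bit x86). Offsets are
   relative to the first byte after the stolen instructions. */
static const uint8_t tramp_template[30] = {
    0xEB, 0x17,                         /*  0: jmp tick1 (to offset 25)  */
    0x58,                               /*  2: tick2: pop eax            */
    0x83, 0xE8, 0x09,                   /*  3: sub eax, 9                */
    0x8B, 0x08,                         /*  6: mov ecx, [eax]            */
    0x64, 0xA1, 0x30, 0x00, 0x00, 0x00, /*  8: mov eax, fs:[0x30]        */
    0x8B, 0x58, 0x08,                   /* 14: mov ebx, [eax + 8]        */
    0x03, 0xD9,                         /* 17: add ebx, ecx              */
    0xFF, 0xE3,                         /* 19: jmp ebx                   */
    0xCC, 0xCC, 0xCC, 0xCC,             /* 21: OEP_RVA placeholder       */
    0xE8, 0xE4, 0xFF, 0xFF, 0xFF        /* 25: tick1: call tick2 (-28)   */
};

#define TRAMP_SLOT 21   /* offset of the four 0xCC placeholder bytes */

/* Hypothetical helper: copy the trampoline to out and patch OEP_RVA into the
   placeholder, little-endian as x86 expects. Returns the bytes written. */
size_t build_trampoline(uint8_t *out, uint32_t oep_rva)
{
    assert(out != NULL);
    memcpy(out, tramp_template, sizeof tramp_template);
    for (int k = 0; k < 4; k++)
        out[TRAMP_SLOT + k] = (uint8_t)(oep_rva >> (8 * k));
    return sizeof tramp_template;
}
```

The comment on each line shows the bytes in memory order. These are the same bytes the inline assembly in the splicing example stores as little-endian dwords.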

Self-unmapping

When an executable is loaded, the data in its sections is mapped into the address space; that is, the image is simply a mapped view of the file. This view can be unmapped with UnmapViewOfFile(), like any ordinary mapped file. Before unmapping the loaded executable, however, we must copy all of its data to a separate location. Once the file is unmapped, the address ranges occupied by the sections of the image become invalid.

After relocating the image, we have to adjust all absolute references according to the new base address. This is done using the relocation table, so the executable must have been linked with relocations (not stripped with /FIXED). Once all relocations are fixed, we can unmap the original view of the image. Here is an example that does all of the above:

[plain]
PBYTE baseAddress = (PBYTE)GetModuleHandleA(NULL);                            // image base of the running executable
LONG  location_File_header = *(LONG *)(baseAddress + 0x3C);                   // IMAGE_DOS_HEADER.e_lfanew
DWORD size_of_image = *(DWORD *)(baseAddress + location_File_header + 0x50);  // OptionalHeader.SizeOfImage (PE32)
PBYTE new_base = (PBYTE)VirtualAlloc(NULL, size_of_image, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
long int new_execution_point;
[/plain]

In the above code, the size of the image is extracted from the PE header. Memory of that size is then allocated with read, write, and execute permissions.

[plain]
__asm {
    mov eax, new_base
    mov esi, baseAddress
    mov ecx, size_of_image
    lea edi, [eax + offset l1]
    sub edi, esi
    mov new_execution_point, edi   // address of l1 inside the copy
    mov edi, eax
    rep movsb                      // copy size_of_image bytes to new_base
}
[/plain]

Suppose EAX contains the address of the newly allocated memory, say 0x003F0000. The offset of l1, the location where the relocated code will resume execution, is 0x00401121. Then EDI = EAX + offset l1 = 0x007F1121. ESI contains 0x00400000, the image base of the executable, so EDI - ESI = 0x003F1121. This is the address l1 will have once the whole image has been copied to the newly allocated memory.
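The worked example above is plain 32-bit arithmetic and can be checked with a one-line function. The sketch below is portable C. The function name relocated_address is hypothetical, and the test uses the values from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Address that a label will have in a copy of the image (hypothetical name).
   The inline assembly computes (new_base + label_va) - image_base; doing the
   subtraction first gives the same 32-bit result. */
uint32_t relocated_address(uint32_t new_base, uint32_t label_va, uint32_t image_base)
{
    assert(label_va >= image_base);   /* the label must lie inside the image */
    return new_base + (label_va - image_base);
}
```

The intermediate value 0x007F1121 in the text is never a valid target. Only after subtracting the original image base does the result point into the copy.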
In short, we copy the executable image to the newly allocated memory. We then calculate the address within the relocated image where execution should resume and store it in new_execution_point.

[plain]
DWORD image_directory_basereloc_rva = *(DWORD *)((PBYTE)baseAddress + location_File_header + 0xA0);  // DataDirectory[BASERELOC].VirtualAddress
DWORD size_of_section = *(DWORD *)((PBYTE)baseAddress + location_File_header + 0xA4);                // DataDirectory[BASERELOC].Size
DWORD relocation_diff = (DWORD)((PBYTE)new_base - (PBYTE)baseAddress);
long int size_of_section_temp = size_of_section;
while (size_of_section_temp > 0)
{
    DWORD page_rva = *(DWORD *)((PBYTE)baseAddress + image_directory_basereloc_rva);
    DWORD size_relocation_block = *(DWORD *)((PBYTE)baseAddress + image_directory_basereloc_rva + 4);
    image_directory_basereloc_rva += 8;          // skip the block header
    size_of_section_temp -= size_relocation_block;
    size_relocation_block -= 8;                  // what remains are the 16-bit entries
    while (size_relocation_block > 0)
    {
        WORD relocation_value_type = *(WORD *)((PBYTE)baseAddress + image_directory_basereloc_rva);
        if (((relocation_value_type >> 12) & 0x000F) == IMAGE_REL_BASED_HIGHLOW)
        {
            WORD offset = relocation_value_type & 0x0FFF;
            DWORD *address_to_be_patched = (DWORD *)((PBYTE)new_base + page_rva + offset);
            *address_to_be_patched += relocation_diff;
        }
        size_relocation_block -= 2;
        image_directory_basereloc_rva += 2;
    }
}
[/plain]

The code above fixes up the relocations. First, it fetches the RVA and size of the relocation table from the base-relocation entry of the optional header’s data directory. Next, it calculates the relocation difference (new base minus old base), which will be added to every location a relocation entry points to. The relocation table is a series of blocks, each starting with the following header:

Virtual address: the RVA of the page to be fixed (4 bytes).

Size of block: the size of the relocation block, including this 8-byte header (4 bytes).

The header is followed by the relocation entries. They are 16-bit words whose high 4 bits indicate the type of relocation, for example:

IMAGE_REL_BASED_ABSOLUTE: the relocation is skipped. This type can be used to pad a relocation block so that the next block starts at a 4-byte boundary.
IMAGE_REL_BASED_HIGHLOW: the relocation adds the base-address difference to the 32-bit doubleword at the location denoted by the 12-bit offset.

The low 12 bits of each entry are the offset within the 4 KB page. Hence the address to be patched is the new base address plus the RVA of the page plus the offset within the page.

[plain]
__asm {
    push baseAddress             // argument: base of the original mapped image
    push new_execution_point     // fake return address: l1 inside the relocated copy
    jmp  UnmapViewOfFile
}
l1:   // execution continues from this location
[/plain]

Finally, we unmap the view of the previously loaded executable image. The values on the stack are arranged so that UnmapViewOfFile receives the old base address as its argument. Because the function uses the stdcall convention, it pops that argument when it returns and then continues at the copy of label l1 inside the relocated image.

This method is effective against memory dumping when the reverser relies on automated tools that dump only the static image (sections and headers). Such tools miss memory allocated dynamically by calls such as VirtualAlloc(). We can further improve this by:

Placing these routines in TLS callbacks, which are executed before the program reaches its OEP.
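Returning to the relocation step of the self-unmapping example, the relocation walk can be tested away from a live process. The portable C sketch below parses base-relocation blocks from a byte buffer. Each block holds a page RVA, a block size that includes the 8-byte header, and then 16-bit entries. For each IMAGE_REL_BASED_HIGHLOW entry, the sketch adds the base difference to the target in a copy of the image and skips IMAGE_REL_BASED_ABSOLUTE padding. The type values 3 and 0 match the winnt.h constants. The function and helper names are hypothetical, and the sketch adds bounds checks that the in-process version omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REL_BASED_ABSOLUTE 0   /* IMAGE_REL_BASED_ABSOLUTE: padding, skipped */
#define REL_BASED_HIGHLOW  3   /* IMAGE_REL_BASED_HIGHLOW: patch 32 bits     */

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static void wr32(uint8_t *p, uint32_t v)
{
    for (int k = 0; k < 4; k++)
        p[k] = (uint8_t)(v >> (8 * k));
}

/* Hypothetical helper: walk the relocation table (reloc, size bytes) and add
   delta to every HIGHLOW target in image, a copy of the loaded image.
   Returns the number of patched locations, or -1 on a malformed table. */
int apply_highlow_relocs(uint8_t *image, size_t image_size,
                         const uint8_t *reloc, size_t size, uint32_t delta)
{
    assert(image != NULL && reloc != NULL);
    size_t pos = 0;
    int patched = 0;
    while (pos + 8 <= size) {
        uint32_t page_rva   = rd32(reloc + pos);
        uint32_t block_size = rd32(reloc + pos + 4);   /* includes the header */
        if (block_size < 8 || pos + block_size > size)
            return -1;
        for (size_t e = pos + 8; e + 2 <= pos + block_size; e += 2) {
            uint16_t entry = rd16(reloc + e);
            unsigned type  = entry >> 12;       /* high 4 bits: relocation type */
            uint32_t off   = entry & 0x0FFF;    /* low 12 bits: offset in page  */
            if (type == REL_BASED_HIGHLOW) {
                size_t target = (size_t)page_rva + off;
                if (target + 4 > image_size)
                    return -1;
                wr32(image + target, rd32(image + target) + delta);
                patched++;
            }
        }
        pos += block_size;
    }
    return patched;
}
```

In the example data below, an absolute address 0x00401000 stored at RVA 0x1010 becomes 0x003F1000 after the image moves from 0x00400000 to 0x003F0000. The unsigned subtraction wraps, which is exactly what the 32-bit addition needs.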

Conclusion

This paper is meant to elaborate on some of the anti-dumping techniques that people use to protect their applications. The techniques outlined here are by no means exhaustive; the intention is to give a more detailed view of some commonly used ones.

Sources

Peter Ferrie, “ANTI-UNPACKER TRICKS – PART ONE”, Microsoft, USA

Undocumented Windows NT

Anti-reverse Engineering

Deroko, Nanomites.w32 (a virus written by Deroko)