Pushing the Envelope with HASP
De-Hasping, zip cracking and other marvels

Advanced essays

20 January 1998
by Quine
( )Beginner ( )Intermediate (x)Advanced (x)Expert
NO beginners

An overly long and overdue essay about how to break the 'envelope' PE encryption scheme used by HASP dongles. May be of general interest for cracking code encryption routines (which seem to be all the rage now).
Oh, and I put a code snippets section at the end of the article with links to it throughout.

I'm a fan of good digital audio software. Sound Forge is a nice example of such software, but it's been cracked black and blue. However, Sound Forge takes plugins and there are some great plugins for it out there, most notably those made by Waves under the name Native Power Pack. The folks at Waves have a demo version on their web site, but they also have an update to version 2.3. I thought I'd get the update and see how hard it would be to make it work. This was done as a casual thing. Little did I know what would come of it.

Well, the interesting thing here is that we're not actually going to crack this target. It turns out that it would be very time consuming without the dongle (and what I hope to show in this essay is that there is a lot more cracking that can be done without a dongle than people think). The dongle used by Waves is a HASP dongle (a MemoHASP in particular) and I recommend taking a look at the essay by zafer on these dongles as well as getting some info from HASP's ftp site, which I'll explain in a moment. Rather than cracking this target, we're going to learn a lot about how HASP implements various aspects of their protection scheme and, in particular, how to break their envelope protection scheme, which is a full-blown exe encryptor for Win32 that relies on the dongle for the decryption codes. The accomplishment of this essay will ultimately be a decryptor that works for *most* envelope protected files. The target itself, Native Power Pack (NPP), has already been cracked by one of those "warez" groups that call themselves Radium. They have done a good job, but they had the dongle (that takes away all of the fun....). However, having their crack enabled me to verify some hypotheses that otherwise would have been quite tedious to test. In no way have I copied their crack, nor has it really even been relevant to what I'm doing here. The only benefit I got from having their version was the ability to compare the encrypted code with the unencrypted code to see if I was getting it right. I'll point out where this happens along the way (actually, reading over this, I realize that I won't, but you can figure it out). If you're looking for Radium's version, well, don't ask me (or reverser+), but it's not that hard to find.
Tools required
IDA Pro 3.7 (of course)
SoftICE 3.22 for NT (any 3.0+ will do)
HexWorkshop32 (or any hex editor with good copy+paste functions)
PkCrack v1.2 - http://www.wco.com/~micuan/Tools/pkcrack.zip
A lot of stuff from ftp://ftp.hasp.com/pub/hasp
W32Dasm (yes, our old friend is still handy every once in a while :-)
Spy++ from Win32 SDK (or any other prog for spying on windows and, most importantly, for getting thread IDs, but only if you're using Windows NT)
SoftDump95 or SoftDumpNT (written by me! - source code included)
Letter Opener - the hasp envelope decryptor
(again written by me with source code included)
Target's URL/FTP
Alleged Target:
Waves Native Power Pack Update v2.3 - If you have this please tell me :).
Actual Target:
HASP's Envelope protector: w32hinst.exe (see HASP link above)


I think that takes care of the preliminaries, so let's start cracking. The NPP update is an InstallShield "packaged for the web" file. Running it, you either get the message that you don't have the dongle installed or that you don't have the dongle drivers installed (you should download the drivers from Waves' site and install them). Getting the installation to run successfully is not that hard and involves techniques that have been discussed extensively on this site, so I'll be brief. The trick is to find where the message box with the bad guy message comes from. One might guess that it's in the InstallShield script file, setup.ins (on this topic see the absolutely spectacular essay by natzgul on decompiling InstallShield scripts), but a quick search through that file doesn't turn up anything. Turns out that it's in a file called mydll.dll (great name). Try disassembling that file if you want a quick preview of what's to come. There's some crazy stuff going on in there (self-modifying code and just about anything you can think of), but luckily we don't have to worry about it yet. It's the ins script that does the installation, so that's what we've got to worry about. All it wants from mydll.dll is the right return code. I didn't crack the script, although with the help of natzgul's essay that shouldn't be hard. What I did was just set a breakpoint on MessageBoxA and F12'd until I got back into the main InstallShield code. At that point, I looked at what was in eax, which was 0 (meaning false -- bad guy) and changed it to 1 (true -- good guy). The installation continued along happily until I got six error messages saying that it couldn't register the plugins as OLE controls. I just kept clicking OK and the installation exited. The plugins had been installed (six plugin dlls and one runtime dll), but they had not been registered and were not accessible in Sound Forge. This is where the real cracking starts.

OCX controls (or OLE or ActiveX or whatever the hell Billy wants us to call them these days) are often capable of registering themselves with the operating system. In other words, all the appropriate registry settings are set by the plugin itself. All you need to do is run regsvr32.exe (in your windows dir) with the name of the dll as a command line argument. regsvr32 is a very simple program that just calls a function named DllRegisterServer in the dll itself. So, let's try this on one of the plugins and see what happens. Two error messages. The first error message says "HASP not found" and the second says that there was an error loading the dll. (Note: it was unable to LOAD the dll - this is an error message from the operating system! - so it never even tried to call the register function). If the operating system couldn't load the dll, how did the dll manage to put up the "HASP not found" box? When a dll is loaded, the DllMain function is called immediately and if it wants to indicate that it is unable to load (even though strictly speaking it's already loaded) it returns FALSE, causing the operating system to immediately unload it and generate an error. However, before we get any further involved in this, let's gather some of the material that we're going to need along the way. In particular, let's visit the HASP ftp/web sites and see what goodies we can find. Note: I realize that HASP is the name of a product and Aladdin Knowledge Systems is the name of a company and that when I refer to the company as HASP I am in error. That's ok. HASP is a good name (although it stands for something other than Hard ASs Protection) and Aladdin Knowledge Systems is a stupid, pretentious name. When the HASP people are being smart I call them hasp; when they are being stupid I call them by their true name.

There are a couple of HASP ftp sites out there. The first one I tried was ftp.hasp.com (seemed like a good guess). If you go into the /pub/hasp dir you'll find a dir called something like cd-4 (this dir, unfortunately, just disappeared. You can get version 5 and the version 4 updates, but if you want the original version 4, you've got to download a 100meg zip with the whole CD --- or, of course, you can hunt down the individual zips on the web). It contains all of the software from v.4 of the HASP release and it's all in zip files. So, I downloaded dll.zip from the /pub/hasp/cd-4 directory (it looked promising) and lo and behold, it was password protected. I downloaded some more stuff and it was all password protected. I knew that zip password crackers existed, but I had never tried one. I also knew that they were said to be slow, but I didn't know how slow. So, off to some semi-legitimate web sites for as many zip crackers as I could find. Of the brute force crackers, fast zip crack (fzc v1.4) seemed to be the best. It just tries all the possible passwords. This is no good. It took my Pentium-225 (an overclocked 200) 1.5 hours to try all the 5 character passwords (none of them were right). 6 character passwords would take a few days and any more than that are out of the question. The passwords can go up to 26 characters in length. This wasn't going to work. I looked around a little more and found my answer. pkcrack v1.2 for Win32 is the way to go. There are two programs called pkcrack, so make sure you get the right one (see the tools section above). It goes about cracking zips in an altogether different way. The catch is that you have to have at least 13 (a whole lot more is much better) unencrypted bytes from one of the encrypted files and you have to zip these bytes using exactly the same compression method used in the encrypted file. Where do you get these bytes? Well, we can see the file titles in the encrypted zip. 
Certain file types have standard headers (exe files, etc.). We can simply take the standard header from another file of the same type, zip it, and run pkcrack. However, we can do even better than that in most cases. It occurred to me that since there were at least a hundred zip files on hasp's site they probably all use the same password. Passing out that many different passwords to customers would be a nightmare. Furthermore, there was a directory called 'setup'. I got the english setup files out of that dir and took a look. Sure enough, standard InstallShield installation files, all encrypted. Now we've got way more than 13 bytes--we've got thousands. The trick is to find InstallShield files that are exactly the same version as the ones in the encrypted zips. This is not hard--just make sure the file sizes match exactly. Next, zip the file and make sure that its zipped size is exactly 12 bytes less than the size of the encrypted file (12 bytes are added by the encryption). Now, run pkcrack and you'll get your password in short order. It took my machine 13 minutes using setup.exe (~40k). Not bad, given that it would have taken fzc several years to do it. I haven't tried it using a bigger plaintext file, because this was fast enough. You could try the ~300k inst300.ex_, but that might actually take longer. Who knows? Try it. The hasp password, by the way, is incredibly stupid. If you were lucky, it wouldn't even be that hard to guess. This tactic has proven effective on many password protected zips found on the internet. There is a very simple solution to this security hole, which is to zip the files twice so that you've got a pw-protected zip inside a pw-protected zip. There's nothing to be done with that. So, go to work on password protected zips (oh, and be sure to get your 'hasponftp', although you might have more 'success' with version 5 ;-).
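The size check described above (same uncompressed size, and a zipped size exactly 12 bytes smaller than the encrypted entry) can be sketched in a few lines of Python. This is only an illustration, not part of pkcrack: the function names are made up for this example, and it assumes the archive uses traditional PKZIP encryption, which prepends a 12-byte header to each entry. Reading a zip's directory does not require the password, so the sizes are available up front.

```python
import zipfile

# Traditional PKZIP encryption prepends a 12-byte header to each entry's data.
ENCRYPTION_HEADER_LEN = 12


def is_usable_plaintext(enc_compressed, enc_uncompressed,
                        plain_compressed, plain_uncompressed):
    """True if the sizes are consistent with 'same file, same compression'."""
    return (enc_uncompressed == plain_uncompressed
            and enc_compressed - ENCRYPTION_HEADER_LEN == plain_compressed)


def matching_entries(encrypted_zip, plaintext_zip):
    """List entries of encrypted_zip whose sizes line up with plaintext_zip.

    Both arguments are paths or file-like objects (hypothetical inputs).
    Only the zip directories are read, so no password is needed.
    """
    with zipfile.ZipFile(encrypted_zip) as enc, \
            zipfile.ZipFile(plaintext_zip) as plain:
        plain_by_name = {i.filename.lower(): i for i in plain.infolist()}
        hits = []
        for e in enc.infolist():
            if not e.flag_bits & 0x1:  # bit 0 of the flags marks an encrypted entry
                continue
            p = plain_by_name.get(e.filename.lower())
            if p is not None and is_usable_plaintext(
                    e.compress_size, e.file_size,
                    p.compress_size, p.file_size):
                hits.append(e.filename)
        return hits
```

A match on sizes is only a necessary condition. The plaintext copy still has to be the same version of the file, compressed with the same method, as the essay says.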

Now, with as much HASP software as we could possibly hope for ready to hand, let's see what we can do. To save us some time, I'll just tell you what the most important files to get are: manual.zip (10meg pdf of the whole manual), dll.zip, win32api.zip, prokit.zip, envelope.zip, pcs.zip, msc.zip, borlandc.zip. There's a lot of overlap between these files, but they're all useful. The most useful file besides the manual (which is a treasure trove of information) is hasp32b.obj. There are several versions of this file, but the one we want is 77,459 bytes and can be found in the win32api and msc zips (among others). This is the object file that developers are supposed to link into their applications so that they can communicate with the dongle. As is characteristic of .obj files, this one has a whole lot of debugging information that never makes it into the exe. Why is it there? Well, the linker has to know how to connect your code with the object file code and the only connection it has to do that is the name of the function or global variable. So, the names of most functions and global variables appear in the object file. Keep in mind also that IDA is able to parse object and library (.lib) files in order to create a signature file which it uses for fast library recognition (FLIRT). You need to get the FLAIR tools off of IDA's website to do this. What we're going to do is make a signature file for hasp32b.obj and then just apply it to our target and bask in the warm glow of named functions and global variables. However, I'm getting ahead of myself. Our target is still a godawful mess of encrypted crap. So, let's sort that out before we get too excited.

I chose s1.dll as my target for no particular reason. The cracks should in principle be the same for each plugin. Crack one, crack them all. Anyway, whenever I talk about 'the target' from here on I'm talking about s1.dll. Run s1.dll through IDA and get ready to hit 'y' .... forever. It's not going to work. Why not? The export section of the file is encrypted, so when IDA tries to read it in it gets a whole lot of garbage which it doesn't know what to do with and keeps giving you error messages. I thought that maybe if you just hit 'y' (for 'yes, continue') long enough, it would stop. This appears not to be the case. I balanced a screwdriver on top of the 'y' key and let it sit there for about an hour. The errors were still going strong and I felt like using the computer again. This was not going to work. I thought it might shed some light on the matter to run s1 through pedump (the version by Matt Pietrek included with his book, but probably available on the web). s1.dll crashed it at the exports section with an out of bounds pointer. This thing is no joke. So, how about our old and almost forgotten friend w32dasm? Stupid little disassembler that it is, it plowed right through the file no problem. See, it's still good for something. Most of the file was ignored, of course, because most of the segments weren't marked as containing code. However, it found the resource section (which isn't encrypted--I think I know why--read on!) and the segment starting at RVA 3E000, which is marked as code and contains the entry point for the program (3E000, or 1C43E000). w32dasm disassembled at 1C43E000 for about 300h bytes, generating nice clean code, until it hit more garbage. Well, guess what this code does? It decrypts the loading code. At this point you might think that it would be a good idea to pore over this fragment from w32dasm and figure out the decryption algorithm in order to decrypt the loading code. That, however, is way too much work. 
Let's let the program decrypt itself, which is what it has to do anyway. To do that we're going to have to teach SoftICE some new tricks.

Start up the SoftICE loader and load regsvr32 (be sure to specify s1.dll as the command line argument). At the initial breakpoint, set a breakpoint for 1C43E000. This, of course, won't be in memory yet, but that's ok, the breakpoint will still hit. Let it go and when it hits again we're at the top of the first round of decryption code. Tracing through the decryption code is long and boring and pointless. What we want is a breakpoint that will hit before it looks for the dongle, but after it's decrypted the loading code. bpx CreateFile should do nicely. It's going to have to call CreateFile to open communications with the dongle driver, so breaking there is just what we want. That done, go back and disassemble at 1C43E000. You'll notice that the decryption code is gone. It's been replaced by nop's and BAh's. This is the real deal, people. These people use every trick in the book when it comes to copy protection. I have never, ever seen self-modifying code in a Win32 program before this. That used to be solely the province of DOS encryptors where protected memory was unheard of. How does this thing do it? Well, it's not that hard. It just has the linker set the code sections as READWRITE instead of READONLY (which is the default). Now you can write whatever you want in the code section. Anyway, keep scrolling down past the nop's and you'll soon hit the decrypted loading code. This is great, but what do we do with it now? SoftICE is not the ideal environment for looking at a dead listing (it's worse than w32dasm!). How do we get it out of SoftICE? One way is to turn off the code window, disassemble large chunks of code and use the SoftICE loader to save the history window to a file. That will work, but I want to look at it in IDA, not ultraedit! Hope is not lost, however. What we need to do is get the decrypted memory into a file and we need to do it from within SoftICE. The whole point of SoftICE, however, is that it stops the system cold. 
No code is executing except for SoftICE's code and that means no file i/o while SoftICE is active. However, you can move large chunks of memory around from within SoftICE and that's all we need. Win32 has a very handy feature called memory mapped files. I recommend that you read up on these a little to understand what comes next. The gist of it is that memory mapping a file just copies the file to memory and returns a pointer to the start of the memory map. The crucial point, though, is that any process running on the system can access this memory. It's not limited to the process that first opened the file. So, what you do is write a program that creates a file of a specified length, memory maps it, displays the pointer, and waits for the user to hit return before writing the memory map back to the disk. While it's waiting, we go into SoftICE, move whatever memory we want to save into the memory map (which starts at the location of the returned pointer), exit SoftICE, and hit return. Now our memory is saved in the file! This is exactly the way it works under Windows 95, because under w95 memory map pointers are valid for all address spaces. If it's at 82000000 in one space, it's at that address in all spaces. That's important, because SoftICE, for whatever reason, won't let you move memory from one address space to another (I don't see why it doesn't, because it seems like it could very easily). Windows NT, however, isn't quite as friendly as 95. In Windows NT, each process that wants access to the memory map has to call OpenFileMapping and MapViewOfFile with the relevant memory map handle. I use Windows NT because it is the only M$ operating system that's worth a damn (being written mainly by people who started out writing UNIX systems). This is just one example, however, of how NT's superior security and stability can get the best of you. Hope is still not lost, though, because I worked out a way to map a file in a process of your choosing under NT. 
I won't explain it here because it would take too long. However, you can download both SoftDump95 and SoftDumpNT from the address given in the tools section. How to use them, how they work, etc. along with source code is given in the zip files.
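The create-map-wait-flush idea can be illustrated with Python's standard mmap module. This is a simplified stand-in, not SoftDump itself: the names are hypothetical, the `fill` callback takes the place of the moment where the real tool waits while memory is copied in from the debugger, and nothing here addresses the cross-process mapping problem on NT described above.

```python
import mmap
import os
import tempfile


def dump_via_mapping(path, size, fill):
    """Create a file of `size` bytes, map it, let `fill` write into the
    mapping, then flush the mapping back to disk."""
    with open(path, "w+b") as f:
        f.truncate(size)                 # the file must be non-empty to be mapped
        with mmap.mmap(f.fileno(), size) as view:
            fill(view)                   # stand-in for copying memory in the debugger
            view.flush()                 # write the mapped pages back to the file


def demo():
    """Write 16 marker bytes through the mapping and read the file back."""
    payload = bytes(range(16))

    def fake_debugger_copy(view):
        view[0:len(payload)] = payload

    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        dump_via_mapping(path, 0x10000, fake_debugger_copy)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

The point of the design is that the writer never performs file I/O itself. It only writes to memory, and the operating system carries the bytes to disk when the mapping is flushed.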

Well, you can see what's coming. Use SoftDump to map a file of about 10000h bytes (always err on the side of being too big--it won't hurt anything), and get up to the point of having your breakpoint fire at CreateFile. Now, we can just move the section of memory from 1C43E000 to 1C449E00 to our memory mapped file and save it. Make a copy of s1 (we'll be modifying the copy) and paste the BE00h bytes from the saved file into s1 at offset 3C000 using HexWorkshop32 (3C000 is the offset in the dll where RVA 3E000 starts). Be sure when using HexWorkshop32 that you paste the new data *over* the old data, rather than inserting it. Don't worry that the preliminary decryption routine has been written over. Once that's done its work, which it has, we have no need for it. Also, notice that when it has done its work and overwritten itself, a call to 1C43E000 will not crash, but will work exactly as before. I also discovered that the program decrypts a data segment at 1C451000-1C45F200, so you want to do the same thing to move that into the file. Now we need to get our newly patched dll into IDA.
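The arithmetic in this step is easy to get wrong by hand, so here is a small sketch of it. The image base of 1C400000 is an assumption inferred from the fact that RVA 3E000 shows up at address 1C43E000. The section RVA (3E000) and file offset (3C000) are the ones given above, and the helper names are made up for illustration.

```python
IMAGE_BASE = 0x1C400000  # assumption: implied by RVA 3E000 <-> address 1C43E000


def va_to_file_offset(va, section_rva, section_raw_offset,
                      image_base=IMAGE_BASE):
    """Translate a run-time address inside one section to its file offset."""
    return (va - image_base) - section_rva + section_raw_offset


def paste_over(image, offset, chunk):
    """Overwrite (never insert) `chunk` at `offset`; the length stays the same."""
    if offset + len(chunk) > len(image):
        raise ValueError("chunk runs past the end of the file")
    patched = bytearray(image)
    patched[offset:offset + len(chunk)] = chunk
    return bytes(patched)


# Size of the dumped range and where it lands in the dll:
dump_size = 0x1C449E00 - 0x1C43E000                           # BE00h bytes
file_offset = va_to_file_offset(0x1C43E000, 0x3E000, 0x3C000)  # 3C000h
```

Inserting instead of overwriting would shift everything after the patch and break every offset recorded in the PE header, which is why `paste_over` refuses to change the file length.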

Since the problem with loading s1 into IDA comes when it tries to read the export table, let's see if we can get IDA to not read the export table. Unfortunately there is no convenient switch to tell IDA to skip the export section, so we're going to have to patch the code. This is not hard. If you haven't read my previous essays on IDA this next part isn't going to make sense to you. Search through ida.hlp for "Reading exports...", determine the index number (it's 585h) for the string using the algorithm I give in my first IDA essay, and use IDA to do a 'Search for Immediate' in ida.wll for the index value. I should mention that I'm doing this with the demo version of IDA, not the full version. This is simply for the reason that I know the layout of the code in the demo like the back of my hand and my demo ida.wll database is commented up a storm. I have only looked at the full version long enough to see how the functions I patched are actually implemented (perhaps a little essay on the differences soon....). Anyway, we find the index value pushed onto the stack at the top of a subroutine at 44A9E0. Jump back to the place from which it's called and simply remove the call along with the argument passing. The code is as follows:

0044A46D              lea edx, [esp+2Ch]
0044A471              mov eax, edi
0044A473              call sub_44A9E0
We ought to remove those three lines, although just removing the call would probably do it, because eax and edx are used to pass arguments to sub_44A9E0. So, patch in nops with a hex editor. Save it. Run the IDA demo and load up our patched s1. Sure enough, it loads without a hitch. At this point, since I now have the full IDA, I saved the database and loaded it up with the full version. Even if you don't have the full version it's probably a good idea to reload the database with the un-export disabled demo (but certainly with one that's had the reloading enabled :-). This was just a quick and dirty patch which it is better not to test too rigorously. Note: Those of you who are a little sharper than I am may be wondering why I went to all this trouble about the export section when there is an infinitely easier way of getting around the problem. Well, the answer is that I was just dense. The obvious way to fix it is to patch s1.dll, not IDA or pedump (which I also did). Just fill in the location of the export table in the PE header with zeroes and all the programs will think that it doesn't export any functions. This would have taken 2 minutes. Oh well. It was kind of fun to go back to IDA for a non-malicious patch and I hope that this little foray was instructive.
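For the easier fix, zeroing the export table entry in the PE header, a minimal sketch follows. It relies on the standard PE layout (the `e_lfanew` field at offset 3Ch, a 4-byte signature, a 20-byte COFF header, and data directories starting 96 bytes into a PE32 optional header or 112 bytes into a PE32+ one). The function name is made up, and it works on an in-memory copy of the file rather than touching anything on disk.

```python
import struct


def clear_export_directory(image):
    """Zero data-directory entry 0 (exports) in a PE image held in a bytearray.

    Returns the (rva, size) pair that was stored there before.
    """
    if image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    pe = struct.unpack_from("<I", image, 0x3C)[0]       # e_lfanew
    if image[pe:pe + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    opt = pe + 4 + 20                                    # skip signature + COFF header
    magic = struct.unpack_from("<H", image, opt)[0]
    if magic == 0x10B:                                   # PE32
        dirs = opt + 96
    elif magic == 0x20B:                                 # PE32+
        dirs = opt + 112
    else:
        raise ValueError("unknown optional header magic")
    old = struct.unpack_from("<II", image, dirs)         # (RVA, size) of exports
    struct.pack_into("<II", image, dirs, 0, 0)
    return old
```

With both fields zeroed, a tool that trusts the header sees a file with no exports and never follows the pointer into the encrypted data.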

Now we've got the world of the HASP envelope decryption engine spread out before us and what a twisted, crazy world it is. This target required more thorough reverse engineering than I have ever attempted. It required me to identify the role of almost every function in the code (that gets called -- there appear to be a lot of functions that never get called). It required sorting through the intricacies of the implementation of C++ classes at the assembly level. It involved a rather extended foray into encryption algorithms and it involved some of the most painstaking code tracing I've ever done. Furthermore, I realize that those who might want to follow along with the essay will have to go through a lot of tedious and tricky work just to get the target loaded. Many will not be able to find the relevant hasp software. So, if people feel strongly about it, I can make available some "hasp-packs" containing the relevant hasp files, a copy of s1.dll ready to be loaded into IDA, etc. I have also considered making the IDA database available, but I'm not sure about that yet. Anyway, if anyone cares about this, they can drop me an e-mail. Please don't send mail unless you really want this stuff. If I get enough response, I'll add the link to the essay.


The first thing to do is get as much library recognition out of this thing as we can. Let's guess Micro$oft Visual C++. It looks like VC's code (those jmp-to-the-next-instruction lines near the end of subroutines are a hallmark of VC code generation when the optimizations haven't been cranked up). So, run vc42rt on it and sure enough a whole section of library functions pops up. That's a good chunk of the code we don't have to worry about. Next, and most importantly, is getting a run time library signature out of hasp32b.obj. You can do that by following the instructions that come with the FLAIR tools available on the IDA web site. Now that we've got our sig file, copy it into the sig directory, go into IDA, and select signatures from the view menu. Hit insert to select a sig to apply and choose hasp32b. Watch the magic as it marks out a great big section of code. Now, there may be things that I don't understand about FLAIR (in fact, there are), so what you get out of hasp32b is not ideal. For example, with many of the functions it puts the name in a repeatable comment attached to the function, rather than just naming the function. Also, it doesn't pick up on the global variable names. However, there are things in the FLAIR docs that suggest that you can make it do this. This needs to be looked into further. However, none of this is dire. It just means that we've got a little more manual labor to do. A little comparison of hasp32b and s1 will allow you to fill in the global variable names in s1 by hand. I've done most of them for my database, so you can take a look there. Also, some of the function names just plain got missed, so those need to be filled in too. There is, of course, an advantage to this also, which is that it gets you familiar with the code.

At this point, we've got a lot of code and data identified and we haven't even started tracing through anything! That's the beauty of IDA. Now comes the really hard part -- figuring out what the hell is going on here. If you really want to see something crazy, follow the code starting at codereg. That's some crazy self-generating code. Writing that stuff must have been an incredible pain in the neck. Anyway, to get started, let's head right for the dongle check routine. After all, we're very helpfully told by the hasp people that all communication with the dongle ought to go through a single function call. My guess is that it's the function called 'CallHasp' (_CallHasp@36 in its mangled form). Whatta ya know, it is! (these names are so helpful -- thank you Aladdin Knowledge Systems!). Look at all those helpful cross references IDA gives us. What we're going to do now is get out the hasp manual and look at the parameters that get passed to the hasp routine. The plan then is to go to every point in the code that calls CallHasp and mark the parameters appropriately in IDA. This will give us a lot of information. You'll notice that arguments p1-p4 are always passed as addresses of local variables. Go ahead and rename these to p1, p2, etc. in all cases. Further, you'll notice that the service is almost always a constant. This is nice for us because it means less time in SoftICE. Go ahead and set up an enum in IDA called something like HASP_Service_Codes and then go through the manual and enter each code into the enum. For example, the service IsHasp is service code 1, so enter the name IsHasp for 1 in the enum. Now we can go through and change the immediate values to enums for the first parameter passed to CallHasp in all cases. You'll notice that there are service calls in the code ranging over all the different types of keys available. Naturally, only one kind of key is going to be attached to the computer, so presumably the decryption code is standard across all keys. 
Further, of course, only a subset of these calls is ever going to execute, but go ahead and mark them all for completeness. Now we're ready for the first of many forays into SoftICE. The breakpoint, of course, goes on CallHasp.
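The enum idea carries over directly to any scripting done outside IDA. The sketch below fills in only the two service codes this essay actually names (1 for IsHasp, 2 for HaspCode); everything else would have to come from the manual. The `decode_call_frame` helper is hypothetical and assumes the usual layout on entry to a called routine: the return address on top of the stack, followed by the first argument, which here is the service code.

```python
import struct
from enum import IntEnum


class HaspService(IntEnum):
    IS_HASP = 1     # presence check
    HASP_CODE = 2   # seed code in, response words out


def label_service(code):
    """Return the symbolic name for a service code, or a placeholder."""
    try:
        return HaspService(code).name
    except ValueError:
        return "service_%Xh" % code


def decode_call_frame(stack_dump):
    """Decode raw bytes dumped from esp on entry to the dongle routine.

    Returns (return_address, service_name).
    """
    ret_addr, service = struct.unpack_from("<II", stack_dump, 0)
    return ret_addr, label_service(service)
```

Unknown codes fall through to a placeholder name rather than raising, so a saved stack log can be annotated in one pass and the gaps filled in from the manual later.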

Start up the symbol loader with Regsvr32 and let it rip. At the start, set a breakpoint on CallHasp. A regular breakpoint won't fire because they've re-routed the interrupt. No problem. This has always seemed more of a minor nuisance than anything else because all you have to do is set a debug register breakpoint. So, 'bpmb CallHasp x' does the trick. The x indicates that it will break if the execution reaches that address. In other words, it's functionally equivalent to a regular bpx, but invisible to any anti-SoftICE tricks. Of course, you only get four debug register bps at a time, but I've always found that to be plenty. When the breakpoint hits, dump the memory at esp to see what the program wants to know from the hasp. What I do at this point is dump about 400h bytes of stack to the log window, exit SoftICE, and use the symbol loader to save the log. As wonderful as SoftICE is, the less time you spend in it the better. I always get a little nervous and edgy with that big black box up on the screen. Anyway, however you want to do it, what we're trying to find is the first place the dongle is checked and what the program wants to know from it. The top of the stack, of course, contains the return address of the call to CallHasp. Immediately after that are the parameters, the first of which is the service code. We can see that the routine will return to 1C440E99 and that the service code is 1, meaning that it's just checking for the presence of the dongle. It should be no problem to get past this since the routine returns 1 if there is a dongle and 0 otherwise. Looking at the call in IDA we see that it checks for just that. Patching the code naturally has to be done at run time from within SoftICE at this point, since our target is still encrypted (I never actually tried running the decrypted dll -- I probably should since it might work). So, back into SoftICE and let's break immediately after the dongle check. Put 1 into eax and trace a little to make sure it works. 
It does. Now, let's set a breakpoint on CallHasp again and see what happens at the next dongle call. This time we see from a stack dump that the call came from 1C441036 and the service code is 2. That's the code for HaspCode, which gets a response from the dongle algorithm based on a seed code. Reversing the dongle's algorithm is basically impossible (assuming you even have the dongle, which we don't). So, without the dongle itself there is no way to figure out a response from just a seed code. The algorithm is undoubtedly massively complex and effectively irreversible. However, as with all encryption or algorithm based protection schemes, trying to do the math is never the way to proceed. Go after the implementation; that's your only hope and it's usually got a weakness somewhere. The trick is to find that weakness. The weakness with this particular dongle check is that it compares the unaltered responses from the dongle with values that appear to be hard coded into the dll. This seems pretty dumb, but we haven't gotten to the real action yet. So, trace a little further through the code, keeping your eye on the local variables into which the responses were saved. You soon come to the following code:

1C44107D  mov eax, [ebp+var_3C] ; ?? # of HASPs?
1C441080  mov ecx, [ebp+arg_off_1C44D1BA] ; 1C44D1BA
1C441083  mov ax, [ecx+eax*8+12h] ; 6A42
1C441088  and eax, 0FFFFh
1C44108D  cmp eax, [ebp+p1]
1C441090  jnz loc_1C4412B2   ;-- needless to say, this is bad
1C441096  mov eax, [ebp+var_3C] ; ?? # of HASPs?
1C441099  mov ecx, [ebp+arg_off_1C44D1BA] ; 1C44D1BA 
1C44109C  mov ax, [ecx+eax*8+14h] ; 7B9E
1C4410A1  and eax, 0FFFFh 
1C4410A6  cmp eax, [ebp+p2]
1C4410A9  jnz loc_1C4412B2
1C4410AF  mov eax, [ebp+var_3C] ; ?? # of HASPs?
1C4410B2  mov ecx, [ebp+arg_off_1C44D1BA] ; 1C44D1BA 
1C4410B5  mov ax, [ecx+eax*8+16h] ; F690 
1C4410BA  and eax, 0FFFFh
1C4410BF  cmp eax, [ebp+p3] 
1C4410C2  jnz loc_1C4412B2
1C4410C8  mov eax, [ebp+var_3C] ; ?? # of HASPs? 
1C4410CB  mov ecx, [ebp+arg_off_1C44D1BA] ; 1C44D1BA 
1C4410CE  mov ax, [ecx+eax*8+18h] ; 9EB3 
1C4410D3  and eax, 0FFFFh 
1C4410D8  cmp eax, [ebp+p4] 
1C4410DB  jnz loc_1C4412B2
There's rather a lot going on here actually. The variable I've named arg_off_1C44D1BA is passed that value (i.e., 1C44D1BA) as an argument. It points to an array of rather large structures that contain data concerning the hasp. My best guess is that var_3C is an index into that array to deal with the possibility of multiple hasps. Every time I've been through this code it's been 0. What the code is doing here is checking each of the four response codes (p1-p4) against values in the hasp structure. I've indicated what those values are in comments. This is very nice. We now know the responses for seed code 3FCF, which we will see again. Getting past this check is obviously very easy. As each value is compared, simply patch the appropriate value into each local variable and continue. No problem. Also, be sure to write down what the seed code was and what the responses were. They might come in handy later. Now, of course, the immediate temptation is to ctrl-d out of SoftICE, cross our fingers and see if it works. It won't. What it will do is crash with a memory access violation. This is reassuring because it would be disappointing if the protection were that weak.
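
For readers who think more easily in C++ than in a dead listing, here is a rough sketch of what those four cmp/jnz pairs amount to. The names read_le16 and responses_match are made up for illustration, and the layout (16-bit words at offsets 12h through 18h, index scaled by 8) is taken straight from the listing above rather than from any knowledge of the real structure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read a 16-bit little-endian word, as `mov ax, [...]` does on x86.
std::uint16_t read_le16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Hypothetical helper mirroring the four compares in the listing.
// `table` stands for the data at 1C44D1BA, `index` for var_3C and
// `responses` for the locals p1..p4.
bool responses_match(const std::uint8_t* table, std::size_t table_size,
                     std::uint32_t index, const std::uint32_t responses[4]) {
    for (std::size_t i = 0; i < 4; ++i) {
        const std::size_t offset = index * 8u + 0x12u + 2u * i;
        assert(offset + 2 <= table_size);  // stay inside the copied table
        // The word is zero-extended (and eax, 0FFFFh) before the compare.
        const std::uint32_t expected = read_le16(table + offset);
        if (expected != responses[i]) {
            return false;  // corresponds to the jnz loc_1C4412B2 path
        }
    }
    return true;
}
```

Seen this way the weakness is plain: the expected answers sit in the data section next to the code that checks them.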

Anyway, don't break out of SoftICE yet, because at this point every time we want to go back in, we've got to do both sets of patches, which is a pain in the ass. I trust that everyone's breakpoint on CallHasp is still there, so let's let it run (I lied, we are hitting ctrl-d now) and see where it checks the dongle next. Well, this time we've got another service code 2 check with the same seed code at 1C440E94. How convenient. F12 back to the original call and let's see what's going on there. As you can see there are two service code 2 checks here. The first uses our original seed code, but look, the second decrements the seed code by one and runs the check. This is not so good since while we know what the first check ought to return we have no idea what the second check wants. My nagging fear through all of this was that it was going to use unrecoverable responses from the dongle to decrypt the main part of the dll. In fact, the hasp people would be fools not to do it this way. Needless to say, they do do it that way. To jump ahead a little, I'll just tell you that it's the values from these two checks that are used to do the decryption. And, as for just about any decent encryption scheme, having half the key probably isn't much better than having no key at all (and I'm not about to try to reverse the MD5 algorithm). The fact that I still wasn't sure that this was true was what kept me going at this point. It's a good thing I did, because as I said above the implementation of an encryption scheme (did I actually say that above?) is almost always weaker than the encryption scheme itself. The variable is just how much weaker. Anyway, back to the main story...

Tracing the code from this point on is something of a nightmare. Not because we encounter self generating hand-coded asm (although there is a lot of that that we never really have to worry about), but because we encounter compiler generated C++ code. This next part took me about 10 days to figure out, so I'll give you the condensed version here. First of all, if you've been following along with the actual code or with my database, you'll see that there are some functions from hasp32b.obj that have MD5 in their names. I didn't think anything of this because I had no idea what MD5 meant, if anything. It began to dawn on me that this might not be some code that hasp made up (I'm sure that those of you who know something about encryption are chuckling now). So, I did an AltaVista search on MD5 just in case there was something on the web about it. There was. I came up with well over 2000 web pages. Turns out that MD5 is a one-way hashing algorithm used by PGP to produce message digests (MD5 = Message Digest 5) which then get encrypted. A one-way hash is a function which takes an input of variable length and produces a fixed length code using an algorithm which is too complex to reverse (once again, don't mess with the math). I'm sure that you can guess what's coming and it's not good news. Anyway, let's trace immediately after the second of this pair of dongle calls and see if we can learn anything useful. Depending on how much tracing you like to do (and I prefer to do very little--SoftICE, while beautiful, is unhealthy in large doses), you may start getting into some pretty hairy code. Before we get into what that code does, let's spend some relaxing time in IDA trying to see what we can figure out. As we look through the IDA listing we find references to things like 'md5.cpp' and 'randpool.cpp'. What are they doing there? Well, now's the time to talk about assertions in C and C++. 
Assertions are used by programmers during development (they're never supposed to make it into releases of the software, but often they do, which is good for us). An assertion is simply a statement in the code that a specific condition holds. For example, one might write:

    ASSERT(myNumber <= maxNumber);
If this condition is ever not met, the ASSERT function pops up a message box explaining where the assertion failed. That is, it tells the name of the source code file and the line number where the assertion is. There are also provisions for specifying a string to be displayed for any particular assertion. For example, we might say "myNumber is greater than maxNumber" for the above. Well, the folks at Aladdin Knowledge Systems didn't remove all of their assertions and that's bad form even if you're not trying to write the most solid protection scheme known to man. From the assertions, we can match up the code with the two source files mentioned fairly easily so that we can isolate all the MD5 stuff and all the randpool stuff. Furthermore, we get variable names out of the assertions too. Fill those in wherever you can. At this point it occurred to me that randpool might be a known quantity also and that furthermore the hasp people may have used a pre-packaged encryption class library for all of this stuff. Sure enough, randpool goes hand in hand with MD5. Randpool is used to generate really good random numbers from a particular seed input (can you guess where that input is coming from?). The randomized pool is then fed into the MD5 hash and what you get is a binary string that really can't be reversed (they're really serious about this, don't waste your time). I wasn't able to determine exactly what class library hasp used for the encryption stuff, but I suspect that it is a somewhat modified version of Crypto++ version 2.1. The functions don't match up exactly, but it's close. Anyway, having the source code for this stuff isn't really going to help us reverse it, because, remember, you can't and you're wasting your time if you try. What it can help us do is figure out how they use the responses from the hasp to generate an MD5 hash value and what they then do with that value. After that, we'll find their weakness.
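
To see why names like 'md5.cpp' end up inside the binary at all, here is a minimal sketch of how an assertion macro is typically built. This is not Aladdin's macro (I have no idea what theirs looks like); assertion_message and SKETCH_ASSERT_MESSAGE are invented names, and instead of popping up a message box the sketch just returns the text so you can look at it. The point is that the preprocessor pastes the expression text, __FILE__ and __LINE__ into the code as literals, and they stay there unless the assertions are compiled out.

```cpp
#include <cassert>  // the standard assert macro works along the same lines
#include <string>

// Hypothetical stand-in for the text a failed assertion would display.
std::string assertion_message(const char* expr, const char* file, int line) {
    return std::string("Assertion failed: ") + expr +
           ", file " + file + ", line " + std::to_string(line);
}

// Hypothetical macro: yields an empty string when the condition holds,
// otherwise the message. #cond turns the expression into a string literal;
// __FILE__ and __LINE__ are filled in by the preprocessor at each use.
#define SKETCH_ASSERT_MESSAGE(cond) \
    ((cond) ? std::string() : assertion_message(#cond, __FILE__, __LINE__))
```

Every use of such a macro leaves the source file name and the expression (variable names included) as strings in the data section, which is exactly what we are harvesting from the listing.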

Let's take a step back for a moment and look at what the program does in broader terms (read: I went through all of this a month and a half ago and I can't keep it all straight). I've identified the calls to the dongle and discussed their relevance, but it's worth looking at how they fit into the grand scheme of the loading code. After the program has decrypted the loading code, it calls sub_1C43E32A. This subroutine is for the most part just a series of calls to other subroutines that do the real work. After each subroutine it calls, it checks for an error and puts up a message box if one has occurred (i.e., if the dongle is not found or returns an incorrect code). The message box display routine is at 1C43EFEB and takes either the address of a string to display or an index into an array of strings which can be found at 1C44DC2B, with each string being allotted 78h bytes. The first of these subroutines that is called is sub_1C43F9EF, which in turn calls the very long sub_1C440CFA. It is from this long routine that all of the calls to the HASP we have discussed are made. This routine is a little tricky to trace from the dead listing because it is set up to deal with all of the different kinds of hasp keys, so a lot of the code is repeated with only slight variations. One thing from this routine that I haven't explained yet is what exactly happens with the 10h bytes retrieved from the dongle with the last two seed code calls. Since this essay is already going to be ridiculously long, let's take a detailed look at what happens to those bytes and learn a little bit about C++ code generation along the way. Here is the first of many chunks of code:

1C440C54   mov     eax, dword ptr [ebp+var_8] ; ==10h
1C440C57   and     eax, 0FFFFh
1C440C5C   push    eax
1C440C5D   mov     eax, [ebp+var_20] ; ptr to 10h byte chunk (WORD*)
1C440C60   push    eax
1C440C61   call    sub_1C4459C0    ; takes you into randpool stuff
1C440C61                           ; ?? constructor
The first argument to sub_1C4459C0 is the length of the block pointed to by the second argument. sub_1C4459C0 is one of what I believe to be several constructors for randpool objects (if this sentence makes absolutely no sense to you, you need to learn about C++, the greatest high level language ever invented--java is as slow as molasses in January, but reverser+ is right that we must learn to crack it). This particular constructor appears, in turn, to call another of the constructors, sub_1C445D50, which is the constructor that does the work. I'm going to give you the whole routine, because you'll see this sort of thing *a lot* so it's good to know what a constructor looks like. The first thing it does is allocate the memory for the randpool object (new just calls malloc), which in this case is 14h bytes. This is useful knowledge. We now know the size of the object and have a framework in which to fill in the members. Note that the jump at 1C445D62 is always taken unless you don't have 14h bytes of memory to spare; therefore, the call to sub_1C445A30 is always made. This routine is passed the value 10h and a pointer to the newly created randpool object. What this routine does is initialize the randpool object with zeros. I'm giving you this routine too because, for one thing, it gives an example of an assert and for another it shows the use of structs (in C++, a struct and a class are very close to the same thing) within IDA. It is followed by what I was able to figure out as the layout of a randpool object, put into an IDA structure. The real action in the randpool constructor comes at sub_1C445AC0, where it creates an MD5 object and stores a pointer to it in the randpool object. This routine is pretty long, so I'll spare you. The interesting point is that it gets passed a pointer to the 10h bytes from the dongle, which in turn get mixed into the randpool object via MD5. 
We are now back from the call to the randpool constructor with a pointer to the new object in eax, which gets put into a local variable. We then get the setup for a very important function call which will turn out to hold the key to the decryption. sub_1C4459E0 takes 4 arguments and it is essential to figure out what each one of them does. I'll start with the first argument, which is, of course, the last one pushed onto the stack before the call (remember your C calling convention).
1st arg:  this is a pointer to a randpool object which is used to randomize
          the dest. buffer (see 3rd arg.)
2nd arg:  pointer to a buffer (the 'source' buffer) that contains data to 
          be used in decrypting the data pointed to by the 3rd arg
3rd arg:  pointer to buffer that contains the data to be decrypted
          (the 'dest' buffer)
4th arg:  size in bytes of the two buffers (they must be the same size)
This subroutine takes the destination buffer and "stirs" the randpool object data into it. Basically this means that it randomizes the dest buffer using the random data in the randpool object. It then xor's the dest buffer with the source buffer, storing the result back into the dest buffer. The call to sub_1C4459E0 at 1C440C9C passes the newly created randpool object (created out of the hasp return codes), and two pointers into the data section. The source pointer points to an 18h byte block at 1C44D296 that contains some random data. The dest pointer points to an empty (i.e., zeroed out) buffer at 1C44A110. The result is that the randpool object is used to randomize the material in the dest buffer and then the contents of the dest buffer is xor'ed with the source buffer. Furthermore, the randpool object is destroyed (you can find out what this involves if you don't already know by learning about C++). All remnants of the two seed code responses from the hasp, therefore, are destroyed except for the buffer at 1C44A110. Thus, this is obviously the memory to watch.
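
Here is a toy model of that four-argument routine, just to make the data flow concrete. Big assumption: ToyPool is NOT randpool. The real thing mixes its seed through MD5; I have swapped in a trivial generator so the sketch stays short and runnable. The names ToyPool and stir_and_xor are mine, and I have simply overwritten the dest buffer in the "stir" step, which matches the observed behaviour only because dest is zeroed in both places the routine is called.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for a randpool object. NOT the real algorithm: a small linear
// congruential generator seeded from the seed bytes, for illustration only.
struct ToyPool {
    std::uint32_t state;
    explicit ToyPool(const std::vector<std::uint8_t>& seed) : state(0x12345678u) {
        for (std::uint8_t b : seed) state = state * 31u + b;
    }
    std::uint8_t next() {
        state = state * 1664525u + 1013904223u;
        return static_cast<std::uint8_t>(state >> 24);
    }
};

// Same argument order as described for sub_1C4459E0:
// pool, source buffer, dest buffer, size of both buffers.
void stir_and_xor(ToyPool& pool, const std::uint8_t* src,
                  std::uint8_t* dest, std::size_t size) {
    assert(src != nullptr && dest != nullptr);
    // Step 1: "stir" pool output into the dest buffer.
    for (std::size_t i = 0; i < size; ++i) dest[i] = pool.next();
    // Step 2: xor dest with source, result stays in dest.
    for (std::size_t i = 0; i < size; ++i) dest[i] ^= src[i];
}
```

Whatever the real generator does, the shape is the same: a pool built from the same seed always yields the same byte sequence, and the only thing ever done with that sequence is a plain xor.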

Ok, we're now back at the main routine that calls the series of other routines, checking their return status as it goes. The next call (to sub_1C43E58C) is nice and short, but it's where things begin to get exciting. It reads in some relevant information from the PE header of the current file in memory (i.e., s1.dll in this case) and stores it in global variables. The four things it's interested in are (a) the address of the start of the section table, (b) the number of sections, (c) the file alignment of the sections, and (d) the handle of the process (gotten from a call to OpenProcess). All this of course, means that we're getting ready to do the real decryption. (a), (b), and (d) make sense, but why does it want to know the file alignment of the sections? After all, that is relevant only for the disk image--the exe/dll file itself, not the image that gets loaded into memory. And, of course, the decryption happens in memory after the image has been mapped, not in the file on disk. Just wait and see....

The next call from the main routine, to sub_1C43E6AF, is the real payoff. It immediately starts playing around with the file alignment size. It compares it to 1000h and if it's less than that, it shifts it to the left by 3 (i.e., multiplies it by eight) and sticks the result in a local variable. If it's greater than 1000h, it just puts 8000h in the local variable (I guess that's big enough). In fact, the file alignment is pretty much always 200h. This is because that's how big a disk sector is. If it actually were any bigger this whole project of mine would be screwed (read on!). Anyway, it then calls VirtualAlloc to reserve a chunk of memory that size. The rest of the routine is one great big loop. What's it looping through? Why, look, it's looping through the sections of the image! What's it doing to them? By George, I think it's decrypting them. The question, then, is how? Well, here's how.
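
The size calculation is small enough to write down in one line. A sketch, with chunk_size being my own name for it; what happens when the alignment is exactly 1000h isn't something I checked in the listing, but both branches give 8000h there so it makes no difference.

```cpp
#include <cassert>
#include <cstdint>

// Size of the work buffer (and of each decryption chunk) derived from the
// PE file alignment, as described for sub_1C43E6AF.
std::uint32_t chunk_size(std::uint32_t file_alignment) {
    assert(file_alignment != 0);
    return file_alignment < 0x1000u ? file_alignment << 3 : 0x8000u;
}
```

With the usual file alignment of 200h this gives 1000h, a number that matters a great deal later on.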

In the data section there's a bitmap of the sections that are encrypted (e.g., 01010101b means that sections 1, 3, 5, and 7 are encrypted). If a section isn't encrypted, it skips it. If it is encrypted, it reads the raw size of the section and the virtual size of the section out of the section table. (By the way, you should really be up on the nature of PE files to be following all of this.) The difference between these two numbers is that the raw size represents the size of the section on the disk and the virtual size represents its size when mapped into memory. Or at least this is what they're supposed to be. Microsoft and Borland have different interpretations (I have no idea what other compilers do). M$ takes raw size to mean all the bytes the section takes up on the disk, including padding for alignment (which, quite significantly for us, is done with 00h). M$ takes virtual size to mean the minimum size that needs to be mapped into memory. Borland takes raw size to mean the size on disk minus the padding and virtual size to mean minimum required in memory plus file padding. Matt Pietrek goes through all of this in his Windows 95 System Programming Secrets (I've said it before and I'll say it again: BUY THAT BOOK). The upshot is that the envelope engine takes the smaller of the two values as the size of the section for its purposes (there's a very good reason for this having to do with those 00h's--you'll see). We then start another loop that loops through the section in chunks the size of the file alignment times eight (remember that local variable?) until it reaches the end of the section. For each iteration of the loop, the buffer that was allocated with VirtualAlloc is filled with the next chunk from the current segment and sub_1C445900 is called. This routine takes a pointer to a buffer and the buffer size as arguments (along with another insignificant argument). Given what we have already learned about our target, this routine is quite easy to understand. 
It decrypts the buffer that is passed to it using the data at 1C44A110, which, as you may remember, was gotten from the pair of dongle calls earlier. First off, the routine allocates another buffer the same size as the one it was passed and has Windows initialize it to zeroes. It then creates a randpool object using the data at 1C44A110 as the seed material. Can you guess what routine we're all ready to call now? You got it, sub_1C4459E0. We've got a randpool object and two equally sized buffers. The new buffer will serve as the dest. buffer and the buffer with the encrypted code as the source. After the call to sub_1C4459E0, the dest buffer (which, if 1C44A110 has the right 10h bytes in it, will contain the decrypted code) is copied back into the source buffer (i.e., the one that was passed into the routine). It's not too hard to figure out what happens from this point on.
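
Putting the last few paragraphs together, the whole decryption loop boils down to something like the following sketch. Section, is_encrypted and xor_encrypted_sections are illustrative names, and I've assumed that bit 0 of the bitmap corresponds to the first section. Since every chunk gets a fresh randpool built from the same seed, I've modelled the pool output as one fixed keystream of chunk length that is reused for every chunk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative model of one section: the two sizes from the section table
// plus the mapped bytes.
struct Section {
    std::uint32_t raw_size;
    std::uint32_t virtual_size;
    std::vector<std::uint8_t> bytes;
};

// Assumption: bit 0 of the bitmap stands for the first section.
bool is_encrypted(std::uint32_t bitmap, unsigned index) {
    return ((bitmap >> index) & 1u) != 0;
}

// keystream.size() plays the role of the chunk size (file alignment * 8).
void xor_encrypted_sections(std::vector<Section>& sections, std::uint32_t bitmap,
                            const std::vector<std::uint8_t>& keystream) {
    assert(!keystream.empty());
    assert(sections.size() <= 32);
    const std::size_t chunk = keystream.size();
    for (std::size_t s = 0; s < sections.size(); ++s) {
        if (!is_encrypted(bitmap, static_cast<unsigned>(s))) continue;
        Section& sec = sections[s];
        // The engine uses the smaller of raw size and virtual size.
        const std::size_t size = std::min<std::size_t>(
            std::min(sec.raw_size, sec.virtual_size), sec.bytes.size());
        for (std::size_t start = 0; start < size; start += chunk) {
            const std::size_t len = std::min(chunk, size - start);
            // The keystream restarts at position 0 for every chunk.
            for (std::size_t i = 0; i < len; ++i) sec.bytes[start + i] ^= keystream[i];
        }
    }
}
```

Note the inner loop: the index into the keystream starts over at zero for each chunk. Keep that in mind for what follows.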

What is hard to figure out, though, is what that damn dongle is supposed to return. Without the dongle we have little chance of ever figuring that out. It's a good thing we don't need to. Here's why we don't need to. Every chunk is decrypted with a fresh randpool object built from the same seed, so the xor sequence is reapplied starting at the beginning after every 1000h bytes (the file alignment, 200h, times eight). This means that two bytes in the encrypted code of any given segment that are exactly 1000h bytes apart are xor'd with the same byte from the sequence. This point is essential. What it means is that if we can find just slightly over 1000h bytes of code from within the encrypted program that are known to us, we can decrypt the whole thing. Here's why. Everyone knows that xor'ing is its own inverse (I'm no mathematician, but that's the property we need; it's commutative too). If x xor y = z, then x xor z = y and y xor z = x. So, if we knew that at some point in the original code there was a byte, x, and the encrypted byte at that same point is y, then we could figure out that at that point in the sequence the value of the key is x xor y, and that same key byte decrypts the bytes exactly 1000h bytes later and earlier. Furthermore, if we knew that at some point in the code there was a string of bytes, just slightly longer than 1000h bytes, all of which are known to us, we could find the location of that string within the encrypted code using the following method. Start at the beginning of the segment in which the known string resides and xor the first value in the segment with the first value in our known string. That's our candidate for that particular byte of the key. Now, xor our candidate with the value of our known string exactly 1000h bytes later and compare the result with the value in the encrypted code at exactly 1000h bytes after where we got the first byte. If that comparison doesn't match, move on to the second byte in the encrypted code. If it does match, try the second byte of the known string and see if it passes the test with the next byte of the encrypted code. 
Only 5-10 bytes of testing should be necessary to find the starting point of the known string, so we really need a known string of about 1005h to 100Ah bytes. Once we've found the beginning of the known string, we can just xor the first 1000h of it with the encrypted code and generate the key.
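
The search just described fits in a few lines of C++. This is a sketch of the idea, not the decryptor itself: the function names are mine, the period is a parameter (1000h in the real case) so that the thing can be tried on small buffers, and the encrypted buffer is assumed to start at the beginning of the section so that offset mod period gives the key position.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Xor a buffer with a key that repeats every key.size() bytes.
Bytes xor_with_period(const Bytes& data, const Bytes& key) {
    assert(!key.empty());
    Bytes out(data.size());
    for (std::size_t i = 0; i < data.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(data[i] ^ key[i % key.size()]);
    }
    return out;
}

// Find where `known` (slightly longer than one period) sits inside `enc`.
// Returns the offset, or -1 if no position passes the test.
std::ptrdiff_t find_known(const Bytes& enc, const Bytes& known, std::size_t period) {
    assert(known.size() > period);
    const std::size_t overlap = known.size() - period;  // the 5-10 test bytes
    for (std::size_t pos = 0; pos + known.size() <= enc.size(); ++pos) {
        bool ok = true;
        for (std::size_t j = 0; j < overlap && ok; ++j) {
            // Candidate key byte from the first copy...
            const std::uint8_t key = static_cast<std::uint8_t>(enc[pos + j] ^ known[j]);
            // ...must also explain the byte exactly one period later.
            ok = static_cast<std::uint8_t>(key ^ known[j + period]) == enc[pos + j + period];
        }
        if (ok) return static_cast<std::ptrdiff_t>(pos);
    }
    return -1;
}

// Once the position is known, one period of known bytes yields the whole key.
Bytes recover_key(const Bytes& enc, const Bytes& known, std::size_t pos, std::size_t period) {
    assert(known.size() >= period && pos + period <= enc.size());
    Bytes key(period);
    for (std::size_t j = 0; j < period; ++j) {
        key[(pos + j) % period] = static_cast<std::uint8_t>(enc[pos + j] ^ known[j]);
    }
    return key;
}
```

With a short overlap a false match is possible in principle, which is why a few extra test bytes are cheap insurance.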

You may be asking yourself at this point where we could possibly find a known string that long. That sort of problem seemed hopeless to me in the case of encrypted zips as well until I thought about it a little. At first, with the encrypted code, I thought there might be some hope with library functions, but no library function is that long and there is little consistency in how the linker arranges library functions in the code, given that you never know exactly which functions a program is going to use. Then it occurred to me that Aladdin Knowledge Systems had provided the answer. hasp32b.obj! Oh, silly developer, link that thing right into your HASP Envelope Protected Executable and you might as well post the decryption key on your web site. And thank you, Aladdin Knowledge Systems, for encouraging the users of your product to do that linking because it completely and utterly compromises your envelope protection. Fools! It would have been so easy to avoid this by coming up with separate keys for each segment or to use a longer key or anything! They should hire some actual crackers there for some real beta testing. I'll do it freelance for them at $100 an hour ;-) (In fact, they can send me a check for many thousands of dollars for this essay).

Sorry, I'm getting a little excited, when really the truth is that this only works for some programs. For example, it doesn't work with Native Power Pack. This, however, seems to be an anomalous case. None of the encrypted dlls link hasp32b because they call into the waves runtime dll for the dongle checks and it is not encrypted. This seems to be an accident of waves' implementation rather than any conscious attempt at increasing security. Linking hasp32b.obj into every dll would have unnecessarily increased the size of the plugins. Given this fact about NPP, I consider it effectively uncrackable without the dongle. I hope that someone will take this as a challenge and try to figure out a way of cracking the envelope protection on a program that doesn't link hasp32b.obj, but I'm skeptical. So, how do I know that my method works? Well, there is one other envelope encrypted program that I've got and it, appropriately, is the envelope encryption program itself straight from hasp. It's called w32hinst.exe and is found in envelope.zip on hasp's ftp site. My technique works like a charm on it. However, applying the technique by hand is unbelievably tedious, so the trick is to write a program that does it for you.

At first, writing such a program seems fairly straightforward, but it has given me a number of headaches. I used the first 1005h bytes from the code section in hasp32b.obj as my known string. However, one must never forget about relocations. When a program references memory through an absolute address, the loader has to know how to change that address in case it wants to load the program somewhere other than where the program expects to be loaded. This is accomplished through the relocation table, which is essentially just a list of locations in the program that contain absolute address references. The loader then simply adds a displacement value to those locations if it needs to relocate the program. The reason this is relevant here is that everywhere in our known string of code where an absolute address is referenced, the value of the string is going to be different in the actual program than it is in hasp32b.obj. Luckily for us, though, the differences are systematic. Most of the references are references to locations in the data section of hasp32b.obj. That data section also gets linked into the program along with the code section, so what we need to do is figure out where it is in our envelope protected program. This is not hard to do. Consider the following line in hasp32b:

.text:00001000 8D 35 E1 22 00 00  lea     esi, ds:22E1
At exactly 1002h bytes into the code section, there is an absolute reference into the data section. The address is 22E1. We need to find out what that address is in our target. That can be done by xor'ing the 4 bytes which are 2 bytes from the start of the known string with the bytes from the encrypted target at the same position. This gives us 4 bytes of the key at position 2. These 4 bytes of the key can now be xor'ed with the encrypted code at position 1002h to give us the address in the data section of the target that corresponds to 22E1 in hasp32b. We can now subtract 22E1 from our address in the target's data section to come up with an offset that can be added to the remaining absolute address locations in hasp32b.obj to put them in line with the target. (I just re-read this and realized that it's not too easy to understand. Look at the decryptor's source code below, which is much clearer.) Problem solved. Almost...
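
In code, the trick looks roughly like this. It is a sketch with made-up names (decrypt_relocated_dword, data_section_delta), it takes the period as a parameter so it can be tried on a toy buffer, and it assumes that enc has already been lined up with the known string so that enc[i] corresponds to known[i].

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Use 4 known bytes at `clean_off` to get 4 key bytes, then apply them to
// the encrypted dword exactly one period later (position 2 and 1002h in
// the real case). Returns the decrypted little-endian dword.
std::uint32_t decrypt_relocated_dword(const Bytes& enc, const Bytes& known,
                                      std::size_t clean_off, std::size_t period) {
    assert(clean_off + 4 <= known.size());
    assert(clean_off + period + 4 <= enc.size());
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        const std::uint8_t key =
            static_cast<std::uint8_t>(enc[clean_off + i] ^ known[clean_off + i]);
        const std::uint8_t plain =
            static_cast<std::uint8_t>(enc[clean_off + period + i] ^ key);
        value |= static_cast<std::uint32_t>(plain) << (8 * i);
    }
    return value;
}

// Offset to add to every other absolute data reference in the known string.
std::uint32_t data_section_delta(std::uint32_t target_addr, std::uint32_t obj_addr) {
    return target_addr - obj_addr;
}
```

For the real target the call would use clean_off 2, period 1000h and obj_addr 22E1, and the delta is then added to each absolute reference in the known string before it is xor'ed against the encrypted code.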

Consider, now, this line:

.text:00000FC8 E8 5E EE 00 00   call    _GetProcAddress@8
This is not an absolute reference, but a relative one. However, it refers to a part of the code that I thought for a while would be next to impossible to find. That part is the jump table used to get at the operating system routines and it is going to be in a different place in every target. Only the linker knows where it really is ... This means that we're 0Ch bytes away from getting the whole key (there are 3 such calls in the 1st 1000h bytes of code). At first I thought that the way to solve this problem was to write incredibly twisted code to dig into the relocation segment and piece together those 12 bytes by fairly hairy heuristic devices. Wrestling with that is one of the things that has delayed this article so much. I'm almost embarrassed to admit all of this because the solution is actually quite easy and obvious. I'm an utter idiot for not seeing it immediately. All three of the OS calls within the first 1000h bytes are calls to GetProcAddress. So, all we need to know is where the jump table entry to GetProcAddress is and we can reconstruct the relative calls. Well, guess what? GetProcAddress is called from a lot of other places also where we know what the key is. For example,
.text:0000100D E8 19 EE 00 00   call    _GetProcAddress@8
All we need to do is get the key at position 0Dh (the four displacement bytes of the call start at 0Eh) and work out the location of GetProcAddress in our target just like we worked out the absolute address problem above. The rest is fairly self explanatory. The trick now is to put it all together into a program. I've done a fairly rough job of this and have included the code below. The code has only been tested with M$ VC++ 5 (sorry, it's better than Borland and it's what I've got), but it should work with most compilers. It's C++ code, but doesn't use too many of C++'s features. It shouldn't be too hard to convert to C if anyone has a preference. Writing it in ASM seemed to me to be completely unnecessary and unduly complicated. ASM is nice, but it has its time and place. Further, the program only checks for the one version of hasp32b.obj (which I think is M$ specific, but I have to look into that more). Anyway, I'm planning a new version of the program (which I'm calling "Letter Opener" -- get it? :) to handle more compilers and to handle version 5 of the HASP software. That will be along at some point.
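
The arithmetic behind this is worth spelling out. An E8 call is 5 bytes long and its displacement is relative to the address of the next instruction, so two calls to the same target differ in displacement by exactly the distance between the call sites. A small sketch (call_target and rebase_rel32 are my names), checked here against the two hasp32b.obj lines quoted above:

```cpp
#include <cassert>
#include <cstdint>

// Target of a 5-byte `E8 rel32` call located at `site`.
std::uint32_t call_target(std::uint32_t site, std::uint32_t rel32) {
    return site + 5u + rel32;
}

// Given the displacement of one call to some target, compute the
// displacement another call site needs to reach the same target.
std::uint32_t rebase_rel32(std::uint32_t rel_at_known_site,
                           std::uint32_t known_site, std::uint32_t other_site) {
    return rel_at_known_site + (known_site - other_site);
}
```

So once the displacement of the call at 100D has been decrypted in the target, the displacement for the call at 0FC8 is that value plus 45h, and likewise for the other two calls. All arithmetic is unsigned 32-bit and wraps around, which is how rel32 displacements behave anyway.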

The last thing is to see if we can get w32hinst.exe running. Before getting into that, though, there are a few things to mention. First, w32hinst.exe may well not depend on return codes from the hasp to do its decryption, depending instead on the mere presence of a hasp and then using a hard coded encryption key. The documentation for the envelope (which is fairly poor) suggests that this is a possible option and given that w32hinst is supposed to work with any hasp key it seems likely. However, this does not compromise the effectiveness of my decryptor because it assumes no knowledge of the decryption data, whether hard-coded or dongle-generated. Second, my interest when embarking on dongle cracking was to try to figure out what values the dongle returned based on how the program used those values (for variable initialization or what have you). This I thought would require getting deep into the target and understanding parts of it separate from the protection scheme per se. I still hope to be able to do this, although from some recent essays on this site it looks like programmers don't know a damn thing about implementing dongle protection (why are they wasting their money on them?). I was hoping that w32hinst would have further protection beyond the envelope but, as you're about to see, it doesn't appear to. Oh, well. Let's get back to work.

Running w32hinst through my decryptor generates an almost usable exe. There are a couple of things left to be done. First, we need to make two changes to the data directory at the end of the PE optional header. The locations of the import table and the relocation table need to be changed to point to the sections in the original file image, rather than the locations in the envelope code. It is fairly easy to figure out which sections are the .reloc and .idata sections in the original file by looking at the file in a hex editor. Patching the data directory is fairly straightforward. Doing this now allows IDA to properly read the file. Second, the program entry point needs to be changed in the PE header to keep it from pointing into the envelope code. But where do we tell it to start? IDA can answer this question fairly quickly with its run-time library recognition feature. It identifies the standard libraries' start-up code and marks it with an appropriate name. The start address is 418130.
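
If you'd rather not do the header patches in a hex editor, they can be sketched as follows. The offsets are those of the standard 32-bit PE optional header (entry point RVA at 10h, image base at 1Ch, data directory at 60h, import directory as entry 1, base relocations as entry 5). The function names are mine, and I'm assuming 418130 is a virtual address as IDA displays it, so the image base has to be subtracted before it goes into the header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Image = std::vector<std::uint8_t>;

std::uint32_t read_le32(const Image& img, std::size_t off) {
    assert(off + 4 <= img.size());
    return static_cast<std::uint32_t>(img[off]) |
           (static_cast<std::uint32_t>(img[off + 1]) << 8) |
           (static_cast<std::uint32_t>(img[off + 2]) << 16) |
           (static_cast<std::uint32_t>(img[off + 3]) << 24);
}

void write_le32(Image& img, std::size_t off, std::uint32_t v) {
    assert(off + 4 <= img.size());
    for (std::size_t i = 0; i < 4; ++i) {
        img[off + i] = static_cast<std::uint8_t>(v >> (8 * i));
    }
}

// Offsets relative to the "PE\0\0" signature (PE32 layout).
constexpr std::size_t kOptionalHeader = 0x18;  // 4-byte signature + 20-byte file header
constexpr std::size_t kEntryPoint = kOptionalHeader + 0x10;
constexpr std::size_t kImageBase = kOptionalHeader + 0x1C;
constexpr std::size_t kDataDirectory = kOptionalHeader + 0x60;
constexpr unsigned kImportDir = 1;
constexpr unsigned kRelocDir = 5;

// The dword at 3Ch in the DOS header points at the PE signature.
std::size_t pe_offset(const Image& img) {
    const std::size_t pe = read_le32(img, 0x3C);
    assert(pe + 4 <= img.size());
    assert(img[pe] == 'P' && img[pe + 1] == 'E' && img[pe + 2] == 0 && img[pe + 3] == 0);
    return pe;
}

// The header stores an RVA, so subtract the image base from the VA.
void set_entry_point_va(Image& img, std::uint32_t va) {
    const std::size_t pe = pe_offset(img);
    const std::uint32_t image_base = read_le32(img, pe + kImageBase);
    assert(va >= image_base);
    write_le32(img, pe + kEntryPoint, va - image_base);
}

// Each data directory entry is an RVA followed by a size.
void set_data_directory(Image& img, unsigned index, std::uint32_t rva, std::uint32_t size) {
    const std::size_t pe = pe_offset(img);
    const std::size_t entry = pe + kDataDirectory + 8u * index;
    write_le32(img, entry, rva);
    write_le32(img, entry + 4, size);
}
```

The RVAs and sizes to pass for kImportDir and kRelocDir are whatever you read off the section table for the original .idata and .reloc sections; I'm not going to guess at them here.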

That ought to do it. Now the moment of truth. Double-click on your newly patched w32hinst.exe. It works. That's it.

(of course, there's not much you can do with it - you don't have a hasp)
Final Notes

Well, Waves Native Power Pack would take a long time to crack without the dongle, but this may have just been dumb luck on their part. On the other hand, I got a lot farther cracking dongle protection without having the dongle than I thought possible. That was all I wanted anyway. I should mention now that there is still more work to be done with hasp envelope protection. I have not even looked at version 5 yet. Perhaps they have increased the length of the key, but I doubt it. Furthermore, it is not absolutely essential that the target link hasp32b.obj. It is possible, although my guess is that it would take a couple of weeks, to determine the key by hand. Between the highly predictable nature of the relocation table, the function names in the import table, the library routines and the fact that the code has to make sense, it would not be too hard (just very time consuming) to reconstruct the key. I suppose it's up to you to decide whether this is worth it. Also, there are three more functions called after the decryption is complete. These deal with fixing the relocation table (this is not necessary if you patch the PE header appropriately) and inserting dongle checks at appropriate places (again, my method wipes these out altogether) among other things. The only thing that's holding me back is that I'm having a hard time getting ahold of any other envelope protected programs. If anyone knows where to get any on the web, please let me know. I'm dying to see how well the decryptor works. Only one test target is not enough.
Small Add-on
I have been reading about encryption (I realized that I really ought to know something about it) and have discovered that both hasp and +RCG make a well known mistake in their encryption schemes: they repeatedly use an xor key. An encryption scheme along +RCG's lines which did not make this mistake and which manually did the necessary relocations would be crackable in only two ways:

1. find the key (basically impossible)
2. reconstruct the code of the function (much more interesting and more possible, but still very tough as long as a non-trivial function is discovered).

I hope to have an essay about this soon demonstrating how this sort of protection can be done in a high-level language (C/C++) without using VxDs (which I disapprove of for reasons I will give in the essay -- for one thing they don't work in NT) and in such a way that it would be possible to create a generic method for applying the protection. The software author wouldn't even have to understand the protection method. It could be packaged up a la TimeLock, etc.
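
To see why reusing an xor key is a mistake, here is a small Python sketch. The key and the two plaintext blocks are invented for the demonstration; nothing here is taken from a real hasp or +RCG target.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


key = bytes.fromhex("3fcf3fce00112233445566778899aabb")  # made-up 16-byte key
block1 = b"push ebp; mov eb"   # some 16-byte piece of plaintext
block2 = b"\x00" * 16          # a run of zero padding

c1 = xor_bytes(block1, key)
c2 = xor_bytes(block2, key)

# The key cancels out: c1 ^ c2 == block1 ^ block2, with no key involved.
leak = xor_bytes(c1, c2)
```

Because block2 is all zeros, c2 is the key itself and leak is block1 in the clear. Any stretch of predictable plaintext encrypted under a reused key gives the same kind of foothold.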
Code Snippets

1C440B85   mov     eax, [ebp+var_28] ; ptr to 20h byte chunk  (DWORD*)
1C440B88   add     eax, 0Ch
1C440B8B   push    eax
1C440B8C   mov     eax, [ebp+var_28] ; ptr to 20h byte chunk  (DWORD*)
1C440B8F   add     eax, 8
1C440B92   push    eax
1C440B93   mov     eax, [ebp+var_28] ; ptr to 20h byte chunk  (DWORD*)
1C440B96   add     eax, 4
1C440B99   push    eax
1C440B9A   mov     eax, [ebp+var_28] ; ptr to 20h byte chunk  (DWORD*)
1C440B9D   push    eax
1C440B9E   mov     eax, dword ptr [ebp+pword2]
1C440BA1   and     eax, 0FFFFh
1C440BA6   push    eax
1C440BA7   mov     eax, dword ptr [ebp+pword1]
1C440BAA   and     eax, 0FFFFh
1C440BAF   push    eax
1C440BB0   mov     eax, [ebp+var_4]
1C440BB3   push    eax
1C440BB4   mov     eax, dword ptr [ebp+SeedCode]
1C440BB7   and     eax, 0FFFFh
1C440BBC   push    eax             ; Seed Code == 3FCF
1C440BBD   mov     eax, [ebp+HASPCode] ; either HASPCode or NetHASPCode
1C440BC0   push    eax
; CallHasp ( Service, SeedCode, LptNum, Pass1, Pass2, p1, p2, p3, p4 )
1C440BC1   call    _CallHasp@36
1C440BC6   add     esp, 24h
1C440BC9   mov     eax, [ebp+var_28] ; ptr to 20h byte chunk  (DWORD*)
1C440BCC   add     eax, 1Ch
1C440BCF   push    eax
1C440BD0   mov     eax, [ebp+var_28] ; ptr to 20h byte chunk  (DWORD*)
1C440BD3   add     eax, 18h
1C440BD6   push    eax
1C440BD7   mov     eax, [ebp+var_28] ; ptr to 20h byte chunk  (DWORD*)
1C440BDA   add     eax, 14h
1C440BDD   push    eax
1C440BDE   mov     eax, [ebp+var_28] ; ptr to 20h byte chunk  (DWORD*)
1C440BE1   add     eax, 10h
1C440BE4   push    eax
1C440BE5   mov     eax, dword ptr [ebp+pword2]
1C440BE8   and     eax, 0FFFFh
1C440BED   push    eax
1C440BEE   mov     eax, dword ptr [ebp+pword1]
1C440BF1   and     eax, 0FFFFh
1C440BF6   push    eax
1C440BF7   mov     eax, [ebp+var_4]
1C440BFA   push    eax
1C440BFB   mov     eax, dword ptr [ebp+SeedCode]
1C440BFE   and     eax, 0FFFFh
1C440C03   dec     eax
1C440C04   push    eax             ; Seed Code == 3FCE
1C440C05   mov     eax, [ebp+HASPCode] ; either HASPCode or NetHASPCode
1C440C08   push    eax
; CallHasp ( Service, SeedCode, LptNum, Pass1, Pass2, p1, p2, p3, p4 )
1C440C09   call    _CallHasp@36  
1C440C0E ; At this point, var_28 points to 20h bytes filled with responses 
1C440C0E ; from the HASP to two SeedCode queries:  3FCF and 3FCE
1C440C0E   add     esp, 24h
1C440C11   mov     [ebp+loop_counter], 0
1C440C17   jmp     loc_1C440C20
1C440C1C ; for (WORD i=0; i<8; i++) {
1C440C1C ;     *(var_20 + i) = (WORD) *(var_28 + i)
1C440C1C ; }
1C440C1C loc_1C440C1C:
1C440C1C   inc     [ebp+loop_counter]
1C440C20 loc_1C440C20:
1C440C20   mov     eax, dword ptr [ebp+loop_counter]
1C440C23   and     eax, 0FFFFh
1C440C28   cmp     eax, 8
1C440C2B   jge     loc_1C440C54
1C440C31   mov     eax, dword ptr [ebp+loop_counter]
1C440C34   and     eax, 0FFFFh
1C440C39   mov     ecx, [ebp+var_28] ; ptr to 20h byte chunk  (DWORD*)
1C440C3C   mov     eax, [ecx+eax*4]
1C440C3F   mov     ecx, dword ptr [ebp+loop_counter]
1C440C42   and     ecx, 0FFFFh
1C440C48   mov     edx, [ebp+var_20] ; ptr to 10h byte chunk (WORD*)
1C440C4B   mov     [edx+ecx*2], ax
1C440C4F   jmp     loc_1C440C1C
; At this point, the 20h byte block has been pared down to 10h bytes 
; by stripping out the highorder word from each response (the responses
; are only in the low order word).  That 10h byte block is pointed to
; by var_20.
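
A minimal Python sketch of what the loop above does: it takes the eight DWORD responses (four from each seed query) and keeps only the low-order word of each, giving the 10h byte block. The function name and the response values are hypothetical, since I don't have a dongle to supply real ones.

```python
import struct


def pare_responses(responses):
    """Keep the low-order word of each 32-bit response, as the loop does."""
    if len(responses) != 8:
        raise ValueError("expected 8 DWORD responses (two queries of four)")
    words = [r & 0xFFFF for r in responses]
    return struct.pack("<8H", *words)   # 8 little-endian WORDs = 10h bytes


# Hypothetical answers: p1..p4 for seed 3FCF, then p1..p4 for seed 3FCE.
fake = [0x00001234, 0x0000ABCD, 0x00000001, 0x0000FFFF,
        0x00005678, 0x00009ABC, 0x0000DEF0, 0x00000042]
key_block = pare_responses(fake)
```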
1C445900 sub_1C445900    proc near
1C445900 buffer  = dword ptr -0Ch ; allocated in this routine
1C445900                          ; size= arg_size; zeroed
1C445900 cpBuffer= dword ptr -8
1C445900 var_4   = dword ptr -4   ; RandPool object init'd with cpBuffer data,
1C445900                          ; which came from hasp
1C445900 arg_0   = dword ptr  8   ; 1C44D1BA - only used here to check a flag
1C445900 arg_4   = dword ptr  0Ch ; *buffer with data to be decrypted
1C445900 arg_size= dword ptr  10h
1C445900   push    ebp
1C445901   mov     ebp, esp
1C445903   sub     esp, 0Ch
1C445906   push    ebx
1C445907   push    esi
1C445908   push    edi
1C445909   mov     [ebp+cpBuffer], offset cpBuffer ; this gets initialized
1C445909                             ; by the 2 responses;  10h bytes long
1C445910   mov     eax, [ebp+arg_0] ; 1C44D1BA - only used here to check a flag
1C445913   mov     ax, [eax+336h]  ; 1C44D4F0
1C44591A   and     eax, 0FFFFh
1C44591F   test    eax, eax
1C445921   jz      loc_1C4459AC    ; no take
1C445927   mov     eax, [ebp+arg_size]
1C44592A   and     eax, 0FFFFh
1C44592F   push    eax
1C445930   push    40h             ; GMEM_ZEROINIT
1C445932   call    ds:GlobalAlloc
1C445938   mov     [ebp+buffer], eax ; allocated in this routine
1C445938                           ; size= arg_size; zeroed
1C44593B   cmp     [ebp+buffer], 0 ; allocated in this routine
1C44593B                           ; size= arg_size; zeroed
1C44593F   jnz     loc_1C44594F    ; good -- allocation succeeded
1C445945   mov     eax, 1
1C44594A   jmp     loc_1C4459B3
1C44594F loc_1C44594F:
1C44594F   push    10h
1C445951   mov     eax, [ebp+cpBuffer]
1C445954   push    eax
1C445955   call    sub_1C4459C0    ; takes you into randpool stuff
1C445955                           ; ?? constructor
1C44595A   add     esp, 8
1C44595D   mov     [ebp+var_4], eax ; RandPool object init'd with cpBuffer
1C44595D                            ; data, which came from hasp
1C445960   mov     eax, [ebp+arg_size]
1C445963   and     eax, 0FFFFh
1C445968   push    eax
1C445969   mov     eax, [ebp+buffer] ; allocated in this routine
1C445969                             ; size= arg_size; zeroed
1C44596C   push    eax
1C44596D   mov     eax, [ebp+arg_4] ; *buffer with data to be decrypted
1C445970   push    eax
1C445971   mov     eax, [ebp+var_4] ; RandPool object init'd with cpBuffer
1C445971                            ; data, which came from hasp
1C445974   push    eax
1C445975   call    sub_1C4459E0
1C44597A   add     esp, 10h
1C44597D   mov     eax, [ebp+arg_size]
1C445980   and     eax, 0FFFFh
1C445985   push    eax
1C445986   mov     eax, [ebp+buffer] ; allocated in this routine
1C445986                             ; size= arg_size; zeroed
1C445989   push    eax
1C44598A   mov     eax, [ebp+arg_4] ; *buffer with data to be decrypted
1C44598D   push    eax
1C44598E   call    _memcpy
1C445993   add     esp, 0Ch
1C445996 ; the next 2 calls clean up the locally allocated object and buffer
1C445996   mov     eax, [ebp+var_4] ; RandPool object init'd with cpBuffer
1C445996                            ; data, which came from hasp
1C445999   push    eax
1C44599A   call    sub_1C445A20     ; RandPool::~RandPool()
1C44599A                            ; // takes ptr to randpool obj in eax
1C44599F   add     esp, 4
1C4459A2   mov     eax, [ebp+buffer] ; allocated in this routine
1C4459A2                             ; size= arg_size; zeroed
1C4459A5   push    eax
1C4459A6   call    ds:GlobalFree
1C4459AC loc_1C4459AC:
1C4459AC   xor     eax, eax
1C4459AE   jmp     $+5
1C4459B3 loc_1C4459B3:
1C4459B3   pop     edi
1C4459B4   pop     esi
1C4459B5   pop     ebx
1C4459B6   leave
1C4459B7   retn
1C4459B7 sub_1C445900    endp
1C4459E0 sub_1C4459E0    proc near
1C4459E0 arg_a_opHandle  = dword ptr  0Ch        ; randpool object
1C4459E0 arg_4           = dword ptr  10h        ; source for xoring
1C4459E0 arg_a_cpBuffer  = dword ptr  14h        ; destination for xoring
1C4459E0 arg_size        = dword ptr  18h        ; size of src and dest
1C4459E0   push    ebx
1C4459E1   push    esi
1C4459E2   mov     esi, [esp+8+arg_a_cpBuffer] ; destination for xoring
1C4459E6   push    edi
1C4459E7   mov     edi, [esp+0Ch+arg_4] ; source for xoring
1C4459EB   mov     ebx, [esp+0Ch+arg_size] ; size of src and dest
1C4459EF   mov     eax, [esp+0Ch+arg_a_opHandle] ; randpool object
1C4459F3   push    ebx
1C4459F4   push    esi
1C4459F5   push    eax
1C4459F6   call    sub_1C445D90    ; ?? stir
1C4459FB   add     esp, 0Ch
1C4459FE   lea     eax, [esi+ebx]
1C445A01   cmp     esi, eax
1C445A03   jnb     short loc_1C445A14
1C445A05 loc_1C445A05:
1C445A05   mov     ecx, [edi]
1C445A07   add     esi, 4
1C445A0A   add     edi, 4
1C445A0D   xor     [esi-4], ecx    ; well, this is it-- this is where the
1C445A0D                           ; code gets decrypted--
1C445A10   cmp     esi, eax
1C445A12   jb      short loc_1C445A05
1C445A14 loc_1C445A14:
1C445A14   pop     edi
1C445A15   pop     esi
1C445A16   pop     ebx
1C445A17   retn
1C445A17 sub_1C4459E0    endp
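
The flow of sub_1C445900 and sub_1C4459E0 can be sketched in Python as follows. The big assumption: I take it that the stir call fills the zeroed scratch buffer with keystream derived from the 10h bytes of dongle data. The generator below (chained MD5 digests) is a stand-in I made up so the sketch runs. It is NOT the real RandPool algorithm, which is not shown in these snippets.

```python
import hashlib


def stand_in_stir(seed, size):
    """Hypothetical keystream: chained MD5 digests of the seed.

    Not the real RandPool code; it only gives the sketch something
    deterministic to XOR with.
    """
    out = b""
    digest = seed
    while len(out) < size:
        digest = hashlib.md5(digest).digest()
        out += digest
    return out[:size]


def envelope_xor(data, seed):
    """Keystream into a scratch buffer, XOR the data in, hand the result back."""
    scratch = bytearray(stand_in_stir(seed, len(data)))  # the sub_1C445D90 call
    for i, b in enumerate(data):                         # the loop at 1C445A05
        scratch[i] ^= b
    return bytes(scratch)                                # the _memcpy back


seed = bytes(16)                       # placeholder for the 10h dongle bytes
encrypted = envelope_xor(b"MZ\x90\x00" * 4, seed)
decrypted = envelope_xor(encrypted, seed)
```

The real loop xors a DWORD at a time; going byte by byte gives the same result whenever the size is a multiple of 4. Since xor is its own inverse, the same routine encrypts and decrypts.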
1C445D50 sub_1C445D50    proc near               ; CODE XREF: sub_1C4459C0+A
1C445D50 cpBuffer        = dword ptr  8     ; I got these names out of asserts
1C445D50 lPoolSize       = dword ptr  0Ch   ; found elsewhere
1C445D50   push    esi
1C445D51   push    14h
1C445D53   mov     esi, 0
1C445D58   call    ??2@YAPAXI@Z    ; operator new(uint)
1C445D5D   add     esp, 4
1C445D60   test    eax, eax    ; check for valid pointer
1C445D62   jz      short loc_1C445D6F
1C445D64   push    10h             ; lPoolSize
1C445D66   mov     ecx, eax        ; ecx=our new ptr (to the
                                   ; newly created object)
1C445D68   call    sub_1C445A30
1C445D6D   mov     esi, eax
1C445D6F loc_1C445D6F:
1C445D6F   mov     eax, [esp+4+lPoolSize]
1C445D73   mov     ecx, [esp+4+cpBuffer]
1C445D77   push    eax
1C445D78   push    ecx
1C445D79   mov     ecx, esi
1C445D7B   call    sub_1C445AC0 ; do member initialization with args
1C445D80   mov     eax, esi
1C445D82   pop     esi
1C445D83   retn
1C445D83 sub_1C445D50    endp
1C445A30 sub_1C445A30    proc near
1C445A30 arg_a_lPoolSize = dword ptr  8
1C445A30   push    ebx
1C445A31   xor     eax, eax
1C445A33   mov     [ecx+RandPool.pPool], eax       ; [ecx+4h]
1C445A36   mov     [ecx+RandPool.m_lCurrentDigest], eax  ; [ecx+0Ch]
1C445A39   mov     [ecx+RandPool.field_10], eax    ; [ecx+10h]
1C445A3C   mov     dword ptr [ecx], offset off_1C44C000
1C445A42   push    esi
1C445A43   mov     ebx, ecx
1C445A45   push    edi
1C445A46   mov     esi, [esp+0Ch+arg_a_lPoolSize]
1C445A4A   cmp     esi, 10h        ; MD5SIZE == 10h
1C445A4D   jnb     short loc_1C445A63
1C445A4F   push    20
1C445A51   push    offset aRandpool_cpp
1C445A56   push    offset aA_lpoolsizeMd5size ; "a_lPoolSize >= MD5SIZE"
1C445A5B   call    __assert
1C445A60   add     esp, 0Ch
1C445A63 ; this code stores lPoolSize in RandPool+8, creates a Pool,
1C445A63 ; zeroes it out and stores a ptr to it in RandPool+4 (pPool)
1C445A63 loc_1C445A63:
1C445A63   push    esi
1C445A64   mov     [ebx+8], esi
1C445A67   call    ??2@YAPAXI@Z    ; operator new(uint)
.... boring code to move pool into mem and save pointer
1C445A89   retn    4
1C445A89 sub_1C445A30    endp

The RandPool struct from IDA:
0000 RandPool        struc
0000 field_0         dd ?
0004 pPool           dd ?
0008 lPoolSize       dd ?
000C m_lCurrentDigest dd ?
0010 field_10        dd ?
0014 RandPool        ends
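
For anyone who wants to poke at this layout outside IDA, here is the same struct as a Python ctypes sketch. The field names are the ones above; treating every field as a 32-bit value and guessing that field_0 is the vtable pointer (the constructor stores off_1C44C000 there) are my assumptions.

```python
import ctypes


class RandPool(ctypes.Structure):
    _fields_ = [
        ("field_0", ctypes.c_uint32),           # probably the vtable pointer
        ("pPool", ctypes.c_uint32),             # 32-bit pointer to the pool
        ("lPoolSize", ctypes.c_uint32),
        ("m_lCurrentDigest", ctypes.c_uint32),
        ("field_10", ctypes.c_uint32),
    ]


# 14h bytes, matching the "push 14h" before operator new in sub_1C445D50.
size = ctypes.sizeof(RandPool)
```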