> Do either of you know of a unix program capable of performing binary
> diff/patching operations? I'm sure that this program must exist.

No.

> We need to transport many large files (average 750MB) back and forth
> between our remote sites and Austin, but these files are not likely to
> change an awful lot over time.

Are they append-only? If so, you can hack a custom tool to do it.

> Handling binary files is a "suggested project for improving GNU diff and
> patch".

Yes. Diff is line-based, so diffing a binary file is a different problem. Text-based material (like programs) tends to have a logical division at each line. (Not entirely true: consider "if (cond)", which might change to "if (cond) {" when you add another statement to the "then" block.) Line-based diffing exploits these logical divisions. It doesn't always work, especially when you have lots of similar lines.

So with binary files, you'd have to treat each byte as an individual unit. Making a minimum diff that way is an O(n^2) problem in the worst case with the usual longest-common-subsequence approach, where n is the number of bytes in the file. LZW compression does something similar, and an analogous diff algorithm might be "good enough", but it is definitely a research project (it may have already been done).

There are two alternatives. One is to binhex or base64 your data, diff the resulting text, and gzip the diffs. That won't do what you want if you are _inserting_ data of variable sizes into the middle.

The other is to break the binary data into lines based on its contents. In other words, if it is an object-oriented database, each object needs to convert itself to a text representation (or at least a newline-terminated string of binary data), and something inside the program needs to traverse the database and serialize the data into a stream. This is called "marshalling", and it is non-trivial when objects have pointers to other objects.

...

> The data is not append-only; all different parts of it are constantly
> changing, but from day to day, the vast majority of items are staying the
> same (maybe just moving around).

A simple Perl or C program can iterate through the two copies and print the offset, length, and data for every area that changed. The patch program does the reverse: it reads those records and writes each chunk of data back at its offset. It's a one-hour job. Note that if a data blob moves, this creates two "diff" sections, one where the blob used to be and one where it ended up. Determining what moved and where is hard.
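
For what it's worth, here is a rough sketch of that offset/length/data idea, written in Python rather than Perl or C just to keep it short. The names binary_diff and binary_patch are made up for illustration, and it assumes both copies fit in memory, which 750MB files may not; a real tool would read in blocks and compare block by block instead of byte by byte.

```python
def binary_diff(old: bytes, new: bytes):
    """Compare two byte strings position by position.

    Returns (new_len, hunks), where hunks is a list of (offset, data)
    pairs; the length of each changed area is len(data).
    """
    hunks = []
    common = min(len(old), len(new))
    i = 0
    while i < common:
        if old[i] == new[i]:
            i += 1
            continue
        # Start of a changed area: extend it until the bytes agree again.
        start = i
        while i < common and old[i] != new[i]:
            i += 1
        hunks.append((start, new[start:i]))
    # Anything past the end of the old copy counts as one more changed area.
    if len(new) > common:
        hunks.append((common, new[common:]))
    return len(new), hunks


def binary_patch(old: bytes, new_len: int, hunks):
    """Rebuild the new copy from the old copy plus the diff records."""
    # Truncate or zero-pad the old data to the new length, then overwrite.
    buf = bytearray(old[:new_len].ljust(new_len, b"\0"))
    for offset, data in hunks:
        buf[offset:offset + len(data)] = data
    return bytes(buf)


if __name__ == "__main__":
    old = b"hello world"
    new = b"hellO w0rld!!"
    new_len, hunks = binary_diff(old, new)
    for offset, data in hunks:
        print(offset, len(data), data)
    assert binary_patch(old, new_len, hunks) == new
```

Because this compares fixed positions only, a single byte inserted near the front makes nearly everything after it look changed, which is the same weakness as the moved-blob case.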