2024-10-15 - 03:04

Disjunctify: Pick disjunct files from two directories

Background and goal of the project

The program is primarily intended to facilitate license curation when an updated version of a project becomes available in which only a minority of files have changed, so that the existing license clearance data can be partly reused. The directory containing the original version and the directory containing the new version are passed to the program. The script then compares the files of the two directories and copies only the new or modified files of the new version to an output directory. (The project refers to this file set as disjunct, borrowing the term from set theory.) Only the files in the output directory then need to be scanned and evaluated to generate licensing data, while the existing data for the remaining files can be reused.
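
The selection step described above can be illustrated with a minimal sketch. This is not the code of disjunctify.py; the function names `file_digest` and `pick_new_or_modified` are hypothetical, and comparing files by MD5 digest is an assumption suggested only by the tool's --md5sum option. The sketch compares files at the same relative path and ignores moved files, which the real tool reports separately.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the MD5 hex digest of a file's contents (assumed comparison key)."""
    h = hashlib.md5()
    with path.open("rb") as fh:
        # Read in chunks so large files do not have to fit into memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def pick_new_or_modified(old_dir: Path, new_dir: Path, out_dir: Path) -> list[str]:
    """Copy files of new_dir that are missing from or differ in old_dir.

    Returns the relative paths of the copied files. Hypothetical helper,
    not part of disjunctify.py.
    """
    copied = []
    for new_file in sorted(p for p in new_dir.rglob("*") if p.is_file()):
        rel = new_file.relative_to(new_dir)
        old_file = old_dir / rel
        if old_file.is_file() and file_digest(old_file) == file_digest(new_file):
            continue  # unchanged: existing clearance data can be reused
        target = out_dir / rel
        # Keep the relative position in the directory hierarchy.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(new_file, target)
        copied.append(rel.as_posix())
    return copied
```

Calling `pick_new_or_modified(Path("test/old"), Path("test/new"), Path("test/out"))` would, under these assumptions, fill the output directory with only those files that need a fresh license scan.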

Command line arguments

./disjunctify.py --help
usage: disjunctify.py [-o DIR] [-n DIR] [-d DIR] [-p DIR] [-f] [-m] [-q] [-u] [-v] [-h]

options:
  -o DIR, --old DIR     path to input directory with original files (required)
  -n DIR, --new DIR     path to input directory with new files (required)
  -d DIR, --disjunct DIR
                        path to output directory with extracted disjunct files (required)
  -p DIR, --preserve DIR
                        preserve removed or modified and renamed files in this directory
  -f, --force           clean up output directory if existing and not empty
  -m, --md5sum          print MD5 sum of all files (for debugging purposes)
  -q, --quiet           do not create any output, if not explicitly requested
  -u, --unidiff         generate unified diff between modified files at the same relative location
  -v, --verbose         be verbose
  -h, --help            show this message and quit

General principle of the program's functionality

Graphical representation of the program's inputs and output

After running the script disjunctify.py with the directory of the files in the petrol circle passed as the old package (-o command line argument) and the directory of the files in the orange circle passed as the new package (-n command line argument), the output directory (-d command line argument) will contain only the files in the orange crescent at the bottom of the image. Each copied file keeps its relative position in the directory hierarchy.

Optionally, unified diffs of the modified files can be generated (-u command line argument), and a directory can be specified (-p command line argument) to receive files of the old version that are no longer present in the new version or that have been renamed and modified. This set corresponds to the files in the petrol crescent at the top of the image.
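
To show what a unified diff between two versions of a file at the same relative location looks like, the following sketch uses `difflib.unified_diff` from Python's standard library. Whether disjunctify.py produces its diffs this way is an assumption; the function name `unified_diff_text` and the `old/` and `new/` labels are hypothetical.

```python
import difflib


def unified_diff_text(old_text: str, new_text: str, rel_path: str) -> str:
    """Return a unified diff between two versions of one file as a string.

    Hypothetical helper, not part of disjunctify.py. An empty string
    means that the two versions are identical.
    """
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"old/{rel_path}",  # label for the original version
        tofile=f"new/{rel_path}",    # label for the new version
    )
    return "".join(diff)


if __name__ == "__main__":
    print(unified_diff_text("a\nb\n", "a\nc\n", "file1"), end="")
```

Lines removed from the old version are prefixed with `-`, lines added in the new version with `+`, and unchanged context lines with a space.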

Example output using the test directories of the repository

disjunctify.py -o test/old -n test/new -d test/out
File 'test/new/file9' was modified
File 'test/old/subdir/file2' has moved to 'test/new/file3'
File 'test/new/file1' was modified
File 'test/new/subdir/file10' is new at this location
File 'test/old/file4' has moved to 'test/new/subdir/file5'
File 'test/old/subdir/fileX' was removed or modified and renamed
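
The four kinds of messages in this output can be reproduced with a small classification sketch. It is not taken from disjunctify.py: the function name `classify`, its input format (one mapping of relative path to content digest per directory) and the rule that a file counts as moved when identical content reappears at a path that did not exist before are all assumptions made for illustration.

```python
def classify(old: dict[str, str], new: dict[str, str]) -> dict[str, list]:
    """Classify files given {relative path: content digest} maps of both trees.

    Hypothetical helper, not part of disjunctify.py.
    """
    # Index the old tree by content so identical files can be found elsewhere.
    old_by_digest: dict[str, list[str]] = {}
    for path, digest in old.items():
        old_by_digest.setdefault(digest, []).append(path)

    result = {"modified": [], "moved": [], "new": [], "removed": []}
    moved_from = set()

    for path, digest in sorted(new.items()):
        if path in old:
            if old[path] != digest:
                result["modified"].append(path)
            continue  # same path, same content: unchanged
        # Same content at an old path that is gone in the new tree: a move.
        sources = [p for p in old_by_digest.get(digest, [])
                   if p not in new and p not in moved_from]
        if sources:
            moved_from.add(sources[0])
            result["moved"].append((sources[0], path))
        else:
            result["new"].append(path)

    # Old files that neither survived in place nor were matched as moved.
    for path in sorted(old):
        if path not in new and path not in moved_from:
            result["removed"].append(path)
    return result
```

With digests computed for both trees, the `modified`, `moved`, `new` and `removed` entries would correspond to the "was modified", "has moved to", "is new at this location" and "was removed or modified and renamed" messages, respectively.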

Access to the software

The disjunctify.py script is hosted on OSADL's Git server at https://git.osadl.org/cemde/disjunctify, where instructions on how to download, install and use the software are given. Please send merge requests if you have fixed bugs or added new features. Requests for information on the project and for support may be directed to office@osadl.org.