Character dump program using XSL-FO

This Python utility dumps the characters of a supplied file, interpreting the file's bytes according to a specified character encoding. Each character is displayed in a large font, with its hexadecimal Unicode code point shown below it.
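
The decoding step can be sketched as follows. This is a minimal illustration, not the code in dumpfo.py: the helpers `dump_code_points` and `dump_file` are hypothetical names, and "latin-1" is assumed as the Python spelling of the documented Latin1 default.

```python
import sys
from typing import List, Tuple


def dump_code_points(data: bytes, encoding: str = "latin-1") -> List[Tuple[str, str]]:
    """Decode raw bytes and pair each character with its hex code point."""
    # Unknown encodings raise LookupError; encodings such as UTF-8 may also
    # raise UnicodeDecodeError for invalid byte sequences.
    text = data.decode(encoding)
    return [(ch, f"{ord(ch):04X}") for ch in text]


def dump_file(path: str, encoding: str = "latin-1") -> List[Tuple[str, str]]:
    """Read the file as raw bytes so the chosen encoding alone decides the characters."""
    with open(path, "rb") as f:
        return dump_code_points(f.read(), encoding)


if __name__ == "__main__":
    # usage: python sketch.py filename [encoding]
    for ch, code in dump_file(sys.argv[1], *sys.argv[2:3]):
        print(code, repr(ch))
```

The encoding matters: the two bytes C3 A9 decode as Latin-1 to two characters (U+00C3 and U+00A9), but as UTF-8 to a single character (U+00E9).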

Some Unicode characters that are not normally displayed, such as the C0 control characters and the Unicode directionality characters LRE, RLE, LRO, RLO and PDF, are rendered using their abbreviated names. This utility was created specifically to debug character streams containing these direction-changing special characters.
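
One way to implement that substitution is a lookup table from code point to abbreviation. The sketch below is an assumption about the approach, not dumpfo.py's actual table: `C0_NAMES`, `ABBREVIATIONS` and `display_form` are hypothetical names, and only the characters named above are covered.

```python
# Standard abbreviations for the C0 controls U+0000..U+001F, in code point order.
C0_NAMES = ("NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
            "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US").split()

ABBREVIATIONS = {chr(i): name for i, name in enumerate(C0_NAMES)}

# Unicode directional formatting characters.
ABBREVIATIONS.update({
    "\u202A": "LRE",  # LEFT-TO-RIGHT EMBEDDING
    "\u202B": "RLE",  # RIGHT-TO-LEFT EMBEDDING
    "\u202C": "PDF",  # POP DIRECTIONAL FORMATTING
    "\u202D": "LRO",  # LEFT-TO-RIGHT OVERRIDE
    "\u202E": "RLO",  # RIGHT-TO-LEFT OVERRIDE
})


def display_form(ch: str) -> str:
    """Return the abbreviation for a non-displayed character, else the character."""
    return ABBREVIATIONS.get(ch, ch)
```

For example, `display_form("\u202B")` returns "RLE", while `display_form("A")` returns "A" unchanged.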

To take advantage of the text formatting features of XSL-FO, the output is an XSL-FO 1.0 conformant XML instance ready for processing by an XSL-FO engine. No vendor extensions are used. This is also an example of using XSL-FO independently of XSLT: the Python program generates the XSL-FO directly, ready for processing.
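
Generating XSL-FO without XSLT amounts to writing the formatting-object vocabulary as text. The sketch below is a hypothetical layout, not dumpfo.py's real output: the function `make_fo`, the table grid, the 15mm columns and the font sizes are all assumptions. It expects characters that XML 1.0 forbids (most C0 controls) to have been replaced by their abbreviations beforehand.

```python
from typing import Iterable, List, Tuple
from xml.sax.saxutils import escape

FO_NS = "http://www.w3.org/1999/XSL/Format"


def make_fo(cells: Iterable[Tuple[str, str]], columns: int = 16,
            page_width: str = "297mm", page_height: str = "210mm") -> str:
    """Build an XSL-FO document from (display_text, hex_code) pairs."""
    cells = list(cells)
    out: List[str] = [
        '<?xml version="1.0" encoding="utf-8"?>',
        f'<fo:root xmlns:fo="{FO_NS}">',
        '<fo:layout-master-set>',
        f'<fo:simple-page-master master-name="dump" page-width="{page_width}"'
        f' page-height="{page_height}" margin="15mm">',
        '<fo:region-body/>',
        '</fo:simple-page-master>',
        '</fo:layout-master-set>',
        '<fo:page-sequence master-reference="dump">',
        '<fo:flow flow-name="xsl-region-body">',
    ]
    if cells:
        out.append('<fo:table table-layout="fixed">')
        out += ['<fo:table-column column-width="15mm"/>'] * columns
        out.append('<fo:table-body>')
        # One table row per group of `columns` characters.
        for start in range(0, len(cells), columns):
            out.append('<fo:table-row>')
            for text, code in cells[start:start + columns]:
                out.append(
                    '<fo:table-cell border="0.5pt solid black">'
                    f'<fo:block font-size="24pt" text-align="center">{escape(text)}</fo:block>'
                    f'<fo:block font-size="7pt" text-align="center">{escape(code)}</fo:block>'
                    '</fo:table-cell>')
            out.append('</fo:table-row>')
        out += ['</fo:table-body>', '</fo:table>']
    else:
        # fo:table-body needs at least one row, so emit an empty block instead.
        out.append('<fo:block/>')
    out += ['</fo:flow>', '</fo:page-sequence>', '</fo:root>']
    return "\n".join(out)
```

The default page dimensions correspond to A4 landscape. Escaping with `xml.sax.saxutils.escape` keeps characters such as `<` and `&` from breaking the generated XML.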

The program can be downloaded here: dumpfo-20021112-2050.zip

An example session using the test files would be along the lines of:

T:\ftemp>python dumpfo.py

Usage:  [-opt]* filename

 -lf           = act on linefeed as well as interpret linefeed
 -a4           = A4 page size
 -a4l          = A4 landscape page size (default)
 -us           = US letter page size
 -usl          = US letter landscape page size
 -e {encoding} = character encoding system used in the input file
               = default: Latin1
               = other typical values: utf-8, utf-16, utf-16-le, utf-16-be

T:\ftemp>python dumpfo.py -lf dtest1.txt >t:\d.fo

T:\ftemp>python dumpfo.py -lf dtest1.txt >dtest1.fo

T:\ftemp>python dumpfo.py -lf dtest2.txt >dtest2.fo

T:\ftemp>python dumpfo.py -lf -e utf-8 dtest2.txt >dtest2b.fo

T:\ftemp>dir
2002-11-11  09:03             6,056 dtest1.fo
2002-11-10  12:54                34 dtest1.txt
2002-11-11  09:03             6,199 dtest2.fo
2002-11-10  12:57                35 dtest2.txt
2002-11-11  09:03             6,056 dtest2b.fo
2002-11-11  08:58             6,566 dumpfo.py
               6 File(s)         24,946 bytes

T:\ftemp>
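
The options in the usage text above could be handled along these lines. This is a hypothetical reconstruction, not the argument handling in dumpfo.py: `PAGE_SIZES`, `Options` and `parse_args` are invented names, and the page dimensions are the standard A4 (210mm by 297mm) and US letter (8.5in by 11in) sizes. The sketch only records the `-lf` flag, since its exact effect is described only by the usage text.

```python
import sys
from typing import List, NamedTuple, Optional

# Standard page dimensions as (width, height) for each page-size option.
PAGE_SIZES = {
    "-a4": ("210mm", "297mm"),
    "-a4l": ("297mm", "210mm"),
    "-us": ("8.5in", "11in"),
    "-usl": ("11in", "8.5in"),
}


class Options(NamedTuple):
    linefeed: bool    # set by -lf
    page_width: str
    page_height: str
    encoding: str
    filename: str


def parse_args(argv: List[str]) -> Optional[Options]:
    """Hypothetical parser for the usage shown above; None means bad usage."""
    linefeed = False
    width, height = PAGE_SIZES["-a4l"]  # documented default: A4 landscape
    encoding = "latin-1"                # documented default: Latin1
    args = list(argv)
    while args and args[0].startswith("-"):
        opt = args.pop(0)
        if opt == "-lf":
            linefeed = True
        elif opt in PAGE_SIZES:
            width, height = PAGE_SIZES[opt]
        elif opt == "-e" and args:
            encoding = args.pop(0)
        else:
            return None  # unknown option, or -e without an encoding name
    if len(args) != 1:
        return None      # exactly one input filename is expected
    return Options(linefeed, width, height, encoding, args[0])


if __name__ == "__main__":
    options = parse_args(sys.argv[1:])
    if options is None:
        sys.exit("Usage:  [-opt]* filename")
    print(options)
```

As in the session above, output would be redirected to a `.fo` file in practice; the sketch only prints the parsed options.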

Limitation: only those character encodings supported by the codecs (encoders and decoders) registered with your Python installation can be used.
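
Whether a given `-e` value will work can be checked against the codec registry before reading the file. A minimal sketch, assuming a hypothetical helper `encoding_supported` built on the standard `codecs.lookup` function:

```python
import codecs


def encoding_supported(name: str) -> bool:
    """Return True if Python's codec registry can resolve the encoding name."""
    try:
        codecs.lookup(name)  # raises LookupError for unknown encodings
    except LookupError:
        return False
    return True


if __name__ == "__main__":
    for name in ("latin1", "utf-8", "utf-16-le", "no-such-encoding"):
        print(name, encoding_supported(name))
```

Checking up front lets a program report an unknown encoding name cleanly instead of failing when it first tries to decode.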

Side note: "to dump" file contents is an old computing term from diagnostics. A "dump" of a file or of memory is a diagnostic formatting of its contents, and it typically reveals information that is not visible in standard listings.

Crane Softwrights Ltd.

+1 (613) 489-0999 (Voice)
+1 (613) 489-0995 (Fax)

info@CraneSoftwrights.com


Last changed: 2006/12/28 00:05:31 (UTC)