Stanford Lecture: The Internal Details of TeX82 - Session 1 (July 28, 1982)
Key Concepts
- TeX82: The specific version of TeX being discussed, emphasizing its portability and experimental nature.
- Web System of Structured Documentation: The system used for writing and documenting TeX in Pascal, allowing for both program generation and readable documentation.
- TANGLE: A program within the Web system that converts Web source files into Pascal programs.
- WEAVE: A program within the Web system that generates formatted documentation (books) from Web source files.
- Pascal: The programming language used for implementing TeX.
- DVI files (Device-Independent Output): The output format generated by TeX.
- Font Metric (TFM) files: Files containing information about font characteristics.
- String Pool File: A file where strings used in the program are stored.
- Modules: The fundamental building blocks of the TeX program, each with a unique number and a structured format.
- Macro Definitions: Mechanisms for defining symbolic names for constants, substitutions, and parameterized operations.
- Change File: A file used to introduce modifications to the Web source files without altering the original.
- Compiler Directives: Special commands recognized by Pascal compilers to control compilation behavior.
Course Overview and Objectives
This intensive three-day course focuses on the TeX82 system, presented as a unique opportunity to understand the system at its current developmental stage. The primary goal is to make participants familiar with the system's documentation and internal workings, enabling them to locate information and understand how different parts interact. While not aiming for complete proof of correctness, the course intends to provide a solid foundation for understanding the software's behavior. The speaker estimates that roughly 40 hours of dedicated reading in total would be enough to understand every module of the code.
Handouts and Resources
Participants are provided with three key handouts:
- TeX82 writeup: This document, intended for publication in a book, contains the TeX user manual (not yet finished) and a brief summary of the TeX82 language.
- The Web System of Structured Documentation: This handout describes the system used to write portable system programs in Pascal.
- TeXware: A collection of four auxiliary programs useful for installing TeX: one prints the string pool file, one converts font metric files to symbolic format, one converts the symbolic format back to binary format, and one is a symbolic printer for DVI files.
By the time the lectures are available on videotape, the first handout is expected to be published as a book, and the latter two are anticipated to be Stanford reports. Appendices to the TeX82 listing, relating to a test program for the system, are expected to be published in TUGboat.
Course Structure and Schedule
The course spans three days with four hours of instruction each day, adhering to a strict schedule:
- Morning: 9:30 AM - 10:30 AM, followed by a break until 11:00 AM. Then, 11:00 AM - 12:00 PM.
- Afternoon: 2:00 PM - 3:00 PM, followed by a break until 3:30 PM. Then, 3:30 PM - 4:30 PM.
Day-by-Day Breakdown of Topics
Day 1: Fundamental Ground Level Concepts
The first day focuses on foundational elements crucial for understanding the rest of the system:
- Reading Web Programs: Understanding how to navigate and interpret the documentation generated by the Web system.
- Representation of Strings: How strings are handled and stored within the program.
- Data Structures for Boxes and Glue: The internal representation of typographical elements like boxes and flexible spacing (glue).
- Representation of Control Sequences: How commands and instructions are encoded.
Day 2: Main Processing Routines
The second day delves into the core logic of TeX:
- Morning: Syntax routines (parsing input) followed by semantic routines (interpreting the parsed input).
- Afternoon: Paragraph breaking into lines and hyphenation algorithms.
Day 3: System-Dependent Issues and Portability
The final day addresses aspects critical for installing and adapting TeX to different environments:
- Scanning of File Names: Handling variations in file naming conventions across operating systems.
- Font Metric (TFM) Files: Understanding the internal structure of TFM files.
- DVI Files: The format and handling of device-independent output.
- Initialization and Bootstrapping: The process of getting the TeX system up and running.
The Web System and TANGLE/WEAVE
The "web" in the context of the Web system refers to a structured, networked approach to documentation, not a trap. The system uses source files that can be processed by two main tools:
- TANGLE: Converts Web source files into executable Pascal programs.
- WEAVE: Generates formatted documentation (books) from Web source files.
Illustrating TANGLE
The speaker demonstrates TANGLE by running it on TANGLE.web itself. The process involves:
- Input: Specifying the Web file (TANGLE.web), an optional change file (used for local modifications, here left null), and an output file (tangle.psc).
- String Pool: An optional pool file (tangle.pool) is mentioned for storing strings longer than one character enclosed in double quotes.
- Execution: TANGLE reads the Web file, processes it, and outputs Pascal code. It indicates progress by printing dots (100 lines of Pascal output per dot).
- Statistics: The process provides statistics on memory usage (e.g., 18,000 tokens for the TANGLE program in a compact form, plus byte memory).
The same process is then illustrated for TeX.web, highlighting that TeX is a much larger program. The output Pascal file is named tex.pascal, and the pool file is tex.pool.
Structure of the TeX82 Program Listing
The TeX82 listing, generated by WEAVE, has a specific structure:
- Table of Contents: The program is divided into 55 parts, with Part 1 being the introduction and Part 55 the index.
- Module Numbering: The program consists of 1,244 modules (expected to be 1,245 after debugging), each with a distinct number. Modules are typically presented several per page, with a large boldface number.
- Part and Module Identification: Each page indicates the part and module number (e.g., "part 35 module number 652").
- Index: The index lists every occurrence of every identifier. Underlined entries indicate where an identifier was defined.
- Identifiers: For example, "empty" is defined in module 588 and used in many other modules.
- One-Letter Identifiers: These are only listed with underlined references (definitions) to avoid excessive verbosity.
- Error Messages: The index helps locate the module where a specific error message is generated.
- Reserved Words: Pascal reserved words are generally not indexed unless they have a specific, noteworthy usage.
- Roman Type Entries: These are for broader topics, like "Chinese characters," directing the reader to relevant modules for discussion or suggestions.
Module Structure
Each module (except the index) follows a consistent three-part structure:
- TeX Part: Expository text explaining the overall idea, less obvious points, the code's environment, invariants, or reasons for design choices. Comments are intended to be interesting and informative.
- Definitions Part: Defines macros. This is a key aspect of understanding programs written in Web.
- Pascal Part: The actual Pascal code for the module. The speaker aims for a maximum of around 12 lines of code per module to ensure comprehensibility as a unit.
Modules can have any of these parts absent, but never more than one TeX part, and definitions always precede Pascal code.
Example: Module 180 (Displaying Glue Set)
Module 180, on page 59, is presented as an example of a small module. It displays the value of glue_set(p), which represents the ratio of glue expansion.
- Logic: It checks if glue_set(p) is non-zero and prints "glue set". If glue_sign(p) indicates shrinking, it prints a minus sign. It also handles values of large magnitude by printing "greater than 20,000" or "less than -20,000".
- String Handling: The string "glue set" within double quotes is a literal string. TANGLE converts such strings into integers and stores their text in the string pool file. The integer for "glue set" is 428 in this case.
- Internal Codes: Strings are converted to integers, with single-character strings using ASCII codes.
- Dependencies: This code is used in module 178 (display_box).
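The string handling described above can be pictured with a small Python sketch. It is only a toy model, not TANGLE's actual code: the class name, the method name, and the assumption that pool numbers begin at 256 (just above the 8-bit character codes) are illustrative, so it will not reproduce the value 428 quoted in the lecture.

```python
class StringPool:
    """Toy model of a string pool: maps string literals to integer codes."""

    # Assumption: pool numbers start just above the 8-bit character codes.
    FIRST_POOL_NUMBER = 256

    def __init__(self):
        self.strings = []   # pool entries, in order of first appearance
        self.numbers = {}   # literal -> assigned integer

    def code_for(self, literal: str) -> int:
        # A single-character string is represented by its character code.
        if len(literal) == 1:
            return ord(literal)
        # Any other string gets the next free pool number, reused on repeats.
        if literal not in self.numbers:
            self.numbers[literal] = self.FIRST_POOL_NUMBER + len(self.strings)
            self.strings.append(literal)
        return self.numbers[literal]


pool = StringPool()
print(pool.code_for("-"))         # character code of "-"
print(pool.code_for("glue set"))  # first multi-character literal in this toy run
print(pool.code_for("glue set"))  # same number again: the literal is stored once
print(pool.strings)               # what would be written to the pool file
```

The point of the design is that the generated Pascal program only ever handles integers, while the text of each string lives in a separate file that is read at startup.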
Example: Module 178 (Display Box)
Module 178, display_box(p), breaks down the problem of displaying a box:
- Box Types: It checks if the box is an H-list node, V-list node, or an unset node.
- Output: For H/V list nodes, it prints 'H' or 'V' prefixed by a backslash. For unset nodes, it calls module 179 (display_special_fields). Otherwise, it calls module 180 to display the glue set.
- Shifting: It also checks for shifted boxes.
- Dependencies: This module refers to modules 179 and 180.
Example: Module 177 (Display Node)
Module 177, display_node(p), is a case statement that handles various node types:
- Cases: H-list node, V-list node, unset node (calling display_box), rule, insertion.
- Math Mode: A special case, cases_show_nodes_that_arise_in_maths_only, is mentioned. It is not written directly in this part of the program, because the reader is not expected to know about math mode yet. This part will be patched in later.
- Unknown Node Type: A catch-all for unlisted node types.
- Dependencies: This module refers to module 176.
Module 176 is a bottom-level module, meaning it doesn't explicitly state where it's used, implying it's part of the core Pascal program structure.
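The way modules 176 to 180 refer to one another, with some pieces (such as the math-only cases) supplied later in the listing, amounts to recursive text substitution. The Python sketch below illustrates the idea under simplifying assumptions: the angle-bracket reference syntax, the module names, and the module bodies are all invented for the example and are not taken from the TeX82 listing.

```python
import re

# Hypothetical reference syntax: <module name> inside a module body.
REFERENCE = re.compile(r"<([^<>]+)>")


def expand(name, modules, active=()):
    """Return the body of `name` with every <reference> replaced recursively."""
    if name in active:
        raise ValueError(f"module {name!r} refers to itself")
    return REFERENCE.sub(
        lambda match: expand(match.group(1), modules, active + (name,)),
        modules[name],
    )


# Invented module table; the top-level module only names its pieces.
modules = {
    "display node": "case kind of <box cases> <math-only cases> else: unknown node type",
    "box cases": "hlist, vlist, unset: <display box>;",
    "display box": "show the box fields",
}

# A piece that the reader meets much later can be filled in afterwards.
modules["math-only cases"] = "math node: show the math fields;"

print(expand("display node", modules))
```

Reading the listing top-down corresponds to looking at one dictionary entry at a time; the tangled program corresponds to the fully expanded text.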
Macro Definitions
There are three kinds of macro definitions in Web:
- Numeric Definition: Assigns a numerical value, computed at tangling time.
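- Simple Substitution: Substitutes replacement text for an identifier, without parameters. These definitions are indicated by a three-bar equivalence sign (typed as == in the Web source) rather than a plain equals sign.
- Parameterized Substitution: Like a simple substitution, but the macro takes a parameter that is inserted into the replacement text.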
Examples of Macro Definitions
- membase and temp_head: temp_head is defined as hi_membase + 3, providing a mnemonic name for a memory location.
- unset_node: Defined as a numeric macro with the value 13.
- glue_shrink: Defined as a substitution for shift_amount.
- glue_stretch: Defined with a parameter, as the sc field of the mem entry at the parameter's location plus 6. This allows for symbolic access to fields within structured types, making the code more portable, as only this definition would need to change if the memory representation changes.
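The benefit of a field-access macro like glue_stretch can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the MemoryWord class, the size of mem, and the setter function are hypothetical, and only the offset of 6 and the field name sc come from the example above.

```python
from dataclasses import dataclass


@dataclass
class MemoryWord:
    sc: int = 0  # a scaled value; the only field modelled in this sketch


# Hypothetical memory array; the real program's size and layout differ.
mem = [MemoryWord() for _ in range(64)]

# The offset from the lecture's example. If the layout changed, this is the
# only line that would need editing.
GLUE_STRETCH_OFFSET = 6


def glue_stretch(p: int) -> int:
    """Read the stretch field of the glue specification starting at p."""
    return mem[p + GLUE_STRETCH_OFFSET].sc


def set_glue_stretch(p: int, value: int) -> None:
    """Write the stretch field (a Web macro could serve as an assignment target)."""
    mem[p + GLUE_STRETCH_OFFSET].sc = value


set_glue_stretch(10, 65536)
print(glue_stretch(10))
```

Code that uses glue_stretch(p) never mentions the offset or the field name, which is the portability argument made in the lecture.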
Constant Macros and Limitations
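- Size Restriction: Constant macros defined with a plain equals sign (=) have a size restriction (around 15 bits plus a sign) to allow for arithmetic operations without excessive parentheses.
- Arithmetic: Macros defined with = handle arithmetic correctly, because the value is computed before it is inserted. Substitution macros (defined with the three-bar equivalence sign, typed as ==) are inserted as text and can lead to issues if not used with parentheses. For example, if macro stands for x + y, then -macro expands to -x + y rather than -(x + y).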
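The parenthesization pitfall is easy to reproduce. The Python sketch below is a toy model of textual substitution, not TANGLE itself; the macro name total, the whitespace-based tokenizer, and the helper names are assumptions made for the example.

```python
def expand_textually(expression: str, macros: dict) -> str:
    """Replace each macro name in a space-separated expression by its text."""
    return " ".join(macros.get(token, token) for token in expression.split())


def evaluate(expression: str, env: dict) -> int:
    # eval is tolerable here only because the inputs are fixed toy strings.
    return eval(expression, {}, dict(env))


env = {"x": 5, "y": 2}

# Substitution macro: the replacement is raw text.
substitution = {"total": "x + y"}
# Numeric macro: the value is computed first and inserted as a single number.
numeric = {"total": str(env["x"] + env["y"])}

naive = expand_textually("- total", substitution)      # text is "- x + y"
safe = expand_textually("- ( total )", substitution)   # text is "- ( x + y )"
folded = expand_textually("- total", numeric)          # text is "- 7"

for label, text in [("naive", naive), ("safe", safe), ("folded", folded)]:
    print(label, "|", text, "|", evaluate(text, env))
```

With x = 5 and y = 2 the naive expansion evaluates to -3, while the parenthesized and pre-computed forms both give -7.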
Distinguishing Macros from Pascal Identifiers
The speaker notes that while reading the program, it can be challenging to distinguish between macro names and Pascal identifiers. However, clues include:
- Index: The index clearly shows where identifiers are defined.
- Identifier Uniqueness: In the final Pascal program, no two identifiers share the same first seven letters. This convention helps catch typographic errors. Macro names are not bound by it, since they are replaced before the Pascal program is produced, so they can be much longer and can have agreeing prefixes. A long identifier is therefore a strong indicator of a macro.
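A check of the seven-letter rule can be sketched in a few lines of Python. This is an illustration only: the function name and the sample identifiers are invented, and ignoring case and underscores when comparing is an assumption about how "letters" are counted.

```python
from collections import defaultdict


def find_prefix_clashes(identifiers, significant=7):
    """Group identifiers that agree in their first `significant` letters."""
    groups = defaultdict(set)
    for name in identifiers:
        # Assumption: case and underscores do not count when comparing.
        key = name.replace("_", "").upper()[:significant]
        groups[key].add(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# Hypothetical identifiers, chosen so that two of them collide.
names = ["glue_stretch_order", "glue_stretching", "empty", "emptied"]
print(find_prefix_clashes(names))
```

A tool that reports such clashes will also flag many misspellings, because a mistyped name usually agrees with the intended one in its first few letters.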
Program Initialization and Structure
The program starts with compiler directives in module 9.
- program TeX: The main program declaration.
- Labels, Constants, Types, Variables: Declared in outer blocks.
- M_type: A Web macro built by concatenation, defined as t & y & p & e. This is used because type is a reserved word in Pascal. A format definition allows M_type to be treated like type and type to be treated like true for formatting purposes.
- Procedures: Declarations for initialize, basic_printing_procedures, and error_handling_procedures are made early. Error handling procedures are declared early because errors can occur anywhere, and printing procedures are needed even earlier to report errors.
- Program Flow: Modules 1-3 have no Pascal text. Module 4 contains the first Pascal text. Subsequent modules build up procedures, and the program execution begins at a start_here section, which calls initialize and other procedures.
System-Dependent Features and Compilation Options
- Version Control: Definitions in modules 7 and 8 allow for different versions of the program to be compiled.
- Debug and Backwards Debug: These directives (debug and its reversed spelling, gubed), along with statistics and backwards_statistics, allow for conditional compilation of code for debugging or performance monitoring. They are macros for Web's comment delimiters (@{ and @}), which become Pascal comment braces in the tangled program, so the enclosed code is included or excluded depending on how the macros are defined.
- Initialization vs. Production: Two versions of TeX are discussed:
- INITEX: The initialization version, used to load hash tables, primitives, and preloaded formats into a condensed file for efficient startup. This version is not commented out by default.
- Production TeX: The version used for regular typesetting. To compile this, the INITEX directives need to be commented out using braces.
- Change Files: System-dependent changes are managed through change files, which are merged with the Web source files during the TANGLE process. This avoids modifying the original .web files.
- Compiler Directives: Pascal compilers often recognize comments starting with a dollar sign ($) for special actions (e.g., range checking).
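The debugging switches in the list above work by turning a pair of marker words into either nothing or a pair of comment braces. The Python sketch below mimics that effect on a toy token stream. The function name, the whitespace tokenizer, and the sample statements are assumptions, and gubed is simply debug spelled backwards, as the "backwards debug" name suggests.

```python
def apply_debug_switch(source: str, enabled: bool) -> str:
    """Expand the debug/gubed markers in a space-separated toy program.

    When the switch is off, the markers become Pascal comment braces, so the
    enclosed code is still visible in the output but is never compiled.
    """
    opener, closer = ("", "") if enabled else ("{", "}")
    expanded = []
    for token in source.split():
        if token == "debug":
            expanded.append(opener)
        elif token == "gubed":
            expanded.append(closer)
        else:
            expanded.append(token)
    return " ".join(token for token in expanded if token)


program = "x:=1; debug check_mem; gubed y:=2;"
print(apply_debug_switch(program, enabled=False))
print(apply_debug_switch(program, enabled=True))
```

Switching a whole build between the debugging and production variants then means changing two macro definitions, which is exactly the kind of edit a change file is meant to carry.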
Program Size and Statistics
- TeX.pascal: The generated Pascal program for TeX is approximately 4,556 lines long.
- Semicolons: The program contains around 10,928 semicolons, indicating approximately 11,000 statements and declarations.
- "Else" Occurrences: There are 664 occurrences of "else", bringing the total statements and declarations to nearly 12,000.
- Comparison: This size is slightly shorter than the combined P-tex and sysdek of the previous TeX version.
Change File Format and Usage
- Structure: A change file consists of an index page followed by pages of modifications. Each modification starts with a verbatim line from the original Web file, followed by the corrected line.
- Purpose: Change files allow for local modifications without altering the main Web source. They are particularly useful for system-dependent adjustments.
- Example: A change file might modify the version number or uncomment debugging directives.
- Memory Limits: Change files can also adjust parameters like memory_max for debugging purposes.
- Module Replacement: The system_dependencies module, located just before the index, is intended to be replaced by system-specific changes. This approach aims to minimize disruption to module numbering.
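The merging step can be sketched as follows. This Python example is a simplified model and does not parse the real change file syntax: changes are represented as hypothetical (old lines, new lines) pairs given in source order, and the sample source lines are invented.

```python
def apply_changes(web_lines, changes):
    """Return a copy of web_lines with each (old, new) change applied in order.

    Each `old` block must match the source verbatim; the original list is
    left untouched, mirroring the idea that the master file is never edited.
    """
    result, i = [], 0
    for old, new in changes:
        # Copy unchanged lines until the block to be replaced is found.
        while i < len(web_lines) and web_lines[i:i + len(old)] != old:
            result.append(web_lines[i])
            i += 1
        if i >= len(web_lines):
            raise ValueError(f"change did not match the source: {old!r}")
        result.extend(new)
        i += len(old)
    result.extend(web_lines[i:])
    return result


# Invented source lines and local modifications.
web = ["banner = 'version 0'", "memory_max = 30000", "other = 1"]
changes = [
    (["banner = 'version 0'"], ["banner = 'version 0 (local)'"]),
    (["memory_max = 30000"], ["memory_max = 3000"]),
]
print(apply_changes(web, changes))
```

Requiring a verbatim match means a change that no longer applies to a newer master file fails loudly instead of being silently dropped.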
Macro Formatting and Pascal Constants
- format directive: Used to control how identifiers and macros are displayed in the documentation. It affects only the formatted listing, not the generated Pascal program. For example, a format definition that makes debug equivalent to begin causes debug to appear with the same indentation and boldface as begin.
- Pascal Constants vs. Web Macros:
- Pascal Constants: Used for values that can be changed at compile time without affecting the Web file. Recompiling after changing a Pascal constant should be straightforward.
- Web Macros: Used to circumvent Pascal's limitations, such as array boundaries that involve arithmetic (e.g., array_boundary - 1). TANGLE performs the arithmetic, and the resulting value is inserted into the Pascal code. This is also useful for defining symbolic names for memory structures that might not be directly representable as Pascal constants.
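The tangle-time arithmetic on array bounds can be sketched in Python. This is a toy evaluator under stated assumptions: it handles only + and - over space-separated tokens, the limit of 2**15 is an interpretation of the "15 bits plus a sign" remark, and the macro value 500 is invented.

```python
# Assumption: values must stay below 2**15 in magnitude.
LIMIT = 2 ** 15


def fold_constant(expression: str, numeric_macros: dict) -> int:
    """Evaluate a space-separated sum of integers and numeric macro names."""
    total, sign = 0, 1
    for token in expression.split():
        if token == "+":
            sign = 1
        elif token == "-":
            sign = -1
        else:
            value = numeric_macros[token] if token in numeric_macros else int(token)
            total += sign * value
            sign = 1
    if abs(total) >= LIMIT:
        raise OverflowError(f"constant {total} does not fit the assumed limit")
    return total


macros = {"array_boundary": 500}  # hypothetical numeric macro
upper = fold_constant("array_boundary - 1", macros)
# The compiler only ever sees the finished number, never the expression.
print(f"array[0..{upper}] of integer")
```

Because the preprocessor does the subtraction, the Pascal compiler is handed a plain literal bound, which is what a restrictive compiler requires.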
Conclusion and Next Steps
The course aims to provide a deep understanding of the TeX82 system through its structured documentation and internal code. The initial sessions focus on foundational concepts, followed by the core processing routines and system-dependent aspects. Participants are encouraged to engage with the provided materials and explore the code to gain a comprehensive grasp of TeX's functionality and implementation. The speaker emphasizes the importance of understanding how the system is documented and structured using the Web system.