"To master riding bicycles you have do ride bicycles"
started at 23/07/2025, agsb@
first version at 12/10/2025, @agsb
minimal dictionary compiled words at 04/12/2025, @agsb
Please see Changes and Notes
Any Forth system depends on the I/O functions and the executable linkable format (ELF) of the host system.
The problem is to reach a functional, minimal-code Forth engine for the RISC-V ISA.
This is an implementation of the MilliForth (sector-forth) concept for the RISC-V ISA, using Minimal Indirect Thread Code.
Milliforth uses a minimal set of functions and primitives to make a Forth.
This version, with minimal code (.text), uses only 454 bytes: 388 bytes for the Forth engine and 66 bytes for Linux system I/O, not counting ELF headers. It uses 56 bytes to load the ELF PIC address and 44 bytes for word headers.
No human-readable WORDS: headers hold a DJB2 hash instead of a name.
No Terminal Input Buffer, just a token-to-hash ASCII stream parser.
It uses only an IMMEDIATE flag, at the MSBit (31) of the hash; that bit is also NaN (0x80000000), used to indicate errors.
There is a file with more core words in native code.
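The hashed headers and the IMMEDIATE flag described above can be modeled in a few lines of Python. This is only a sketch: it assumes the classic DJB2 recurrence (seed 5381, multiply by 33, add the character) and assumes bit 31 is cleared from the hash so it can carry the flag. The helper names `header_hash` and `is_immediate` are hypothetical, and the assembly source may differ in seed or masking order.

```python
# Illustrative model only; not taken from the assembly source.
IMMEDIATE = 0x80000000  # bit 31, assumed reserved for the IMMEDIATE flag
HASH_MASK = 0x7FFFFFFF  # remaining 31 bits, assumed to hold the hash


def djb2(token: str) -> int:
    """Classic DJB2 (h = h * 33 + c, seed 5381) with bit 31 cleared."""
    h = 5381
    for ch in token.encode("ascii"):
        h = (h * 33 + ch) & 0xFFFFFFFF  # wrap like a 32-bit register
    return h & HASH_MASK


def header_hash(token: str, immediate: bool = False) -> int:
    """Value as it might be stored in a word header (hypothetical helper)."""
    h = djb2(token)
    return h | IMMEDIATE if immediate else h


def is_immediate(stored: int) -> bool:
    """True when the stored header value has bit 31 set."""
    return bool(stored & IMMEDIATE)
```

Looking a token up then reduces to one 32-bit compare per dictionary entry after masking off bit 31, which is why no name string needs to be stored.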
How to shrink to a minimal compiled size on RISC-V?
1. no alignment is needed, since opcodes are always 2 or 4 bytes long;
2. choose registers to maximize use of compressed RISC-V opcodes;
3. warn the user about possible errors but abandon error checking;
4. use streams, no buffers;
5. do not speculate;
sector-riscv.S is working, as is extra-milliforth.S;
test them with:
**cat t0.f t1.f t2.f - | sh doit.sh | tee output**
t0.f is a minimal set of words, same as test0-riscv.f;
t1.f is a complement with hash and more words;
the hyphen makes cat read standard input, here the terminal (/dev/tty)
You can also test with:
cat t0.f | sh doit.sh | tee z1
cat t0.f t1.f | sh doit.sh | tee z2
t1.f includes <builds create variable constant does> (_STUB_)
PS.
Add a hyphen at the end of the cat file list to allow terminal I/O
cat t0.f t1.f t2.f - | sh doit.sh
An obscure bug causes the first word to get a hash error.
Memory management is done by extending the dictionary
into .bss, reserving bytes with .skip (defaults to 64k * 4);
there are no Linux calls for memory allocation. (Anyone?)
The source can be compiled with the 'missed' hack and
a more extensive native-code word set.
"WE STI.. DON.. SEE THE NEE. FOR 31 CHA...... NAM.. IN THE GEN.... CAS."
From a letter to the Editor of Forth Dimensions [Moore 1983] concerning the practice of storing names of Forth words as a count and the first three characters.
A count and the first three characters, four bytes in all, were enough.
"AI uses hash code as word, Humans uses semantics as word" Liang Ng
In this century, computers use hashes to compare contents, so why not use a 4-byte hash to identify tokens?
This version of milliforth uses a 32-bit DJB2 hash. It provides fast comparison during compilation and has a small footprint.
For a 32-bit hash, a collision becomes likely (roughly a 40% chance by the birthday bound) at approximately 65,536 items, which would require a damn huge dictionary.
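The 65,536 figure is the square root of 2^32, the usual birthday-bound rule of thumb. A small sketch of that estimate follows; it assumes the hash behaves like a uniform random function, which DJB2 only approximates, and the function names are illustrative.

```python
import math


def collision_probability(items: int, bits: int = 32) -> float:
    """Birthday approximation: chance that at least two of `items`
    uniformly random `bits`-wide hashes are equal."""
    space = 2 ** bits
    return 1.0 - math.exp(-items * (items - 1) / (2 * space))


def items_for_probability(p: float, bits: int = 32) -> int:
    """Approximate number of items at which the collision chance reaches p."""
    space = 2 ** bits
    return math.ceil(math.sqrt(2 * space * math.log(1 / (1 - p))))


if __name__ == "__main__":
    # About 0.39 for a full 32-bit hash at 65,536 words.
    print(collision_probability(65536))
    # Passing bits=31 models the case where bit 31 is reserved for a flag.
    print(collision_probability(65536, bits=31))
```

For a dictionary of a few thousand words the estimate stays well below one percent, which supports the argument that a 4-byte hash is enough in practice.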
"The spice must flow"
Chuck executes or compiles each word individually rather than line by line. In fact Chuck doesn't really have lines. I will also go word by word rather than line by line in aha.
Jeff Fox ?
Why no Terminal Input Buffer ?
Forth is not an editor. It does not need undo, redo, copy or paste.
The input is a stream; tokens just flow.
A token is being defined, has been defined, or has not been defined, and Forth reacts accordingly.
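The three cases above can be sketched as a toy stream loop. This is a Python model, not the assembly logic: it keeps names as strings where the real engine would compare hashes, the starting dictionary is hypothetical, and the reaction to an unknown token is simply logged here.

```python
import io
from typing import Iterator, TextIO


def tokens(stream: TextIO) -> Iterator[str]:
    """Yield whitespace-delimited tokens, reading one character at a time."""
    word = []
    while True:
        ch = stream.read(1)
        if ch == "" or ch.isspace():
            if word:
                yield "".join(word)
                word = []
            if ch == "":  # end of stream
                return
        else:
            word.append(ch)


def run(stream: TextIO) -> list:
    """Classify each token and record how a Forth might react to it."""
    defined = {"dup", "+"}  # hypothetical words already in the dictionary
    defining = None         # name of the word being compiled, if any
    log = []
    it = tokens(stream)
    for tok in it:
        if tok == ":":                      # a token is being defined
            defining = next(it, None)
            log.append(("define", defining))
        elif tok == ";":
            defined.add(defining)
            log.append(("end", defining))
            defining = None
        elif tok in defined:                # a token has been defined
            log.append(("compile" if defining else "execute", tok))
        else:                               # a token has not been defined
            log.append(("unknown", tok))
    return log
```

No line buffer appears anywhere: each character is consumed once and each token is acted on as soon as its trailing whitespace arrives.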
This version uses a DJB2 hash for dictionary entries, uses relative branches and includes:
minimal primitives:
u@ return the address of user structure
0# if top of data stack is not zero returns -1 (0xFFFFFFFF)
+ adds two values at top of data stack
NAND logic not-and the two values at top of data stack
@ fetch the value of the cell whose address is at top of data stack
! store a value into the cell whose address is at top of data stack
: starts compiling a new word
; stops compiling a new word
EXIT ends a word
KEY get a char from default terminal (stdin)
EMIT put a char into default terminal (stdout)
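A short model of why + and NAND suffice for arithmetic and logic: the other operators can be derived from them. The sketch below uses Python functions on 32-bit cells; the derived names (`invert`, `and_`, `or_`, `negate`, `minus`) follow common Forth usage and are assumptions, not necessarily the definitions found in t0.f or t1.f.

```python
MASK = 0xFFFFFFFF  # one 32-bit cell


def nand(a: int, b: int) -> int:
    """The NAND primitive: bitwise not-and of two cells."""
    return ~(a & b) & MASK


def plus(a: int, b: int) -> int:
    """The + primitive, wrapping at 32 bits."""
    return (a + b) & MASK


def zero_ne(a: int) -> int:
    """The 0# primitive: -1 (0xFFFFFFFF) if a is not zero, else 0."""
    return MASK if a & MASK else 0


# Derived operations, built only from the primitives above.
def invert(a: int) -> int:
    return nand(a, a)


def and_(a: int, b: int) -> int:
    return invert(nand(a, b))


def or_(a: int, b: int) -> int:
    return nand(invert(a), invert(b))


def negate(a: int) -> int:
    return plus(invert(a), 1)  # two's complement


def minus(a: int, b: int) -> int:
    return plus(a, negate(b))
```

In Forth source the same derivations become short colon definitions, which is how a compiled word set can grow from so few native primitives.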
only internals:
main, cold, warm,
miss, abort, quit, warp,
token, skip, hash, scan, mask,
find, eval, compile, execute,
unnest, next, pick, jump, nest, move,
comma, _init, _getc, _putc, _exit
PS: this next is not the NEXT of a FOR NEXT loop!
with externals, via ecall to Linux:
_getc, _putc, _exit, ( _fcntl, _init )
More words in native code are selectable in defines.S
E.g. extras:
;$ execute native code at instruction pointer (IP), see Notes
NAN place 0x80000000 onto the stack
LSHIFT shift left a value by n bits
RSHIFT shift right a value by n bits
ABORT restart the Forth interpreter
BYE ends the Forth, return to system
. show the cell at top of data stack in hexadecimal
$ parse the next token as a signed hexadecimal integer and push it to TOS
A full list of primitives is in Word Lists
For a Forth language primer, see Starting Forth
For a how-to on Forth from the inside, see JonasForth
For A Problem Oriented Language, see POL