Programming: more precisely, this project is a LOGO language interpreter written in RUST language instead of a compiler. So the name of the blog will be changed to LOGO Interpreter in the future. (I’m pretty strict about it.)
Design Reference: “Implementing a LUA Interpreter Using Rust”
Design reference URL: https://wubingzheng.github.io/build-lua-in-rust/zh/PREFACE.html
Reference to the author’s github page: https://github.com/WuBingzheng
The main task of the interpreter is to convert the source LOGO code into executable Rust code. The original LOGO code has the following main components:
MAKE "DISTANCE "3
/////////////////////////////
Command(string) -> Parameter name(string) -> Parameter value(number)
/////////////////////////////
PENUP
/////////////////////////////
Command (string)
Let’s start with two basic concepts: “bytecode” and “value”.
Bytecode:
- Usually refers to intermediate code that has been compiled, but is not related to a specific machine code and needs to be translated by an interpreter to become machine code. Bytecode is usually not human-readable like source code, but is an encoded sequence of numeric constants, references, instructions, etc. (Personally, I think the RUST language is a good place to start with this. (Personally, I think the RUST language takes on the role of bytecode here, as a kind of intermediate code between the source code (LOGO) and the executable machine code (binary file)).
Values:
- Personally, I think of them as variables. Fortunately, in our program, values can only be numeric or null, and any incoming character type will be defined as an error.
Starting the project straight away, divide the project into several large sections:
Program Entry:
- Lexical analysis,
- syntax analysis,
- virtual machine,
- Byte codes and values
The program entry has been given in start code, the exact code and explanation is given below:
use clap::Parser;
use unsvg::Image;
#[derive(Parser)]
struct Args {
file_path: std::path::PathBuf,
image_path: std::path::PathBuf,
/// Height
height: u32,
/// Width
width: u32,
}
fn main() -> Result<(), ()> {
let args: Args = Args::parse();
let file_path = args.file_path;
let image_path = args.image_path;
let height = args.height;
let width = args.width;
let image = Image::new(width, height);
match image_path.extension().map(|s| s.to_str()).flatten() {
Some("svg") => {
let res = image.save_svg(&image_path);
if let Err(e) = res {
eprintln!("Error saving svg: {e}");
return Err(());
}
}
Some("png") => {
let res = image.save_png(&image_path);
if let Err(e) = res {
eprintln!("Error saving png: {e}");
return Err(());
}
}
_ => {
eprintln!("File extension not supported");
return Err(());
}
}
Ok(())
}
Here will be the source file path, generate the path of the picture, the width and height of the picture read in a struct, the latter two paragraphs mainly in the generation of the picture of the preservation, and in the preservation of the unsuccessful throw an error. Here you can see, the code does not contain the file read, need to write my own (FUCK).
Start with a very simple logo program, and then add annoying details, logo code is as follows:
PENDOWN
FORWARD "50
BACK "80
////////////////////////
begin to draw, forward 50 then back 80
A stack structure can be constructed to hold each token (command, value, and variable name are all tokens), and the stack structure can be represented in the following diagram:
------------
| FORWARD | <- token 1 (command)
------------
| "50 | <- token 2 (value)
------------
| |<- token 3
------------
| |
------------
-first load the global variable named FORWARD into the stack (0);
-Then load the string constant (represented by the new struct) "50" into stack position (1);
-then execute the function at stack (0) location with stack (1) location as an argument.
I came up with a simple way to represent tokens: read the entire statement as a string and then store it in a stack divided by spaces. The syntax is relatively simple in this simple example, and the commands can be divided into different categories based on the number of tokens read:
- One token: PENUP, PENDOWN
- Two tokens: FORWARD, BACK
In fact that’s what I did. To illustrate the further design, here is another analysis of bytecode. The definition of bytecode has been given above, that is, a transformation of source code and machine code intermediary, although here you can directly use RUST as a bytecode, but I have not been able to think of how to directly convert the rusty, might as well redefine a bytecode. Take this sentence in the original LOGO file as an example:
FORWARD "50
To execute this sentence, we need to do the following things expressed clearly in the above illustration, its involves three different bytecodes:
- GetGlobal(i32,i32): get the global variable name (position in the stack, position of the global variable in the constant table)
- Loadconst(i32,i32): load constant (position in stack, position of constant in constant table)
- Call(i32,i32):call function (location of called function in stack, location of arguments in stack)
The execution logic of the code is:
- Read the LOGO instruction
- transform it into a syntax tree structure (ParseProto): contains the constant table and bytecode
- transfer the generated syntax tree to the virtual machine for execution.
Where the syntax tree is defined as follows:
pub fn load(input: File) -> ParseProto {
let mut constants = Vec::new(); // constant table
let mut byte_codes = Vec::new(); // Byte codes
let mut lex = Lex::new(input);
//Lex is the lexical structure to transfer to
} // Simplified version of code, not necessarily accurate
All the VM needs to do is to construct a HashMap that corresponds the command functions (commands) passed in from the organized syntax tree to the command functions in the Rust code that perform the same task. The following structure can be constructed to clearly indicate the result:
pub struct ExeState {
globals: HashMap<String, Value>, //Table of global variable equivalents
stack: Vec::<Value>, // this is the stack -> stack
}
Then iterates through all the bytecodes in the incoming syntax tree and takes different actions based on the bytecodes. That’s the general idea, let’s get started on the implementation.
