Posts Tagged ‘ lexer

Compiler Tutorial, part 2 – Lexing

In Part 1 of my compilers tutorial, I went over context-free grammars. Read that first, unless you only care about lexing.

Lexing, short for “lexical analyzing”, basically means “turn this string into something I can use”. Since I’m assuming you’ve programmed before, I’m sure you’ve had an occasion to take input from the user and turn it into smaller components you can use. Even if it’s a simple expression like “5+2”, you don’t get it as input in a nice format – but rather an obtuse string. If you’ve ever accepted input from the user and done anything other than print it right back out, you’ve written a lexer. Any sort of string-splitting or “understanding” the contents of a string requires a lexer. Even Java’s String.split(“, “) method is lexing, in a simple form.

Lexing is necessary for compilation, but also for many other programs. I’ve written too much input-handling code in my time, and I wish somebody had told me about lexing long ago. So this section of the tutorial is probably the most useful to areas other than compiler writing.

Let’s get started. Read more

Compiler Tutorial, part 1 – Introduction and Grammars

Rough few weeks – sorry for no post. But here’s something that people might find useful.

This semester at Penn, I’m taking a compilers course (CIS341). It’s been a blast – we’re developing a compiler for a language called OAT (as in Quaker Oats) that’s ‘objects, arithmetic, and types’. Due Monday is the object compiler, and the language is done in its entirety.The last two projects concern optimizations (register allocation, constant propagation, etc).

Writing compilers, for anybody who hasn’t done it before, involves a tremendous amount of work. Because compilers have always seemed like magic to me, I’m going to attempt to demystify them.

This will be a several-part tutorial written as I have time, so check back often.

I will assume you know something about programming, and are willing to learn a crash-course in formal languages, but I’ll try to keep it simple. By the end, we’ll have a reasonably interesting compiler for basically the language I’m writing now. The only thing is, we’ll do it in C instead of OCaml, which is what I’m using now. This is partly because most compilers are written in C, or their own languages, and partly because I don’t want to enable cheating.

Alright! Here we go. To start, we need to discuss formal languages – what they are, what they can do, and how to work with them. Next time, we’ll get into actual code. Read more