# First steps in Nom: Parsing pseudo GLSL

I am currently working on my rendering engine and I have always wanted to streamline my shader pipeline. I want to detect .glsl files, parse them, extract the type information and generate Rust bindings inside the build script.

GLSL can look like this:

    #version 400
    layout(location = 0) uniform vec4 color;
    layout(location = 1) uniform mat4 mvp;

    layout(location = 0) in vec4 pos;
    in vec2 uv;

    layout(location = 0) out vec3 test;
    void main(){
        //...
    }


Here we have two uniform variables of type vec4 and mat4, a vec4 and a vec2 as inputs and a vec3 as output for a vertex shader. Before today I had never really written a parser for anything other than CSV or OBJ, and I was always scared of parsers because I know how complex they can get. These are my first steps in Nom.

We are going to parse the GLSL code from above. I started by defining an enum.

    #[derive(Debug)]
    pub enum Glsl {
        Version(u32),
        Input(Option<u32>, String, String),
        Output(Option<u32>, String, String),
        Uniform(Option<u32>, String, String),
    }


I just treat Input, Output and Uniform the same for now. Each of them stores an optional location, a type name and a variable name. There are edge cases that I simply ignore for now. For example, we will parse neither uniforms that are directly initialized nor array types.

We start by parsing the Version:

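    // Imports assumed by the snippets in this post
    // (macro-based Nom from the 2.x/3.x era).
    #[macro_use]
    extern crate nom;

    use nom::{alphanumeric, digit, multispace, space, IResult};
    use std::str::FromStr;

    named!(glsl_version<u32>,
        do_parse!(
            tag!("#version") >>
            opt!(space) >>
            number: map_res!(
                digit,
                std::str::from_utf8
            ) >>
            opt!(multispace) >>
            (number.parse::<u32>().unwrap())
        )
    );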


glsl_version<u32> creates a parser function named glsl_version whose output type is u32. We first look for the exact string #version. It is followed by optional whitespace, which we express with opt!(space). Then we read a run of digits into a variable called number, converting the bytes to a &str with std::str::from_utf8. opt!(multispace) skips any trailing spaces, tabs, carriage returns and line feeds. Finally we parse number into a u32. I call .unwrap() here because number contains only digits, so the parse can only fail if the value does not fit into a u32.
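To make those steps concrete, here is a rough sketch of the same logic in plain Rust without Nom. It is only meant to illustrate what the combinators do, not how Nom implements them. The name `parse_version` and the `Option<(value, rest)>` return shape are made up for this example, and it works on `&str` instead of `&[u8]` for brevity.

```rust
/// Hypothetical std-only counterpart of `glsl_version`.
/// Returns the parsed version and the unconsumed rest of the input.
fn parse_version(input: &str) -> Option<(u32, &str)> {
    // tag!("#version"): the input must start with this exact string.
    let rest = input.strip_prefix("#version")?;
    // opt!(space): skip a run of spaces or tabs, which may be empty.
    let rest = rest.trim_start_matches(|c: char| c == ' ' || c == '\t');
    // digit: take the leading run of ASCII digits; at least one is required.
    let end = rest
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(rest.len());
    if end == 0 {
        return None;
    }
    // Unlike the `.unwrap()` above, an overflowing number turns into
    // `None` here instead of a panic.
    let version = rest[..end].parse::<u32>().ok()?;
    // opt!(multispace): skip trailing spaces, tabs, CR and LF.
    let rest = rest[end..]
        .trim_start_matches(|c: char| matches!(c, ' ' | '\t' | '\r' | '\n'));
    Some((version, rest))
}
```

Calling `parse_version("#version 400\nlayout")` yields the version together with the remaining input that starts at `layout`, which is the same "value plus rest" idea that Nom's parsers are built around.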

I will do something really hacky that you should probably never do in production code but it helps us to get started.

    named!(glsl_alt<Option<Glsl>>,
        alt!(
            glsl_version => { |n| Some(Glsl::Version(n)) }
            | take!(1) => { |_| None }
        )
    );


alt! is a choice parser. It tries the given parsers in order and returns the result of the first one that succeeds. If glsl_version fails we run the take!(1) parser, which consumes 1 byte and returns None. This is actually not the right place for glsl_version, as the #version specifier should only occur at the top, but I will let it stay there for now.
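The "try in order, fall back to skipping one byte" idea can be sketched without Nom as well. Everything here is illustrative: `Item`, `version` and `alt_step` are hypothetical names, the version parser is deliberately simplified, and the fallback skips one `char` rather than one byte so that the `&str` stays on a valid UTF-8 boundary.

```rust
#[derive(Debug, PartialEq)]
enum Item {
    Version(u32),
}

/// Simplified stand-in for `glsl_version`: `#version`, whitespace, digits.
fn version(input: &str) -> Option<(u32, &str)> {
    let rest = input.strip_prefix("#version")?.trim_start();
    let end = rest
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(rest.len());
    // An empty digit run fails to parse and becomes `None`.
    let n = rest[..end].parse::<u32>().ok()?;
    Some((n, &rest[end..]))
}

/// One step of the ordered choice: first the real parser, then the fallback.
fn alt_step(input: &str) -> Option<(Option<Item>, &str)> {
    // First alternative: a version directive.
    if let Some((n, rest)) = version(input) {
        return Some((Some(Item::Version(n)), rest));
    }
    // Fallback (the take!(1) branch): skip one character and report `None`.
    let mut chars = input.chars();
    chars.next()?; // fails only when the input is empty
    Some((None, chars.as_str()))
}
```

The order matters: if the fallback came first it would always win, and the version parser would never get a chance to run.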

This parser only runs once, which would be enough for #version, but we still need to parse in, out and uniform.

    named!(parse_glsl<&[u8], Vec<Option<Glsl>> >, many0!(glsl_alt));


many0 will execute the parser repeatedly and will write the results into a Vec.
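Conceptually, this repetition is just a loop that keeps applying a parser until it fails. Below is a minimal generic sketch in plain Rust. The names `many0` and `digit` are reused purely for illustration and are not Nom's functions, and a parser is assumed to be any function from `&str` to `Option<(value, rest)>`.

```rust
/// Applies `parser` until it fails and collects every result.
/// Returns the collected values and the unconsumed rest of the input.
fn many0<'a, T>(
    parser: impl Fn(&'a str) -> Option<(T, &'a str)>,
    mut input: &'a str,
) -> (Vec<T>, &'a str) {
    let mut out = Vec::new();
    while let Some((value, rest)) = parser(input) {
        // A parser that succeeds without consuming anything would loop
        // forever, so this sketch simply stops in that case.
        if rest.len() == input.len() {
            break;
        }
        out.push(value);
        input = rest;
    }
    (out, input)
}

/// Toy parser for the example: one decimal digit.
fn digit(input: &str) -> Option<(u32, &str)> {
    let c = input.chars().next()?;
    let d = c.to_digit(10)?;
    Some((d, &input[c.len_utf8()..]))
}
```

With the toy parser, `many0(digit, "123ab")` collects the three digits and leaves `"ab"` unconsumed. Zero matches is not an error either: the result is just an empty `Vec`.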

Next up are declarations such as this one:

    layout(location = 0) uniform vec4 color;


First we will write a parser for the optional layout specifier.

    named!(glsl_location<u32>,
        do_parse!(
            tag!("layout") >>
            opt!(space) >>
            tag!("(") >>
            opt!(space) >>
            tag!("location") >>
            opt!(space) >>
            tag!("=") >>
            opt!(space) >>
            location: map_res!(
                digit,
                std::str::from_utf8
            ) >>
            opt!(space) >>
            tag!(")") >>
            opt!(space) >>
            (location.parse::<u32>().unwrap())
        )
    );


I am sure this could be written more elegantly, but it should do the trick. It is very similar to the version parser. Because I pretend that in, out and uniform are the same, we will create a macro that generates a parser for each of them to avoid code duplication.
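One way to cut down the repeated "optional space, then token" pairs is to fold them into a small helper. The following is a std-only sketch of that idea, not Nom code; `expect` and `parse_location` are hypothetical names and the input is a `&str` for brevity.

```rust
/// Hypothetical helper: skips leading spaces and tabs, then requires `expected`.
fn expect<'a>(input: &'a str, expected: &str) -> Option<&'a str> {
    input
        .trim_start_matches(|c: char| c == ' ' || c == '\t')
        .strip_prefix(expected)
}

/// Parses `layout(location = N)` with optional spaces between the tokens.
fn parse_location(input: &str) -> Option<(u32, &str)> {
    let mut rest = input;
    // Every token may be preceded by optional spaces.
    for token in ["layout", "(", "location", "="].iter() {
        rest = expect(rest, token)?;
    }
    let rest = rest.trim_start_matches(|c: char| c == ' ' || c == '\t');
    // Take the run of digits that forms the location index.
    let end = rest
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(rest.len());
    let location = rest[..end].parse::<u32>().ok()?;
    let rest = expect(&rest[end..], ")")?;
    // Skip the spaces after the closing parenthesis as well.
    Some((location, rest.trim_start_matches(|c: char| c == ' ' || c == '\t')))
}
```

For the line shown earlier, `parse_location` returns `0` and leaves `uniform vec4 color;` for the next parser. A line without a layout qualifier, such as `in vec2 uv;`, returns `None`, which is what the optional wrapper in the next step relies on.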

    macro_rules! glsl_gen(
        ($name: ident, $i: expr) => (
            named!($name<(Option<u32>, String, String)>,
                do_parse!(
                    location: opt!(glsl_location) >>
                    tag!($i) >>
                    opt!(space) >>
                    type_name: map_res!(
                        alphanumeric,
                        std::str::from_utf8
                    ) >>
                    opt!(space) >>
                    name: map_res!(
                        alphanumeric,
                        std::str::from_utf8
                    ) >>
                    opt!(space) >>
                    char!(';') >>
                    opt!(multispace) >>
                    (location,
                     FromStr::from_str(type_name).unwrap(),
                     FromStr::from_str(name).unwrap())
                )
            );
        )
    );

    glsl_gen!(glsl_in, "in");
    glsl_gen!(glsl_out, "out");
    //Note: Doesn't parse if the uniform has a default initialization
    glsl_gen!(glsl_uniform, "uniform");


First we use the glsl_location parser that we just created and write the value into the variable location. location is of type Option<u32>. After that we basically do the same thing as we did in glsl_version and glsl_location. The only difference is that we return a tuple of type (Option<u32>, String, String).
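The macro exists because named! generates one function per parser. In plain Rust the keyword can simply be a function parameter, which makes the shared structure easier to see. The sketch below is a std-only illustration under that assumption; `Decl`, `skip_space`, `word`, `location` and `declaration` are all made-up names.

```rust
/// (location, type name, variable name), mirroring the tuple from the macro.
type Decl = (Option<u32>, String, String);

fn skip_space(s: &str) -> &str {
    s.trim_start_matches(|c: char| c == ' ' || c == '\t')
}

/// A non-empty run of ASCII letters and digits, plus the remaining input.
fn word(s: &str) -> Option<(&str, &str)> {
    let end = s
        .find(|c: char| !c.is_ascii_alphanumeric())
        .unwrap_or(s.len());
    if end == 0 {
        None
    } else {
        Some((&s[..end], &s[end..]))
    }
}

/// `layout(location = N)` with optional spaces between the tokens.
fn location(s: &str) -> Option<(u32, &str)> {
    let mut rest = s;
    for token in ["layout", "(", "location", "="].iter() {
        rest = skip_space(rest).strip_prefix(*token)?;
    }
    let rest = skip_space(rest);
    let end = rest
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(rest.len());
    let n = rest[..end].parse::<u32>().ok()?;
    let rest = skip_space(&rest[end..]).strip_prefix(')')?;
    Some((n, skip_space(rest)))
}

/// `[layout(...)] <keyword> <type> <name>;` for "in", "out" or "uniform".
fn declaration<'a>(keyword: &str, input: &'a str) -> Option<(Decl, &'a str)> {
    // opt!(glsl_location): a missing layout qualifier is not an error.
    let (loc, rest) = match location(input) {
        Some((n, rest)) => (Some(n), rest),
        None => (None, input),
    };
    let rest = skip_space(rest.strip_prefix(keyword)?);
    let (ty, rest) = word(rest)?;
    let (name, rest) = word(skip_space(rest))?;
    let rest = skip_space(rest).strip_prefix(';')?;
    // Skip trailing whitespace, including newlines.
    Some(((loc, ty.to_string(), name.to_string()), rest.trim_start()))
}
```

Like the macro, this sketch allows zero spaces after the keyword, so a line such as `int x;` would be read as an `in` declaration of type `t`. That is one of the edge cases a stricter version would have to rule out.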

Now we can update glsl_alt.

    named!(glsl_alt<Option<Glsl>>,
        alt!(
            glsl_version => { |n| Some(Glsl::Version(n)) }
            | glsl_in => { |(loc, ty, name)| Some(Glsl::Input(loc, ty, name)) }
            | glsl_out => { |(loc, ty, name)| Some(Glsl::Output(loc, ty, name)) }
            | glsl_uniform => { |(loc, ty, name)| Some(Glsl::Uniform(loc, ty, name)) }
            | take!(1) => { |_| None }
        )
    );


I hope you can see why I have written glsl_alt that way. It made it easy to get started, but it comes with some problems: every byte that none of the real parsers accept adds a None to the resulting Vec.

    fn main() {
        let s = "
    #version 400
    layout(location = 0) uniform vec4 color;
    layout(location = 1) uniform mat4 mvp;

    layout(location = 0) in vec4 pos;
    in vec2 uv;

    layout(location = 0) out vec3 test;
    void main(){
    //...
    }
    ";
        if let IResult::Done(_, o) = parse_glsl(s.as_bytes()){
            let v: Vec<Glsl> = o.into_iter().filter(|x| x.is_some()).map(|x| x.unwrap()).collect();
            println!("{:?}", v);
        }
    }


For now I am just going to filter out all those None values.
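As a side note, dropping the None entries does not need the filter/map/unwrap chain. Since `Option` can be iterated over, `Iterator::flatten` does the same job in one call. The enum is repeated here only to keep the snippet self-contained, and `drop_nones` is a name made up for this example.

```rust
#[derive(Debug, PartialEq)]
pub enum Glsl {
    Version(u32),
    Input(Option<u32>, String, String),
    Output(Option<u32>, String, String),
    Uniform(Option<u32>, String, String),
}

/// Keeps the `Some` values and discards every `None`.
fn drop_nones(items: Vec<Option<Glsl>>) -> Vec<Glsl> {
    // Each `Option` is treated as an iterator of zero or one element.
    items.into_iter().flatten().collect()
}
```

The order of the remaining items is preserved, so the declarations still appear in the order in which they were found in the shader.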

    [Version(400),
     Uniform(Some(0), "vec4", "color"),
     Uniform(Some(1), "mat4", "mvp"),
     Input(Some(0), "vec4", "pos"),
     Input(None, "vec2", "uv"),
     Output(Some(0), "vec3", "test")]


For the first day of using Nom I call this a success. It was much easier than I thought. Obviously I completely ignored errors, edge cases and tons of other stuff, but that is work for another day.