First steps in Nom: Parsing pseudo GLSL
2016-11-27
I am currently working on my rendering engine and I always wanted to streamline my shader pipeline. I want to detect .glsl files, parse them, extract the type information and generate Rust bindings inside the build script.
GLSL can look like this:
#version 400
layout(location = 0) uniform vec4 color;
layout(location = 1) uniform mat4 mvp;
layout(location = 0) in vec4 pos;
in vec2 uv;
layout(location = 0) out vec3 test;
void main(){
    //...
}
Here we have two uniform variables of type vec4 and mat4, a vec4 and a vec2 as input and a vec3 as output for a vertex shader.
Before today I had never really written a parser for anything besides CSV or OBJ, and I was always scared of parsers because I know how complex they can get.
These are my first steps in Nom.
We are going to parse the GLSL code from above.
I started by defining an enum.
#[derive(Debug)]
pub enum Glsl {
    Version(u32),
    // Fields: optional layout location, type name, variable name.
    Input(Option<u32>, String, String),
    Output(Option<u32>, String, String),
    Uniform(Option<u32>, String, String),
}
I just treat Input, Output and Uniform the same for now. There are edge cases that I simply ignore for now. For example, we will not parse uniforms that are directly initialized, nor array types.
We start by parsing the Version:
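// Imports assumed for the macro-based nom API used in this post.
#[macro_use]
extern crate nom;

use nom::{digit, multispace, space};

named!(glsl_version<u32>,
    do_parse!(
        tag!("#version") >>
        opt!(space) >>
        number: map_res!(
            digit,
            std::str::from_utf8
        ) >>
        opt!(multispace) >>
        (number.parse::<u32>().unwrap())
    )
);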
named!(glsl_version<u32>, ...) creates a function called glsl_version whose output type is u32. We first look for the literal string #version. It is followed by optional whitespace, which we express with opt!(space); then we collect the following run of digits into a variable called number. multispace recognizes spaces, tabs, carriage returns and line feeds. Finally we parse number into a u32. I call .unwrap() here because digit only accepts digits, so the parse shouldn't fail unless the number overflows a u32.
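To make those steps concrete, here is a rough hand-rolled sketch of the same logic without Nom, using only the standard library. The function name parse_version and the Option return type are my own assumptions for illustration; Nom's generated function works on bytes and returns an IResult instead.

```rust
/// Hypothetical hand-written counterpart of `glsl_version` (no nom involved).
/// Returns the remaining input and the parsed version, or `None` on failure.
fn parse_version(input: &str) -> Option<(&str, u32)> {
    // Like tag!("#version"): the input must start with this literal.
    let rest = input.strip_prefix("#version")?;
    // Like opt!(space): skip zero or more spaces and tabs.
    let rest = rest.trim_start_matches(|c: char| c == ' ' || c == '\t');
    // Like digit: take the leading run of ASCII digits (at least one).
    let end = rest
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(rest.len());
    if end == 0 {
        return None;
    }
    let (digits, rest) = rest.split_at(end);
    // Unlike `.unwrap()`, `.ok()?` turns a u32 overflow into a parse failure.
    let number = digits.parse::<u32>().ok()?;
    // Like opt!(multispace): skip spaces, tabs, carriage returns and line feeds.
    let rest = rest.trim_start_matches(|c: char| matches!(c, ' ' | '\t' | '\r' | '\n'));
    Some((rest, number))
}
```

Calling parse_version on "#version 400" followed by a newline and "in vec2 uv;" gives back the version 400 together with the remaining "in vec2 uv;", while an oversized number such as "#version 99999999999" yields None instead of panicking.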
I will do something really hacky that you should probably never do in production code but it helps us to get started.
named!(glsl_alt<Option<Glsl>>,
    alt!(
        glsl_version => { |n| Some(Glsl::Version(n)) }
        | take!(1) => { |_| None }
    )
);
alt! is a conditional parser. It tries its parsers in order and returns the result of the first one that succeeds. If glsl_version fails, we fall back to the take!(1) parser, which consumes 1 byte and returns None. This is actually not the right place for glsl_version, as the #version specifier should only occur at the top, but I will let it stay there for now.
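The ordered-choice idea can be sketched without Nom as well. The snippet below is only an illustration under my own assumptions: number is a stand-in for a real parser like glsl_version, number_or_skip is a hypothetical name, and it skips one character rather than one byte so that it stays on a valid &str boundary.

```rust
/// Stand-in for a "real" parser: accepts a leading run of ASCII digits.
fn number(input: &str) -> Option<(&str, u32)> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    let (digits, rest) = input.split_at(end);
    // An empty digit run fails to parse, so non-numeric input is rejected.
    digits.parse::<u32>().ok().map(|n| (rest, n))
}

/// Ordered choice in the spirit of `alt!`: try `number` first,
/// otherwise consume a single character and report `None`.
fn number_or_skip(input: &str) -> Option<(&str, Option<u32>)> {
    if let Some((rest, n)) = number(input) {
        return Some((rest, Some(n)));
    }
    // Fallback branch, playing the role of `take!(1) => { |_| None }`.
    let mut chars = input.chars();
    chars.next()?; // fails only when the input is empty
    Some((chars.as_str(), None))
}
```

The order matters: the fallback accepts any input, so it has to come last or the more specific parser would never get a chance to run.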
This parser only executes once, which would be enough for #version, but we still need to parse in, out and uniform.
named!(parse_glsl<&[u8], Vec<Option<Glsl>> >, many0!(glsl_alt));
many0! applies the parser repeatedly, zero or more times, and collects the results into a Vec.
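As a rough mental model, the repetition boils down to a loop like the one below. This is a sketch and not Nom's implementation; repeat and digit_char are hypothetical names, and the parsers return Option instead of IResult.

```rust
/// Apply `parser` until it fails, collecting every result into a Vec.
/// Zero successful matches is fine and simply yields an empty Vec.
fn repeat<'a, T>(
    mut input: &'a str,
    parser: impl Fn(&'a str) -> Option<(&'a str, T)>,
) -> (&'a str, Vec<T>) {
    let mut out = Vec::new();
    while let Some((rest, value)) = parser(input) {
        // Stop if the parser succeeded without consuming anything,
        // otherwise this loop would never terminate.
        if rest.len() == input.len() {
            break;
        }
        out.push(value);
        input = rest;
    }
    (input, out)
}

/// Toy element parser: exactly one decimal digit.
fn digit_char(input: &str) -> Option<(&str, u32)> {
    let mut chars = input.chars();
    let d = chars.next()?.to_digit(10)?;
    Some((chars.as_str(), d))
}
```

With these two helpers, repeat("123ab", digit_char) returns the leftover "ab" together with the values 1, 2 and 3.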
Next we want to parse declarations like this one:
layout(location = 0) uniform vec4 color;
First we will write a parser for the optional layout specifier.
named!(glsl_location<u32>,
    do_parse!(
        tag!("layout") >>
        opt!(space) >>
        tag!("(") >>
        opt!(space) >>
        tag!("location") >>
        opt!(space) >>
        tag!("=") >>
        opt!(space) >>
        location: map_res!(
            digit,
            std::str::from_utf8
        ) >>
        opt!(space) >>
        tag!(")") >>
        opt!(space) >>
        (location.parse::<u32>().unwrap())
    )
);
I am sure this could be written more elegantly but it should do the trick. It is very similar to the version parser. Because I pretend that in, out and uniform are the same we will create a macro that generates a parser to avoid code duplication.
use nom::alphanumeric;

// Note: alphanumeric stops at '_', so identifiers that contain
// underscores are not handled yet.
macro_rules! glsl_gen(
    ($name: ident, $i: expr) => (
        named!($name<(Option<u32>, String, String)>,
            do_parse!(
                location: opt!(glsl_location) >>
                tag!($i) >>
                opt!(space) >>
                type_name: map_res!(
                    alphanumeric,
                    std::str::from_utf8
                ) >>
                opt!(space) >>
                name: map_res!(
                    alphanumeric,
                    std::str::from_utf8
                ) >>
                opt!(space) >>
                char!(';') >>
                opt!(multispace) >>
                (location,
                 type_name.to_string(),
                 name.to_string())
            )
        );
    )
);
glsl_gen!(glsl_in, "in");
glsl_gen!(glsl_out, "out");
// Note: Doesn't parse if the uniform has a default initialization
glsl_gen!(glsl_uniform, "uniform");
First we use the glsl_location parser that we just created and write the value into the variable location. location is of type Option<u32>. After that we basically do the same thing as we did in glsl_version and glsl_location. The only difference is that we return a tuple of type (Option<u32>, String, String).
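If the macro part looks intimidating, the same trick of stamping out one function per keyword can be shown without Nom. The following is a simplified sketch under my own assumptions: decl_parser and the generated function names are hypothetical, it splits on whitespace instead of using combinators, and it leaves out the layout location entirely.

```rust
/// Generate a function that parses `<keyword> <type> <name>;`.
macro_rules! decl_parser {
    ($name:ident, $keyword:expr) => {
        fn $name(line: &str) -> Option<(String, String)> {
            // Require the trailing semicolon, then split on whitespace.
            let body = line.trim().strip_suffix(';')?;
            let mut words = body.split_whitespace();
            // The first word has to be exactly the keyword.
            if words.next()? != $keyword {
                return None;
            }
            let type_name = words.next()?;
            let name = words.next()?;
            // Reject trailing tokens such as a default initializer.
            if words.next().is_some() {
                return None;
            }
            Some((type_name.to_string(), name.to_string()))
        }
    };
}

// One generated parser per storage qualifier.
decl_parser!(parse_in, "in");
decl_parser!(parse_out, "out");
decl_parser!(parse_uniform, "uniform");
```

Each invocation expands to an ordinary function, so parse_in("in vec2 uv;") returns the pair ("vec2", "uv"), while parse_out rejects the same line because the keyword does not match.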
Now we can update glsl_alt.
named!(glsl_alt<Option<Glsl>>,
    alt!(
        glsl_version => { |n| Some(Glsl::Version(n)) }
        | glsl_in => { |(loc, ty, name)| Some(Glsl::Input(loc, ty, name)) }
        | glsl_out => { |(loc, ty, name)| Some(Glsl::Output(loc, ty, name)) }
        | glsl_uniform => { |(loc, ty, name)| Some(Glsl::Uniform(loc, ty, name)) }
        | take!(1) => { |_| None }
    )
);
I hope you can see why I have written glsl_alt that way. It made it easy to get started, but it comes with some problems: every byte that no other parser accepts adds a None to the Vec.
use nom::IResult;

fn main() {
    let s = "
#version 400
layout(location = 0) uniform vec4 color;
layout(location = 1) uniform mat4 mvp;
layout(location = 0) in vec4 pos;
in vec2 uv;
layout(location = 0) out vec3 test;
void main(){
//...
}
";
    if let IResult::Done(_, o) = parse_glsl(s.as_bytes()) {
        // Drop the None placeholders produced by the take!(1) fallback.
        let v: Vec<Glsl> = o.into_iter().filter(|x| x.is_some()).map(|x| x.unwrap()).collect();
        println!("{:?}", v);
    }
}
For now I am just going to filter out all those None values. The result looks like this:
[Version(400),
Uniform(Some(0), "vec4", "color"),
Uniform(Some(1), "mat4", "mvp"),
Input(Some(0), "vec4", "pos"),
Input(None, "vec2", "uv"),
Output(Some(0), "vec3", "test")]
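As a side note on the filtering step in main: the filter followed by map and unwrap can be collapsed into a single call. A minimal sketch, where keep_parsed is a hypothetical helper name of mine:

```rust
/// Drop the `None` placeholders and keep only the parsed values.
fn keep_parsed<T>(items: Vec<Option<T>>) -> Vec<T> {
    // `Option<T>` can be iterated (zero or one item), so `flatten`
    // discards every `None` and unwraps every `Some` in one step.
    items.into_iter().flatten().collect()
}
```

It behaves the same as the filter and map pair, just without the explicit unwrap.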
For the first day of using Nom I call this a success. It was much easier than I thought. Obviously I completely ignored errors, edge cases and tons of other stuff, but that is work for another day.