A Rust adventure

Christian Kruse

tl;dr: Rust rocks

During 31C3 I started learning Rust. I had wanted to for a long time, and finally I had the time to do so. A compiled systems programming language featuring static type safety, a type system inspired by Haskell's type classes, and easy parallelism with nearly no runtime overhead? Sounds like a dream come true!

So I installed the Rust nightly (on Arch Linux this is just a yaourt -S rust-nightly-bin && yaourt -S cargo-nightly-bin), read through the guide and started hacking on a small project to get into the language a little bit more: a parallel grep.

The first steps were pretty easy: creating new projects and building them is, thanks to cargo, just a command away, and the guide is pretty easy to understand. My basic concept was to create a std::sync::Future (Rust's futures implementation) for each file we have to search through and return the results as a vector to the main thread when finished.

The problems began when I tried to find out how to work with the standard library: the API documentation is pretty unusable. Try to find out how to search for a substring in a string or how to open a file. I wasn't able to find either by myself; I always had to do a web search. This has to get better!

The first „woah!!“ occurred when I tried to put a std::sync::Future into a vector. I definitely have to get used to type inference: I tried to use a type annotation for the vector since I didn't think the compiler would be able to get the type right. So I read futures.rs to get the type annotation right: Vec<Future<_>>. The underscore is a placeholder that tells the compiler to infer the exact future type itself; we only state that it is some future. Man, that is really cool stuff! Then (after I had spent an hour getting the type right) I tried simply leaving out the type, and it works! This is really amazing.
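A minimal sketch of both variants on a current stable toolchain. It assumes std::thread::JoinHandle as a stand-in for std::sync::Future, which is no longer part of the standard library; spawn_squares is a hypothetical helper made up for illustration.

```rust
use std::thread::{self, JoinHandle};

// Hypothetical helper: spawns one thread per input and returns the handles.
fn spawn_squares(inputs: &[u64]) -> Vec<JoinHandle<u64>> {
    // Explicit annotation with a placeholder: `_` asks the compiler to
    // infer the result type of the handles (u64 here).
    let mut annotated: Vec<JoinHandle<_>> = Vec::new();
    for &n in inputs {
        annotated.push(thread::spawn(move || n * n));
    }
    annotated
}

fn main() {
    // No annotation at all: the element type is inferred from the push below.
    let mut inferred = vec![];
    inferred.push(thread::spawn(|| 6 * 7));

    for handle in inferred {
        println!("{}", handle.join().unwrap());
    }
    for handle in spawn_squares(&[1, 2, 3]) {
        println!("{}", handle.join().unwrap());
    }
}
```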

Another problem occurred when I tried to iterate over the vector to gather the results. The default way to iterate over a vector in Rust is using iterators:

for result in results.iter() {
    let lines = result.get();
    for line in lines.iter() {
        print!("{}", line);
    }
}

results is the Vec<Future<_>>. The code above leads to this error message:

/home/ckruse/dev/ngrep/src/main.rs:48:21: 48:27 error: cannot borrow immutable dereference of `&`-pointer `*result` as mutable
/home/ckruse/dev/ngrep/src/main.rs:48         let lines = result.get();
                                                             ^~~~~~
error: aborting due to previous error
Could not compile `ngrep`.

To learn more, run the command again with --verbose.

This error message is not that helpful for beginners, but after reading the guide again I was able to track down the problem: the vector iterator yields immutable references, but the .get() method needs the std::sync::Future to be mutable. After searching the web again I found a solution: I iterate over the vector the old way with a counter and call .get_mut(), which returns a mutable reference to the element:

for i in range(0, results.len()) {
    let mut result = match results.get_mut(i) {
        Some(v) => v,
        None => panic!("error! no value!")
    };

    let lines = result.get();
    for line in lines.iter() {
        print!("{}", line);
    }
}

This works, but is ugly. Maybe a reader knows a better solution?
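One possible answer, sketched on a current stable toolchain: iter_mut() yields mutable references, so the counter and the match can go away. Lazy below is a hypothetical stand-in for the future, chosen only because its get method needs &mut self like .get() does; collect_all is likewise made up for illustration.

```rust
// Hypothetical stand-in for a future: `get` needs `&mut self`,
// like the `.get()` call in the loop above.
struct Lazy {
    input: u32,
    cached: Option<u32>,
}

impl Lazy {
    fn new(input: u32) -> Lazy {
        Lazy { input, cached: None }
    }

    // Computes the value on first use and caches it afterwards.
    fn get(&mut self) -> u32 {
        if self.cached.is_none() {
            self.cached = Some(self.input * 2);
        }
        self.cached.unwrap()
    }
}

fn collect_all(results: &mut [Lazy]) -> Vec<u32> {
    let mut out = Vec::new();
    // iter_mut() hands out `&mut Lazy`, so no index and no match are needed.
    for result in results.iter_mut() {
        out.push(result.get());
    }
    out
}

fn main() {
    let mut results = vec![Lazy::new(1), Lazy::new(2), Lazy::new(3)];
    for value in collect_all(&mut results) {
        println!("{}", value);
    }
}
```

Writing `for result in &mut results` is an equivalent shorthand for the iter_mut() call.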

The rest of the code is pretty straightforward:

use std::os;
use std::sync::Future;
use std::io::BufferedReader;
use std::io::File;

fn main() {
    let args = os::args();
    let mut results = vec![];

    for file in args.slice(2, args.len()).iter() {
        let path = Path::new(file);
        let pattern = args[1].clone();

        let delayed_value = Future::spawn(move || -> Vec<String> {
            let mut retval: Vec<String> = vec![];

            let fd = match File::open(&path) {
                Ok(f) => f,
                Err(e) => panic!("Couldn't open file: {}", e)
            };

            let mut file = BufferedReader::new(fd);
            let mut lineno : uint = 0;

            for maybe_line in file.lines() {
                let line = match maybe_line {
                    Ok(s) => s,
                    Err(e) => panic!("error reading line: {}", e)
                };

                if line.contains(pattern.as_slice()) {
                    let s = path.as_str().unwrap();
                    retval.push(format!("{}:{}: {}", s, lineno, line));
                }

                lineno += 1;
            }

            retval
        });

        results.push(delayed_value);
    }

    for i in range(0, results.len()) {
        let mut result = match results.get_mut(i) {
            Some(v) => v,
            None => panic!("error! no value!")
        };

        let lines = result.get();
        for line in lines.iter() {
            print!("{}", line);
        }
    }
}
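The listing above targets a pre-1.0 nightly; std::sync::Future, std::os::args and std::io::BufferedReader are not in today's standard library. As a rough sketch of the same idea on a current stable toolchain, the version below assumes std::thread::spawn with JoinHandle in place of the future. The search helper is a name introduced here for illustration, and it reports 1-based line numbers, unlike the counter above, which starts at 0.

```rust
use std::env;
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::thread;

// Returns every line containing `pattern`, formatted as "path:lineno: line".
fn search<R: BufRead>(path: &str, pattern: &str, reader: R) -> Vec<String> {
    let mut matches = Vec::new();
    for (idx, maybe_line) in reader.lines().enumerate() {
        let line = match maybe_line {
            Ok(s) => s,
            Err(e) => panic!("error reading line: {}", e),
        };
        if line.contains(pattern) {
            // enumerate() counts from 0; report 1-based line numbers.
            matches.push(format!("{}:{}: {}", path, idx + 1, line));
        }
    }
    matches
}

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() < 3 {
        eprintln!("usage: ngrep PATTERN FILE...");
        return;
    }

    // One thread per file; each closure owns its own path and pattern.
    let mut handles = vec![];
    for path in &args[2..] {
        let path = path.clone();
        let pattern = args[1].clone();
        handles.push(thread::spawn(move || {
            let file = match File::open(&path) {
                Ok(f) => f,
                Err(e) => panic!("Couldn't open file {}: {}", path, e),
            };
            search(&path, &pattern, BufReader::new(file))
        }));
    }

    // join() consumes the handle, so a plain by-value loop is enough.
    for handle in handles {
        for line in handle.join().expect("worker thread panicked") {
            // lines() strips the trailing newline, hence println!.
            println!("{}", line);
        }
    }
}
```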

Another really cool feature is enums that carry values. Let's have a look at this:

let mut result = match results.get_mut(i) {
    Some(v) => v,
    None => panic!("error! no value!")
};

The return value of get_mut() is just an enum, but in Rust an enum variant may also carry a value. This leads to constructs like the one above. The match keyword introduces a pattern matching construct and we basically say „when the return value of get_mut() is Some, bind the value to the variable v and return it; if it is None, simply panic.“ This rocks! Now we can forget about all this int somefunc(char *real_retval) bullshit from C. We can properly distinguish between error cases and real return values. Yay!
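To make the idea concrete, here is a small sketch of a self-defined enum whose variants carry values, written for a current stable toolchain. Lookup, lookup and describe are hypothetical names invented for this example; only Option and match come from the language and standard library.

```rust
// An enum whose variants may carry values, in the spirit of Option.
#[derive(Debug, PartialEq)]
enum Lookup {
    Found(String),
    Missing,
}

// Hypothetical helper: returns the element at `index`, if there is one.
fn lookup(items: &[&str], index: usize) -> Lookup {
    match items.get(index) {
        Some(item) => Lookup::Found((*item).to_string()),
        None => Lookup::Missing,
    }
}

// The match must cover every variant, so no case can be forgotten.
fn describe(result: &Lookup) -> String {
    match result {
        Lookup::Found(value) => format!("found {}", value),
        Lookup::Missing => String::from("nothing there"),
    }
}

fn main() {
    let items = ["a", "b"];
    println!("{}", describe(&lookup(&items, 1)));
    println!("{}", describe(&lookup(&items, 5)));
}
```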

Another interesting point was that I tried to use the pattern variable directly in the closure. The compiler forbade that with some remarks about the lifetime of the variable, and I didn't understand what exactly it was trying to say. After a web search I found this: everything is fine during the first iteration. The closure takes ownership of the pattern variable and we can use it. But once the first closure has finished, it cleans up after itself and frees the memory. The next iteration would then access invalid memory, potentially leading to a crash. The compiler caught this! Wow, great! So the solution was to create a copy for each iteration. Done.
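The copy-per-iteration fix can be sketched like this on a current stable toolchain, again assuming std::thread::spawn in place of the future. count_matches is a hypothetical function made up for the example.

```rust
use std::thread;

// Spawns one thread per input; each thread gets its own copy of `pattern`.
fn count_matches(pattern: &str, inputs: Vec<String>) -> Vec<usize> {
    let pattern = pattern.to_string();
    let mut handles = vec![];

    for input in inputs {
        // Without this clone, the first `move` closure would take ownership
        // of `pattern`, and the next iteration would fail to compile
        // because the value has already been moved.
        let pattern = pattern.clone();
        handles.push(thread::spawn(move || input.matches(pattern.as_str()).count()));
    }

    // Wait for every thread and collect the counts in input order.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let inputs = vec!["abcabc".to_string(), "xyz".to_string()];
    println!("{:?}", count_matches("abc", inputs));
}
```

Sharing a single copy through std::sync::Arc would be another option when the data is large; for a short search pattern a clone per thread is the simpler choice.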

All in all I really like this language. The functional elements (immutable variables by default, pattern matching, tuples, etc.) are integrated well and fit into the flow. Parallelism is easy and reminds me of Erlang. It is really hard to introduce bugs or crashes, since the compiler detects a lot of potential and real problems. Of course there is still work to do (some error messages are hard to understand, the API documentation is really bad), but I could definitely imagine that this will be my first-choice language in the future.