Exploring Rust String Concepts

Introduction

In this lab, we will explore the concept of strings in Rust. Rust has two types of strings: String and &str.

A String is a heap-allocated, growable string that is guaranteed to be a valid UTF-8 sequence. On the other hand, &str is a slice that points to a valid UTF-8 sequence and can be used to view into a String.

In Rust, string literals can be written in different ways, including using escapes to represent special characters. For example, \x3F represents the question mark character and \u{211D} represents a Unicode code point. Raw string literals can also be used if you want to write a string as-is without escapes.

If you need to work with byte strings, Rust provides byte string literals using the b prefix. Byte strings can have byte escapes, but not Unicode escapes. Raw byte strings can also be used in a similar fashion to raw string literals.

It's important to note that str and String must always be valid UTF-8 sequences. If you need to work with strings in different encodings, you can use external crates like encoding for conversions between character encodings.

Note: If the lab does not specify a file name, you can use any file name you want. For example, you can use main.rs, compile and run it with rustc main.rs && ./main.

Skills Graph

%%%%{init: {'theme':'neutral'}}%%%% flowchart RL rust(("`Rust`")) -.-> rust/BasicConceptsGroup(["`Basic Concepts`"]) rust(("`Rust`")) -.-> rust/DataTypesGroup(["`Data Types`"]) rust(("`Rust`")) -.-> rust/ControlStructuresGroup(["`Control Structures`"]) rust(("`Rust`")) -.-> rust/FunctionsandClosuresGroup(["`Functions and Closures`"]) rust(("`Rust`")) -.-> rust/MemorySafetyandManagementGroup(["`Memory Safety and Management`"]) rust(("`Rust`")) -.-> rust/DataStructuresandEnumsGroup(["`Data Structures and Enums`"]) rust(("`Rust`")) -.-> rust/AdvancedTopicsGroup(["`Advanced Topics`"]) rust/BasicConceptsGroup -.-> rust/variable_declarations("`Variable Declarations`") rust/BasicConceptsGroup -.-> rust/mutable_variables("`Mutable Variables`") rust/DataTypesGroup -.-> rust/integer_types("`Integer Types`") rust/DataTypesGroup -.-> rust/string_type("`String Type`") rust/DataTypesGroup -.-> rust/type_casting("`Type Conversion and Casting`") rust/ControlStructuresGroup -.-> rust/for_loop("`for Loop`") rust/FunctionsandClosuresGroup -.-> rust/function_syntax("`Function Syntax`") rust/FunctionsandClosuresGroup -.-> rust/expressions_statements("`Expressions and Statements`") rust/MemorySafetyandManagementGroup -.-> rust/lifetime_specifiers("`Lifetime Specifiers`") rust/DataStructuresandEnumsGroup -.-> rust/method_syntax("`Method Syntax`") rust/AdvancedTopicsGroup -.-> rust/operator_overloading("`Traits for Operator Overloading`") subgraph Lab Skills rust/variable_declarations -.-> lab-99254{{"`Exploring Rust String Concepts`"}} rust/mutable_variables -.-> lab-99254{{"`Exploring Rust String Concepts`"}} rust/integer_types -.-> lab-99254{{"`Exploring Rust String Concepts`"}} rust/string_type -.-> lab-99254{{"`Exploring Rust String Concepts`"}} rust/type_casting -.-> lab-99254{{"`Exploring Rust String Concepts`"}} rust/for_loop -.-> lab-99254{{"`Exploring Rust String Concepts`"}} rust/function_syntax -.-> lab-99254{{"`Exploring Rust String Concepts`"}} rust/expressions_statements -.-> lab-99254{{"`Exploring Rust String Concepts`"}} rust/lifetime_specifiers -.-> lab-99254{{"`Exploring Rust String Concepts`"}} rust/method_syntax -.-> lab-99254{{"`Exploring Rust String Concepts`"}} rust/operator_overloading -.-> lab-99254{{"`Exploring Rust String Concepts`"}} end

Strings

There are two types of strings in Rust: String and &str.

A String is stored as a vector of bytes (Vec<u8>), but guaranteed to always be a valid UTF-8 sequence. String is heap allocated, growable and not null terminated.

&str is a slice (&[u8]) that always points to a valid UTF-8 sequence, and can be used to view into a String, just like &[T] is a view into Vec<T>.

fn main() {
    // (all the type annotations are superfluous)
    // A reference to a string allocated in read only memory
    let pangram: &'static str = "the quick brown fox jumps over the lazy dog";
    println!("Pangram: {}", pangram);

    // Iterate over words in reverse, no new string is allocated
    println!("Words in reverse");
    for word in pangram.split_whitespace().rev() {
        println!("> {}", word);
    }

    // Copy chars into a vector, sort and remove duplicates
    let mut chars: Vec<char> = pangram.chars().collect();
    chars.sort();
    chars.dedup();

    // Create an empty and growable `String`
    let mut string = String::new();
    for c in chars {
        // Insert a char at the end of string
        string.push(c);
        // Insert a string at the end of string
        string.push_str(", ");
    }

    // The trimmed string is a slice to the original string, hence no new
    // allocation is performed
    let chars_to_trim: &[char] = &[' ', ','];
    let trimmed_str: &str = string.trim_matches(chars_to_trim);
    println!("Used characters: {}", trimmed_str);

    // Heap allocate a string
    let alice = String::from("I like dogs");
    // Allocate new memory and store the modified string there
    let bob: String = alice.replace("dog", "cat");

    println!("Alice says: {}", alice);
    println!("Bob says: {}", bob);
}

More str/String methods can be found under the std::str and std::string modules

Literals and escapes

There are multiple ways to write string literals with special characters in them. All result in a similar &str so it's best to use the form that is the most convenient to write. Similarly there are multiple ways to write byte string literals, which all result in &[u8; N].

Generally special characters are escaped with a backslash character: \. This way you can add any character to your string, even unprintable ones and ones that you don't know how to type. If you want a literal backslash, escape it with another one: \\

String or character literal delimiters occurring within a literal must be escaped: "\"", '\''.

fn main() {
    // You can use escapes to write bytes by their hexadecimal values...
    let byte_escape = "I'm writing \x52\x75\x73\x74!";
    println!("What are you doing\x3F (\\x3F means ?) {}", byte_escape);

    // ...or Unicode code points.
    let unicode_codepoint = "\u{211D}";
    let character_name = "\"DOUBLE-STRUCK CAPITAL R\"";

    println!("Unicode character {} (U+211D) is called {}",
                unicode_codepoint, character_name );


    let long_string = "String literals
                        can span multiple lines.
                        The linebreak and indentation here ->\
                        <- can be escaped too!";
    println!("{}", long_string);
}

Sometimes there are just too many characters that need to be escaped or it's just much more convenient to write a string out as-is. This is where raw string literals come into play.

fn main() {
    let raw_str = r"Escapes don't work here: \x3F \u{211D}";
    println!("{}", raw_str);

    // If you need quotes in a raw string, add a pair of #s
    let quotes = r#"And then I said: "There is no escape!""#;
    println!("{}", quotes);

    // If you need "## in your string, just use more #s in the delimiter.
    // You can use up to 65535 #s.
    let longer_delimiter = r###"A string with "## in it. And even "##!"###;
    println!("{}", longer_delimiter);
}

Want a string that's not UTF-8? (Remember, str and String must be valid UTF-8). Or maybe you want an array of bytes that's mostly text? Byte strings to the rescue!

use std::str;

fn main() {
    // Note that this is not actually a `&str`
    let bytestring: &[u8; 21] = b"this is a byte string";

    // Byte arrays don't have the `Display` trait, so printing them is a bit limited
    println!("A byte string: {:?}", bytestring);

    // Byte strings can have byte escapes...
    let escaped = b"\x52\x75\x73\x74 as bytes";
    // ...but no unicode escapes
    // let escaped = b"\u{211D} is not allowed";
    println!("Some escaped bytes: {:?}", escaped);


    // Raw byte strings work just like raw strings
    let raw_bytestring = br"\u{211D} is not escaped here";
    println!("{:?}", raw_bytestring);

    // Converting a byte array to `str` can fail
    if let Ok(my_str) = str::from_utf8(raw_bytestring) {
        println!("And the same as text: '{}'", my_str);
    }

    let _quotes = br#"You can also use "fancier" formatting, \
                    like with normal raw strings"#;

    // Byte strings don't have to be UTF-8
    let shift_jis = b"\x82\xe6\x82\xa8\x82\xb1\x82\xbb"; // "ようこそ" in SHIFT-JIS

    // But then they can't always be converted to `str`
    match str::from_utf8(shift_jis) {
        Ok(my_str) => println!("Conversion successful: '{}'", my_str),
        Err(e) => println!("Conversion failed: {:?}", e),
    };
}

For conversions between character encodings check out the encoding crate.

A more detailed listing of the ways to write string literals and escape characters is given in the 'Tokens' chapter of the Rust Reference.

Summary

Congratulations! You have completed the Strings lab. You can practice more labs in LabEx to improve your skills.