How Strings and Substrings work in Swift

This is a synthesis of two answers I wrote on Stack Overflow (here and here).

All of the following examples use:

var str = "Hello, playground"

Strings and substrings

Strings have changed a lot across Swift versions. As of Swift 4, when you take part of a String, you get a Substring back rather than a String. Why is this? Strings are value types in Swift. That means that if you build a new String from part of another one, the characters have to be copied over. This is good for stability (no one else is going to change it without your knowledge) but bad for efficiency.

A Substring, on the other hand, is a reference back to the original String from which it came. No copying is needed, so it is much more efficient to use. However, imagine you got a ten-character Substring from a million-character String. Because the Substring references the String, the system would have to hold on to the entire String for as long as the Substring was around. Thus, whenever you are done manipulating your Substring, convert it to a String.

let myString = String(mySubstring)

This copies just the substring, and the old String's memory can be released once nothing else refers to it (Swift uses reference counting rather than a garbage collector). Substrings (as a type) are meant to be short-lived.
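Here is a minimal sketch of that round trip. The variable names are only for illustration, and I use prefix(5) simply as a convenient way to get a Substring.

```swift
let str = "Hello, playground"

// Slicing a String yields a Substring that shares the original's storage.
let mySubstring = str.prefix(5)
print(type(of: mySubstring))   // Substring

// Converting copies only those five characters into a new, independent String.
let myString = String(mySubstring)
print(type(of: myString))      // String
print(myString)                // Hello
```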

Another big improvement in Swift 4 is that Strings are Collections (again). That means that whatever you can do to a Collection, you can do to a String (use subscripts, iterate over the characters, filter, etc.).
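As a rough illustration of what that buys you, here are a few Collection operations applied directly to the example string. The variable names are my own.

```swift
let str = "Hello, playground"

// Iterate over the characters like any other collection.
var firstThree: [Character] = []
for character in str.prefix(3) {
    firstThree.append(character)    // collects "H", "e", "l"
}

// count, filter, and reversed all work on a String directly.
let length = str.count                               // 17
let vowels = str.filter { "aeiou".contains($0) }     // "eoaou"
let backwards = String(str.reversed())               // "dnuorgyalp ,olleH"
```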

String.Index

Before we look more at substrings, it would be helpful to understand how String indexing works for the characters that make up strings.

startIndex and endIndex

  • startIndex is the index of the first character.
  • endIndex is the index after the last character.

Example

var str = "Hello, playground"
// character
str[str.startIndex] // H
str[str.endIndex]   // error: after last character
// range
let range = str.startIndex..<str.endIndex
str[range]  // "Hello, playground"

With Swift 4's one-sided ranges, the range can be simplified to one of the following forms.

let range = str.startIndex...
let range = ..<str.endIndex
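A small sketch to confirm that the one-sided forms select the same characters as the full range, and that they also work from an index in the middle of the string. The constant names are just for illustration.

```swift
let str = "Hello, playground"

// Three ways to spell "the whole string" as a range subscript.
let full = str[str.startIndex..<str.endIndex]
let fromStart = str[str.startIndex...]
let upToEnd = str[..<str.endIndex]

// One-sided ranges around an index in the middle of the string.
let index = str.index(str.startIndex, offsetBy: 7)
let head = str[..<index]    // "Hello, "
let tail = str[index...]    // "playground"
```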

I will use the full form in the following examples for the sake of clarity, but you will probably want to use the one-sided ranges in your own code for readability.

after

As in: index(after: String.Index)

  • after refers to the index of the character directly after the given index.

Examples

// character
let index = str.index(after: str.startIndex)
str[index]  // "e"
// range
let range = str.index(after: str.startIndex)..<str.endIndex
str[range]  // "ello, playground"

before

As in: index(before: String.Index)

  • before refers to the index of the character directly before the given index.

Examples

// character
let index = str.index(before: str.endIndex)
str[index]  // d
// range
let range = str.startIndex..<str.index(before: str.endIndex)
str[range]  // Hello, playgroun

offsetBy

As in: index(String.Index, offsetBy: String.IndexDistance)

  • The offsetBy value can be positive or negative and is counted from the given index. Its type is String.IndexDistance, which is a typealias for Int, so you can pass an Int directly.

Examples

// character
let index = str.index(str.startIndex, offsetBy: 7)
str[index]  // p
// range
let start = str.index(str.startIndex, offsetBy: 7)
let end = str.index(str.endIndex, offsetBy: -6)
let range = start..<end
str[range]  // play

limitedBy

As in: index(String.Index, offsetBy: String.IndexDistance, limitedBy: String.Index)

  • The limitedBy parameter is useful for making sure that the offset does not move the index out of bounds. It is a bounding index. Since the offset may go past the limit, this method returns an Optional: it returns nil if the offset would move the index beyond the limit.

Example

// character
if let index = str.index(str.startIndex, offsetBy: 7, limitedBy: str.endIndex) {
    str[index]  // p
}

If the offset had been 77 instead of 7, then the if statement would have been skipped.
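To make the two outcomes concrete, here is a small sketch comparing both offsets. It also shows one caveat worth knowing: with endIndex as the limit, an offset equal to the character count succeeds and returns endIndex itself, which cannot be used to subscript a character. The constant names are my own.

```swift
let str = "Hello, playground"

// An offset within bounds yields a usable index.
let inBounds = str.index(str.startIndex, offsetBy: 7, limitedBy: str.endIndex)

// An offset past the limit yields nil instead of crashing.
let outOfBounds = str.index(str.startIndex, offsetBy: 77, limitedBy: str.endIndex)

// An offset that lands exactly on the limit still succeeds, but endIndex
// is one past the last character, so check before subscripting with it.
let atEnd = str.index(str.startIndex, offsetBy: str.count, limitedBy: str.endIndex)

if let index = inBounds, index < str.endIndex {
    print(str[index])   // p
}
```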

Getting substrings

You can get a substring from a string by using subscripts or a number of other methods (for example, prefix, suffix, split). You still need to use String.Index and not an Int index for the range, though.
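The prefix and suffix methods are covered in the sections on the beginning and end of a string, so here is a short sketch of split. Its pieces are Substrings too, so the same advice about converting to String applies. The constant names are only for illustration.

```swift
let str = "Hello, playground"

// split(separator:) returns an array of Substring, not String.
let parts = str.split(separator: ",")
// parts[0] is "Hello" and parts[1] is " playground" (the leading space is kept)

// Convert the pieces when you want to keep them around.
let words = parts.map { String($0) }
```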

Beginning of a string

You can use a subscript (note the Swift 4 one-sided range):

let index = str.index(str.startIndex, offsetBy: 5)
let mySubstring = str[..<index] // Hello

or prefix:

let index = str.index(str.startIndex, offsetBy: 5)
let mySubstring = str.prefix(upTo: index) // Hello

or even easier:

let mySubstring = str.prefix(5) // Hello

End of a string

Using subscripts:

let index = str.index(str.endIndex, offsetBy: -10)
let mySubstring = str[index...] // playground

or suffix:

let index = str.index(str.endIndex, offsetBy: -10)
let mySubstring = str.suffix(from: index) // playground

or even easier:

let mySubstring = str.suffix(10) // playground

Note that when using suffix(from: index), I had to count back from the end by using -10. That is not necessary when using suffix(x), which simply takes the last x characters of a String.

Range in a string

Again we simply use subscripts here.

let start = str.index(str.startIndex, offsetBy: 7)
let end = str.index(str.endIndex, offsetBy: -6)
let range = start..<end
let mySubstring = str[range] // play

Converting Substring to String

Don't forget, when you are ready to save your substring, you should convert it to a String so that the old string's memory can be cleaned up.

let myString = String(mySubstring)

Using an Int index extension

It would be much easier to use an Int index for Strings. And it actually is possible to hide the complexity of String indexing by using an Int-based extension. However, I'm hesitant to do that after reading the article Strings in Swift 3 by Airspeed Velocity and Ole Begemann. Also, the Swift team purposely hasn't used Int indexes. It is still String.Index. The reason for this is that Characters in Swift are not all the same length under the hood. A single Swift Character might be composed of one, two, or even more Unicode code points. Thus each unique String must calculate the indexes of its Characters.
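To make both points concrete, here is a rough sketch. The first part shows a single Character built from two code points. The second part is a hypothetical Int-based subscript of the kind described above; the subscript(offset:) name and label are my own invention and not part of the standard library. Note that it has to walk the string from the start on every call, which is exactly the cost an Int index would hide.

```swift
// "é" written as "e" followed by a combining acute accent (U+0301).
let accented = "e\u{301}"
let characterCount = accented.count               // 1 Character
let scalarCount = accented.unicodeScalars.count   // 2 Unicode scalars

// Hypothetical convenience subscript, for illustration only.
extension String {
    /// Returns the Character at the given Int offset, or nil if out of range.
    /// Each call walks from startIndex, so it is O(n) rather than O(1).
    subscript(offset offset: Int) -> Character? {
        guard offset >= 0,
              let position = self.index(startIndex, offsetBy: offset, limitedBy: endIndex),
              position < endIndex
        else { return nil }
        return self[position]
    }
}

let str = "Hello, playground"
let eighth = str[offset: 7]     // Optional("p")
let missing = str[offset: 77]   // nil
```

Calling a subscript like this inside a loop turns a linear pass into quadratic work, which is one practical reason to stick with String.Index.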

I have to say, I hope the Swift team finds a way to abstract away String.Index in the future. But until then I am choosing to use their API. It helps me to remember that String manipulations are not just simple Int index lookups.
