Using Regular Expressions in Swift 4, with Worked Examples

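Regular expressions are a powerful pattern-matching facility available in C, Python, and many other languages, and a modern language like Swift naturally supports them too. By the end of this tutorial you will have a feel for how much power regular expressions put in a programmer's hands.

This tutorial first introduces the matching patterns available in Swift, with examples along the way; it then covers NSRegularExpression, the Apple-provided class we will be using; and it closes with a more complex example that ties everything together. Beyond regular expressions, it also touches on error handling, closures, and reading and writing files. If you spot omissions or mistakes, corrections are welcome.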

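Part One: Swift Regular Expressions

The idea behind regular expressions is simple: given a pattern (a String describing what to match), check whether a target String satisfies it; if it does, you can retrieve the part that matched.

For example, apple is a pattern. It matches Strings such as "apple tree" and "I love apples.", and in both cases the matched text is apple.

Beyond literal text, regular expressions have special symbols that stand for unspecified characters. For example, d.g matches dog, dig, dag, and so on, and this is what makes regular expressions powerful.

Patterns follow their own set of rules. These rules are largely shared across languages, with small variations from one language to another. A pattern consists of ordinary characters (for example, the letters a through z) and special characters, called "metacharacters". The table below lists the character expressions among the metacharacters supported in Swift, taken from the official documentation.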
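The two patterns just described can be tried out with a small helper. This is a minimal sketch: allMatches(of:in:options:) is a hypothetical helper written for this illustration (it is not part of Foundation), and it assumes that returning an empty array for an invalid pattern is acceptable.

```swift
import Foundation

/// Hypothetical helper: returns every substring of `text` matched by `pattern`,
/// or an empty array if the pattern does not compile.
func allMatches(of pattern: String, in text: String,
                options: NSRegularExpression.Options = []) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: options) else {
        return []
    }
    // Build the NSRange from the string itself so it covers the whole text.
    let range = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: range).compactMap { match in
        Range(match.range, in: text).map { String(text[$0]) }
    }
}

// A literal pattern matches itself wherever it occurs.
let appleHits = allMatches(of: "apple", in: "I love apples.")
print(appleHits)   // ["apple"]

// The dot stands for any single character between d and g.
let dotHits = allMatches(of: "d.g", in: "dog dig dag")
print(dotHits)     // ["dog", "dig", "dag"]
```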

Character expression        Description and note
\a                          Match a BELL, \u0007.
\A                          Match at the beginning of the input. Differs from ^ in that \A will not match after a new line within the input.  Note: Always matches the start of the input; unlike ^, it is not affected by the anchorsMatchLines option.
\b, outside of a [Set]      Match if the current position is a word boundary. Boundaries occur at the transitions between word (\w) and non-word (\W) characters, with combining marks ignored.  Note: By this definition a hyphen is a non-word character, so it does produce a boundary next to a letter.
\b, within a [Set]          Match a BACKSPACE, \u0008.
\B                          Match if the current position is not a word boundary.
\cX                         Match a control-X character.
\d                          Match any character with the Unicode General Category of Nd (Number, Decimal Digit).  Note: Matches digits, including decimal digits from other Unicode scripts.
\D                          Match any character that is not a decimal digit.
\e                          Match an ESCAPE, \u001B.
\E                          Terminates a \Q ... \E quoted sequence.
\f                          Match a FORM FEED, \u000C.
\G                          Match if the current position is at the end of the previous match.
\n                          Match a LINE FEED, \u000A.
\N{UNICODE CHARACTER NAME}  Match the named character.
\p{UNICODE PROPERTY NAME}   Match any character with the specified Unicode Property.  Note: See the Unicode standard for the full list of properties.
\P{UNICODE PROPERTY NAME}   Match any character not having the specified Unicode Property.
\Q                          Quotes all following characters until \E.
\r                          Match a CARRIAGE RETURN, \u000D.
\s                          Match a white space character. White space is defined as [\t\n\f\r\p{Z}].  Note: \p{Z} covers the Unicode line separator, paragraph separator, and space separator categories.
\S                          Match a non-white space character.
\t                          Match a HORIZONTAL TABULATION, \u0009.
\uhhhh                      Match the character with the hex value hhhh.
\Uhhhhhhhh                  Match the character with the hex value hhhhhhhh. Exactly eight hex digits must be provided, even though the largest Unicode code point is \U0010ffff.  Note: All eight hex digits (32 bits) are required.
\w                          Match a word character. Word characters are [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}].
\W                          Match a non-word character.
\x{hhhh}                    Match the character with hex value hhhh. From one to six hex digits may be supplied.
\xhh                        Match the character with two digit hex value hh.
\X                          Match a Grapheme Cluster.
\Z                          Match if the current position is at the end of input, but before the final line terminator, if one exists.
\z                          Match if the current position is at the end of input.
\n, where n is a number     Back Reference. Match whatever the nth capturing group matched. n must be a number ≥ 1 and ≤ total number of capture groups in the pattern.  Note: Here n is a digit such as 1 or 2 (written \1, \2), not the letter n used for LINE FEED above.
\0ooo                       Match an Octal character. ooo is from one to three octal digits. 0377 is the largest allowed Octal character. The leading zero is required; it distinguishes Octal constants from back references.
[pattern]                   Match any one character from the pattern.  Note: The brackets match exactly one character from the set.
.                           Match any character.  Note: Matches line separators only when the dotMatchesLineSeparators option is set.
^                           Match at the beginning of a line.
$                           Match at the end of a line.
\                           Quotes the following character. Characters that must be quoted to be treated as literals are * ? + [ ( ) { } ^ $ | \ . /
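A few of the character expressions above are easier to grasp when run. The sketch below assumes a hypothetical helper, allMatches(of:in:options:), defined here only for illustration. Note that inside an ordinary Swift string literal each regex backslash has to be written twice, so \d becomes "\\d".

```swift
import Foundation

/// Hypothetical helper: every substring of `text` matched by `pattern`
/// (empty if the pattern does not compile).
func allMatches(of pattern: String, in text: String,
                options: NSRegularExpression.Options = []) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: options) else {
        return []
    }
    let range = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: range).compactMap { match in
        Range(match.range, in: text).map { String(text[$0]) }
    }
}

// \d+ : one or more decimal digits.
let digits = allMatches(of: "\\d+", in: "Room 42, floor 7")

// \b : word boundary, so "cat" inside "concat" or "cats" is skipped.
let wholeWords = allMatches(of: "\\bcat\\b", in: "cat concat cats cat.")

// ^ matches at the start of every line only with .anchorsMatchLines;
// \A keeps matching only at the very start of the input.
let lines = "one two\nthree four"
let caretDefault = allMatches(of: "^\\w+", in: lines)
let caretPerLine = allMatches(of: "^\\w+", in: lines, options: [.anchorsMatchLines])
let inputStart = allMatches(of: "\\A\\w+", in: lines, options: [.anchorsMatchLines])

print(digits, wholeWords, caretDefault, caretPerLine, inputStart)
```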

The next table lists the operators among the metacharacters supported in Swift.

Operator              Description
|                     Alternation. A|B matches either A or B.
*                     Match 0 or more times. Match as many times as possible.
+                     Match 1 or more times. Match as many times as possible.
?                     Match zero or one times. Prefer one.
{n}                   Match exactly n times.
{n,}                  Match at least n times. Match as many times as possible.
{n,m}                 Match between n and m times. Match as many times as possible, but not more than m.
*?                    Match 0 or more times. Match as few times as possible.
+?                    Match 1 or more times. Match as few times as possible.
??                    Match zero or one times. Prefer zero.
{n}?                  Match exactly n times.
{n,}?                 Match at least n times, but no more than required for an overall pattern match.
{n,m}?                Match between n and m times. Match as few times as possible, but not less than n.
*+                    Match 0 or more times. Match as many times as possible when first encountered, do not retry with fewer even if overall match fails (Possessive Match).
++                    Match 1 or more times. Possessive match.
?+                    Match zero or one times. Possessive match.
{n}+                  Match exactly n times.
{n,}+                 Match at least n times. Possessive Match.
{n,m}+                Match between n and m times. Possessive Match.
(...)                 Capturing parentheses. Range of input that matched the parenthesized subexpression is available after the match.
(?:...)               Non-capturing parentheses. Groups the included pattern, but does not provide capturing of matching text. Somewhat more efficient than capturing parentheses.
(?>...)               Atomic-match parentheses. First match of the parenthesized subexpression is the only one tried; if it does not lead to an overall pattern match, back up the search for a match to a position before the "(?>".
(?# ... )             Free-format comment (?# comment ).
(?= ... )             Look-ahead assertion. True if the parenthesized pattern matches at the current input position, but does not advance the input position.
(?! ... )             Negative look-ahead assertion. True if the parenthesized pattern does not match at the current input position. Does not advance the input position.
(?<= ... )            Look-behind assertion. True if the parenthesized pattern matches text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators.)
(?<! ... )            Negative Look-behind assertion. True if the parenthesized pattern does not match text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators.)
(?ismwx-ismwx:... )   Flag settings. Evaluate the parenthesized expression with the specified flags enabled or -disabled. The flags are defined in Flag Options.
(?ismwx-ismwx)        Flag settings. Change the flag settings. Changes apply to the portion of the pattern following the setting. For example, (?i) changes to a case insensitive match. The flags are defined in Flag Options.
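To make a handful of these operators concrete, here is a short sketch contrasting greedy and lazy quantifiers and showing alternation, look-ahead, and a back reference. The helper allMatches(of:in:) is hypothetical and exists only for this illustration; the sample strings are made up.

```swift
import Foundation

/// Hypothetical helper: every substring of `text` matched by `pattern`
/// (empty if the pattern does not compile).
func allMatches(of pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
        return []
    }
    let range = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: range).compactMap { match in
        Range(match.range, in: text).map { String(text[$0]) }
    }
}

let html = "<b>bold</b> and <i>italic</i>"

// Greedy .+ runs to the last ">" it can reach: one long match.
let greedy = allMatches(of: "<.+>", in: html)

// Lazy .+? stops at the first ">" that works: one match per tag.
let lazy = allMatches(of: "<.+?>", in: html)

// Alternation inside a non-capturing group.
let spellings = allMatches(of: "gr(?:a|e)y", in: "gray grey groy")

// Look-ahead: digits followed by " USD", without consuming " USD".
let dollars = allMatches(of: "\\d+(?= USD)", in: "10 USD, 20 EUR, 30 USD")

// Back reference \1: a word immediately repeated.
let doubled = allMatches(of: "\\b(\\w+) \\1\\b", in: "this is is a test test")

print(greedy, lazy, spellings, dollars, doubled)
```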

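If you would rather not wrestle with the English documentation, Runoob's (菜鳥教程) regular expression tutorial is a good place to start; to learn Swift regular expressions properly, though, you will still need to consult the official documentation.

Part Two: The NSRegularExpression Class

An example is the easiest way to explain it. Take the following String and search it: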

import Foundation

let sentence = "I'd like to follow my fellow to the fallow to see a hallow harrow."

do {
    // [a-z] matches any single lowercase letter from a to z
    let regex = try NSRegularExpression(pattern: "f[a-z]llow", options: [])

    // NSRange counts UTF-16 code units, so build it from the string rather than from sentence.count
    let range = NSRange(sentence.startIndex..., in: sentence)

    // matches is an array of NSTextCheckingResult
    let matches = regex.matches(in: sentence, options: [], range: range)
    print("\(matches.count) matches.")
} catch {
    print(error.localizedDescription)
}

The output is:

3 matches.
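One detail worth checking when building the search range: NSRange lengths are measured in UTF-16 code units, whereas String.count counts Characters. The two agree for plain ASCII text such as the sentence above, but they diverge once the text contains emoji or other characters outside the Basic Multilingual Plane. The sketch below illustrates the difference with a made-up string; countMatches(of:in:range:) is a hypothetical helper for this example only.

```swift
import Foundation

/// Hypothetical helper: number of matches of `pattern` within `range` of `text`
/// (0 if the pattern does not compile).
func countMatches(of pattern: String, in text: String, range: NSRange) -> Int {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
        return 0
    }
    return regex.numberOfMatches(in: text, options: [], range: range)
}

// U+1F44D (thumbs up) is one Character but two UTF-16 code units.
let text = "\u{1F44D} follow"

// A length taken from Character count is one unit short and cuts off the final "w".
let shortRange = NSRange(location: 0, length: text.count)

// A range built from String indices covers the whole string.
let fullRange = NSRange(text.startIndex..., in: text)

print(text.count, text.utf16.count)
print(countMatches(of: "f[a-z]llow", in: text, range: shortRange))
print(countMatches(of: "f[a-z]llow", in: text, range: fullRange))
```

Building the range with NSRange(_:in:), or using text.utf16.count as the length, avoids the mismatch.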

How do you get the actual matched strings out of matches? Read the range property of each NSTextCheckingResult and apply that range to the original sentence.

...
let matches = ...
print(...)
for (i, match) in matches.enumerated() {
    // match.range is an NSRange into sentence, so slice through NSString
    let substring = (sentence as NSString).substring(with: match.range)
    print("\(i) is \(substring).")
}
...

The output is:

3 matches.
0 is follow.
1 is fellow.
2 is fallow.
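The same idea extends to capture groups: wrapping part of the pattern in parentheses lets you read just that part through range(at:) on each NSTextCheckingResult, where index 0 is the whole match and index 1 is the first group. The following is a sketch under that assumption; capturedLetters(in:) is a hypothetical function name, and the pattern is a variant of the one above with the middle letter captured.

```swift
import Foundation

/// Hypothetical helper: for each match of f([a-z])llow, returns the letter
/// captured by group 1.
func capturedLetters(in sentence: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: "f([a-z])llow", options: []) else {
        return []
    }
    let whole = NSRange(sentence.startIndex..., in: sentence)
    return regex.matches(in: sentence, options: [], range: whole).compactMap { match -> String? in
        // range(at: 1) is the NSRange of the first capture group;
        // convert it to a Swift range before slicing the String.
        guard let groupRange = Range(match.range(at: 1), in: sentence) else { return nil }
        return String(sentence[groupRange])
    }
}

let sample = "I'd like to follow my fellow to the fallow to see a hallow harrow."
let letters = capturedLetters(in: sample)
print(letters)   // ["o", "e", "a"]
```

Converting with Range(_:in:) instead of casting to NSString keeps the result as a native Swift String slice; either approach works for this sentence.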

You can also iterate with a closure:

// Handle each match directly as it is found
let range = NSRange(sentence.startIndex..., in: sentence)
regex.enumerateMatches(in: sentence, options: [], range: range) { result, _, _ in
    guard let result = result else { return }
    let substring = (sentence as NSString).substring(with: result.range)
    print(substring)
}

The output is:

follow
fellow
fallow
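
The closure passed to enumerateMatches receives three arguments, and the third is a pointer that lets you stop the enumeration early. As a rough sketch (firstMatches(of:in:limit:) is a hypothetical helper, not a Foundation API), this collects matches only until a limit is reached:

```swift
import Foundation

/// Hypothetical helper: returns at most `limit` matches of `pattern` in `text`,
/// stopping the enumeration as soon as the limit is reached.
func firstMatches(of pattern: String, in text: String, limit: Int) -> [String] {
    guard limit > 0,
          let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
        return []
    }
    var found: [String] = []
    let whole = NSRange(text.startIndex..., in: text)
    regex.enumerateMatches(in: text, options: [], range: whole) { result, _, stop in
        guard let result = result, let range = Range(result.range, in: text) else { return }
        found.append(String(text[range]))
        // Setting the stop flag ends the enumeration after this callback returns.
        if found.count >= limit { stop.pointee = true }
    }
    return found
}

let source = "I'd like to follow my fellow to the fallow to see a hallow harrow."
let firstTwo = firstMatches(of: "f[a-z]llow", in: source, limit: 2)
print(firstTwo)   // ["follow", "fellow"]
```

Compared with matches(in:options:range:), which always collects every result, this avoids scanning the rest of a long text once enough matches have been found.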