Using Regular Expressions in Swift 4, with Worked Examples

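Regular expressions are a powerful pattern-matching tool available in C, Python, and many other languages, and the new and fashionable Swift is no exception. By the end of this tutorial you will have a feel for how much power regular expressions put in a programmer's hands.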

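This tutorial first introduces the matching patterns available in Swift, with examples along the way; it then explains NSRegularExpression, the Apple-provided class we will be using; and it closes with a more complex example that ties everything together. Besides regular expressions, the tutorial also touches on error handling, closures, and reading and writing files. If you spot omissions or outright mistakes, corrections are very welcome.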

Part One: Regular Expressions in Swift

The idea behind regular expressions is simple: given a pattern (a String), check whether the String under test satisfies that pattern. If it does, you can retrieve the part that matched.

For example, apple is a pattern. It matches Strings such as "apple tree" and "I love apples.", and in both cases the matched text is apple.

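Beyond literal text, regular expressions support special symbols that stand in for unspecified values. For example, d.g matches Strings such as dog, dig, and dag, because the dot matches any single character. This is what makes regular expressions so powerful.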
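As a quick sketch of these two patterns in Swift, the snippet below runs them through NSRegularExpression, the class covered in Part Two. The helper allMatches(of:in:) is a hypothetical convenience written only for this illustration; it is not part of Foundation.

```swift
import Foundation

// Hypothetical helper for illustration (not a Foundation API):
// returns every substring of `text` that `pattern` matches.
func allMatches(of pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
        return []
    }
    // NSRange covering the whole string, measured in UTF-16 code units.
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    var found: [String] = []
    for match in regex.matches(in: text, options: [], range: fullRange) {
        if let range = Range(match.range, in: text) {
            found.append(String(text[range]))
        }
    }
    return found
}

// A literal pattern returns the literal text wherever it occurs.
let appleInTree = allMatches(of: "apple", in: "apple tree")
let appleInSentence = allMatches(of: "apple", in: "I love apples.")
print(appleInTree)      // ["apple"]
print(appleInSentence)  // ["apple"]

// The dot stands for any single character between d and g.
let dotMatches = allMatches(of: "d.g", in: "dog dig dag")
print(dotMatches)       // ["dog", "dig", "dag"]
```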

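Patterns follow their own set of rules. The rules are largely shared across languages, with small variations from one language to the next. A pattern consists of ordinary characters (for example, the letters a to z) and special characters, called "metacharacters". The table below lists the character expressions among the metacharacters available in Swift. It is taken from the official documentation.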

| Character expression | Description | Note |
| --- | --- | --- |
| \a | Match a BELL, \u0007 | |
| \A | Match at the beginning of the input. Differs from ^ in that \A will not match after a new line within the input. | Always matches at the very start of the input and is unaffected by the anchorsMatchLines option; this is how it differs from ^. |
| \b, outside of a [Set] | Match if the current position is a word boundary. Boundaries occur at the transitions between word (\w) and non-word (\W) characters, with combining marks ignored. | Combining marks do not count as word boundaries. |
| \b, within a [Set] | Match a BACKSPACE, \u0008. | The backspace character. |
| \B | Match if the current position is not a word boundary. | |
| \cX | Match a control-X character | |
| \d | Match any character with the Unicode General Category of Nd (Number, Decimal Digit.) | Matches digits, including the various digit forms found in Unicode. |
| \D | Match any character that is not a decimal digit. | |
| \e | Match an ESCAPE, \u001B. | |
| \E | Terminates a \Q ... \E quoted sequence. | |
| \f | Match a FORM FEED, \u000C. | Form feed. |
| \G | Match if the current position is at the end of the previous match. | |
| \n | Match a LINE FEED, \u000A. | Line feed (newline). |
| \N{UNICODE CHARACTER NAME} | Match the named character. | |
| \p{UNICODE PROPERTY NAME} | Match any character with the specified Unicode Property. | The Unicode documentation lists all available properties. |
| \P{UNICODE PROPERTY NAME} | Match any character not having the specified Unicode Property. | |
| \Q | Quotes all following characters until \E. | |
| \r | Match a CARRIAGE RETURN, \u000D. | Carriage return. |
| \s | Match a white space character. White space is defined as [\t\n\f\r\p{Z}]. | \p{Z} covers the Unicode line separator, paragraph separator, and space separator categories. |
| \S | Match a non-white space character. | |
| \t | Match a HORIZONTAL TABULATION, \u0009. | Horizontal tab. |
| \uhhhh | Match the character with the hex value hhhh. | |
| \Uhhhhhhhh | Match the character with the hex value hhhhhhhh. Exactly eight hex digits must be provided, even though the largest Unicode code point is \U0010ffff. | All 32 bits (eight hex digits) must be written out. |
| \w | Match a word character. Word characters are [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}]. | |
| \W | Match a non-word character. | |
| \x{hhhh} | Match the character with hex value hhhh. From one to six hex digits may be supplied. | |
| \xhh | Match the character with two digit hex value hh. | |
| \X | Match a Grapheme Cluster. | Grapheme cluster. |
| \Z | Match if the current position is at the end of input, but before the final line terminator, if one exists. | |
| \z | Match if the current position is at the end of input. | |
| \n (n = 1, 2, ...) | Back Reference. Match whatever the nth capturing group matched. n must be a number ≥ 1 and ≤ total number of capture groups in the pattern. | n is a number that says which subexpression (capture group) is meant. |
| \0ooo | Match an Octal character. ooo is from one to three octal digits. 0377 is the largest allowed Octal character. The leading zero is required; it distinguishes Octal constants from back references. | |
| [pattern] | Match any one character from the pattern. | Square brackets match exactly one of the characters listed inside them. |
| . | Match any character. | With the dotMatchesLineSeparators option the dot also matches line separators; without it, it does not. |
| ^ | Match at the beginning of a line. | |
| $ | Match at the end of a line. | |
| \ | Quotes the following character. Characters that must be quoted to be treated as literals are * ? + [ ( ) { } ^ $ \| \ . / | |
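To make a few rows of the table concrete, here is a minimal sketch that exercises \d, \b, ^, and \A. It assumes a hypothetical helper, allMatches(of:in:options:), written just for this illustration. Note that in a Swift 4 string literal the backslash itself has to be escaped, so the regex \d is written "\\d" in source code.

```swift
import Foundation

// Hypothetical helper for illustration (not a Foundation API):
// returns every substring of `text` that `pattern` matches.
func allMatches(of pattern: String, in text: String,
                options: NSRegularExpression.Options = []) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: options) else {
        return []
    }
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    var found: [String] = []
    for match in regex.matches(in: text, options: [], range: fullRange) {
        if let range = Range(match.range, in: text) {
            found.append(String(text[range]))
        }
    }
    return found
}

let text = "Order 66 shipped on 2018-03-05 to room B12."

// \d+ : one or more decimal digits, wherever they appear.
let digits = allMatches(of: "\\d+", in: text)
print(digits)        // ["66", "2018", "03", "05", "12"]

// \b on both sides demands word boundaries, so the 12 inside B12 is skipped.
let wholeNumbers = allMatches(of: "\\b\\d+\\b", in: text)
print(wholeNumbers)  // ["66", "2018", "03", "05"]

// With .anchorsMatchLines, ^ matches at the start of every line,
// while \A still matches only at the start of the input.
let lines = "one\ntwo"
let caretMatches = allMatches(of: "^\\w+", in: lines, options: [.anchorsMatchLines])
let inputStartMatches = allMatches(of: "\\A\\w+", in: lines, options: [.anchorsMatchLines])
print(caretMatches)       // ["one", "two"]
print(inputStartMatches)  // ["one"]
```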

The next table lists the operators among the metacharacters available in Swift.

| Operator | Description |
| --- | --- |
| \| | Alternation. A\|B matches either A or B. |
| * | Match 0 or more times. Match as many times as possible. |
| + | Match 1 or more times. Match as many times as possible. |
| ? | Match zero or one times. Prefer one. |
| {n} | Match exactly n times. |
| {n,} | Match at least n times. Match as many times as possible. |
| {n,m} | Match between n and m times. Match as many times as possible, but not more than m. |
| *? | Match 0 or more times. Match as few times as possible. |
| +? | Match 1 or more times. Match as few times as possible. |
| ?? | Match zero or one times. Prefer zero. |
| {n}? | Match exactly n times. |
| {n,}? | Match at least n times, but no more than required for an overall pattern match. |
| {n,m}? | Match between n and m times. Match as few times as possible, but not less than n. |
| *+ | Match 0 or more times. Match as many times as possible when first encountered, do not retry with fewer even if overall match fails (Possessive Match). |
| ++ | Match 1 or more times. Possessive match. |
| ?+ | Match zero or one times. Possessive match. |
| {n}+ | Match exactly n times. |
| {n,}+ | Match at least n times. Possessive Match. |
| {n,m}+ | Match between n and m times. Possessive Match. |
| (...) | Capturing parentheses. Range of input that matched the parenthesized subexpression is available after the match. |
| (?:...) | Non-capturing parentheses. Groups the included pattern, but does not provide capturing of matching text. Somewhat more efficient than capturing parentheses. |
| (?>...) | Atomic-match parentheses. First match of the parenthesized subexpression is the only one tried; if it does not lead to an overall pattern match, back up the search for a match to a position before the "(?>" |
| (?# ... ) | Free-format comment (?# comment ). |
| (?= ... ) | Look-ahead assertion. True if the parenthesized pattern matches at the current input position, but does not advance the input position. |
| (?! ... ) | Negative look-ahead assertion. True if the parenthesized pattern does not match at the current input position. Does not advance the input position. |
| (?<= ... ) | Look-behind assertion. True if the parenthesized pattern matches text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators.) |
| (?<! ... ) | Negative Look-behind assertion. True if the parenthesized pattern does not match text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators.) |
| (?ismwx-ismwx:... ) | Flag settings. Evaluate the parenthesized expression with the specified flags enabled or -disabled. The flags are defined in Flag Options. |
| (?ismwx-ismwx) | Flag settings. Change the flag settings. Changes apply to the portion of the pattern following the setting. For example, (?i) changes to a case insensitive match. The flags are defined in Flag Options. |
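Three of the operators are easier to grasp when seen side by side: greedy versus lazy quantifiers, the (?i) flag versus the equivalent NSRegularExpression option, and capturing parentheses. The sketch below is illustrative only; allMatches(of:in:options:) is a hypothetical helper and the sample strings are made up for the demonstration.

```swift
import Foundation

// Hypothetical helper for illustration (not a Foundation API):
// returns every substring of `text` that `pattern` matches.
func allMatches(of pattern: String, in text: String,
                options: NSRegularExpression.Options = []) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: options) else {
        return []
    }
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    var found: [String] = []
    for match in regex.matches(in: text, options: [], range: fullRange) {
        if let range = Range(match.range, in: text) {
            found.append(String(text[range]))
        }
    }
    return found
}

// Greedy + grabs as much as it can; lazy +? stops at the first possible end.
let html = "<b>bold</b> and <i>italic</i>"
let greedy = allMatches(of: "<.+>", in: html)
let lazy = allMatches(of: "<.+?>", in: html)
print(greedy)  // ["<b>bold</b> and <i>italic</i>"]
print(lazy)    // ["<b>", "</b>", "<i>", "</i>"]

// The inline flag (?i) and the .caseInsensitive option find the same matches.
let words = "Swift SWIFT swift"
let inlineFlag = allMatches(of: "(?i)swift", in: words)
let optionFlag = allMatches(of: "swift", in: words, options: [.caseInsensitive])
print(inlineFlag == optionFlag)  // true

// Capturing parentheses: group 0 is the whole match, groups 1... are the parts.
let date = "2018-03-05"
var parts: [String] = []
if let regex = try? NSRegularExpression(pattern: "(\\d{4})-(\\d{2})-(\\d{2})", options: []),
   let match = regex.firstMatch(in: date, options: [],
                                range: NSRange(date.startIndex..<date.endIndex, in: date)) {
    for group in 1..<match.numberOfRanges {
        if let range = Range(match.range(at: group), in: date) {
            parts.append(String(date[range]))
        }
    }
}
print(parts)  // ["2018", "03", "05"]
```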

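If you would rather not wrestle with English documentation, the regular expression tutorial on 菜鳥教程 (Runoob) is a good way to get started. To learn Swift regular expressions properly, though, you will still need to consult the official documentation.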

Part Two: The NSRegularExpression Class

An example explains this best. Take the following String:

import Foundation

let sentence = "I'd like to follow my fellow to the fallow to see a hallow harrow."

do {
    // [a-z] means this position may hold any single letter from a to z
    let regex = try NSRegularExpression(pattern: "f[a-z]llow", options: [])

    // NSRange is measured in UTF-16 code units, so use utf16.count rather than count
    let fullRange = NSRange(location: 0, length: sentence.utf16.count)

    // matches is an array of NSTextCheckingResult
    let matches = regex.matches(in: sentence, options: [], range: fullRange)
    print("\(matches.count) matches.")

} catch {
    print(error.localizedDescription)
}

Output:

3 matches.

How do we get the actual matched strings out of matches? Read the range property of each NSTextCheckingResult and apply that range to the original sentence.

...
let matches = ...
print(...)
for (i, match) in matches.enumerated() {
    let substring = (sentence as NSString).substring(with: match.range)
    print("\(i) is " + substring + ".")
}
...

Output:

3 matches.
0 is follow.
1 is fellow.
2 is fallow.
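One detail is worth a hedged illustration: NSRange counts UTF-16 code units, whereas a Swift String's count property counts Characters. The two agree for a plain ASCII sentence like the one used here, but they diverge as soon as the text contains an emoji or similar character. The sketch below uses a made-up string and a hypothetical helper, countMatches(in:length:), to show the difference, and then converts an NSRange back to a Range<String.Index> without bridging to NSString.

```swift
import Foundation

// Made-up sample: a thumbs-up emoji, a space, then "follow".
let text = "\u{1F44D} follow"
print(text.count)        // 8  (Characters)
print(text.utf16.count)  // 9  (UTF-16 code units; the emoji takes two)

// Hypothetical helper: counts matches of f[a-z]llow in the first `length` UTF-16 units.
func countMatches(in text: String, length: Int) -> Int {
    guard let regex = try? NSRegularExpression(pattern: "f[a-z]llow", options: []) else {
        return 0
    }
    let range = NSRange(location: 0, length: length)
    return regex.numberOfMatches(in: text, options: [], range: range)
}

// Using count cuts off the final "w", so the pattern no longer fits in the range.
let withCharacterCount = countMatches(in: text, length: text.count)
let withUTF16Count = countMatches(in: text, length: text.utf16.count)
print(withCharacterCount)  // 0
print(withUTF16Count)      // 1

// Converting in both directions with the Foundation initializers:
var matched = ""
if let regex = try? NSRegularExpression(pattern: "f[a-z]llow", options: []) {
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    if let match = regex.firstMatch(in: text, options: [], range: fullRange),
       let range = Range(match.range, in: text) {
        matched = String(text[range])
    }
}
print(matched)  // follow
```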

You can also walk through the matches with a closure:

// Handle each match directly as it is found
// (NSRange is measured in UTF-16 code units, hence utf16.count)
regex.enumerateMatches(in: sentence, options: [], range: NSRange(location: 0, length: sentence.utf16.count), using: { result, _, _ in
    guard let result = result else { return }
    let substring = (sentence as NSString).substring(with: result.range)
    print(substring)
})

Output:

follow
fellow
fallow
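The closure receives two more arguments besides the result: the matching flags and a pointer that lets you stop the enumeration early. As a minimal sketch, the hypothetical function firstMatches(of:in:limit:) below (a name invented for this illustration) collects matches and sets stop.pointee to true once it has enough.

```swift
import Foundation

// Hypothetical helper for illustration: collects at most `limit` matches,
// using the closure's stop pointer to end the enumeration early.
func firstMatches(of pattern: String, in text: String, limit: Int) throws -> [String] {
    let regex = try NSRegularExpression(pattern: pattern, options: [])
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    var found: [String] = []
    regex.enumerateMatches(in: text, options: [], range: fullRange) { result, _, stop in
        guard let result = result, let range = Range(result.range, in: text) else { return }
        found.append(String(text[range]))
        if found.count >= limit {
            stop.pointee = true  // tell NSRegularExpression to stop searching
        }
    }
    return found
}

let sentence = "I'd like to follow my fellow to the fallow to see a hallow harrow."
let firstTwo = (try? firstMatches(of: "f[a-z]llow", in: sentence, limit: 2)) ?? []
print(firstTwo)  // ["follow", "fellow"]
```

Stopping early avoids scanning the rest of a long input when only the first few matches matter.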