A good use of regular expressions: remove spaces between punctuation and words to normalize text automatically


Regular expressions (regex for short) are a powerful way to work more efficiently and to reformat text.
Here is an example of how we can leverage grouping in regex to remove extra spaces between a punctuation mark and the characters in front of it.
This lets us fix such formatting errors in text automatically.

English sentence example: remove spaces between punctuation and words

import re
sent = "What a wonderful day ,I want to go out and have a walk !"
# Replace "whitespace + punctuation" with the punctuation alone (group 1).
sent = re.sub(r'\s+([?,.!;"])', r'\1', sent)

print(sent)
What a wonderful day,I want to go out and have a walk!

In the regex pattern above, r'\s+([?,.!;"])', the parentheses define a capturing group (the first group), the square brackets inside the group form a character class listing the punctuation marks we want to match, and \s+ matches one or more whitespace characters. The replacement r'\1' refers back to that first group, so the whole rule replaces "whitespace + punctuation" with the punctuation alone, which automatically removes any extra spaces before punctuation marks.
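To reuse the rule on many strings, it can be wrapped in a small function. The following is a minimal sketch, not part of the original snippet: the names SPACE_BEFORE_PUNCT and strip_space_before_punct are made up for illustration, and the pattern is the same one explained above.

```python
import re

# Hypothetical names for illustration; the pattern itself is the one from the article.
# Compiling once avoids re-parsing the pattern on every call.
SPACE_BEFORE_PUNCT = re.compile(r'\s+([?,.!;"])')


def strip_space_before_punct(text):
    """Remove whitespace that directly precedes one of the listed punctuation marks."""
    # \1 re-inserts the captured punctuation mark, so only the whitespace is dropped.
    return SPACE_BEFORE_PUNCT.sub(r'\1', text)


print(strip_space_before_punct("Is it late ? Yes , very ."))
# Is it late? Yes, very.
```

One thing to keep in mind: because the double quote is in the character class, the space before an opening quote is removed as well, so 'He said "hi"' becomes 'He said"hi"'. If that is not wanted, the quote can be dropped from the class.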

Chinese sentence example: remove spaces between punctuation and words

import re
sent = "天气真好 ! 我要出去散步 。"
# Same rule, with the Chinese full-width period and comma added to the punctuation set.
sent = re.sub(r'\s+([?,.!;"。,])', r'\1', sent)

print(sent)
天气真好! 我要出去散步。
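Chinese text often uses full-width punctuation such as ! and ?, which the ASCII marks in the character class do not match. The sketch below shows one way to extend the rule; the punctuation set and the names PUNCT, PATTERN and normalize are assumptions chosen for illustration, not part of the original example.

```python
import re

# Assumed punctuation set: the ASCII marks from the examples above
# plus some common full-width Chinese marks. Adjust it to your own data.
PUNCT = '?,.!;"。,!?;:、'

# None of these characters is special inside [...], so no escaping is needed.
PATTERN = re.compile(r'\s+([' + PUNCT + r'])')


def normalize(text):
    """Remove whitespace that directly precedes any mark in PUNCT."""
    return PATTERN.sub(r'\1', text)


print(normalize("天气真好 ! 我要出去散步 。"))
# 天气真好! 我要出去散步。
print(normalize("今天下雨 ,我不出门 !"))
# 今天下雨,我不出门!
```

In Python 3, \s in a str pattern matches Unicode whitespace, so the full-width (ideographic) space U+3000 is removed in the same way as an ordinary space.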

Author: robot learner
Reprint policy: Unless otherwise stated, all articles in this blog are licensed under CC BY 4.0. If you reproduce this article, please credit the source: robot learner.