Regular expressions (regex for short) are a powerful way to work more efficiently and to normalize text formats.
Here is an example of how we can use grouping in a regex to remove extra spaces between a punctuation mark and the characters in front of it.
This lets us fix such formatting errors in text automatically.
English sentence example: remove spaces between punctuation and words
import re
What a wonderful day,I want to go out and have a walk!
In the regex rule r'\s+([?,.!;"])', the parentheses define a capturing group (the first group), the square brackets inside the group list the punctuation marks we want to match, and \s+ matches one or more whitespace characters. The whole rule therefore replaces "whitespace + group" with the group alone (for example, by using \1 as the replacement in re.sub), which automatically removes any extra spaces before those punctuation marks.
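The rule described above can be sketched as follows. This is a minimal illustration, not the original listing: the function name strip_space_before_punct and the input sentence are assumptions, chosen so that the result matches the output shown for the English example.

```python
import re

# Rule from the text: one or more whitespace characters (\s+) followed by
# a punctuation mark, which is captured as group 1.
PUNCT_RULE = re.compile(r'\s+([?,.!;"])')


def strip_space_before_punct(text: str) -> str:
    """Replace "whitespace + punctuation" with the punctuation alone.

    The name of this helper is hypothetical; only the pattern comes from
    the article.
    """
    # \1 refers back to the first group, i.e. the punctuation mark itself.
    return PUNCT_RULE.sub(r"\1", text)


# Assumed input: extra spaces placed before the comma and the exclamation mark.
sample = "What a wonderful day  ,I want to go out and have a walk   !"
print(strip_space_before_punct(sample))
# prints: What a wonderful day,I want to go out and have a walk!
```

Because the whitespace sits outside the parentheses, only the punctuation survives the substitution; spaces that are not directly followed by one of the listed marks are left untouched.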
Chinese sentence example: remove spaces between punctuation and words
import re
天气真好! 我要出去散步。
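The same idea can be applied to Chinese text. The sketch below is an assumption rather than the original listing: the character class is extended with full-width punctuation, and the helper name strip_space_before_punct_zh and the input sentence are hypothetical, picked so that the result matches the output shown above.

```python
import re

# Assumed character class: the ASCII marks from the English rule plus their
# full-width counterparts commonly used in Chinese text.
PUNCT_RULE_ZH = re.compile(r"\s+([?,.!;？，。！；])")


def strip_space_before_punct_zh(text: str) -> str:
    """Remove whitespace that directly precedes a listed punctuation mark."""
    # \1 keeps only the captured punctuation mark.
    return PUNCT_RULE_ZH.sub(r"\1", text)


# Assumed input: extra spaces before "!" and before the full-width period.
sample = "天气真好  ! 我要出去散步  。"
print(strip_space_before_punct_zh(sample))
# prints: 天气真好! 我要出去散步。
```

In Python 3, \s on a str pattern also matches Unicode whitespace such as the ideographic space (U+3000), so full-width spaces before punctuation are removed as well.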