Ed Johnson-Williams

A better way of removing punctuation from a string in Python

Published:

Tags: Python

This post is as much a reminder for my future self as anything.

I made a Python program called game-image-resizer a few months ago. It takes a list of board games, finds each board game on BoardGameGeek’s API, downloads the best image for each game, does some resizing and editing of the image, and then saves it using a useful filename.

That final stage of saving as a useful filename meant taking the board game name, making it lower case, removing punctuation, and replacing spaces with underscores.

I did it like this – roughly using the information in this StackOverflow discussion.


from string import punctuation

# example board game name to work on
working_string = "Mechs vs. Minions"

# making string lower case
working_string = working_string.lower()

# removing punctuation
remove_punctuation = str.maketrans('', '', punctuation)
working_string = working_string.translate(remove_punctuation)

# replacing spaces and double-spaces with an underscore
working_string = working_string.replace("  ", "_")
working_string = working_string.replace(" ", "_")
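
To make the limits of this approach easier to see, here is a minimal sketch that wraps the same steps in a function. The name simple_filename is made up for illustration. It also swaps the two replace() calls for split() and join(), so any run of whitespace becomes a single underscore. Note that string.punctuation only contains ASCII punctuation, so characters such as "–" and "é" pass straight through.

```python
from string import punctuation


def simple_filename(name):
    """Lower-case, strip ASCII punctuation, and join words with underscores.

    Hypothetical helper for illustration; not part of game-image-resizer.
    """
    cleaned = name.lower().translate(str.maketrans("", "", punctuation))
    # split() with no arguments collapses any run of whitespace
    return "_".join(cleaned.split())


print(simple_filename("Mechs vs. Minions"))  # mechs_vs_minions
print(simple_filename("Aeon's End"))         # aeons_end
print(simple_filename("Orléans"))            # orléans (the é is left untouched)
```

Names containing non-ASCII punctuation or accented letters therefore still need extra handling before they are safe to use as filenames.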

This morning I found a better, easier-to-use way of doing this in a Reddit post on /r/Python.

It uses Inflection – a “string transformation library”. Inflection does all sorts of things, and one of its functions is inflection.parameterize(). Parameterize “replace[s] special characters in a string so that it may be used as part of a ‘pretty’ URL.”

This means I can now do the following, which is much nicer to read and to write.


from inflection import parameterize

# Example board game names with upper case, punctuation, and non-ASCII characters
board_game_names = [
    "Dawn of the Zeds (Third edition)",
    "Flash Point: Fire Rescue – Honor & Duty",
    "Orléans",
    "Mechs vs. Minions",
    "Tzolk'in: The Mayan Calendar",
    "T.I.M.E Stories",
    "Aeon's End",
]

for name in board_game_names:
    parameterized_name = parameterize(name, separator="_") # Default is `separator='-'`
    print(parameterized_name) # Or whatever I want to do with it


Output

dawn_of_the_zeds_third_edition
flash_point_fire_rescue_honor_duty
orleans
mechs_vs_minions
tzolk_in_the_mayan_calendar
t_i_m_e_stories
aeon_s_end

Parameterize is mostly just a few regular expressions, but it’s very useful. It has the effect of:

  1. Replacing non-ASCII characters with an ASCII approximation – using inflection.transliterate() – and dropping any that have no approximation (such as the “–” in the Flash Point example)
  2. Replacing any character with the separator if it isn’t one of:
    • a-z
    • A-Z
    • 0-9
    • a hyphen (-)
    • an underscore (_)
  3. Ensuring there is never more than one separator in a row
  4. Removing separators from the start or end of the string
  5. Making the string lower case
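
The five steps above can be sketched with only the standard library. This is an approximation written for illustration, not Inflection's actual source: the function name parameterize_sketch is made up, and the use of NFKD normalization for step 1 is an assumption about how the transliteration works.

```python
import re
import unicodedata


def parameterize_sketch(text, separator="-"):
    """Approximate the five steps listed above (hypothetical helper)."""
    # 1. Decompose accented characters, then drop whatever is still non-ASCII
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # 2. Replace each run of disallowed characters with the separator
    text = re.sub(r"[^a-zA-Z0-9\-_]+", separator, text)
    if separator:
        sep = re.escape(separator)
        # 3. Collapse repeated separators into one
        text = re.sub(rf"(?:{sep}){{2,}}", separator, text)
        # 4. Trim separators from the start and end
        text = re.sub(rf"^(?:{sep})+|(?:{sep})+$", "", text)
    # 5. Lower-case the result
    return text.lower()


print(parameterize_sketch("Dawn of the Zeds (Third edition)", separator="_"))
# dawn_of_the_zeds_third_edition
print(parameterize_sketch("Orléans", separator="_"))
# orleans
```

For real use I would still reach for Inflection itself. The sketch is only meant to show that nothing magical is going on.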