
Remove all special characters and punctuation except spaces from a string - Python


In this tutorial I will show you how to remove all special characters and punctuation, except spaces, from a string in Python.

The following program extracts data from a URL using the BeautifulSoup package (bs4) and adds the page title as a heading to a Word document using python-docx. If the page's title tag contains special characters, I want to remove them.

CODE:

import string
import urllib.request

from bs4 import BeautifulSoup
from docx import Document  # provided by the python-docx package


def remove_symbols(title):
    # Translation table that deletes every ASCII punctuation character
    trans = str.maketrans("", "", string.punctuation)
    return title.translate(trans)


hdr = {"User-Agent": "My Agent"}

request = urllib.request.Request(
    url="https://tensix.com/oracle-bi-publisher-installation-error-inst-05058-a-lookup-of-the-address-for-this-machine/",
    headers=hdr,
)
with urllib.request.urlopen(request) as f:
    myfile = f.read()

soup = BeautifulSoup(myfile, "html.parser")
title = soup.title.text.strip()
cleaned_title = remove_symbols(title)

# Add the cleaned title as a level-1 heading in a new Word document
doc = Document()
doc.add_heading(cleaned_title, 1)

print(cleaned_title)

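But the code above does not remove everything. string.punctuation contains only the 32 ASCII punctuation characters, so digits are left in place, and non-ASCII symbols such as a Unicode ellipsis (…, which looks like full stops) or an en dash (–) also survive. I am going to use a regular expression (regex) to remove the unwanted characters instead.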
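To see where the gap comes from, the snippet below applies the same translation-table approach to a made-up sample title. The sample text is illustrative only (it is not the real title of the page fetched above), and trans_digits is a hypothetical name used to show that adding string.digits removes numbers but still leaves non-ASCII symbols behind.

```python
import string

# string.punctuation holds only the 32 ASCII punctuation characters:
# !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
trans = str.maketrans("", "", string.punctuation)

# Hypothetical sample title: ASCII punctuation, digits, and two non-ASCII
# symbols (an ellipsis U+2026 and an en dash U+2013)
sample = "Error INST-05058: a lookup failed\u2026 \u2013 Blog 2019!"

print(sample.translate(trans))
# → Error INST05058 a lookup failed… – Blog 2019

# Adding string.digits to the deletion set removes the numbers as well,
# but the non-ASCII symbols still survive
trans_digits = str.maketrans("", "", string.punctuation + string.digits)
print(sample.translate(trans_digits).strip())
# → Error INST a lookup failed… – Blog
```

This is why a character-class regex, which decides what to keep rather than listing what to delete, is the more robust choice here.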

CODE:

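import re


def remove_symbols(title):
    # Turn each run of non-alphanumeric characters (punctuation, symbols,
    # newlines, extra spaces) into one space; drop 0-9 to strip digits too.
    return re.sub(r"[^a-zA-Z0-9]+", " ", title).strip()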
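The regex above also replaces spaces, collapsing any run of spaces and symbols into a single space, and it drops accented letters such as é because [^a-zA-Z0-9] only keeps ASCII letters and digits. If you would rather keep the original spacing and non-ASCII letters, one alternative (a different pattern from the one above, sketched under the assumption that this is the behaviour you want) is to delete only characters that are neither word characters nor whitespace. In Python 3, \w in a str pattern matches Unicode letters and digits plus the underscore, so the underscore is removed explicitly. The names SPECIAL and strip_special are hypothetical.

```python
import re

# Match any single character that is neither a word character (\w: Unicode
# letters, digits and "_") nor whitespace (\s), or else an underscore.
SPECIAL = re.compile(r"[^\w\s]|_")


def strip_special(text: str) -> str:
    """Delete punctuation and symbols; keep letters, digits and whitespace as-is."""
    return SPECIAL.sub("", text)


print(strip_special("Café  déjà-vu: 100% fun_time!"))
# → Café  déjàvu 100 funtime
```

Because this version deletes symbols instead of replacing them with a space, a hyphenated word such as déjà-vu becomes déjàvu; pass " " instead of "" to SPECIAL.sub if you prefer the words kept apart.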