I got an error message while running the following code to read a PDF file and extract its text.
CODE:
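import PyPDF2

# Open the PDF in binary mode; the with-block closes the file afterwards
with open('pythonhelp.pdf', 'rb') as pdfFileObject:
    pdfReader = PyPDF2.PdfFileReader(pdfFileObject)
    page = pdfReader.getPage(0)  # first page (zero-based index)
    print(page.extract_text())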
ERROR MESSAGE:
Superfluous whitespace found in object header b'1' b'0'
Superfluous whitespace found in object header b'2' b'0'
Superfluous whitespace found in object header b'3' b'0'
Superfluous whitespace found in object header b'54' b'0'
Superfluous whitespace found in object header b'65' b'0'
Superfluous whitespace found in object header b'68' b'0'
Superfluous whitespace found in object header b'53' b'0'
Superfluous whitespace found in object header b'15' b'0'
Superfluous whitespace found in object header b'14' b'0'
Superfluous whitespace found in object header b'13' b'0'
Superfluous whitespace found in object header b'23' b'0'
Superfluous whitespace found in object header b'22' b'0'
Superfluous whitespace found in object header b'21' b'0'
Superfluous whitespace found in object header b'31' b'0'
Superfluous whitespace found in object header b'30' b'0'
Superfluous whitespace found in object header b'29' b'0'
Superfluous whitespace found in object header b'39' b'0'
Superfluous whitespace found in object header b'38' b'0'
Superfluous whitespace found in object header b'37' b'0'
Superfluous whitespace found in object header b'51' b'0'
Superfluous whitespace found in object header b'50' b'0'
Superfluous whitespace found in object header b'49' b'0'
Superfluous whitespace found in object header b'52' b'0'
SOLUTION:
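Pass strict=False to PdfFileReader (note the capital F: Python's boolean is False, not false). The messages above are warnings rather than errors: PyPDF2 only reports superfluous whitespace in object headers when strict mode is on, so turning strict mode off silences them. This resolved my problem: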
pdfReader = PyPDF2.PdfFileReader(pdfFileObject, strict=False)
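To see why strict=False makes the messages disappear, here is a minimal, hypothetical sketch in plain Python (no PyPDF2 needed) of the kind of check that produces them. A PDF object header normally looks like 1 0 obj: the object number, the generation number and the obj keyword separated by single whitespace characters. The function read_object_header and its regex are assumptions written for illustration only, not PyPDF2's implementation; they mimic the observable behavior that the header still parses fine and the warning is only raised in strict mode.

```python
import re
import warnings
from typing import Tuple

# Hypothetical, simplified parser for a PDF object header such as b"1 0 obj".
# It only illustrates the idea behind PyPDF2's check; it is NOT PyPDF2's code.
HEADER = re.compile(rb"(\s*)(\d+)(\s+)(\d+)(\s+)obj")


def read_object_header(data: bytes, strict: bool = True) -> Tuple[int, int]:
    """Return (object number, generation number) parsed from data."""
    match = HEADER.match(data)
    if match is None:
        raise ValueError(f"not an object header: {data!r}")
    lead, idnum, gap1, generation, gap2 = match.groups()
    # Whitespace before the object number, or more than one whitespace byte
    # between tokens, is treated as "superfluous"
    extra = bool(lead) or len(gap1) > 1 or len(gap2) > 1
    if extra and strict:
        # Only strict mode reports it, and only as a warning: parsing continues
        warnings.warn(
            f"Superfluous whitespace found in object header {idnum} {generation}"
        )
    return int(idnum), int(generation)


print(read_object_header(b"1 0 obj"))                 # (1, 0), no warning
print(read_object_header(b"1  0 obj"))                # (1, 0) plus a UserWarning
print(read_object_header(b"1  0 obj", strict=False))  # (1, 0), silent
```

The header b"1  0 obj" has two spaces between the object number and the generation number, the kind of irregularity the messages above complain about; the parsed numbers are identical in both modes, which is why non-strict parsing is usually safe for files like this.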