Python magic to the rescue

May 23, 2021

I recently needed to determine the file type (pdf, docx or doc) of a file downloaded from a URL. With the requests library, it is easy to download a file from a URL and save it to a temp file, say ‘xyz’. The challenge is then to determine the file type and rename the file from ‘xyz’ to ‘xyz.pdf’ or ‘xyz.docx’ accordingly.
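For reference, the download step might look like the sketch below. It assumes a reasonably recent `requests` and uses `stream=True` with `iter_content` so a large file is not held in memory. The function name `download_to_temp` is hypothetical; it also returns the raw `content-disposition` value (or `None`) so the header can be inspected later.

```python
import tempfile
from typing import Optional, Tuple

import requests


def download_to_temp(url: str, timeout: float = 30.0) -> Tuple[str, Optional[str]]:
    """Download `url` into a temporary file that has no extension yet.

    Returns the temp file's path and the raw Content-Disposition header
    value, which is None when the server does not send that header.
    """
    with requests.get(url, stream=True, timeout=timeout) as resp:
        resp.raise_for_status()
        # delete=False keeps the file on disk after it is closed, so it can
        # be renamed once its type is known.
        with tempfile.NamedTemporaryFile(delete=False) as tmp:
            # Stream the body in chunks instead of loading it all at once.
            for chunk in resp.iter_content(chunk_size=8192):
                tmp.write(chunk)
        return tmp.name, resp.headers.get("content-disposition")
```

Because the temp file is created without a suffix, it ends up as the extension-less ‘xyz’ described above.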

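One approach is to derive this information from the ‘content-disposition’ response header, which can carry the original filename (for example, attachment; filename="report.pdf"). For some reason, though, for the URL I used for testing, content-disposition was always None.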
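When the header is present, Python's standard `email.message.Message` can parse it, which avoids splitting on `;` and stripping quotes by hand. This is a minimal sketch; `extension_from_disposition` is a hypothetical helper, and it simply returns `None` in the case described above where the header is missing.

```python
import os
from email.message import Message
from typing import Optional


def extension_from_disposition(header: Optional[str]) -> Optional[str]:
    """Return the lower-cased extension (e.g. '.pdf') of the filename named
    in a Content-Disposition value, or None if there is nothing usable."""
    if not header:
        return None  # header absent, e.g. resp.headers.get(...) gave None
    msg = Message()
    msg["Content-Disposition"] = header
    filename = msg.get_filename()  # parses the parameter and unquotes it
    if not filename:
        return None
    ext = os.path.splitext(filename)[1].lower()
    return ext or None  # a filename without an extension does not help
```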

After looking around for alternatives, I came across the magic library, which returns the MIME type of a given file by inspecting its contents. You can use this MIME type to rename the downloaded temp file with the correct extension (pdf, doc, docx).
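A minimal sketch of the rename step, assuming the python-magic package (imported as `magic`), whose `magic.from_file(path, mime=True)` returns a MIME type string such as "application/pdf". The `MIME_TO_EXT` table and the function names are hypothetical, and the exact MIME strings reported may vary with the libmagic version (some versions may report a .docx simply as a zip file).

```python
import os
from typing import Optional

# Hypothetical mapping for the three types of interest; extend as needed.
MIME_TO_EXT = {
    "application/pdf": ".pdf",
    "application/msword": ".doc",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx",
}


def detect_mime(path: str) -> str:
    """Return the MIME type of the file at `path` (assumes python-magic)."""
    import magic  # imported lazily so the rename logic works without it

    return magic.from_file(path, mime=True)


def rename_with_extension(path: str, mime: Optional[str] = None) -> str:
    """Rename `path` to `path + ext` based on its MIME type.

    Returns the new path, or the unchanged path if the type is unknown.
    """
    if mime is None:
        mime = detect_mime(path)
    ext = MIME_TO_EXT.get(mime)
    if ext is None:
        return path  # unknown type: leave the file as it is
    new_path = path + ext
    os.replace(path, new_path)  # 'xyz' -> 'xyz.pdf', for example
    return new_path
```

Typical usage would be `new_path = rename_with_extension(path)`; passing `mime` explicitly skips detection. An explicit table is used here rather than `mimetypes.guess_extension` because the standard `mimetypes` database may not know the DOCX MIME type on every system.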


Written by Rajesh Kanade
