Python magic to the rescue

Rajesh Kanade
May 23, 2021


I recently needed to determine the file type (pdf, docx, or doc) of a file downloaded from a URL. With the requests library, it is easy to download a file from a URL and save it to a temp file, say ‘xyz’. The challenge is then to determine the file type and rename the file from ‘xyz’ to ‘xyz.pdf’, ‘xyz.docx’, or ‘xyz.doc’.
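
For illustration, the download step might look roughly like the sketch below. The function names, the chunk size, and the 30-second timeout are my own assumptions, not part of the original setup; requests is imported inside the download function only so that the file-writing helper can be tried without it.

```python
import tempfile


def save_response_to_temp(response, chunk_size=8192):
    """Stream a requests-style response body into a temp file; return its path."""
    # delete=False keeps the file on disk after the handle is closed,
    # so it can be inspected and renamed later.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty chunks
                tmp.write(chunk)
        return tmp.name


def download_to_temp(url):
    """Download `url` with requests and return the path of the temp file."""
    import requests  # third-party: pip install requests

    # stream=True avoids loading the whole file into memory at once.
    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        return save_response_to_temp(response)
```

The temp file gets a random name with no extension, which is exactly the ‘xyz’ situation described above.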

One approach was to derive this information from the ‘content-disposition’ response header. For some reason, with the URL I used for testing, content-disposition was always None.
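
When the header is present, pulling the filename out of it could be sketched as follows. This uses only the standard library; the helper name is hypothetical, and it simply returns None when the header is missing, which is the case I ran into.

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition header, or None."""
    if not header_value:
        return None  # header absent, as with the URL used for testing
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() reads the header's `filename` parameter and unquotes it.
    return msg.get_filename()
```

With requests, the header value would typically come from `response.headers.get("content-disposition")`, so a missing header flows through as None.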

After looking around for alternatives, I came across the magic library, which returns the MIME type of a given file. You can use this MIME type to rename the downloaded temp file to one with the correct extension (pdf, doc, or docx).
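
A minimal sketch of that idea, assuming the python-magic package (which wraps libmagic) is what is installed as `magic`. The helper names and the MIME-to-extension table are my own; depending on the libmagic version, a docx file may be reported under a different MIME type, so treat the table as a starting point.

```python
import os

# Assumed mapping for the three file types discussed in this post.
EXTENSION_BY_MIME = {
    "application/pdf": ".pdf",
    "application/msword": ".doc",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx",
}


def detect_mime(path):
    """Return the MIME type of the file at `path` as reported by libmagic."""
    import magic  # third-party: pip install python-magic (needs libmagic)

    return magic.from_file(path, mime=True)


def rename_with_extension(path, mime_type):
    """Append the extension matching `mime_type`; return the resulting path."""
    extension = EXTENSION_BY_MIME.get(mime_type)
    if extension is None:
        return path  # unknown type: leave the file untouched
    new_path = path + extension
    os.rename(path, new_path)
    return new_path
```

Usage would then be `rename_with_extension(path, detect_mime(path))`, turning ‘xyz’ into ‘xyz.pdf’ when the content is a PDF.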

Rajesh Kanade, Founder, Grey Neurons Consulting. LinkedIn: https://www.linkedin.com/in/rkanade/
