pd3f

Aug 2020 preview pd3f

Going beyond PDF with pd3f.

pd3f is an open-source PDF text extraction pipeline that is self-hosted, local-first, and Docker-based. It reconstructs the original continuous text with the help of machine learning.
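To make "reconstructing continuous text" concrete, here is a minimal sketch of one sub-problem: merging extracted lines and rejoining words that were hyphenated at a line end. This is not pd3f's implementation. The function `join_lines` is a hypothetical helper, and it uses a simple rule-based heuristic as a stand-in for the machine learning that pd3f relies on.

```python
def join_lines(lines):
    """Merge extracted PDF lines into one continuous string.

    Hypothetical helper for illustration only; not part of pd3f.
    Uses a naive heuristic instead of a learned model.
    """
    text = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            # Skip blank lines produced by the extraction step.
            continue
        if text.endswith("-") and line[0].islower():
            # Trailing hyphen followed by a lowercase start: assume a word
            # was split across lines and glue the two halves together.
            text = text[:-1] + line
        elif text:
            text += " " + line
        else:
            text = line
    return text


if __name__ == "__main__":
    sample = ["The text was extrac-", "ted from a PDF", "", "file."]
    print(join_lines(sample))
```

A rule like this wrongly merges genuine compounds that happen to break at the hyphen (for example "self-" followed by "hosted" becomes "selfhosted"). Cases like that are ones a fixed rule cannot resolve, which is where a statistical model can help.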

Visit the main website for more information: pd3f.com

The work was funded by the German Federal Ministry of Education and Research as part of the Prototype Fund.