
LizardKMD
🙏 2 karma
2024-08-06
PDF to Markdown
2 votes
1 answer
What’s the best doc parser you’ve used so far for extracting PDF text to Markdown? PDFs are tricky for LLMs: they tend to hallucinate and produce incorrect results when extracting table data.
How do you guys solve it?
Lincoln
Oct 21, 2024
This looks like a direct solution: https://github.com/getomni-ai/zerox

Other than that, for OCR in general, Claude is the best. It follows parsing instructions much better than the others. For PDF processing specifically, Gemini through AI Studio processes the visual content of the PDF directly, not just the extracted text. To use Claude on a PDF with tables, you would first have to convert the PDF to images.
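For the "convert the PDF to images first" route, here is a rough sketch of what the pipeline could look like. It is not how zerox does it internally, just an illustration. The assumptions: PyMuPDF (`fitz`) is installed for rendering, the image dict follows the base64 image content-block shape used by Claude's Messages API, and `ask_model` is a hypothetical callable you write yourself to wrap whichever vision model you use. The names `render_pages`, `image_block`, `pages_to_markdown`, and the prompt text are all made up for this example.

```python
import base64
from typing import Callable, Iterable, List

# Hypothetical prompt; tune it for your documents.
PROMPT = (
    "Convert this page to Markdown. Reproduce tables as Markdown tables. "
    "Do not add anything that is not on the page."
)


def render_pages(pdf_path: str, dpi: int = 150) -> List[bytes]:
    """Render every page of the PDF to PNG bytes (assumes PyMuPDF is installed)."""
    import fitz  # PyMuPDF, imported lazily so the rest works without it

    with fitz.open(pdf_path) as doc:
        return [page.get_pixmap(dpi=dpi).tobytes("png") for page in doc]


def image_block(png: bytes) -> dict:
    """Wrap PNG bytes as a base64 image content block."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": "image/png",
            "data": base64.b64encode(png).decode("ascii"),
        },
    }


def pages_to_markdown(
    pages: Iterable[bytes],
    ask_model: Callable[[dict, str], str],
) -> str:
    """Send one page image at a time to the model and join the Markdown results."""
    parts = [ask_model(image_block(png), PROMPT).strip() for png in pages]
    return "\n\n".join(parts)
```

Usage would be something like `pages_to_markdown(render_pages("report.pdf"), ask_model)`, where `ask_model` makes the actual API call. Going page by page keeps each request small and makes it easier to spot which page a bad table came from.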