PDFMiner Python - Search News

SciDOCX: Scientific Document Conversion and MM-RAG Pipeline

PDF
├── DeepSeek-OCR
│   ├── Markdown → Pandoc → DOCX (Document Conversion)
│   └── JSONL Elements → Qwen2-VL-2B (MM-RAG Enrichment)
└── Figures, Tables, Equations preserved throughout

Need a retrieval ...

GitHub

A powerful PDF extraction library for Node.js built on Mozilla's pdf.js.

The JavaScript/TypeScript alternative to Python's pdfplumber - extract text, tables, graphics, and visual elements from PDF files with precision. If you're coming from Python and looking for ...


