Get PDF Text Layer

This operation extracts the text layer from the PDF document provided.

POST /api/v1/Pdf/GetPdfTextLayer

Request Parameters

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| Filename | string | Yes | - | Filename of the source PDF file |
| FileContent | string | Yes | - | Base64-encoded PDF content |
| StartPage | integer | No | 1 | Page number from which to begin text extraction |
| EndPage | integer | No | Last page | Page number on which to end text extraction |
| Pages | string | No | - | Comma-separated list of pages or page ranges (e.g., “1,3,5-7”) |
| TextEncodingType | string | No | “UTF8” | Encoding type used for text extraction. Options: “UTF8”, “Latin1”, “BigEndianUnicode”, “UTF16”, “ASCII” |

Response

| Name | Type | Description |
|---|---|---|
| textLayer | string | The text layer extracted from the PDF document |
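
As an illustration of how the parameters and the response field fit together, here is a minimal Python sketch of a client call. Several things in it are assumptions that this page does not state: the host (`api.example.com` is a placeholder), that the request and response bodies are JSON, and that any authentication is passed through request headers. The helper names `build_payload` and `get_pdf_text_layer` are hypothetical.

```python
import base64
import json
import urllib.request

# Placeholder host: this page documents only the path, not the base URL.
BASE_URL = "https://api.example.com"
ENDPOINT = "/api/v1/Pdf/GetPdfTextLayer"


def build_payload(filename, pdf_bytes, start_page=None, end_page=None,
                  pages=None, text_encoding_type=None):
    """Build the request body; unset optional fields are omitted so defaults apply."""
    payload = {
        "Filename": filename,
        # FileContent must be the Base64-encoded PDF content.
        "FileContent": base64.b64encode(pdf_bytes).decode("ascii"),
    }
    optional = {
        "StartPage": start_page,
        "EndPage": end_page,
        "Pages": pages,
        "TextEncodingType": text_encoding_type,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


def get_pdf_text_layer(payload, headers=None):
    """POST the payload as JSON (assumed) and return the textLayer field."""
    request = urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        # Authentication headers, if required, are not described on this page.
        headers={"Content-Type": "application/json", **(headers or {})},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["textLayer"]
```

A call might look like `get_pdf_text_layer(build_payload("report.pdf", data, pages="1,3,5-7"))`, where `data` holds the raw bytes of the PDF file.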

The operation extracts text content from the specified pages of a PDF document. It processes the document and identifies the text layer, which contains machine-readable text.

  • If both a page range (StartPage/EndPage) and a Pages list are provided, the two selections are combined
  • The operation handles PDF files with embedded text layers
  • For scanned documents without a text layer, OCR processing is required (not part of this operation)
  • The TextEncodingType parameter selects the character encoding used for the extracted text
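
To make the Pages format concrete, the sketch below expands a string such as “1,3,5-7” into individual page numbers and merges it with a StartPage/EndPage range. It is only a client-side illustration: the function names are hypothetical, and treating "combined" as a set union is an assumption, since this page does not specify how the service merges the two selections.

```python
def parse_pages(pages):
    """Expand a Pages string such as "1,3,5-7" into a sorted list of page numbers."""
    selected = set()
    for part in pages.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(n) for n in part.split("-", 1))
            if first > last:
                raise ValueError(f"invalid page range: {part!r}")
            selected.update(range(first, last + 1))
        else:
            selected.add(int(part))
    return sorted(selected)


def combine_selection(start_page, end_page, pages):
    """Union of StartPage..EndPage and the Pages list (assumed semantics)."""
    selected = set(range(start_page, end_page + 1))
    selected.update(parse_pages(pages))
    return sorted(selected)


print(parse_pages("1,3,5-7"))            # [1, 3, 5, 6, 7]
print(combine_selection(2, 3, "1,5-7"))  # [1, 2, 3, 5, 6, 7]
```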

Credit Cost

Cost: 1 credit per 5 pages

Note: Cost depends on the number of pages in the document
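
As a rough worked example of the pricing rule, the sketch below estimates the credits for a document of a given length. It assumes that a partial block of 5 pages is rounded up to a full credit; this page does not state the rounding rule, so treat the result as an estimate. The function name is hypothetical.

```python
import math

PAGES_PER_CREDIT = 5  # from "1 credit per 5 pages"


def estimate_credits(page_count):
    """Estimate credits for a document, assuming partial blocks round up."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    return math.ceil(page_count / PAGES_PER_CREDIT)


print(estimate_credits(5))   # 1
print(estimate_credits(12))  # 3 under the round-up assumption
```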