Namespace pdftron.PDF

Classes

Action

Sets the Action that will be triggered when the document is opened.

ActionParameter

Container for parameters used in handling various actions

AdvancedImagingConvertOptions

AdvancedImagingModule

Static interface to the PDFTron SDK's AdvancedImaging functionality.

Annot

Adds an annotation at the specified location in a page's annotation array.

Annot.BorderStyle

BorderStyle structure specifies the characteristics of the annotation's border. The border is specified as a rounded rectangle.

BarcodeModule

Static interface to the Apryse SDK's barcode extraction functionality.

BarcodeOptions

Bookmark

Adds/links the specified Bookmark to the root level of document's outline tree.

CADConvertOptions

CADModule

Static interface to the PDFTron SDK's CAD functionality.

CMSSignatureOptions

Optional data for CMS creation.

CancelRequestRenderThread

CharData

CharData is a data structure returned by CharIterator that is used to provide extra information about a character within a text run. The extra information includes positioning information, the character data and a number of bytes taken by the character.

CharIterator

CharIterator is an iterator type that can be used to traverse CharData in the current e_text element. For a sample use case, see the ElementReaderAdv sample project.

ColorPt

ColorPt is an array of colorants (or tint values) representing a color point in an associated color space.

ColorSpace

This abstract class serves as a color space tag that identifies the specific color space of a Color object. It contains methods that transform colors in a specific color space to and from several other color spaces, such as DeviceRGB and DeviceCMYK.

For purposes of the methods in this class, colors are represented as arrays of color components represented as doubles in a normalized range defined by each ColorSpace. For many ColorSpaces (e.g. DeviceRGB), this range is 0.0 to 1.0. However, some ColorSpaces have components whose values have a different range. Methods are provided to inquire per component minimum and maximum normalized values.
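As a rough illustration of per-component ranges, the sketch below clamps each color component to its own [min, max] interval. It is standalone C++ and does not call the PDFNet API; `ComponentRange` and `ClampToRanges` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: the normalized range of one color component.
struct ComponentRange {
    double min;
    double max;
};

// Clamps each component to its own range. The two vectors must have the
// same length: one range per component.
std::vector<double> ClampToRanges(const std::vector<double>& components,
                                  const std::vector<ComponentRange>& ranges) {
    assert(components.size() == ranges.size());
    std::vector<double> out(components.size());
    for (std::size_t i = 0; i < components.size(); ++i) {
        double v = components[i];
        if (v < ranges[i].min) v = ranges[i].min;
        if (v > ranges[i].max) v = ranges[i].max;
        out[i] = v;
    }
    return out;
}
```

For a DeviceRGB color every range would be {0.0, 1.0}. For a color space whose components use other ranges (for example a Lab space with L* in 0 to 100), the caller would supply those ranges instead.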

ContentReplacer

ContentReplacer is a utility class for replacing content (text and images) in existing PDF (template) documents.

Users can replace content in a PDF page using the following operations:

  • Replace an image that exists in a target rectangle with a replacement image.
  • Replace text that exists in a target rectangle with replacement text.
  • Replace all instances of a specially marked string with a replacement string.

The following code replaces an image in a target region. It also replaces the text "[NAME]" and "[JOB_TITLE]" with "John Smith" and "Software Developer" respectively. Notice the square brackets ('[' and ']') around the target strings in the original PDFDoc. These square brackets are not included in the actual function calls below, as they are added implicitly.

PDFDoc doc("../../TestFiles/BusinessCardTemplate.pdf");
doc.InitSecurityHandler();
ContentReplacer replacer;
Page page = doc.GetPage(1);
Image img = Image::Create(doc, "../../TestFiles/peppers.jpg");
// Replace the image found in the page's media box
replacer.AddImage(page.GetMediaBox(), img.GetSDFObj());
// Target strings are given without their surrounding brackets
replacer.AddString("NAME", "John Smith");
replacer.AddString("JOB_TITLE", "Software Developer");
replacer.Process(page);

ConversionOptions

Convert

Converter is a utility class used to convert documents and files to PDF. Conversion of XPS, EMF and image files to PDF documents is performed internally. Other document formats are converted via native application and printing.

using namespace pdftron;
using namespace PDF;
PDFDoc pdfdoc;

// Convert each input file and append the converted pages to pdfdoc
Convert::FromXps(pdfdoc, input_path + "simple-xps.xps");
Convert::FromEmf(pdfdoc, input_path + "simple-emf.emf");
Convert::ToPdf(pdfdoc, input_path + "test docx file.docx");

// Save the PDF document
UString outputFile = output_path + "ConverterTest.pdf";
pdfdoc.Save(outputFile, 0);
pdfdoc.Close();

The PDFTron PDFNet printer needs to be installed to convert document formats. On Windows, installing printer drivers requires administrator (UAC) privileges. The printer is a virtual XPS printer supported on Vista and Windows 7, and on Windows XP with the XPS Essentials Pack.

To install the printer, the process must be running as administrator. Execute:

Convert::Printer::Install();

Installation can take a few seconds, so it is recommended that you install the printer once as part of your deployment process. Repeated installations will be quick, since the presence of the printer is checked before installation is attempted.

There is no need to uninstall the printer after conversions; it can be left installed for later access. To uninstall the printer, the process must be running as administrator. Execute:

Convert::Printer::Uninstall();

Convert.EPUBOutputOptions

A class containing options for ToEPUB functions

Convert.ExcelOutputOptions

A class containing options common to ToExcel functions

Convert.HTMLOutputOptions

A class containing options for ToHTML and ToEPUB functions

Convert.OutputOptionsOCR

A class containing OCR options common to the ToHtml, ToWord, ToExcel, ToPowerPoint functions

Convert.PowerPointOutputOptions

A class containing options common to ToPowerPoint functions

Convert.Printer

Convert::Printer is a utility class used to install a printer for print-based conversion of documents with Convert::ToPdf.

Convert.SVGOutputOptions

A class containing options for ToSvg functions

Convert.StructuredOutputOptions

A class containing StructuredOutput options

Convert.TiffOutputOptions

A class containing options for ToTiff functions

Convert.WPFConverterOptions

pdftron.PDF.Convert can be used to convert .xaml files into PDF documents with control over headers and footers, main body placement, and column widths. Pagination is controlled by specifying the page size and body size (margins), and all pages are appended to a PDFDoc which can then be further manipulated using the PDFNet API.

Three types of XAML objects are convertible using pdftron.PDF.Convert:

  • FlowDocuments, which describe content that is reflowable from page to page
  • FixedDocuments, which describe content that has been placed on a fixed page
  • Blocks such as Canvas, RichTextBox, Section, etc., which can be wrapped or inserted directly into a FlowDocument

Limitations: There are many XAML classes that cannot be added to FlowDocuments or FixedDocuments, and therefore this sample converter cannot convert them. Examples include Page, Window or Frame objects.

Convert.WordOutputOptions

A class containing options common to ToWord functions

Convert.XODOutputOptions

A class containing options for ToXod functions

Convert.XPSOutputCommonOptions

A class containing options common to ToXps and ToXod functions

Convert.XPSOutputOptions

A class containing options for ToXps functions

CubicCurveBuilder

Creates Cubic Curves from linear points

DataExtractionModule

Static interface to the PDFTron SDK's data extraction functionality.

DataExtractionOptions

Date

Set document's creation date.

Destination

A utility method used to set the first page displayed after the document is opened. This method is equivalent to PDFDoc::SetOpenAction(goto_action).

If OpenAction is not specified the document should be opened to the top of the first page at the default magnification factor.

DiffOptions

DigitalSignatureField

A class representing a digital signature form field.

DigitalSignatureFieldIterator

DigitalSignatureFieldIterator is an iterator type that can be used to traverse a list of digital signature form fields in a PDF document.

DisallowedChange

Data pertaining to a change detected in a document during a digital signature modification permissions verification step, the change being both made after the signature was signed, and disallowed by the signature's permissions settings.

DocumentConversion

Encapsulates the conversion of a single document from one format to another.

Element

Element is the abstract interface used to access graphical elements used to build the display list.

Just like many other classes in PDFNet (e.g. ColorSpace, Font, Annot, etc), Element class follows the composite design pattern. This means that all Elements are accessed through the same interface, but depending on the Element type (that can be obtained using GetType()), only methods related to that type can be called. For example, if GetType() returns e_image, it is illegal to call a method specific to another Element type (i.e. a call to a text specific GetTextData() will throw an Exception).

ElementBuilder

ElementBuilder is used to build new PDF.Elements (e.g. image, text, path, etc) from scratch. In conjunction with ElementWriter, ElementBuilder can be used to create new page content.

ElementReader

ElementReader can be used to parse and process content streams. It provides a convenient interface for traversing the Element display list of a page. The display list representing graphical elements (such as text runs, paths, images, shadings, forms, etc.) is accessed using the intrinsic iterator. ElementReader automatically concatenates page contents spanning multiple streams and provides a mechanism to parse the contents of sub-display lists (e.g. Form XObjects and Type3 fonts).

ElementWriter

ElementWriter can be used to assemble and write new content to a page, Form XObject, Type3 Glyph stream, pattern stream, or any other content stream.

EmbeddedTimestampVerificationResult

This class represents the result of verifying a secure embedded timestamp digital signature.

Field

Flatten/Merge existing form field appearances with the page content and remove widget annotation.

Form 'flattening' refers to the operation that changes active form fields into a static area that is part of the PDF document, just like the other text and images in the document. A completely flattened PDF form does not have any widget annotations or interactive fields.

FieldIterator

FieldIterator is an iterator type that can be used to traverse a list of form fields in a PDF document. For more information, please see PDFDoc.getFieldIterator().

FileSpec

Associates a file attachment with the document.

The file attachment will be displayed in the user interface of a viewer application (in Acrobat this is File Attachment tab). The function differs from Annot.CreateFileAttachment() because it associates the attachment with the whole document instead of an annotation on a specific page.

Flattener

Flattener is an optional PDFNet add-on utility class that can be used to simplify and optimize existing PDFs so that they render faster on devices with lower memory and speeds.

PDF documents can frequently contain very complex page descriptions (e.g. thousands of paths, different shadings, color spaces, blend modes, large images, etc.) that may not be suitable for interactive viewing on mobile devices. Flattener can be used to speed up PDF rendering on mobile devices and on the Web by simplifying page content (e.g. flattening complex graphics into images) while maintaining vector text whenever possible.

With the FlattenMode::e_simple option, each page in the PDF is reduced to a single background image, with the remaining text over top in vector format. Some text may still get flattened, in particular any text that is clipped by, or underneath, other content that will be flattened.

The FlattenMode::e_fast option, on the other hand, will not flatten simple content, such as simple straight lines, nor will it flatten Type3 fonts.

Font

A font that is used to draw text on a page. It corresponds to a Font Resource in a PDF file. More than one page may reference the same Font object. A Font has a number of attributes, including an array of widths, the character encoding, and the font's resource name.

A PDF document can contain several different types of fonts, and the Font class represents a single, flat interface around all PDF font types.

There are two main classes of fonts in PDF: simple and composite fonts.

Simple fonts are Type1, TrueType, and Type3 fonts. All simple fonts have the following properties:

  • Glyphs in the font are selected by single-byte character codes obtained from a string that is shown by the text-showing operators. Logically, these codes index into a table of 256 glyphs; the mapping from codes to glyphs is called the font's encoding. Each font program has a built-in encoding. Under some circumstances, the encoding can be altered by means described in Section 5.5.5 "Character Encoding" in PDF Reference Manual.
  • Each glyph has a single set of metrics. Therefore, simple fonts support only horizontal writing mode.

A composite font is one whose glyphs are obtained from a font-like object called a CIDFont (e.g. CIDFontType0 and CIDFontType2). A composite font is represented by a font dictionary whose Subtype value is Type0. The Type 0 font is known as the root font, while its associated CIDFont is called its descendant. CID-keyed fonts provide a convenient and efficient method for defining multiple-byte character encodings and fonts with a large number of glyphs. These capabilities provide great flexibility for representing text in writing systems for languages with large character sets, such as Chinese, Japanese, and Korean (CJK).
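The difference between single-byte codes in simple fonts and multiple-byte codes in composite fonts can be sketched as follows. This is a standalone illustration, not part of the PDFNet API: `SplitCharCodes` is a hypothetical helper, and it assumes a fixed code width read in big-endian order, whereas a real CMap may define codes of varying length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Splits the raw bytes of a shown string into character codes.
// bytes_per_code == 1 models a simple font (codes index a 256-entry
// encoding); bytes_per_code == 2 models a composite font with a fixed
// two-byte encoding. Trailing bytes that do not form a full code are ignored.
std::vector<std::uint32_t> SplitCharCodes(const std::string& bytes,
                                          std::size_t bytes_per_code) {
    assert(bytes_per_code >= 1 && bytes_per_code <= 4);
    std::vector<std::uint32_t> codes;
    for (std::size_t i = 0; i + bytes_per_code <= bytes.size();
         i += bytes_per_code) {
        std::uint32_t code = 0;
        for (std::size_t k = 0; k < bytes_per_code; ++k) {
            // Accumulate bytes most-significant first.
            code = (code << 8) | static_cast<unsigned char>(bytes[i + k]);
        }
        codes.push_back(code);
    }
    return codes;
}
```

With a width of 1, the two bytes of "AB" produce two codes (0x41 and 0x42). With a width of 2, the same two bytes produce the single code 0x4142, which is why a composite font can address far more than 256 glyphs.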

FontCharCodeIterator

FontCharCodeIterator is an iterator type that can be used to traverse a list of visible char codes in a font embedded in PDF. For more information, please take a look at Font.getCodeIterator().

Function

Although PDF is not a programming language, it provides several types of function objects that represent parameterized classes of functions, including mathematical formulas and sampled representations with arbitrary resolution. Functions are used in various ways in PDF, including device-dependent rasterization information for high-quality printing (halftone spot functions and transfer functions), color transform functions for certain color spaces, and specification of colors as a function of position for smooth shadings. Functions in PDF represent static, self-contained numerical transformations.

PDF::Function represents a single, flat interface around all PDF function types.
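To make "static, self-contained numerical transformation" concrete, the sketch below evaluates one kind of PDF function, the exponential interpolation function (Type 2 in the PDF specification), which computes y_j = C0_j + x^N * (C1_j - C0_j). It is a standalone illustration that does not use PDF::Function; `EvalExponential` is a hypothetical name, and the input domain is assumed to be [0, 1].

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Evaluates a one-input exponential interpolation function:
//   y_j = c0_j + x^n * (c1_j - c0_j)
// The input x is clamped to the assumed domain [0, 1] before evaluation.
std::vector<double> EvalExponential(const std::vector<double>& c0,
                                    const std::vector<double>& c1,
                                    double n, double x) {
    assert(c0.size() == c1.size());
    if (x < 0.0) x = 0.0;
    if (x > 1.0) x = 1.0;
    const double t = std::pow(x, n);
    std::vector<double> out(c0.size());
    for (std::size_t j = 0; j < c0.size(); ++j) {
        out[j] = c0[j] + t * (c1[j] - c0[j]);
    }
    return out;
}
```

With c0 = {1, 0, 0} (red), c1 = {0, 0, 1} (blue) and n = 1, an input of 0.5 gives the midpoint color {0.5, 0, 0.5}, which is how such a function can describe the color ramp of a smooth shading.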

GSChangesIterator

GSChangesIterator is an iterator type that can be used to traverse a list of changes in the graphics state between subsequent graphical elements on the page. For a sample use case, see the ElementReaderAdv sample project.

GState

GState is a class that keeps track of a number of style attributes used to visually define graphical Elements. Each PDF::Element has an associated GState that can be used to query or set various graphics properties.

GeometryCollection

A Preprocessed PDF geometry collection

HTML2PDF

'pdftron.PDF.HTML2PDF' is an optional PDFNet Add-On utility class that can be used to convert HTML web pages into PDF documents by using an external module (html2pdf).

The html2pdf modules can be downloaded from http://www.pdftron.com/pdfnet/downloads.html.

Users can convert HTML pages to PDF using the following operations:

  • Simple one line static method to convert a single web page to PDF.
  • Convert HTML pages from URL or string, plus optional table of contents, in user defined order.
  • Optionally configure settings for proxy, images, JavaScript, and more for each HTML page.
  • Optionally configure the PDF output, including page size, margins, orientation, and more.
  • Optionally add table of contents, including setting the depth and appearance.
The following code converts a single web page to PDF:
using System;
using System.IO;
using pdftron;
using pdftron.Common;
using pdftron.SDF;
using pdftron.PDF;

using (PDFDoc doc = new PDFDoc())
{
	if ( HTML2PDF.Convert(doc, "http://www.gutenberg.org/wiki/Main_Page") )
		doc.Save(outputFile, SDFDoc.SaveOptions.e_linearized);
}

The following code demonstrates how to convert multiple web pages into one PDF, excluding the background and with lowered image quality to save space.

using System;
using System.IO;
using pdftron;
using pdftron.Common;
using pdftron.SDF;
using pdftron.PDF;

using (PDFDoc doc = new PDFDoc())
{
	HTML2PDF converter = new HTML2PDF();
	converter.SetImageQuality(25);

	HTML2PDF.WebPageSettings settings = new HTML2PDF.WebPageSettings();
	settings.SetPrintBackground(false);

	// Queue both pages, then convert them into the same document
	converter.InsertFromURL("http://www.gutenberg.org/wiki/Main_Page", settings);
	converter.InsertFromURL("http://en.wikipedia.org/wiki/Canada", settings);

	if ( converter.Convert(doc) )
		doc.Save(outputFile, SDFDoc.SaveOptions.e_linearized);
}

HTML2PDF.Proxy

Proxy settings to be used when loading content from web pages.

HTML2PDF.TOCSettings

Settings for table of contents.

HTML2PDF.WebPageSettings

Settings that control how a web page is opened and converted to PDF.

HTTPRequestOptions

Class for customizing network requests.

Highlights

Get a Highlights object based on an array of character ranges

Image

Image class provides common methods for working with PDF images.

Image2RGB

Image2RGB is a filter that can decompress and normalize any PDF image stream (e.g. monochrome, CMYK, etc) into a raw RGB pixel stream.

KeyStrokeActionResult

A class that contains information from a KeyStrokeAction.

KeyStrokeEventData

KeyStrokeEventData contains information for executing KeyStrokeAction

MergeXFDFOptions

OCRModule

Static interface to the PDFTron SDK's OCR functionality.

OCROptions

OfficeToPDFOptions

Optimizer

The Optimizer class provides functionality for optimizing/shrinking output PDF files.

'pdftron.PDF.Optimizer' is an optional PDFNet Add-On utility class that can be used to optimize PDF documents by reducing the file size, removing redundant information, and compressing data streams using the latest in image compression technology. PDF Optimizer can compress and shrink PDF file size with the following operations:

  • Remove duplicated fonts, images, ICC profiles, and any other data stream.
  • Optionally convert high-quality or print-ready PDF files to small, efficient and web-ready PDF.
  • Optionally down-sample large images to a given resolution.
  • Optionally compress or recompress PDF images using JBIG2 and JPEG2000 compression formats.
  • Compress uncompressed streams and remove unused PDF objects.

Optimizer.ImageSettings

A class that stores downsampling/recompression settings for color and grayscale images.

Optimizer.MonoImageSettings

A class that stores image downsampling/recompression settings for monochrome images.

Optimizer.OptimizerSettings

A class that stores settings for the optimizer

Optimizer.TextSettings

A class that stores text optimization settings.

OptionsBase

PDF2HtmlReflowParagraphsModule

Static interface to the PDFTron SDK's PDF to HTML functionality.

PDF2WordModule

Static interface to the PDFTron SDK's PDF to Word functionality.

PDFDC

PDFDC is a utility class used to represent a PDF Device Context (DC). Windows developers can use standard GDI or GDI+ APIs to write on PDFDC and to generate PDF documents based on their existing drawing functions. PDFDC can also be used to implement file conversion from any printable file format to PDF.

The PDFDC class can be used in many ways to translate from GDI to PDF:

  • To translate a single GDI drawing into a single page PDF document.
  • To translate a single GDI drawing into an object which can be reused many times throughout a PDF document (i.e. as a Form XObject).
  • To translate many GDI drawings into single page or multipage PDF document. ...

Very few code changes are required to perform the translation from GDI to PDF, as PDFDC provides a GDI Device Context handle which can be passed to all GDI functions requiring an HDC. PDFDC does not use a "Virtual Printer" approach, so the translation should be of both high quality and speed. Unfortunately, this also means that StartDoc, EndDoc, StartPage and EndPage cannot be called with an HDC created with PDFDC::Begin. For more advanced translations or creations of PDF documents, such as security handling, the use of other PDFNet classes will be required. An example use of PDFDC can be found in PDFDCTest.cpp:

// Start with a PDFDoc to put the picture into, and a PDFDC to translate GDI to PDF
PDFDoc pdfdoc;
PDFDC pdfDc;
// Create a page to put the GDI content onto
Page page = pdfdoc.PageCreate();
// Begin the translation from GDI to PDF.
// Provide the page to place the picture onto, and the bounding box for the content.
// We're going to scale the GDI content to fill the page while preserving the aspect
// ratio.
// Get back a GDI Device Context
HDC hDC = pdfDc.Begin( page, page.GetCropBox() );
// ... perform GDI drawing ...
// Complete the translation
pdfDc.End();
// Add the page to the document
pdfdoc.PagePushBack(page);
// Save the PDF document
pdfdoc.Save("PDFDC_is_cool.pdf", SDF::SDFDoc::e_remove_unused, NULL);

PDFDCEX

PDFDCEX is a utility class used to represent a PDF Device Context (DC). Windows developers can use standard GDI or GDI+ APIs to write on PDFDCEX and to generate PDF documents based on their existing drawing functions. PDFDCEX can also be used to implement file conversion from any printable file format to PDF. PDFDCEX class can be used in many ways to translate from GDI to PDF:

  • To translate a single GDI drawing into a single page PDF document.
  • To translate a single GDI drawing into an object which can be reused many times throughout a PDF document (i.e. as a Form XObject).
  • To translate many GDI drawings into single page or multipage PDF document. ...
Very few code changes are required to perform the translation from GDI to PDF, as PDFDCEX provides a GDI Device Context handle which can be passed to all GDI functions requiring an HDC. PDFDCEX does use a "Virtual Printer" approach, so the translation should be of both high quality and speed. For more advanced translations or creations of PDF documents, such as security handling, the use of other PDFNet classes will be required. An example use of PDFDCEX can be found in PDFDCTest.cpp:
// Start with a PDFDoc to put the picture into, and a PDFDCEX to translate GDI to PDF
PDFDoc pdfdoc;
PDFDCEX pdfdcex;
// Begin the translation from GDI to PDF, provide the PDFDoc to append the translated
// GDI drawing to and get back a GDI Device Context
HDC hDC = pdfdcex.Begin(pdfdoc);
::StartPage(hDC);
// ... perform GDI drawing ...
::EndPage(hDC);
// Complete the translation
pdfdcex.EndDoc();
// Save the PDF document
pdfdoc.Save("PDFDCEX_is_cool.pdf", SDF.SDFDoc.SaveOptions.e_remove_unused, NULL);

PDFDoc

PDFDoc is a high-level class describing a single PDF (Portable Document Format) document. Most applications using PDFNet will use this class to open existing PDF documents, or to create new PDF documents from scratch.

The class offers a number of entry points into the document. For example,

  • To access pages use pdfdoc.GetPageIterator() or pdfdoc.PageFind(page_num).
  • To access form fields use pdfdoc.GetFieldIterator(), pdfdoc.GetFieldIterator(name) or pdfdoc.GetField(name).
  • To access document's meta-data use pdfdoc.GetDocInfo().
  • To access the outline tree use pdfdoc.GetFirstBookmark().
  • To access low-level Document Catalog use pdfdoc.GetRoot().
The class also offers utility methods to split and merge PDF pages, to create new pages, to flatten forms, to change security settings, etc.

PDFDocInfo

PDFDocInfo is a high-level utility class that can be used to read and modify document's metadata.

PDFDocViewPrefs

PDFDocViewPrefs is a high-level utility class that can be used to control the way the document is to be presented on the screen or in print.

PDFDocViewPrefs class corresponds to PageMode, PageLayout, and ViewerPreferences entries in the document's catalog. For more details please refer to section 8.1 'Viewer Preferences' in PDF Reference Manual.

PDFDraw

PDFDraw contains methods for converting PDF pages to images and to Bitmap objects. Utility methods are provided to export PDF pages to various raster formats as well as to convert pages to GDI+ bitmaps for further manipulation or drawing.

PDFNetInternalTools

Encapsulates the conversion of a single document from one format to another.

PDFRasterizer

PDFRasterizer is a low-level PDF rasterizer whose main purpose is to convert PDF pages to raster images (or bitmaps). If you need to convert a PDF page to an image format or a Bitmap, consider using PDF::PDFDraw. Similarly, if you are building an interactive PDF viewing application, use PDF::PDFViewCtrl instead.

PDFViewCtrl

PDFViewCtrl is a utility class that can be used for interactive rendering of PDF documents. In .NET environment PDFViewCtrl is derived from System.Windows.Forms.Control and it can be used like a regular form (see PDFViewForm.cs in PDFView sample for C# for a concrete example). PDFViewCtrl is a control that implements a number of tool modes, dialog boxes like find and password, has some built-in form filling capabilities and a navigation panel for bookmarks, thumbview and layer views.

PDFView defines several coordinate spaces and it is important to understand their differences:

  • Page Space refers to the space in which a PDF page is defined. It is determined by a page itself and the origin is at the lower-left corner of the page. Note that Page Space is independent of how a page is viewed in PDFView and each page has its own Page space.

  • Canvas Space refers to the tightest axis-aligned bounding box of all the pages given the current page presentation mode in PDFView. For example, if the page presentation mode is e_single_continuous, all the pages are arranged vertically with one page in each row, and therefore the Canvas Space is a rectangle with a possibly large height value. Canvas Space is also, like Page Space, independent of the zoom factor. Also note that since PDFView adds gaps between adjacent pages, the Canvas Space is larger than the space occupied by all the pages. The origin of the Canvas Space is located at the upper-left corner.

  • Screen Space (or Client Space) is the space occupied by PDFView and its origin is at the upper-left corner. Note that the virtual size of this space can extend beyond the visible region.

  • Scrollable Space is the virtual space within which PDFView can scroll. It is determined by the Canvas Space and the current zoom factor. Roughly speaking, the dimensions of the Scrollable Space are the dimensions of the Canvas Space multiplied by the zoom. Therefore, a large zoom factor will result in a larger Scrollable region given the same Canvas region. For this reason, Scrollable Space might also be referred to as Zoomed Canvas Space. Note that since PDFView adds gaps between pages in Canvas Space and these gaps are not scaled when rendered, the scrollable range is not exactly the zoom factor times the Canvas range. For functions such as SetHScrollPos(), SetVScrollPos(), GetCanvasHeight(), and GetCanvasWidth(), it is the Scrollable Space that is involved.
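The relationship between Canvas Space and Scrollable Space can be sketched with a simplified model. The snippet below is not PDFNet code: it assumes a single-column continuous layout with one fixed gap between adjacent pages, and the function names `CanvasHeight` and `ScrollableHeight` are hypothetical. It only illustrates why unscaled gaps make the scrollable range differ from zoom times the canvas range.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Simplified model (an assumption for illustration, not PDFNet's layout
// code): pages are stacked vertically with a fixed gap between neighbours.

// Canvas height: unscaled page heights plus the gaps between pages.
double CanvasHeight(const std::vector<double>& page_heights, double gap) {
    assert(!page_heights.empty() && gap >= 0.0);
    const double pages =
        std::accumulate(page_heights.begin(), page_heights.end(), 0.0);
    return pages + gap * static_cast<double>(page_heights.size() - 1);
}

// Scrollable height: only the pages are scaled by the zoom factor; the gaps
// keep their size, so the result is not simply zoom * CanvasHeight().
double ScrollableHeight(const std::vector<double>& page_heights, double gap,
                        double zoom) {
    assert(!page_heights.empty() && gap >= 0.0 && zoom > 0.0);
    const double pages =
        std::accumulate(page_heights.begin(), page_heights.end(), 0.0);
    return zoom * pages + gap * static_cast<double>(page_heights.size() - 1);
}
```

Under these assumptions, two pages of height 792 with a gap of 10 give a canvas height of 1594; at zoom 2 the scrollable height is 3178 rather than 2 × 1594 = 3188.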

PDFViewCtrl.LinkInfo

LinkInfo is a utility class that retains link information when used with GetLinkAt().

PDFViewCtrl.Selection

Selection is a utility class that allows access to PDFViewCtrl's current selection.

PDFViewWPF

PDFViewWPF.ActionEventArgs

Class for Transporting the OnAction event

PDFViewWPF.LinkInfo

LinkInfo is a utility class that retains link information when used with GetLinkAt().

PDFViewWPF.Selection

Selection is a utility class that allows access to PDFViewCtrl's current selection.

Page

Page is a high-level class representing a PDF page object (see 'Page Objects' in Section 3.6.2, 'Page Tree,' in PDF Reference Manual).

Among other associated objects, a page object contains:

  • A series of objects representing the objects drawn on the page (See Element and ElementReader class for examples of how to extract page content).
  • A list of resources used in drawing the page
  • Annotations
  • Beads, private metadata, optional thumbnail image, etc.

PageIterator

PageIterator is an iterator type that can be used to traverse a list of pages in a PDF document. For more information, please see PDFDoc::GetPageIterator().

PageLabel

PDF page labels can be used to describe a page. This is used to allow for non-sequential page numbering or the addition of arbitrary labels for a page (such as the inclusion of Roman numerals at the beginning of a book). PDFNet PageLabel object can be used to specify the numbering style to use (for example, upper- or lower-case Roman, decimal, and so forth), the starting number for the first page, and an arbitrary prefix to be prepended to each number (for example, "A-" to generate "A-1", "A-2", "A-3", and so forth).

PageLabel corresponds to the PDF Page Label object (Section 8.3.1, 'Page Labels' in the PDF Reference Manual).

Each page in a PDF document is identified by an integer page index that expresses the page's relative position within the document. In addition, a document may optionally define page labels to identify each page visually on the screen or in print. Page labels and page indices need not coincide: the indices are fixed, running consecutively through the document starting from 1 for the first page, but the labels can be specified in any way that is appropriate for the particular document. For example, if the document begins with 12 pages of front matter numbered in roman numerals and the remainder of the document is numbered in Arabic, the first page would have a page index of 1 and a page label of i, the twelfth page would have index 12 and label xii, and the thirteenth page would have index 13 and label 1.

For purposes of page labeling, a document can be divided into labeling ranges, each of which is a series of consecutive pages using the same numbering system. Pages within a range are numbered sequentially in ascending order. A page's label consists of a numeric portion based on its position within its labeling range, optionally preceded by a label prefix denoting the range itself. For example, the pages in an appendix might be labeled with decimal numeric portions prefixed with the string "A-" and the resulting page labels would be "A-1", "A-2", and so on.

There is no default numbering style; if no 'S' (Style) entry is present, page labels consist solely of a label prefix with no numeric portion. For example, if the 'P' entry (Prefix) specifies the label prefix "Appendix", each page is simply labeled "Appendix" with no page number. If the 'P' entry is also missing or empty, the page label is an empty string.
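The labeling rules above can be sketched as a small standalone routine. This is an illustrative model only, not the PDFNet implementation: `Style`, `LabelRange`, `ToRoman`, `ToAlpha` and `LabelFor` are hypothetical names, and the alphabetic scheme (a to z, then aa to zz, and so on) follows the convention described in the PDF Reference.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// kNone models a missing 'S' entry: the label is the prefix alone.
enum class Style { kNone, kDecimal, kRomanLower, kRomanUpper, kAlphaLower, kAlphaUpper };

// One labeling range; it runs from first_page until the next range begins.
struct LabelRange {
    int first_page;      // 1-based page index where the range starts
    Style style;         // numbering style of the numeric portion
    std::string prefix;  // label prefix, may be empty
    int start;           // numeric value assigned to first_page
};

std::string ToRoman(int n, bool upper) {
    assert(n >= 1);
    static const int kValues[] = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    static const char* const kLower[] = {"m", "cm", "d", "cd", "c", "xc", "l",
                                         "xl", "x", "ix", "v", "iv", "i"};
    static const char* const kUpper[] = {"M", "CM", "D", "CD", "C", "XC", "L",
                                         "XL", "X", "IX", "V", "IV", "I"};
    std::string out;
    for (std::size_t i = 0; i < 13; ++i) {
        while (n >= kValues[i]) {
            out += upper ? kUpper[i] : kLower[i];
            n -= kValues[i];
        }
    }
    return out;
}

// a..z for 1..26, aa..zz for 27..52, and so on.
std::string ToAlpha(int n, bool upper) {
    assert(n >= 1);
    const char letter = static_cast<char>((upper ? 'A' : 'a') + (n - 1) % 26);
    return std::string(static_cast<std::size_t>((n - 1) / 26 + 1), letter);
}

// Returns the label of a 1-based page index; ranges must be sorted by first_page.
std::string LabelFor(const std::vector<LabelRange>& ranges, int page) {
    const LabelRange* match = nullptr;
    for (const LabelRange& r : ranges) {
        if (r.first_page <= page) match = &r;  // last range starting at or before page
    }
    if (match == nullptr) return "";  // no labeling range covers this page
    const int n = match->start + (page - match->first_page);
    switch (match->style) {
        case Style::kDecimal:    return match->prefix + std::to_string(n);
        case Style::kRomanLower: return match->prefix + ToRoman(n, false);
        case Style::kRomanUpper: return match->prefix + ToRoman(n, true);
        case Style::kAlphaLower: return match->prefix + ToAlpha(n, false);
        case Style::kAlphaUpper: return match->prefix + ToAlpha(n, true);
        case Style::kNone:       break;
    }
    return match->prefix;  // no style: prefix only, no numeric portion
}
```

With a lowercase Roman range starting at page 1 and a decimal range starting at page 13, this model labels page 12 as "xii" and page 13 as "1", matching the front-matter example above.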

Sample code (see the PageLabelsTest sample project for examples):

Create a page labeling scheme that starts with the first page in the document (page 1) and is using uppercase roman numbering style.

doc.SetPageLabel(1, PageLabel::Create(doc, PageLabel::e_roman_uppercase, "My Prefix ", 1));

Create a page labeling scheme that starts with the fourth page in the document and is using decimal arabic numbering style. Also the numeric portion of the first label should start with number 4 (otherwise the first label would be "My Prefix 1").

PageLabel L2 = PageLabel::Create(doc, PageLabel::e_decimal, "My Prefix ", 4);
doc.SetPageLabel(4, L2);

Create a page labeling scheme that starts with the seventh page in the document and is using alphabetic numbering style. The numeric portion of the first label should start with number 1.

PageLabel L3 = PageLabel::Create(doc, PageLabel::e_alphabetic_uppercase, "My Prefix ", 1);
doc.SetPageLabel(7, L3);

Read page labels from an existing PDF document.

PageLabel label = new PageLabel();
for (int i = 1; i <= doc.GetPageCount(); ++i) {
    label = doc.GetPageLabel(i);
    if (label.IsValid()) {
        string title = label.GetLabelTitle(i);
    }
}

PageSet

PageSet is a container of page numbers ordered following a linear sequence. The page numbers are integers and must be greater than zero. Duplicates are allowed.

PathData

Contains the information required to draw the path. Contains an array of PathSegmentType Operators and corresponding path data Points. A point may be on or off (off points are control points). The meaning of a point depends on associated id (or segment type) in the path segment type array.

PatternColor

Patterns are quite general, and have many uses; for example, they can be used to create various graphical textures, such as weaves, brick walls, sunbursts, and similar geometrical and chromatic effects. Patterns are specified in a special family of color spaces named Pattern, whose 'color values' are PatternColor objects instead of the numeric component values used with other spaces. Therefore PatternColor is to the pattern color space what ColorPt is to all other color spaces.

A tiling pattern consists of a small graphical figure called a pattern cell. Painting with the pattern replicates the cell at fixed horizontal and vertical intervals to fill an area. The effect is as if the figure were painted on the surface of a clear glass tile, identical copies of which were then laid down in an array covering the area and trimmed to its boundaries. This is called tiling the area.

The pattern cell can include graphical elements such as filled areas, text, and sampled images. Its shape need not be rectangular, and the spacing of tiles can differ from the dimensions of the cell itself.

The order in which individual tiles (instances of the cell) are painted is unspecified and unpredictable; it is inadvisable for the figures on adjacent tiles to overlap.

Point

The Class Point.

Print

Print is a utility class for printing PDF documents to printers.

PrintToPdfModule

An interface into Apryse SDKs Print To PDF functionality

PrintToPdfOptions

PrinterMode

PrinterMode is a utility class used to set printer options for printing PDF documents.

QuadPoint

Rect

Rect is a utility class used to manipulate PDF rectangle objects (refer to section 3.8.3 of the PDF Reference Manual).

Rect can be associated with a SDF/Cos rectangle array using Rect(Obj*) constructor or later using Rect::Attach(Obj*) or Rect::Update(Obj*) methods.

Rect keeps a local cache for rectangle points so it is necessary to call Rect::Update() method if the changes to the Rect should be saved in the attached Cos/SDF array.

RectCollection

Redactor

PDF Redactor is a separately licensable add-on that offers options to remove (not just cover or obscure) content within a region of a PDF. With printed pages, redaction involves blacking-out or cutting-out areas of the printed page. With electronic documents that use formats such as PDF, redaction typically involves removing sensitive content within documents for safe distribution to courts, patent and government institutions, the media, customers, vendors or any other audience with restricted access to the content.

The redaction process in PDFNet consists of two steps:

a) Content identification: A user applies redact annotations that specify the pieces or regions of content that should be removed. The content for redaction can be identified either interactively (e.g. using 'pdftron.PDF.PDFViewCtrl' as shown in PDFView sample) or programmatically (e.g. using 'pdftron.PDF.TextSearch' or 'pdftron.PDF.TextExtractor'). Up until the next step is performed, the user can see, move and redefine these annotations.

b) Content removal: Using 'pdftron.PDF.Redactor.Redact()' the user instructs PDFNet to apply the redact regions, after which the content in the area specified by the redact annotations is removed. The redaction function includes a number of options to control the style of the redaction overlay (including color, text, font, border, transparency, etc.).

PDFTron Redactor makes sure that if a portion of an image, text, or vector graphics is contained in a redaction region, that portion of the image or path data is destroyed and is not simply hidden with clipping or image masks. PDFNet API can also be used to review and remove metadata and other content that can exist in a PDF document, including XML Forms Architecture (XFA) content and Extensible Metadata Platform (XMP) content.

Redactor.Appearance

Class used to customize the appearance of the optional redaction overlay.

Redactor.Redaction

Reflow

Reflow annotations between PDF and HTML

RefreshOptions

SVGConvertOptions

SVGParser

Separation

Separation contains a memory buffer and CMYK components' information about rasterized separations used in PDFDraw::GetSeparationBitmaps and PDFRasterizer::RasterizeSeparations.

Shading

Shading is a class that represents a flat interface around all PDF shading types:

  • In Function-based (type 1) shadings, the color at every point in the domain is defined by a specified mathematical function. The function need not be smooth or continuous. This is the most general of the available shading types, and is useful for shadings that cannot be adequately described with any of the other types.
  • Axial shadings (type 2) define a color blend along a line between two points, optionally extended beyond the boundary points by continuing the boundary colors.
  • Radial shadings (type 3) define a color blend that varies between two circles. Shadings of this type are commonly used to depict three-dimensional spheres and cones.
  • Free-form Gouraud-shaded triangle mesh shadings (type 4) and lattice Gouraud shadings (type 5) are commonly used to represent complex colored and shaded three-dimensional shapes. The area to be shaded is defined by a path composed entirely of triangles. The color at each vertex of the triangles is specified, and a technique known as Gouraud interpolation is used to color the interiors. The interpolation functions defining the shading may be linear or nonlinear.
  • Coons patch mesh shadings (type 6) are constructed from one or more color patches, each bounded by four cubic Bézier curves. A Coons patch generally has two independent aspects:
    • Colors are specified for each corner of the unit square, and bilinear interpolation is used to fill in colors over the entire unit square.
    • Coordinates are mapped from the unit square into a four-sided patch whose sides are not necessarily linear. The mapping is continuous: the corners of the unit square map to corners of the patch and the sides of the unit square map to sides of the patch.
  • Tensor-product patch mesh shadings (type 7) are identical to type 6 (Coons mesh), except that they are based on a bicubic tensor-product patch defined by 16 control points, instead of the 12 control points that define a Coons patch. The shading Patterns dictionaries representing the two patch types differ only in the value of the Type entry and in the number of control points specified for each patch in the data stream. Although the Coons patch is more concise and easier to use, the tensor-product patch affords greater control over color mapping.
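The bilinear color interpolation mentioned for patch shadings can be illustrated with a short standalone sketch. It is not part of the Shading API: the `Color` alias and the `BilinearColor` function are hypothetical, and the sketch covers only the color aspect over the unit square, not the geometric mapping onto the patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Three color components, e.g. RGB in the range [0, 1] (illustrative choice).
using Color = std::array<double, 3>;

// Interpolates the four corner colors of the unit square at parameters (u, v).
// c00 sits at (0,0), c10 at (1,0), c01 at (0,1) and c11 at (1,1).
Color BilinearColor(const Color& c00, const Color& c10, const Color& c01,
                    const Color& c11, double u, double v) {
    assert(u >= 0.0 && u <= 1.0 && v >= 0.0 && v <= 1.0);
    Color out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        // Blend along u on the bottom and top edges, then blend along v.
        const double bottom = (1.0 - u) * c00[i] + u * c10[i];
        const double top = (1.0 - u) * c01[i] + u * c11[i];
        out[i] = (1.0 - v) * bottom + v * top;
    }
    return out;
}
```

At a corner the result equals that corner's color, and at the center (u = v = 0.5) every component is the average of the four corner values.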

ShapedText

A sequence of positioned glyphs -- the visual representation of a given text string

Stamper

Stamper is a utility class that can be used to stamp PDF pages with text, images, or with other PDF content in only a few lines of code.

Although Stamper is very simple to use compared to ElementBuilder/ElementWriter it is not as powerful or flexible. In case you need full control over PDF creation use ElementBuilder/ElementWriter to add new content to existing PDF pages as shown in the ElementBuilder sample project.

StructuredOutputModule

static interface to PDFTron SDKs PDF to Word, Excel, PowerPoint, HTML functionality

TemplateDocument

Encapsulates a template document that can be merged with data to generate any number of PDFs.

TextDiffOptions

TextExtractor

TextExtractor is used to analyze a PDF page and extract words and logical structures that are visible within a given region. The resulting list of lines and words can be traversed element by element or accessed as a string buffer. The class also includes utility methods to extract PDF text as HTML or XML.

Possible use case scenarios for TextExtractor include:
  • Converting PDF pages to text or XML for content repurposing.
  • Searching PDF pages for specific words or keywords.
  • Processing large PDF repositories for indexing or content retrieval purposes (i.e. implementing a PDF search engine).
  • Classifying or summarizing PDF documents based on their text content.
  • Finding specific words for content editing purposes (such as splitting pages).
The main task of TextExtractor is to interpret PDF pages and offer a simple to use API to:
  • Normalize all text content to Unicode.
  • Extract inferred logical structure (word by word, line by line, or paragraph by paragraph).
  • Extract positioning information for every line, word, or a glyph.
  • Extract style information (such as information about the font, font size, font styles, etc) for every line, word, or a glyph.
  • Control the content analysis process. A number of options (such as removal of text obscured by images) is available to let the user direct the flow of content recognition algorithms that will meet their requirements.
  • Offer utility methods to convert PDF page content to text, XML, or HTML.

TextExtractor analyzes only the textual content of the page. This means that rasterized text (e.g. in scanned pages) or vectorized text (where glyphs are converted to path outlines) will not be recognized as text. Please note that it is still possible to extract this content using the pdftron.PDF.ElementReader interface.

In some cases TextExtractor may extract text that does not appear to be on the visible page (e.g. when text is obscured by an image or a rectangle). In these situations it is possible to use processing flags such as 'e_remove_hidden_text' and 'e_no_invisible_text' to remove hidden text.

For full sample code, please take a look at TextExtract sample project.
//... Initialize PDFNet ...
PDFDoc doc = new PDFDoc(filein);
doc.InitSecurityHandler();
Page page = doc.GetPageIterator().Current();
TextExtractor txt = new TextExtractor();
txt.Begin(page, null, TextExtractor.ProcessingFlags.e_remove_hidden_text);
string text = txt.GetAsText();
// or traverse words one by one...
TextExtractor.Word word;
for (TextExtractor.Line line = txt.GetFirstLine(); line.IsValid(); line = line.GetNextLine()) {
    for (word = line.GetFirstWord(); word.IsValid(); word = word.GetNextWord()) {
        string w = word.GetString();
    }
}

TextExtractor.Line

TextExtractor.Line object represents a line of text on a PDF page. Each line consists of a sequence of words, and each word is in one or more styles.

TextExtractor.Style

A class representing predominant text style associated with a given Line, a Word, or a Glyph. The class includes information about the font, font size, font styles, text color, etc.

TextExtractor.Word

TextExtractor.Word object represents a word on a PDF page. Each word contains a sequence of characters in one or more styles (see TextExtractor.Style).

TextRange

The TextRange class represents a contiguous range of text on a PDF page. It may be the result of a text search, or simply a couple of highlighted or underlined words.

Each text range contains a few pieces of information:

  • page: the number of the page this piece of text is on;
  • position: the start position (text offset);
  • length: the length.

You are able to retrieve further information about the text range, such as its coordinates, the text itself, as well as characters before and after.

TextSearch

TextSearch searches through a PDF document for a user-given search pattern. The current implementation supports both verbatim search and the search using regular expressions, whose detailed syntax can be found at:

http://www.boost.org/doc/libs/release/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

TextSearch also provides users with several useful search modes and extra information besides the found string that matches the pattern. TextSearch can either keep running until a matched string is found or be set to return periodically in order for the caller to perform any necessary updates (e.g., UI updates). It is also worth mentioning that the search modes can be changed on the fly while searching through a document.

Possible use case scenarios for TextSearch include:

  • Guide users of a PDF viewer (e.g. implemented by PDFViewCtrl) to the places they are interested in;
  • Find PDF documents of interest that contain certain patterns;
  • Extract information of interest (e.g., credit card numbers) from a set of files;
  • Extract Highlight information (refer to the Highlights class for details) from files for external use.

Hyphens ('-') are frequently used in PDF documents to join the two pieces of a word broken at the end of a line. For example, in "TextSearch is powerful for finding patterns in PDF files; yes, it is really pow- erful." a search for "powerful" should return both instances. However, not all end-of-line hyphens are added to connect a broken word; some of them could be "real" hyphens. In addition, an input search pattern may also contain hyphens, which complicates the situation. To tackle this problem, the following conventions are adopted:
    1. When in verbatim search mode and the pattern contains no hyphen, a matching string is returned if it is exactly the same or it contains end-of-line or start-of-line hyphens. For example, as mentioned above, a search for "powerful" would return both instances.
    2. When in verbatim search mode and the pattern contains one or multiple hyphens, a matching string is returned only if the string matches the pattern exactly. For example, a search for "pow-erful" will only return the second instance, and a search for "power-ful" will return nothing.
    3. When searching using regular expressions, hyphens are not handled implicitly; users should take care of them themselves. For example, in order to find both "powerful" instances, the input pattern can be "pow-{0,1}erful".
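The effect of the optional-hyphen pattern from the third convention can be tried outside PDFNet. The sketch below uses `std::regex` (ECMAScript grammar) as a stand-in for the Perl-syntax engine referenced above, and it assumes the two pieces of the broken word have already been joined without whitespace; `CountMatches` is a hypothetical helper, not a TextSearch method.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Counts the non-overlapping matches of a regular expression in text.
std::size_t CountMatches(const std::string& text, const std::string& pattern) {
    assert(!pattern.empty());
    const std::regex re(pattern);  // ECMAScript grammar by default
    return static_cast<std::size_t>(std::distance(
        std::sregex_iterator(text.begin(), text.end(), re),
        std::sregex_iterator()));
}
```

For the text "TextSearch is powerful for finding patterns in PDF files; yes, it is really pow-erful.", the pattern "pow-{0,1}erful" matches both instances, the plain pattern "powerful" matches only the first, and "power-ful" matches nothing.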
For a full sample, please take a look at the TextSearch sample project.
//... Initialize PDFNet ...
PDFDoc doc = new PDFDoc(filein);
doc.initSecurityHandler();
int mode = TextSearch.e_whole_word | TextSearch.e_page_stop;
UString pattern( "joHn sMiTh" );
TextSearch txt_search = new TextSearch();

// PDFDoc doesn't allow simultaneous access from different threads. If this
// document could be used from other threads (e.g., the rendering thread inside
// PDFView/PDFViewCtrl, if used), it is good practice to lock it.
// Notice: don't forget to call doc.Unlock() to avoid deadlock.
doc.Lock();
txt_search.Begin( doc, pattern, mode, -1, -1 );
while ( true ) {
    TextSearch.ResultCode result = txt_search.Run();
    if ( result.GetCode() == TextSearchResult.e_found ) {
        Console.WriteLine("found one instance: " + result.GetResultStr());
    } else {
        break;
    }
}

// Unlock the document to avoid deadlock.
doc.Unlock();

TileInTransit

TimestampingConfiguration

A class representing a set of options for timestamping a document.

TimestampingResult

A class representing the result of testing a timestamping configuration.

TrustVerificationResult

The detailed result of a trust verification step of a verification operation performed on a digital signature.

VerificationOptions

Options pertaining to digital signature verification.

VerificationResult

The result of a verification operation performed on a digital signature.

ViewChangeCollection

Class for collecting changes to a PDFDoc and/or viewer, which can be passed to various functions to act on. Allows for chaining of modifications, which can then be updated by PDFNet in the best possible way.

ViewerOptimizedOptions

A class containing ViewerOptimizedOptions

WordToPDFOptions

Structs

TextExtractor.CharRange

TextExtractor.CharRange object represents a range of text based on Unicode character indices.

Enums

Action.FormActionFlag

Flags used by submit form actions. Exclude flag is also used by reset form action. No other action types use flags in the current version of PDF standard (ISO 32000).

Action.Type

Action types

Annot.AnnotationState

Annotation appearance types

Annot.BorderStyle.Style

The border style

Annot.EventType

Event types for Annot

Annot.Flag

Flags specifying various characteristics of the annotation.

Annot.Type

Annotation types

BarcodeOptions.BarcodeOrientation

A set of flags used to specify the barcode orientation(s). Can be bitwise OR-ed to search for multiple orientations. Orientation only affects the following barcode types: e_linear, e_post_net_planet, e_four_state, e_gs1_databar_stacked, e_pdf417, e_micro_pdf417, e_patch_code and e_pharma_code.

BarcodeOptions.BarcodeProfile

An enumeration used to specify the barcode detection profile.

BarcodeOptions.BarcodeTypeGroup

A set of flags used to specify a subset of barcode types. Can be bitwise OR-ed to combine multiple groups.

BarcodeOptions.OutputFormat

An enumeration used to specify the format of the data output.

CADConvertOptions.LayoutSortOrder

Layout sorting options

ColorSpace.Type

Types of colorspace

Convert.ExcelOutputOptions.SearchableImageSetting

Convert.FlattenFlag

FlattenFlag

Convert.FlattenThresholdFlag

FlattenThresholdFlag

Convert.HTMLOutputOptions.ContentReflowSetting

Convert.HTMLOutputOptions.SearchableImageSetting

Convert.OutputOptionsOCR.LanguageChoice

Convert.OutputOptionsOCR.PreferredOCREngine

Convert.PowerPointOutputOptions.SearchableImageSetting

Convert.Printer.Mode

Convert.StructuredOutputOptions.SectionConversionSetting

Convert.WordOutputOptions.SearchableImageSetting

Convert.WordOutputOptions.WordOutputFormat

Convert.XODOutputOptions.AnnotationOutputFlag

DataExtractionModule.DataExtractionEngine

Data Extraction Engines

Destination.FitType

View Destination Fit Types

DigitalSignatureField.DocumentPermissions

DigitalSignatureField.FieldPermissions

DigitalSignatureField.SubFilterType

DisallowedChange.Type

DocumentConversionResult

DownloadedType

DownloadedType lists the events triggered by calling OpenURLAsync.

Element.Type

Element types

ElementWriter.WriteMode

Enumeration describing the placement of the element written to a page.

Field.EventType

Event types for field.

Field.Flag

Flags specifying various characteristics of the fields.

Field.TextJustification

form of quadding (justification) to be used in displaying the text fields.

Field.Type

interactive form field type

Flattener.FlattenMode

Flattener.Threshold

Font.Encoding

Font.StandardType1Font

Font.Type

Font types

Function.Type

Function types

GState.BlendMode

The standard separable blend modes available in PDF.

GState.GStateAttribute

GState properties

GState.LineCap

LineCap types

GState.LineJoin

LineJoin types

GState.RenderingIntent

GState.TextRenderingMode

Text Rendering modes

GeometryCollectionSnappingMode

HTML2PDF.Proxy.Type

Set the type of proxy to use.

If e_default, use whatever the html2pdf library decides on. If e_none, explicitly sets that no proxy is to be used. If e_http or e_socks5 then the corresponding proxy protocol is used.

HTML2PDF.WebPageSettings.ErrorHandling

How to handle objects that failed to load.

Image.InputFilter

InputFilter types

OfficeToPDFOptions.DisplayComments

Word document comment options

OfficeToPDFOptions.StructureTagLevel

Level of detail for structure tags.

Optimizer.ImageSettings.CompressionMode

Different Compression Modes for color and grayscale images.

Optimizer.ImageSettings.DownsampleMode

Different Downsample Modes for color and grayscale images.

Optimizer.MonoImageSettings.CompressionMode

mono-image compression mode

Optimizer.MonoImageSettings.DownsampleMode

mono-image downsample mode

PDFDoc.EventType

Event types for PDFDoc

PDFDoc.ExtractFlag

PDFDoc.InsertFlag

PDFDoc.SignaturesVerificationStatus

PDFDocViewPrefs.PageLayout

PageLayout specifies the page layout to be used when the document is opened

PDFDocViewPrefs.PageMode

PageMode specifies how the document should be displayed when opened

PDFDocViewPrefs.ViewerPref

ViewerPref enumeration specifies how various GUI elements should behave when the document is opened.

PDFNetInternalToolsLogBackend

PDFNetInternalToolsLogLevel

PDFRasterizer.ColorPostProcessMode

ColorPostProcessMode is used to modify colors after rendering.

PDFRasterizer.OverprintPreviewMode

Determines if overprint is used.

PDFRasterizer.Type

PDFNet includes two separate rasterizer implementations that use different graphics libraries. The default rasterizer is 'e_BuiltIn', a high-quality, anti-aliased, platform-independent rasterizer available on all supported platforms. On Windows platforms, PDFNet also includes a GDI+ based rasterizer (deprecated; it will be removed in a future version of PDFNet). This rasterizer is included mainly to provide vector output for printing, EMF/WMF export, and similar uses. For plain image rasterization we recommend using the built-in rasterizer.

PDFViewCtrl.PDFViewCtrlWindowType

PDFViewCtrl.PagePresentationMode

PDFViewCtrlPagePresentationMode lists common modes of presenting PDF pages.

PDFViewCtrl.PageViewMode

PageViewMode lists common modes of viewing PDF pages.

PDFViewCtrl.PanelType

PDFViewCtrl.TextSelectionMode

TextSelectionMode lists different text selection modes that can be used to highlight text.

PDFViewCtrl.ToolMode

The PDFViewCtrl class supports a number of 'built-in' tool modes. ToolMode enumerates the tool modes supported by PDFViewCtrl.

PDFViewWPF.OverprintPreviewMode

PDFViewWPF.PDFViewWPFConversionType

ConversionType lists the events triggered by calling OpenUniversalDocument

PDFViewWPF.PagePresentationMode

PagePresentationMode lists common modes of presenting PDF pages.

PDFViewWPF.PageViewMode

PageViewMode lists common modes of viewing PDF pages.

PDFViewWPF.TextSelectionMode

TextSelectionMode lists different text selection modes that can be used to highlight text.

Page.Box

A PDF page can define as many as five separate boundaries to control various aspects of the imaging process (for more details, refer to Section 10.10.1 'Page Boundaries' in the PDF Reference Manual).

Page.EventType

Event types for Page

Page.Rotate

Specifies page rotation in degrees

PageLabel.Style

The numbering style to be used for the numeric portion of a page label.

PageSet.Filter

PageSet filters

PathData.PathSegmentType

Enumeration used to indicate the operator type.

PatternColor.TilingType

PatternColor.Type

PrinterMode.DuplexMode

Enumerated values for specifying how the printed pages are flipped when duplexing

PrinterMode.NUp

Enumerated values for specifying the layout of multiple document pages onto output pages

PrinterMode.NUpPageOrder

Enumerated values for specifying the ordering of document pages onto output pages

PrinterMode.Orientation

Enumerated values for specifying the orientation of output pages

PrinterMode.OutputColor

Enumerated values for specifying the color mode for printing

PrinterMode.OutputQuality

Enumerated values for specifying the quality of the printing

PrinterMode.PaperSize

Paper sizes.

PrinterMode.PrintContentTypes

Enumerated values for specifying the document content to print

PrinterMode.ScaleType

Enumerated values for specifying the scaling of document pages

Shading.Type

ShapedText.FailureReason

ShapedText.ShapingStatus

Stamper.HorizontalAlignment

Stamper.SizeType

Size Types

Stamper.TextAlignment

Stamper.VerticalAlignment

TemplateDocumentResult

TextExtractor.ProcessingFlags

Processing options that can be passed to the Begin() method to direct the flow of content recognition algorithms.

TextExtractor.XMLOutputFlags

Flags controlling the structure of XML output in a call to GetAsXML().

TextSearch.ResultCode

The code indicating why a search returned.

TextSearch.SearchMode

Search modes that control how searching is conducted.

VerificationOptions.CertificateTrustFlag

An enumeration representing the level of trust associated with a particular certificate. Multiple flag values can be combined using bitwise operators.

VerificationOptions.SignatureVerificationSecurityLevel

An enumeration representing the level of security to use when verifying digital signatures.

VerificationOptions.TimeMode

An enumeration representing the least-secure type of reference time to use when verifying digital signatures. The choices are the time of signing (not very secure), the timestamp time (more secure), or the current time (most secure, but with a lower verification rate). Note: this is orthogonal to the expiry verification mode (shell/chain/hybrid).

VerificationResult.DigestStatus

VerificationResult.DocumentStatus

VerificationResult.ModificationPermissionsStatus

VerificationResult.TrustStatus

Delegates

Convert.WPFConverterOptions.DrawHeaderFooter

Delegates for drawing headers and footers.

PDFViewAnnotationEditPermissionDelegate

PDFViewCtrl.PDFViewFindTextAsyncDelegate

A delegate that is called once FindTextAsync is completed.

PDFViewCtrl.PDFViewRenderWorkerDelegate

PDFViewCurrentPageDelegate

A prototype for a delegate that will be called whenever the current page number changes.

PDFViewDownloadDelegate

Download event handling. A delegate that is called during download events triggered by calling OpenURLAsync.

PDFViewErrorDelegate

The error handling function to be called when an error is encountered during page rendering.

PDFViewThumbAsyncDelegate

A delegate that will be called after GetThumbAsync retrieves a thumbnail from the on-disk thumbnail cache.

PDFViewWPF.CurrentPageNumberChangedHandler

Delegate for when the current page changes

PDFViewWPF.CurrentScrollChangedHandler

Routed event, raised when the scroll position changes.

PDFViewWPF.CurrentZoomChangedHandler

Delegate for when the current zoom level changes

PDFViewWPF.FindTextFinsihedHandler

Delegate for when a text search is finished

PDFViewWPF.LayoutChangedHandler

Delegate for when the layout changes. This event notifies subscribers that the layout has changed, so that they can adjust their content on the screen to line up with the document. Use this in conjunction with CurrentZoomChanged to cover all cases where the PDFViewWPF might change appearance.

PDFViewWPF.OnActionEventHandler

PDFViewWPF.OnConversionEventHandler

Delegate for when the PDFViewWPF is processing a conversion

PDFViewWPF.OnRenderFinishedEventHandler

Delegate for when the PDFViewWPF has finished rendering a region of the PDF Document

PDFViewWPF.OnSetdocHandler

Delegate for when a doc is set. Once this event is raised, the doc should be ready for zooming and scrolling.

PDFViewWPF.OnThumbnailGeneratedEventHandler

Delegate for when PDFViewWPF has finished rendering a requested thumbnail.
