When it comes to manipulating documents within your applications, there are many development options to choose from. Organizations around the globe, regardless of niche, regularly build new functionality into their business workflows to improve productivity. Extracting different types of information from documents in multiple formats is one such requirement. A primary concern, however, is the accuracy of the extracted data: not every software application gives developers highly accurate data extraction.
If you are looking for a library that extracts raw and formatted text, as well as metadata, from many well-known file formats on the .NET platform, GroupDocs.Parser for .NET is worth considering. Beyond its basic data extraction features, the latest version of this document text extraction API lets app developers extract text and metadata from various text and presentation templates. Another important feature is the ability to programmatically fetch tables from PDF documents within your .NET apps. When working with this functionality, you can define table bounds manually or let the API identify the layout automatically.
In addition, the API can detect the media type of password-protected Office OpenXML documents and supports batch document processing:
http://bit.ly/2QuFPsr
The following code samples show how to extract text and metadata from templates:
// Extracting Text
void ExtractText(string fileName)
{
    // Extract text from the file
    var text = Extractor.Default.ExtractText(fileName);
    // Print the extracted text
    Console.WriteLine(text);
}

// Extracting Metadata
void ExtractMetadata(string fileName)
{
    // Extract metadata from the file
    var metadata = Extractor.Default.ExtractMetadata(fileName);
    // Print each extracted metadata entry as "key: value"
    foreach (var m in metadata)
    {
        // Print the metadata key
        Console.Write(m.Key);
        Console.Write(": ");
        // Print the metadata value
        Console.WriteLine(m.Value);
    }
}
The following code sample shows how to detect the media type of a password-protected Office OpenXML document:
// Note: fileName and Common.GetFilePath are not defined in this snippet;
// they are assumed to come from the surrounding sample project.

// Create load options
LoadOptions loadOptions = new LoadOptions();
// Set the password used to open the protected document
loadOptions.Password = "password";
// Get the default composite media type detector
var detector = CompositeMediaTypeDetector.Default;
// Open a stream so the media type is detected by content (not file extension)
using (var stream = File.OpenRead(Common.GetFilePath(fileName)))
{
    // Detect the media type
    var mediaType = detector.Detect(stream, loadOptions);
    // Print the detected media type
    Console.WriteLine(mediaType);
}