TODO

Path: TODO
Last Update: Wed Aug 18 17:05:04 -0600 2010

v0.8

  • optimise PDF::Reader::Reference#from_buffer
    • ruby-prof shows the match() call in this function is a real killer
  • add extra callbacks
    • list implemented features
      • encrypted? tagged? bookmarks? annotated? optimised?
  • Allow more than just page content and metadata to be parsed (see spec section 3.6.1)
    • bookmarks?
    • outline?
    • articles?
    • viewer prefs?
  • Don‘t remove comment when tokenising in the middle of a string
  • Tweak encoding mappings to differentiate between bytes that are invalid for an encoding, and bytes that are unchanged. poppler seems to do this in a quite reasonable way. Original Encoding -> Glyph Names -> Unicode. As of 0.6 we go straight from the Original encoding to Unicode.
  • detect when a font‘s encoding is a CMap (generally used for pre-Unicode, multibyte asian encodings), and display a user friendly error
  • Improve interpretation of non content stream data (ie metadata). recognise dates, etc
  • Support Cross Reference Streams (spec 3.4.7)
  • Fix inheritance of page attributes. Resources has been done, but plenty of other attributes are inheritable. See table 3.2.7 in the spec

v0.9

  • Add a way to extract raster images
    • see XObjects section of spec (section 4.7)
  • Add a way to extract font data?

Sometime

  • Support for CJK text (convert to UTF-8 like all other encodings. See Section 5.9 of the PDF spec)
    • Will require significantly improved handling of CMaps, including creating a bunch of predefined ones
  • Work out why specs/data/zlib*.pdf isn‘t parsed correctly when all the major PDF viewers can display it correctly
  • Ship some extra receivers in the standard package, particuarly ones that are useful for running rspec over generated PDF files
  • When we encounter Identity-H encoded text with no ToUnicode CMap, render the glyphs and treat them as images, as there‘s no sensible way to convert them to unicode
  • Add support for additional filters: ASCIIHexDecode, ASCII85Decode, LZWDecode, RunLengthDecode, CCITTFaxDecode, JBIG2Decode, DCTDecode, JPXDecode, Crypt?
  • Add support for additional encodings:
    • PDFDocEncoding
    • Identity-V(I think this relates to vertical text. Not sure how we‘d support it sensibly)
  • Investigate how R->L text is handled
  • Add support for object streams (spec section 3.4.6)

[Validate]