Online Server Support

Tuesday, 7 August 2007

Optimisation data for HTML5 parser implementors

Posted on 15:58 by Unknown
By Ian Hickson, editor of the HTML5 specification

Last month, just before I left on vacation, I posted three sets of data to help implementors of the HTML5 parser specification optimise their code. There are several implementations coming along, for example those that are part of the html5lib project and the one behind Validator.nu.

The three sets of data that I posted are all derived from parsing several billion documents from Google's Web search index using a parser I wrote in Sawzall.

The first set of data gives the relative aggregate distribution of invocations of the "in head", "in body", and "in table" insertion modes, for each of the insertion modes. This allows implementors to determine, for instance, that invoking the "in body" code while in a cell must be very efficient, while invoking the "in body" code from the "after frameset" code need not be as efficient, in case the implementor has a strategy that optimises one at the cost of another. See: documentation, data.
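One way to put this first data set to work is to turn the raw invocation counts into shares and pick out the paths worth optimising. The following is only a minimal sketch: the counts in it are made-up placeholders (not values from the published data), and the names SAMPLE_COUNTS, relative_distribution and hot_paths are hypothetical.

```python
# Hypothetical counts: (current insertion mode, rules invoked) -> invocations.
# These numbers are placeholders for illustration, NOT the published data.
SAMPLE_COUNTS = {
    ("in body", "in body"): 5000,
    ("in cell", "in body"): 900,
    ("in body", "in head"): 80,
    ("in row", "in table"): 19,
    ("after frameset", "in body"): 1,
}


def relative_distribution(counts):
    """Return each pair's share of all recorded invocations."""
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def hot_paths(counts, threshold=0.05):
    """List the pairs whose share meets the threshold, most frequent first."""
    shares = relative_distribution(counts)
    hot = [pair for pair, share in shares.items() if share >= threshold]
    return sorted(hot, key=lambda pair: shares[pair], reverse=True)


if __name__ == "__main__":
    for current_mode, invoked in hot_paths(SAMPLE_COUNTS):
        print(f'optimise "{invoked}" when invoked from "{current_mode}"')
```

With real counts loaded in place of the placeholders, the pairs that fall below the threshold (such as "in body" invoked from "after frameset" in this toy table) are the ones where a slower code path is probably acceptable.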

The second set of data gives the relative aggregate distribution of tokens for each phase/insertion mode pair. This can help implementors that are using a cascade of if statements decide on the right order for their statements. For instance, the most common token type seen in the "in body" insertion mode is character data, and the second most common token is the start tag token for an a element, but the isindex start tag was almost never seen. This tells implementors that they should check for character tokens and for start tags of a elements long before checking for isindex tags. See: documentation, data.
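As an illustration of ordering such a cascade, here is a minimal sketch of an "in body" handler that tests for the common cases first. The token representation (a dict with "type" and "name" keys), the function name handle_in_body and the returned labels are all assumptions made for this example; a real parser would call its own tree-construction routines instead of returning strings.

```python
def handle_in_body(token):
    """Dispatch an "in body" token, checking the most frequent cases first.

    `token` is a hypothetical dict: {"type": ..., "name": ...}.
    """
    kind = token["type"]

    # Character data is the most common token in "in body": test it first.
    if kind == "characters":
        return "insert characters"

    if kind == "start_tag":
        name = token.get("name")

        # The a element's start tag is the second most common token.
        if name == "a":
            return "start tag: a"

        # ... other start tags would go here, in decreasing order of frequency ...

        # isindex was almost never seen, so it is checked last.
        if name == "isindex":
            return "start tag: isindex"

        return "start tag: other"

    return "other token"


if __name__ == "__main__":
    print(handle_in_body({"type": "characters"}))
    print(handle_in_body({"type": "start_tag", "name": "a"}))
    print(handle_in_body({"type": "start_tag", "name": "isindex"}))
```

The behaviour is the same whatever the order of the tests; only the number of comparisons made for the typical token changes.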

The last set of data examines the number of attributes per element. It allows implementors to decide on the optimum memory allocation strategy for attributes. For example, since most elements have 9 or fewer attributes, the data structure that stores attributes can be optimised for holding at most 9 attributes, using little memory, and if an element has more than this number of attributes, the implementation can switch to a separate implementation that uses more memory but is optimised for large numbers of attributes. See: data.
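The switch between the two strategies might look roughly like the sketch below. The class name AttributeStore, its methods and the SMALL_LIMIT constant are hypothetical, and in Python the memory saving is only illustrative because of the language's own object overhead; the point is the hand-over from a compact list to a hash-based structure once an element exceeds 9 attributes.

```python
class AttributeStore:
    """Stores attributes compactly while few, switching to a dict when many."""

    SMALL_LIMIT = 9  # threshold taken from the "9 or fewer" figure above

    def __init__(self):
        self._pairs = []    # small mode: list of (name, value), linear scan
        self._index = None  # large mode: dict, created only when needed

    @property
    def in_large_mode(self):
        return self._index is not None

    def set(self, name, value):
        if self._index is not None:
            self._index[name] = value
            return
        # Small mode: overwrite an existing name if present.
        for i, (existing, _) in enumerate(self._pairs):
            if existing == name:
                self._pairs[i] = (name, value)
                return
        if len(self._pairs) < self.SMALL_LIMIT:
            self._pairs.append((name, value))
        else:
            # Too many attributes: migrate everything to the dict.
            self._index = dict(self._pairs)
            self._index[name] = value
            self._pairs = []

    def get(self, name, default=None):
        if self._index is not None:
            return self._index.get(name, default)
        for existing, value in self._pairs:
            if existing == name:
                return value
        return default

    def __len__(self):
        if self._index is not None:
            return len(self._index)
        return len(self._pairs)


if __name__ == "__main__":
    attrs = AttributeStore()
    attrs.set("href", "/")
    attrs.set("class", "nav")
    print(len(attrs), attrs.get("href"), attrs.in_large_mode)
```

A linear scan over a handful of entries is usually cheap, which is why the small mode does not bother with hashing.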

I hope this data is useful!
