open-uri, Easy-to-Use and Extensible Virtual File System

Posted by kev Fri, 14 Oct 2005 21:55:00 GMT

Presented by Tanaka Akira akr at m17n dot org

This one was really really fast.. here’s what I got… – Kev

Table of Contents

  • Who am I?
  • How to use open-uri
  • Why open-uri?
  • open-uri and net/http
  • How to design easy-to-use API

Who am I

Who am I (1)

The author of open-uri and several standard libraries:

open-uri.rb, pathname.rb, time.rb, pp.rb, prettyprint.rb, resolv.rb, resolv-replace.rb, tsort.rb

Who am I (2)

Contribution for various classes and methods

  • IO without stdio
  • IO#read and readpartial
  • Time: Time.utc, Time#utc_offset
  • allocate, marshal_dump, marshal_load
  • Regexp#to_s, Regexp.union
  • Process.daemon
  • fork kills all other threads

Who am I (3)

I report many bugs, over 100/year

  • core dump
  • test failure
  • build problem
  • mismatch between doc. and imp.
  • etc

Who am I (4)

I wrote several non-standard libraries.

  • htree
  • webapp
  • amarshal

How to Use open-uri

Simple Usage

require "open-uri"
open("http://www.ruby-lang.org") { |f| print f.read }

Similar to open files
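To make the "similar to open files" point concrete, here is a minimal sketch that reads a local file and an HTTP resource with the same block shape. Two things here are assumptions and not from the talk: the tiny one-shot server on the loopback interface (it only keeps the sketch independent of outside network access), and the use of URI.open, which is how current Ruby versions spell the call (the bare open("http://…") form shown in the slides stopped working in Ruby 3.0).

```ruby
require "open-uri"
require "socket"
require "tempfile"

# Hypothetical one-shot HTTP server on 127.0.0.1, for illustration only.
server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]
server_thread = Thread.new do
  client = server.accept
  # Read and discard the request line and headers.
  while (line = client.gets) && line != "\r\n"
  end
  body = "hello from http\n"
  client.write(["HTTP/1.0 200 OK", "Content-Type: text/plain",
                "Content-Length: #{body.bytesize}", "", body].join("\r\n"))
  client.close
end

# Reading a local file...
file = Tempfile.new("open-uri-demo")
file.write("hello from a file\n")
file.close
file_text = File.open(file.path) { |f| f.read }

# ...and reading an HTTP resource have the same shape.
http_text = URI.open("http://127.0.0.1:#{port}/") { |f| f.read }

server_thread.join
server.close
file.unlink
print file_text, http_text
```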

Why Open-uri

  • Easy to use api
  • VFS: not only http

open-uri and net/http

net/http has too many ways

  • Net::HTTP.get_print
  • Net::HTTP.get
  • Net::HTTP.start {|h| h.get} etc

confuses users

open-uri has Fewer ways

open(uri) {|f| }
uri.open {|f| }
uri.read

Saves the user’s memory and reuses the user’s knowledge.

net/http: get and print

Net::HTTP.get_print(URI("http://host"))
print Net::HTTP.get(URI("http://host"))

open-uri: get and print

open("http://host") {|f| print f.read }

print URI("http://host")…

get and print

net/http

  • Net::HTTP.get_print: print only
  • Net::HTTP.get: good

open-uri …

Why Easy?

open("http://host")

  • No new construct
  • Users don’t need to learn

open-uri respects user knowledge

net/http: headers

Net::HTTP.start("host") {|h| r = h.get …. }

  • No URI anymore
  • No Net::HTTP.get anymore
  • Net::HTTP.start, net.. and body used instead

open-uri headers

  • Still URI
  • still open method
  • Easy to use
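As a rough illustration of "still URI, still open method": with open-uri, request headers are passed as string keys in the options hash, and response metadata is read from the object yielded to the block. The loopback echo server below is a hypothetical stand-in so the sketch runs offline, and URI.open is the current spelling of the call (an assumption relative to the slides, which use bare open).

```ruby
require "open-uri"
require "socket"

# Hypothetical server that echoes back the User-Agent it received.
server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]
server_thread = Thread.new do
  client = server.accept
  agent = nil
  while (line = client.gets) && line != "\r\n"
    agent = line.split(":", 2).last.strip if line =~ /\Auser-agent:/i
  end
  body = "agent=#{agent}"
  client.write(["HTTP/1.0 200 OK", "Content-Type: text/plain",
                "Content-Length: #{body.bytesize}", "", body].join("\r\n"))
  client.close
end

echoed = content_type = status = nil
# Request header goes in as a string key; no new class or method is needed.
URI.open("http://127.0.0.1:#{port}/", "User-Agent" => "open-uri-demo") do |f|
  echoed = f.read
  content_type = f.content_type   # response metadata from OpenURI::Meta
  status = f.status
end

server_thread.join
server.close
puts echoed, content_type, status.inspect
```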

net/http: SSL

  • Different library: net/https
  • Net::HTTP.new and Net::HTTP.start
  • Different port
  • Server verification…

open-uri: SSL

  • Still URI
  • Still open method
  • Server verification by default
  • No new library
  • No new methods, few things to learn

net/http: proxy

  • New method: Net::HTTP::Proxy

open-uri: proxy

% http_proxy=http://blah
% export http_proxy

  • Conventional environment variable supported
  • No new methods. A user might know this already
  • Fewer things to learn
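A small sketch of how the conventional variable is picked up. URI::Generic#find_proxy is the standard-library method that looks up the proxy for a URI, and as far as I know it is what open-uri relies on by default. Passing a plain hash in place of the real environment is done here only to keep the example from depending on the machine it runs on; the proxy host name and the target address (a documentation-only IP) are made up.

```ruby
require "uri"
require "socket" # find_proxy inspects the target host's address

target = URI("http://192.0.2.1/") # hypothetical target, documentation-only address

# Stand-in for the process environment (assumption: a Hash instead of ENV).
env = { "http_proxy" => "http://proxy.example:8080" }

proxy = target.find_proxy(env)      # URI of the proxy taken from http_proxy
direct = target.find_proxy({})      # nil when the variable is not set

puts proxy.inspect, direct.inspect
```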

net/http: basic auth

  • New class: Net::HTTP::Get
  • New method: Net::HTTP#request

open-uri: basic auth

  • Still URI
  • Still open method
  • New option: :http_basic_authentication
  • No new methods, few things to learn
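A minimal sketch of the option in use. The credentials and the loopback server are hypothetical; the server just returns the Authorization header it saw so the effect of :http_basic_authentication is visible. URI.open is used as the current spelling of open-uri's entry point, which is an assumption relative to the slides.

```ruby
require "open-uri"
require "socket"

# Hypothetical server that replies with the Authorization header it received.
server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]
server_thread = Thread.new do
  client = server.accept
  auth = nil
  while (line = client.gets) && line != "\r\n"
    auth = line.split(":", 2).last.strip if line =~ /\Aauthorization:/i
  end
  body = auth.to_s
  client.write(["HTTP/1.0 200 OK", "Content-Type: text/plain",
                "Content-Length: #{body.bytesize}", "", body].join("\r\n"))
  client.close
end

# Same URI, same open call; only one extra option.
seen = URI.open("http://127.0.0.1:#{port}/",
                http_basic_authentication: ["user", "secret"]) { |f| f.read }

server_thread.join
server.close
puts seen
```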

How to design Easy-to-Use API

  • Save brain power
  • Evolve gradually

Fewer Things to Learn

  • Fewer constructs for pragmatic usages
  • Huffman coding
  • DRY
  • No configuration is good configuration
  • Reuse user knowledge
  • Infrastructure friendly

Fewer Constructs for Pragmatic Usages

  • open vs Net::HTTP.get, Net::HTTP#get, etc.

  • This is not minimalism
  • The target of “fewer” is not all constructs

Pragmatic usages should be supported by small constructs.

Fewer Constructs(2)

(Diagram; link later, hopefully.)

Frequently use convenience methods; rarely use the many primitives.

Ex. net/http and open-uri

Methods frequently used:

  • net/http: Net::HTTP.start, Net::HTTP#get
  • open-uri: open

open-uri’s fewer constructs support many more features

Huffman Coding

  • Shorter for frequent things
  • Longer for rare things

Optimize for frequent things.

Use longer names for rarely used things and shorter names for frequently used things.

Ex. p

p obj

  • Very frequently used
  • Bad name in common sense
  • Almost no problem because everyone knows

Ex. pp and y

  • Bad name in common sense
  • More problematic than p because not everyone knows them

Ex. to_s and to_str

  • to_s shorter. frequently used.
  • to_str longer, used internally
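A short sketch of the difference, using two hypothetical classes. to_s is the everyday conversion that string interpolation calls; to_str is the implicit conversion that core methods such as String#+ look for when they need a real string.

```ruby
# Hypothetical class that only knows how to display itself.
class Temperature
  def initialize(degrees)
    @degrees = degrees
  end

  def to_s
    "#{@degrees} C"
  end
end

# Hypothetical class that claims to be string-like.
class FileName
  def initialize(name)
    @name = name
  end

  def to_str
    @name
  end
end

shown = "It is #{Temperature.new(21)}"       # interpolation calls to_s
joined = "log/" + FileName.new("app.log")   # String#+ calls to_str

# Without to_str, String#+ refuses the object.
rejected = begin
  "It is " + Temperature.new(21)
  false
rescue TypeError
  true
end

puts shown, joined, rejected
```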

Ex. def

  • def shorter, frequently used
  • define_method longer. not encouraged
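For comparison, a small sketch with a hypothetical Greeter class: the common case gets the short keyword, while the rarer case of generating methods from data uses the longer define_method.

```ruby
class Greeter
  # Frequent case: the short keyword.
  def hello
    "hello"
  end

  # Rare case: methods generated from data need the longer construct.
  %w[morning evening].each do |part|
    define_method("good_#{part}") { "good #{part}" }
  end
end

greeter = Greeter.new
greetings = [greeter.hello, greeter.good_morning, greeter.good_evening]
puts greetings
```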

Ex time.rb

  • Time.parse frequently used
  • Time.strptime: generic, but the user needs to learn the format
  • Time.parse is less flexible but enough for most cases, and easy to learn
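A brief sketch of the trade-off, using this post's own timestamp as input. Time.parse guesses the layout; Time.strptime needs an explicit format string but can handle layouts that Time.parse might misread. The second input string and its format are made up for illustration.

```ruby
require "time"

# Frequent case: no format to learn.
parsed = Time.parse("Fri, 14 Oct 2005 21:55:00 GMT")

# Generic case: the caller spells out the layout.
explicit = Time.strptime("14/10/2005 21:55:00 +0000", "%d/%m/%Y %H:%M:%S %z")

puts parsed.getutc.iso8601, explicit.getutc.iso8601
```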

Candidates for Huffman Coding

  • Method name
  • Other name
  • Convenience method
  • Language syntax
  • etc

Length of Huffman Coding

  • Number of characters
  • Number of nodes in AST
  • Editor keystrokes
  • etc

Encourage Good Style

  • Programmers like short code
  • Short code should be designed as good style

DRY – Don’t repeat yourself

Violations are common

No Configuration is Good Configuration

Things should work well out of box.

  • SSL CA certs
  • http_proxy environment variable

Bad Examples

  • ext/iconv/config.charset
  • soap_use_proxy
  • require “irb/completion”
  • RUBYOPT=rubygems

Reuse User Knowledge

open-uri reuses user knowledge

  • open is used to access an external resource

Reusable Knowledge

  • Ruby builtin (popular) methods
  • consistency
  • Unix
  • Standards: POSIX, RFC, etc
  • Metaphor

Consistency

  • bang methods
  • each_with_index
  • etc

Consistency violation:

  • Time#utc is destructive
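A quick sketch of why this breaks the bang convention. By convention a method that modifies its receiver ends in "!", but Time#utc converts the receiver in place without one. Time#getutc is the non-destructive counterpart.

```ruby
local = Time.at(0)          # a Time in local-time mode
before = local.utc?         # false: not yet in UTC mode
returned = local.utc        # converts the receiver itself
after = local.utc?          # true: the original object changed
same_object = returned.equal?(local)

# Non-destructive alternative: the receiver is left alone.
fresh = Time.at(0)
copy = fresh.getutc

puts before, after, same_object, fresh.utc?, copy.utc?
```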

Metaphor

  • HTTP is a kind of network file system
  • open-uri doesn’t support operations beyond the file system metaphor: POST, etc

Infrastructure Friendly

  • emacs, vi
  • line oriented tools
  • shell and file system
  • web browser

Prefer

“It is easy using the legacy tool XXX” over “It is easy using the new tool YYY”

Evolve Gradually

  • Adaptive Huffman coding
  • How to find bad API
  • How to avoid incompatibility

Adaptive Huffman Coding

What methods are used frequently?

  • Long method name at first
  • Alias to short name later
  • Define convenience methods for idioms

Adaptive Huffman Coding(2)

  • Short names and operators should be used carefully
  • Use a long name if in doubt
  • Alias is not a bad thing (TMTOWTDI)
  • Primitives should have long names
  • Define new method for idiom
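One way to read "long name first, alias later" is sketched below. The Inbox class and its method names are hypothetical; the point is only that the short name can be added later as an alias without touching existing callers of the long one.

```ruby
class Inbox
  def initialize(messages)
    @messages = messages
  end

  # First release: a long, descriptive name.
  def select_unread_messages
    @messages.reject { |m| m[:read] }
  end

  # Later release: the method turned out to be used everywhere,
  # so it earns a short alias. The long name keeps working.
  alias_method :unread, :select_unread_messages
end

inbox = Inbox.new([{ id: 1, read: true }, { id: 2, read: false }])
long_form = inbox.select_unread_messages
short_form = inbox.unread
puts short_form.inspect
```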

Operators

  • CGI#[] and CGI#params
    • CGI was defined unsuitably.
  • Hash#[]
    • primitive: Hash#fetch
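The Hash pairing can be sketched as follows: the operator is the short, forgiving form for the frequent case, and the longer-named primitive makes the handling of a missing key explicit. The hash contents are made up.

```ruby
params = { "name" => "akr" }

found = params["name"]                    # short form for the common case
missing = params["lang"]                  # quietly returns nil
fallback = params.fetch("lang", "ruby")   # primitive with an explicit default

# The primitive can also refuse to guess.
strict = begin
  params.fetch("lang")
rescue KeyError => e
  e.class
end

puts found, missing.inspect, fallback, strict
```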

How to find bad api

  • Repeated surprise
  • Often cannot remember

Repeated Surprise

Example

  • Time#utc is destructive
  • Iconv.iconv returns an array

Often Cannot Remember

Manual is required again and again for same issue.

  • RubyUnit
  • optparse

Idiom

  • Repeated code
  • Violate DRY
  • An idiom may be good or bad

Bad idiom example

  • Iconv.iconv()[0]

How to Avoid Incompatibility

Extension without Incompatibility:

  • New method
  • New keyword argument
  • New constant

Introducing new names has no compatibility problem (in most cases)
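A sketch of extending by keyword argument, with a hypothetical fetch_text method. Callers written against the earlier one-argument signature keep working because the new option has a default.

```ruby
require "stringio"

# Hypothetical library method. An earlier version took only `source`;
# the optional keyword was added later.
def fetch_text(source, strip: false)
  text = source.read
  strip ? text.strip : text
end

old_call = fetch_text(StringIO.new(" hi \n"))               # existing callers unchanged
new_call = fetch_text(StringIO.new(" hi \n"), strip: true)  # new behaviour is opt-in

puts old_call.inspect, new_call.inspect
```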

Incompatible Change is a Bad Thing

But fixing bad API…

Incompatible Change

API Migration Example

  • net/http: API version
  • cgi: special implementation for transition period

net/http API version

Net::HTTP has two APIs

  • Ruby 1.6 API version 1.1
  • Ruby 1.8 API version 1.2

net/http: switch API version

  • It is easy to forget to restore the API version
  • Global switch, not thread safe

cgi: special implementation for a transition period

CGI#[] returns

  • Ruby 1.6: an array of params
  • Ruby 1.8: Transition period
  • Future?: the first parameter or nil

CGI#[] returns something tweaked on 1.8

Try to work as both Array and String

  • Ruby 1.8.0 subclass of String
  • Ruby 1.8.1 subclass of DelegateClass(String)
  • Ruby 1.8.2 …

fork: Warning after change

Does fork kill other threads in child process?

  • Ruby 1.6: No
  • Ruby 1.8: Yes

fork: warning after change

  • Ruby 1.6: No warning
  • Ruby 1.8.0: No warning
  • Ruby 1.8.1: warning: fork terminates thread
  • Ruby 1.8.2: No warning

IO#read: warning before change

IO#read will block even if O_NONBLOCK is set

  • Ruby 1.8: doesn’t block
  • Ruby 1.9: does block

  • Ruby 1.8.2: No warning
  • Ruby 1.8.3: Warning
  • Ruby 1.9 : No warning

Easy-to-Use vs Security

Easy to Use vs Security

  • HTTP_PROXY
  • http://user:pass@host/
  • redirection and taint
  • File.open(uri)

VFS: Virtual File System

Why VFS?

Typical simple program

  • Load an external resource
  • Process the resource
  • Store the result

VFS eases the first step.

What is VFS

VFS provides

  • open an http/ftp resource
  • read a resource … …

VFS and polymorphism

The polymorphism can be implemented by

  • usual method dispatch calls ….

Polymorphic open

If open-uri is in effect

  • open("http://…") calls URI("http://…").open

  • same for ftp etc

Any URI can be opened if the URI has open method
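A small offline check of which URI objects "have an open method" once open-uri is loaded. The host names are placeholders, and mailto is used only as an example of a scheme that open-uri does not extend.

```ruby
require "open-uri"

uris = [
  URI("http://host/"),
  URI("ftp://host/file"),
  URI("mailto:user@example.com")
]

# open-uri adds public open/read methods to the HTTP and FTP URI classes only.
openable = uris.map { |u| [u.scheme, u.respond_to?(:open)] }

puts openable.inspect
```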

Other Resources

LDAP

Other Operations

  • URI().read
  • Other operations should be defined in the future, to be polymorphic with Pathname

Security Considerations

  • open(“|…”)
  • File.open is not affected
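The second bullet can be checked without any network access. In this sketch, which uses a hypothetical helper, File.open treats both a URL and a pipe-style string as literal file names even with open-uri loaded, so neither a request nor a command is triggered. The exact error class is platform dependent (typically Errno::ENOENT).

```ruby
require "open-uri"

# Hypothetical helper: returns the error class File.open raises, or nil.
def file_open_error(name)
  File.open(name) { |f| f.read }
  nil
rescue SystemCallError => e
  e.class
end

url_result = file_open_error("http://127.0.0.1/")   # looked up as a path, no HTTP request
pipe_result = file_open_error("|echo hi")           # looked up as a path, no command run

puts url_result.inspect, pipe_result.inspect
```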

Summary

  • How to design Easy-to-Use API
    • Save brain power
    • Evolve gradually
  • VFS by open-uri

Q/A

Some guy writing a book: Should I teach Array#push or Array#<<?

Response: I think the API should lean towards teaching.

DHH: Are you going to do for writing what you’re doing for reading?

Response: POST should be supported in the future, but write.. eh.. not as useful.

AC: Warnings would be more useful -not- at runtime.

Response: Inaudible
