open-uri, Easy-to-Use and Extensible Virtual File System
Posted by kev Fri, 14 Oct 2005 21:55:00 GMT
Presented by Tanaka Akira akr at m17n dot org
This one was really really fast.. here’s what I got… – Kev
Table of Contents
- Who am I?
- How to use open-uri
- Why open-uri?
- open-uri and net/http
- How to design easy-to-use api
Who am I
Who am I (1)
The author of open-uri and several standard libraries:
open-uri.rb, pathname.rb, time.rb, pp.rb, prettyprint.rb, resolv.rb, resolv-replace.rb, tsort.rb
Who am I (2)
Contribution for various classes and methods
- IO without stdio
- IO#read and readpartial
- Time: Time.utc, Time#utc_offset
- allocate, marshal_dump, marshal_load
- Regexp#to_s, Regexp.union
- Process.daemon
- fork kills all other threads
Who am I (3)
I report many bugs, over 100/year
- core dump
- test failure
- build problem
- mismatch between doc. and imp.
- etc
Who am I (4)
I wrote several non-standard libraries.
- htree
- webapp
- amarshal
How to Use open-uri
Simple Usage
require 'open-uri'
open("http://www.ruby-lang.org") {|f|
  print f.read
}
Similar to open files
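A minimal sketch of that similarity, assuming a current Ruby, where the URL form is spelled URI.open (Kernel#open stopped handling URLs in Ruby 3.0). The helper names read_file and read_url are made up for illustration, and only the local-file half is actually run so that no network is needed.

```ruby
require 'open-uri'
require 'tempfile'

# Hypothetical helper: read a local file with the familiar block form.
def read_file(path)
  File.open(path) { |f| f.read }
end

# Hypothetical helper: read a URL with the same block shape.
# URI.open comes from open-uri; this method is defined but not called here.
def read_url(url)
  URI.open(url) { |f| f.read }
end

# Exercise the file half against a temporary file.
Tempfile.create('open-uri-demo') do |tmp|
  tmp.write("hello from a file\n")
  tmp.flush
  print read_file(tmp.path)
end
```

The two helpers differ only in the receiver of open, which is the point of the slide: nothing new to learn for the remote case.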
Why Open-uri
- Easy to use api
- VFS: not only http
open-uri and net/http
net/http has too many ways
- Net::HTTP.get_print
- Net::HTTP.get
- Net::HTTP.start {|h| h.get} etc
confuses users
open-uri has Fewer ways
- open(uri) {|f| }
- uri.open {|f| }
- uri.read
Saves the user’s memory, reuses the user’s knowledge
net/http: get and print
Net::HTTP.get_print(URI("http://host"))
print Net::HTTP.get(URI("http://host"))
open-uri: get and print
open("http://host") {|f| print f.read }
print URI("http://host")…
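For side-by-side reading, here is a rough sketch of the "get and print" variants as small methods. The method names are hypothetical, URI.open is the current-Ruby spelling of open-uri's open, and nothing is called, so no request is made.

```ruby
require 'net/http'
require 'open-uri'

# net/http, variant 1: fetch and print in one call.
def print_with_get_print(url)
  Net::HTTP.get_print(URI(url))
end

# net/http, variant 2: fetch the body as a String, then print it.
def print_with_get(url)
  print Net::HTTP.get(URI(url))
end

# open-uri, block form: the same shape as opening a file.
def print_with_open(url)
  URI.open(url) { |f| print f.read }
end

# open-uri, shortest form: URI#read is added to http URIs by open-uri.
def print_with_read(url)
  print URI(url).read
end
```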
get and print
net/http
- Net::HTTP.get_print print only
- Net::HTTP.get: good
open uri …
Why Easy?
open("http://host")
- No new construct
- Users don’t need to learn
open-uri respects user knowledge
net/http: headers
Net::HTTP.start("host") {|h| r = h.get … }
- No URI anymore
- No Net::HTTP.get anymore
- Net::HTTP.start, net.. and body used instead
open-uri headers
- Still URI
- still open method
- Easy to use
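A hedged sketch of the header case in both libraries. The method names and header values are invented for illustration; the calls themselves (Net::HTTP.start, Net::HTTP#get with a header hash, and string keys in open-uri's option hash) are standard library API. Nothing is invoked, so no request is sent.

```ruby
require 'net/http'
require 'open-uri'

# net/http: a session is started by host name, and the path and the
# header hash are passed separately to Net::HTTP#get.
def body_with_net_http(host, path, headers)
  Net::HTTP.start(host) do |http|
    response = http.get(path, headers)
    response.body
  end
end

# open-uri: the same open call on the same URL; string keys in the
# option hash are sent as request header fields.
def body_with_open_uri(url, headers)
  URI.open(url, headers) { |f| f.read }
end

# Made-up header values, only to show the shape of the hash.
example_headers = {
  'User-Agent'      => 'conference-notes-example',
  'Accept-Language' => 'en'
}
```

With open-uri a call would look like body_with_open_uri("http://host/", example_headers): still a URL, still open.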
net/http: SSL
- Different library: net/https
- Net::HTTP.new and Net::HTTP.start
- Different port
- Server verification…
open-uri: SSL
- Still URI
- Still open method
- Server verification by default
- No new library
- No new methods, few things to learn
net/http: proxy
- New method: Net::HTTP::Proxy
open-uri: proxy
% http_proxy=http://blah
% export http_proxy
- Conventional environment variable supported
- No new methods. A user might know this already
- Fewer things to learn
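To see the environment-variable convention without touching a real proxy, this small sketch calls URI#find_proxy, the standard-library lookup that reads the scheme_proxy variables. The hash stands in for ENV, and the proxy host and target address are made up.

```ruby
require 'uri'
require 'socket' # find_proxy resolves the host so it can skip loopback addresses

# Stand-in for ENV, so the example does not depend on the real environment.
fake_env = { 'http_proxy' => 'http://proxy.example:8080' }

# 192.0.2.1 is a documentation-only address (RFC 5737); nothing is contacted.
target = URI('http://192.0.2.1/')

# Returns a URI for the proxy, or nil when no proxy applies.
proxy = target.find_proxy(fake_env)
puts proxy

# With no variable set there is simply no proxy.
no_proxy = target.find_proxy({})
puts no_proxy.inspect
```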
net/http: basic auth
- New class: Net::HTTP::Get
- New method: Net::HTTP#request
open-uri: basic auth
- Still URI
- Still open method
- New option: :http_basic_authentication
- No new methods, few things to learn
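A sketch of the two basic-auth styles, under the assumption of a current Ruby (URI.open). Method names, URL and credentials are placeholders. Only the request object is built and inspected; no connection is opened.

```ruby
require 'net/http'
require 'open-uri'

# net/http: build a request object and attach the credentials to it.
def build_authenticated_get(url, user, password)
  request = Net::HTTP::Get.new(URI(url))
  request.basic_auth(user, password)
  request
end

# net/http: send the request object through Net::HTTP#request.
def fetch_with_net_http(url, user, password)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(build_authenticated_get(url, user, password)).body
  end
end

# open-uri: one extra option on the same open call.
def fetch_with_open_uri(url, user, password)
  URI.open(url, http_basic_authentication: [user, password]) { |f| f.read }
end

# Inspect the header that net/http would send (placeholder credentials).
request = build_authenticated_get('http://host/private', 'user', 'pass')
puts request['Authorization']
```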
How to design Easy-to-Use API
- Save brain power
- Evolve gradually
Fewer Things to Learn
- Fewer constructs for pragmatic usages
- Huffman coding
- DRY
- No configuration is good configuration
- Reuse user knowledge
- Infrastructure friendly
Fewer Constructs for Pragmatic Usages
- open vs Net::HTTP.get, Net::HTTP#get, etc.
- This is not minimalism
- The target of “fewer” is not all constructs
Pragmatic usages should be supported by small constructs.
Fewer Constructs(2)
Diagram.. link later hopefully
Frequently use convenience methods, rarely use many primitives
Ex. net/http and open-uri
Methods frequently used:
- net/http: Net::HTTP.start, Net::HTTP#get
- open-uri: open
open-uri’s fewer constructs support many more features
Huffman Coding
- Shorter for frequent things
- Longer for rare things
Optimize for frequent things.
So: longer methods for rarely used things, shorter methods for frequently used things
Ex. p
p obj
- Very frequently used
- Bad name in common sense
- Almost no problem because everyone knows
Ex. pp and y
- Bad name in common sense
- More problematic than p because not everyone knows them
Ex. to_s and to_str
- to_s shorter. frequently used.
- to_str longer, used internally
Ex. def
- def shorter, frequently used
- define_method longer. not encouraged
Ex time.rb
- Time.parse frequently used
- Time.strptime generic, needs to learn the format.
- Time.parse is less flexible but enough for most cases, and easy to learn
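The time.rb pair is easy to try. This sketch parses the same timestamp (the post date above, used only as sample input) with both methods; both results are in the local time zone.

```ruby
require 'time'

stamp = '2005-10-14 21:55:00'

# Short, frequent form: Time.parse guesses the layout.
guessed = Time.parse(stamp)

# Longer, general form: Time.strptime needs an explicit format string.
explicit = Time.strptime(stamp, '%Y-%m-%d %H:%M:%S')

# Both calls describe the same instant.
puts guessed == explicit
```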
Candidates for Huffman Coding
- Method name
- Other name
- Convenience method
- Language syntax
- etc
Length of Huffman Coding
- Number of characters
- Number of nodes in AST
- Editor keystrokes
- etc
Encourage Good Style
- Programmers like short code
- Short code should be designed as good style
DRY – Don’t repeat yourself
Violations are common
No Configuration is Good Configuration
Things should work well out of box.
- SSL CA certs
- http_proxy environment variable
Bad Examples
- ext/iconv/config.charset
- soap_use_proxy
- require “irb/completion”
- RUBYOPT=rubygems
Reuse User Knowledge
open-uri reuses user knowledge
- open is used to access an external resource
Reusable Knowledge
- Ruby builtin (popular) methods
- consistency
- Unix
- Standards: POSIX, RFC, etc
- Metaphor
Consistency
- bang methods
- each_with_index
- etc
Consistency violation:
- Time#utc is destructive
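A quick way to see the surprise: Time#utc has no bang, yet it converts the receiver in place, while Time#getutc returns a new object. The sample time and offset are arbitrary.

```ruby
local = Time.new(2005, 10, 14, 21, 55, 0, '+09:00')

# Non-destructive: returns a new Time in UTC, receiver untouched.
copy = local.getutc
before_utc = local.utc?

# Destructive despite the plain name: the receiver itself becomes UTC.
local.utc
after_utc = local.utc?

puts before_utc
puts after_utc
puts copy.hour
```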
Metaphor
- HTTP is a kind of network file system
- open-uri doesn’t support beyond file system: POST, etc
Infrastructure Friendly
- emacs, vi
- line oriented tools
- shell and file system
- web browser
Prefer
“It is easy using the legacy tool XXX” over “It is easy using the new tool YYY”
Evolve Gradually
- Adaptive Huffman coding
- How to find bad API
- How to avoid incompatibility
Adaptive Huffman Coding
What methods are used frequently?
- Long method name at first
- Alias to short name later
- Define convenience methods for idioms
Adaptive Huffman Coding(2)
- Short names and operators should be used carefully
- Use a long name if in doubt
- Alias is not a bad thing (TMTOWTDI)
- Primitives should have long names
- Define new method for idiom
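One way to read these points as code, using an invented Outline class (not from the talk): start with a long primitive name, add a short alias once usage shows it is frequent, and wrap a recurring idiom in a convenience method.

```ruby
# Hypothetical class, for illustration only.
class Outline
  attr_reader :items

  def initialize
    @items = []
  end

  # Step 1: the primitive gets a long, descriptive name.
  def append_item(text)
    @items << text
    self
  end

  # Step 2 (later, once the call proves frequent): a short alias.
  alias_method :<<, :append_item

  # Step 3: a convenience method for the "append several" idiom.
  def append_items(*texts)
    texts.each { |text| append_item(text) }
    self
  end
end

outline = Outline.new
outline << 'Who am I' << 'Why open-uri?'
outline.append_items('VFS', 'Summary')
puts outline.items.length
```

Adding the alias later breaks nobody: old callers keep append_item, new callers get the short form.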
Operators
- CGI#[] and CGI#params
- CGI was defined unsuitably.
- Hash#[]
- primitive: Hash#fetch
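The Hash half of this slide in a few lines: the short operator is the forgiving everyday form, the longer-named fetch is the strict primitive. The sample hash is made up.

```ruby
params = { 'name' => 'akr' }

# Operator form: short and frequent; a missing key quietly gives nil.
missing = params['lang']

# Primitive form: longer name; a missing key raises unless a default is given.
present   = params.fetch('name')
defaulted = params.fetch('lang', 'ruby')

error_class = begin
  params.fetch('lang')
  nil
rescue KeyError => e
  e.class
end

p missing, present, defaulted, error_class
```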
How to find bad api
- Repeated surprise
- Often cannot remember
Repeated Surprise
Example
- Time#utc is destructive
- Iconv.iconv returns an array
Often Cannot Remember
Manual is required again and again for same issue.
- RubyUnit
- optparse
Idiom
- Repeated code
- Violate DRY
- An idiom may be good or bad
Bad idiom example
- Iconv.iconv()[0]
How to Avoid Incompatibility
Extension without Incompatibility:
- New method
- New keyword argument
- New constants
Introducing new names has no compatibility problem (in most cases)
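A small sketch of extending without breaking, using a made-up render method: the second version adds a keyword argument whose default reproduces the old behaviour, so existing callers are unaffected.

```ruby
# Hypothetical API. Version 1 was: def render(title)
# Version 2 adds an optional keyword; the default keeps version 1 behaviour.
def render(title, underline: nil)
  lines = [title]
  lines << underline * title.length if underline
  lines.join("\n")
end

# Old call sites keep working unchanged.
puts render('Summary')

# New call sites opt in to the new feature.
puts render('Summary', underline: '=')
```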
Incompatible Change is a Bad Thing
But fixing bad API…
Incompatible Change
API Migration Example
- net/http: API version
- cgi: special implementation for transition period
net/http API version
Net::HTTP has two APIs
- Ruby 1.6 API version 1.1
- Ruby 1.8 API version 1.2
net/http: switch API version
- It is easy to forget to restore the API version
- Global switch, not thread safe
cgi: special implementation for a transition period
CGI#[] returns
- Ruby 1.6 an array of params
- Ruby 1.8: Transition period
- future? : a first parameter or nil
CGI#[] returns something tweaked on 1.8
Try to work as both Array and String
- Ruby 1.8.0 subclass of String
- Ruby 1.8.1 subclass of DelegateClass(String)
- Ruby 1.8.2 …
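To make "work as both Array and String" concrete, here is a toy version of the idea built on DelegateClass(String). This is not CGI's actual code; the class name and the choice of Array-style methods are assumptions made for the example.

```ruby
require 'delegate'

# Hypothetical transitional value: behaves like the first parameter
# (a String) while still answering a few Array-style calls.
class TransitionalValue < DelegateClass(String)
  def initialize(values)
    @values = values
    super(values.first.to_s) # String calls are delegated to the first value
  end

  # Array-style access kept for callers written against the old API.
  def to_a
    @values.dup
  end

  def first
    @values.first
  end
end

value = TransitionalValue.new(%w[ruby perl])
puts value.upcase      # String behaviour, via delegation
puts value.to_a.length # Array-style behaviour, for old callers
```

The cost is visible even in the toy: every Array method old code might use has to be added by hand, which fits the version-by-version tweaking listed above.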
fork: Warning after change
Does fork kill other threads in child process?
- Ruby 1.6: No
- Ruby 1.8: Yes
fork: warning after change
- Ruby 1.6: No warning
- Ruby 1.8.0: No warning
- Ruby 1.8.1: warning: fork terminates thread
- Ruby 1.8.2: No warning
IO#read: warning before change
IO#read will block even if O_NONBLOCK is set
- Ruby 1.8: doesn’t block
- Ruby 1.9: does block
- Ruby 1.8.2: No warning
- Ruby 1.8.3: Warning
- Ruby 1.9: No warning
Easy-to-Use vs Security
Easy-to-Use vs Security
- HTTP_PROXY
- http://user:pass@host/
- redirection and taint
- File.open(uri)
VFS: Virtual File System
Why VFS?
Typical simple program
- Load an external resource
- Process the resource
- Store the result
VFS eases the first step.
What is VFS
VFS provides
- open a http/ftp resource
- read a resource … …
VFS and polymorphism
The polymorphism can be implemented by
- usual method dispatch calls ….
Polymorphic open
If open-uri is in effect
- open("http://…") calls URI("http://…").open
- same for ftp etc
Any URI can be opened if the URI has open method
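The dispatch idea can be sketched in a few lines. vfs_open and InMemoryResource are invented names, and this is a simplified illustration of duck-typed dispatch, not open-uri's implementation. It runs against a local file and an in-memory object, so no network is involved.

```ruby
require 'open-uri'
require 'pathname'
require 'stringio'
require 'tempfile'

# Hypothetical dispatcher: anything with an open method can be opened.
def vfs_open(name, &block)
  # 1. Objects that already know how to open themselves (URI, Pathname, ...).
  return name.open(&block) if name.respond_to?(:open)

  # 2. Strings that parse as a URI whose class has an open method.
  uri = begin
    URI.parse(name)
  rescue URI::InvalidURIError
    nil
  end
  return uri.open(&block) if uri && uri.respond_to?(:open)

  # 3. Everything else is treated as a local path.
  File.open(name, &block)
end

# Hypothetical resource type that joins in just by defining open.
class InMemoryResource
  def initialize(text)
    @text = text
  end

  def open
    yield StringIO.new(@text)
  end
end

puts vfs_open(InMemoryResource.new('from memory')) { |f| f.read }

Tempfile.create('vfs-demo') do |tmp|
  tmp.write('from a file')
  tmp.flush
  puts vfs_open(tmp.path) { |f| f.read }
  puts vfs_open(Pathname.new(tmp.path)) { |f| f.read }
end
```

A new resource type such as LDAP would only need its own open method; the dispatcher itself would not change.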
Other Resources
LDAP
Other Operations
- URI().read
- Other operations should be defined in the future, to be polymorphic with Pathname
Security Considerations
- open(“|…”)
- File.open is not affected
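A cautious sketch of what these two bullets suggest in practice: File.open treats its argument purely as a path, so a leading "|" never starts a command, and a small guard can insist on http(s) before anything is handed to open-uri. read_remote is a hypothetical helper and is only exercised with inputs it rejects.

```ruby
require 'open-uri'

# File.open sees "|echo hello" as an odd file name, not as a command.
outcome = begin
  File.open('|echo hello') { |f| f.read }
  'opened'
rescue SystemCallError
  'not opened'
end
puts outcome

# Hypothetical guard: only http and https URLs reach open-uri.
def read_remote(name)
  uri = begin
    URI.parse(name)
  rescue URI::InvalidURIError
    nil
  end
  # URI::HTTPS is a subclass of URI::HTTP, so this covers both schemes.
  raise ArgumentError, "not an http(s) URL: #{name.inspect}" unless uri.is_a?(URI::HTTP)

  uri.open { |f| f.read }
end
```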
Summary
- How to design Easy-to-Use API
- Save brain power
- Evolve gradually
- VFS by open-uri
Q/A
Some guy writing a book: Should I teach Array.push or Array<<? Experts are going to use a condensed short form, but they'll need to use it. Do you have advice for people writing APIs on how to write code so it's easier to read?
Response: I think the API should lean towards teaching.
DHH: Are you going to do for writing what you’re doing for reading?
Response: POST should be supported in the future, but write.. eh.. not as useful.
AC: Warning would be more useful -not- at runtime..
Response: Inaudible

