Python’s Yield

This week, I’ve been playing with implementing an HTTP client in Python. Why Python? It seemed like a straightforward language for this sort of thing, and the fact that I don’t know it at all is a bonus learning opportunity! In any case, I make no claim to be any kind of authority on the language. I am certain there are better ways to do all of these things, but I don’t know them yet!

After a bit of googling and reading and copying and pasting, I ended up with the following methods.

    import re

    def getLine(s):
        # Read one byte at a time up to the next '\r', dropping any '\n'.
        line = ''
        for l in iter(lambda: s.recv(1), '\r'):
            if l != '\n':
                line += l
        return line

    def getHeader(s):
        # Yield header lines until the blank line that ends the header.
        for line in iter(lambda: getLine(s), ''):
            yield line

    def getContentLength(header):
        # Return the Content-Length value, or -1 if no such line is found.
        for line in header:
            if re.match(r"Content-Length: \d+$", line):
                return int(line[line.find(": ") + 2:])
        return -1

    def fetchlines(s):
        header = getHeader(s)
        contentLength = getContentLength(header)

    # other methods to get message


Assume fetchlines is the entry point to the above methods. First, we get header lines up to the first empty line; then we pull the content length from those lines. In later code (not shown), I get the body of the message based on the content length. But I had a strange problem. Here’s the header I was receiving:

HTTP/1.1 200 OK
Date: Wed, 26 Jan 2011 06:36:39 GMT
Server: Apache/2.2.4 (Unix) mod_ssl/2.2.4 OpenSSL/0.9.8a DAV/2 PHP/5.2.1
Last-Modified: Tue, 14 Dec 2010 17:23:19 GMT
ETag: "1104fb7-1ea9-4976213a4cfc0"
Accept-Ranges: bytes
Content-Length: 7849
Vary: Accept-Encoding
Content-Type: text/html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
...


I was expecting to pop everything down through the Content-Type line into header, then get the content length value in contentLength; then, socket reading being a one-way operation, the message reading code would start right in at the DOCTYPE line. Instead, I kept having a strange problem: every time I started reading the message body, I ended up at the Vary line instead! I wasn’t sure why: I was specifically looking to read down to the first blank line, and there was no blank line before Vary. I looked for hidden \r characters; nothing.

After some reading, I discovered the problem: yield. From looking at a few examples, I’d been under the impression that yield was essentially a nice, terse loop-returning-collection construct; i.e., it would run through, collect up all the results, and return them as a collection all by itself. I did remember hearing something about yield being *weird* in C#, but I hadn’t actually ever used it, and couldn’t remember what the problem was. Besides, this was Python, and it seemed to be working, except for this little issue. How magical.

Uh… no.

Turns out this is a case where it may have sorta looked like a duck, but it was something else entirely… Though I was treating the function’s return value like the list I believed it to be, getHeader wasn’t returning a list. It was returning a generator: an iterator over the body of a generator function. A generator function, when driven through that iterator, runs its body up to the yield, hands the yielded value back to whoever asked for the next item, saves its state, and just hangs out until you ask for the next item, at which point it resumes right after the yield.

So the loop in getHeader wasn’t actually run when the method was initially called. Instead, each pass through that loop happens once for each pass through getContentLength’s loop… and since getContentLength exits once it finds the content length, it never pulls the rest of the items from the iterator. The Vary line never gets read in getHeader, the blank-line condition is never even hit, and the Vary line is still waiting to be pulled when it’s time to read the message body. Not what we want!
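To see the pause-and-resume behavior in isolation, here is a minimal sketch that swaps the socket for an in-memory iterator of already-split lines. The names get_header, get_content_length, raw, and log are stand-ins invented for this illustration rather than the code from my client, and the header is trimmed down from the response above.

```python
def get_header(lines, log):
    # Generator function: nothing in this body runs until a consumer asks for an item.
    for line in iter(lambda: next(lines), ''):
        log.append(line)      # record each line actually pulled from the source
        yield line            # pause here until the consumer asks again


def get_content_length(header):
    # Stops iterating as soon as it finds what it wants.
    for line in header:
        if line.startswith('Content-Length: '):
            return int(line.split(': ', 1)[1])
    return -1


raw = iter([
    'HTTP/1.1 200 OK',
    'Content-Length: 7849',
    'Vary: Accept-Encoding',
    '',                       # blank line that ends the header
    '<body>',                 # stand-in for the start of the message body
])
log = []

header = get_header(raw, log)
pulled_at_call = list(log)    # still empty: calling get_header read nothing

length = get_content_length(header)
leftover = next(raw)          # what the body-reading code would see first

print(pulled_at_call)  # []
print(length)          # 7849
print(log)             # ['HTTP/1.1 200 OK', 'Content-Length: 7849']
print(leftover)        # Vary: Accept-Encoding
```

The generator only ever pulled two lines, because its consumer only ever asked for two. Everything after Content-Length is still sitting in the source.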

A quick fix? Take out the yield and build a regular list. It adds an extra line or two over the yield, but this way I’m sure the full header gets read before pulling the message body.

    def getHeader(s):
        # Build the whole list up front so every header line is read off the socket.
        header = []
        for line in iter(lambda: getLine(s), ''):
            header.append(line)
            print(line)
        return header
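Another option, if you would rather keep the generator, is probably to drain it with list() before doing anything else. The sketch below is not from my client: FakeSocket is a hypothetical stand-in whose recv() returns text (a real socket in Python 3 returns bytes), and get_line also stops when recv() returns an empty string, which my original getLine does not do.

```python
class FakeSocket(object):
    """Hypothetical stand-in for a connected socket; recv() returns text, not bytes."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def recv(self, n):
        chunk = self.data[self.pos:self.pos + n]
        self.pos += len(chunk)
        return chunk


def get_line(s):
    # Same idea as getLine: read up to the next '\r', dropping any '\n'.
    # Unlike the original, also stop if the data runs out ('' from recv).
    line = ''
    while True:
        ch = s.recv(1)
        if ch in ('\r', ''):
            return line
        if ch != '\n':
            line += ch


def get_header(s):
    # Still a generator: lines are produced lazily, one per request.
    for line in iter(lambda: get_line(s), ''):
        yield line


sock = FakeSocket(
    'HTTP/1.1 200 OK\r\n'
    'Content-Length: 5\r\n'
    'Vary: Accept-Encoding\r\n'
    '\r\n'
    'hello'
)

header = list(get_header(sock))   # list() runs the generator through to the blank line
remaining = sock.data[sock.pos:]  # what is left for the body-reading code

print(header)           # ['HTTP/1.1 200 OK', 'Content-Length: 5', 'Vary: Accept-Encoding']
print(repr(remaining))  # '\nhello'
```

Because list() keeps asking for items until the generator finishes, the whole header is consumed before anything else touches the socket. In this sketch the '\n' that closes the blank line is still unread afterwards, so the body-reading code would need to skip it.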


Lessons to take from this: 1) yield does NOT generate a list; calling a function that contains it gives you a generator, a lazy iterator, which is entirely different, and on a broader note, 2) copying and pasting code that you don’t understand can result in behavior you don’t understand, or even worse, behavior that you only think you understand!

