I’m Steve Atkins
an email developer
from Palo Alto

about me

I’m Welsh-Californian.
I code. I cook. I make.
I make the Internet a better place.

recent projects

emailstuff
Email standards tools
DkimCore
Simplified email authentication
Status updating…

Beef and Broccoli Stir-fry

- - posted in recipes

This is an excellent stir-fry, but does take a little prep.

Freeze 10-12 oz of sirloin for 45 minutes or so, to make it easier to slice. Cut it into ~1/8 inch strips.

Mix the white of one egg, 1 tbsp of corn starch, 1 tsp of baking soda and a pinch of salt. Toss the beef in it, making sure each piece is coated.

Let chemistry happen with the beef for 15-20 minutes. Meanwhile, prep some vegetables. A clove or two of garlic and about the same amount of ginger, cut fine. Half a yellow onion, cut into thin half-rounds. Several scallions – I used 2, but 4 or 5 might be better, cut into 2 inch lengths. Some red bell pepper, cut into strips the same size as everything else. And for some heat, half a dozen dried arbol chillis left mostly whole but with most of the seeds removed. And some broccoli florets, cut fairly small, about two or three cups.

For a sauce, mix 3 tbsp oyster sauce, a tbsp of mirin, a tbsp of rice wine vinegar and a tbsp of sesame oil.

In a wok with a cup and a half of oil deep fry the beef, probably in two batches of two or three minutes each. Use tongs or chopsticks to make sure the beef is separated. Remove to a plate.

Drain off and discard most of the oil. Add the ginger and garlic, stir for a few seconds. Add the chillis, shallots and onion. Let fry for a minute or two. Add the broccoli and stir-fry for a couple of minutes. Add the pepper and fry for another couple of minutes. Add the beef and stir in the sauce then let everything get comfortable for a couple of minutes.

Serve over rice.

Fixing OS X VMWare Client Integration 6.0

- -

Dyld Error Message:
  Library not loaded: /build/toolchain/mac32/openssl-1.0.1m/lib/libcrypto.1.0.1.dylib
  Referenced from: /Applications/VMware Client Integration Plug-in.app/Contents/Library/lib/libssl.1.0.1.dylib
  Reason: image not found


  for i in *.dylib ; do echo $i ;  otool -L $i | grep toolchain ; done


  sudo install_name_tool -change "/build/toolchain/mac32/openssl-1.0.1m/lib/libcrypto.1.0.1.dylib" "@loader_path/libcrypto.1.0.1.dylib" libssl.1.0.1.dylib

I Used to Be an Engineer

- -

Looking through archive.org I found an old web page of mine - it was archived in February 1999, but if I recall correctly was written three or four years before that.

And, hey, look! I used to be an engineer!

Adder Architectures

Disclaimer

HTML generated semi-automatically from an old email to the hotwired mailing list that was written very late at night. Don’t treat as gospel….

Notation

xi is bit i of the number x, in standard twos complement form.

  • · is logical and
  • + is logical or
  • ¤ is logical xor

Basics

We’re adding two n-bit numbers a & b, and a carry-in c0 to give an n-bit result z, and a carry-out cn

z = a plus b plus c0

The operands and result are of the form

a = { an-1, an-2, … a1, a0 }

and so on.

Logically addition is this:

For all 0 <= i < n
zi = ai ¤ bi ¤ ci
ci+1 = xi · yi + ci · (xi + yi) [1]

This is often reformulated in terms of propagate (p) and generate (g) signals:

gi = ai · bi [2]
pi = ai ¤ bi [3]

(The alternative pi = ai + bi is sometimes used)

(Often a kill signal (k) is used too: ki = not(ai + bi) )

Then the carry signals can be given as:

ci+1 = gi + ci · pi substitute [2] & [3] in [1] [4]

and

zi = pi ¤ ci

The expensive bit about addition is generating ci. Once all the ci are generated the zi can be generated in constant time (not a negligable time, but at least it’s constant with operand length).

Different adder architectures

Ripple Carry (RCA)

A ripple carry adder (RCA) implements [1] directly. You all understand ripple carry adders, so I’ll just say that they are cheap and slow.

The critical path in an RCA is from c0, a0 or b0 through all the full adders to zn-1 or cn.

Carry lookahead (CLA)

For a four bit adder we can expand [4] to give

c1 = g0 + c0·p0 [5]
c2 = g1 + g0·p1 + c0·p0·p1 [6]
c3 = g2 + g1·p2 + g0·p1p2 + c0·p0·p1·p2 [7]
c4 = g3 + g2·p3 + g1·p2·p3 + g0·p1·p2·p3 + c0·p0·p1·p2·p3 [8]

This is reasonable for 4 bits, but will get grotesquely large in terms of are and fan-out if taken much further.

One option is to divide the operands into groups of four bits, use carry-lookahead within each group and ripple the carries between groups.

Divide and Conquer

If we can improve on speed of rippling carries within each group, then surely we can improve on the speed of rippling carries between groups by a similar approach.

Consider Group0, consisting of (g0 - g3, p0 - p3)

Defining a `group generate’ g* and a `group propagate’ p*:

For Group0:

g*0 = g3 + g2·p3 + g1·p2·p3 + g0·p1·p2·p3
p*0 = p0·p1·p2·p3

and identically for all other groups.

Now

c4 = g80 + c0·p*0
c8 = g*1 + g*0·p*1 + c0·p*0·p*1
c12 = g*2 + g*1·p*2 + g*0·p*1·p*2 + c0·p*0·p*1·p*2
c16 = [deleted - it’s huge….]

These are identical to equations [5-8] used to generate carries within groups.

There’s no need to stop there.

The four-bit groups can be combined into sixteen-bit groups of groups.

Each can produce generate (g**) and propagate (p**) signals, combined as above to give c16, c32, c48 and c64.

This divide and conquer approach will eventually generate ci for the entire word width.

<This would be a lot clearer with diagrams>

For board level adders CLA is pretty good, ‘cos it can use standard MSI lookahead generator ICs for nearly everything.

It’s not bad in CMOS, but it’s not particularly good either. I’d use one of the following architectures in preference. The only exception might be in a cell-based design where an optimised lookahead cell is provided.

Carry Skip adder

“In VLSI technology the carry-skip adder is comparable in speed to the carry look-ahead technique (for commonly used word lengths) while it requires less chip area and consumes less power.”
– Computer Arithmetic Algorithms, Israel Koren

…and that’s why it’s the adder I’d use for a 32 bit system, and probably for a 64 bit system too.

Carry propagation can skip any stage for which pi = 1 (ie ai != bi). Several consecutive stages can be skipped if pi =1 for each stage.

A carry-skip adder is divided into groups of consecutive stages, with a simple ripple carry scheme in each group.

Each group generates a group propagate signal, p*i.

For Groupi, consisting of k stages j, j+1, … j+k-1

p*i = pj · pj+1 · … · pj+k-1

This is used to allow an incoming carry into the group to skip over all the stages in the group and generate a group carry out.

Groupi-Carryout = cj+k + p*i · Groupi-Carryin

(cj+k is the normal ripple carry out from the most significat stage in the group)

The critical path through a carry-skip adder is via ripple carry through, one of the groups, and via the skip carry chain through the remainder of the groups.

(Think about it. A carry coming out of a group via the ripple chain must have been generated within the group - if it was generated before the group, p*i would have been 1 and the stage would have been skipped. So the critical path will travel through only one group).

In a carry-skip adder the groups will not all be the same size. The optimal division of an n-bit adder into carry-skip groups depends on the characteristics of the target technology.

A 32 bit adder might have 10 groups of sizes {1, 2, 3, 4, 5, 6, 5, 3, 2, 1} for a typical technology.

VLSI implementations of carry-skip adders tend to be quite small - using the 32 bit adder given above the extra cost over an RCA is about 20 extra gates.

While a single level of carry-skip speeds things up a lot a second level of skip, skipping over more than one group can speed things up a little further for very little extra cost.

Carry Select adder

The reason carry propagation is slow in a ripple adder is because each stage needs to have ai, bi and ci available before it can calculate ci+i. One way of removing this dependency on ci is to calculate both ai + bi + 0, and ai + bi + 1, then choose the appropriate result when ci becomes available.

This is the basic trick of the carry select adder.

For 32bit operands the 32 stage adder could consist of four 8 bit groups.

Group0 is purely an 8bit ripple carry adder, with c0 as it’s carry input. The sum output from this RCA goes directly to the adder output.

Group1 has two 8bit ripple cary adders, one with a Cin of 0, the other with a Cin of 1. The sum outputs from these two RCAs go to a 2:1 multiplexor controlled by the Cout of Group0. The output of the mux goes to the adder output.

Group2 has two 8bit RCAs, the same as group1. The sum outputs of this go to two 2:1 muxes. One is controlled by the Cout of the Group1 chain with a Cin of 0, so the mux output is what the sum would be if the Cout of Group0 were 0. The other is controlled by the Cout of the Group1 chain with a Cin of 1, giving the sum if the Cout of Group1 were 1.

These two mux outputs are fed to another 2:1 mux, controlled by the Cout of Group0, to select the correct sum to send to the adder output.

Group3 is similar. The sums from the two RCAs go through two 2:1 muxes controled by the two Couts of the two RCAs in Group2, giving two values, one if the Cout of Group1 is 1, one if it is 0.

These two values are passed to another pair of 2:1 muxes, controlled by the two Couts of Group1, giving two values, one if the Cout of Group0 is 1, the other if it is 0.

The correct one of these two values is selected by a final 2:1 controlled by the Cout of Group0.

<This is far too complicated for ASCII art at this time of night. If there’s any interest I might HTMLify it>

Carry select adders tend to be slightly faster than skip adders, particularly for wide operands. They will typically consume a little over twice the area of a ripple carry adder. The layout for a select adder tends to be very regular, which can be a big advantage for datapath compilers/tilers.

One of the DEC Alphas uses a slight variant on this approach. There’s a paper in Vol 27, No 11, November 1992 of the IEEE Journal of Solid-State Circuits giving a few paragraphs about the adder, and a nice diagram.

Prefix adders

These are bizarre to think about, but very powerful, particularly for longer word lengths.

It’s a little mathematical in places….

Define an operator ‘o’:

(g,p) o (g’,p’) = (g + (p · g’), p · p’)

Define Gi & Pi:

(G1,P1) = (g1,p1)
(Gi,Pi) = (gi,pi) o (Gi-1,Pi-1) 2 <= i <= n

Then

ci = Gi for 1 <= i <= n [1]

There’s a fairly easy, but not too interesting, inductive proof of [1].

Next, ‘o’ is associative, ie

(g1,p1) o ( (g2,p2) o (g3,p3) )
= ( (g1,p1) o (g2,p2) ) o (g3,p3) for all (gi,pi) [2]

Again I’ll miss out the proof - it’s just an ‘expand both sides and notice they are identical’.

So to find ci it suffices to calculate

(Gi,Pi) = (gi,p1) o (gi-1,pi-1) o ··· o (g1,p1)

and by [2] this can be calculated in any order.

The Brent-Kung architecture

(The original reference is Brent & Kung, IEEE Transactions on Computers, Vol C-31,No 3, March 1982)

First consider the simple problem of calculating just cn, for n=16. Since ‘o’ is associative (G16,P16) can be generated by a binary tree:

Each wire in the diagram carries a pair of signals (g,p). (gi,pi) are fed in at the base.

Each node ‘O’ performs this operation:

(gin,pin) o (g’in,p’in)
            |
            |
            O
            |\
            | \
            |  \
            |   \
(gin,pin) (g’in,p’in)

(G16,P16)
                 |
                 |
                 O
                 |\____
                 |     \______
                 |             \________
                 |                      \
                 O                       O
                 |\                      |\
                 | \__                   | \__
                 |    \__                |    \__
                 |       \__             |       \__
                 |          \            |          \
                 O           O           O           O
                 |\_         |\_         |\_         |\_
                 |  \_       |  \_       |  \_       |  \_
                 |    \      |    \      |    \      |    \
                 O     O     O     O     O     O     O     O
                 |\    |\    |\    |\    |\    |\    |\    |\  
                 | \   | \   | \   | \   | \   | \   | \   | \
                 |  \  |  \  |  \  |  \  |  \  |  \  |  \  |  \  
                 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
                 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
                15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0

(gi,pi)

This will give (G16,p16) and hence c16. To generate c15 downto c1 another similar tree is required:

ci

                16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1
                 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
                 |  O  |  O  |  O  |  O  |  O  |  O  |  O  |  |
                 |  |\ |  |\ |  |\ |  |\ |  |\ |  |\ |  |\ |  |
                 |  | \|  | \|  | \|  | \|  | \|  | \|  | \|  |
                 |  |  O  |  |  |  O  |  |  |  O  |  |  |  |  |
                 |  |  |\_|  |  |  |\_|  |  |  |\_|  |  |  |  |
                 |  |  |  \_ |  |  |  \_ |  |  |  \_ |  |  |  |
                 |  |  |  | \|  |  |  | \|  |  |  | \|  |  |  |
                 |  |  |  |  O  |  |  |  |  |  |  |  |  |  |  |
                 |  |  |  |  |\_|  |  |  |  |  |  |  |  |  |  |
                 |  |  |  |  |  \__|  |  |  |  |  |  |  |  |  |
                 |  |  |  |  |  |  \__|  |  |  |  |  |  |  |  |
                 |  |  |  |  |  |  |  \  |  |  |  |  |  |  |  |
                 |  |  |  |  |  |  |  |\ |  |  |  |  |  |  |  |
                 |  |  |  |  |  |  |  | \|  |  |  |  |  |  |  |
                 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
                 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
                 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
                 O  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
                 |\____|  |  |  |  |  |  |  |  |  |  |  |  |  |
                 |  |  \_______ |  |  |  |  |  |  |  |  |  |  |
                 |  |  |  |  | \________ |  |  |  |  |  |  |  |
                 |  |  |  |  |  |  |  | \|  |  |  |  |  |  |  |
                 O  |  |  |  |  |  |  |  O  |  |  |  |  |  |  |
                 |\ |  |  |  |  |  |  |  |\ |  |  |  |  |  |  |
                 | \__ |  |  |  |  |  |  | \__ |  |  |  |  |  |
                 |  | \__ |  |  |  |  |  |  | \__ |  |  |  |  |
                 |  |  | \__ |  |  |  |  |  |  | \__ |  |  |  |
                 |  |  |  | \|  |  |  |  |  |  |  | \|  |  |  |
                 O  |  |  |  O  |  |  |  O  |  |  |  O  |  |  |
                 |\_|  |  |  |\_|  |  |  |\_|  |  |  |\_|  |  |
                 |  \_ |  |  |  \_ |  |  |  \_ |  |  |  \_ |  |
                 |  | \|  |  |  | \|  |  |  | \|  |  |  | \|  |
                 O  |  O  |  O  |  O  |  O  |  O  |  O  |  O  |
                 |\ |  |\ |  |\ |  |\ |  |\ |  |\ |  |\ |  |\ |
                 | \|  | \|  | \|  | \|  | \|  | \|  | \|  | \|
                 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  
                 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
                 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
                15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0

(gi,pi)

There are many varying architectures for prefix adders, driven by speed/area/layout complexity tradeoffs.

Technical Book Giveaway

- - posted in books, giveaway

I have a bunch of technical books that I’m giving away or recycling in the next week or so.

They’re mostly at this goodreads page but there’s also an elderly hardcopy set of Linux man pages and some 7.0 era Adobe product manuals, in the unlikely case you’re interested in such a thing.

There’s quite a lot of software development books – XML, XSLT, Windows development, Javascript, CSS, Flash, Flex, Apollo, C++, WebObjects, Perl, Python, SOAP, Erlang, HTML, Mathematica, Linux Home Automation, MFC, tcl/Tk, Java, Qt, Winsock, ATL, ActiveX, ColdFusion, Visual J++, Access.

Some system administration books – Linux Network Administrators Guide, VOIP, system security, Windows Networking.

And some more academic texts – Data Mining (a current, excellent, book that’s a duplicate), Optoelectronics, Engineering Mathematics, Quantum Physics, Neural Networks, Calculus, Vector Analysis. Some of them are old (1950s) but still relevant (and the “revised” editions are still published as standard texts).

If you want anything drop me a line (steve@blighty.com) in the next week or so and I’ll put it to one side, if we’re likely to run into each other in the next couple of months. If you’re really excited by something I’d consider mailing it. Everything else is getting recycled.

Thai Sweet Chili Sauce

- - posted in recipe

  • Two jalapeños, halved and seeded
  • 4 large garlic cloves, peeled
  • ½ cup brown sugar
  • ¼ cup white vinegar
  • ¼ cup water
  • 2 tbsp dry sherry
  • 2 tbsp thai fish sauce
  • 1 tbsp corn starch
  • 2 tbsp water

Put everything but the last two ingredients in a blender and puree until the solids are all pretty small.

Bring the mixture to a boil in a small pan over medium-high heat. Reduce to a simmer and cook for about three minutes.

Make a cornstarch slurry, whisk it in to the hot mixture and simmer it for another minute.

That’s it. It makes a very nice sweet chili sauce. I used green jalapeños so it’s not the traditional red colour – I might try it with red jalapeños or serranos next time.

Mixing three or four parts of this sauce with one part of soy sauce makes a pretty good savoury sauce for pot stickers or chicken strips or spring rolls.

Malaysian-ish Bok Choi

- - posted in recipe

  • ½ large white onion, sliced
  • 1 – 1½ baby bok choi, sliced
  • 1 14oz can coconut milk
  • 2 cloves garlic, finely chopped
  • 1 tbsp sambal oelek or other chile paste

Bring the coconut milk to a simmer and stir in the garlic and chile paste, add the onion and the bok choi stems and simmer for 15 minutes. Add the bok choi leaves, season and simmer for another 2 minutes. Great with rice.

Accomplishment

- - posted in pebble, teatime

I’ve spent a couple of days playing with my new Pebble, and I’ve gone from “I vaguely remember C, I guess” to “I have a finished app, in the Pebble app store”.

It’s been very satisfying to complete something from start to finish, something that seldom happens in my professional life.

So what did I build? It’s an app that lets you select a type of tea from a list, tells you how much leaf tea you should use and what temperature of water, then runs a countdown timer so that you brew it for the right length of time. When your tea is ready, it vibrates.

Pebble-the-company are announcing something at CES tomorrow, probably their new 2.0 OS and it’s associated app store ecosystem – so I decided to sign up for an app store account and publish it. They’re geared up for commercial developers, so they required some marketing banners and other collateral. Yay for Photoshop, Illustrator and some stock icons.

The app store isn’t live yet, but the whole thing is available from my github page.

Pebble Watch

- - posted in pebble

I ordered a Pebble as a late Xmas present for myself.

It’s a nice bit of hardware – eInk display, 80MHz ARM CPU, three axis accelerometer and bluetooth to connect to a phone, all in less than 50 grammes.

Out of the box it has some fancy watch faces and the ability to display notifications from apps running on the phone, so inbound SMS or facebook updates display on the watch.

But it also has a fairly decent SDK that lets you develop native apps to run on the watch in C, and talk to either native phone apps or portable javascript apps hosted by the phone, to give a configuration UI and to get access to geolocation and web services.

Half an hour in I have “hello world” running on it. This might be interesting – though it’s been years since I’ve written more than a few lines of C.

Rice Pudding

- - posted in recipe

  • 1.5 oz pudding rice
  • 2 tbsp brown sugar
  • 20 fl oz milk (2% or full)
  • 1 tbsp butter
  • nutmeg

This recipe comes from the 1958 edition of Good Housekeeping’s Basic Cookery In Pictures.

If you can’t get pudding rice then other short grain rices might work – arborio or a short-grain japanese rice.

Put the rice, sugar and milk in a broad pyrex or ceramic bowl or pie dish, dot the surface with shavings of butter and grate some nutmeg over it.

Bake at 300F for about two hours.