Brendan Long

Email address timing attacks

Today, I found an interesting article about password timing attacks. The basic point is that if you check a password one character at a time, the amount of time it takes to recieve a "bad password" response tells you how many characters you got right.

As an example, imagine a computer that takes 1 ms to check each character of a password (and does everything else instantaneous, and has no other threads), using this function to validate passwords:

def compare_strings(expected, actual):
    for c1, c2 in zip(expected, actual):
        if c1 != c2:
            return False
    return True

If the attacker guesses a completely wrong password, it will take 1 ms. Once they get one character right, it takes 2 ms. Two correct characters takes 3 ms, and so on.

What this means is that once the attacker detects the time increasing, they know that the current character is correct and can move to the next one. It's basically a real-life Hollywood hacker situation — where an attacker can watch as each character of the password is broken.

The huge security problem is that the difficulty of breaking a password increases linearly with its length, rather than exponentially like we want. With a 7 character password made of up all lowercase letters, it would only take a maximum of 208 guesses (26 * 7) to figure out the password if it was vulnerable to this attack, while a non-vulnerable system would need about 200 billion guesses (267).

The defense against this attack is actually pretty simple — just check the entire string, even after you know that that it's wrong:

def compare_strings(expected, actual):
    valid = 0
    for c1, c2 in zip(expected, actual):
        # Note: The 'and' operator short-circuits, which we don't want
        valid += c1 == c2
    return valid == 0

The good news is that string comparisons aren't generally a large part of your processing time, so it takes thousands of measurements to determine if there's any difference. The bad news is that even with a thousand measurements for each character, 208,000 is a much smaller number than 200 billion.

The other good news is that people who know what they're doing are comparing hashes, not passwords, so we're only leaking information about the hash. It's still dangerous though, since the attacker can try passwords faster than they're supposed to (if I know the hash starts with "5", I can hash all of the passwords in my dictionary, and only try ones where the hash starts with "5" too).


Anyway, this got me thinking: What about user names? In any website where usability is important, user names are email addresses (or at least, email addresses can be used to log in). But, user names are even more vulnerable to an attack like this. Most people would consider it low priority, since all you could learn is that someone registered for a website using a particular email address, but it leaks private information, so it may be important in some cases.

The reason there's a timing problem is that it's wasteful to hash a password if I know that a user named doesn't exist, so most sites would skip hashing the password if they don't need to:

def login(username, password):
    user = get_user_from_db(username)
    if user is None:
        return False
    return check_password(user.password, password)

If your password hashing algorithm is any good, this login function will spend significantly more time attempting a login for a valid email address, so an attacker could determine email address validity with as little as two (fast) requests.

The method I've though of to handle this is meant to do two things:

  • Make user name validity uncorrelated with time spent validating a login.
  • Don't waste server time hashing a password if you already know the user doesn't exist.

The obvious solution is to guess how long hashing the password will take and do an asyncronous wait until that point before returning an invalid response, but that's would be pretty difficult to get right, and would depend on way too many variables (like how fast your computer is and the load).

The idea I had is.. If making all of the times the same is too hard, why not make them all different? My solution is to wait a random amount of time, based on the user name:

def login(username, password):
    user = get_user_from_db(username)
    valid = user is not None and check_password(user.password, password)
    some_number = hash(username)
    time.sleep(some_number)
    return valid  

Just design your hash function so a user name is hashed into a single number in a range significantly longer than the amount of time it takes to hash a password. The server doesn't have to work, since it's just sleeping (realistically, you want to do an asynchronous wait so the thread keeps working), and from an attacker's perspective, the amount of time it takes to check a login has nothing to do with if the user name is valid.

Example: Checking a password takes 10 ms, and your user interface guidelines say you must return a response within 500 ms. Set up your hash function to return a random number between 0 and 490 ms. If a response takes 100 ms, is the email address valid? At that point, it's a lot less work, and just about as accurate for an attacker to just guess.

The reason we generate a hash of the user name instead of just picking a random number is that with a random delay, you could take a large number of samples and do a regression test to determine if the difference between validating two different user names is statistically significant. With this method, the difference between any two user names' validation times is statistically significant, whether or not either of them is valid. A large number of samples would just tell you what the delay's range is.

There is the downside that at the very bottom and top of the range, the attacker gets valid information — if the response takes less than 10 ms, then it's definitely an invalid user. If it took more than 490 ms, it probably is a valid user (not certainly though, since there could be other delays).

I figure this system is better than nothing, but I'm definitely interested if anyone can think of something even better.

EDIT: Apparently most OpenID providers are vulnerable to timing attacks on their signatures. and apparently an older version of OpenSSL could actually leak certain private keys. So, this is very relevant if you're crazy enough to write your own security code.

Posted

1632, Eric Flint, and the Baen Free Library

I recently moved this blog over to Posterous, since I realized that I was putting off posting things because my old custom blog software was too complicated. Now that I have that settled, I suddenly don't have much to talk about, and I realized it's because I've been spending so much time reading recently.

Since I haven't done anything else, I figured I might as well start talking about the books I've been reading. At the very least, it should get me back in the habit of writing something.


The Baen Free Library is a collection of ebooks from Baen Books, designed to prove that marketing is more important than minor losses from piracy. The way it works is that Baen authors are encouraged to put the first book or two of each series they write on the free library. Anyone can download them, and once they're hooked, they'll buy the rest of the series, and maybe try the authors' other books.

I think it's brilliant, and I had found a book I liked on it previously, but I've been busy and already have a stack of books I haven't even started, so I haven't used it much.

A couple of weeks ago, I found out that the library was created by Eric Flint — one of Baen Book's authors, so I figured I should see if he's actually any good. It turns out he also wrote (or co-wrote) the Ring of Fire series, which has been unusually open to fanfiction.

Anyway, I highly recommend 1632, and as many of its sequels as you can find time for (the books are very long). The story is about a small town in Virginia being transported through time and space to Germany in 1632 (so technically, the Holy Roman Empire), and the problems caused by advanced technology and knowledge in a small group of people with very few people and not much manufacturing capacity.

For example, having advanced technology is great, but how do you keep it working when you don't have replacement parts? And what if you don't have the parts needed to make the replacement parts? What good are guns if diseases do the real damage in war? And what happens when you throw a thriving democracy in the middle of an area with feudal lords?

It's a great read, and it's free, so why not? The only downside is that by the time you're done, you'll know way too much about Europe in the early modern era.

Posted

Advanced Makefiling

This weekend, my roommate asked me to make a website for him. I wrote it in PHP so I could just make some templates and he could import them, but it bothered me that I was rendering a static website with PHP. The easiest solution seemed to be running every file though PHP before uploading the site, so I needed a Makefile.

The Makefile

PROJECT := example
PACKAGE := $(PROJECT).tar.gz

# Get all content/*.php and convert to ./*.html
HTML_FILES := $(patsubst content/%.php, %.html, $(wildcard content/*.php))

# Get all js/*.js and convert to js/*-min.js
JAVASCRIPT_FILES := $(patsubst %.js, %-min.js, $(filter-out %-min.js, $(wildcard js/*.js)))

# Get all css/*.css and convert to *-min.css
CSS_FILES := $(patsubst %.css, %-min.css, $(filter-out %-min.css, $(wildcard css/*.css)))

RED := \033[31m
GREEN := \033[32m
CLEAR := \033[0m

.PHONY: all
.PHONY: clean
.PHONY: dist

all: $(HTML_FILES) $(JAVASCRIPT_FILES) $(CSS_FILES) ;

clean:
        rm -f $(HTML_FILES) $(JAVASCRIPT_FILES) $(CSS_FILES) $(PACKAGE)

dist: $(PACKAGE) ;

$(PACKAGE): $(HTML_FILES) $(JAVASCRIPT_FILES) $(CSS_FILES)
        @tar -caf $(PACKAGE) $(HTML_FILES)
        @echo -e "Built package $(RED)$@$(CLEAR)"

%.html: index.php content/%.php
        @echo -e "Generating $(GREEN)$@$(CLEAR) from $<"
        @php index.php --page=$(<:%.php=%) > $@
        @tidy -m -utf8 -q $@

%-min.js: %.js
        @echo -e "Generating $(GREEN)$@$(CLEAR) from $<"
        @yuicompressor -o $@ $<

%-min.css: %.css
        @echo -e "Generating $(GREEN)$@$(CLEAR) from $<"
        @yuicompressor -o $@ $<

There's quite a bit in this file that was new to me, so I figured it might be helpful for my loyal readers to explain it.

Implicit Rules

Implicit rules are any Make file that involves a wildcard. The wildcard in this case is '%'. For example, this rule will be used any time we need to compile .c files into .o files:

%.o : %.c
        gcc -o $@ $<

Notice that it reads like a normal rule, just using a wildcard: "When you need any .o file, create it from a .c file with the same name". Also notice the automatic variables, $@ and $<. $@ is the target (the file to be created) and $< is the name of the first target (the file to make it out of). There are also other automatic variables, but these were the most useful to me.

Functions

GNU Make comes with several built-in functions. You call them in this form:

$(function_name, param1, param2, etc)

The text functions were the ones I used most, like subst and patsubst (terribly named), which are variations on find/replace, and filter and filter-out (better names). I could imagine the others being useful too (strip, sort, etc.). There are also several other kinds of function whichs seem useful (see the link).

.PHONY targets

The way Make works is by checking the target file's timestamp and rebuilding if it's old. This becomes a problem if you don't actually create the target file — for example, with the all, clean and dist targets. You don't create an "all" file, it's just a name. Make's way of dealing with this is phony targets, which basically tells it "the target isn't actually a file", so it doesn't check that file, it just always trys to build. This has two advantages:

  • It's very slightly faster (because it's not checking for a file that should never exist).
  • It's slightly safer (because if that file every does exist, it'll still work right.

Using a phony target looks like this:

.PHONY : all
.PHONY : clean
.PHONY : dist

Empty targets

This is a minor one, but if you want an empty target, just end it with a semi-colon (;):

all: $(DEPENDS) ;

And that's all for today's show. Tune in next week.

Posted

Why I love NumPy

Doing linear algebra homework reminds me why I love NumPy so much. Instead of doing a 2-page homework assignment and then finding out at the end that something is wrong, I can check at every step. For example, to reduce a matrix A to the identity, using only elementary matrices:

#!/usr/bin/env python2
from itertools import chain
import numpy as np

# Given by the assignment
a = np.matrix([
        [-1, 1, 1],
        [3, 1, 0],
        [-2, 1, 1]
])

# Add to e as I find them
e = []

e.append(np.matrix([
        [1, 0, -1],
        [0, 1, 0],
        [0, 0, 1]
]))

e.append(np.matrix([
        [1, 0, 0],
        [0, 1, 1],
        [0, 0, 1]
]))

e.append(np.matrix([
        [1, 0, 0],
        [-1, 1, 0],
        [0, 0, 1]
]))

e.append(np.matrix([
        [1, 0, 0],
        [0, 1, 0],
        [2, 0, 1]
]))

e.append(np.matrix([
        [1, 0, 0],
        [0, 1, -2],
        [0, 0, 1]
]))

e.append(np.matrix([
        [1, 0, 0],
        [0, 0, 1],
        [0, 1, 0]
]))

e.append(np.matrix([
        [1, 0, 0],
        [0, 1, 1],
        [0, 0, 1]
]))

e.append(np.matrix([
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, -1]
]))

# e_n * e_n-1 * ... * e_0 * a
# Could have used the multiplication operator, but writing
# the lambda was faster than looking it up.
print(np.vectorize(int)(reduce(lambda x, y: x * y, chain(reversed(e), [a]))))

Just run this after every step and I can be sure that my calculations are correct.

Also, what's not to love about that last line?

Posted

Now with SSL goodness

Note: This post was about my previously self-hosted blog, so any references to "this site" mean my old Linode. I thought it was interesting enough to keep.

I set up an SSL certificate for this site today and it was surprisingly easy.

Last time I wanted a free certificate, it was a huge pain. This time it was actually easy. A Google search for "ssl certificates stackoverflow" lead me to the Cheapest web certificates question, which has StartCom's StartSSL listed.

Once I was there, it was relatively simple to set up a certificate. Their validation just consists of sending an email to the owner of the domain, which was convenient enough. They have a way to generate private keys for you, using a hardware random number generator, but it didn't work for me, so I just used OpenSSL:

openssl genrsa -des3 -out brendanlong.key 2048
openssl req -new -key brendanlong.com -out req.csr
# type in a bunch of info

Anyway, once I had a certificate signing request, I pasted it into the site and got a signed certificate.

After that, I needed to make Nginx serve this site using it. Nginx apparently doesn't support certificate chaining, but this site gave a solution to how to serve the certificate and the intermediates: just concatenate them. I did that, then put them on the server, and added this to my nginx.conf:

ssl  on;
ssl_certificate  /etc/nginx/ssl/brendanlong.crt;
ssl_certificate_key  /etc/nginx/ssl/brendanlong.key;

And now you can access this site at https://www.brendanlong.com/.

I also had to update my Google JavaScript and CSS references to use https to avoid the Firefox warning, "This site is partially encrypted".

I'm considering making this the default, since it costs me nothing, but there's also nothing sensitive on this site — the only information that's ever sent is emails (and my email address is posted) and color settings. With that in mind, the main thing holding me back is the extra page load time. This site loads incredibly fast on everything from a normal browser to a phone, but SSL requires an extra two round-trips, which would make it significantly slower on some connections.

Posted

Fixing a clock with NTP

My room-mate's computer's clock has been broken since we first installed Linux on it. It was never a big deal, but it's been annoying me for a long time.

Recently, my laptop started to have a similar problem. A friend had used Windows on it, and so the clock was set to local time instead of UTC. Fixing that finally gave me the inspiration to fix my room-mate's computer as well.

I had NTP running, so after about 11 minutes of being on, my laptop's time would fix itself. The problem was that the time wasn't being saved. Apparently Arch Linux added a hwclock daemon since I first installed it, so I just needed to add that to my daemons array and the problem was solved.

Fixing that on the room-mate's computer didn't solve the problem though, and further investigation revealed that the clock's drivers weren't working. His motherboard has poor Linux support, so it was surprising, and unfortunately there's no much I can do about that. What I can do is use NTP.

Setting up NTP put his computer in the same situation as mine: The clock was off until NTP fixed it, which sometimes took a long time. Luckily, there's a program to set the date immediately, called ntpdate. The only problem was that I couldn't just put it in rc.local; it needed to run after the network was up.

NetworkManager has a dispatcher, which will run all of the scripts in a directory (/etc/NetworkManager/dispatcher.d) whenever a network interface starts or stops. Given that, it was easy to set up a solution:

#!/bin/bash
# File: /etc/NetworkManager/dispatcher.d/ntp

# This file tells us if the script has already
# been run. It must be in tmpfs, so it will be
# deleted when the computer shuts down.
tmpfile=/dev/shm/ntp-sync

servers="0.us.pool.ntp.org \
         1.us.pool.ntp.org \
         2.us.pool.ntp.org \
         3.us.pool.ntp.org"

# The first time a network interface goes up,
# update the time using NTP.
if [ "$2" = "up" ]; then
        if [ ! -f $tmpfile ]; then
                ntpdate $servers
                touch $tmpfile
        fi
fi

With this, every time his machine starts up, the time is off, and then about 30 seconds later it's correct (about the amount of time it takes to type your password on the login screen).

Problem solved.. ish.

Posted

How to find palendromes in Python

In school this week, the idea of finding palindromes in Python came up, as a homework assignment in a class I took last semester. A friend asked me for some guidance in how to do this in Python, so I showed him my "simple" solution:

#!/usr/bin/env python2
import operator
import string
import sys

words = " ".join(sys.argv[1:]) if len(sys.argv) > 1 else sys.argv[0]

result = reduce(lambda x, y: x and y, map(lambda x, y: x == y, *map(lambda x: (x[:len(x)/2], x[:(len(x) - 1) / 2:-1]), (reduce(operator.add, map(lambda x: x if x.isalpha() else "", words)),))[0]))

print(words)
print(result)

See, it's only a single line.. and so clear too.

If you happen to be one of those crazy people who uses functions besides map and reduce, it could be shortened to this:

#!/usr/bin/env python2
import operator
import string
import sys

words = " ".join(sys.argv[1:]) if len(sys.argv) > 1 else sys.argv[0]

result = all(map(operator.eq, *map(lambda x: (x[:len(x)/2], x[:(len(x) - 1) / 2:-1]), ("".join(filter(str.isalpha, words)),))[0]))

print(words)
print(result)

Of course, there's also the super-lame solution, which doesn't use map or reduce at all (this is the version I actually turned in last semester):

#! /usr/bin/python
# Note: This program works in Python 2.7 and Python 3.1
#       (and possibly other versions)
# Brendan Long
try:
        # Python 2 - zip returns a list, izip returns a generator
        from itertools import izip as zip
except ImportError:
        # Python 3 - zip returns a generator, no need to change
        pass
import sys

def pal(in_string):
        """"
        check whether in_string is a palindrome
        return True if so, False otherwise
        """
        
        # Take the first half of the string and the second half
        # reversed.
        # If the string has odd length, we just drop the middle
        # character because it doesn't matter.
        forward        = in_string[:len(in_string)//2]
        backward = in_string[:(len(in_string) - 1)//2:-1]
        
        # If the beginning iterator and ending iterator always match
        # up, the string is a palendrome.
        return all((begin == end
                     for begin, end
                     in zip(forward, backward)))


def clean(in_string):
        """
        remove non letters from in_string
        turn uppercase letters into lower case
        return result
        """
        return ''.join(filter(str.isalpha, in_string)).lower()

if __name__ == "__main__":
        # read arguments from command line
        input = " ".join(sys.argv[1:])
        print("%s: %s" % (input, pal(clean(input))))
Posted