Transactional memory #
The limitations of locking #
- Locks force tradeoffs between:
- Degree of concurrency (performance)
 - Chance of races, deadlock (correctness)
 
 - Coarse grained locking: lowers concurrency, higher chance of correctness
- e.g. single lock for the data structure
 
 - Fine-grained locking: high concurrency, lower chance of correctness
- e.g. hand-over-hand locking
 
 - Better synchronization abstractions?
 
Review: ensuring atomicity via locks #
void deposit(Acct account, int amount) {
    lock(account.lock);
    int tmp = bank.get(account);
    tmp += amount;
    bank.put(account, tmp);
    unlock(account.lock);
}
- Deposit: Read-modify-right operation; want deposit to be atomic with respect to other bank operations on the account
- Locks help ensure this atomicity by ensuring mutual exclusion on the account
 
 
Programming with transactions #
- What if we could declare a section of code atomic?
 
void deposit(Acct account, int amount) {
    atomic {
        int tmp = bank.get(account);
        tmp += amount;
        bank.put(account, tmp);
    }
}
- Atomic construct is declarative: programmer states what to do (maintain code atomicity), not how to do it
- System could implement this in multiple ways - using locks, for example
 - Optimistic concurrency: maintain serialization only in situations of true contention (R-W or W-W conflicts)
 
 
Declarative vs. imperative abstractions #
- Declarative: programmer defines what should be done (e.g. execute 1000 independent tasks)
 - Imperative: programmer states how it should be done (e.g. spawn N worker threads, assign work to threads via removing work from shared task queue)
 
Transactional memory (TM) semantics #
- Memory transaction
- Atomic and isolated sequence of mem. accesses
 - Inspired by database transaction
 
 - Atomicity: all or nothing
- Upon transaction commit, all memory writes in transaction take effect as one
 - On transaction aborts, none of the writes appear to take effect, as if transaction never happened
 
 - Isolation
- No other processor can observe writes before transaction commits
 
 - Serializability
- Transactions appear to commit in single serial order
 - However: no exact order of commits guaranteed by semantics of transaction
 
 
Motivating example: Java HashMap #
Map: key -> value, implemented as hash table with linked list per bucket
Sequential implementation #
public Object get(Object key) {
    int idx = hash(key);
    HashEntry e = buckets[idx];
    while (e != null) {
        if (equals(key, e.key)) {
            return e.value;
        }
        e = e.next;
    }
    return null;
}
- Bad: not thread safe (when sync. needed)
 - Good: no lock overhead when sync. not needed
 
Java 1.4 solution: synchronized layer #
public Object get(Object key) {
    synchronized (myHashMap) {
        return myHashMap.get(key);
    }
}
- Convert any map to thread-safe variant
 - Use explicit, coarse-grained mutual locking specified by programmer
 - Good: thread-safe, easy implementation
 - Bad: limits concurrency, poor stability
 - A better option: finer-grained locking (e.g. lock per bucket)
- Thread-safe, but incurs lock overhead even if sync. not needed
 
 
Transactional HashMap #
- Simply enclose all operations in atomic block
public Object get(Object key) { atomic { return m.get(key); } } - Good: thread-safe, easy to program
 - Performance and scalability: depends on workload and implementation of atomic (to be discussed)
 
Another motivation: failure atomicity #
Initially:
void transfer (A, B, amount) {
    synchronized(bank) {
        try {
            withdraw(A, amount);
            deposit(B, amount);
        } catch (exception1) {
            rollback(code1);
        } catch (exception2) {
            rollback(code2);
        }
    }
}
- Complexity of manually catching exceptions:
- Programmer has to specify edge cases, can miss some
 - Undo code may be tricky to implement partially
 
 
With transactions:
void transfer (A, B, amount) {
    atomic(bank) {
        withdraw(A, amount);
        deposit(B, amount);
    }
}
- System now responsible for processing exceptions
- All exceptions (except those managed explicitly by programmer)
 - Transaction aborted and memory updates are undone
 
 
Another motivation: composability #
void transfer(A, B, amount) {
    synchronized(A) {
        synchronized(B) {
            withdraw(A, amount);
            withdraw(B, amount);
        }
    }
}
- Deadlocks when A, B try to transfer to each other
 - Composing lock-based code can be tricky: requires system-wide policy to get correct, but these can break software modularity
 
With transactions:
void transfer(A, B, amount) {
    atomic(bank) {
        withdraw(A, amount);
        deposit(B, amount);
    }
}
- Transactions theoretically compose gracefully
- Programmer declares global intent: atomic execution of transfer; no need to know about global implementation strategy
 
 - Transaction in 
transfersubsumes any defined inwithdrawanddeposit- System manages concurrency as well as possible: serialization when needed (contention), concurrenct otherwise
 
 
The difference between atomicity and locks #
- Atomic: high-level declaration of atomicity
- Does not specify the implementation
 
 - Lock: low-level blocking primitive
- Does not provide atomicity or isolation on its own
 - Can be used to implement atomic block, but can be used for purposes beyond atomicity
- Cannot replace all locks with atomic regions
 
 - Atomicity eliminates many data races, but programming with atomic blocks can still suffer from atomicity violations
 
 
Implementing transactional memory #
- TM systems must provide atomicity and isolation
- All writes must take effect all at once
 - All or nothing: failure leads to no write taking effect
 - No reads observed before all writes commit
 
 
Data versioning policy #
- Manage uncommitted (new) and previously committed (old) versions of data for concurrent transactions
 - Eager versioning: undo-log based
- Update memory immediately, maintain “undo log” in case of abort
 - Good: faster commit (data already in memory)
 - Bad: slower abort, fault tolerance issues (consider crash in middle of transaction)
 - Need to check something else is not reading/writing in middle of the eager versioning update block
 
 - Lazy versioning: write-buffer based
- Log memory updates in transaction write buffer, flush buffer on failure
 - Update actual memory location on commit
 - Good: faster abort (just clear log), no fault tolerance issues
 - Bad: slower commits
 
 
Conflict detection policy #
- Must detect and handle conflicts between transactions
- Read-write conflict: transaction A reads address X, which was written to by pending (but not yet committed) transaction B
 - Write-write conflict: transactions A and B are both pending, and both write to address X
 
 - System must track a transaction’s read set and write set
- Read-set: addresses read during the transaction
 - Write-set: addresses written during the transaction
 
 - Pessimistic detection: check for conflicts immediately during loads and stores
- Contention manager decides to stall or abort transaction on detection of conflict
 - Good: detect conflicts early -> faster and less work on abort
 - Bad: no forward progress guarantees, more aborts in some cases
 - Bad: fine-grained communication: check on each load/store
 - Bad: detection on critical path
 
 - Optimistic detection: detects conflicts when transaction attempts to commit
- On conflict, give priority to committing transaction; other transactions may abort later on
 - Good: forward progress guarantees
 - Good: bulk communication and conflict detection
 - Bad: detects conflicts late, can still have fairness problems