The generic and AESNI implementations used different conventions
regarding counter wrapping in GCM. The generic code was based on
function block128_inc_be, for which the counter is a 128-bit value.
Whereas the AESNI code used intrinsic function _mm_add_epi64, and
therefore wrapping at 2^64.
In NIST.SP.800-38d the GCM specification mandates to use incrementing
function inc32, wrapping after 2^32 blocks. This commit changes both
generic and AESNI implementations to align to the specification and
adds a test vector specially crafted to start encryption with IV block
0xfffffffffffffffffffffffffffffffe.