Bit Shifting using primitive byte in Java

I want to decode a document encoded with variable-byte encoding. Here the continuation bit is 1 (not 0 as usual). For each byte I read, I check whether it is greater than or equal to 128:

  • YES (>= 128): I shift the previous value left by 7 bits and add the value of the 7 least significant bits.
  • NO (< 128): I shift the previous value left by 7 bits and add the byte's value. The number is complete.
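The two rules above can be sketched for a single number. This is a minimal sketch that assumes the input holds exactly one encoded number; the class and method names (`VByteSingle`, `decodeOne`) are hypothetical. For example, the bytes 129 233 121 give 1, then 1 * 128 + 105 = 233, then 233 * 128 + 121 = 29945.

```java
// Hypothetical helper: decodes ONE number encoded with the
// "continuation bit = 1" variable-byte scheme described above.
final class VByteSingle {
    static int decodeOne(byte[] bytes) {
        int n = 0;
        for (byte b : bytes) {
            int u = b & 0xff;              // unsigned value 0..255
            n = (n << 7) + (u & 0x7f);     // shift previous value by 7, add low 7 bits
            if (u < 0x80) {
                return n;                  // continuation bit clear: number is complete
            }
        }
        throw new IllegalArgumentException("input ended before a terminating byte");
    }
}
```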

I tried to implement this in Java. This is the result:

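static int[] decodeRawDoc(byte[] rawDoc, int[] document) {
    int k = 0;   // next write position in document
    int n = 0;   // value accumulated so far for the current number
    int a = 0;
    for (byte b : rawDoc) {
        if ((b & 0xff) >= 128) {
            // continuation bit set: keep only the low 7 bits of b
            a = (b << 25);              // b is sign-extended to int; bits 0-6 move to bits 25-31
            n = n * 128 + (a >>> 25);   // shift them back down, filling with zeros
        } else {
            // continuation bit clear: b is in 0..127 and can be added directly
            int num = n * 128 + b;
            document[k] = num;
            k++;
            n = 0;
        }
    }
    return document;
}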
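To see what the `<< 25` / `>>> 25` pair does: Java promotes `b` to a sign-extended 32-bit `int` before shifting (so there is no byte-only shift), `<< 25` pushes every bit above bit 6 out of the word, and `>>> 25` brings bits 0 to 6 back with zero fill. The sketch below (the class name `ShiftCheck` is hypothetical) checks over all 256 byte values that this is the same as masking with `0x7f`.

```java
// Hypothetical check: the asker's shift pair versus a 7-bit mask.
final class ShiftCheck {
    // Returns true if (b << 25) >>> 25 equals b & 0x7f for every byte value.
    static boolean shiftEqualsMask() {
        for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; i++) {
            byte b = (byte) i;
            int viaShift = (b << 25) >>> 25;   // asker's approach
            int viaMask = b & 0x7f;            // keep only the low 7 bits
            if (viaShift != viaMask) {
                return false;
            }
        }
        return true;
    }
}
```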

It works, but I don't like the >>>25 shifting part. Is there a more elegant way to do the same thing, preferably using bytes?

INPUT (byte[] rawDoc, bytes shown as unsigned values):

129 233 121 130 254 18 129 134 58 142 79 170 87 11 129 233 121 130 254 18 129 134 58 130 133 126 131 52 131 97 185 1 131 22 19 131 22 130 32 53 134 1 130 109 137 58 131 52 136 99 142 68 188 104 147 30 86 129 204 9 135 96 130 109 130 225 99 135 96 135 14 159 68 142 7 129 111 131 97 133 174 215 85 137 80 131 97 159 105 130 254 18 141 105 134 229 75 2 6 129 21 129 33 129 159 56 132 5 130 139 44 140 137 162 51 2 140 138 128 70 2 36 129 21 129 33 140 138 232 46 2 133 132 92 2 21 129 21 129 33 56 129 21 129 33 45 129 21 129 33 133 56 129 233 121 130 254 18 129 134 58 142 79 80 142 79 57 19 132 80 19 148 126 19 134 107 19 131 32 2 19 16 130 2 134 107 133 66 133 66 2 141 100 43 129 233 121 130 254 18 129 134 58 148 204 254 3 2

EXPECTED OUTPUT (int[] document):

29945 48914 17210 1871 5463 11 29945 48914 17210 33534 436 481 7297 406 19 406 288 53 769 365 1210 436 1123 1860 7784 2462 86 26121 992 365 45283 992 910 4036 1799 239 481 11250645 1232 481 4073 48914 1769 111307 2 6 149 161 20408 517 34220 25317683 2 25329734 2 36 149 161 25343022 2 82524 2 21 149 161 56 149 161 45 149 161 696 29945 48914 17210 1871 80 1871 57 19 592 19 2686 19 875 19 416 2 19 16 258 875 706 706 2 1764 43 29945 48914 17210 43204355 2

OUTPUT with n = n * 128 + (b + 0xFF); in the first branch (does not match the expected output):

2126969 2145938 2114234 18127 21719 11 2126969 2145938 2114234 2130558 16692 16737 23553 16662 19 16662 16544 53 17025 16621 17466 16692 17379 18116 24040 18718 86 2123145 17248 16621 2142307 17248 17166 20292 18055 16495 16737 279685973 17488 16737 20329 2145938 18025 2208331 2 6 16405 16417 2117432 16773 2131244 293753011 2 293765062 2 36 16405 16417 293778350 2 2179548 2 21 16405 16417 56 16405 16417 45 16405 16417 16952 2126969 2145938 2114234 18127 80 18127 57 19 16848 19 18942 19 17131 19 16672 2 19 16 16514 17131 16962 16962 2 18020 43 2126969 2145938 2114234 311639683 2

OUTPUT with if ((b & 0xff) >= 0x80) n = (n * 128) + (b + 0x7f); (does not match the expected output):

13433 32402 698 1743 5335 11 13433 32402 698 17022 308 353 7169 278 19 278 160 53 641 237 1082 308 995 1732 7656 2334 86 9609 864 237 28771 864 782 3908 1671 111 353 9136981 1104 353 3945 32402 1641 94795 2 6 21 33 3896 389 17708 23204019 2 23216070 2 36 21 33 23229358 2 66012 2 21 21 33 56 21 33 45 21 33 568 13433 32402 698 1743 80 1743 57 19 464 19 2558 19 747 19 288 2 19 16 130 747 578 578 2 1636 43 13433 32402 698 41090691 2
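Why the two attempts are off: a byte with the high bit set holds `b = u - 256`, where `u` is the unsigned value, so the wanted low 7 bits are `u - 128 = b + 128`. `b + 0xFF` gives `u - 1` (127 too many per continuation byte) and `b + 0x7f` gives `u - 129` (one too few). For the first number, 29945 + 2097024 = 2126969 and 29945 - 16512 = 13433, which matches the two outputs above. The sketch below (hypothetical class `OffsetCheck`) computes the per-byte error of each expression.

```java
// Hypothetical check of the two attempted expressions against b & 0x7f.
final class OffsetCheck {
    // Error of (b + 0xFF) relative to the correct low 7 bits.
    static int errorPlusFF(byte b) {
        return (b + 0xFF) - (b & 0x7f);
    }

    // Error of (b + 0x7f) relative to the correct low 7 bits.
    static int errorPlus7F(byte b) {
        return (b + 0x7f) - (b & 0x7f);
    }

    // For bytes with the high bit set (negative in Java), b + 0x80 is exactly the low 7 bits.
    static boolean plus80MatchesMask() {
        for (int i = Byte.MIN_VALUE; i < 0; i++) {
            byte b = (byte) i;
            if (b + 0x80 != (b & 0x7f)) {
                return false;
            }
        }
        return true;
    }
}
```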

Jon Skeet

It sounds like you really just want to mask the bottom seven bits, which is most simply done using &:

if ((b & 0xff) >= 0x80) {
    n = (n << 7) + (b & 0x7f);
}
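Put into a complete decoder, the masking line might look like the sketch below. It is an assumption-laden variant, not the asker's or answerer's exact code: the class and method names are hypothetical, both branches are folded into one because `b & 0x7f == b` whenever the high bit is clear, and the method sizes its own result instead of filling a caller-supplied `document` array.

```java
import java.util.Arrays;

// Hypothetical decoder for the "continuation bit = 1" variable-byte scheme.
final class VByteDecoder {
    static int[] decode(byte[] rawDoc) {
        int[] out = new int[rawDoc.length];   // at most one number per byte
        int k = 0;
        int n = 0;
        for (byte b : rawDoc) {
            n = (n << 7) + (b & 0x7f);        // append the low 7 bits
            if ((b & 0xff) < 0x80) {          // continuation bit clear: number ends here
                out[k++] = n;
                n = 0;
            }
        }
        // Trailing continuation bytes without a terminating byte are ignored.
        return Arrays.copyOf(out, k);
    }
}
```

On the first 14 input bytes this yields 29945 48914 17210 1871 5463 11, the start of the expected output. Values needing more than 31 bits would overflow `int`; a `long` accumulator would be needed for those.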

I've changed everything to either use hex or shifting, as I believe that's clearer.
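Switching from `n * 128` to `n << 7` does not change the result: for Java `int`, both wrap modulo 2^32, so they agree for every value. And because the low 7 bits of `n << 7` are always zero, `+ (b & 0x7f)` could equally be written `| (b & 0x7f)`. The sketch below (hypothetical class `ShiftVsMultiply`) checks both claims for a handful of sample values.

```java
// Hypothetical check: shift vs. multiply, and + vs. | for the appended 7 bits.
final class ShiftVsMultiply {
    static boolean sameForSamples(int[] samples) {
        for (int n : samples) {
            if ((n << 7) != n * 128) {
                return false;                  // shift and multiply disagree
            }
            for (int low = 0; low < 0x80; low++) {
                // The low 7 bits of n << 7 are zero, so adding cannot carry.
                if (((n << 7) + low) != ((n << 7) | low)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```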


See more on this question at Stackoverflow