Is it better to use integer filter ranges or array|vector comparisons for explicit option set compositions?

One of my current open source projects utilizes different crypto-libs.

One of them is OpenSSL. A peculiarity of OpenSSL's normal encrypt / decrypt API call is the options field, which basically just defines how the output should be returned.

While writing the documentation for all the properties / fields, I realized that instead of writing

valid_options = [OPTION_1, OPTION_2, OPTION_3, OPTION_1 | OPTION_2, ....]; 
if (!valid_options.in(passedOption)) {
 complain|return
}

I could just use

option_lower_boundary = 0
option_upper_boundary = OPTION_1 | OPTION_2 | OPTION_3

if (!option.between(option_lower_boundary, option_upper_boundary)) {
 complain|return
}

because they use disjunctions (bitwise OR) for the option parameter, and the individual flags are the binary values 1, 2, 4, ....

Any combination of them therefore lies between 0 and 7, so instead of writing out every possible combination, anything within this range is a valid input.
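
The two checks from the question, plus a mask-based variant, can be sketched in JavaScript as follows. This is only an illustration under assumptions: the flag values mirror OPTION_1..OPTION_3 from the pseudocode, and `makeValidators` and its method names are hypothetical. It also shows the one case where the range check and the explicit list stop agreeing, namely when the flags do not occupy contiguous low bits.

```javascript
// Hypothetical flag values, mirroring OPTION_1..OPTION_3 from the question.
const OPTION_1 = 1;
const OPTION_2 = 2;
const OPTION_3 = 4;

// Build the three validators for an arbitrary set of single-bit flags.
function makeValidators(flags) {
  const all = flags.reduce((acc, flag) => acc | flag, 0);

  // Explicit list: every subset of the flags, OR-ed together.
  const list = [];
  for (let subset = 0; subset < 2 ** flags.length; subset++) {
    let value = 0;
    flags.forEach((flag, i) => {
      if (subset & (1 << i)) value |= flag;
    });
    list.push(value);
  }

  const inBounds = (v) => Number.isInteger(v) && v >= 0 && v <= all;

  return {
    byList: (v) => list.includes(v),
    byRange: (v) => inBounds(v),
    // Mask check: in bounds and no bit set outside the known flags.
    byMask: (v) => inBounds(v) && (v & all) === v,
  };
}

const contiguous = makeValidators([OPTION_1, OPTION_2, OPTION_3]);
const withGap = makeValidators([1, 2, 8]); // bit 2 (value 4) is unused

console.log(contiguous.byRange(5), contiguous.byList(5)); // both accept 5
console.log(withGap.byRange(4), withGap.byMask(4)); // range accepts 4, mask rejects it
```

As long as the flags are 1, 2, 4 with no gaps, all three validators accept exactly the same values, so the range check is a fair shorthand for the explicit list.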

I obviously think this is way more elegant ... but that's just my perspective.

So what do you think? Should I be explicit or is using the mathematical concept of ranges better?

To some degree it's the "every pure function can be exchanged with a table" principle: the trade-off between "this can be between 0 and 7" and "is it 0, 1, 2, 3, 4, 5, 6 or 7", which is basically the difference between boundary checks and explicit value checks.

But the classic problem is implicit bias: just because it makes sense to me, should I leave it like this? I wrote the principle in the documentation and linked the check in the code base to the example of how it works. That is enough for me ... but is it?

Another question is about the documentation: I personally tend to write example code in my documentation because I want to write for beginners.

So I have docblocks above properties containing a larger documentation + examples + links to different sources which explain the words used. Do you think that's a good idea?

Why or why not? :)

thx :) for any feedback

Well, I think you are hitting a caught-between-worlds problem. In low-level languages, like C, one often uses bit-fields in order to communicate options in a lean way. One integer usually is the most performant variable type, and using bitfields it can hold quite a lot of (boolean) information. Passing it around and copying it is fast and it can easily live on the stack. For low-level devs, bit-fields are a common appearance and people know how to handle them.

Since I cannot argue for PHP, because I have not used it for a long time, let me discuss JS at least. JavaScript is, by all means, not low-level. Everything lives on the heap and it is good practice to do (slow) string comparisons in order to have very verbose source code, which is easy on the eyes of developers and designers alike. It is a web language, made for animation and visual improvement, not performance.

Hence I think, even though in OpenSSL, a C-based library, bit-fields are a common thing, in JavaScript, using objects which hold the options in a verbose way is more natural. I'd actually argue that in JS, you don't get a big performance or memory benefit from bit-fields, and they might feel alien to other JS devs.

That's why

/**
 * Holds options
 * @type {Object.<string, boolean>}
 */
const options = {
  opt1: true,
  opt2: false,
  opt3: true,
};

would be my preferred way to go. Try to get performance from algorithms in JS, not from memory layouts. JS engines usually are optimized to handle objects anyway. If you really need that kind of optimization, go for WebAssembly or (in case of NodeJS) FFI bindings.

I cannot change the interface. In the end I can derive explicit states in my implementation as an option set, but that automatically increases the complexity of maintenance as well as the complexity of the checks.

Can you wrap the API, though? With an options object, you wouldn't need to do any checks:

let bitfield = 0;

bitfield |= options.opt1 && OPTION1_BIT;
bitfield |= options.opt2 && OPTION2_BIT;
bitfield |= options.opt3 && OPTION3_BIT;

// or

const optArr = [];

options.opt1 && optArr.push(OPTION1);
options.opt2 && optArr.push(OPTION2);
options.opt3 && optArr.push(OPTION3);

Whether a boundary check in general is a valid decision over an explicit check for every case.

using an array does not need a boundary check, but would need to be checked against valid options. With an integer, checking bounds would limit the passed integer to the range of bits which are actually used. However, it would also mean that you have to change this very check in every place you use it once you introduce a new option. That might become quite a maintenance burden, or at least means some globally accessible function, which does the check...
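
One way to keep that maintenance burden small is to derive the check from a single flag table, so that adding an option touches one place only. The sketch below assumes exactly that; `FLAGS`, `isValidBitfield` and `toBitfield` are hypothetical names, and the values are placeholders rather than OpenSSL's real constants.

```javascript
// Hypothetical flag table; names and values are placeholders, not OpenSSL's.
const FLAGS = Object.freeze({ opt1: 0x01, opt2: 0x02, opt3: 0x04 });

// Derived once, so adding an entry to FLAGS updates every check automatically.
const ALL_FLAGS = Object.values(FLAGS).reduce((acc, bit) => acc | bit, 0);

// The single place that decides whether an integer is a valid option set.
function isValidBitfield(value) {
  return Number.isInteger(value) && value >= 0 && value <= ALL_FLAGS
    && (value & ALL_FLAGS) === value;
}

// Pack a verbose options object into the integer the low-level API expects.
function toBitfield(options) {
  let bitfield = 0;
  for (const [name, enabled] of Object.entries(options)) {
    if (!Object.prototype.hasOwnProperty.call(FLAGS, name)) {
      throw new RangeError(`Unknown option: ${name}`);
    }
    if (enabled) bitfield |= FLAGS[name];
  }
  return bitfield;
}

console.log(toBitfield({ opt1: true, opt2: false, opt3: true })); // 5
console.log(isValidBitfield(8)); // false
```

Callers that already hold an integer go through `isValidBitfield`; callers that prefer the verbose form go through `toBitfield`, which cannot produce an invalid value in the first place.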

About the way I commented/documented things within the source code:

So I have docblocks above properties containing a larger documentation + examples + links to different sources which explain the words used. Do you think that's a good idea?

I think it's a bit utopian to document everything in such detail. Will beginners even work with the code base? Is it worth spending so much money on documenting principles best left to a classroom? Wouldn't a "how-to-use" be enough?

In my opinion, code with a lot of documentation and samples is great for onboarding and new programmers, however, since I work for a company, I have a tight schedule, and I usually expect to work with professionals who either know common patterns and algorithms or are motivated (at least by money) to ask me or google themselves (given the pattern name). So I tend to keep documentation short, explain a few steps in algorithms, if not clear, but leave out extensive documentation, examples and links. At the moment, I have a team member, who does not know some patterns, so I implement certain pieces of code for them and just tell them how to use the API and where they can get more in-depth knowledge to fill the gap. It's not ideal, but we get the job done in a short amount of time :)

Can you wrap the API, though? With an options object, you wouldn't need to do any checks

I could; that was kind of my question, whether someone could show me a real benefit. Currently I don't see the benefit of changing it into an options object. But you gave me an idea for a more maintainable setup.

using an array does not need a boundary check, but would need to be checked against valid options. With an integer, checking bounds would limit the passed integer to the range of bits which are actually used. However, it would also mean that you have to change this very check in every place you use it once you introduce a new option. That might become quite a maintenance burden, or at least means some globally accessible function, which does the check...

It's a trait in my case, so yes, there is a single point for the check; otherwise the options object would be the more maintainable choice.

I think it's a bit utopian to document everything in such detail. Will beginners even work with the code base? Is it worth spending so much money on documenting principles best left to a classroom? Wouldn't a "how-to-use" be enough?

It's open source, and cryptography is its own domain, so I wanted to add a starting point for every possible configuration setting:

/**
     * @see https://security.stackexchange.com/questions/136180/tls-1-2-and-enable-only-aead-ciphers-suite-list
     * @see https://crypto.stackexchange.com/questions/27243/what-is-the-advantage-of-aead-ciphers
     * @see https://en.wikipedia.org/wiki/Authenticated_encryption
     *
     * used in AEAD cipher mode (GCM or CCM).
     *
     * @var string (optional)
     */

so they can start from there. In my contract work or at companies I tend not to do this in such detail for things outside my concrete implementation. But for the frameworks I contributed to, I had to write documentation for the users (developers). But yes, it may be overkill, and hopefully at least 1 dev will read it and be happy about it gg

anyhow :) thx for the great feedback :)

Sebastian I just saw I missed your answer :) you guys need to tag me otherwise I don't get a notification :) sorry :)

Is just using a parameter for each option a possibility? Or using a configuration object? Probably openssl does not work that way, but I'd consider using that internally and packing all the bits into openssl format at the last step.

Then the check is nicely encapsulated, and if you give it a meaningful name so that it's immediately clear what it does, then I feel you can choose the brief, performant option. Asymptotically that's what you'll have to do anyway: you don't want 1024 conditions if there are 10 options, unless many combinations are disallowed.
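
To make the growth concrete, here is a small sketch that generates the explicit list from the flags instead of writing it by hand. `allCombinations` is a hypothetical helper, and the flags are assumed to be distinct single bits.

```javascript
// Enumerate every OR-combination of the given single-bit flags.
function allCombinations(flags) {
  let combos = [0];
  for (const flag of flags) {
    // Each new flag doubles the list: every existing combo with and without it.
    combos = combos.concat(combos.map((c) => c | flag));
  }
  return combos;
}

const three = allCombinations([1, 2, 4]);
const ten = allCombinations(Array.from({ length: 10 }, (_, i) => 1 << i));

console.log(three.length); // 8
console.log(ten.length); // 1024
```

With three options the explicit list is still readable; with ten it is only manageable when generated, at which point the range or mask check says the same thing in one line.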

Yes, they just use the bit pattern: the classic low-level mathematician/informatics approach, an easy state representation and very efficient.

:) from a mathematical point of view as well on the amount of operations, I agree. This is one of the most efficient ways to represent all possible combinations without actually writing them.

Still, in this specific case there are only 2³ possible combinations, which could actually be represented explicitly; the check is extracted and 'well documented'.

I just want to see if someone finds a good reason not to do it like this, or whether I overlooked something :)

thx for your feedback :)

What you're talking about here are basically "flags" -- and bitwise operations are ALWAYS fastest for flags since you can do comparisons in one operation instead of many.

You want to know if a 32 bit number is 0..7, you just do "flags & 0xFFFFFFF8" and if the result is non-zero, it's out of range. Just like checking for a single value, lets say your third flag (0x04) you just "flags & 0x00000004" and if it's non-zero it's set.
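
In JavaScript those two tests might look like the sketch below. The function names are hypothetical, and it assumes the value is already a 32 bit integer, because JavaScript's bitwise operators coerce their operands to 32 bit integers (a fraction such as 3.5 would be truncated before the test).

```javascript
const OUT_OF_RANGE_MASK = 0xFFFFFFF8; // every bit except the lowest three
const THIRD_FLAG = 0x00000004;

// True when any bit outside the range 0..7 is set.
function isOutOfRange(flags) {
  return (flags & OUT_OF_RANGE_MASK) !== 0;
}

// True when the third flag (0x04) is set.
function hasThirdFlag(flags) {
  return (flags & THIRD_FLAG) !== 0;
}

console.log(isOutOfRange(7), isOutOfRange(8)); // false true
console.log(hasThirdFlag(5), hasThirdFlag(3)); // true false
```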

That's why you "or" them together. Admittedly you're limited to whatever your largest integer bit-size is for a variable, but overall it is the fastest approach 'under the hood'.
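
A short JavaScript sketch of composing flags with "or", together with the width limit mentioned above. The flag names are hypothetical; the 32 bit behaviour of the shift operator is how JavaScript defines it.

```javascript
// Hypothetical single-bit flags.
const READ = 0x01;
const WRITE = 0x02;
const EXEC = 0x04;

const setFlag = (flags, flag) => flags | flag; // turn a bit on
const clearFlag = (flags, flag) => flags & ~flag; // turn a bit off
const toggleFlag = (flags, flag) => flags ^ flag; // flip a bit

let flags = READ | WRITE; // combine two flags: 3
flags = setFlag(flags, EXEC); // 7
flags = clearFlag(flags, WRITE); // 5
flags = toggleFlag(flags, READ); // 4
console.log(flags); // 4

// Bitwise operators work on 32 bit integers, so bit 31 is the sign bit
// and shift counts wrap around at 32.
console.log(1 << 31); // -2147483648
console.log(1 << 32); // 1
```

In practice that leaves 31 comfortably usable flag bits per number before the sign bit starts to get in the way of range comparisons.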

Comparing to an array is SLOW. MORE so in interpreted languages with loose typecasting. Whilst in C it ends up relatively simple and fast since you just shift a couple times to make the offset match the field size, you get into JavaScript and you don't have real arrays, you have painfully slow pointered lists.

So even ranges are often better in that regard, since a single range check is ALWAYS just two comparisons, whilst iterating through a flat array can take anywhere from one comparison to as many as there are elements in the array. You have 200 array elements, that's possibly as much as 200 compares.

... which is where a flattened binary tree (a sorted array searched by bisection) starts to shine, since then your number of comparisons is the same as the number of bits needed to store the number of elements. You have 256 elements to compare against, a flattened binary tree would never take more than nine compares.
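
The difference can be measured rather than assumed. The sketch below counts probes (visits to an element, each of which may cost more than one machine-level comparison) for a linear scan and for a bisection over a sorted array; the function names and the test data are made up for illustration.

```javascript
// Count how many elements a linear scan visits before finding the target.
function linearProbes(sorted, target) {
  let probes = 0;
  for (const value of sorted) {
    probes++;
    if (value === target) break;
  }
  return probes;
}

// Count how many elements a binary search visits before finding the target.
function binaryProbes(sorted, target) {
  let lo = 0;
  let hi = sorted.length - 1;
  let probes = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    probes++;
    if (sorted[mid] === target) break;
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return probes;
}

// 256 sorted, distinct values.
const values = Array.from({ length: 256 }, (_, i) => i * 3);

const worstLinear = Math.max(...values.map((v) => linearProbes(values, v)));
const worstBinary = Math.max(...values.map((v) => binaryProbes(values, v)));

console.log(worstLinear, worstBinary); // 256 9
```

For 256 elements the worst case is nine probes, one more than log2(256), because the last remaining candidate still has to be looked at.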

It's really hard to say more, though, since I've no idea of your usage scenario, WHAT options you are setting, etc. Are these actually flags, or are they values? If they're flags, and you have 32 or fewer of them, store them as a 32 bit integer and use "and" and "or" to work on them.

... and I like turtles.

Cool ;D you basically did the same as Marco Alka, just the other way around. I am aware of this, mainly because my girlfriend is, besides a multitude of other skills, a mathematician as well, and for a while we did mathematical proofs over breakfast for fun. That makes you appreciate math, and state representation by shifting the base, a lot more, and I agree with you as wholeheartedly as with him.

Personally I think using flags is one of the most elegant and simple ways to represent switches as well as filters.

And this is where we can get into an endless loop of wanted simplicity vs needed complexity from a reader's perspective. I personally don't think there is a valid answer to that, just constraints and requirements.

I love your and Marco's answers in general, but so far Mark is the only one who actually answered me.

Although to be fair, I tend to talk too much and should structure my questions in a clearer way. The headline was answered perfectly by you both, which is awesome.

Thx, for clarifying that those are indeed flags and explaining why they are one of the most efficient ways to represent and check state.

I'm not sure I did the same -- or reverse -- as Marco Alka since he's using what I'd consider to be an object, which is array-like and would require more complex logic... and to use flags for single conditions you WOULD end up with nested if/if/if/if.

But then I'm still not sure I fully understood your question.

And this is where we can get into an endless loop of wanted simplicity vs needed complexity from a reader's perspective. I personally don't think there is a valid answer to that, just constraints and requirements.

That treads into the concept of "false simplicity" -- it is possible to simplify something below the requirements of getting the job done easily. Your result can LOOK simple, but actually impedes the work.

Great article about that from a UI design perspective on Baymard Institute's site:

baymard.com/blog/false-simplicity

The section 3 "Mismatch Between UI and Task Complexity" is a UI example of the same thing that can happen in programming with API's and methodologies. The example of Apple's UI for disk formatting is perfect, because the new version is artsy and cleaner, with all sorts of cute animated doo-dads for showing/hiding the value you've selected likely meaning it takes more code/resources to do -- for an infrequent task where simply showing all the possible values -- aka the old way -- would have been more useful.

Machine language programmers have known false simplicity for ages, particularly in cases where high level languages are often not only slower, but also harder to understand, follow, or end up more work. When you work in ML/Assembler you start to realize that optimization is not one simple unified thing -- as optimizing for execution time, code size, and memory size are NOT all the same thing...

For example, let's say we're working in simple 16 bit x86 machine language and want to calculate a screen offset in VGA 320x200x256 mode. The big thing needed there is to multiply by 320. The "easy" code for this and the one that uses the least code memory is:

; assuming ax is our screen Y
mov  bx, 320
mul  bx
mov  di, ax

But on an 8088 that multiply is SLOW, to the tune of 120 clocks or MORE.

A faster way is to use shifts, getting another register involved. We would want the result in DI anyway, so:

; assuming ax is our screen Y
mov  cl, 6
shl  ax, cl ; ax << 6 == Y * 64
mov  di, ax
mov  cl, 2
shl  ax, cl ; (Y * 64) << 2 == Y * 256
add  di, ax ; Y * 64 + Y * 256 == Y * 320

BUT, since the 8086/8088 lack a 'shift by immediate' that wastes the extra CL register for multiple shifts, and those shifts take 8 clocks as a base and 4 more for each shift. We can unroll the two latter shifts into single operations of shift by one, which is supported, but that initial shift by six is a dog -- though not as bad as the MUL.

A cute trick for a shift by eight, though: we're using 16 bit integers and the valid range is 0..199, so the high byte is always zero. Just exchange AL (the low 8 bits) and AH (the high 8 bits). We can then shift the OPPOSITE direction to get our * 64.

; assuming ax is our screen Y
xchg  al, ah ; same as ax = ax * 256 assuming bottom 8 bits empty
mov   di, ax ; == Y * 256
shr   ax, 1 ; Y * 256 >> 1 == Y * 128
shr   ax, 1 ; Y * 128 >> 1 == Y * 64
add   di, ax ; Y * 64 + Y * 256 == Y * 320

Smaller AND faster. BUT, we could go even faster by simply using a lookup table.

; assuming bx is our screen Y
; and scanlines is a 400 byte long array of values
shl  bx, 1 ; align index to word width
mov  di, [scanlines + bx] ; go to lookup table

Which is as small in code as the MUL and faster than any of the methods presented so far, but requires a 400 byte lookup table in the data area (variable space).
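
For readers who don't read assembly, the same three strategies can be sketched in JavaScript. This only illustrates that they compute the same offset; it says nothing about their relative speed in a JavaScript engine, and the function names are made up.

```javascript
const WIDTH = 320;
const HEIGHT = 200;

// 1. Plain multiplication, the "easy" version.
const byMultiply = (y) => y * WIDTH;

// 2. Shift and add: y * 256 + y * 64 == y * 320.
const byShifts = (y) => (y << 8) + (y << 6);

// 3. Lookup table: 200 entries of 16 bits each, 400 bytes in total.
const scanlines = Uint16Array.from({ length: HEIGHT }, (_, y) => y * WIDTH);
const byTable = (y) => scanlines[y];

console.log(byMultiply(100), byShifts(100), byTable(100)); // 32000 32000 32000
console.log(scanlines.byteLength); // 400
```

The table trades memory for the arithmetic, which is exactly the "pure function versus table" trade-off from the original question.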

Now taking all of that into consideration, which one is "simplest"? Which one is the most optimized? Which is easiest to understand?

Those questions are why I like to know as many different ways as possible of doing things, and try NOT to 'ritualize' myself into any one solution to a problem. People often find solutions to problems then dogmatically apply them to everything, like the proverbial carpenter whose only tool is a hammer; suddenly everything looks like nails to them.

In your case, if these are flags, storing the flags on your object / function, having separate condition checks for those flags and/or setting up a pointer-style operation would provide code clarity and help to make it self-documenting. It simplifies maintenance and improves comprehension, whilst giving clear cues to the end user as to what does what.

Just look at PDO and the various PDO ini setting parameters... or PHP's error_reporting and how its flags work.

Sometimes it helps to combine the approaches too. I have something very similar to this in one of my JS codebases. The use of function calls does introduce some overhead, but it keeps the logic of the test conditions simple and easy to override if I want to change the object's prototyping and/or do some type of inheritance.

function whatever(flags) {
    // In a plain call "this" is not the function itself, so the
    // lookup tables are referenced through the function's own name.
    for (var flag in whatever.flags) {
        // run the handler of every flag whose bit is set
        if (flags & whatever.flags[flag]) {
            whatever.flagHandlers[flag]();
        }
    }
}

whatever.flags = {
    something  : 0x01,
    another    : 0x02,
    yetAnother : 0x04,
    lastOne    : 0x08
};

whatever.flagHandlers = {
    something  : function() { console.log('something'); },
    another    : function() { console.log('another'); },
    yetAnother : function() { console.log('yetAnother'); },
    lastOne    : function() { console.log('lastOne'); }
};

// example test

whatever(
    whatever.flags.another |
    whatever.flags.lastOne
);

It's an interesting approach to it, not suited for all situations but handy in some.

Really though that's the trick to any of this, it's not so much about which is "better", but which is better FOR WHAT?

Jason Knight you need to tag me :) otherwise I overlook. This is by far the best answer :) thx
