Accessing and manipulating a 32-bit integer as a byte array in C++ using unions
I don’t think I’ve ever used union for anything, but today I came across a very interesting use case to avoid bit-shifting tricks when dealing with data embedded in numbers.
What’s a union?
A union is a user-defined type in which all members share the same memory location. This definition means that at any given time, a union can contain no more than one object from its list of members. It also means that no matter how many members a union has, it always uses only enough memory to store the largest member.
Example:
// Represent an integer as a char, or char as int
union IntChar {
    unsigned int i;
    char c;
};

IntChar foo;
foo.i = 65; // 'A' in ASCII
printf("i: %u, c: %c\n", foo.i, foo.c);

// i: 65, c: A (on a little-endian machine)
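The two claims in the definition above (shared memory location, size of the largest member) can be checked without reading an inactive member. This is a minimal sketch; `members_share_address` is a hypothetical helper name, not something from the original post.

```cpp
#include <cstddef>

// Same shape as the IntChar union in the example above.
union IntChar {
    unsigned int i;
    char c;
};

// The union is at least as large as its largest member.
static_assert(sizeof(IntChar) >= sizeof(unsigned int),
              "a union must be able to hold its largest member");

// Hypothetical helper: every member starts at the same address
// as the union object itself. Only addresses are compared here,
// no inactive member is read.
bool members_share_address() {
    IntChar foo;
    foo.i = 65;
    const void* base = &foo;
    return base == static_cast<const void*>(&foo.i) &&
           base == static_cast<const void*>(&foo.c);
}
```

Calling `members_share_address()` should return true on any conforming compiler, since taking the address of a member does not count as reading it.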
We can also do the same with an anonymous union and use the variables directly; writing to one changes the value of the other.
union {
  unsigned int i;
  char c;
};
i = 0;   // zero every byte before writing the char
c = 'A';
printf("i: %u, c: %c\n", i, c);
// i: 65, c: A (on a little-endian machine)
Let’s apply this feature to a 32-bit integer (4 bytes) and a 4-byte array.
I believe this might come in handy if you need to treat integers as arrays of 8-bit numbers: you can use the array ‘[]’ operator to access the individual bytes of the number without bit-shifting tricks (>>, <<, &, |) to extract or manipulate them.
[pastacode lang=”cpp” path_id=”51909d0741b8901a8e59d704104c2ef7″ file=”int_as_array.cpp” highlight=”” lines=”” provider=”gist”/]
Build and output:
$ g++ int_as_array.cpp && ./a.out
a: 0xaabbccdd
a[0]: 0xdd
a[1]: 0xcc
a[2]: 0xbb
a[3]: 0xaa
a: 0xaabbccff
DO NOT USE THIS TECHNIQUE IN PRODUCTION CODE:
Here’s a word about this trick from my very esteemed friend (and elite coder) Dave Nicponski
Doing this correctly is highly dependent upon following and understanding the language spec, fyi
— Dave Nicponski ✍️ (@virus_dave) November 12, 2021
I should clarify. You basically CANNOT do this safely in general. Specifically, "It's undefined behavior to read from the member of the union that wasn't most recently written." (with very restrictive exceptions). You may see "reasonable" behavior depending on the compiler…
— Dave Nicponski ✍️ (@virus_dave) November 12, 2021
…used, which may provide additional guarantees not present in the C++ spec, but it's not portable, and a compliant compiler would be allowed to do just about anything it wants to break your program (like return a zero value on the read, or skip the preceding write, etc).
— Dave Nicponski ✍️ (@virus_dave) November 12, 2021
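Given the warning above, here is a sketch of one well-defined way to get the same array-style access: copy the integer's bytes out with `std::memcpy`, edit them, and copy them back. This is not the code from the gist; `Bytes`, `to_bytes`, `from_bytes` and `with_byte` are hypothetical names chosen for illustration, and the byte order you observe still depends on the host's endianness.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Bytes = std::array<std::uint8_t, 4>;

// Copy the integer's object representation into a byte array.
// Byte order follows the host: bytes[0] is the least significant
// byte on a little-endian machine such as x86-64.
Bytes to_bytes(std::uint32_t value) {
    Bytes bytes{};
    std::memcpy(bytes.data(), &value, sizeof value);
    return bytes;
}

// Rebuild the integer from bytes laid out by to_bytes.
std::uint32_t from_bytes(const Bytes& bytes) {
    std::uint32_t value = 0;
    std::memcpy(&value, bytes.data(), sizeof value);
    return value;
}

// Overwrite one byte of the integer through the array view.
std::uint32_t with_byte(std::uint32_t value, std::size_t index, std::uint8_t byte) {
    Bytes bytes = to_bytes(value);
    bytes.at(index) = byte;  // .at() throws std::out_of_range on a bad index
    return from_bytes(bytes);
}
```

On a little-endian machine, `with_byte(0xaabbccdd, 0, 0xff)` gives `0xaabbccff`, matching the output shown earlier; on a big-endian machine the same call would change the most significant byte instead.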
Things to remember when compiling/linking C/C++ software
by Angel Leon. March 17, 2015;
Updated August 29, 2019.
Updated last on February 27, 2023
Include Paths
On the compilation phase, you will usually need to specify the different include paths so that the interfaces (.h, .hpp) which define structs, classes, constants, and functions can be found.
With gcc and llvm (clang), include paths are passed with -I/path/to/includes; you can pass as many -I as you need.
In Windows, cl.exe takes include paths with the following syntax:
/I"c:\path\to\includes"
You can also pass as many as you need.
Some software uses macro definition variables that should be passed during compile time to decide what code to include.
Compilation flags
These compilation-time variables are passed using -D, e.g.
-DMYSOFTWARE_COMPILATION_VARIABLE -DDO_SOMETHING=1 -DDISABLE_DEPRECATED_FUNCTIONS=0
These compilation-time flags are, by convention, usually put into a single variable named CXXFLAGS, which is then passed to the compiler as a parameter for convenience when you’re building your compilation/make script.
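To make the effect of those -D flags concrete, here is a minimal sketch of source code that reacts to them. The macro names are the ones used in the example flags above; the functions `legacy_greeting`, `build_mode` and `has_compilation_variable` are hypothetical and exist only for illustration.

```cpp
#include <string>

// Fallback values so the file also builds when no -D flag is given.
#ifndef DO_SOMETHING
#define DO_SOMETHING 0
#endif

#ifndef DISABLE_DEPRECATED_FUNCTIONS
#define DISABLE_DEPRECATED_FUNCTIONS 0
#endif

// Compiled in only when deprecated functions are not disabled.
#if !DISABLE_DEPRECATED_FUNCTIONS
std::string legacy_greeting() { return "hello (deprecated API)"; }
#endif

// Reports which branch the preprocessor kept.
std::string build_mode() {
#if DO_SOMETHING
    return "DO_SOMETHING enabled";
#else
    return "DO_SOMETHING disabled";
#endif
}

// -DMYSOFTWARE_COMPILATION_VARIABLE carries no explicit value,
// so its presence is tested with #ifdef.
bool has_compilation_variable() {
#ifdef MYSOFTWARE_COMPILATION_VARIABLE
    return true;
#else
    return false;
#endif
}
```

Compiled without any flags, `build_mode()` reports the disabled branch; adding -DDO_SOMETHING=1 to the compiler command switches it, and -DDISABLE_DEPRECATED_FUNCTIONS=1 removes `legacy_greeting` from the build entirely.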
Object files
When you compile your .c or .cpp files, you will end up with object files.
These files usually have the .o extension on Linux; on Windows they usually have the .obj extension.
You can create an .o file for a single or for many source files.
Static Library files
When you have several .o files, you can put them together as a library, a static library. In Linux/Mac these static libraries are simply archive files, or .a files. In Windows, static library files exist under the .lib extension.
They are created like this in Linux/Mac:
ar -cvq libctest.a ctest1.o ctest2.o ctest3.o
libctest.a will contain ctest1.o, ctest2.o and ctest3.o
They are created like this on Windows:
LIB.EXE /OUT:MYLIB.LIB FILE1.OBJ FILE2.OBJ FILE3.OBJ
When you are creating an executable that needs to make use of a library, if you use these static libraries, the size of your executable will include all the object files statically linked into it. The code is right there inside the executable, which makes it easier to distribute, but the executable can be bigger than it needs to be. Why? Because many of the .o files, or even the entire .a file you’re linking against, might be a standard library that many other programs also need.
Shared Libraries (Dynamic Libraries)
So shared or dynamic libraries were invented so that different programs or libraries could make external (shared) references to them. Since they’re “shared”, the symbols defined in them don’t need to be part of your executable or library.
Your executable contains symbols whose entry points or offset addresses point somewhere within itself (symbols you defined in your code), but it will also have symbols defined in shared libraries. A shared library is loaded only once into physical memory by the OS, but its symbols’ offsets are mapped into the virtual address space of each process, so your process may see the same library symbols at different addresses than another process that uses the library.
Thus, not only is your executable as small as it needs to be, you also don’t spend more physical memory loading the library for every process/program that needs its symbols.
On Linux shared libraries use the .so (shared object) file extension, on Mac .dylib (dynamic library), and in Windows they’re called .dll (dynamic link libraries).
Another cool thing about dynamic libraries is that they can be loaded during runtime, not just linked at compile time. Browser plugins are an example of libraries loaded at runtime.
In Linux, .so files are created like this:
gcc -Wall -fPIC -c *.c
gcc -shared -Wl,-soname,libctest.so.1 -o libctest.so.1.0 *.o
-Wall enables a broad set of common warnings (despite the name, not every warning).
-c means compile only, don’t run the linker.
-fPIC means “Position Independent Code”, a requirement for shared libraries in Linux.
-shared makes the object file created shareable by different executables.
-Wl passes a comma separated list of arguments to the linker.
-soname means “shared object name” to use.
-o <my.so> means output, in this case the output shared library.
In Mac, .dylib files are created like this:
clang -dynamiclib -o libtest.dylib file1.o file2.o -L/some/library/path -lname_of_library_without_lib_prefix
In Windows, .dll files are created like this:
LINK.EXE /DLL /OUT:MYLIB.DLL FILE1.OBJ FILE2.OBJ FILE3.OBJ
Linking to existing libraries
When linking your software you may be faced with a situation in which you want to link against several standard shared libraries.
If all the libraries you need exist in a single folder, you can set LD_LIBRARY_PATH to that folder. By convention, all shared libraries are prefixed with the word lib. If a library exists in LD_LIBRARY_PATH and you want to link against it, you don’t need to pass the entire path to the library; you simply pass -lname and you will link your executable to the symbols of libname.so, which should be somewhere inside LD_LIBRARY_PATH. Keep in mind that LD_LIBRARY_PATH is primarily what the dynamic loader searches at run time; at link time the linker looks in the folders given with -L and in the system default library folders.
Tip: You should probably stay away from altering your LD_LIBRARY_PATH. If you do, make sure you keep its original value and restore it when you’re done, as you might break the build processes of other software in the system that depend on what’s in LD_LIBRARY_PATH.
What if libraries are in different folders?
If you have some other libbar.so library in another folder outside LD_LIBRARY_PATH, you can explicitly pass the full path to that library, /path/to/that/other/library/libbar.so, or you can specify the folder that contains it with -L/path/to/that/other/library and then use the shorthand form -lbar. This latter option makes more sense if the second folder contains several other libraries.
Useful tools
Sometimes you may be dealing with issues like undefined symbol errors, and you may want to inspect what symbols (functions) are defined in your library.
On Mac there’s otool, on Linux/Mac there’s nm, and on Windows there’s depends.exe (a GUI tool that can be used to see both dependencies and the symbol tables. Taking a look at the “Entry Point” column will help you understand clearly the difference between symbols linking to a shared library vs symbols linking statically to the same library).
Useful command options
See shared library dependencies on Mac with otool
otool -L libjlibtorrent.dylib
libjlibtorrent.dylib:
libjlibtorrent.dylib (compatibility version 0.0.0, current version 0.0.0)
/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 120.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1213.0.0)
See shared symbols with nm (Linux/Mac)
With nm, you can see a library’s symbol name list.
Familiarize yourself with the meaning of the symbol types:
T (text section symbol)
U (undefined, useful for those undefined symbol errors)
I (indirect symbol)
If the symbol is local (non-external) the symbol type is presented in lowercase letters, for example a lowercase u represents an undefined reference to a private external in another module in the same library.
nm’s documentation says that if you’re working on Mac and you see that the symbol is preceded by + or - it means it’s an Objective-C method. If you’re familiar with Objective-C you will know that + is for class methods and - is for instance methods, but in practice it seems to be a bit more explicit and you will often see objc or OBJC prefixed to those methods.
nm is best used along with grep 😉
Find all Undefined symbols
nm -u libMacOSXUtilsLeopard.jnilib
_CFRelease
_LSSharedFileListCopySnapshot
_LSSharedFileListCreate
_LSSharedFileListInsertItemURL
_LSSharedFileListItemRemove
_LSSharedFileListItemResolve
_NSFullUserName
_OBJC_CLASS_$_NSArray
_OBJC_CLASS_$_NSAutoreleasePool
_OBJC_CLASS_$_NSDictionary
_OBJC_CLASS_$_NSMutableArray
_OBJC_CLASS_$_NSMutableDictionary
_OBJC_CLASS_$_NSString
_OBJC_CLASS_$_NSURL
__Block_copy
__NSConcreteGlobalBlock
__dyld_register_func_for_add_image
__objc_empty_cache
__objc_empty_vtable
_calloc
_class_addMethod
_class_getInstanceMethod
_class_getInstanceSize
_class_getInstanceVariable
_class_getIvarLayout
My C++ code compiles but it won’t link
Linking is simply “linking” a bunch of .o files to make an executable.
Each one of these .o’s may compile on its own from its .cpp file, but when one references symbols that are supposed to exist in other .o’s and they can’t be found, you get linking errors.
Perhaps through forward declarations you managed to get the compilation phase to pass, but then you get a bunch of symbol not found errors.
Make sure to read them slowly and see where these symbols are being referenced; in most cases you will see that these issues occur due to namespace visibility.
Perhaps you copied the signature of a method that exists in a private space elsewhere into some other namespace where your code wasn’t compiling. All you did was make it compilable, but the actual symbol might not be visible outside the scope where it’s truly defined and implemented.
Function symbols can be private if they’re declared inside anonymous namespaces, or if they’re declared as static functions.
An example:
Undefined symbols for architecture x86_64:
"FlushStateToDisk(CValidationState&, FlushStateMode)", referenced from:
Network::TxMessage::handle(CNode*, CDataStream&, long long, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, bool, bool) in libbitcoin_server.a(libbitcoin_server_a-TxMessage.o)
Here, when I read the code of Network::TxMessage::handle(...) there was a call to FlushStateToDisk, which was declared in main.h and coded in main.cpp. My TxMessage.cpp did include main.h, the compilation was fine, and I had a TxMessage.o file and a main.o, but the linker was complaining.
The issue was that FlushStateToDisk was declared as static, and therefore only visible inside main.o. Once I removed the static from the declaration and implementation, the error went away and my executable was linked. Similar things happen when functions are declared in anonymous namespaces in other files, even if you forward declare them in your local .h.
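The visibility rules behind that error can be sketched in a single file. The function names below (`flush_private`, `flush_hidden`, `flush_public`, `describe_all`) are hypothetical stand-ins, not the actual functions from the project mentioned above.

```cpp
#include <string>

// Internal linkage: the symbol stays local to this object file, so a
// call from another .cpp compiles against a matching declaration but
// fails at link time with an undefined symbol error.
static std::string flush_private() {
    return "static: visible only in this file";
}

namespace {
// Functions in an anonymous namespace also get internal linkage.
std::string flush_hidden() {
    return "anonymous namespace: visible only in this file";
}
}  // namespace

// External linkage: other object files can link against this one.
std::string flush_public() {
    return "external: visible to the linker";
}

// Within the same translation unit all three are callable.
std::string describe_all() {
    return flush_private() + "\n" + flush_hidden() + "\n" + flush_public();
}
```

If you compile this with -c and run nm on the resulting .o, the two internal functions typically show up with a lowercase t and the external ones with an uppercase T, which is a quick way to confirm why another object file cannot see them.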
In other cases your code compiles and you get this error when linking because your library can’t be added using -lfoo, and adding its containing folder with -L doesn’t cut it. In this case you just add the full path to the library in your compilation command: gcc /path/to/the/missing/library.o ... my_source.cpp -o my_executable
Reminder:
DO NOT EXPORT CFLAGS, CPPFLAGS and the like in your .bash_profile/.bashrc; it can lead to unintended building consequences in many projects. I’ve wasted so many hours due to this mistake.
How to enable source highlighting when doing `less mycodefile.ext`
- Install source-highlight
sudo apt install source-highlight
- Configure it on your .bash_profile
lessWithSourceHighlightSetup() {
# location of the script may vary
src_hilite_pipe_script=`dpkg -L libsource-highlight-common | grep lesspipe`
export LESSOPEN="| ${src_hilite_pipe_script} %s"
export LESS=' -R '
}
lessWithSourceHighlightSetup
- Use it on any code file
less -N /path/to/mycode.ext
Pascal Triangle Generator in Python, and then in Haskell – The Gubatron Method
Here it is in Python, first imperatively and then in functional style without the need for loops.
def pascal(n):
    if n == 1:
        return [ 1 ]
    if n == 2:
        return [ 1, 1 ]
    prev = pascal(n-1)
    results = []
    for i in range(n):
        if i == 0:
            continue
        if i == n-1:
            break
        results.append(prev[i] + prev[i-1])
    return [1] + results + [1]

# functional style, no loops
def pascal_fp(n):
    if n == 1:
        return [ 1 ]
    prev = pascal_fp(n-1)
    return list(map(lambda x,y:x+y, [0] + prev, prev + [0]))
Here it is in Haskell. I call it Gubatron’s method, explained in the comments.
I saw it by noticing a pattern while trying to solve it on paper; it just clicked.
Not sure if this is how other people code this solution.
-- Gubatron's method
-- n=3 [1, 2, 1]
-- copy the list and append a 0 on the left of the first
-- and append a 0 at the end of the second
-- [0, 1, 2, 1]
-- [1, 2, 1, 0]
-- add them up!
-- n=4 [1, 3, 3, 1]
--
-- append 0s to both sides and add them up
-- n=4 [1, 3, 3, 1]
-- [0, 1, 3, 3, 1]
-- [1, 3, 3, 1, 0]
-- n=5 [1, 4, 6, 4, 1]
-- and so on

-- add two lists, for clarity
addLists :: Num c => [c] -> [c] -> [c]
addLists l1 l2 = zipWith (+) l1 l2

pascal :: (Eq a1, Num a1, Num a2) => a1 -> [a2]
pascal 1 = [ 1 ]
pascal n =
  let prev = pascal(n-1)
      zero_prev = [0] ++ prev
      prev_zero = prev ++ [0]
  in
    addLists zero_prev prev_zero

-- [1,2,3] -> "1 2 3"
listToString = unwords . map show

-- mapM_ -> map monadic so no weird IO errors are triggered
printTriangle n = mapM_ putStrLn (map listToString (map pascal [1..n]))

main = do
  input <- getLine
  printTriangle . (read :: String -> Int) $ input
Python in Functional Style: How to add 2 lists of integers without using loops
Usually you’d add two lists of integers this way:
a = [2, 2, 2, 2]
b = [2, 2, 2, 2]
c = []
for i in range(len(a)):
    c.append(a[i] + b[i])
You can do it functionally without any loops in different ways:
Using map and a lambda that adds them up
c = list(map(lambda x,y: x+y, a, b))
or you can import the add operator as a named function
from operator import add
c = list(map(add, a, b))
Ever zipped two lists into a list of tuples?
There’s another more convoluted way if you want to play with “zip”.
Imagine a jacket zipper: each tooth on either side of the zipper is one element of each list.
When you zip the lists a and b, you end up with a list of tuples of matching elements from the given lists.
>>> list(zip(a,b))
[(2, 2), (2, 2), (2, 2), (2, 2)]
You could now map a function to add the elements within each tuple on that list.
>>> list(map(lambda tup: tup[0]+tup[1], zip(a,b)))
[4, 4, 4, 4]
Notice how we don’t convert to a list after zip; we can work directly with the zip iterator and only convert to a list from the final map iterator.
Python 2 & 3 Note:
In Python 2 it’s not necessary to use list(); the map() and zip() functions return lists there. But stay away from Python 2, a lot of projects are now discontinuing support.
[linux/ubuntu] How to suppress useless mod_openssl/lighttpd error messages from appearing in /var/log/syslog
Sometimes you have a bunch of useless errors creating unnecessary disk I/O on your server, disk I/O that should be used towards serving your users’ requests efficiently.
In this case a site running on lighttpd keeps logging the following message several times per second, creating too much noise and making it hard to see the meaningful things I should pay attention to in /var/log/syslog.
Aug 7 19:36:03 ip-172-30-1-251 lighttpd[287019]: message repeated 44 times: [ 2020-08-07 19:36:02: (mod_openssl.c.1796) SSL: 1 error:14209102:SSL routines:tls_early_post_process_client_hello:unsupported protocol]
I tried disabling syslog error messages for SSL, and all syslog output on the lighttpd configuration to no avail. Good thing you can configure rsyslog in Linux to do amazing things with log messages before they make it into the log.
To silence this message, all I had to do was edit an rsyslog config file to filter out my undesired message, and restart the service (no need to restart your host os)
- Edited /etc/rsyslog.d/50-default.conf before any mention of /var/log/syslog, to have the following condition (ideally at the top of the config file):
if $msg contains 'tls_early_post_process_client_hello' then stop
- Restarted the rsyslog service, no more noise on /var/log/syslog
sudo service rsyslog restart
[CODING/SOLVED] gradle build (android) breaks after upgrading a dependency with NullPointerException thrown at ProgramClass.constantPoolEntryAccept
You’ve just upgraded one of your Android project’s dependencies and when you run ./gradlew assembleRelease the build process breaks.
You invoke it again with --stacktrace to find the following exception:
java.lang.NullPointerException
at proguard.classfile.ProgramClass.constantPoolEntryAccept(ProgramClass.java:537)
at proguard.shrink.UsageMarker.markConstant(UsageMarker.java:1246)
at proguard.shrink.UsageMarker.visitRequiresInfo(UsageMarker.java:1040)
at proguard.classfile.attribute.module.ModuleAttribute.requiresAccept(ModuleAttribute.java:138)
at proguard.shrink.UsageMarker.visitModuleAttribute(UsageMarker.java:739)
at proguard.classfile.attribute.module.ModuleAttribute.accept(ModuleAttribute.java:99)
at proguard.classfile.ProgramClass.attributesAccept(ProgramClass.java:619)
at proguard.shrink.UsageMarker.markProgramClassBody(UsageMarker.java:124)
at proguard.shrink.UsageMarker.visitProgramClass(UsageMarker.java:94)
at proguard.classfile.visitor.MultiClassVisitor.visitProgramClass(MultiClassVisitor.java:67)
at proguard.classfile.visitor.MultiClassVisitor.visitProgramClass(MultiClassVisitor.java:67)
at proguard.classfile.visitor.ClassNameFilter.visitProgramClass(ClassNameFilter.java:128)
at proguard.classfile.ProgramClass.accept(ProgramClass.java:430)
at proguard.classfile.ClassPool.classesAccept(ClassPool.java:124)
at proguard.classfile.visitor.AllClassVisitor.visitClassPool(AllClassVisitor.java:45)
at proguard.classfile.visitor.MultiClassPoolVisitor.visitClassPool(MultiClassPoolVisitor.java:85)
at proguard.classfile.ClassPool.accept(ClassPool.java:110)
at proguard.shrink.Shrinker.execute(Shrinker.java:90)
at proguard.ProGuard.shrink(ProGuard.java:381)
at proguard.ProGuard.execute(ProGuard.java:145)
at proguard.ProGuard.main(ProGuard.java:572)
This is a ProGuard bug which, my friend, was solved by the ProGuard team ages ago; your build environment is using an old ProGuard version.
Add this to your build.gradle to force it to use the latest version (as of today it’s 6.2.2, check the latest version here)
// force a newer proguard version for your android build
buildscript {
    …
    dependencies {
        …
        classpath 'net.sf.proguard:proguard-gradle:6.2.2'
    }
}
[BOULDERING] Comp Problem sent at Earth Treks Englewood
[BOULDERING] 3 week project at The Spot Denver Colorado