Recently the host OS in Azure has been upgraded to a new version. As part of this, a DLL of the networking agent, called
WVCDLL, has to be upgraded too. A team member tried it and got a weird error. Attaching a debugger to the program, the IP
stopped at a logging entry, which did not make much sense. Then I looked closely at the stack trace.
From the stack trace, the problem occurred at 00007ffe`27321171, the call instruction. The symbol name indicates that
the code in WVCDLL is attempting to call a function defined in OsLogger via the import table. Therefore the error was
caused either by a missing dependency or by a wrong version of it.
In native code, a dependency DLL can be set to be delay-loaded for two reasons:
The main program may not call any function in the dependency at all.
Even if it does, the function may be called late in the execution.
Delay loading can improve program load speed for an obvious reason: the dependency is not resolved until it is first
used. The downside is that we may not see a dependency load failure when the program is loaded, as the failure above shows.
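For reference, with the MSVC toolchain delay loading is enabled at link time; a minimal sketch (the DLL name here mirrors the one in this story, the rest of the command line is elided):

```
link /DELAYLOAD:OsLogger.dll delayimp.lib ...
```

delayimp.lib supplies the delay-load helper that resolves the import on first call, which is exactly why the failure surfaces at the call site rather than at load time.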
In this case, a closer look revealed that the problem was caused by a wrong version being used during the build. This is
a typical scenario of diamond dependencies:
Program depends on DLL A and B.
DLL A and B are developed and distributed independently.
Both A and B depend on DLL C.
Unlike managed code, native code does not check the version of the DLL being loaded. Suppose DLL A version 1 uses DLL C
version 1.1, while DLL B version 2 uses DLL C version 1.2, which exports a different set of functions. If we link A and B
into the same program, the build succeeds, but at runtime it will fail to resolve function entries.
Most teams in Microsoft have componentized the code base and migrated the source repository from centralized SD to
distributed Git – I may write another post to talk about what the engineering system looks like. Since the release
cadences of different components are completely different, this problem may not be rare to see.
Lastly, in case anyone does not know, one can check the version of a loaded module in windbg using lmvm:
0:019> lmvm wvcdll
Browse full module list
start end module name
00007ffe`27310000 00007ffe`273ad000 wvcdll (deferred)
Image path: xxxxxxxxxxxxxxxxxxxxxxxxxxxx\wvcdll.dll
Image name: wvcdll.dll
Browse all global symbols functions data
Timestamp: Tue Mar 8 11:36:39 2016 (56DF29C7)
...
File version: 6.0.36.1
Product version: 6.0.36.1
...
PrivateBuild: (by corecet on CY1AZRBLD15)
FileDescription: xxxxxxxxxxxxxxxxxxxxxxxx r_mar_2016 (6f6baea) Microsoft Azure®
Based on the version and other info, one can load the matching PDB for ease of debugging.
Our projects are mostly C# and C++ code, and MSBuild is the engine that drives the entire build process. If the code base
is large, the log file can be very large too. For instance, our build log is over 600 MB and amounts to over 5 million
lines of text. Although the logging is designed to be human readable, it is a challenge to read and extract useful
information from such a huge file. I would like to share a few tips and hope this info makes your work easier.
Firstly we need a decent editor that is capable of handling large files. I use Vim all the time, and it works well in this
scenario. Some people like Sublime Text, which I have no experience with, but it seems to be equally powerful. Notepad is a
bad choice: it can open large files, but the performance is so bad that I always wonder why it still exists in Windows.
Notepad++ and Visual Studio Code refuse to load large files, so avoid them. Finding a nice editor is what every
developer should do.
Node
Normally MSBuild builds multiple projects in parallel. Different loggers can be used; we only use the default console
logger, which is multi-processor aware. In the log file, each chunk of log, which can be one or multiple lines, is
prefixed with the build node number, for instance:
Build started 4/28/2016 4:05:25 PM.
1>Project "X:\bt\713442\repo\src\dirs.proj" on node 1 (default targets).
1>Building with tools version "12.0".
...
1>Project "X:\bt\713442\repo\src\dirs.proj" (1) is building "X:\bt\713442\repo\src\deploy\dirs.proj" (2) on node 1 (VerifyAlteredTargetsUsed;Build target(s)).
2>Building with tools version "12.0".
...
2>Project "X:\bt\713442\repo\src\deploy\dirs.proj" (2) is building "X:\bt\713442\repo\src\deploy\WABO\dirs.proj" (6) on node 1 (VerifyAlteredTargetsUsed;Build target(s)).
6>Building with tools version "12.0".
Suppose you are inspecting a particular project: try to determine its node, then follow all the lines with that
node number as the prefix.
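The filtering step above can be sketched in a few lines of Python (the sample log lines are shortened, hypothetical paths):

```python
import re

def lines_for_node(log_lines, node):
    """Yield log lines belonging to the given MSBuild node prefix, e.g. '6>'."""
    prefix = re.compile(r"^(\d+)>")
    for line in log_lines:
        m = prefix.match(line)
        if m and int(m.group(1)) == node:
            yield line

log = [
    '1>Project "dirs.proj" on node 1 (default targets).',
    '2>Building with tools version "12.0".',
    '6>Building with tools version "12.0".',
]
print(list(lines_for_node(log, 6)))
```

The same idea works as a Vim search (`^6>`) or a grep over the 600 MB file.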
Start and finish
A project starts by loading the specified tools version:
6>Building with tools version "12.0".
When the build is finished, the last line looks like:
6>Done Building Project "X:\bt\713442\repo\src\deploy\WABO\dirs.proj" (VerifyAlteredTargetsUsed;Build target(s)).
Each target follows a similar pattern. If you want to search for the start and the end, you may use the following regex
pattern (in Vim):
project .* on node\|done building project
Copying files
Sometimes the project output may have missing files or a wrong version of a dependency. Typically one can check the
CopyFilesToOutputDirectory target to see how the files are copied, for instance:
54>CopyFilesToOutputDirectory:
Copying file from "X:\bt\713442\repo\src\Tools\Hugin\obj\amd64\Hugin.exe" to "X:\bt\713442\repo\out\retail-amd64\Hugin\Hugin.exe".
The following regex can be used:
copying file from .*obj.* to
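The same pattern can be used to extract the source and destination paths programmatically; a small Python sketch (the sample line is taken from the log excerpt above):

```python
import re

# Matches 'Copying file from "...obj..." to "..."' lines in an MSBuild log.
copy_re = re.compile(r'Copying file from "(?P<src>[^"]*obj[^"]*)" to "(?P<dst>[^"]*)"')

line = ('54>Copying file from "X:\\bt\\713442\\repo\\src\\Tools\\Hugin\\obj\\amd64\\Hugin.exe" '
        'to "X:\\bt\\713442\\repo\\out\\retail-amd64\\Hugin\\Hugin.exe".')
m = copy_re.search(line)
print(m.group("src"), "->", m.group("dst"))
```

Collecting all such pairs makes it easy to spot a file that was copied from an unexpected intermediate directory.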
Once you are familiar with the structure of the build log and know a little about MSBuild targets, it is quite
straightforward to search the log and investigate issues. If you have any questions, drop a comment and I will try to
answer.
Given an interface, it is straightforward to generate implementations of all members (or the missing ones of an abstract
class) using either Visual Studio or ReSharper. Sometimes, though, code generation in the IDE may not be enough for
handling boilerplate code. Recently I had to deal with a WCF service proxy, where retry and reconnect logic needs to be
put into every method. Basically the scenario is the following:
There is a WCF service running in a multiple-replica setup, and the interface is changing constantly.
The client side wants to call the service and has to deal with an unreliable channel and reconnects. However, the
client code simply takes an interface object and uses it just like any other library, i.e. without any special
retry / reconnect / timeout logic.
It is preferable not to make any change to the client code. Finding all references of the interface and fixing the
usage pattern is not an option, and it would be a huge amount of work anyway.
My solution is to provide a service proxy, which implements the service interface and forwards all calls to the actual
service endpoint with the proper logic. As one can imagine, the same boilerplate has to be put into every method of the
proxy class.
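The real proxy is a C# class implementing the WCF interface, but the shape of the per-method boilerplate can be sketched in Python (all names here are hypothetical): every forwarded call runs inside the same retry-and-reconnect loop.

```python
import time

class RetryProxy:
    """Forwards every method call to a target object, retrying on failure.

    A sketch only: the factory callable stands in for recreating the WCF
    channel on reconnect."""

    def __init__(self, factory, retries=3, delay=0.0):
        self._factory = factory
        self._target = factory()
        self._retries = retries
        self._delay = delay

    def __getattr__(self, name):
        def call_with_retry(*args, **kwargs):
            last_error = None
            for _ in range(self._retries):
                try:
                    return getattr(self._target, name)(*args, **kwargs)
                except ConnectionError as e:
                    last_error = e
                    self._target = self._factory()  # "reconnect"
                    time.sleep(self._delay)
            raise last_error
        return call_with_retry

# Usage: a service stub that fails on the first call, then succeeds.
class Flaky:
    def __init__(self):
        self.calls = 0
    def ping(self):
        self.calls += 1
        if self.calls == 1:
            raise ConnectionError("channel faulted")
        return "pong"

svc = Flaky()
proxy = RetryProxy(lambda: svc)
print(proxy.ping())
```

In C# there is no `__getattr__` equivalent, which is exactly why every interface method needs its own generated body.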
Thanks to reflection, it is fairly easy to inspect the interface type to discover the full details, and then generate
the boilerplate code automatically. This can be done in either C# or PowerShell; however, to integrate the
process with the build, PowerShell is simpler. The basic process is the following:
Load the interface assembly using System.Reflection.Assembly.LoadFrom
Load the interface type using Assembly.GetType
Discover all methods using Type.GetMethods
For each method, generate the code based on ReturnType and the list of parameters using GetParameters
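The actual script drives .NET reflection from PowerShell, but the steps above can be sketched with Python's own reflection (interface and method names are hypothetical): enumerate the methods, read their parameter lists, and emit one forwarding stub per method.

```python
import inspect

class ICalculator:
    """Stand-in for the service interface."""
    def add(self, a: int, b: int) -> int: ...
    def reset(self) -> None: ...

def generate_stub(iface):
    # Mirrors Type.GetMethods / MethodInfo.GetParameters in .NET:
    # for each method, emit a body that forwards through the retry helper.
    lines = [f"class {iface.__name__}Proxy:"]
    for name, fn in inspect.getmembers(iface, inspect.isfunction):
        params = list(inspect.signature(fn).parameters)  # includes 'self'
        arglist = ", ".join(params)
        callargs = ", ".join([f"'{name}'"] + params[1:])
        lines.append(f"    def {name}({arglist}):")
        lines.append(f"        return self._call_with_retry({callargs})")
    return "\n".join(lines)

print(generate_stub(ICalculator))
```

The generated text is then written to a source file that the build picks up.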
For each parameter, we need to check whether it is an input, output, or ref parameter and handle the code generation
accordingly. Another thing to consider is generic types, for which Name or ToString() will produce a string like
System.Collections.Generic.List`1[System.IO.FileStream], which will not work unless it is converted to the C#
declaration List<FileStream>. The issue of type resolution can be handled using embedded C# code in the script.
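The conversion itself is mechanical; here is a sketch of the idea in Python (simplified: no nested generics, and namespaces are stripped, which the real script must not do blindly):

```python
import re

def to_csharp(name):
    """Convert a reflection-style generic type name such as
    List`1[System.IO.FileStream] into a C#-style declaration."""
    m = re.match(r"(?:[\w.]+\.)?(\w+)`\d+\[(.+)\]$", name)
    if not m:
        return name.rsplit(".", 1)[-1]  # non-generic: drop the namespace
    outer, inner = m.groups()
    args = ", ".join(to_csharp(a.strip()) for a in inner.split(","))
    return f"{outer}<{args}>"

print(to_csharp("System.Collections.Generic.List`1[System.IO.FileStream]"))
```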
If you are interested, you may download the script here.
Next, one can include the auto-generated code in the build by modifying the csproj file.
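A minimal sketch of the csproj change (the generated file name is hypothetical):

```xml
<ItemGroup>
  <Compile Include="$(IntermediateOutputPath)ServiceProxy.Generated.cs" />
</ItemGroup>
```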
The last question is where and how to generate the code in the right location, i.e. $(IntermediateOutputPath). Since
the code generation is needed before the C# compiler is invoked, and it requires the interface assembly (as well as its
dependencies) to be accessible, the best approach is to define a target named AfterResolveReferences. This target is
empty by default; it is invoked after all project references are resolved and handled properly, and before the Compile
target is started. One can also use the BeforeCompile target, which is invoked a moment after AfterResolveReferences and
right before CoreCompile. To get more details on this, read Microsoft.Common.targets in the .NET Framework directory on
the machine.
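A hedged sketch of such a target (the script name and its parameters are hypothetical):

```xml
<Target Name="AfterResolveReferences">
  <!-- References are resolved at this point, so the interface assembly
       and its dependencies can be loaded by the generator script. -->
  <Exec Command="powershell -ExecutionPolicy Bypass -File GenerateProxy.ps1 -OutFile $(IntermediateOutputPath)ServiceProxy.Generated.cs" />
</Target>
```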
With this change in the csproj file, the only human-written code is a partial class with a CallWithRetry method, which is
referenced by all interface methods. No matter how the interface changes, the maintenance cost is limited to the partial
class itself.
In a previous post I discussed how to use SOS to analyze a memory consumption issue in managed code. An object leak
was found in a store component where neither ref-counting nor GC is used to manage object lifetimes – it simply
trusts the client to do the right thing. Unfortunately we did not. As part of upscaling the clusters, memory
consumption must be kept under control; over 20 GB of usage is unacceptable. After the fix, our program on a specific
production cluster, which used to take 8.4 GB on a fresh start, now consumes 2.5 GB of private working set. Peak-time
consumption is reduced significantly. Moreover, the codex size, which reflects the number of objects being managed by
the control center, is reduced by 86% on one cluster. Proper RCA and high code quality are proven to be an effective way
to push the platform to a new level.
Recently I read a great article, Major Linux Problems on the Desktop 2016
edition, with pain and sadness, yet
I agree with most points in it. People have wished and talked for many years, but 2016 is not the year of Linux on the
desktop. Let's face the reality: Android has a much greater chance of being the "Desktop" OS than Linux 5 years down the
road.
I don't remember the first version of a Linux distro I used, probably Slackware 3.x.
It was fun to fiddle with pkgtool, change the font in the text console, and play with my own Unix-like environment.
Then it was Red Hat Linux 5.2,
Turbolinux, and other distros. In a part-time project, I led the effort to
run Linux with the X Window System in an embedded environment with 16 MB of flash, used MIT-SHM
to accelerate the graphics, and trained the team to migrate from Windows to a Linux development platform. Later I built
an HPC cluster using the first version of Fedora, and worked with Linux and Unix for a few years (primarily for scientific
computing) until I joined Microsoft. Although I occasionally played with other OSes such as FreeBSD, Tru64, HP-UX, and
Mac OS, Linux is my favorite for work and tinkering.
Year after year, Linux developers have been trying to replace Windows on the desktop. Obviously, an integrated desktop
environment is a critical part of this effort. 15 years ago, there was fvwm95, which mimicked the Win95 look.
Then KDE showed up and has greatly improved over time. Some PCs were even preloaded with Linux (and failed
miserably). For people who are tech savvy and willing to spend time, Linux can be a decent desktop OS after some
tweaking and customization. However, hackers are only a tiny percentage of the whole population. For most ordinary
people, Linux doesn't work. We should analyze the problems and learn our lessons from this.