About Skype
Skype is a global communication icon, THE brand name for internet calling. The Luxembourg-based company was founded in 2003 and has been growing tremendously ever since.
To get an idea of how big Skype is, here are some numbers:
1. 300 million registered users
2. On average 65 million active users online
3. Currently 300 million minutes of video calling per day, or roughly 1.8 billion hours a year
Impressive, isn't it?
But numbers are one thing; connecting people is what really matters. While attending Lync Conference 2014 in Las Vegas last February, I used Skype to keep in touch with my wife and two young kids. It is a great example of how even a devoted Lync user still has to rely on a consumer client to connect with peers. You can imagine that I can't wait for Lync-Skype video federation, so I can do this straight from my beloved Lync client!
The Acquisition
On May 10, 2011 Microsoft announced that it would acquire Skype, and on October 13, 2011 the deal was finalized after approval by regulators. Microsoft paid around $8.5 billion for Skype, which is a huge amount of money but not a bad deal considering that Facebook just bought WhatsApp for $19 billion.
Both have a massive user base, but Skype probably has a bit more technology to offer: voice codecs and technologies, plus experience with carrying voice over the public internet. For Microsoft these are great assets that complement its enterprise communication solutions like Lync and Exchange.
Skype Federation V1 : can you hear me?
General
To better understand what's new in Skype Federation V2, we will first look at Skype Federation V1 and how it works.
With the
release of Skype client version 6.0, Skype to Lync federation version 1 (V1)
was enabled. Windows Live Messenger
users needed to download the latest Skype client and merge their Microsoft
account with their Skype account. All their Messenger buddies would be merged with
their Skype contact list and the Messenger client was phased out gradually.
V1 federation supported the following features on both clients: presence, peer-to-peer IM, adding contacts from both clients, and peer-to-peer audio including holding and resuming calls. On the Lync client you can also block communication and transfer calls to a different Lync contact.
Architecture
Skype V1 Federation architecture. Source : Lync & Skype session on LyncConf13
To make two voice systems work with each other, you need to solve two problems: signaling and codec compatibility. Signaling allows two systems to exchange control information such as call setup and media negotiation; codecs allow media to be encoded into a binary stream that can be sent over the network.
Signaling
For Skype
federation with Lync, Microsoft reused the existing PIC federation
infrastructure between Lync and Windows Live Messenger. The great advantage of this approach was that
existing PIC-enabled Lync deployments were ready for Skype federation at launch,
without a change in configuration.
The control
channel between the Lync Edge server and the WLM Federation service in the Microsoft
Cloud is a SIP/TLS channel. This means that the signaling protocol used is SIP,
secured with TLS encryption.
Note: that's why the certificates on the Lync Edge are so important: if either party does not trust the other's certificate, TLS negotiation fails and the connection is refused.
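To illustrate the trust requirement from the note, here is a minimal Python sketch using the standard `ssl` module. It has nothing to do with how a Lync Edge is actually configured; it only shows the general TLS behaviour that a side which requires certificate verification will refuse a handshake with a peer it cannot validate. Both function names are my own, chosen for illustration.

```python
import ssl

def make_strict_client_context() -> ssl.SSLContext:
    """Client side: refuse servers whose certificate cannot be validated."""
    ctx = ssl.create_default_context()  # loads the system's trusted CAs
    # Already the defaults; set explicitly to make the intent visible.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx

def make_mutual_tls_server_context() -> ssl.SSLContext:
    """Server side: also demand a trusted certificate from the connecting peer."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx
```

With both sides configured like this, a handshake only succeeds when each party trusts the certificate chain presented by the other, which is the situation between a federated Edge and the Federation cloud described above.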
Now let's
talk about the signaling protocol used between Skype and the Microsoft Federation
Cloud. In the V1 architecture scheme taken from the Lync Conference 2013
session you can see that MSNP is referenced as the signaling protocol. Either the Skype client is now using MSNP as
the protocol for presence and IM, or there is some kind of interface between
the Skype network and the MSNP gateway. (I'm currently checking on this)
In conclusion, the "signaling gateway" in the Federation cloud translates between SIP/TLS and the proprietary MSNP protocol, just like it did with Windows Live Messenger federation.
Voice
For voice interoperability
there was a mismatch in codec and media negotiation on both clients. As a
solution Microsoft created an audio gateway within the Microsoft Federation
Cloud to resolve the media negotiation differences.
If Skype uses a codec other than G722 to talk to the Federation Cloud, transcoding between the two codecs is also required. Transcoding is the process of receiving a stream encoded with codec A, decoding it to a generic format, encoding it again with codec B and sending it on to the destination. Transcoding is a resource-intensive process and always adds delay to a delay-sensitive audio stream. If two codecs with lossy compression are used, it also degrades quality, since the second encoding pass drops additional audio information.
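The quality loss from chaining two lossy codecs can be shown with a toy Python sketch. The two "codecs" here are nothing more than quantisers with different step sizes (my own stand-ins, not G722, SILK or any real codec), but the decode-then-re-encode pipeline has the same shape as the transcoding step described above.

```python
def encode(samples, step):
    """Toy lossy encoder: quantise each PCM sample down to a multiple of `step`."""
    return [s // step for s in samples]

def decode(codes, step):
    """Toy decoder: expand codes back to approximate PCM samples."""
    return [c * step for c in codes]

def transcode(codes_a, step_a, step_b):
    """Decode a codec-A stream to PCM, then re-encode it with codec B."""
    pcm = decode(codes_a, step_a)
    return encode(pcm, step_b)

def total_error(original, received):
    """Sum of absolute differences between original and received samples."""
    return sum(abs(o - r) for o, r in zip(original, received))

pcm = [100, 37, -58, 12]   # made-up input samples
STEP_A, STEP_B = 8, 12     # arbitrary step sizes standing in for two codecs

# Codec B used end to end versus codec A transcoded to codec B in the middle.
direct = decode(encode(pcm, STEP_B), STEP_B)
tandem = decode(transcode(encode(pcm, STEP_A), STEP_A, STEP_B), STEP_B)
print(total_error(pcm, direct), total_error(pcm, tandem))  # prints: 7 43
```

For these samples the transcoded path ends up much further from the original signal than the single-codec path, because the second quantiser works on samples the first one has already distorted.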
I'm currently checking which codecs Skype uses to talk to the Federation Cloud, but based on the V1 architecture diagram I believe this is G722.
Codecs
During call setup, both endpoints provide a list of their supported codecs so they can agree on the codec to use for the media stream. Using the Snooper tool we can easily retrieve this information from a SIP INVITE message sent by a Lync client to a Skype endpoint:
The numbers in red indicate codec priority; the numbers in light blue are the sample rate. 8000 Hz is considered narrowband quality, 16000 Hz is considered wideband audio.
One exception is G722: due to an error in the RFC and backward-compatibility requirements, it is always advertised as 8000 Hz while it is in fact 16000 Hz wideband.
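As a rough illustration of how this information sits in the SDP body of an INVITE, here is a small Python sketch that reads the priority order from the `m=audio` line and the clock rates from the `a=rtpmap` lines. The SDP fragment is hand-written and heavily simplified, not the capture from this post: the port number is made up, and the payload type 115 for `x-msrta` (RTAudio) is an assumption on my part. Payload types 9, 0 and 8 are the standard static assignments for G722, PCMU and PCMA.

```python
# Hand-written, simplified SDP fragment (illustrative only).
SDP = """\
m=audio 50000 RTP/AVP 9 0 8 115
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:115 x-msrta/8000
"""

def parse_audio_offer(sdp):
    """Return [(payload_type, codec, advertised_hz, actual_hz)] in priority order."""
    order, rtpmap = [], {}
    for line in sdp.splitlines():
        if line.startswith("m=audio"):
            # m=audio <port> <profile> <payload types in order of preference>
            order = [int(pt) for pt in line.split()[3:]]
        elif line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            name, rate = encoding.split("/")[:2]
            rtpmap[int(pt)] = (name, int(rate))
    result = []
    for pt in order:
        name, advertised = rtpmap[pt]
        # G722 is advertised with an 8000 Hz clock but really samples at 16000 Hz.
        actual = 16000 if name.upper() == "G722" else advertised
        result.append((pt, name, advertised, actual))
    return result

for entry in parse_audio_offer(SDP):
    print(entry)
```

The only special case in the parser is the G722 correction: every other codec is taken at its advertised rate.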
For a Lync to Skype call, these codecs are advertised in order of preference:
1. G722 wideband 16000 Hz (advertised as 8000 Hz)
2. G711 narrowband U-Law 8000 Hz
3. G711 narrowband A-Law 8000 Hz
4. Microsoft's RTAudio narrowband 8000 Hz
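A simplified way to think about the outcome of this negotiation: walk the offered list in order of preference and take the first codec the other side also supports. The Python sketch below does exactly that. It is a simplification of real SDP offer/answer behaviour, the codec labels are SDP-style names I picked for the four entries listed above (`x-msrta` for RTAudio is an assumption), and the answerer's codec sets are invented for the example.

```python
# Preference order from the list above, written as SDP-style codec names.
LYNC_OFFER = ["G722", "PCMU", "PCMA", "x-msrta"]

def negotiate(offer, supported_by_answerer):
    """Pick the first codec in the offerer's preference order that the answerer supports."""
    for codec in offer:
        if codec in supported_by_answerer:
            return codec
    return None  # no common codec, so no media stream can be set up

# Hypothetical answerers, for illustration only.
print(negotiate(LYNC_OFFER, {"G722", "PCMU"}))  # G722
print(negotiate(LYNC_OFFER, {"PCMA", "PCMU"}))  # PCMU
print(negotiate(LYNC_OFFER, {"SILK"}))          # None
```

With an answerer that supports G722, the first entry wins, which matches the G722 streams seen in the V1 captures.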
As you can see in the following capture, Skype audio calls are using the G722 mono
codec:
Want to know more? Codecs used in Lync is the place to be.
Connection Flow
Skype Call to a Lync client inside the corporate network
When a call is made between a Skype client and a Lync user in the internal corporate network, both media and control messages flow from the Lync client through the Edge server and the Microsoft Federation Cloud:
To verify this I made a network capture of a Skype to Lync call on our own Edge server. As you can see, a UDP audio stream with the G722 codec is flowing between the AV Edge service (82.143.85.XX) and the Microsoft Federation cloud (134.170.96.XX).
Skype Call to a Lync Client outside the corporate network
When the
Lync user is outside the corporate network, the Edge server is bypassed for
media and a direct connection to the Microsoft Federation Cloud is created.
Again
we verify this by creating a network capture on the client, and as expected
there now is a direct UDP G722 audio stream flowing between the Microsoft
Federation cloud (134.170.96.XX) and the local Lync client (192.168.1.157).
It is important to know that for this test call, both Skype and Lync were running on the same computer. A direct connection between the two clients would be preferable, but because of implementation and codec differences the Audio Gateway in the Microsoft Cloud is used to facilitate the call.
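The two V1 media flows can be summarised in a few lines of Python. This is only a restatement of the flows described in this section; the function name and hop labels are mine.

```python
def v1_media_path(lync_client_inside_corporate_network: bool):
    """List the hops an audio stream takes from the Lync client to Skype in V1."""
    hops = ["Lync client"]
    if lync_client_inside_corporate_network:
        # Internal clients reach the outside world through the AV Edge service.
        hops.append("Lync Edge server")
    # In V1 the audio gateway in the cloud is always in the media path.
    hops.append("Microsoft Federation Cloud (audio gateway)")
    hops.append("Skype client")
    return hops

print(" -> ".join(v1_media_path(True)))
print(" -> ".join(v1_media_path(False)))
```

In both cases the Federation Cloud stays in the path, even when the two clients run on the same machine.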
Security
Signaling
information is encrypted as we are using SIP over TLS as the signaling
protocol.
The media stream is NOT encrypted:
1. In the previous network captures we were able to retrieve the codec information and payload
2. During media negotiation RTP is advertised instead of Secure RTP (SRTP)
Note: This
is the reason why you need to execute the command "Set-CsMediaConfiguration
-EncryptionLevel SupportEncryption" on your Lync Servers to enable Skype federation. By default Lync only
accepts encrypted media and by setting the EncryptionLevel to "SupportEncryption",
encryption is preferred but not required. More info
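The effect of that setting can be modelled with a small Python sketch. It is a simplified decision table, not Lync's actual negotiation logic: it only covers the default level (named `RequireEncryption` in Lync) and the `SupportEncryption` level from the note, and it reduces the remote side to a single yes/no question of whether it offers SRTP.

```python
def media_outcome(encryption_level: str, peer_offers_srtp: bool) -> str:
    """Simplified model of what happens to a media stream for a given setting."""
    if encryption_level == "RequireEncryption":
        # Lync's default: unencrypted media is not accepted.
        return "SRTP" if peer_offers_srtp else "rejected"
    if encryption_level == "SupportEncryption":
        # Encryption is preferred, but plain RTP is accepted as a fallback.
        return "SRTP" if peer_offers_srtp else "RTP"
    raise ValueError(f"level not covered by this sketch: {encryption_level}")

# A V1 Skype federation call only offers plain RTP:
print(media_outcome("RequireEncryption", peer_offers_srtp=False))  # rejected
print(media_outcome("SupportEncryption", peer_offers_srtp=False))  # RTP
```

This is why a V1 federation call fails with the default setting and works, unencrypted, once the level is changed.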
Conclusion
1. Skype Federation V1 enabled presence and peer-to-peer IM/audio with Lync
2. Lync to Skype federation V1 was completely based on the existing WLM Federation
3. All signaling traffic from Lync to the MS Federation cloud is SIP/TLS
4. All signaling traffic from Skype to the MS Federation cloud is proprietary MSNP
5. The MSNP Gateway in the Federation Cloud translates between signaling protocols
6. Media streams flow through an audio gateway in the Federation cloud
7. The audio gateway in the Federation Cloud does transcoding, which increases latency
8. G722 is used between Lync and the Microsoft Federation cloud
9. Signaling is always encrypted, the media stream is not
Skype Federation V2: I can see you!
General
Skype Federation V2 was announced during the excellent keynote at Lync Conference 2014. The feature that caught the most attention was support for peer-to-peer video calling with Skype federation. A nice demo was used to show off this new capability, and I recommend watching the keynote in case you haven't done so already.
What is
often forgotten is that from a technology perspective, a lot has happened on both
products to make this work. Let's take the jump and dive deep into Skype
Federation V2!
Architecture
This is the
architecture for Skype federation V2
Skype V2 Architecture
The
signaling architecture hasn't changed a lot: the Microsoft Federation cloud
still has an MSNP gateway which translates between proprietary MSNP and
SIP/TLS.
The biggest difference with V2 Federation is found in the media stack. Both the Skype and Lync clients now support each other's technologies, allowing a peer-to-peer media connection. As a result, the audio gateway within the Microsoft Federation cloud is no longer required for V2 calls.
Signaling
While from
an architecture perspective not much has changed, the MSNP gateway in the
Microsoft Federation cloud has received some improvements. This version now supports V2 clients and forks
incoming calls to both V1 and V2 clients.
(more about forking later)
What has also changed is the Skype cloud itself. In an interview in December, Microsoft's cloud chief Scott Guthrie stated that Skype now runs on Azure. While we can only speculate on the details, you can bet Microsoft redesigned the Skype cloud architecture so it makes use of the great and flexible architecture Azure has to offer. Microsoft also invested in improved call setup and control capabilities within the Skype cloud.
Audio Codec
The reason we needed transcoding in V1 federation was that media negotiation and codecs on both clients were not compatible. In the last few months, Microsoft quietly introduced the Skype-specific SILK codec in Lync with the November 2013 Cumulative Update, and an upcoming CU will enable compatibility with Skype.
If we take a look at the offered media codecs,
SILK is available in both narrowband and wideband versions:
In the opposite direction, Skype will get support for Forward Error Correction (FEC). FEC is a technique currently used by Lync to improve audio quality on bad network connections. FEC adds extra packets carrying redundant information so the receiving side can reconstruct the stream despite a higher degree of packet loss on the network.
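To show the principle behind FEC, here is a minimal Python sketch of one of the simplest schemes: a single XOR parity packet per group of equal-length packets, which lets the receiver rebuild any one lost packet in that group. This is a generic textbook technique that I am using for illustration; it is not a description of the FEC mechanism Lync or Skype actually implement.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(packets):
    """XOR all equal-length packets in a group into one parity packet."""
    parity = bytes(len(packets[0]))
    for p in packets:
        parity = xor_bytes(parity, p)
    return parity

def recover(received, parity):
    """Rebuild the single missing packet (marked as None) from the rest plus parity."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("this scheme repairs exactly one lost packet per group")
    rebuilt = parity
    for p in received:
        if p is not None:
            rebuilt = xor_bytes(rebuilt, p)
    repaired = list(received)
    repaired[missing[0]] = rebuilt
    return repaired

group = [b"AAAA", b"BBBB", b"CCCC"]   # made-up audio payloads
parity = make_parity(group)           # sent along with the group
print(recover([b"AAAA", None, b"CCCC"], parity) == group)  # True
```

The price is extra bandwidth (one additional packet per group here), which is the usual trade-off with FEC: more data on the wire in exchange for resilience against loss.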
Note: Lync will only use the SILK codec for Skype to Lync federation calls. While the SILK codec is known for its excellent quality-to-bandwidth ratio, it will not be used for Lync to Lync calls.
Lync already has RTAudio support, which is even a bit more bandwidth-efficient than SILK (in narrowband scenarios). Unfortunately I could not find any decent quality comparison between the two.
Right now we can only guess how both codecs compare in real-life use cases, as no real data is available. I speculate that Microsoft is currently testing this, but if and when SILK will be considered as the primary narrowband codec is anyone's guess.
Video Codec
To be able
to support peer-to-peer video between Skype and Lync, video compatibility
between both clients is required. Lync
2013 supports H264 Mode 1 and guess what... a future release of Skype will also introduce
support for H264 Mode 1.
Note: Many vendors only support H264 mode 0, while Lync's implementation of the layered H264 Mode 1 has many advantages. For an extensive deep dive on how H264 Mode 1 works within Lync, I recommend reading Jeff Schertz's excellent blog post "video interoperability in Lync 2013".
Just like with audio, both parties negotiate a common video codec through the signaling protocols, and if that fails, the call is not established. Future client releases will introduce compatibility and enable Skype video.
Connection Flow
Note: These connection flows will only be used when both the Lync and Skype clients are running a supported V2 version. If one of the clients is a "legacy" V1 client, the connectivity flow of V1 Federation will always be used.
Skype Call to a Lync client inside the corporate network
Compared to a V1 call there are 3 major differences:
1. Both clients now use the SILK codec and compatible media negotiation
2. As a consequence, transcoding within the Federation cloud is no longer required
3. Media traffic is now encrypted
With V2 we have removed the transcoding hop in the connection, which will result in lower delay and improved quality. On top of that, our Skype to Lync call is now encrypted.
Skype Call to a Lync Client outside the corporate network
Because we
now have two compatible clients, a peer-to-peer direct media path can be used
if no firewalls are blocking the connection.
This results in the most optimized path between both endpoints.
V1 and V2 Interoperability
In a perfect world everybody would update their software soon after a new release is made available. Just looking at how many people still use Windows XP (upgrade guys!!), we all know that it can take a while before this actually happens.
For Skype federation this means that for a considerable amount of time both V1 and V2 clients will be out there. How does Microsoft handle this?
With calls FROM Lync TO Skype, the MSNP gateway in the Federation Cloud will fork the
call to all registered endpoints (being V1 or V2). If the Skype user takes the call on a V1
endpoint, only V1 capabilities will be supported for that call. If the Skype user takes the call on a V2
endpoint, V2 capabilities like video will be available.
Calls FROM
Skype TO Lync are routed through the Skype
cloud to the Lync Federation Edge service. V1 clients will make use of V1 Skype
infrastructure components, while V2 clients will automatically make use of V2
Skype infrastructure.
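The forking behaviour for Lync to Skype calls can be sketched in Python as follows. The endpoint records, their names and the capability sets are hypothetical; the sketch only restates the rule described above, namely that the call rings everywhere and the answering endpoint's version decides what the call can do.

```python
# Capability sets are illustrative; the text only names video as a V2 addition.
CAPABILITIES = {
    "V1": {"audio"},
    "V2": {"audio", "video"},
}

def fork_call(registered_endpoints):
    """Lync -> Skype: ring every registered endpoint, regardless of version."""
    return [ep["name"] for ep in registered_endpoints]

def call_capabilities(answering_endpoint):
    """The capabilities of the call follow the endpoint that answers it."""
    return CAPABILITIES[answering_endpoint["version"]]

# Hypothetical Skype user signed in on two devices.
endpoints = [
    {"name": "old laptop", "version": "V1"},
    {"name": "new tablet", "version": "V2"},
]
print(fork_call(endpoints))                         # both devices ring
print("video" in call_capabilities(endpoints[0]))   # False
print("video" in call_capabilities(endpoints[1]))   # True
```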
Enterprise NAT Traversal
Another Lync technology that made it into Skype is support for NAT traversal techniques based on the ICEv19 protocol in combination with STUN and TURN. This suite of protocols allows clients behind a NAT device to detect their public IP address and use it during connection setup. For a detailed explanation of the process I refer you to Jeff Schertz's great "STUN vs TURN" blog post.
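One part of ICE that is easy to show in code is how candidate addresses are ranked. The Python sketch below uses the priority formula and the recommended type preferences from the published ICE standard (RFC 5245). Whether the ICEv19 draft implementation in Lync and Skype uses exactly these preference values is an assumption on my part, so treat the numbers as illustrative.

```python
# Recommended type preferences from RFC 5245; actual client values may differ.
TYPE_PREFERENCE = {
    "host": 126,   # address on the local interface
    "prflx": 110,  # peer reflexive, learned during connectivity checks
    "srflx": 100,  # server reflexive, the public address learned via STUN
    "relay": 0,    # relayed address allocated on a TURN server
}

def candidate_priority(cand_type, local_preference=65535, component_id=1):
    """priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)"""
    return ((2 ** 24) * TYPE_PREFERENCE[cand_type]
            + (2 ** 8) * local_preference
            + (256 - component_id))

ranked = sorted(["relay", "srflx", "host"], key=candidate_priority, reverse=True)
for cand_type in ranked:
    print(cand_type, candidate_priority(cand_type))
```

The ordering reflects the general idea: try the most direct path first (host), then the public address discovered through STUN, and only fall back to relaying media through a TURN server when nothing else works.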
Dreaming about the future...
Note: everything here is pure speculation and my personal wishlist. None of these features may ever actually make it!
Multi-party support
Right now only one-to-one calls are supported between Lync and Skype. It would be great to have mixed multi-party IM and audio/video call support for federation.
Conferencing support
Wouldn't it be amazing if Skype users could join a Lync meeting and view shared content?
SILK in Lync
Not really required for me, but if intensive testing proves that SILK has much better quality than RTAudio while using about the same bandwidth, I would like to see it enabled for Lync to Lync calls.
Xbox One support
Imagine sitting on the couch in front of your Xbox One and issuing the voice command "skype attend meeting". Two seconds later the Xbox has joined the meeting with your colleagues, who are participating from a Lync Room System enabled meeting room at the office! Using Kinect gestures you switch between speaker views and meeting content.
Bring it on!
Sources
Lync Blog: Lync Federation will continue as WLM transitions to Skype
Skype numbers
Skype on Wikipedia
Skype & Lync session on Lync Conference 2013
Skype & Lync session on Lync Conference 2014
Matt Landis: Some Thoughts on Skype & Federation
Matt Landis: Breaking, Skype to Lync Connectivity is live today