January 27, 2017

Attending an MPEG meeting as an academic researcher

I recently attended an MPEG meeting for the first time. I am now used to attending academic conferences (for better and for worse) but I had never attended a meeting of a standardization group before. Overall, my feedback is very positive and I will probably embrace the standardization circus a bit more in the future (hopefully I will not wait forty years before attending another standardization meeting).

I especially appreciate the commitment of researchers during the MPEG sessions. The attendees engage in a "technical/scientific conversation" with the researcher who presents his contribution. It is in no way comparable to the experience of most academic talks. I identified some of the key differences between a meeting group at MPEG and a standard session in an academic event:
  • The scope of a meeting group is very narrow. For example, I attended the meeting of the ad-hoc group in charge of discussing projections of 360° videos into 2D maps. Every attendee had good reasons to attend this meeting in particular, so free riders were a minority. In an academic conference, the Program Chairs try to schedule the presentations so that papers sharing a similar topic are gathered, but the objectives of these academic papers are often quite different. In contrast, the contributions during a standardization session share the same objectives, which inevitably invites researchers who are experts in the domain to argue about the pros and cons of every contribution, including their own.
  • The presentation is not the end, it is the beginning of something wider, which is to eventually contribute to a common (un-authored) document. The chairman is in charge of writing a consensual document after the meeting and a presenter aims at convincing attendees that his contribution is worth being included in this document without reserve. In an academic conference, the motivation of the presenter is to be present so that an accepted paper is not withdrawn from the digital library due to no-show.
  • When a presenter is invited to introduce his contribution, it is not showtime. He usually stays at his seat and scrolls through the document that every attendee has previously opened (most people had a look at the contributions beforehand). There is no talk, no slides, no formalism. Only the presenter, his contribution, and engaged attendees. The debate related to a contribution can last two minutes or one hour. I found it much more lively than well-formatted slide-based talks.
I also appreciate this feeling of being useful as a "public scientist" in a population that is mostly composed of private researchers. A scientist has various ways to disseminate the knowledge he is supposed to produce in return for his public funding and salary. Academic conferences are the most common way. Some scientists create start-ups. Some scientists develop strong ties with companies and spend most of their energy collaborating in projects. Good reasons to disseminate in standardization meetings include:
  • The contribution from a public academic researcher is (usually) not driven by mercantile private interests. We are supposed to provide something that is closer to The Scientific Truth than what other researchers from competing companies can claim. I understand that one of the missions of a public researcher attending a standard meeting is to ensure that what will eventually become a widely used standard is not an aggregation of patented technologies but rather a scientifically solid and open solution.
  • Every scientist hopes that the fruit of his research will eventually be exploited, whether indirectly by contributing to better knowledge or directly by integration into an object that is useful to society. In applied research topics such as computer science, academic conferences are not necessarily the best way to convey ideas to the companies that are able to exploit a scientific result. The academic world is mostly fuzzy and closed. A standardization group appears as a direct way to enable the exploitation of scientific ideas, without restriction.
Of course, the experience of attending an MPEG meeting also includes annoyances: a lot of time is spent orchestrating the various standardization sub-groups, some guys can ruin a whole meeting by interfering with every presenter, the circus is full of jargon and bizarre usages, which prevent a newcomer from joining, political and business games exist ... but the advantages are also numerous (including but not restricted to the above). Overall, the balance is in my opinion positive.

October 2, 2015

Can Multipath Boost the Network Performances of Real-time Media?

I would now like to highlight a paper that deals with multipath networking for video streaming. This paper is:
Varun Singh, Saba Ahsan, and Jörg Ott, “MPRTP: Multipath Considerations for Real-time Media”, in Proc. 4th ACM Multimedia Systems Conference (MMSys '13), Oslo, Norway, Feb. 2013
and it has led to multiple actions in the IETF standardization group.

There are multiple routes between two hosts in the current Internet. This statement tends to be even truer when considering the flattening Internet topology, where Internet Service Providers (ISPs) have multiple options to reach a distant host. It is also truer with the multiple network interfaces available in modern mobile devices and the multiple wireless network accesses that co-exist in the urban environment. The question now is how to exploit these multiple routes. The network protocols that are in use today stick to the traditional single-path paradigm. Yet, scientists have shown that leveraging multipath can bring many advantages, including better traffic load balancing, higher throughput and more robustness.

This paper, which is already two years old, studies multipath opportunities for the specific case of conversational and interactive communication systems between mobile devices (e.g. Skype). These applications are especially challenging because the traffic between communicating hosts should meet tight real-time bounds. The idea of this paper is to study whether the most widely used network protocol for these applications, namely the Real-time Transport Protocol (RTP), can be turned into a multipath protocol. The authors thus propose a backwards-compatible extension to RTP called Multipath RTP (MPRTP).

In short, this paper presents the MPRTP extension and evaluates its performance in several scenarios. First, the authors comment on the main challenges that an extension of the RTP protocol must face in order to split a single RTP stream into multiple subflows. Second, the authors present the protocol details as well as the algorithms that are considered to solve these challenges. Third, simulations are conducted to evaluate the performance of the proposal.

The authors point out that an MPRTP protocol should be able to adapt to bandwidth changes on the paths by redistributing the traffic load among them in a smooth way to avoid oscillations. This is especially important in the case of mobile communications where quick capacity changes are common. To guarantee fast adaptation, the authors propose packet-scheduling mechanisms that do not abruptly reallocate traffic among congested and non-congested paths if a path suddenly becomes congested.

Another important issue is the variation in packet inter-arrival time (packet skew) among the different paths. Having multiple diverse paths makes it harder to estimate the right buffer size to absorb this skew. To overcome this problem the authors propose an adaptive playout buffer, which considers the skew of each path individually. They also favor the selection of paths with similar latencies.
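To make the idea of a skew-aware playout buffer more concrete, here is a minimal Python sketch. It is not the algorithm of the paper: the class name, the exponential smoothing factor `alpha` and the safety `margin_ms` are assumptions chosen for illustration. The sketch keeps a smoothed delay estimate per path and sizes the playout delay from the gap between the slowest and the fastest path.

```python
class SkewAwarePlayoutBuffer:
    """Illustrative only: sizes the playout delay from per-path delay estimates."""

    def __init__(self, alpha=0.125, margin_ms=20.0):
        self.alpha = alpha          # smoothing factor (assumed value)
        self.margin_ms = margin_ms  # extra safety margin (assumed value)
        self.delay_ms = {}          # smoothed delay estimate per path

    def observe(self, path_id, delay_ms):
        """Update the smoothed delay of one path with a new measurement."""
        previous = self.delay_ms.get(path_id)
        if previous is None:
            self.delay_ms[path_id] = float(delay_ms)
        else:
            self.delay_ms[path_id] = (1 - self.alpha) * previous + self.alpha * delay_ms

    def skew_ms(self):
        """Gap between the slowest and the fastest path (0 if no path is known)."""
        if not self.delay_ms:
            return 0.0
        values = list(self.delay_ms.values())
        return max(values) - min(values)

    def playout_delay_ms(self):
        """Delay to apply before playout so that packets of all paths can be reordered."""
        return self.skew_ms() + self.margin_ms
```

With such a structure, a receiver would call `observe` for each arriving packet and re-read `playout_delay_ms` periodically. Paths with similar latencies yield a small skew and therefore a short buffer.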

The choice of suitable transmission paths should consider the path characteristics in terms of QoS metrics such as losses, latency or capacity. The authors propose several extensions to the RTP protocol, including a new RTP reporting message (where the receiver provides QoS data per sub-flow) and a scheduling algorithm (where the sender uses these reports to decide how to distribute traffic among the available paths).
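The interplay between per-subflow reports and a smooth traffic redistribution can be sketched as follows. This is a hypothetical illustration in Python, not the scheduler defined in the paper: the report fields `capacity_kbps` and `loss_fraction`, the goodput-proportional target and the blending `step` are all assumptions.

```python
def target_shares(reports):
    """Share of traffic per path, proportional to estimated loss-free capacity."""
    goodput = {
        path: report["capacity_kbps"] * (1.0 - report["loss_fraction"])
        for path, report in reports.items()
    }
    total = sum(goodput.values())
    if total <= 0:
        # No usable information: fall back to an equal split.
        return {path: 1.0 / len(reports) for path in reports}
    return {path: value / total for path, value in goodput.items()}


def smooth_shares(current, target, step=0.25):
    """Move the current shares a fraction `step` toward the target, then renormalize."""
    blended = {
        path: (1.0 - step) * current.get(path, 0.0) + step * target[path]
        for path in target
    }
    total = sum(blended.values())
    return {path: value / total for path, value in blended.items()}
```

For example, starting from an even split between a WLAN path and a 3G path, a report showing 50% losses on 3G shifts only part of the traffic toward WLAN at each update, which avoids the abrupt reallocations and oscillations mentioned above.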

All the aforementioned extensions are designed to be backwards compatible, i.e. traditional RTP hosts can interoperate with hosts equipped with MPRTP extensions in single-path scenarios.

An exhaustive battery of simulations is conducted to evaluate the MPRTP performance in a broad range of scenarios: (i) path properties (losses, delays, and capacities) vary over time; (ii) paths share a common bottleneck, and (iii) MPRTP is deployed over mobile terminals using WLAN and/or 3G paths. These evaluations show that (1) the dynamic MPRTP performance is not far from the static performance for single and multipath cases, (2) MPRTP successfully offloads traffic from congested paths to the other ones while keeping some proportional fairness among them, and (3) on lossy links multipath is more robust and produces fewer losses than single path.

Overall, this paper addresses a significant problem (how to make a real-time UDP-based protocol multipath) with a comprehensive study. It is one of the first attempts to exploit multipath functionalities in the framework of multimedia communications, and especially with tight real-time constraints. This paper thus nicely complements the work that has been done by the network community on multipath TCP protocols. That being said, many problems related to multipath multimedia protocols are still open. Among others, let us cite rate-adaptive streaming and multiview video in the context of multipath.

September 19, 2015

Understanding an Exciting New Feature of HEVC: Tiles

As an editor of the IEEE R-letter, I write every now and then some short "letters" (one-page, easy-going texts) about a recent research article that I found especially interesting. I think it is appropriate to publish these letters on this blog as well.

The paper I'd like to highlight is:
Kiran Misra, Andrew Segall, Michael Horowitz, Shilin Xu, Arild Fuldseth, and Minhua Zhou, “An Overview of Tiles in HEVC”, IEEE Journal of Selected Topics in Signal Processing, Vol. 7, No 6, December 2013

The High Efficiency Video Coding (HEVC) standard significantly improves coding efficiency (gains reported as 50% when compared to the state-of-the-art H.264/MPEG-4 AVC), and thus is expected to become popular despite the increase in computational complexity. HEVC also provides various new features, which can be exploited to improve multimedia delivery systems. Among them, the concept of tiles is in my opinion a promising novelty that is worth attention. The paper "An Overview of Tiles in HEVC" provides an excellent introduction to this concept.

The goal of a video decoder (respectively encoder) is to convert a video bit-stream (respectively the original sequence of arrays of pixel values) into a sequence of arrays of pixel values (respectively a bit-stream). The main idea that is now adopted in video compression is the hierarchical structure of video stream data. The bit-stream is cut into independent Group of Pictures (GOP), each GOP being cut into frames, which have temporal dependencies with regards to their types: Intra (I), Predicted (P) or Bidirectional (B) pictures. Finally, each frame is cut into independent sets of macroblocks, called slices in the previous encoders.

The novelty brought by HEVC is the concept of tile, which is at the same "level" as slice in the hierarchical structure of video stream data.

The motivations for both slices and tiles are, at least, twofold: error concealment and parallel computing. First, having an independently parsable unit within a frame can break the propagation of errors. Indeed, due to the causal dependency between frames, an error in a frame can make the decoder unable to process a significant portion of the frames occurring after the loss event. Slices and tiles limit, at least from a spatial perspective, the propagation of an error over the whole frame. Second, the complexity of recent video and the resulting need for high CPU speed (which unfortunately requires power and generates heat) can be partially addressed by parallelizing the decoding task across multiple computing units, whether these are cores in many-core architectures or computing units in Graphics Processing Units (GPUs). The independence of slices and tiles is expected to facilitate the implementation of video decoders on parallel architectures.

Unfortunately, the concept of slices suffers in practice from serious weaknesses, which tiles are expected to fix.

In the paper, the authors introduce the main differences between tiles and slices, two concepts that can be confused at first glance. They focus on the motivation for parallel computation.

The first part of the paper explains in detail the main principles of both approaches, in particular the fact that tiles are aligned with the boundaries of Coding Tree Blocks (CTBs), which provides more flexibility in the partitioning. This brings several benefits: a tile is more compact, which leads to a better correlation between pixels within a tile than between pixels in a slice. Tiles also require fewer headers, among other advantages.
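The CTB alignment can be illustrated with a small Python sketch that splits a picture into a rectangular grid of tiles whose boundaries fall on CTB boundaries. The function names, the 64-pixel CTB size and the evenly spaced boundaries are assumptions made for the example; an encoder may choose other CTB sizes and non-uniform tile boundaries.

```python
def ctb_count(pixels, ctb_size=64):
    """Number of CTBs needed to cover `pixels` (rounded up)."""
    return -(-pixels // ctb_size)


def uniform_boundaries(ctbs, parts):
    """Evenly spaced boundaries, in CTB units, splitting `ctbs` into `parts`."""
    return [(i * ctbs) // parts for i in range(parts + 1)]


def tile_grid(width, height, columns, rows, ctb_size=64):
    """Return tiles as (x, y, w, h) tuples expressed in CTB units, in raster order."""
    col_edges = uniform_boundaries(ctb_count(width, ctb_size), columns)
    row_edges = uniform_boundaries(ctb_count(height, ctb_size), rows)
    tiles = []
    for r in range(rows):
        for c in range(columns):
            tiles.append((
                col_edges[c],
                row_edges[r],
                col_edges[c + 1] - col_edges[c],
                row_edges[r + 1] - row_edges[r],
            ))
    return tiles
```

For a 1920x1080 picture with 64x64 CTBs, the picture is 30 CTBs wide and 17 CTBs high, so `tile_grid(1920, 1080, 2, 2)` yields four rectangular tiles of 15x8 or 15x9 CTBs. Each tile is a compact rectangle, in contrast to a slice that follows the raster-scan order of blocks.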

The authors also introduce the known constraints to be taken into account when one wants to use tiles today. The whole of Section 3 is about the tile proposal in HEVC, and the main challenges to be addressed for wide adoption. Next, the authors present some examples where tiles are useful. Both parts are written so that somebody who is only just familiar with the concepts can understand both the limitations behind the concept of tiles and how these weaknesses have been addressed in practice.

The last part of the paper, in Section 5, deals with experiments that demonstrate the efficacy of HEVC for lightweight bit-streams and parallel architectures. The authors first assess parallelization and the sensitivity to network parameters, including the Maximum Transmission Unit (MTU), of the performance of slices versus tiles. They finally measure the performance of stream rewriting for both approaches.

In short, the paper shows that tiles appear to be more efficient than slices on a number of aspects. The paper proposes a rigorous, in-depth, introduction of the main advantages of tiles. This can foster research on the integration of tiles into next-generation multimedia delivery systems.

September 2, 2015

Uploading innovative engineers: 15 years remaining

Four years ago, I wrote an outrageous post about how "un-geek" French engineers are on average. Since 2011, many things have changed in France: code is expected to be (soon) taught in elementary schools, successful geek entrepreneurs are in the spotlight, geek-ish schools and co-working hacker spaces flourish, ... It will take time, but, hopefully, France in 2025 will be geek-friendly. Now, what about innovation?

Entrepreneurship has become a cause nationale in France, with a lot of initiatives and announcements. Analysts try to decipher the structural problems regarding innovation in France; in particular, an excellent study (in French) about innovation "ecosystems" was released yesterday. Everything said in this article is 100% true... but it misses a point: how "un-innovative" the French higher-educated people are on average.

As a teacher in a higher-education engineering school, I have headed an "Innovation & Entrepreneurship" course for 8 years (with some success stories here and there). Every student must follow this course. From my experience of teaching this innovation course to around 180 students every year, I can report that the average higher-educated students (usually coming from Classes Préparatoires) struggle to:
  • Deal with uncertainties. The most brilliant scientific students are those who excel at finding solutions to problems. But what about when there is no clearly identified problem? And what about when any solution to a problem has its pros and cons? Most of the students who would not have enrolled in an Entrepreneurship program if they had the choice are very uncomfortable with uncertainties. They are the right targets for innovation mindset re-formatting.
  • Convince. The French education system does not include any training in talking, debating, arguing, or communication skills more generally. While every US kid has to defend a point in a science fair, French kids of the same age are taught to raise their hand before talking, the quieter the better. Oral debates barely exist in French schools. As a matter of fact, students frequently give their very first "public" talk when they are 20 years old. Teaching the art of pitching is necessary for every student.
  • Accept being a failure and a rebel. This is especially true during brainstorming and creativity sessions where it is common that somebody, say Jo, suggests a high-risk or out-of-the-box idea but almost immediately the fear of being judged makes Jo himself overturn his own damned idea. I'd love to put Jo in more creativity training sessions so that he becomes self-confident enough.
The percentage of engineers who have these three core competencies (an innovation-friendly mindset) in 2015 is as low as the percentage of engineers who had a geek-friendly mindset in 2011. Solutions like super-hyped incubators or state-owned VCs are right but they are similar to providing xDSL broadband connections to geeks in the 2000s. It is cool for the happy few, but it does not change the mindset of the others.

In my opinion, a successful innovation ecosystem is one in which everybody in the society (especially every higher-educated worker) has an innovation-friendly mindset. Everybody means here people who do not aim to become entrepreneurs and even those who are not directly related to innovation. No society can afford that a majority of higher-educated people have not developed these three key competencies at school. The structural reasons behind this failure for average higher-educated workers are in my opinion more critical than an imperfect innovative ecosystem for a tiny fraction of innovators. Indeed, the lack of inclination toward uncertainties, communication skills and rebel attitude is a transmissible disease for any innovative ecosystem.

It is the mission of teachers in higher-education institutions to fight the stigmata of twenty years of un-innovative mindset formatting. The special "Entrepreneurship" programs that are commonly offered in other higher-education institutions (or in online courses) do not contribute to this mission because these programs enroll volunteering students who have already overcome their innovation-related mindset limitations. These students are not the right target. To set up a profoundly innovation-friendly ecosystem in 2030, we have to train all higher-educated students now so that innovation will be pervasive in society, especially at schools, in community groups and in traditional companies. Hopefully, the ecosystem will then be friendly to entrepreneurs...

March 24, 2015

Ten years as an academic scientist: preamble of my HdR

Here is the preamble of my HdR, which I will defend on April the 7th 2015 at Rennes.

I defended my PhD thesis ten years ago. At that time, my research domains included peer-to-peer systems, mobile ad-hoc networks and large-scale virtual worlds. Today, these topics hardly get any attention from the academic world. Although most papers published in the early 2000s advocated that centralized systems would never scale, today's most popular services, which are used by billions of users, rely on a centralized architecture powered by data-centers. In the meantime, the open virtual worlds based on 3D graphical representation (e.g. Second Life) fell short of users while social networks based on static text-based web pages (e.g. Twitter and Facebook) have exploded. I do not want to blame myself for having worked in areas that have not proved to be as critical as they were supposed to be. Instead, I would like to emphasize that I work in an ever-changing area, which is highly sensitive to the development of new technologies (e.g. big data middleware), of new hardware (e.g. smartphone), and of new social trends (e.g. user-generated content).

I envy the scientists who are able to precisely describe a multi-year research plan, and to stick to it. I am not one of them. But I am not ashamed to admit that my research activity is mostly driven by short-term intuition and opportunities and that the process of academic funding directly impacts my work. Indeed, despite all of the above, I have built a body of research, which I retrospectively find consistent. And more importantly, I have been relatively successful in advising PhD students and managing post-docs, all of them having become better scientists to some extent.

In short, I have developed during the past ten years a more solid expertise in (i) theoretical aspects of optimization algorithms, (ii) multimedia streaming, and (iii) Internet architecture. I have applied this triple expertise to a specific set of applications: massive multimedia interactive services. I provide in this manuscript an overview of the activities that have been developed under my lead since 2006. It is a subset of selected studies, which are in my opinion the most representative of my core activity.

I hope you will have as much fun reading this document as I had writing it.

November 7, 2014

A Dataset for Cloud Live Rate-Adaptive Video

There is an audience for non-professional video "broadcasters", like gamers, teachers of online courses and witnesses of public events. To meet this demand, live streaming service providers such as ustream, livestream, twitch or dailymotion have to find a solution for the delivery of thousands of good-quality live streams to millions of viewers who consume video on a wide range of devices (from smartphone to HDTV). Yet, in current live streaming services, the video is encoded on the computer of the broadcaster and streamed to the data-center of the service provider, which in most cases chooses to simply forward the video it gets from the broadcaster. The problem is that many viewers cannot properly watch the streams due to mismatches between video encoding parameters (i.e. video rate and resolution) and features of viewers’ connections and devices (i.e. connection bandwidth and device display).

To address this issue, adaptive streaming working along with cloud computing could be the answer. Whereas adaptive streaming handles the diversity of end-viewers' requirements by encoding several video representations at different rates and resolutions, cloud computing provides the CPU resources to live transcode all these alternate representations from the broadcaster-prepared raw video.

It is well known that the QoE of an end-viewer watching a stream depends on the encoded video and the parameter values used in the transcoding. But, in this new scenario in the cloud, we also need to consider the transcoding CPU requirements. In the “cloud video” era, the selection of video encoding parameters should take into account not only the client (for the QoE), but also the data-center (for the allocated CPU). To set the video transcoding parameters, the cloud video service provider should know the relations among transcoding parameters, CPU resources and end-viewers' QoE, ideally for any kind of video encoded on the broadcaster side.

We would like to announce the publication of a dataset containing CPU and QoE measurements corresponding to an extensive battery of transcoding operations at http://dash.ipv6.enstb.fr/dataset/transcoding/ with the purpose of contributing to research on this topic. Most of the credit for this work (and so this post) has to be given to Ramon Aparicio-Pardo.

To build the dataset, we have used four types of video content, four resolutions (from 224p up to 1080p) and bit rate values ranging from 100 kbps up to 3000 kbps. Initially, we have encoded each of the four video streams into 78 different combinations of rates and resolutions, emulating the encoding operations at the broadcaster side. Then, we have transcoded each of these broadcaster-prepared videos into all the representations with lower resolutions and bit rate values than the original one. The overall number of these operations, representing the cloud transcoding, was 12168. For each one of these operations, we have measured the CPU cycles required to generate the transcoded representation and we have estimated the end-viewers’ satisfaction using the Peak Signal to Noise Ratio (PSNR) score. We depict a basic sketch of these operations for one specific case where the broadcaster encoded its raw video with 720p resolution at 2.25 Mbps and we transcode it into a 360p video at 1.6 Mbps.
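As a reminder of what a PSNR score measures, here is a minimal Python sketch of the textbook definition, computed over two sequences of 8-bit sample values. It is only an illustration: the function name and the flat-list input are assumptions, and it is not the tool chain that was used to produce the dataset.

```python
import math


def psnr(reference, distorted, peak=255):
    """PSNR in decibels between two equally sized sequences of sample values."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of the same length")
    # Mean squared error between the reference and the distorted samples.
    mse = sum((a - b) ** 2 for a, b in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)
```

A higher value means that the transcoded picture is closer to the reference. For video, the score is typically computed per frame and then averaged over the sequence.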

We give below an appetizer of how these CPU cycles and satisfaction decibels vary with transcoding parameters. The figures show some examples of the kind of results that you will find in the dataset, here for a broadcaster-prepared video of type “movie,” with 1080p resolution and encoded at 2750 kbps. If you wonder what the rest of the figures look like, 558 curves and their corresponding 12168 measurements of cycles of hard CPU work and decibels of viewers’ satisfaction are waiting for you at http://dash.ipv6.enstb.fr/dataset/transcoding/

October 13, 2014

Toward a new public higher education system

My previous post was quite harsh about the way the French government addresses the MOOC phenomenon. I would now like to be more constructive (and also to demonstrate that I am not only a moaner). So, basically, what would I do if I were the French Minister of Higher Education? In short:
  • I'd shut down FUN. To be competitive, such a project requires an investment an order of magnitude greater than the planned funding. When the objective is to attract tens of thousands of students, there is no room for small players.
  • I'd stop funding through calls for proposals. These calls reward people who know how to write proposals and who, in the best case, release results years later. Moreover, and most importantly, these calls do not give a sense of responsibility to university managers. French higher education institutions have to learn how to promote their best professors and to make them "MOOC-able" instead of begging the government as if "making a MOOC" was a right.
  • I'd massively invest in French-friendly start-ups. The focus should be on three main domains where the position of France is weak today: a European-scale portal, tools for scalable learning, and online student evaluation. The investment can be led by a structure such as BPI France.
In the following, I give my personal analysis of the context. I first decompose the traditional functions of a higher education institution, and analyze the challenges.
  • Define the topic of the courses. In France, the institutions design curricula, which are then checked by academic accrediting agencies like ABET in the US, or CTI and AERES in France. In short, the curricula target young people (named students) and aim at developing their employability. Several courses form a consistent curriculum. As for the MOOCs, students are mainly workers, with a large diversity of motivations. The course is a unit, which should be independent. The topics are focused. It is thus quite different, but not fundamentally challenging.
  • Select the students. This is the main asset of the Grandes Ecoles. However, MOOCs are (expectedly) scalable, so you can teach an unlimited number of students. The question is no longer to filter the best students before the course. The aim now is to have the right audience for the course: as many students as possible, with a high motivation for the topic and the right background. As said in my previous post, portals like Coursera are far better at this than any French higher education institution.
  • Build the course. Every MOOC creator agrees that building a scalable online course is quite different from building a traditional course for a small, on-site population. MOOCs require new categories of workers. But the role of the teacher is still prominent. So far, the teachers have worked in traditional higher education institutions.
  • Deliver the course. A building full of classrooms is useless. What is needed is a great, scalable, full-featured learning management tool. Moreover, you need a competitive team of developers to implement online exercises that have an added value and improve the student experience. Here, again, I don't think that any traditional French higher education institution can compete in providing such a tool. Only a team of excellent super-committed software developers can do it.
  • Assist students during their learning experience. The challenge of MOOCs is to provide the same kind of assistance as for a traditional course with one professor and a dozen students, although the number of students is in the order of thousands. The power of community is the lever.
  • Evaluate the students. When students are spread all over the world, it is impossible to organize exams the usual way. Companies like ProctorU have developed offers, where either exam rooms are available anywhere on the planet, or specific, secured, online tools allow anybody to be monitored as if she were on-site.
In the traditional model, all these functions are fulfilled by higher education institutions. In the new model related to MOOC, I foresee that traditional institutions will be outperformed by start-ups on a subset of functions: create a portal to attract students, develop a scalable learning platform, and evaluate students worldwide. These functions require strong skills in software development, in empowering a community of open-source developers, in promotional activities and marketing, in worldwide staff management, in agile development, in reliable online infrastructure, in website design. My claim is that neither universities nor public structures have any of the above skills.

Instead, I suggest giving a special mission to BPI France to make sure that funding goes to the most brilliant European start-ups related to education, in particular on the aforementioned functions (attract students worldwide, develop scalable learning platforms, evaluate students). By investing in European SMEs, the emergence of a champion is possible. And if the public force is one of the main investors, it may also ensure some of the "public missions" (e.g. almost free access to knowledge). Examples of such brilliant European start-ups include OpenClassroom, Iversity and FutureLearn.

On their side, the traditional French higher education institutions have to evolve. I like the analogy between MOOCs and scientific books. Not all professors write books. Not all institutions ask their teaching staff to write books. Excellent professors (experts in some area, extremely brilliant as teachers) attract editors because the books they may write can become a success. It is thus up to the institutions to decide whether they should promote their excellent professors so that they may be detected by editors. Being "MOOC-able" is now a criterion for hiring professors at EPFL according to its director. This is the kind of shift French institutions also have to embrace.