September 29, 2010

Open-source software in the Internet of Things: why we need a repository-less package management system

Software has become one of the most critical forms of User-Generated Content (UGC). The amount of software created or updated every day is overwhelming: the SourceForge community brings together more than 2 million software producers contributing to 240,000 software projects. The increasing popularity of application stores (e.g. more than 180,000 applications in the Apple App Store) confirms several trends in the software industry:
  • crowdsourced software has become a key economic argument. Apple typically takes advantage of the number of third-party applications that are available exclusively on its devices. Offering the largest and most diverse range of software and services in a short time is a challenge. In this context, most large players in the communication industry, including phone manufacturers and network operators, offer incentives to developers (from monetary compensation to open access to data and APIs), which tends to reinforce the proliferation of new software.
  • pervasive environments need crowdsourced software. The explosion in the number of devices, together with commercial pressures (especially time-to-market), creates a gigantic demand for software development. This demand far exceeds the capacity of classic software producers. For example, the strength and dynamism of the Linux community is a key factor in the rising popularity of Linux for small devices.
Compared to classic UGC aggregation, managing user-generated software is a challenging task. Modern software often consists of a huge number of small packages. These packages depend on one another, and those dependencies can easily be broken during the deployment life-cycle. Finding an efficient and reliable way to maintain, distribute and install these software packages across billions of machines is therefore a real issue. In the current approach, software distributors rely on a set of repositories, which are centralized servers collecting all the packages that have been certified. We see two major drawbacks in this architecture:
  • the certification of packages. The software distributor plays the role of a certification authority. Users must submit their packages if they want them to be integrated into the repositories. The distributor verifies the integrity of the submitted packages and makes the valid ones available for other users to download. As addressed in the EDOS project, there are various approaches and tools that facilitate the management of large package repositories. However, the centralized structure requires expensive infrastructure and extra human management. The process of certifying third-party packages is slow and complex. Typically, developers complain about the increasing delay before their software becomes available in the Apple App Store. Clearly, centralized certification of packages does not scale. It is also a severe threat to the privacy of users.
  • the delivery of packages. Microsoft researchers have emphasized that a set of repositories cannot ensure fast, planet-scale delivery of packages. Yet massive delivery of software patches is a key security requirement. If the number of devices grows as commonly assumed in the Internet of Things vision, a centralized repository-based architecture will soon reach its limits. Moreover, devices in pervasive environments are not necessarily always connected to the Internet. To upgrade all devices, including the tiniest ones, we also need to rely on intermediate devices and opportunistic ad-hoc communications.
We need to revisit the package management system with a clean-slate approach: a fully distributed (repository-less) system, which presupposes a change in the usual inter-dependent relationships between packages. We propose an internship, which is expected to be a small first step in that direction.
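
To make the dependency problem concrete, here is a minimal Python sketch of how a device might check whether a package's dependencies are satisfied by what it can currently reach. Everything in it is an assumption made for illustration: the package names, the `PACKAGES` metadata table and the `missing_dependencies` helper are hypothetical and do not describe an existing package manager or the system proposed here. In a repository-less setting, the `available` set could stand for the packages held locally or by reachable neighbouring devices.

```python
from typing import Dict, List, Set

# Hypothetical package metadata: package name -> names it depends on.
PACKAGES: Dict[str, List[str]] = {
    "media-player": ["codec-lib", "ui-toolkit"],
    "codec-lib": ["math-lib"],
    "ui-toolkit": [],
    "math-lib": [],
}


def missing_dependencies(
    package: str, available: Set[str], metadata: Dict[str, List[str]]
) -> Set[str]:
    """Return every package transitively required by `package`
    (including `package` itself) that is not in `available`."""
    missing: Set[str] = set()
    seen: Set[str] = set()
    stack = [package]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also protects against cycles
        seen.add(current)
        if current not in available:
            missing.add(current)
        # Packages without metadata are treated as having no dependencies.
        stack.extend(metadata.get(current, []))
    return missing


# Assumed scenario: "math-lib" cannot be obtained from any reachable source.
reachable = {"media-player", "codec-lib", "ui-toolkit"}
print(missing_dependencies("media-player", reachable, PACKAGES))
```

In this scenario the check reports `math-lib` as missing, so installing `media-player` would leave a broken dependency. A distributed system would have to run this kind of check without a central server that knows about every package.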
