...
Contents
...
Code Insights for Apache StreamPipes
Apache StreamPipes
Apache StreamPipes (incubating) is a self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams. StreamPipes offers several modules including StreamPipes Connect to easily connect data from industrial IoT sources, the Pipeline Editor to quickly create processing pipelines and several visualization modules for live and historic data exploration. Under the hood, StreamPipes utilizes an event-driven microservice paradigm of standalone, so-called analytics microservices making the system easy to extend for individual needs.
Background
StreamPipes has grown significantly throughout recent years. We were able to introduce a lot of new features and attracted both users and contributors. Putting the cherry on the cake, we graduated as an Apache top-level project in December 2022. We will of course continue developing new features and never rest to make StreamPipes even more amazing. However, as we are approaching our `1.0` release at full steam, we also want the project to become more mature. Therefore, we want to address one of our Achilles' heels: our test coverage.
Don't worry, this issue is not about implementing myriads of tests for our code base. As a first step, we would like to make the status quo transparent. That means we want to measure our code coverage consistently across the whole codebase (backend, UI, Python library) and report the coverage to Codecov. Furthermore, to benchmark ourselves and motivate us to provide tests with every contribution, we would like to lock in the current test coverage as a lower threshold that we always want to meet (meaning CI builds fail in case coverage drops below it). Over time, we can then raise the required coverage level step by step.
Beyond monitoring our test coverage, we also want to invest in better and cleaner code. Therefore, we would like to adopt SonarCloud for our repository.
Tasks
- [ ] calculate test coverage for all main parts of the repo
- [ ] send coverage to Codecov
- [ ] determine coverage threshold and let CI fail if below
- [ ] include SonarCloud in CI setup
- [ ] include automatic coverage report in PR validation (see an example here) -> optional
- [ ] include automatic SonarCloud report in PR validation -> optional
- [ ] whatever comes to your mind 💡 further ideas are always welcome
❗Important Note❗
Do not create any account on behalf of Apache StreamPipes on SonarCloud or Codecov, and do not use the name of Apache StreamPipes for any account creation. Your mentor will take care of it.
Relevant Skills
- basic knowledge about GitHub workflows
Learning Material
- GitHub workflow docs
- Apache StreamPipes workflows
- SonarCloud for monorepos
- Using Codecov for a monorepo: https://www.curtiscode.dev/post/tools/codecov-monorepo/ & https://docs.codecov.com/docs/flags
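To make the Codecov tasks above more concrete, a workflow step that uploads component-specific coverage using flags could look roughly like the sketch below. The `codecov/codecov-action` inputs (`files`, `flags`) are real, but the report paths and flag names are assumptions, not our actual setup:

```yaml
# Hypothetical fragment of a GitHub Actions job (paths and flag names are assumed)
- name: Upload backend coverage
  uses: codecov/codecov-action@v3
  with:
    files: ./streampipes-backend/target/site/jacoco/jacoco.xml  # assumed JaCoCo report path
    flags: backend
- name: Upload UI coverage
  uses: codecov/codecov-action@v3
  with:
    files: ./ui/coverage/lcov.info  # assumed lcov report path
    flags: ui
```

With flags in place, per-flag status targets can then be declared in `codecov.yml` so that CI fails when a component drops below its locked coverage threshold.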
References
You can find our corresponding issue on GitHub here
Name and Contact Information
Name: Tim Bossenmaier
email: bossenti[at]apache.org
community: dev[at]streampipes.apache.org
website: https://streampipes.apache.org/
Improving End-to-End Test Infrastructure of Apache StreamPipes
Apache StreamPipes
Apache StreamPipes (incubating) is a self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams. StreamPipes offers several modules including StreamPipes Connect to easily connect data from industrial IoT sources, the Pipeline Editor to quickly create processing pipelines and several visualization modules for live and historic data exploration. Under the hood, StreamPipes utilizes an event-driven microservice paradigm of standalone, so-called analytics microservices making the system easy to extend for individual needs.
Background
StreamPipes has grown significantly over the past few years, with new features and contributors joining the project. However, as the project continues to evolve, e2e test coverage must also be improved to ensure that all features remain functional. Modern frameworks, such as Cypress, make it quite easy and fun to automatically test even complex application functionalities. As StreamPipes approaches its 1.0 release, it is important to improve e2e testing to ensure the robustness of the project and its use in real-world scenarios.
Tasks
- [ ] Write e2e tests using Cypress to cover most functionalities and user interface components of StreamPipes.
- [ ] Add more complex testing scenarios to ensure the reliability and robustness of StreamPipes in real-world use cases (e.g. automated tests for version updates)
- [ ] Add e2e tests for the new Python client to ensure its integration with the main system and its functionalities (#774: https://github.com/apache/streampipes/issues/774)
- [ ] Document the testing infrastructure and the testing approach to allow for easy maintenance and future contributions.
❗ ***Important Note*** ❗
Do not create any account on behalf of Apache StreamPipes in Cypress or using the name of Apache StreamPipes for any account creation. Your mentor will take care of it.
Relevant Skills
- Familiarity with testing frameworks, such as Cypress or Selenium
- Experience with TypeScript or Java
- Basic knowledge of Angular is helpful
- Familiarity with Docker and containerization is a plus
Learning Material
References
You can find our corresponding issue on GitHub here
Name and Contact Information
Name: Philipp Zehnder
email: zehnder[at]apache.org
community: dev[at]streampipes.apache.org
website: https://streampipes.apache.org/
OPC-UA browser for Apache StreamPipes
Apache StreamPipes
Apache StreamPipes (incubating) is a self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams. StreamPipes offers several modules including StreamPipes Connect to easily connect data from industrial IoT sources, the Pipeline Editor to quickly create processing pipelines and several visualization modules for live and historic data exploration. Under the hood, StreamPipes utilizes an event-driven microservice paradigm of standalone, so-called analytics microservices making the system easy to extend for individual needs.
Background
StreamPipes has grown significantly throughout recent years. We were able to introduce a lot of new features and attracted both users and contributors. Putting the cherry on the cake, we graduated as an Apache top-level project in December 2022. We will of course continue developing new features and never rest to make StreamPipes even more amazing.
StreamPipes really shines when connecting Industrial IoT data. Such data sources typically originate from machine controllers, called PLCs (e.g., Siemens S7). But there are also newer protocols such as OPC-UA which allow browsing the available data within the controller. Our goal is to make connecting industrial data sources a matter of minutes.
Currently, data sources can be connected using the built-in module `StreamPipes Connect` from the UI. We provide a set of adapters for popular protocols that can be customized, e.g., connection details can be added.
To make it even easier to connect industrial data sources with StreamPipes, we plan to add an OPC-UA browser. This will be part of the entry page of StreamPipes Connect and should allow users to enter the connection details of an existing OPC-UA server. A new view in the UI then shows the available data nodes from the server, their status, and their current values. Users should be able to select the values that should become part of a new adapter. Afterwards, a new adapter can be created by reusing the current workflow for creating an OPC-UA data source.
This is a really cool project for participants interested in full-stack development who would like to get a deeper understanding of industrial IoT protocols. Have fun!
Tasks
- [ ] get familiar with the OPC-UA protocol
- [ ] develop mockups which demonstrate the user workflow
- [ ] develop a data model for discovering data from OPC-UA
- [ ] create the backend business logic for the OPC-UA browser
- [ ] create the frontend views to asynchronously browse data and to create a new adapter
- [ ] write JUnit, component, and E2E tests
- [ ] whatever comes to your mind 💡 further ideas are always welcome
Relevant Skills
- interest in Industrial IoT and protocols such as OPC-UA
- Java development skills
- Angular/Typescript development skills
Anyway, the most important skill is motivation and readiness to learn during the project!
Learning Material
- StreamPipes documentation (https://streampipes.apache.org/docs/docs/user-guide-introduction.html)
- our current OPC-UA adapter (https://github.com/apache/streampipes/tree/dev/streampipes-extensions/streampipes-connect-adapters-iiot/src/main/java/org/apache/streampipes/connect/iiot/adapters/opcua)
- Eclipse Milo, which we currently use for OPC-UA connectivity (https://github.com/eclipse/milo)
- Apache PLC4X, which has an API for browsing (https://plc4x.apache.org/)
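As a warm-up for the data-model task above, here is a minimal sketch of how discovered nodes and the user's selectable values could be modeled. It is written in Python purely for illustration (the real implementation would live in the Java backend and Angular UI), and every name in it is hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class OpcUaNodeInfo:
    """One discovered node in the OPC-UA address space (hypothetical model)."""
    node_id: str                    # e.g. "ns=2;s=Machine1.Temperature"
    display_name: str
    status: str                     # e.g. "GOOD" or "BAD"
    current_value: Optional[object] = None
    children: List["OpcUaNodeInfo"] = field(default_factory=list)

def selectable_leaves(node: OpcUaNodeInfo) -> List[OpcUaNodeInfo]:
    """Flatten the browse tree into the leaf nodes a user could tick
    in the UI to become fields of a new adapter."""
    if not node.children:
        return [node]
    leaves: List[OpcUaNodeInfo] = []
    for child in node.children:
        leaves.extend(selectable_leaves(child))
    return leaves
```

A browse result from the OPC-UA server would be mapped into this tree, and the leaves the user selects would then feed the existing adapter-creation workflow.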
Reference
Github issue can be found here: https://github.com/apache/streampipes/issues/1390
Name and contact information
- Mentor: Dominik Riemer (riemer[at]apache.org).
- Mailing list: (dev[at]streampipes.apache.org)
- Website: streampipes.apache.org
RocketMQ
GSoC Implement python client for RocketMQ 5.0
Apache RocketMQ
Apache RocketMQ is a distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.
Page: https://rocketmq.apache.org
Background
RocketMQ 5.0 has released clients in various languages, including Java, C++, and Golang. To cover all major programming languages, a Python client needs to be implemented.
Related Repo: https://github.com/apache/rocketmq-clients
Task
The developer is required to be familiar with the Java implementation and capable of developing a Python client, while ensuring consistent functionality and semantics.
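To illustrate what "consistent functionality and semantics" could mean in practice, here is a deliberately simplified sketch of a producer surface mirroring the Java client's start/send lifecycle. None of these names come from the actual rocketmq-clients code base; a real client would speak the gRPC-based RocketMQ 5.0 protocol instead of returning a fake id:

```python
# Hypothetical sketch only -- not the actual rocketmq-clients Python API.
from dataclasses import dataclass, field

@dataclass
class Message:
    topic: str
    body: bytes
    tag: str = ""
    keys: list = field(default_factory=list)

class Producer:
    def __init__(self, endpoints: str):
        self.endpoints = endpoints
        self._started = False

    def start(self):
        # real implementation: establish gRPC channels, fetch topic routes
        self._started = True

    def send(self, message: Message) -> str:
        # real implementation: synchronous send returning the message id,
        # matching the Java client's Producer#send semantics
        if not self._started:
            raise RuntimeError("producer not started")
        return f"fake-id-for-{message.topic}"
```

The point of the sketch is the lifecycle contract (sending before `start()` fails, a successful send returns a receipt), which the Python client should keep consistent with the Java implementation.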
Relevant Skills
Python language
Basic knowledge of RocketMQ 5.0
Mentor
Yangkun Ai, PMC of Apache RocketMQ, aaronai@apache.org
...
[GSOC] [SkyWalking] Python Agent Performance Enhancement Plan
Apache SkyWalking is an application performance monitoring tool for distributed systems, especially designed for microservices, cloud-native and container-based (Kubernetes) architectures. This task is about enhancing the Python agent's performance; the tracking issue can be seen here: https://github.com/apache/skywalking/issues/10408
[GSOC] [SkyWalking] Pending Task on K8s
Apache SkyWalking is an application performance monitoring tool for distributed systems, especially designed for microservices, cloud-native and container-based (Kubernetes) architectures. This task is about a pending task on K8s.
ShenYu
Apache ShenYu GSoC 2023 - Design and implement shenyu ingress-controller in k8s
Background
Apache ShenYu is a Java native API Gateway for service proxy, protocol conversion and API governance. Currently, ShenYu has good usability and performance in microservice scenarios. However, ShenYu's support for Kubernetes is still relatively weak.
Tasks
1. Discuss with mentors, and complete the requirements design and technical design of shenyu-ingress-controller.
2. Complete the initial version of shenyu-ingress-controller, implement reconciliation of the k8s Ingress API, and make ShenYu the ingress gateway of k8s.
3. Complete the CI tests of shenyu-ingress-controller to verify the correctness of the code.
Relevant Skills
1. Know the use of Apache ShenYu
2. Familiar with Java and Golang
3. Familiar with Kubernetes and able to use Java or Golang to develop a Kubernetes controller
Description
Issues : https://github.com/apache/shenyu/issues/4438
website : https://shenyu.apache.org/
EventMesh
Apache EventMesh EventMesh official website docs by version and demo show
Apache EventMesh (incubating)
Apache EventMesh is a fully serverless platform used to build distributed event-driven applications.
Website: https://eventmesh.apache.org
GitHub: https://github.com/apache/incubator-eventmesh
Upstream Issue: https://github.com/apache/incubator-eventmesh/issues/3327
Background
We hope that the community can contribute to the maintenance of the documentation, including archiving the Chinese and English content of the documents for different release versions, maintaining the official website documents, and improving the project's quick-start documents, feature introductions, etc.
Task
1. Discuss with the mentors what you need to do
2. Learn the details of the Apache EventMesh project
3. Improve and supplement the content of the documents on GitHub, maintain the official website documents, and record EventMesh quick-start user experiences and feature demonstration videos
Recommended Skills
1. Familiar with Markdown
2. Familiar with Java/Go
Mentor
Eason Chen, PPMC of Apache EventMesh, https://github.com/qqeasonchen, chenguangsheng@apache.org
Mike Xue, PPMC of Apache EventMesh, https://github.com/xwm1992, mikexue@apache.org
Apache EventMesh Integrate eventmesh runtime on Kubernetes
Apache EventMesh (incubating)
Apache EventMesh is a fully serverless platform used to build distributed event-driven applications.
Website: https://eventmesh.apache.org
GitHub: https://github.com/apache/incubator-eventmesh
Upstream Issue: https://github.com/apache/incubator-eventmesh/issues/3327
Background
Currently, EventMesh has good usability in microservice scenarios. However, EventMesh's support for Kubernetes is still relatively weak.
We hope the community can contribute the integration of EventMesh with Kubernetes.
Task
1. Discuss with the mentors your implementation idea
2. Learn the details of the Apache EventMesh project
3. Integrate EventMesh with Kubernetes
Recommended Skills
1. Familiar with Java
2. Familiar with Kubernetes
Mentor
Eason Chen, PPMC of Apache EventMesh, https://github.com/qqeasonchen, chenguangsheng@apache.org
Mike Xue, PPMC of Apache EventMesh, https://github.com/xwm1992, mikexue@apache.org
Doris
[GSoC][Doris]Page Cache Improvement
Apache Doris
Apache Doris is a real-time analytical database based on MPP architecture. As a unified platform that supports multiple data processing scenarios, it ensures high performance for low-latency and high-throughput queries, allows for easy federated queries on data lakes, and supports various data ingestion methods.
Page: https://doris.apache.org
Github: https://github.com/apache/doris
Background
Apache Doris accelerates high-concurrency queries utilizing page cache, where the decompressed data is stored.
Currently, the page cache in Apache Doris uses a simple LRU algorithm, which reveals a few problems:
- Hot data will be phased out in large queries
- The page cache configuration is immutable and does not support GC.
Task
- Phase One: Identify the impacts on queries when the decompressed data is stored in memory and SSD, respectively, and then determine whether full page cache is required.
- Phase Two: Improve the cache strategy for Apache Doris based on the results from Phase One.
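To illustrate the scan-pollution problem named above, one common remedy is a segmented LRU: pages enter a probationary segment and are only promoted to a protected segment on a second hit, so a single large query cannot flush the hot set. A minimal Python sketch follows (illustrative only; Doris's actual page cache is implemented in C++):

```python
from collections import OrderedDict

class SegmentedLRU:
    """Tiny segmented-LRU sketch: new pages are probationary; only pages
    hit a second time are promoted, so a one-off large scan cannot evict
    the hot set."""
    def __init__(self, probation_cap, protected_cap):
        self.probation = OrderedDict()
        self.protected = OrderedDict()
        self.probation_cap = probation_cap
        self.protected_cap = protected_cap

    def get(self, key):
        if key in self.protected:
            self.protected.move_to_end(key)  # refresh recency
            return self.protected[key]
        if key in self.probation:
            # second hit: promote to the protected segment
            value = self.probation.pop(key)
            self.protected[key] = value
            if len(self.protected) > self.protected_cap:
                # demote the coldest protected page back to probation
                old_key, old_val = self.protected.popitem(last=False)
                self._insert_probation(old_key, old_val)
            return value
        return None

    def put(self, key, value):
        if key in self.protected:
            self.protected[key] = value
            self.protected.move_to_end(key)
        else:
            self._insert_probation(key, value)

    def _insert_probation(self, key, value):
        self.probation[key] = value
        self.probation.move_to_end(key)
        if len(self.probation) > self.probation_cap:
            self.probation.popitem(last=False)  # evict the coldest page
```

For example, after a hot page has been touched twice, inserting hundreds of scan pages evicts only probationary entries and the hot page remains cached.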
Learning Material
Page: https://doris.apache.org
Github: https://github.com/apache/doris
Mentor
- Mentor: Yongqiang Yang, Apache Doris PMC member & Committer, yangyongqiang@apache.org
- Mentor: Haopeng Li, Apache Doris PMC member & Committer, lihaopeng@apache.org
- Mailing List: dev@doris.apache.org
[GSoC][Doris]Dictionary Encoding Acceleration
Apache Doris
Apache Doris is a real-time analytical database based on MPP architecture. As a unified platform that supports multiple data processing scenarios, it ensures high performance for low-latency and high-throughput queries, allows for easy federated queries on data lakes, and supports various data ingestion methods.
Page: https://doris.apache.org
Github: https://github.com/apache/doris
Background
In Apache Doris, dictionary encoding is performed during data writing and compaction. Dictionary encoding is applied to string data types by default. The dictionary size of a column for one segment is 1M at most. Dictionary encoding accelerates string handling during queries, for example by converting strings into INT codes.
Task
- Phase One: Get familiar with the implementation of Apache Doris dictionary encoding; learn how Apache Doris dictionary encoding accelerates queries.
- Phase Two: Evaluate the effectiveness of full dictionary encoding and figure out how to optimize memory in such a case.
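The core idea behind the encoding can be shown in a few lines. This is an illustrative Python sketch, not Doris's C++ implementation: each distinct string is assigned a small integer code, and an equality predicate then becomes a cheap integer comparison over the codes:

```python
def dict_encode(values):
    """Map each distinct string to a small int code; return (dictionary, codes)."""
    dictionary = {}
    codes = []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)  # next free code
        codes.append(dictionary[v])
    return dictionary, codes

def count_equal(codes, dictionary, needle):
    """Evaluate a predicate like `col = needle` over the encoded column:
    look the string up once, then compare integers instead of strings."""
    code = dictionary.get(needle)
    if code is None:
        return 0  # value never occurs in this segment
    return sum(1 for c in codes if c == code)
```

The memory question in Phase Two arises because the dictionary itself must stay bounded; in Doris this bound is the per-segment dictionary size limit mentioned above.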
Learning Material
Page: https://doris.apache.org
Github: https://github.com/apache/doris
Mentor
- Mentor: Chen Zhang, Apache Doris Committer, zhangechen@apache.org
- Mentor: Zhijing Lu, Apache Doris Committer, luzhijing@apache.org
- Mailing List: dev@doris.apache.org
Commons Statistics
...
Dubbo GSoC 2023 - Refactor Connection
Background
At present, the abstraction of connections by clients in Dubbo's different protocols is not perfect. For example, there is a big discrepancy between the client-side connection abstractions of the dubbo and triple protocols. As a result, enhancing connection-related functions in the client is complicated, and implementations cannot be reused. At the same time, the client also needs to implement a lot of repetitive code when extending a protocol.
Target
Reduce the complexity of the client part when extending the protocol, and increase the reuse of connection-related modules.
Dubbo GSoC 2023 - IDL management
Background
Dubbo currently supports protobuf as a serialization method. Protobuf relies on proto (IDL) files for code generation, but Dubbo currently lacks tooling for managing IDL files. For example, Java users must supply the proto files for every compilation, which is cumbersome, since most developers are used to consuming dependencies as jar packages.
Target
Implement an IDL management and control platform that supports automatically generating dependency packages in various languages from IDL files and pushing them to the relevant package repositories.
Dubbo GSoC 2023 - Refactor the http layer
Background
Dubbo currently supports the REST protocol based on HTTP/1 and the triple protocol based on HTTP/2. However, these two HTTP-based protocols are implemented independently, cannot swap out their underlying implementations, and each carries a relatively high implementation cost.
Target
To reduce maintenance costs, we hope to abstract the HTTP layer so that the underlying HTTP implementation is independent of the protocol, allowing different protocols to reuse the related implementations.
...